P(X = x) for continuous data is problematic #6

mhoehle · 2024-09-18T11:15:47Z

DataScienceInteractivePython/Interactive_Model_Fitting.ipynb

Line 56 in adf0515

    
           "P(X | \\hat{f}_{\\beta}) = \\prod_{\\alpha = 1}^{n} P(X_{\\alpha}|\\hat{f}_{\\beta}(X)), \\alpha = 1,\\ldots,n\n",

The notebook uses the P(X | ...) notation, which I would interpret as the conditional probability of the data. However, linear models are typically used for continuous response data, where P(X_i = x_i | ...) is zero. Instead, one would use densities, i.e. a small p or f.
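A minimal sketch of the point above. It assumes a standard normal distribution (via `scipy.stats.norm`) purely for illustration; nothing here is taken from the notebook. The probability of landing in an interval [x, x + h] shrinks to zero as h shrinks, while that probability divided by h approaches the density f(x). This is why a likelihood for continuous data is built from densities rather than point probabilities.

```python
import numpy as np
from scipy.stats import norm

# Illustrative assumption: X ~ N(0, 1), evaluated at the point x = 0.5.
x = 0.5
widths = [1.0, 0.1, 0.01, 0.001]

# P(x <= X <= x + h) for shrinking interval widths h.
interval_probs = [norm.cdf(x + h) - norm.cdf(x) for h in widths]

# The probability itself vanishes as h -> 0, but P / h approaches f(x).
for h, p in zip(widths, interval_probs):
    print(f"h={h:<6} P(x <= X <= x+h)={p:.6f}  P/h={p / h:.6f}")
print("density f(x) =", norm.pdf(x))

# A log-likelihood is therefore a sum of log-densities (assuming independence).
sample = np.array([0.3, -1.2, 0.8, 2.1])
log_lik = norm.logpdf(sample, loc=0.0, scale=1.0).sum()
print("log-likelihood =", log_lik)
```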

Furthermore, writing the likelihood as a product implies that the observations are independent of each other. A little further down, the notebook states:

OLS: - assumes that the errors have a mean of zero, constant variance and are independent of eachother (no correlation in error).

This is incomplete, because the same assumptions (in particular, independence) are also made in the ML approach.
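To illustrate why the product form already encodes independence, here is a small sketch. It assumes, for illustration only, two errors from a bivariate normal with unit variances (not the notebook's model). The joint log-density equals the sum of the marginal log-densities only when the correlation is zero.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Two observed errors (illustrative values).
e = np.array([0.4, -0.7])

def joint_logpdf(rho):
    """Log-density of e under a bivariate normal with unit variances and correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return multivariate_normal(mean=np.zeros(2), cov=cov).logpdf(e)

# Product of the marginal densities, i.e. the form used in the likelihood.
product_form = norm.logpdf(e).sum()

independent = joint_logpdf(0.0)  # coincides with the product form
correlated = joint_logpdf(0.6)   # differs from the product form

print(product_form, independent, correlated)
```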

Altogether, I find the post a little confusing. As far as I know, for a Gaussian response distribution with KNOWN $\sigma$, the OLS and ML estimates should be identical. Because of the large amount of code, I do not fully understand the exact data-generating mechanism in the example, but for a simple normal model $X_1,\ldots,X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, aren't explicit solutions available? As a suggestion: maybe state the data-generating mechanism more clearly in math notation.
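A minimal sketch of the OLS/ML equivalence, assuming a hypothetical data-generating mechanism chosen only for illustration (it is not the notebook's): $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$. Minimizing the negative Gaussian log-likelihood numerically should reproduce the least-squares $\beta$. For the simple iid normal model, the MLEs are available in closed form: $\hat\mu = \bar X$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (X_i - \bar X)^2$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)

# Hypothetical data-generating mechanism (illustration only):
# Y_i = beta0 + beta1 * x_i + eps_i,  eps_i ~ iid N(0, sigma^2)
n, beta_true, sigma_true = 200, np.array([1.0, 2.5]), 0.8
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])  # design matrix with intercept column
y = X @ beta_true + rng.normal(0, sigma_true, size=n)

# OLS: minimize the residual sum of squares.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# ML: minimize the negative Gaussian log-likelihood (a sum of log-densities).
# sigma is parameterized on the log scale to keep it positive.
def neg_log_lik(params):
    beta, log_sigma = params[:2], params[2]
    return -norm.logpdf(y, loc=X @ beta, scale=np.exp(log_sigma)).sum()

x0 = np.array([0.0, 0.0, np.log(y.std())])
res = minimize(neg_log_lik, x0=x0, method="BFGS")
beta_mle, sigma_mle = res.x[:2], np.exp(res.x[2])

print("OLS beta:", beta_ols)
print("MLE beta:", beta_mle)

# Closed-form MLEs for the simple iid model X_1, ..., X_n ~ N(mu, sigma^2).
z = rng.normal(3.0, 1.5, size=n)
mu_hat = z.mean()
sigma2_hat = np.mean((z - mu_hat) ** 2)  # divides by n, not n - 1
print("mu_hat:", mu_hat, "sigma2_hat:", sigma2_hat)
```

For any fixed $\sigma$, the Gaussian log-likelihood depends on $\beta$ only through the residual sum of squares, which is why the $\beta$ estimates agree in this sketch even though $\sigma$ is estimated jointly rather than treated as known.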
