Intercept, prediction function and fitted values #11

pcarbo · 2024-07-02T20:51:22Z

@william-denault It is important to output the intercept for several reasons, but one is that it is often helpful to be able to use the model to predict Y (e.g., to compare the observed Y with the predicted Y). This also provides a simple "sanity check" to make sure that the model fitting is working. You will notice that susieR has two related features: (i) it returns the "fitted" values (in fit$fitted) and (ii) it has a "predict" method. Here's an example of what I mean:

library(susieR)
set.seed(1)
n <- 400
p <- 1000
beta <- rep(0,p)
beta[1:4] <- 1
X <- matrix(rnorm(n*p),nrow = n,ncol = p)
X <- scale(X,center = TRUE,scale = TRUE)
y <- drop(X %*% beta + rnorm(n))
fit <- susie(X,y,L = 10)
ypred <- predict(fit,X)
plot(y,fit$fitted,pch = 20,xlab = "true",ylab = "estimated")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")
plot(y,ypred,pch = 20,xlab = "true",ylab = "estimated")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")

(In this case, fit$fitted and ypred give the same result.)

I suggest implementing (i) and (ii) for fsusieR, which should be straightforward once you have the intercept.

Note that the intercept is not simply the mean of Y, but the mean after removing the effects of the SNPs. See line 460 of susie.R for how this is implemented in susieR. The general formula looks something like this:

$$ (Z^TZ)^{-1} Z^T\bar{r}, \quad \bar{r} = y - X\bar{b} $$

For an intercept, Z is simply a column of ones.
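
To make the formula concrete, here is a minimal base R sketch. It is not the susieR implementation: the simulated data and `b_bar` (a stand-in for the posterior mean effects) are made up for illustration. It checks that when Z is a column of ones, the least-squares formula reduces to the mean of the residuals.

```r
# Minimal sketch in base R; b_bar is a made-up stand-in for the
# posterior mean effects, not output from susieR.
set.seed(1)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
b_bar <- c(1, -1, 0, 0, 0.5)
y <- 3 + drop(X %*% b_bar) + rnorm(n)

# Residuals after removing the effects of the predictors.
r_bar <- y - drop(X %*% b_bar)

# General formula (Z^T Z)^{-1} Z^T r_bar, with Z a column of ones.
Z <- matrix(1, nrow = n, ncol = 1)
intercept <- drop(solve(crossprod(Z), crossprod(Z, r_bar)))

# For a column of ones this is just the mean of the residuals.
stopifnot(isTRUE(all.equal(intercept, mean(r_bar))))
```

Because X is not centered here, `mean(r_bar)` generally differs from `mean(y)`, which is the point made above.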


william-denault · 2024-07-03T10:42:52Z

Hi @pcarbo ,

It is less straightforward in this context.
The main problem with fitted values/prediction is that fsusie expects the Y matrix to have 2^J columns. When that is not the case, we remap the data using an interpolation scheme so that there are 2^J columns (see the intro vignette section "Handling function of any length and unevenly space data"). So the actual output of fsusie is a list of the fitted functions of length 2^J.

So, the "real" predictions are actually made on this grid. How should we proceed? Should we make different "predict" functions, one that uses the estimate and the other one that outputs an "interpolated version" of the prediction on the original grid?
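
To illustrate the two options, here is a rough base R sketch of the round trip. It assumes simple linear interpolation via `approx`, which may differ from the scheme fsusieR actually uses, and the helpers `remap_to_dyadic` and `map_back` are hypothetical names, not fsusieR functions.

```r
# Hypothetical helpers for illustration only; not part of fsusieR.
# Linear interpolation is an assumption, not fsusieR's actual scheme.

# Remap a curve observed at positions `pos` onto an even grid of 2^J points.
remap_to_dyadic <- function(y, pos) {
  J <- ceiling(log2(length(y)))
  grid <- seq(min(pos), max(pos), length.out = 2^J)
  list(grid = grid, y = approx(pos, y, xout = grid, rule = 2)$y)
}

# Map values defined on the 2^J grid back to the original positions.
map_back <- function(fitted, grid, pos) {
  approx(grid, fitted, xout = pos, rule = 2)$y
}

set.seed(1)
pos <- sort(runif(100))           # unevenly spaced positions
y <- sin(2 * pi * pos)            # one observed curve
out <- remap_to_dyadic(y, pos)
length(out$y)                     # 128, the next power of 2 above 100
y_back <- map_back(out$y, out$grid, pos)
```

In this sketch, a "predict" on the 2^J grid would return something like `out$y`, while an "interpolated" predict would return something like `y_back`, which only approximates the values at the original positions.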

pcarbo · 2024-07-08T16:30:35Z

I've removed the section on "prediction" in the introductory vignette since I agree it is a bit complicated, and we don't want to mislead. (This example suggests that it is simpler than it actually is.)

I want to leave this issue open as a reminder to myself to fix some other things.

pcarbo added the enhancement New feature or request label Jul 2, 2024
