Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intercept, prediction function and fitted values #11

Open
pcarbo opened this issue Jul 2, 2024 · 2 comments
Open

Intercept, prediction function and fitted values #11

pcarbo opened this issue Jul 2, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@pcarbo
Copy link
Member

pcarbo commented Jul 2, 2024

@william-denault It is important to output the intercept for several reasons, but one is that it is often helpful to be able to use the model to predict Y (e.g., to compare the observed Y with the predicted Y). This also provides a simple "sanity check" to make sure that the model fitting is working. You will notice that susieR has two related features: (i) it returns the "fitted" values (in fit$fitted) and (ii) it has a "predict" method. Here's an example of what I mean:

# library(susieR)
set.seed(1)
n <- 400
p <- 1000
beta <- rep(0,p)
beta[1:4] <- 1
X <- matrix(rnorm(n*p),nrow = n,ncol = p)
X <- scale(X,center = TRUE,scale = TRUE)
y <- drop(X %*% beta + rnorm(n))
fit <- susie(X,y,L = 10)
ypred <- predict(fit,X)
plot(y,fit$fitted,pch = 20,xlab = "true",ylab = "estimated")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")
plot(y,ypred,pch = 20,xlab = "true",ylab = "estimated")
abline(a = 0,b = 1,lty = "dotted",col = "magenta")

(In this case, fit$fitted and ypred give the same result.)

I suggest implementing (i) and (ii) for fsusieR, which should be straightforward once you have the intercept.

Note that the intercept is not simply the the mean of Y, but the mean after removing the effects of the SNPs. See line 460 of susie.R for how this is implemented in susieR. The general formula looks something like this:

$$ (Z^TZ)^{-1} Z\bar{r}, \quad \bar{r} = y - X\bar{b} $$

For an intercept, Z is simply a column of ones.

@pcarbo pcarbo added the enhancement New feature or request label Jul 2, 2024
@william-denault
Copy link
Collaborator

Hi @pcarbo ,

It is less straightforward in this context.
The main problem about fitted values/prediction is that to run fsusie, the Y matrix is expected to have 2^J columns, and when it is not the case, we actually remap the data using an interpolation scheme so that there are 2^J columns (see intro vignette section #### Handling function of any length and unevenly space data) . So the actual  output of fsusie is a list of the fitted functions of length 2^J

So, the "real" predictions are actually made on this grid. How should we proceed? Should we make different "predict" functions, one that uses the estimate and the other one that outputs an "interpolated version" of the  prediction on the original grid?

@pcarbo
Copy link
Member Author

pcarbo commented Jul 8, 2024

I've removed the section on "prediction" in the introductory vignette since I agree it is a bit complicated, and we don't want to mislead. (This example suggests that it is simpler than it actually is.)

I want to leave this issue open as a reminder to myself to fix some other things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants