Computing likelihoods #766
For some application of Bambi on which I am working, I would need to do the following.

I have a Bambi model with some parameter $\theta$, a prior $p(\theta)$, and a likelihood $p(y \mid x, \theta)$. After getting some training data, fitting gives me parameter samples $\theta_{1:T}$. Given a new observation $(x^\star, y^\star)$, I need to compute $p(y^\star \mid x^\star, \theta_t)$ for all $t$ (and actually at multiple possible values of $y^\star$). This should be as fast as possible, as I need this computation for a rather large number of points.

As far as I can tell, there is no easy way of doing this. What I have resorted to until now is to implement this by hand depending on the model specification, but it lacks flexibility. Is anyone aware of a proper way to accomplish what I described above using Bambi directly? If not, would it be possible to implement it in the package?

Thanks all in advance!

Comments
I believe you can use the predict function. Just construct a dataframe with
all ids and covariate values and feed it through the predict function. The
predict function is very flexible and you can even predict on unseen
groups.
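
For context, a minimal sketch (mine, not the commenter's) of what that suggestion looks like with a toy dataset and formula; note that the `kind` argument has been renamed across Bambi versions (`"pps"` in older releases, `"response"` in newer ones):

```python
import bambi as bmb
import pandas as pd

# Toy data and formula, just to illustrate the suggestion above.
df = pd.DataFrame({"y": [1.2, 0.7, 1.9, 0.4], "x1": [0.1, -0.3, 1.0, 0.2]})
model = bmb.Model("y ~ x1", df)
idata = model.fit()

# Predict on new (possibly unseen) covariate values.
new_df = pd.DataFrame({"x1": [0.5]})
model.predict(idata, kind="pps", data=new_df)  # "pps" may be "response" in newer Bambi
```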
@zwelitunyiswa Unfortunately, as far as I know this doesn't work. I am not looking to sample the posterior predictive at an unseen point; I need to evaluate the likelihood of a known response value under each posterior draw.
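
To make that distinction concrete, here is a minimal sketch of mine (all values hypothetical) contrasting the two operations for a single posterior draw: posterior predictive sampling produces new draws of $y$, while what is needed here is the density of a given $y^\star$.

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(0)
beta, intercept, sigma = 0.5, 1.0, 0.1  # one posterior draw (illustrative values)
x_star, y_star = 0.5, 1.3               # a new observation

mu = intercept + beta * x_star
# Posterior predictive sampling: draw new responses y ~ p(y | x*, theta_t).
y_draws = rng.normal(loc=mu, scale=sigma, size=1000)
# Likelihood evaluation: the density of a *given* y* under the same draw.
log_lik = sps.norm(loc=mu, scale=sigma).logpdf(y_star)
```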
I think this could be easily (?) implemented using the `pymc.compute_log_likelihood` function.
@DanielRobertNicoud I want to check if I understand your problem. Using a training dataset you get draws from a posterior distribution. Then you have, let's say, a test dataset where you know both the value of the predictors and the response. Using the draws of the posterior you got with the training data, you want to compute the likelihood for the values of the predictors and responses in the test set. Is my understanding correct? If that's correct, I think it's not extremely simple but it should be possible. I'll wait for your input before trying something.
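
In symbols, consistent with the notation in the original issue, the object to compute is the pointwise log-likelihood matrix over the $T$ posterior draws and the $N$ test points:

```math
\ell_{t,i} = \log p\left(y_i^\star \mid x_i^\star, \theta_t\right),
\qquad t = 1, \dots, T, \quad i = 1, \dots, N.
```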
@tomicapretto Yes, that seems correct. As a simplified example, suppose I am doing a Bayesian linear regression, as in the sketch below. I am trying to implement this with raw `pymc` in the meantime.
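
For that simplified case, writing $\theta_t = (\beta_t, \sigma_t)$ for one posterior draw, the quantity to evaluate is just a Gaussian density:

```math
y \mid x, \beta, \sigma \sim \mathcal{N}\left(x^\top \beta, \sigma^2\right)
\quad \Longrightarrow \quad
p\left(y^\star \mid x^\star, \theta_t\right)
= \mathcal{N}\left(y^\star \mid {x^\star}^{\!\top} \beta_t, \; \sigma_t^2\right).
```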
@tomicapretto The following works in pure `pymc`:

```python
import pymc
import pandas as pd
import numpy as np
import arviz as az
import scipy.stats as sps

# Generate some Gaussian data for training.
x_train = pd.DataFrame(np.random.normal(size=(200, 1)), columns=["x1"])
x_train["intercept"] = 1.0
y_train = pd.Series(1 + 0.5 * x_train.x1 + np.random.normal(scale=0.1, size=200), name="y")

# Single testing point.
x0 = 0.5
x_test = pd.DataFrame({"x1": [x0], "intercept": [1.0]})
y_test = pd.Series(np.zeros(x_test.shape[0]), name="y")

coords = {"coeffs": x_train.columns}
with pymc.Model(coords=coords) as model:
    # Data containers.
    X = pymc.MutableData("X", x_train)
    y = pymc.MutableData("y", y_train)
    # Priors.
    b = pymc.Normal("b", mu=0, sigma=1, dims="coeffs")
    sigma = pymc.HalfCauchy("sigma", beta=10)
    # Linear model.
    mu = pymc.math.dot(X, b)
    # Likelihood.
    likelihood = pymc.Normal("obs", mu=mu, sigma=sigma, observed=y)
    # Fit.
    idata = pymc.sample(return_inferencedata=True, tune=200, draws=20, chains=1)
    # Testing: swap in the test point and recompute the pointwise log-likelihood.
    pymc.set_data({"X": x_test.to_numpy(), "y": y_test.to_numpy()})
    out = pymc.compute_log_likelihood(
        az.InferenceData(posterior=idata.posterior),  # keep the minimum amount of info possible
        var_names=["obs"],
    )

# Manual check: evaluate the Gaussian log-density for each posterior draw.
display([
    np.log(sps.norm(loc=0.0, scale=sigma).pdf(intercept + x0 * beta))
    for (beta, intercept), sigma in zip(
        idata.posterior.b.to_numpy().reshape(-1, 2),
        idata.posterior.sigma.to_numpy().flatten(),
    )
])
display(out.log_likelihood.obs)
```
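
Building on the example above, evaluating the likelihood at multiple candidate $y^\star$ values for the same $x^\star$ (as the issue asks) can follow the same pattern by repeating the test point. This is my extension of the snippet, not part of the original comment:

```python
# Hypothetical extension: a grid of candidate y* values at the same x*.
y_grid = np.linspace(-2.0, 4.0, 50)
x_grid = pd.DataFrame({"x1": np.full_like(y_grid, x0), "intercept": 1.0})

with model:
    pymc.set_data({"X": x_grid.to_numpy(), "y": y_grid})
    out_grid = pymc.compute_log_likelihood(
        az.InferenceData(posterior=idata.posterior), var_names=["obs"]
    )

# One log-density per (chain, draw, candidate y* value).
display(out_grid.log_likelihood.obs)
```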
Thanks for the detailed example! I can't check it right now, but I'll look at it soon-ish.
@tomicapretto Hi, did you have time to take a look at this?
Hi @DanielRobertNicoud, sorry for the delay! With the holidays and work, I couldn't find time to work on Bambi. Now I've managed to make a gap and decided to review this issue. Turns out it was not hard to add.
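
For reference, a hypothetical usage sketch of such an addition; the method name `compute_log_likelihood` and its `data` argument are assumptions on my part, not confirmed by this thread:

```python
import bambi as bmb
import pandas as pd

# Toy data and formula, just for illustration.
df_train = pd.DataFrame({"y": [1.2, 0.7, 1.9, 0.4], "x1": [0.1, -0.3, 1.0, 0.2]})
model = bmb.Model("y ~ x1", df_train)
idata = model.fit()

# Assumed API: evaluate the pointwise log-likelihood of known test
# responses under each posterior draw (name and signature are assumptions).
df_test = pd.DataFrame({"y": [1.0], "x1": [0.5]})
idata = model.compute_log_likelihood(idata, data=df_test)
idata.log_likelihood  # one value per (chain, draw, observation)
```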
Hi @DanielRobertNicoud, did you have time to check it?