
Computing likelihoods #766

Closed
DanielRobertNicoud opened this issue Dec 12, 2023 · 10 comments · Fixed by #769

Comments

@DanielRobertNicoud

For an application of Bambi that I am working on, I need to do the following.

I have a Bambi model with a parameter $\theta$, a prior $p(\theta)$, and a likelihood $p(y\mid x,\theta)$. After fitting on some training data, I obtain parameter samples $\theta_{1:T}$. Given a new observation $(x^\star,y^\star)$, I need to compute $p(y^\star\mid x^\star, \theta_t)$ for all $t$ (and in fact at multiple candidate values of $y^\star$). This should be as fast as possible, as I need this computation for a rather large number of points.

As far as I can tell, there is no easy way of doing this. What I have resorted to so far is implementing it by hand depending on the model specification, but that lacks flexibility. Is anyone aware of a proper way to accomplish the above using Bambi directly? If not, would it be possible to implement it in the package?

Thanks all in advance!

@zwelitunyiswa

zwelitunyiswa commented Dec 12, 2023 via email

@DanielRobertNicoud
Author

@zwelitunyiswa Unfortunately, as far as I know this doesn't work. I am not looking to sample the posterior predictive at an unseen point $x^\star$. What I want is to compute the likelihood of a given (new) data point under the parameter samples obtained from fitting on a training set.

@DanielRobertNicoud
Author

I think this could be easily (?) implemented using the pymc.compute_log_likelihood function.

@tomicapretto
Collaborator

@DanielRobertNicoud I want to check if I understand your problem. Using a training dataset you get draws from a posterior distribution. Then you have, let's say, a test dataset where you know both the value of the predictors and the response. Using the draws of the posterior you got with the training data, you want to compute the likelihood for the values of the predictors and responses in the test set. Is my understanding correct?

If that's correct, I think it's not extremely simple but it should be possible. I'll wait for your input before trying something.

@DanielRobertNicoud
Author

@tomicapretto Yes, that seems correct. As a simplified example, suppose I am doing a Bayesian linear regression $y = x^T\beta + \epsilon$ where $\epsilon\sim N(0,\sigma^2)$ and $\sigma$ is known. Fitting the model gives me samples $\beta_1,\ldots,\beta_T$. Then, given a new point $(x^\star,y^\star)$, what I want to compute is $\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{y^\star - (x^\star)^T\beta_t}{\sigma}\right)^2\right)$ for all $t$. Of course, I'd like to do this in such a way that any model and likelihood function is supported.
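For this Gaussian case, the per-draw evaluation can be vectorized directly with NumPy/SciPy. This is only a sketch: the "posterior draws" below are made-up random numbers standing in for real samples from a fitted model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
T = 500                                  # number of posterior draws
beta_draws = rng.normal(size=(T, 2))     # stand-in samples for beta_1, ..., beta_T
sigma = 0.1                              # noise scale, known in this example

x_star = np.array([0.5, 1.0])            # new predictor (x1 value, intercept)
y_star = 1.25                            # new response value

mu = beta_draws @ x_star                 # predictive mean under each draw, shape (T,)
# the closed-form Gaussian density from the formula above, for all t at once
lik_manual = np.exp(-0.5 * ((y_star - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
# the same quantity via scipy, as a cross-check
lik_scipy = stats.norm.pdf(y_star, loc=mu, scale=sigma)
```

Evaluating several candidate values of $y^\star$ at once would just mean broadcasting `y_star` to an array against `mu[:, None]`.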

I am trying to implement this with raw pymc, if I manage I can share a minimal working example if that helps!

@DanielRobertNicoud
Author

DanielRobertNicoud commented Dec 13, 2023

@tomicapretto The following works in pure pymc (as you can see, I test against manually computed values).

import pymc
import pandas as pd
import numpy as np
import arviz as az
import scipy.stats as sps

# generate some Gaussian data for training
x_train = pd.DataFrame(np.random.normal(size=(200, 1)), columns=["x1"])
x_train["intercept"] = 1.
y_train = pd.Series(1 + .5 * x_train.x1 + np.random.normal(scale=0.1, size=200), name="y")

# single testing point
x0 = .5
x_test = pd.DataFrame({"x1": [x0], "intercept": [1.]})
y_test = pd.Series(np.zeros(x_test.shape[0]), name="y")

coords = {"coeffs": x_train.columns}

with pymc.Model(coords=coords) as model:
    # data containers
    X = pymc.MutableData("X", x_train)
    y = pymc.MutableData("y", y_train)
    # priors
    b = pymc.Normal("b", mu=0, sigma=1, dims="coeffs")
    sigma = pymc.HalfCauchy("sigma", beta=10)
    # linear model
    mu = pymc.math.dot(X, b)
    # likelihood
    likelihood = pymc.Normal("obs", mu=mu, sigma=sigma, observed=y)
    
    # fit
    idata = pymc.sample(return_inferencedata=True, tune=200, draws=20, chains=1)
    
    # testing
    pymc.set_data({"X": x_test.to_numpy(), "y": y_test.to_numpy()})
    out = pymc.compute_log_likelihood(
        az.InferenceData(posterior=idata.posterior),   # keep only the minimum info needed
        var_names=["obs"]
    )

# compare the manual per-draw computation against pymc's output
print([
    np.log(sps.norm(loc=0., scale=sigma).pdf(intercept + x0 * beta))
    for (beta, intercept), sigma in zip(
        idata.posterior.b.to_numpy().reshape(-1, 2),
        idata.posterior.sigma.to_numpy().flatten()
    )
])
print(out.log_likelihood.obs)

@tomicapretto
Collaborator

Thanks for the detailed example, I can't check it right now but I'll check it soon-ish.

@DanielRobertNicoud
Author

@tomicapretto Hi, did you have time to take a look at this?

@tomicapretto
Collaborator

Hi @DanielRobertNicoud, sorry for the delay! Between the holidays and work, I couldn't find time for Bambi. I've now made a gap and decided to review this issue. It turns out it was not hard to add a compute_log_likelihood method to the Model class. Have a look at #769 and let me know if it works for your use case. You'll need to install Bambi from the branch in that PR.

@tomicapretto
Collaborator

Hi @DanielRobertNicoud did you have time to check it?
