Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add constrained transform #764

Merged
merged 3 commits into from
Dec 2, 2023
Merged

Conversation

tomicapretto
Copy link
Collaborator

@tomicapretto tomicapretto commented Nov 29, 2023

This PR adds a new transformation that can be used in response terms. This is similar to truncated but it directly constrains the bounds of the response distribution (remember, truncated refers to the missing data mechanism).

The usage is constrained(value, lower, upper) ~ whatever. lower and upper must be scalar or None.

It uses the original bounds when obtaining draws from the posterior predictive distribution.

Edit closes #750

bambi/terms/utils.py Outdated Show resolved Hide resolved
@@ -1132,6 +1132,33 @@ def test_truncated_response(self, truncated_data):
assert isinstance(model.backend.model.observed_RVs[0]._owner.op, pm.TruncatedNormal.rv_type)


class TestConstrainedResponse(FitPredictParent):
Copy link
Collaborator

@GStechschulte GStechschulte Dec 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a question regarding the test, but it made me think if we should make sure this also works with a categorical or multinomial model by chance? Did you by chance do this? If not, I can test one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder, does it make sense to constrain such distributions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there exists such a thing as a truncated categorical distribution. What would be the domain we would like to constrain? For the record, before thinking about it I tried the following, which doesn't make sense and Bambi/formulae failed (the reason is not connected to our problem, it's that Bambi is looking for the levels of the response variable but it's numeric)

import bambi as bmb
import numpy as np
import pandas as pd

rng = np.random.default_rng(1234)
y = rng.choice(10, 200)
y_ = y[(y >= 2) & (y <= 8)]

df = pd.DataFrame({"y": y_})
model = bmb.Model("constrained(y, 2, 8) ~ 1", df, family="categorical")
model.build()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/models.py", line 360, in build
    self.backend.build(self)
  File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/backend/pymc.py", line 62, in build
    for name, values in spec.response_component.response_term.coords.items():
  File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/terms/response.py", line 59, in coords
    return self.family.get_coords(self)
  File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/families/univariate.py", line 143, in get_coords
    return {name: [level for level in response.levels if level != response.reference]}
TypeError: 'NoneType' object is not iterable

However, one may want to truncate/constrain a binomial or a multinomial distribution. For example, if n=30 we may want to truncate the binomial so the number of successes can only be between 5 and 25. I don't know in which scenarios that would apply, but I think there's nothing that prevents us from implementing it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, it doesn't work and I don't think supporting it will be easy. See

rng = np.random.default_rng(1234)
y = rng.binomial(20, 0.5, size=200)
y_ = y[(y >= 6) & (y <= 12)]


df = pd.DataFrame({"y": y_})
model = bmb.Model("constrained(p(y, 20), 6, 12) ~ 1", df, family="binomial")
model.build()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/models.py", line 170, in __init__
    design = fm.design_matrices(
  File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/matrices.py", line 556, in design_matrices
    design = DesignMatrices(description, data, env)
  File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/matrices.py", line 58, in __init__
    self.response.evaluate(data, env)
  File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/matrices.py", line 137, in evaluate
    self.term.set_type(self.data, self.env)
  File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/terms/terms.py", line 820, in set_type
    self.term.set_type(data, env)
  File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/terms/terms.py", line 433, in set_type
    component.set_type(data, env)
  File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/terms/call.py", line 104, in set_type
    x = self.call.eval(data_mask, self.env)
  File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/terms/call_resolver.py", line 272, in eval
    return callee(*args, **kwargs)
  File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/transformations.py", line 145, in constrained
    return truncated(x, lb, ub)
  File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/transformations.py", line 93, in truncated
    raise ValueError("'truncated' only works with 1-dimensional arrays")
ValueError: 'truncated' only works with 1-dimensional arrays

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomicapretto I have a situation where the response is indeed a constrained binomial (sum score) for a clinical risk scale. I am thinking that there would be situations where a user might have a constrained multinomial as well. I have not run into that personally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation in brms says truncation should work with the binomial family but I cannot make it work and I don't find examples where someone does it (doc: https://paul-buerkner.github.io/brms/reference/brmsformula.html). See:

library(brms)

y <- rbinom(200, 20, 0.5)
df <- data.frame(y = y[(y >= 8 & y <= 14)])

# model <- brm(y | trials(20) ~ 1, data=df, family="binomial")

model1 <- brm(
  y | trials(20) | trunc(lb=0, ub=14) ~ 1, data=df, family="binomial"
)
# Error in trunc(lb = 0, ub = 14) : 
#   supplied argument name 'lb' does not match 'x'

model2 <- brm(
  y | trunc(lb=0, ub=14) | trials(20)  ~ 1, data=df, family="binomial"
)
# Error in trials(20) : could not find function "trials"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomicapretto I have a situation where the response is indeed a constrained binomial (sum score) for a clinical risk scale. I am thinking that there would be situations where a user might have a constrained multinomial as well. I have not run into that personally.

Now that I think of the all the sum scores that I seen, I have only ever seen lower bound constrained values on the numerator.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zwelitunyiswa do you mean that the lower bound is greater than zero?

I think in the future we can try to support that. But given how things are implemented now, it's not possible.

I would recommend using a custom family where the likelihood is a truncated binomial. The pattern would be the one shown here #750 (comment)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomicapretto yeah, there is a sum score for a risk scale in wound care whose min score is 4/23.

Copy link
Collaborator

@GStechschulte GStechschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks a lot. I left two comments that don't necessarily require changes, but rather if we have thought of that comment.

@codecov-commenter
Copy link

codecov-commenter commented Dec 2, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2d4b260) 89.83% compared to head (c0b52ac) 89.93%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #764      +/-   ##
==========================================
+ Coverage   89.83%   89.93%   +0.09%     
==========================================
  Files          45       45              
  Lines        3768     3784      +16     
==========================================
+ Hits         3385     3403      +18     
+ Misses        383      381       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tomicapretto tomicapretto merged commit 312afa2 into bambinos:main Dec 2, 2023
4 checks passed
@tomicapretto tomicapretto deleted the constrained branch December 2, 2023 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ValueError: Non-parent parameter not found in posterior
4 participants