-
-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add constrained transform #764
Conversation
@@ -1132,6 +1132,33 @@ def test_truncated_response(self, truncated_data): | |||
assert isinstance(model.backend.model.observed_RVs[0]._owner.op, pm.TruncatedNormal.rv_type) | |||
|
|||
|
|||
class TestConstrainedResponse(FitPredictParent): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a question regarding the test, but it made me think if we should make sure this also works with a categorical or multinomial model by chance? Did you by chance do this? If not, I can test one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me wonder, does it make sense to constrain such distributions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there exists such a thing as a truncated categorical distribution. What would be the domain we would like to constrain? For the record, before thinking about it I tried the following, which doesn't make sense and Bambi/formulae failed (the reason is not connected to our problem, it's that Bambi is looking for the levels of the response variable but it's numeric)
import bambi as bmb
import numpy as np
import pandas as pd
rng = np.random.default_rng(1234)
y = rng.choice(10, 200)
y_ = y[(y >= 2) & (y <= 8)]
df = pd.DataFrame({"y": y_})
model = bmb.Model("constrained(y, 2, 8) ~ 1", df, family="categorical")
model.build()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/models.py", line 360, in build
self.backend.build(self)
File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/backend/pymc.py", line 62, in build
for name, values in spec.response_component.response_term.coords.items():
File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/terms/response.py", line 59, in coords
return self.family.get_coords(self)
File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/families/univariate.py", line 143, in get_coords
return {name: [level for level in response.levels if level != response.reference]}
TypeError: 'NoneType' object is not iterable
However, one may want to truncate/constrain a binomial or a multinomial distribution. For example, if n=30
we may want to truncate the binomial so the number of successes can only be between 5 and 25. I don't know in which scenarios that would apply, but I think there's nothing that prevents us from implementing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, it doesn't work and I don't think supporting it will be easy. See
rng = np.random.default_rng(1234)
y = rng.binomial(20, 0.5, size=200)
y_ = y[(y >= 6) & (y <= 12)]
df = pd.DataFrame({"y": y_})
model = bmb.Model("constrained(p(y, 20), 6, 12) ~ 1", df, family="binomial")
model.build()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/models.py", line 170, in __init__
design = fm.design_matrices(
File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/matrices.py", line 556, in design_matrices
design = DesignMatrices(description, data, env)
File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/matrices.py", line 58, in __init__
self.response.evaluate(data, env)
File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/matrices.py", line 137, in evaluate
self.term.set_type(self.data, self.env)
File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/terms/terms.py", line 820, in set_type
self.term.set_type(data, env)
File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/terms/terms.py", line 433, in set_type
component.set_type(data, env)
File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/terms/call.py", line 104, in set_type
x = self.call.eval(data_mask, self.env)
File "/home/tomas/anaconda3/envs/bambi/lib/python3.10/site-packages/formulae/terms/call_resolver.py", line 272, in eval
return callee(*args, **kwargs)
File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/transformations.py", line 145, in constrained
return truncated(x, lb, ub)
File "/home/tomas/Desktop/OSS/bambinos/bambi/bambi/transformations.py", line 93, in truncated
raise ValueError("'truncated' only works with 1-dimensional arrays")
ValueError: 'truncated' only works with 1-dimensional arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomicapretto I have a situation where the response is indeed a constrained binomial (sum score) for a clinical risk scale. I am thinking that there would be situations where a user might have a constrained multinomial as well. I have not run into that personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation in brms says truncation should work with the binomial family but I cannot make it work and I don't find examples where someone does it (doc: https://paul-buerkner.github.io/brms/reference/brmsformula.html). See:
library(brms)
y <- rbinom(200, 20, 0.5)
df <- data.frame(y = y[(y >= 8 & y <= 14)])
# model <- brm(y | trials(20) ~ 1, data=df, family="binomial")
model1 <- brm(
y | trials(20) | trunc(lb=0, ub=14) ~ 1, data=df, family="binomial"
)
# Error in trunc(lb = 0, ub = 14) :
# supplied argument name 'lb' does not match 'x'
model2 <- brm(
y | trunc(lb=0, ub=14) | trials(20) ~ 1, data=df, family="binomial"
)
# Error in trials(20) : could not find function "trials"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomicapretto I have a situation where the response is indeed a constrained binomial (sum score) for a clinical risk scale. I am thinking that there would be situations where a user might have a constrained multinomial as well. I have not run into that personally.
Now that I think of the all the sum scores that I seen, I have only ever seen lower bound constrained values on the numerator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zwelitunyiswa do you mean that the lower bound is greater than zero?
I think in the future we can try to support that. But given how things are implemented now, it's not possible.
I would recommend using a custom family where the likelihood is a truncated binomial. The pattern would be the one shown here #750 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomicapretto yeah, there is a sum score for a risk scale in wound care whose min score is 4/23.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks a lot. I left two comments that don't necessarily require changes, but rather if we have thought of that comment.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #764 +/- ##
==========================================
+ Coverage 89.83% 89.93% +0.09%
==========================================
Files 45 45
Lines 3768 3784 +16
==========================================
+ Hits 3385 3403 +18
+ Misses 383 381 -2 ☔ View full report in Codecov by Sentry. |
This PR adds a new transformation that can be used in response terms. This is similar to
truncated
but it directly constrains the bounds of the response distribution (remember,truncated
refers to the missing data mechanism).The usage is
constrained(value, lower, upper) ~ whatever
.lower
andupper
must be scalar orNone
.It uses the original bounds when obtaining draws from the posterior predictive distribution.
Edit closes #750