-
-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return interpret data #758
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #758 +/- ##
==========================================
+ Coverage 89.90% 89.91% +0.01%
==========================================
Files 45 45
Lines 3713 3729 +16
==========================================
+ Hits 3338 3353 +15
- Misses 375 376 +1 ☔ View full report in Codecov by Sentry. |
@@ -473,11 +475,16 @@ def predictions( | |||
sample_new_groups : bool, optional | |||
If the model contains group-level effects, and data is passed for unseen groups, whether | |||
to sample from the new groups. Defaults to ``False``. | |||
return_idata : bool, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use a different name for this? From the name, I assumed the returned object would be an instance of InferenceData, but I see it's a data frame. Maybe return_data
? Or, are you thinking we would be able to return an InferenceData instance in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe "return_posterior_draws_dataframe".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zwelitunyiswa thanks for the suggestion! Tomas and I are looking into returning the InferenceData
object after all. So stay tuned to this PR 😄
@@ -393,16 +394,10 @@ def make_group_values(x: np.ndarray, groups_n: int = 5) -> np.ndarray: | |||
|
|||
|
|||
def get_group_offset(n, lower: float = 0.05, upper: float = 0.4) -> np.ndarray: | |||
# Complementary log log function, scaled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for removing these comments :D
def test_return_idata_common_effects(mtcars, return_idata): | ||
model, idata = mtcars | ||
|
||
bmb.interpret.predictions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a check for the type of the object returned and the number of rows/columns? Notice we would need to fix the number of chains above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice feature! Just a couple of suggestions.
Thanks for the review. I will incorporate these once we finalize the implementation per our conversation on Slack. |
45ca258
to
6208fe8
Compare
Closing in favor of #762 |
This PR addresses issue #703 and #751 by adding a parameter
return_idata: bool = False
in comparisons(), predictions(), and slopes() that merges the posterior draws with the corresponding observation that "produced" that draw and returns it as a dataframe.Most of the code diff is from adding a new test file that tests non-plotting functionality of the interpret sub-package not tested in
test_plots.py
.With
return_idata=True
, one data frame is returned. This dataframe contains the inference data from the posterior groupInferenceData
object, observed data, and parameter estimates. In the case that a user is callingpredictions
withpps=True
, then the posterior predictive group is used. {marginaleffects} has a similar functionality for Bayesian models.Below are a few examples:
1200000 rows × 15 columns
Returning the inference data when calling comparisons will allow the user to conduct more specific or complex comparisons leveraging group by aggregations:
48000 rows × 15 columns
Initially, I wanted to return the
az.InferenceData
object. However, due to the following limitations I settled on a DataFrame:I could add new groups to the inference data object, but then it isn't clear to me how to perform group by aggregations on the posterior dataset while taking into account this new group.
Additionally, the data shouldn't be merged as a data variable in the az.InferenceData.posterior dataset because, when aggregations are performed along the coordinates, these aggregations will also be applied to the data used to generate the predictions (since they were merged as a data variable).
Lastly, the data could be merged and made as a coordinate so you can specify along which dimension(s) you want to compute the aggregation. Although, I can't seem to groupby more than one coordinate. For example,
xr.Dataset.groupby([coord1, coord2])
results in the following error:TypeError: group must be an xarray.DataArray or the name of an xarray variable or dimension. Received ['coord1', 'coord2'] instead.
Note: depending on the model specification and number of chains and draws, it is possible there will be millions of rows returned.
To do:
pps=True
(posterior predictive samples) inpredictions
. Currently, I only access the posterior group of the InferenceData to build the dataframe.