Pin pandas>=2.2 #413

xiki-tempula · 2024-09-23T14:30:47Z

#407 won't work with pandas<2.2

Gives

Cell In[3], line 1
----> 1 wf.run(estimators='MBAR')

File ~/miniconda3/envs/md-flux-workflows/lib/python3.10/site-packages/alchemlyb/workflows/abfe.py:275, in ABFE.run(self, skiptime, uncorr, threshold, estimators, overlap, breakdown, forwrev, *args, **kwargs)
    273     self.preprocess(skiptime=skiptime, uncorr=uncorr, threshold=threshold)
    274 if estimators is not None:
--> 275     self.estimate(estimators)
    276     self.generate_result()
    278 if overlap is not None and use_FEP:

File ~/miniconda3/envs/md-flux-workflows/lib/python3.10/site-packages/alchemlyb/workflows/abfe.py:447, in ABFE.estimate(self, estimators, **kwargs)
    442     logger.info("Run MBAR estimator.")
    443     warnings.warn(
    444         "From 2.2.0, n_bootstraps=50 will be the default for estimating MBAR error.",
    445         DeprecationWarning,
    446     )
--> 447     self.estimator[estimator] = MBAR(**kwargs).fit(u_nk)
    448 elif estimator == "BAR":
    449     logger.info("Run BAR estimator.")

File ~/miniconda3/envs/md-flux-workflows/lib/python3.10/site-packages/alchemlyb/estimators/mbar_.py:129, in MBAR.fit(self, u_nk)
    126 u_nk = u_nk.sort_index(level=u_nk.index.names[1:])
    128 groups = u_nk.groupby(level=u_nk.index.names[1:])
--> 129 N_k = [
    130     (
    131         len(groups.get_group(i if isinstance(i, tuple) else (i,)))
    132         if i in groups.groups
    133         else 0
    134     )
    135     for i in u_nk.columns
    136 ]
    137 self._states_ = u_nk.columns.values.tolist()
    139 if isinstance(self.initial_f_k, str) and self.initial_f_k == "BAR":

File ~/miniconda3/envs/md-flux-workflows/lib/python3.10/site-packages/alchemlyb/estimators/mbar_.py:131, in <listcomp>(.0)
    126 u_nk = u_nk.sort_index(level=u_nk.index.names[1:])
    128 groups = u_nk.groupby(level=u_nk.index.names[1:])
    129 N_k = [
    130     (
--> 131         len(groups.get_group(i if isinstance(i, tuple) else (i,)))
    132         if i in groups.groups
    133         else 0
    134     )
    135     for i in u_nk.columns
    136 ]
    137 self._states_ = u_nk.columns.values.tolist()
    139 if isinstance(self.initial_f_k, str) and self.initial_f_k == "BAR":

File ~/miniconda3/envs/md-flux-workflows/lib/python3.10/site-packages/pandas/core/groupby/groupby.py:1063, in BaseGroupBy.get_group(self, name, obj)
   1061 inds = self._get_index(name)
   1062 if not len(inds):
-> 1063     raise KeyError(name)
   1065 if obj is None:
   1066     indexer = inds if self.axis == 0 else (slice(None), inds)

KeyError: (0.0,)

For workflow.

The text was updated successfully, but these errors were encountered:

xiki-tempula · 2024-09-23T14:39:35Z

@jaclark5 Do you think it is possible to make the current code work with both pandas 2.2 and 2.1?
The quite annoy thing is that stable version of rdkit doesn't always support the latest version of pandas.
You shall be able to reproduce the problem with test_fep_estimators.py::TestMBAR::test_mbar

jaclark5 · 2024-09-23T16:33:33Z

This error is likely because the columns for single lambda simulations are not tuples when calling groupby in the preceding line. Before my PR, your code was consistent in not having a tuple at all, but then there was a future warning that the input for get_groups needed to be a tuple, so I fixed that. Now that the input is a tuple like pandas wanted, it seems to be complaining that the tuple isn't recognized by the output of groupby it must be in the preceding version of panadas that if a tuple was of length one it would pull the value... that's quite frustrating.

It seems to me that the solution would be to have all lambda state column headers be tuples all the time. That would take a bit of reformatting.

xiki-tempula · 2024-09-24T07:13:38Z

@jaclark5 I do agree that ave all lambda state column headers be tuples all the time would be good but that might break a lot of people's implementation as in most cases lambda would be a single float.

jaclark5 · 2024-09-30T15:33:16Z

@xiki-tempula @orbeckst I looked at the changes to pandas.core.groupby.BaseGroupBy.get_group() and the difference between pandas v2.2 and <v2.2 is the new addition of the following code:

        keys = self.keys
        level = self.level
        # mypy doesn't recognize level/keys as being sized when passed to len
        if (is_list_like(level) and len(level) == 1) or (  # type: ignore[arg-type]
            is_list_like(keys) and len(keys) == 1  # type: ignore[arg-type]
        ):
            # GH#25971
            if isinstance(name, tuple) and len(name) == 1:
                # Allow users to pass tuples of length 1 to silence warning
                name = name[0]
            elif not isinstance(name, tuple):
                warnings.warn(
                    "When grouping with a length-1 list-like, "
                    "you will need to pass a length-1 tuple to get_group in a future "
                    "version of pandas. Pass `(name,)` instead of `name` to silence "
                    "this warning.",
                    FutureWarning,
                    stacklevel=find_stack_level(),
                )

This warning is quite annoying since as soon as you given it the tuple it requests, it undoes that... it seems like heeding this warning was premature as pandas has not yet transitioned to the internal handling of groups that they envision and are seemingly laying the ground work for. It seems my PR should be removed to allow alchemlyb to function with the greatest breadth of pandas versions and it can be reintroduced or adjusted in the future when groupby is more compatible to the requested behavior of get_group in pandas >= v2.2

I no longer think adjusting alchemlyb to use tuples in lambda state columns will resolve the issue, since the output of u_nk.index.names[1:] for a single lambda state will still be integer values in groupby and not tuples like the get_group method for pandas >= v2.2 wants.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pin pandas>=2.2 #413

Pin pandas>=2.2 #413

xiki-tempula commented Sep 23, 2024 •

edited

Loading

xiki-tempula commented Sep 23, 2024 •

edited

Loading

jaclark5 commented Sep 23, 2024 •

edited

Loading

xiki-tempula commented Sep 24, 2024

jaclark5 commented Sep 30, 2024 •

edited

Loading

Pin pandas>=2.2 #413

Pin pandas>=2.2 #413

Comments

xiki-tempula commented Sep 23, 2024 • edited Loading

xiki-tempula commented Sep 23, 2024 • edited Loading

jaclark5 commented Sep 23, 2024 • edited Loading

xiki-tempula commented Sep 24, 2024

jaclark5 commented Sep 30, 2024 • edited Loading

xiki-tempula commented Sep 23, 2024 •

edited

Loading

xiki-tempula commented Sep 23, 2024 •

edited

Loading

jaclark5 commented Sep 23, 2024 •

edited

Loading

jaclark5 commented Sep 30, 2024 •

edited

Loading