Support for Awkward Arrays in AnnData's X #1235

xinyuejohn · 2023-11-17T11:06:00Z

Please describe your wishes and possible alternatives to achieve the desired result.

I'm thrilled to see that AnnData now supports awkward arrays. This feature has been incredibly useful. I'd like to inquire if there are plans to extend this support to the X of AnnData. Implementing this would significantly benefit our ongoing projects with ehrapy 2.0 (https://github.com/theislab/ehrapy) and EHRData.

To explain further: in our current use of AnnData with ehrapy, each patient is represented as a row with several variables. However, as shown in the figure below, some of the variables cannot fit into the current X (a NumPy array) because they are lists of lists or lists of dicts. Users nonetheless expect to process these data, for example to compute statistics (min/max/avg) or perform imputation. We therefore don't want to store these variables in .layers, .obsm, or .varm, because that is not user-friendly and adds complexity to integrating the data into a computational workflow.
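As a rough illustration of the data shape described above (plain Python with invented values and a hypothetical variable name; this is not ehrapy or AnnData code), a per-patient variable that holds a variable-length list of measurements cannot be placed in a rectangular numeric array, yet summary statistics over it are simple to state:

```python
from statistics import mean

# Hypothetical example: one entry per patient, each holding a ragged list
# of heart-rate readings (all values are invented for illustration).
heart_rate = [
    [72.0, 75.0, 71.0],  # patient 0: three readings
    [88.0],              # patient 1: one reading
    [],                  # patient 2: no readings recorded
]


def summarize(readings):
    """Return (min, max, mean) for one patient, or None if there are no readings."""
    if not readings:
        return None
    return (min(readings), max(readings), mean(readings))


# One summary tuple (or None) per patient.
summary = [summarize(r) for r in heart_rate]
```

The empty list for patient 2 is the kind of case where imputation would come in; how to impute is left open here.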

Is there an estimated timeline for when we might expect this feature? Thanks for your continuous efforts in improving AnnData!

flying-sheep · 2023-11-20T14:48:02Z

Hi! Thanks for the feature request. I think that’s feasible, but I need to discuss this with @ivirshup and @ilan-gold. We need to formalize what the supported array types in all of anndata’s fields are.

grst · 2023-11-22T13:00:36Z

I had hoped that this would eventually be solved by #244.

Back in the PR that introduced awkward arrays, we decided against implementing it in X (for now) as it would have required duplication of a lot of custom code. Checking the constraints on X is already a huge mess and adding the checks for awkward arrays makes it worse.

Personally, I'd suggest you set adata.X = None and just put it in a layer.

Zethson · 2023-12-03T23:10:34Z

@grst

> Personally, I'd suggest you set adata.X = None and just put it in a layer.

This'd mean that people who load complex EHR data will have an "empty" object. Yeah, everything is in a layer, but one needs to either always use the layer argument when doing stuff with it or copy it to X, which, err, doesn't work. Just not the nicest experience.

It'd also deviate from the rest of the scverse workflows where the working data is usually in X.

I want everything in layers but scverse is not there yet.

grst · 2023-12-04T07:41:30Z

> It'd also deviate from the rest of the scverse workflows where the working data is usually in X.

In scirpy, X is empty by default (unless you store paired gene expression in the same AnnData object, which is not recommended in favor of MuData). The TCR data is in .obsm.

It of course depends on your interface, but at least in the scirpy case only very advanced users would want to interact with the awkward array directly. All others only access it through scirpy API calls (including a get function to retrieve some variables), and there you can just set an appropriate default to get it from a layer or obsm.
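A minimal sketch of that pattern, under stated assumptions: the function name `get_matrix`, the layer name `"ehr"`, and the stand-in object are all hypothetical and are not part of scirpy, ehrapy, or anndata. The accessor only relies on the object exposing `.X` and a dict-like `.layers`, so a package could default every call to a layer without users having to pass it:

```python
from types import SimpleNamespace

DEFAULT_LAYER = "ehr"  # hypothetical layer name chosen for this sketch


def get_matrix(adata, layer=DEFAULT_LAYER):
    """Return the working data from `layer`; use `.X` only when layer is None."""
    if layer is None:
        return adata.X
    return adata.layers[layer]


# Stand-in for an AnnData-like object with an empty X and ragged data in a layer.
adata = SimpleNamespace(
    X=None,
    layers={"ehr": [[[1, 2], [3]], [[4], []]]},
)

values = get_matrix(adata)  # reads from the default layer, not from X
```

Passing `layer=None` explicitly would still return `.X`, which keeps the conventional behaviour available for callers that want it.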

grst · 2023-12-07T10:48:34Z

> I want everything in layers but scverse is not there yet.

and why repeat the old mistake for new packages

ivirshup · 2023-12-11T14:14:49Z

I would suggest you try working with it in layers for now too. Most scverse workflows assume the data is in X, but most scverse workflows also assume that X and layers contain matrix-like arrays with homogeneous dtypes.

I would be interested in hearing how this goes.

Zethson · 2023-12-11T14:16:32Z

> > I want everything in layers but scverse is not there yet.
>
> and why repeat the old mistake for new packages

Because it builds upon scanpy which has the assumption that it works with X by default.
But yeah, I could probably pass a default layer everywhere and modify that behavior.

xinyuejohn added the enhancement label Nov 17, 2023

flying-sheep added the type: awkward array 😐 label Nov 20, 2023

flying-sheep added the topic: api label Nov 20, 2023
