Support for Awkward Arrays in AnnData's X #1235

Open
xinyuejohn opened this issue Nov 17, 2023 · 7 comments

Comments

@xinyuejohn

xinyuejohn commented Nov 17, 2023

Please describe your wishes and possible alternatives to achieve the desired result.

I'm thrilled to see that AnnData now supports awkward arrays; this feature has been incredibly useful. I'd like to ask whether there are plans to extend this support to AnnData's X. Implementing this would significantly benefit our ongoing work on ehrapy 2.0 (https://github.com/theislab/ehrapy) and EHRData.

To explain further: in our current use of AnnData with ehrapy, each patient is represented as a row with several variables. However, as shown in the figure below, some of these variables cannot fit into the current X (a numpy array) because they are lists-of-lists or lists-of-dicts. Users still expect to process these data, for example to compute statistics (min/max/avg) or to perform imputation. We would rather not store these variables in .layers, .obsm, or .varm, because that is not user-friendly and makes it harder to integrate the data into computational workflows.
[Screenshot 2023-11-17 at 16 34 42: example EHR variables that do not fit into X (image not preserved)]
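As a rough illustration of the processing described above, here is a minimal sketch in which plain Python lists stand in for an awkward array. The variable `blood_pressure` and the helpers `summarize` and `impute_missing` are hypothetical; the point is only that each patient holds a variable number of values, so per-patient statistics and imputation operate on ragged rows rather than on a dense numpy matrix.

```python
from statistics import mean

# Hypothetical ragged EHR variable: one variable-length list of measurements
# per patient (row); None marks a patient with no recorded values.
blood_pressure = [
    [120.0, 135.0, 128.0],  # patient 1: three visits
    [110.0],                # patient 2: one visit
    None,                   # patient 3: never measured
]


def summarize(rows):
    """Per-patient (min, max, mean); None for patients without values."""
    return [(min(r), max(r), mean(r)) if r else None for r in rows]


def impute_missing(rows):
    """Simple mean imputation: give patients without values a one-element
    list holding the mean over all recorded measurements."""
    observed = [v for r in rows if r for v in r]
    fill = mean(observed)
    return [r if r else [fill] for r in rows]


stats = summarize(blood_pressure)
imputed = impute_missing(blood_pressure)
```

With a real awkward array, the same per-row reductions would be done with vectorized functions such as `ak.min`, `ak.max` and `ak.mean` along the last axis instead of Python loops.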

Is there an estimated timeline for when we might expect this feature? Thanks for your continuous efforts in improving AnnData!

@flying-sheep
Member

Hi! Thanks for the feature request. I think that’s feasible, but I need to discuss this with @ivirshup and @ilan-gold. We need to formalize what the supported array types in all of anndata’s fields are.

@grst
Contributor

grst commented Nov 22, 2023

I had hoped that this gets eventually solved with #244.

Back in the PR that introduced awkward arrays, we decided against supporting them in X (for now) because it would have required duplicating a lot of custom code. Checking the constraints on X is already a huge mess, and adding checks for awkward arrays would make it worse.

Personally, I'd suggest you set adata.X = None and just put it in a layer.

@Zethson
Member

Zethson commented Dec 3, 2023

@grst

Personally, I'd suggest you set adata.X = None and just put it in a layer.

This would mean that people who load complex EHR data end up with an "empty" object. Yes, everything is in a layer, but one then needs to either always pass the layer argument when working with it, or copy it to X, which, err, doesn't work. Just not the nicest experience.

It'd also deviate from the rest of the scverse workflows where the working data is usually in X.

I want everything in layers but scverse is not there yet.

@grst
Contributor

grst commented Dec 4, 2023

It'd also deviate from the rest of the scverse workflows where the working data is usually in X.

In scirpy, X is empty by default (unless you store paired gene expression in the same AnnData object, which is not recommended in favor of MuData). The TCR data is in .obsm.

It of course depends on your interface, but at least in the scirpy case only very advanced users would want to interact with the awkward array directly. Everyone else accesses it only through scirpy API calls (including a get function to retrieve some variables), and there you can simply set appropriate defaults to read from a layer or .obsm.

@grst
Contributor

grst commented Dec 7, 2023

I want everything in layers but scverse is not there yet.

and why repeat the old mistake for new packages?

@ivirshup
Member

I would suggest you try working with it in layers for now too. Most scverse workflows assume the data is in X, but most scverse workflows also assume that X and layers contain matrix-like arrays with homogeneous dtypes.
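The "matrix-like with homogeneous dtypes" assumption can be made concrete with a small check. `is_matrix_like` below is a hypothetical helper written over nested Python lists, not anndata's actual validation code; it only shows why a ragged or list-valued EHR column violates both conditions.

```python
def is_matrix_like(rows):
    """Return True if `rows` is rectangular and every cell has the same type.

    Hypothetical sketch of the two assumptions: a regular 2-D shape and a
    single (homogeneous) element type across all cells.
    """
    if not rows:
        return True
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        return False  # ragged: rows differ in length
    cell_types = {type(cell) for r in rows for cell in r}
    return len(cell_types) <= 1  # mixed types (e.g. float and list) fail


dense = [[1.0, 2.0], [3.0, 4.0]]
ragged = [[1.0, 2.0], [3.0]]
list_valued = [[1.0, [120.0, 135.0]], [2.0, [110.0]]]
```

Both `ragged` and `list_valued` mirror the EHR variables from the original request, and both fail the check, which is why they need special handling wherever X or a layer is expected to be matrix-like.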

I would be interested in hearing how this goes.

@Zethson
Member

Zethson commented Dec 11, 2023

I want everything in layers but scverse is not there yet.

and why repeat the old mistake for new packages

Because it builds upon scanpy which has the assumption that it works with X by default.
But yeah, I could probably pass a default layer everywhere and modify that behavior.
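Here is a minimal sketch of the "pass a default layer everywhere" approach. `DEFAULT_LAYER`, `get_matrix` and `row_means` are hypothetical names, not existing ehrapy or scanpy API, and a `SimpleNamespace` stands in for a real AnnData object so the example has no dependencies. It follows the scanpy-style convention in which `layer=None` means "use X".

```python
from types import SimpleNamespace

# Hypothetical package-wide default; not an existing ehrapy or scanpy setting.
DEFAULT_LAYER = "ehr"


def get_matrix(adata, layer=DEFAULT_LAYER):
    """Return the requested layer; layer=None falls back to X (scanpy-style)."""
    if layer is None:
        return adata.X
    return adata.layers[layer]


def row_means(adata, layer=DEFAULT_LAYER):
    """Hypothetical analysis function: per-observation mean of a ragged layer."""
    data = get_matrix(adata, layer)
    return [sum(row) / len(row) for row in data]


# Stand-in for an AnnData object: X is empty, the working data lives in a layer.
adata = SimpleNamespace(X=None, layers={"ehr": [[1.0, 2.0], [3.0]]})
means = row_means(adata)
```

Because every function defaults to the same layer name, users never have to pass `layer=` explicitly, while `layer=None` still lets them opt back into X.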
