Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can hub paper #2379

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Can hub paper #2379

wants to merge 5 commits into from

Conversation

canergen
Copy link
Member

@canergen canergen commented Jan 5, 2024

Adds minified mode to CondSCVI. Simplifies minification for new models. Adds add_latent_posterior mode which keeps count data. Fixes glitches in DE function.

  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed
  • Added type annotations to new arguments/methods/functions
  • Added an entry in the latest docs/release_notes/index.md file if fixing a bug or adding a new feature

Copy link

codecov bot commented Jan 5, 2024

Codecov Report

Attention: Patch coverage is 92.94118% with 6 lines in your changes missing coverage. Please review.

Project coverage is 88.94%. Comparing base (9b5f553) to head (0178d45).
Report is 150 commits behind head on main.

Files Patch % Lines
scvi/model/base/_base_model.py 88.00% 3 Missing ⚠️
scvi/module/_vaec.py 92.30% 2 Missing ⚠️
scvi/model/_scanvi.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2379      +/-   ##
==========================================
- Coverage   89.42%   88.94%   -0.48%     
==========================================
  Files         153      153              
  Lines       12566    12566              
==========================================
- Hits        11237    11177      -60     
- Misses       1329     1389      +60     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@martinkim0
Copy link
Contributor

If possible, could you separate this out into 4 PRs:

  • Minified mode for CondSCVI
  • Simplification of minification for new models
  • add_latent_posterior mode
  • Fixes in differential expression

Thanks

@martinkim0 martinkim0 added this to the scvi-tools 1.1.0 milestone Jan 22, 2024
@martinkim0 martinkim0 removed this from the scvi-tools 1.1.0 milestone Jan 29, 2024
Copy link
Contributor

@martinkim0 martinkim0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM overall! Just one major comment about switching the new minified mode to a new argument and some minor comments about performance/storage.

Since there's a lot of conflicts w/ the main branch, I can take care of transferring this to another PR and tagging you as co-author.

Comment on lines +493 to +528
def minify_adata(
self,
minified_data_type: MinifiedDataType = ADATA_MINIFY_TYPE.LATENT_POSTERIOR,
use_latent_qzm_key: str = "X_latent_qzm",
use_latent_qzv_key: str = "X_latent_qzv",
):
"""Minifies the model's adata.

Minifies the adata, and registers new anndata fields: latent qzm, latent qzv, adata uns
containing minified-adata type, and library size.
This also sets the appropriate property on the module to indicate that the adata is minified.

Parameters
----------
minified_data_type
How to minify the data. Currently only supports `latent_posterior_parameters` and `add_posterior_parameters`,.
If minified_data_type == `latent_posterior_parameters`:

* the original count data is removed (`adata.X`, adata.raw, and any layers)
* the parameters of the latent representation of the original data is stored
* everything else is left untouched
If minified_data_type == `add_posterior_parameters`:

* the original count data is kept (`adata.X`, adata.raw, and any layers)
* the parameters of the latent representation of the original data is stored
* everything else is left untouched
use_latent_qzm_key
Key to use in `adata.obsm` where the latent qzm params are stored
use_latent_qzv_key
Key to use in `adata.obsm` where the latent qzv params are stored

Notes
-----
The modification is not done inplace -- instead the model is assigned a new (minified)
version of the adata.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, the only difference between "latent_posterior_parameters" and "add_posterior_parameters" is that the count data is preserved in the latter.

I'm thinking it might be less confusing for the end user if, instead of specifying a minified_data_type, we can directly let them pass in keep_count_data to distinguish between these two modes. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Will be cleaner. Inititally the idea was to have more modes. However, I can't imagine a different mode.

version of the adata.
"""
# TODO(adamgayoso): Add support for a scenario where we want to cache the latent posterior
if not ADATA_MINIFY_TYPE.__contains__(minified_data_type):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason this part uses __contains__ instead of a minified_data_type in ADATA_MINIFY_TYPE? I think the latter is more readable and should work with the NamedTuple.

@@ -481,6 +490,103 @@ def _check_if_trained(
else:
raise RuntimeError(message)

def minify_adata(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the plan is to put this method under BaseModelClass, we should also move over the other methods under BaseMinifiedModeModelClass and then delete it. What do you think?

all_zeros = csr_matrix(adata.X.shape)
layers = {layer: all_zeros.copy() for layer in adata.layers}
else:
all_zeros = adata.X
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on checking whether adata.X is a sparse matrix and if not, sparsify it (in the spirit of minification)?

layers = {layer: all_zeros.copy() for layer in adata.layers}
else:
all_zeros = adata.X
layers = adata.layers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also make sense to only copy the count layer registered with the manager so we don't have to carry around the unused layers as part of the minified data.

@martinkim0 martinkim0 added this to the scvi-tools 1.2.0 milestone May 28, 2024
@martinkim0 martinkim0 removed this from the scvi-tools 1.2.0 milestone Jun 21, 2024
@martinkim0 martinkim0 added the P0 label Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants