Add approximate KNN backend #759

Open
eroell opened this issue Jun 19, 2024 · 1 comment · May be fixed by #791

eroell commented Jun 19, 2024

Description of feature

scanpy offers a fast approximate kNN backend via the transformer argument of pp.neighbors, which we currently block.

Adding this can overcome a significant bottleneck for large datasets.

@eroell eroell added the enhancement New feature or request label Jun 19, 2024
@eroell eroell self-assigned this Jun 19, 2024
eroell commented Jul 29, 2024

So in a bit more detail:

scanpy allows using alternative kNN backends; see here for a tutorial.

This makes it possible to compute kNN matrices with the default kNN implementation:

import scanpy as sc

adata = sc.datasets.blobs(n_variables=1000, n_centers=4, n_observations=10000)
sc.pp.neighbors(adata)

or with a faster backend:

import scanpy as sc
from sklearn_ann.kneighbors.annoy import AnnoyTransformer

adata = sc.datasets.blobs(n_variables=1000, n_centers=4, n_observations=10000)
sc.pp.neighbors(adata, transformer=AnnoyTransformer(5))
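
For context, the object passed as transformer follows the interface of scikit-learn's KNeighborsTransformer: it is fitted on the data and returns a sparse matrix of neighbor distances. The sketch below uses scikit-learn's exact transformer on made-up toy data to show that output; it assumes approximate backends such as AnnoyTransformer expose the same fit/transform methods, and the data shape and seed are illustrative only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer

# Toy data (an assumption for illustration): 200 observations, 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Exact kNN transformer from scikit-learn, used here as a stand-in for an
# approximate backend with the same fit/transform interface.
transformer = KNeighborsTransformer(n_neighbors=5, mode="distance")
graph = transformer.fit_transform(X)

# The result is a sparse (n_obs x n_obs) matrix of neighbor distances.
# In "distance" mode each row stores n_neighbors + 1 entries, because every
# point is also counted as its own neighbor.
entries_per_row = np.diff(graph.indptr)
print(graph.shape)
print(entries_per_row[:5])
```

An approximate backend trades some accuracy in this graph for a faster build, which is where the speedup on large datasets would come from.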

In ehrapy, the transformer argument is not yet implemented:

While the default kNN implementation is available,

import ehrapy as ep
import scanpy as sc

adata = sc.datasets.blobs(n_variables=1000, n_centers=4, n_observations=10000)
ep.pp.neighbors(adata)

using an sklearn-like Transformer is not supported; supporting it could provide a speedup for users with large datasets.

# this fails!
import ehrapy as ep
import scanpy as sc
from sklearn_ann.kneighbors.annoy import AnnoyTransformer

adata = sc.datasets.blobs(n_variables=1000, n_centers=4, n_observations=10000)
ep.pp.neighbors(adata, transformer=AnnoyTransformer(5)) # FAILS
TypeError: neighbors() got an unexpected keyword argument 'transformer'
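
One possible shape for the fix, as a rough sketch only: it assumes ep.pp.neighbors is a thin wrapper around sc.pp.neighbors (not stated in this issue), so the wrapper only needs to accept transformer and pass it through. The function below is hypothetical and not ehrapy's actual code; the backend parameter is an illustration-only hook so that the forwarding can be checked without scanpy installed.

```python
from typing import Any, Callable, Optional


def neighbors(
    adata: Any,
    n_neighbors: int = 15,
    *,
    transformer: Optional[Any] = None,
    backend: Optional[Callable[..., Any]] = None,
    **kwargs: Any,
) -> Any:
    """Hypothetical wrapper that forwards `transformer` to scanpy.

    `backend` exists only for this sketch; it defaults to scanpy.pp.neighbors.
    """
    if backend is None:
        import scanpy as sc  # imported lazily; only needed for the real call

        backend = sc.pp.neighbors
    # Pass the transformer through unchanged instead of rejecting it.
    return backend(adata, n_neighbors=n_neighbors, transformer=transformer, **kwargs)


# Stand-in backend that records what it receives (not part of scanpy).
received = {}


def fake_backend(adata, **kwargs):
    received.update(kwargs)


neighbors("toy-adata", transformer="my-transformer", backend=fake_backend)
print(received)
```

With scanpy installed, calling neighbors(adata, transformer=AnnoyTransformer(5)) without the backend argument would delegate to sc.pp.neighbors in the same way.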

@nicolassidoux nicolassidoux linked a pull request Aug 24, 2024 that will close this issue