KeyError: 'FieldAttr(Gene.symbol)' when trying to validate against public Gene #2096

Open
Zethson opened this issue Oct 22, 2024 · 0 comments
Zethson commented Oct 22, 2024

Report

To reproduce:

!lamin init --storage ./run-tests --name run-tests --schema bionty

import lamindb as ln
import bionty as bt
import pandas as pd

gene_symbols = [
    'TP53', 'BRCA1', 'EGFR', 'PTEN', 'MYC', 'KRAS', 'CDKN2A', 'APC', 'SMAD4', 'RB1', 
    'VHL', 'P53', 'BRAF', 'ABL1', 'AKT1', 'PIK3CA', 'ALK', 'NRAS', 'ERBB2', 'KIT',
    'MET', 'CDK4', 'MDM2', 'FGFR1', 'FGFR3', # Real gene symbols
    'FAKE1', 'FAKE2', 'FAKE3', 'FAKE4', 'FAKE5', 'FAKE6', 'FAKE7', 'FAKE8', 'FAKE9', 'FAKE10', 
    'FAKE11', 'FAKE12', 'FAKE13', 'FAKE14', 'FAKE15', 'FAKE16', 'FAKE17', 'FAKE18', 'FAKE19', 'FAKE20',
    'FAKE21', 'FAKE22', 'FAKE23', 'FAKE24', 'FAKE25' # Non-existent gene symbols
]

df = pd.DataFrame(gene_symbols, columns=['Gene Symbol'])

bt.Gene.public().validate(df["Gene Symbol"], field=bt.Gene.symbol, organism="human")

results in:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'FieldAttr(Gene.symbol)'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[14], line 1
----> 1 bt.Gene.public().validate(df["Gene Symbol"], field=bt.Gene.symbol, organism="human")

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/bionty/base/_public_ontology.py:396, in PublicOntology.validate(self, values, field, mute, **kwargs)
    393 if isinstance(values, str):
    394     values = [values]
--> 396 field_values = self._df[str(field)]
    397 return validate(
    398     identifiers=values,
    399     field_values=field_values,
   (...)
    402     **kwargs,
    403 )

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/pandas/core/frame.py:4102, in DataFrame.__getitem__(self, key)
   4100 if self.columns.nlevels > 1:
   4101     return self._getitem_multilevel(key)
-> 4102 indexer = self.columns.get_loc(key)
   4103 if is_integer(indexer):
   4104     indexer = [indexer]

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'FieldAttr(Gene.symbol)'

In contrast, importing the source first and then validating through the registry works:

bt.Gene.import_from_source()
bt.Gene.validate(df["Gene Symbol"], field=bt.Gene.symbol, organism="human")
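
The failing frame in the traceback is `self._df[str(field)]`: the string form of the registry field, `'FieldAttr(Gene.symbol)'`, is used as the column key, and no such column exists in the public table. Below is a minimal sketch of that failure mode and of one possible way to avoid it, written without lamindb or bionty so it runs on its own. Everything in it is an assumption for illustration: `FieldAttr` is a stand-in class, `resolve_field_name` is a hypothetical helper that is not part of either library, the `.field.name` attribute path assumes a Django-style descriptor, and a plain dict plays the role of the public ontology DataFrame.

```python
from types import SimpleNamespace


class FieldAttr:
    """Hypothetical stand-in for a registry field such as `bt.Gene.symbol`."""

    def __init__(self, registry: str, name: str) -> None:
        self.registry = registry
        # Assumption: the real object exposes the column name via `.field.name`.
        self.field = SimpleNamespace(name=name)

    def __repr__(self) -> str:
        return f"FieldAttr({self.registry}.{self.field.name})"


def resolve_field_name(field) -> str:
    """Return a plain column name for a string or a descriptor-like object."""
    if isinstance(field, str):
        return field
    name = getattr(getattr(field, "field", None), "name", None)
    if isinstance(name, str):
        return name
    raise TypeError(f"cannot resolve a column name from {field!r}")


# A dict of columns stands in for the public ontology table.
public_table = {"symbol": ["TP53", "BRCA1", "EGFR"]}
field = FieldAttr("Gene", "symbol")

# What the traceback shows: str(field) is used directly as the key.
try:
    public_table[str(field)]
    error_message = None
except KeyError as err:
    error_message = str(err)

# Resolving to the plain column name first makes the lookup succeed.
known = set(public_table[resolve_field_name(field)])
validated = [symbol in known for symbol in ["TP53", "FAKE1"]]

print(error_message)
print(validated)
```

Whether the actual fix belongs in `PublicOntology.validate` or in the caller is not established here; the sketch only shows that converting the field to its column name before the lookup removes the `KeyError`.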

Version information

0.76.14

Zethson assigned sunnyosun and Zethson and unassigned sunnyosun Oct 22, 2024