This repository processes the NCBI gene information file for Homo sapiens, extracts the records for protein-coding genes, and saves them to a CSV file.
conda env create -f environment.yml
conda activate human-gene
bash download.sh
python process.py
- Homo_sapiens.gene_info.gz
- The input file from NCBI is a gzip-compressed, tab-delimited text file containing detailed information on human genes.
- protein_coding_gene.csv
- The output file is a CSV containing the records for protein-coding genes extracted from the NCBI file.
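
The filtering step between the two files above could look roughly like the sketch below. It is not the contents of `process.py`, which is not shown here. The function name `extract_protein_coding` is hypothetical, and the sketch assumes the NCBI file has a header row with a `type_of_gene` column whose value is `protein-coding` for the genes of interest.

```python
import csv
import gzip


def extract_protein_coding(src_path, dst_path):
    """Copy rows whose type_of_gene is 'protein-coding' from a gzipped TSV to a CSV.

    Hypothetical helper; assumes the input has a header row that includes
    a 'type_of_gene' column. Returns the number of rows written.
    """
    kept = 0
    with gzip.open(src_path, "rt", encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Tab-delimited input; quote characters are treated as plain text.
        reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Reuse the input header as the CSV header.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            if row["type_of_gene"] == "protein-coding":
                writer.writerow(row)
                kept += 1
    return kept
```

Under those assumptions, calling `extract_protein_coding("Homo_sapiens.gene_info.gz", "protein_coding_gene.csv")` would stream the input row by row, so the whole file never has to be held in memory.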