Adapt to new DH template #184

Merged
merged 79 commits into from
Oct 25, 2024
Commits
c37cd21
make JSON file required in vcf2gvf.py
miseminger Apr 29, 2024
9a4f86d
add new attribute, 'transcript_id', using locus_tag from JSON
miseminger Apr 29, 2024
407381c
add transcript_id in parentheses for HGVS nt names
miseminger Apr 29, 2024
ecd19c4
get rid of UserWarning for match groups
miseminger Apr 29, 2024
1a57d85
remove unused imports
miseminger Apr 29, 2024
047f085
include alias_protein_id in HGVS alias names, and make a new attribut…
miseminger Apr 29, 2024
e18dfa9
update nt_delins_regex
miseminger Apr 29, 2024
82f2959
add new attribute, 'gene_symbol', from 'gene' in the JSON file
miseminger Apr 30, 2024
418c65d
add 'gene_symbol' and 'protein_symbol' columns to mutation index
miseminger Apr 30, 2024
efcce0c
rename 'gene' column in mutation index to 'gene_name'
miseminger Apr 30, 2024
0b6f5e5
change 'protein symbol' column name to 'gene name' to match GVF and m…
miseminger Apr 30, 2024
c93085c
change 'protein symbol' column in df made from Pokay repo to 'gene na…
miseminger Apr 30, 2024
257d230
change 'gene' attribute to 'gene_name' in VCF
miseminger Jul 30, 2024
f998ed3
change 'gene' to 'gene_name' and add 'gene_symbol'
miseminger Jul 30, 2024
d3174fe
make all args required for log and index creation
miseminger Jul 30, 2024
0ff0245
update column names to match DH template, make mutation index optional
miseminger Jul 30, 2024
ed67400
fix formatting for non-index file
miseminger Jul 30, 2024
28ad84d
merge dfs on 'protein symbol' instead
miseminger Jul 30, 2024
50689b0
sort by nucleotide position
miseminger Jul 30, 2024
baf3cb0
update for new functional annotation format
miseminger Jul 30, 2024
ec976f8
update to match new functional annotation format
miseminger Jul 30, 2024
bfd06d3
rename 'alias_protein' to 'mat_pep'
miseminger Jul 30, 2024
51b3f94
add Pokay names as 'pokay_id'
miseminger Aug 8, 2024
2d72b02
add missing 'pokay_id' for ORF8
miseminger Aug 8, 2024
da12255
Add script
miseminger Sep 15, 2024
8551053
Add SARS-CoV-2 ontology terms JSONs
miseminger Sep 15, 2024
21d5b6a
Delete unwanted files
miseminger Sep 15, 2024
e6592d2
upload script
miseminger Sep 16, 2024
54bb4ac
update comment
miseminger Sep 16, 2024
8955e30
Add ontology terms to JSON, and remove pokay_id
miseminger Sep 17, 2024
4c66d82
add aliases from Sept 9 issue
miseminger Sep 17, 2024
b1df7ff
remove alias names that are the same as the gene name
miseminger Sep 17, 2024
9303eb8
Temporarily add Pokay name for PLpro to aliases
miseminger Sep 18, 2024
d13e2cc
Add RdRp back in to alias list
miseminger Sep 18, 2024
220dc08
Add protein_alias lists manually
miseminger Sep 18, 2024
c55d674
Add ontology terms to functional annotation file
miseminger Sep 18, 2024
5f8cc07
Update column names to match template
miseminger Sep 18, 2024
fc78ab0
Don't merge on mat_pep anymore
miseminger Sep 18, 2024
e0896c8
add BioRegistry prefix for doi
miseminger Sep 18, 2024
668c0dc
Add MPOX ROBOT table
miseminger Sep 19, 2024
bd967d5
update for MPOX
miseminger Sep 27, 2024
fba32a9
Add ontology term JSONs for MPOX
miseminger Sep 27, 2024
a4f40bb
Add strand orientation for MPOX
miseminger Oct 1, 2024
5166d9e
add gene and strand orientation for MPOX
miseminger Oct 1, 2024
9a53e60
make code useful for SC2 or MPOX
miseminger Oct 1, 2024
2c62b20
Update SARS-CoV-2 JSONs with gene and strand orientations:
miseminger Oct 1, 2024
3043b5a
Add gene and strand orientation
miseminger Oct 1, 2024
363c410
Add new functional annotation file
miseminger Oct 1, 2024
9490474
add gene and strand orientation for MPOX, and change unknown publicat…
miseminger Oct 4, 2024
7671210
add MPOX functional annotations in DH template format
miseminger Oct 4, 2024
94a67bb
remove mutation index rows that don't have a nucleotide mutation
miseminger Oct 22, 2024
02ad2e1
use pd.explode() in unnest_multi()
miseminger Oct 22, 2024
48d068f
workaround unmatched list lengths to work with pd.explode()
miseminger Oct 22, 2024
dc7932d
Adapt vcf2gvf to new JSON keys
miseminger Oct 23, 2024
2734f7c
align attribute keys with DH template
miseminger Oct 24, 2024
8f415b5
align attribute keys with DH template
miseminger Oct 24, 2024
3bbe34d
adapt to new JSON keys
miseminger Oct 24, 2024
8b6307c
archive copy of gvf2tsv.py
miseminger Oct 24, 2024
58df48a
Produce 1 TSV from GVF, ignoring clades
miseminger Oct 24, 2024
85a6160
Update to match new GVF keys
miseminger Oct 24, 2024
ac59ae3
add gene_orientation and strand_orientation to gvf
miseminger Oct 24, 2024
49bbdba
change VP37 to be alias of OPG057 protein, and add to JSON
miseminger Oct 24, 2024
55186f2
remove VP37 mentions from MPXVgp025
miseminger Oct 24, 2024
977fc0f
add protein_alias key to all CDS entries
miseminger Oct 24, 2024
179844e
add indent to fix bug
miseminger Oct 24, 2024
496ca37
replace 'pokay' with 'template' for generalizability
miseminger Oct 24, 2024
a214a34
Rename functional annotation script
miseminger Oct 24, 2024
3ed9712
take 'doi:' off saved dois
miseminger Oct 24, 2024
673ad82
add definitely the latest version
miseminger Oct 25, 2024
d689609
match new DH format
miseminger Oct 25, 2024
8e68d0c
Delete assets/virus_functionalAnnotation/NC_045512.2/Pokay_functional…
miseminger Oct 25, 2024
41f8cc5
Delete assets/virus_functionalAnnotation/NC_045512.2/Pokay_functional…
miseminger Oct 25, 2024
4be0584
add 'clade' argument and attribute
miseminger Oct 25, 2024
33d97c7
Merge branch 'madeline-1' of github.com:cidgoh/nf-ncov-voc into madel…
miseminger Oct 25, 2024
422f97b
add 'functional_annotation_resource' attribute
miseminger Oct 25, 2024
b72ebcc
add functional annotation columns as GVF attributes
miseminger Oct 25, 2024
ce13700
change 'measured_variant_functional_effect_description' attribute to …
miseminger Oct 25, 2024
626dae7
adapt to new DH template format
miseminger Oct 25, 2024
377ad62
update column names
miseminger Oct 25, 2024
@@ -0,0 +1,7 @@
"organism" "reference accession" "reference database name" "nucleotide position" "original mutation description" "nucleotide mutation" "amino acid mutation" "amino acid mutation alias" "gene name" "gene symbol" "gene orientation" "strand orientation" "protein name" "protein symbol" "measured variant functional effect" "inferred variant functional effect" "viral life cycle functional effect" "measured variant functional effect description" "CVX code" "DrugBank Accession Number" "Antibody Registry ID" "author" "publication year" "URL" "DOI" "PMID" "peer review status" "curator" "mutation functional annotation resource"
"Monkeypox virus" "NC_063383.1" "RefSeq" "" "T220A" "" "" "" "" "" "" "" "" "" "drug resistence" "" "" "May represent novel Tecovirimat-resistence mutation as likely at the interaction interface for the drug and observed in two site samples (0.44 and 0.49 frequency, culture contaminated and culture negative respectively) from an HIV-positive patient treated with Tecovirimat." "" "" "" "Garrigues" "2023" "" "" "" "unknown" "Paul Gordon" "Pokay"
"Monkeypox virus" "NC_063383.1" "RefSeq" "" "T220I" "" "" "" "" "" "" "" "" "" "drug resistence" "" "" "May represent novel Tecovirimat-resistence mutation as likely at the interaction interface for the drug and observed in two site samples (0.44 and 0.49 frequency, culture contaminated and culture negative respectively) from an HIV-positive patient treated with Tecovirimat." "" "" "" "Garrigues" "2023" "" "" "" "unknown" "Paul Gordon" "Pokay"
"Monkeypox virus" "NC_063383.1" "RefSeq" "" "T245I" "" "" "" "" "" "" "" "" "" "drug resistence" "" "" "May represent novel Tecovirimat-resistence mutation as likely at the interaction interface for the drug and observed in a single site sample (0.12 frequency, culture positive with EC50 0.3820uM) from an HIV-positive patient treated with Tecovirimat, may be dragged along by D294V (known from literature) from same sample with 0.91 frequency." "" "" "" "Garrigues" "2023" "" "" "" "unknown" "Paul Gordon" "Pokay"
"Monkeypox virus" "NC_063383.1" "RefSeq" "" "D294V" "" "" "" "" "" "" "" "" "" "drug resistence" "" "" "May represent novel Tecovirimat-resistence mutation as likely at the interaction interface for the drug and observed in a single site sample (0.91 frequency, culture positive with EC50 0.3820uM) from an HIV-positive patient treated with Tecovirimat." "" "" "" "Garrigues" "2023" "" "" "" "unknown" "Paul Gordon" "Pokay"
"Monkeypox virus" "NC_063383.1" "RefSeq" "" "A265D" "" "" "" "" "" "" "" "" "" "drug resistence" "" "" "May represent novel Tecovirimat-resistence mutation as likely at the interaction interface for the drug and observed in a single site sample (0.5 frequency, culture contaminated) from an HIV-positive patient treated with Tecovirimat. Novel T220A also found at similar frequency (0.44) in the same sample, as well as A295E which is known from the literature (0.51 frequency)." "" "" "" "Garrigues" "2023" "" "" "" "unknown" "Paul Gordon" "Pokay"
"Monkeypox virus" "NC_063383.1" "RefSeq" "" "T289A" "" "" "" "" "" "" "" "" "" "drug resistence" "" "" "May represent novel Tecovirimat-resistence mutation as likely at the interaction interface for the drug and observed in a single site sample (0.41 frequency, culture data pending) from an HIV-positive patient treated with Tecovirimat." "" "" "" "Garrigues" "2023" "" "" "" "unknown" "Paul Gordon" "Pokay"
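For readers who want to inspect a functional annotation file like the one added above, the following is a minimal sketch, not code from this PR. It assumes the file is tab-delimited with double-quoted fields (the delimiter is not recoverable from the rendered diff), and it uses an abbreviated, hypothetical sample containing only a few of the template's columns. The helper names `read_annotations` and `empty_columns` are illustrative and do not exist in the repository.

```python
import csv
import io

# Abbreviated, hypothetical sample: a few of the template's columns only.
# Tab delimiter and double quoting are assumptions about the real file.
# Cell values are copied as they appear in the diff, spelling included.
SAMPLE = (
    '"organism"\t"reference accession"\t"original mutation description"\t'
    '"nucleotide mutation"\t"measured variant functional effect"\t"author"\n'
    '"Monkeypox virus"\t"NC_063383.1"\t"T220A"\t""\t"drug resistence"\t"Garrigues"\n'
    '"Monkeypox virus"\t"NC_063383.1"\t"D294V"\t""\t"drug resistence"\t"Garrigues"\n'
)


def read_annotations(handle):
    """Return each data row as a dict keyed by the header's column names."""
    return list(csv.DictReader(handle, delimiter="\t", quotechar='"'))


def empty_columns(rows):
    """Return the column names that are blank in every row, in header order."""
    if not rows:
        return []
    return [col for col in rows[0] if all(row[col] == "" for row in rows)]


rows = read_annotations(io.StringIO(SAMPLE))
print([row["original mutation description"] for row in rows])
print(empty_columns(rows))
```

Run against the full file, a check like `empty_columns` would list the template fields that are still unpopulated for these rows, such as the nucleotide position and nucleotide mutation columns shown blank in the diff.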