ASE for Synonymous + Non-synonymous Variants #93

Open
JPFinnigan opened this issue Mar 9, 2018 · 3 comments

Comments

@JPFinnigan

The current implementation outputs VAR and REF read counts for non-synonymous variants only. It would be great, as a user, to have the option to output read-support counts for all variants. I've used Varlens to get around this limitation in Isovar, but that route has its own limitations, which I'll discuss below.

Per a conversation w/ Alex:

Hey John,

I looked a little bit and found that on line 67 of isovar.effect_prediction I'm doing the following:

nonsynonymous_coding_effects = effects.drop_silent_and_noncoding()

Do you want me to make this optional for the purposes of counting variant reads and assembling variant sequences?

If so, can you file an issue on the repo? https://github.com/openvax/isovar/issues

Eliminating the hard filter for non-synonymous variants affords the user a bit of added flexibility, but would necessitate additional descriptors for each variant to enable filtering to variant classes of interest. I think two additional columns, "Effect_Class" and "Effect" would solve the filtering problem and make working with the isovar output relatively easy.

I believe two columns may be required, largely because of my experience working with Varlens. The Varlens output has an "effect" column that describes the specific coding effect of a variant (e.g. p.G12D). However, I've found this difficult to work with in practice because, AFAIK, there is no easy way to parse apart non-synonymous SNVs ("p.G12D"), in-frame indels ("p.HDVPS811del"), and frameshifts ("p.A117fs"). It may be better to have a column for the effect class ("Exon, non-synonymous") that is separate from the descriptor of the specific effect ("p.G12D").
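
For what it's worth, the three shapes quoted above can be told apart with a few regular expressions. The following is a minimal Python sketch, not anything Varlens or Isovar provides: the function name `classify_effect`, the labels, and the patterns are all hypothetical, and they cover only the three example strings. Other effect strings (insertions, stop-loss, silent changes, and so on) would need their own rules or would be mislabelled, which is part of why a dedicated effect-class column seems preferable.

```python
import re

# Hypothetical patterns for the three effect-string shapes quoted in this issue.
# These are illustrative only and are not part of Varlens or Isovar.
_PATTERNS = [
    # e.g. "p.A117fs": one reference residue, a position, then "fs"
    ("frameshift", re.compile(r"^p\.[A-Z*]\d+fs$")),
    # e.g. "p.HDVPS811del": one or more deleted residues, a position, then "del"
    ("in-frame deletion", re.compile(r"^p\.[A-Z]+\d+del$")),
    # e.g. "p.G12D": reference residue, position, alternate residue
    ("substitution", re.compile(r"^p\.[A-Z]\d+[A-Z*]$")),
]


def classify_effect(effect):
    """Return a coarse label for an effect string, or "unknown" if no pattern matches."""
    for label, pattern in _PATTERNS:
        if pattern.match(effect):
            return label
    return "unknown"


for example in ["p.G12D", "p.HDVPS811del", "p.A117fs"]:
    print(example, "->", classify_effect(example))
```

Anything the patterns do not recognise falls through to "unknown", so the sketch fails loudly rather than guessing.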

Ideally an effect class column would provide the same filtering as the current hard-coded isovar filters, or use the standard Ensembl classes.

  • 3' UTR
  • 5' UTR
  • exonic-splice-site
  • Incomplete
  • Intergenic
  • Intragenic
  • Intronic
  • intronic-splice-site
  • non-coding-transcript
  • Silent
  • splice-acceptor
  • Splice-donor
  • Stop-loss
  • Stop-gain
  • Exon, Non-synonymous

The specific use case I have in mind is counting the number of variants, the number of variants with RNA read support, and finally how the latter category breaks down by variant type (e.g. SNV, SNV w/ coding effect, indel, etc.).
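
To make the use case concrete, here is a rough Python sketch of the summary I have in mind, assuming the output had one row per variant with an effect-class column and a variant read count. The field names (`effect_class`, `alt_reads`), the `summarize` helper, and the example rows are made up for illustration and are not actual Isovar output.

```python
from collections import Counter

# Hypothetical rows: field names and values are illustrative, not real Isovar output.
variants = [
    {"variant": "v1", "effect_class": "Exon, Non-synonymous", "alt_reads": 12},
    {"variant": "v2", "effect_class": "Silent", "alt_reads": 3},
    {"variant": "v3", "effect_class": "Intronic", "alt_reads": 0},
    {"variant": "v4", "effect_class": "Exon, Non-synonymous", "alt_reads": 5},
]


def summarize(rows, min_alt_reads=1):
    """Count all variants, those with RNA read support, and the supported ones per class."""
    total = len(rows)
    supported = [row for row in rows if row["alt_reads"] >= min_alt_reads]
    by_class = Counter(row["effect_class"] for row in supported)
    return total, len(supported), by_class


total, n_supported, by_class = summarize(variants)
print("variants:", total)
print("with RNA read support:", n_supported)
for effect_class, count in by_class.most_common():
    print(" ", effect_class, count)
```

The `min_alt_reads` threshold is an assumption too; what counts as "read support" would be up to the user.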

@iskandr
Contributor

iskandr commented Mar 12, 2018

Hey @JPFinnigan,

This could work how you'd like with very few changes. Do you, by any chance, have a test dataset of a few variants and their supporting RNA reads, along with expected counts and annotations? If not, I can make that myself but it would speed things up a little bit.

@JPFinnigan
Author

Hey @iskandr, that shouldn't be a problem. I'll send the materials to you tomorrow morning.

Take care! And thank you for working on this.

@iskandr
Contributor

iskandr commented May 1, 2018

Hey @JPFinnigan -- sorry it took me a long time to get to this, starting to look at it now.
