gnomAD AF & family-based vcf analysis #34

Open
XiaKwan opened this issue Oct 13, 2023 · 3 comments


XiaKwan commented Oct 13, 2023

Hi Xihao,

Thanks for your useful tool! I've run into a few problems that have really confused me:

  1. I noticed that the gnomAD and 1000G allele frequencies are included in the FAVOR full database but not in the FAVOR essential database. Is there any way to make use of these AF annotations in the STAAR procedure (e.g., giving priority to variants with gnomAD AF < 5‰)?

  2. I'm working on a rare disease, and my cohort contains ~700 trio families (only the child is affected; both parents are healthy), so my data is a VCF file with ~2,000 samples. Does STAARpipeline support family-based analysis? If so, how should I encode the family relationships in the analysis (maybe in pheno.csv)?

  3. Do I need to add unrelated healthy samples as controls?

Thanks a lot!!!

xihaoli (Owner) commented Dec 7, 2023

Hi @XiaKwan,

Thanks for your patience. Regarding your questions,

  1. Yes, you can. In this case, you may need to a) annotate your genotype data using the FAVOR full database through FAVORannotator, updating the scripts in specific steps as needed (see this thread for more details); b) transform the AF to the PHRED scale so it can be used as a weight; and c) incorporate the AF annotations in the STAAR procedure.

  2. We are currently developing methods for analyzing family data (trio design) using STAAR/STAARpipeline. We will let you know when they are ready to use.

  3. Using common controls, rather than sequencing new controls for every study, can boost power to detect genotype–phenotype associations by increasing the sample size or by providing a control set where none existed (see this review paper for more details). Keep in mind, however, that if adding healthy samples as controls makes your case/control design imbalanced, a saddlepoint approximation may be needed to calibrate the association p-values; this is enabled in version 0.9.7 of the STAAR and STAARpipeline packages.
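
Point 1(b) does not specify the exact AF-to-PHRED transformation, so here is a minimal Python sketch of two plausible options, under stated assumptions: a direct -10·log10(AF) transform, and a rank-based -10·log10(rank/M) scaling like the one used for CADD PHRED-like scores. Both give larger values to rarer variants, which matches the goal of prioritizing variants with low gnomAD AF. The function names, the treatment of a missing AF as 0, and the 1e-8 floor are hypothetical choices, not part of FAVOR, FAVORannotator, or STAAR.

```python
import math


def _clean(af):
    # Assumption: a missing gnomAD AF (variant absent from gnomAD) is treated
    # as AF = 0, i.e. the rarest possible value.
    return 0.0 if af is None else float(af)


def phred_from_af(af, floor=1e-8):
    """Direct transform: -10 * log10(AF), so rarer variants get larger weights.

    AF values below `floor` (including 0 / missing) are clamped to `floor`
    to avoid log10(0); the floor value is an arbitrary, hypothetical choice.
    """
    return -10.0 * math.log10(max(_clean(af), floor))


def rank_phred_from_af(afs):
    """Rank-based PHRED scaling, -10 * log10(rank / M), computed as 10 * log10(M / rank).

    Variants are ranked by ascending AF (rank 1 = rarest = largest score);
    tied AF values share the lowest rank in their group.
    """
    values = [_clean(a) for a in afs]
    m = len(values)
    order = sorted(range(m), key=lambda i: values[i])
    scores = [0.0] * m
    rank, prev = 0, None
    for pos, i in enumerate(order, start=1):
        if values[i] != prev:  # a new AF value starts a new rank
            rank, prev = pos, values[i]
        scores[i] = 10.0 * math.log10(m / rank)
    return scores


if __name__ == "__main__":
    gnomad_af = [0.001, 0.0001, 0.05, None]  # toy values, not real data
    print([round(phred_from_af(a), 2) for a in gnomad_af])
    print([round(s, 2) for s in rank_phred_from_af(gnomad_af)])
```

For reference, the 5‰ cutoff from the question corresponds to -10·log10(0.005) ≈ 23.0 on the direct scale. The resulting column could then be supplied alongside the other PHRED-scaled annotations used as STAAR weights; how exactly to wire it in depends on the STAARpipeline scripts you are running.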
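
To illustrate what the saddlepoint approximation mentioned in point 3 does, here is a minimal Python sketch for a single-variant score test under a logistic null model, in the spirit of the SPAtest R package. It is not the STAAR/STAARpipeline implementation, which is written in R and is not shown in this thread. Assumptions: the fitted null-model case probabilities `mu` are given, genotypes are dosages in [0, 2], and the 2-SD cutoff for switching from the normal approximation is a choice made here. The function name `spa_score_pvalue` is hypothetical.

```python
import math


def _norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def spa_score_pvalue(g, mu, s, cutoff=2.0):
    """Two-sided SPA p-value for the score S = sum_i g_i * (y_i - mu_i).

    Under the null, y_i ~ Bernoulli(mu_i) independently, where mu_i are the
    fitted case probabilities from a logistic null model (assumed given).
    g_i are genotype dosages, assumed to lie in [0, 2].
    """
    pairs = list(zip(g, mu))

    def tilt(t, gi, mi):
        # Success probability of y_i under the exponentially tilted distribution.
        return mi / (mi + (1.0 - mi) * math.exp(-t * gi))

    def K(t):  # cumulant generating function of S
        return sum(math.log(1.0 - mi + mi * math.exp(t * gi)) - t * gi * mi for gi, mi in pairs)

    def K1(t):  # K'(t), increasing in t
        return sum(gi * (tilt(t, gi, mi) - mi) for gi, mi in pairs)

    def K2(t):  # K''(t)
        return sum(gi * gi * tilt(t, gi, mi) * (1.0 - tilt(t, gi, mi)) for gi, mi in pairs)

    sd = math.sqrt(K2(0.0))
    if abs(s) < cutoff * sd:
        # Near the mean, the usual normal approximation is adequate.
        return math.erfc(abs(s) / (sd * math.sqrt(2.0)))

    s_min = -sum(gi * mi for gi, mi in pairs)        # value of S when all y_i = 0
    s_max = sum(gi * (1.0 - mi) for gi, mi in pairs)  # value of S when all y_i = 1

    def saddlepoint(x):
        # Solve K'(t) = x by bisection after bracketing the root.
        lo, hi = -1.0, 1.0
        while K1(hi) < x and hi < 64.0:
            hi *= 2.0
        while K1(lo) > x and lo > -64.0:
            lo *= 2.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if K1(mid) < x:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def cdf(x):
        # Lugannani-Rice approximation to P(S <= x); the support boundary is handled crudely.
        if x <= s_min:
            return 0.0
        if x >= s_max:
            return 1.0
        t = saddlepoint(x)
        w = math.copysign(math.sqrt(max(0.0, 2.0 * (t * x - K(t)))), t)
        v = t * math.sqrt(K2(t))
        return _norm_cdf(w) + _norm_pdf(w) * (1.0 / w - 1.0 / v)

    q = abs(s)
    p = (1.0 - cdf(q)) + cdf(-q)
    return min(1.0, max(0.0, p))
```

In a strongly imbalanced setting (for example, everyone has mu = 0.01 and only a handful of samples carry the variant), the normal approximation can return p-values many orders of magnitude too small, while the SPA p-value stays far closer to the exact tail. In a balanced, symmetric setting the two nearly agree. The sketch ignores the discreteness of S (no continuity correction), so tails for very rare variants are still only approximate.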

Hope this helps.

Best,
Xihao

bmuchmore commented

Just curious: any update on trio-specific analysis using STAARpipeline? If you need an early tester with outside data, I would be happy to try it out.

-Brian

xihaoli (Owner) commented Feb 13, 2024

Hi @bmuchmore,

Thank you very much for your interest! We are still developing methods for analyzing family data (trio design) using STAAR/STAARpipeline. We will let you know as soon as they are ready to be tested.

Best,
Xihao
