finemap-uk-biobank

Code and scripts implementing genetic association and fine-mapping analyses of UK Biobank data.

Current fine-mapping analysis pipeline for UK Biobank data

All scripts implementing the data processing and analysis can be found in the scripts directory. The pipeline is currently illustrated for fine-mapping standing height in a region on chromosome 3 near gene ZBTB38, and can be adapted for other traits. The steps are as follows:

Prepare phenotype data. Run R script get_pheno.R to prepare a CSV file containing the phenotype and covariate data from the UK Biobank source files. For height, this step creates a new CSV file, height.csv.
Prepare SNP data. Run R script get_geneatlas_snps.R to create a table of independently computed summary statistics, which are used to validate our association results. For height, this generates a new CSV file, geneatlas-neale-height.csv, containing the association results. Alternatively, run get_region_snps.R to list the IDs of the genetic variants within the selected region, together with independently computed summary statistics when available. This produces a new CSV file, region-variants-ZBTB38.csv, containing information about the selected genetic variants, such as base-pair positions, SNP variant IDs, and association statistics.
Prepare genotype data and SuSiE sufficient statistics. Run bash script prepare.region.sh to create an RDS file containing sufficient statistics computed from the genotype data and height. The script requires four inputs: chromosome number, start base-pair position, stop base-pair position, and region name. For example,
```
scripts/prepare.region.sh 3 140.8e6 141.8e6 ZBTB38
```
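
The start and stop positions in the example above are written in scientific notation (140.8e6). As a minimal sketch of how the four inputs could be checked and the positions converted to integer base-pair coordinates, the helper below may be useful when adapting the pipeline to other regions. The function name parse_region is hypothetical and is not taken from prepare.region.sh; how that script handles its arguments internally is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of prepare.region.sh):
# validate the four region inputs and convert positions such as 140.8e6
# to integer base-pair coordinates.
parse_region() {
  if [ "$#" -ne 4 ]; then
    echo "usage: parse_region CHR START STOP NAME" >&2
    return 1
  fi
  local chr="$1"
  local name="$4"
  local start stop
  # awk converts scientific notation to a number; %.0f prints it as an integer.
  start=$(awk -v x="$2" 'BEGIN { printf "%.0f", x + 0 }')
  stop=$(awk -v x="$3" 'BEGIN { printf "%.0f", x + 0 }')
  if [ "$start" -ge "$stop" ]; then
    echo "error: start position must be less than stop position" >&2
    return 1
  fi
  echo "$chr $start $stop $name"
}

parse_region 3 140.8e6 141.8e6 ZBTB38
# prints: 3 140800000 141800000 ZBTB38
```

The function prints the normalized values on one line, so a calling script could read them back with `read chr start stop name`.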

Current analysis pipeline for UK Biobank Blood Cells data

Prepare phenotype data. Run R script get_bloodcells.R to prepare a CSV file containing the phenotype and covariate data from the UK Biobank source files. Then run R script prepare_plink_pheno_bloodcells.R to prepare the phenotype and covariate text files for PLINK.
Run GWAS. Run plink_gwas.sh and gwas_results.sh to get GWAS results.
Get fine-mapping regions. Run R script get_bloodcells_trait_regions.R to get regions for each trait. Run R script get_bloodcells_regions.R to combine overlapping regions for each trait and across traits.
Prepare genotype data, LD and z scores for each region. Run get_bloodcells_region_genotype_ld.sh to get genotype data and LD for each region. Run R script get_bloodcells_zscores.R to get z scores and XtY for each region.
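
Combining overlapping regions, as in the "Get fine-mapping regions" step above, amounts to merging intervals that lie on the same chromosome. The following is a minimal sketch of that idea only. It assumes a hypothetical whitespace-separated input with three columns (chromosome, start, stop), and the function name merge_regions is made up for illustration. It is not the code in get_bloodcells_regions.R, which is an R script and may define regions differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (an assumption, not the repository's R code):
# merge overlapping regions read from standard input.
# Input columns: chromosome, start, stop (whitespace-separated).
merge_regions() {
  # Sort by chromosome, then numerically by start, then sweep once,
  # extending the current region while the next one overlaps it.
  sort -k1,1 -k2,2n | awk '
    NR == 1 { chr = $1; start = $2 + 0; stop = $3 + 0; next }
    $1 == chr && $2 + 0 <= stop {
      if ($3 + 0 > stop) stop = $3 + 0
      next
    }
    { print chr, start, stop; chr = $1; start = $2 + 0; stop = $3 + 0 }
    END { if (NR > 0) print chr, start, stop }
  '
}

printf '%s\n' "3 100 200" "3 150 300" "3 400 500" "1 10 20" | merge_regions
# prints:
# 1 10 20
# 3 100 300
# 3 400 500
```

Sorting first means each region only needs to be compared with the one currently being built, so a single pass over the input is enough.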

