Pre-cancer colorectal polyps: HTAN Vanderbilt dataset. Chen et al. Cell 2021 #1935

Merged 9 commits on Dec 27, 2023

@rmadupuri (Collaborator) commented Oct 6, 2023

Fixes #1915
Testing Instance:
Triage Portal: https://triage.cbioportal.mskcc.org/study/summary?id=crc_hta11_htan_2021
Private Portal: https://private.cbioportal.mskcc.org/study/summary?id=crc_hta11_htan_2021

Curation and transformation of Pre-cancer HTAN CRC Vanderbilt Dataset:

Data collection:

Sample size selection

  • Discovery set samples were used to generate the study (27 polyps). 15 samples were whole-exome sequenced. All 27 samples went through scRNA-seq expression analysis.
  • We can extend the cohort with Validation set samples if needed.

Clinical data

  • Patient-Level Data: Table S1, Participants tab in HTAN Portal
  • Sample-Level Data: Table S1, Biospecimens tab in HTAN Portal

Mutation data

  • The Level 3 Bulk DNA filtered data files were obtained from the Vanderbilt team.
  • Variants were annotated using Genome Nexus.

scRNA-seq data

  • The Discovery set h5ad files for both epithelial and non-epithelial cells were obtained from cellxgene.
    • Files used: "Discovery (DIS) set of human colorectal tumor: Epithelial" and "VAL and DIS datasets: Non-Epithelial"
  • The absolute and relative cell frequencies were calculated in Generic Assay format.
  • The pseudo-bulk RNA expression counts per sample were calculated from the scRNA-seq data by averaging the values across the cells.
  • The script used to generate the absolute and relative cell frequencies and the pseudo-bulk RNA-seq data from the h5ad files: https://gist.github.com/rmadupuri/ffcdd2c753e28fd057a9c4bebf0fd9ca
  • Z-scores were calculated on the pseudo-bulk RNA data after log-transforming it.
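
For reviewers who want a feel for the steps above without opening the gist, here is a minimal sketch on toy data. It is not the linked script: the column names (`sample`, `cell_type`), the helper names, and the choice of log2 with a pseudocount of 1 are assumptions made for illustration. With real data, the two inputs could presumably come from an h5ad file via `anndata.read_h5ad(...)`, using `adata.obs` and `adata.to_df()`.

```python
import numpy as np
import pandas as pd


def cell_frequencies(obs: pd.DataFrame, sample_col: str, type_col: str):
    """Return (absolute, relative) cell-type counts, cell types x samples."""
    absolute = pd.crosstab(obs[type_col], obs[sample_col])
    # Divide each sample column by its total cell count.
    relative = absolute / absolute.sum(axis=0)
    return absolute, relative


def pseudo_bulk(expr: pd.DataFrame, samples: pd.Series) -> pd.DataFrame:
    """Average a cells x genes matrix over the cells of each sample.

    Returns a genes x samples table.
    """
    return expr.groupby(samples).mean().T


def log_zscores(bulk: pd.DataFrame) -> pd.DataFrame:
    """Per-gene z-scores across samples after log2(x + 1).

    The log base and pseudocount are assumptions, not taken from the PR.
    """
    logged = np.log2(bulk + 1.0)
    mean = logged.mean(axis=1)
    # Genes with zero variance get NaN instead of a division by zero.
    std = logged.std(axis=1, ddof=0).replace(0, np.nan)
    return logged.sub(mean, axis=0).div(std, axis=0)


# Toy stand-in for per-cell metadata (hypothetical sample and cell-type labels).
obs = pd.DataFrame(
    {
        "sample": ["S1", "S1", "S1", "S2"],
        "cell_type": ["ABS", "ABS", "GOB", "GOB"],
    },
    index=["c1", "c2", "c3", "c4"],
)
# Toy cells x genes expression matrix aligned to the same cell index.
expr = pd.DataFrame(
    {"GENE_A": [1.0, 3.0, 2.0, 0.0], "GENE_B": [0.0, 0.0, 3.0, 7.0]},
    index=obs.index,
)

absolute, relative = cell_frequencies(obs, "sample", "cell_type")
bulk = pseudo_bulk(expr, obs["sample"])
zscores = log_zscores(bulk)
```

Keeping the three steps as separate functions makes it easy to swap the averaging or the transform if the curation approach changes.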

Imaging data

  • H&E data was available for 24 samples: H&E tab
  • MxIF images were available for 23 samples: MxIF tab. Multiple images were available per sample; as we do not support this, the images were split across multiple tabs for now (MxIF Image 1, 2, 3, ...).
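
The tab-splitting workaround for samples with several MxIF images could be sketched roughly as follows. This is an illustration only: the helper name, sample IDs, and URLs are hypothetical, and the actual resource files in this PR may have been produced differently.

```python
from collections import defaultdict


def assign_image_tabs(images, prefix="MxIF Image"):
    """Give each (sample_id, url) pair a numbered tab name, counted per sample."""
    counts = defaultdict(int)
    rows = []
    for sample_id, url in images:
        counts[sample_id] += 1
        rows.append((sample_id, f"{prefix} {counts[sample_id]}", url))
    return rows


# Hypothetical input: two images for one sample, one image for another.
images = [
    ("SAMPLE_A", "https://example.org/sample_a_region1.tif"),
    ("SAMPLE_A", "https://example.org/sample_a_region2.tif"),
    ("SAMPLE_B", "https://example.org/sample_b_region1.tif"),
]
tabs = assign_image_tabs(images)
```

Each sample then has at most one image per tab, which is the constraint described above.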

@ritikakundra ritikakundra merged commit cea0efb into master Dec 27, 2023
2 checks passed
@rmadupuri rmadupuri deleted the htan_crc branch June 10, 2024 19:43