Skip to content

dataset preparation scripts for "RNA Alternative Splicing Prediction with Discrete Compositional Energy Network"

Notifications You must be signed in to change notification settings

anyakors/CAPD_CHIL2021

Repository files navigation

Scripts are using RNA-seq data from the ARCHS4 database. The generated dataset is available here.

Download the following files necessary to generate the inputs and labels for the model to the ./data folder:

wget "https://s3.amazonaws.com/mssm-seq-matrix/human_matrix.h5"

wget --timestamping 'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz' -O hg38.fa.gz

gunzip hg38.fa.gz

Run the main bash script for inputs, labels generation:

generate_inputsLabels_trainTest.sh

About

dataset preparation scripts for "RNA Alternative Splicing Prediction with Discrete Compositional Energy Network"

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published