Single-sample RNA expression analysis is performed using the analyze_rna.php
script. To list all available parameters, have a look at the help using:
php megSAP/src/Pipelines/analyze_rna.php --help
The main parameters that you have to provide are:
- folder: The output folder containing all result files.
- name: Basename/prefix for all output files.
In addition, you may want to specify:
- steps: Analysis steps to perform. Use ma,rc,an,fu to perform mapping, read counting, annotation and fusion detection.
- system: The processing system INI file.
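The step codes are easy to mistype, so it can help to expand a steps value into plain words before starting a long run. The following is a minimal sketch, not part of megSAP: describe_steps is a hypothetical helper, and it only knows the four codes named in this document. The output of analyze_rna.php --help remains the authoritative list.

```shell
# Expand a comma-separated -steps value into one description per line.
# Hypothetical helper: it only knows the four codes listed in this document.
describe_steps() {
  local steps="$1"
  local step
  local status=0
  local IFS=','                       # split the argument on commas
  for step in $steps; do
    case "$step" in
      ma) echo "ma: mapping" ;;
      rc) echo "rc: read counting" ;;
      an) echo "an: annotation" ;;
      fu) echo "fu: fusion detection" ;;
      *)  echo "$step: not listed in this document" >&2; status=1 ;;
    esac
  done
  return "$status"
}

describe_steps "ma,rc,an"
```

The function returns a non-zero status when it meets a code it does not know, so it can guard a pipeline call in a wrapper script.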
If all data to analyze resides in a sample folder as produced by Illumina's bcl2fastq tool, the whole analysis is performed with one command, for example like this:
php megSAP/src/Pipelines/analyze_rna.php \
-folder Sample_X_01 -name X_01 \
-system truseq.ini -steps ma,rc,an
In the example above, the configuration of the pipeline is done using the truseq.ini file, which contains all necessary information (see processing system INI file).
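When several sample folders share one processing system, the single command can be generated per folder. The sketch below is a dry run that only prints the commands. It assumes that each folder is named Sample_<name>, as in the example, and that the same steps apply to every sample. build_rna_commands and Sample_X_02 are hypothetical names used for illustration.

```shell
# Dry run: print one analyze_rna.php call per sample folder.
# Assumption: each folder is named Sample_<name>, as in the example.
build_rna_commands() {
  local system_ini="$1"
  shift
  local folder
  local name
  for folder in "$@"; do
    folder="${folder%/}"            # drop a trailing slash, if present
    name="${folder##*/}"            # keep the last path component
    name="${name#Sample_}"          # Sample_X_01 -> X_01
    printf 'php megSAP/src/Pipelines/analyze_rna.php -folder %s -name %s -system %s -steps ma,rc,an\n' \
      "$folder" "$name" "$system_ini"
  done
}

# Sample_X_02 is a hypothetical second sample.
build_rna_commands truseq.ini Sample_X_01 Sample_X_02
```

Once the printed commands look right, they can be run by piping the output to a shell. Folder names containing spaces are not handled by this sketch.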
The tools used in this analysis pipeline are described here.
A complete list of all tools and databases used in megSAP and when they were last updated can be found here.
After the analysis, these files are created in the output folder:
- mapped reads in BAM format
- raw read counts, in featureCounts tabular output format
- normalized read counts, annotated with gene symbols
- QC data in qcML format, which can be opened with a web browser
To use the RNA expression pipeline with other genomes, you need to provide:
- the genome FASTA file, e.g. megSAP/data/genomes/CustomGenome.fa
- the STAR genome index, e.g. megSAP/data/genomes/STAR/CustomGenome/
- the gene annotation file in Ensembl-like GTF format, e.g. megSAP/data/dbs/gene_annotations/CustomGenome.gtf
The genome can be specified in the processing system INI file.
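Before starting an analysis with a custom genome, it may be worth checking that all three inputs are in place. The following sketch assumes the path layout of the examples above. check_genome_files is a hypothetical helper, and CustomGenome is a placeholder for the actual genome name.

```shell
# Report which of the three custom-genome inputs are missing.
# Paths follow the examples in this document; the helper is hypothetical.
check_genome_files() {
  local root="$1"      # megSAP checkout, e.g. megSAP
  local genome="$2"    # genome name, e.g. CustomGenome
  local missing=0
  local fasta="$root/data/genomes/$genome.fa"
  local star_index="$root/data/genomes/STAR/$genome"
  local gtf="$root/data/dbs/gene_annotations/$genome.gtf"
  [ -f "$fasta" ]      || { echo "missing genome FASTA: $fasta" >&2; missing=1; }
  [ -d "$star_index" ] || { echo "missing STAR index: $star_index" >&2; missing=1; }
  [ -f "$gtf" ]        || { echo "missing GTF annotation: $gtf" >&2; missing=1; }
  return "$missing"
}

check_genome_files megSAP CustomGenome || echo "custom genome is incomplete"
```

The check only tests that the files and the index folder exist. It does not verify that the STAR index was built from the same FASTA file.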