The microPIECE pipeline (microRNA pipeline enhanced by CLIP experiments) takes the AGO-CLIP data from speciesA and transfers it to speciesB. Given a set of miRNAs from speciesB, it then predicts their targets on the transferred CLIP regions.
For the minimal workflow, it needs a genome file and the corresponding annotation file in GFF format for both speciesA and speciesB. For speciesA, at least one AGO-CLIP dataset is needed, and speciesB needs a set of miRNAs for the target prediction. For the full workflow, a set of smallRNA-sequencing data is additionally needed, and a set of non-coding RNAs can be provided as a filter. The pipeline uses the smallRNA data to mine novel microRNAs and to complete the given miRNA dataset, if needed. It further performs expression calculation, isoform detection, genomic loci identification and orthology determination.
- bwa (0.7.12-r1039)
- samtools (1.4.1)
- bedtools (2.27.1)
- bowtie (1.1.2)
- miRDeep2 (2.0.0.8)
- miraligner (1.2.4a)
- NCBI-BLAST+ (2.2.31+)
- Proteinortho (5.16b)
- Cutadapt (1.9.1)
- gmap/gsnap (2018-02-12)
- Piranha (1.2.1)
- miranda (aug2010)
- Getopt::Long (2.5)
- File::Temp (0.2304)
- RNA::HairpinFigure (0.141212)
- Pod::Usage (1.69)
- Log::Log4perl (1.49)
Please install the dependencies and run
git clone -b v1.5.2 https://github.com/microPIECE-team/microPIECE.git
or download the latest release as a *.tar.gz or *.zip file:
curl -L -o microPIECE_v1.5.2.tar.gz https://github.com/microPIECE-team/microPIECE/archive/v1.5.2.tar.gz
# or
curl -L -o microPIECE_v1.5.2.zip https://github.com/microPIECE-team/microPIECE/archive/v1.5.2.zip
We also provide microPIECE as a Docker image. We tested the image on Ubuntu, Debian and macOS. On macOS, the Piranha command make test fails during the build, but the test succeeds when run inside the container. Therefore, we temporarily excluded this statement.
| Branch | Size | Layers | Comment |
|---|---|---|---|
| Latest release v1.5.2 | | | |
docker pull micropiece/micropiece:v1.5.2
git clone https://github.com/microPIECE-team/microPIECE-testset.git testset
docker run -it --rm -v $PWD:/data micropiece/micropiece:v1.5.2 microPIECE.pl \
--genomeA testset/NC_035109.1_reduced_AAE_genome.fa \
--genomeB testset/NC_007416.3_reduced_TCA_genome.fa \
--annotationA testset/NC_035109.1_reduced_AAE_genome.gff \
--annotationB testset/NC_007416.3_reduced_TCA_genome.gff \
--clip testset/SRR5163632_aae_clip_reduced.fastq,testset/SRR5163633_aae_clip_reduced.fastq,testset/SRR5163634_aae_clip_reduced.fastq \
--clip testset/SRR5163635_aae_clip_reduced.fastq,testset/SRR5163636_aae_clip_reduced.fastq,testset/SRR5163637_aae_clip_reduced.fastq --adapterclip GTGTCAGTCACTTCCAGCGG \
--overwrite \
--smallrnaseq a=testset/tca_smallRNAseq_rna_contaminated.fastq \
--adaptersmallrnaseq3=TGGAATTCTCGGGTGCCAAGG \
--adaptersmallrnaseq5 GTTCAGAGTTCTACAGTCCGACGATC \
--filterncrnas testset/TCA_all_ncRNA_but_miR.fa \
--speciesB tca 2>&1 | tee out.log
- minimal workflow
  - speciesA genome
  - speciesA GFF
  - speciesA AGO-CLIP-sequencing library/libraries
  - speciesB genome
  - speciesB GFF
  - speciesB microRNA set (mature)
- full workflow (in addition to the minimal workflow)
  - speciesB non-codingRNA set (without miRNAs)
  - speciesB microRNA set (precursor)
  - speciesB smallRNA-sequencing library/libraries
- --version|-V
  version of this pipeline
- --help|-h
  prints a helpful help message
- --genomeA and --genomeB
  Genome of the species with the CLIP data (species A, --genomeA) and the genome of the species where we want to predict the miRNA targets (species B, --genomeB)
- --gffA and --gffB
  Genome feature file (GFF) of the species with the CLIP data (species A, --gffA) and the GFF of the species where we want to predict the miRNA targets (species B, --gffB)
- --clip
  Comma-separated CLIP-seq FASTQ files in the format
  --clip con1_rep1_clip.fq,con1_rep2_clip.fq,con2_clip.fq
  # OR
  --clip con1_rep1_clip.fq --clip con1_rep2_clip.fq --clip con2_clip.fq
- --adapterclip
  Sequencing adapter of the CLIP reads
- --smallrnaseq
  Comma-separated smallRNA-seq FASTQ files, prefixed with 'condition=', in the format
  --smallrnaseq con1=A.fastq,B.fastq --smallrnaseq con2=C.fq
  # OR
  --smallrnaseq con1=A.fastq --smallrnaseq con1=B.fastq --smallrnaseq con2=C.fq
- --adaptersmallrnaseq5 and --adaptersmallrnaseq3
  5' adapter of the smallRNA-seq reads (--adaptersmallrnaseq5) and 3' adapter of the smallRNA-seq reads (--adaptersmallrnaseq3)
- --filterncrnas
  Multi-FASTA file of ncRNAs used to filter the smallRNA-seq reads. It must not contain miRNAs.
- --threads
  Number of threads to be used
- --overwrite
  Set this parameter to overwrite existing files
- --testrun
  Sets this pipeline to test mode (accounting for the small testset in Piranha). This option should not be used in a real analysis!
- --out
  Output folder
- --mirna
  miRNA set. If set, mining is disabled and this set is used for prediction
- --speciesBtag
  Three-letter code of the species where we want to predict the miRNA targets (species B, --speciesBtag).
- --mirbasedir
  The folder specified by --mirbasedir is searched for the files organisms.txt.gz, mature.fa.gz, and hairpin.fa.gz. If the files do not exist, they will be downloaded.
- --tempdir
  The folder specified by --tempdir is used for temporary files. The default value is tmp/ inside the output folder specified by the --out parameter.
- --piranhabinsize
  Sets the Piranha bin size and has a default value of 30.
- --CLIPminProcessLength and --CLIPmaxProcessLength
  Both are integer values and set the lower and upper limit for the processed peak length. Peaks having a width below --CLIPminProcessLength or above --CLIPmaxProcessLength are ignored. Default values are 22 for --CLIPminProcessLength and 50 for --CLIPmaxProcessLength.
- --CLIPminlength
  An integer value specifying the minimal length of a CLIP peak to be processed. Default value is 0, meaning no minimal length for CLIP peaks.
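The two --smallrnaseq forms shown in the option list assign FASTQ files to named conditions and are meant to be equivalent. A minimal Python sketch of that grouping follows, assuming each value has the shape condition=file[,file...]. The function name group_smallrnaseq is hypothetical, and this is not the pipeline's own (Perl) option handling.

```python
from collections import defaultdict


def group_smallrnaseq(values):
    """Group 'condition=file1,file2' strings into {condition: [files]}.

    Hypothetical helper for illustration; not part of microPIECE.
    """
    libraries = defaultdict(list)
    for value in values:
        condition, sep, files = value.partition("=")
        if not sep or not condition or not files:
            raise ValueError(
                f"expected 'condition=file[,file...]', got {value!r}"
            )
        # A repeated condition simply extends the list of its libraries.
        libraries[condition].extend(f for f in files.split(",") if f)
    return dict(libraries)


# Comma-separated form and repeated-option form from the option description.
combined = group_smallrnaseq(["con1=A.fastq,B.fastq", "con2=C.fq"])
repeated = group_smallrnaseq(["con1=A.fastq", "con1=B.fastq", "con2=C.fq"])
print(combined == repeated)
print(combined)
```

Under these assumptions both spellings end up with the same mapping from condition to library files, which is why they can be mixed freely on the command line.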
- pseudo miRBase dat file: final_mirbase_pseudofile.dat
  A pseudo miRBase dat file containing all precursor sequences with their named mature sequences and their coordinates. It only contains the fields ID, FH, FT and SQ.
- mature miRNA set: mature_combined_mirbase_novel.fa
  Mature microRNA set, containing novel and miRBase-completed miRNAs (if mined), together with the known miRNAs from miRBase
- precursor miRNA set: hairpin_combined_mirbase_novel.fa
  Precursor microRNA set, containing novel miRNAs (if mined), together with the known miRNAs from miRBase
- mature miRNA expression per condition: miRNA_expression.csv
  Semicolon-separated file containing:
  - rpm
  - condition
  - miRNA
- orthologous prediction file: miRNA_orthologs.csv
  Tab-separated file containing:
  - query_id
  - subject_id
  - identity
  - alignment length
  - number mismatches
  - number gap openings
  - start position inside query
  - end position inside query
  - start position inside subject
  - end position inside subject
  - evalue
  - bitscore
  - aligned query sequence
  - aligned subject sequence
  - length query sequence
  - length subject sequence
  - coverage for query sequence
  - coverage for subject sequence
- miRDeep2 mining result in HTML/CSV: mirdeep_output.html/csv
  The standard output HTML/CSV file of miRDeep2
- ISOMIR prediction files: isomir_output_CONDITION.csv
  Semicolon-delimited file containing:
  - mirna
  - substitutions
  - added nucleotides on 3' end
  - nucleotides at 5' end different from the annotated sequence
  - nucleotides at 3' end different from the annotated sequence
  - sequence
  - rpm
  - condition
- genomic location of miRNAs: miRNA_genomic_position.csv
  Tab-delimited file containing:
  - miRNA
  - genomic contig
  - identity
  - length
  - miRNA-length
  - number mismatches
  - number gapopens
  - miRNA-start
  - miRNA-stop
  - genomic-start
  - genomic-stop
  - evalue
  - bitscore
- all library support-level target predictions: *_miranda_output.txt
  miranda output, reduced to the lines starting with >
- all library support-level CLIP transfer .bed files: *transfered_merged.bed
  BED file of the transferred CLIP regions in the speciesB transcriptome
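For downstream analysis, miRNA_expression.csv can be loaded with a few lines of Python. The sketch below assumes that the three semicolon-separated columns appear in the order given in the output list (rpm, condition, miRNA) and that a possible header line can be recognised by a non-numeric first field. The function name read_expression and the sample rows are made up for illustration and are not real pipeline output.

```python
import csv
import io
from collections import defaultdict


def read_expression(handle):
    """Return {miRNA: {condition: rpm}} from a semicolon-separated table.

    Assumed column order: rpm;condition;miRNA (hypothetical helper).
    """
    table = defaultdict(dict)
    for row in csv.reader(handle, delimiter=";"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        rpm, condition, mirna = row
        try:
            value = float(rpm)
        except ValueError:
            continue  # assumed header line or non-numeric entry
        table[mirna][condition] = value
    return dict(table)


# Made-up rows for illustration only.
example = io.StringIO(
    "rpm;condition;miRNA\n"
    "12.5;a;tca-miR-1-3p\n"
    "3.0;b;tca-miR-1-3p\n"
    "7.25;a;tca-miR-8-5p\n"
)
expression = read_expression(example)
print(expression["tca-miR-1-3p"])
```

With a real run, the StringIO object would be replaced by open("miRNA_expression.csv", newline="") on the file in the output folder.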
Feel free to test the pipeline with our microPIECE-testset:
git clone https://github.com/microPIECE-team/microPIECE-testset.git
- minimal workflow
  - speciesA genome: AAE genome (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/204/515/GCF_002204515.2_AaegL5.0/GCF_002204515.2_AaegL5.0_genomic.fna.gz)
  - speciesA GFF: AAE GFF (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/204/515/GCF_002204515.2_AaegL5.0/GCF_002204515.2_AaegL5.0_genomic.gff.gz)
  - speciesA AGO-CLIP-sequencing library/libraries: AGO-CLIP of AAE
  - speciesB genome: TCA genome (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_genomic.fna.gz)
  - speciesB GFF: TCA GFF (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_genomic.gff.gz)
  - speciesB microRNA set (mature): TCA mature miRNAs
- full workflow (in addition to the minimal workflow)
  - speciesB microRNA set (precursor): TCA stem-loop miRNAs
  - speciesB smallRNA-sequencing library/libraries: TCA smallRNA-sequencing data
  - speciesB non-codingRNA set (without miRNAs, to filter smRNA-seq data) (OPTIONAL)
A complete list of open issues is available on GitHub Issues.
Please report any new issue as a new GitHub issue.
- scheduled for next release
  No features planned
- v1.5.2 (2018-04-13)
  Refactoring of CLIP_merge_bed_files.pl to reduce memory footprint by a factor of 10 (Fixes #174)
  Refactoring of Piranha run to support multithreading (Fixes #177)
  Fixing copy process of final files (Fixes #184)
  Setting default bin size for Piranha to 30 (Fixes #178)
- v1.5.1 (2018-04-11)
  Added optimized pre-binning step for Piranha (Fixes #132)
- v1.5.0 (2018-04-10)
  Removing additional length cutoff during CLIP transfer (Fixes #153)
  Add command line options --CLIPminProcessLength, --CLIPmaxProcessLength, and --CLIPminlength for length limits used in run_CLIP_process and run_CLIP_clip_mapper steps, enabling processing of peaks with user-defined widths (Fixes #145)
  Dynamic naming of output files based on minlength variable in run_CLIP_clip_mapper (Fixes #146)
  Correct calculation of length of a bed feature and moving scripts/CLIP_bedtool_discard_sizes.pl into lib/microPIECE.pm (Fixes [#147](https://github.com/microPIECE-team/microPIECE/issues/147))
  Add an optimized pre-binning step with pseudocounts for bins covered by an exon as preparation for Piranha (Fixes #132 and #155)
  This version was accepted by The Journal of Open Source Software (Review issue #616)
- v1.4.0 (2018-03-31)
  Copying pseudo miRBase dat file final_mirbase_pseudofile.dat into output folder (Fixes #131)
  Corrected RNA::HairpinFigure output (Fixes #137)
  Fix the requirement of an accession inside miRBase dat file (Fixes #134)
  Avoiding error message while copying the out file for genomic location into base folder (Fixes #117)
- v1.3.0 (2018-03-29)
  Creating all structures on the fly using pseudo-miRBase-dat as input.
  Using miRNA.dat from miRBase as source for mature/precursor sequence and relationship (Fixes #127)
  Fix of division-by-zero bug for empty mapping files (Fixes #118)
  Fix of typo in --piranhabinsize option (Fixes #116)
- v1.2.3 (2018-03-26)
  Fix transformation of precursor sequences based on miRBase #22 precursor sequences with a single mature. (Fixes [#109](https://github.com/microPIECE-team/microPIECE/issues/109))
- v1.2.2 (2018-03-23)
  Improved collision detection for newly identified miRNAs, avoiding crashes caused by genomic copies. (Fixes #105)
- v1.2.1 (2018-03-23)
  Enables stable numbering for newly identified miRNAs based on their precursor and mature sequences (Fixes #101)
- v1.2.0 (2018-03-22)
  We are using miraligner, which requires Java version 1.7, but 1.8 was installed by default. This was fixed by switching to v1.4 of the docker base image. Additionally, miraligner requires fixed filenames for its databases. Therefore, version v1.2.0 solves miraligner-related bugs and re-enables the isomir detection. (Fixes #97 and #98)
- v1.1.0 (2018-03-12)
  Add isomir detection and copy the final genomic location file to the output folder (Fixes #34)
- v1.0.7 (2018-03-08)
  Piranha was lacking a bin_size parameter. Added parameter --piranahbinsize with a default value of 20 (Fixes #66)
- v1.0.6 (2018-03-08)
  Added parameter --mirbasedir and --tempdir to support local mirbase files and relocation of directory for temporary files (Fixes #66, #73, and #76)
- v1.0.5 (2018-03-07)
  Update of documentation and correct spelling of --mirna parameter
- v1.0.4 (2018-03-07)
  Fixes complete mature in final output (Fixes #69)
- v1.0.3 (2018-03-06)
  Add tests for perl scripts in script folder which ensure the correct handling of BED stop coordinates (Fixes #65)
- v1.0.2 (2018-03-05)
  Fixes the incorrect sorting of BED files. The result was correct, but sorting was performed in the wrong order. (Fixes #63)
- v1.0.1 (2018-03-05)
  Fix an error concerning BED file handling of start and stop coordinates. (Fixes #59)
- v1.0.0 (2018-03-05)
  Archived and submitted to The Journal of Open Source Software.
- v0.9.0 (2018-03-05)

This program is released under GPLv2. For further license information, see LICENSE.md shipped with this program. Copyright (c) 2018 Daniel Amsel and Frank Förster (employees of Fraunhofer Institute for Molecular Biology and Applied Ecology IME). All rights reserved.
- Daniel Amsel
- Frank Förster
- Project source code on GitHub
- Docker image on DockerHub
- Travis continuous integration page
- Test coverage reports