microPIECE

microPIECE (microRNA pipeline enhanced by CLIP experiments) takes AGO-CLIP data from one species (speciesA) and transfers it to another species (speciesB). Given a set of miRNAs from speciesB, it then predicts their targets on the transferred CLIP regions.

For the minimal workflow, it needs a genome file and the matching annotation file in GFF format for both speciesA and speciesB. speciesA additionally needs at least one AGO-CLIP dataset, and speciesB needs a set of miRNAs for the target prediction. For the full workflow, a set of smallRNA-sequencing data is also required, and a set of non-coding RNAs can be provided as a filter. The pipeline uses the smallRNA data to mine novel microRNAs and, if needed, to complete the given miRNA dataset. It further performs expression calculation, isoform detection, genomic loci identification and orthology determination.

Required Software

bwa (0.7.12-r1039)
samtools (1.4.1)
bedtools (2.27.1)
bowtie (1.1.2)
miRDeep2 (2.0.0.8)
miraligner (1.2.4a)
NCBI-BLAST+ (2.2.31+)
Proteinortho (5.16b)
Cutadapt (1.9.1)
gmap/gsnap (2018-02-12)
Piranha (1.2.1)
miranda (aug2010)

Required Perl modules

Getopt::Long (2.5)
File::Temp (0.2304)
RNA::HairpinFigure (0.141212)
Pod::Usage (1.69)
Log::Log4perl (1.49)

Installation

Please install the dependencies and run

git clone -b v1.5.2 https://github.com/microPIECE-team/microPIECE.git

or download the latest release as *.tar.gz or *.zip file:

curl -L -o microPIECE_v1.5.2.tar.gz https://github.com/microPIECE-team/microPIECE/archive/v1.5.2.tar.gz
# or
curl -L -o microPIECE_v1.5.2.zip https://github.com/microPIECE-team/microPIECE/archive/v1.5.2.zip

Docker

We also provide microPIECE as a Docker image. We tested the image on Ubuntu, Debian and macOS. On macOS, the Piranha command make test fails during the build, but the test succeeds when run inside the container. Therefore, we temporarily excluded this step from the build.

Information about the docker images:

Latest release: v1.5.2

docker pull micropiece/micropiece:v1.5.2
git clone https://github.com/microPIECE-team/microPIECE-testset.git testset
docker run -it --rm -v $PWD:/data micropiece/micropiece:v1.5.2 microPIECE.pl   \
  --genomeA testset/NC_035109.1_reduced_AAE_genome.fa  \
  --genomeB testset/NC_007416.3_reduced_TCA_genome.fa   \
  --annotationA testset/NC_035109.1_reduced_AAE_genome.gff   \
  --annotationB testset/NC_007416.3_reduced_TCA_genome.gff   \
  --clip testset/SRR5163632_aae_clip_reduced.fastq,testset/SRR5163633_aae_clip_reduced.fastq,testset/SRR5163634_aae_clip_reduced.fastq   \
  --clip testset/SRR5163635_aae_clip_reduced.fastq,testset/SRR5163636_aae_clip_reduced.fastq,testset/SRR5163637_aae_clip_reduced.fastq --adapterclip GTGTCAGTCACTTCCAGCGG  \
  --overwrite \
  --smallrnaseq a=testset/tca_smallRNAseq_rna_contaminated.fastq \
  --adaptersmallrnaseq3=TGGAATTCTCGGGTGCCAAGG \
  --adaptersmallrnaseq5 GTTCAGAGTTCTACAGTCCGACGATC \
  --filterncrnas testset/TCA_all_ncRNA_but_miR.fa \
  --speciesB tca 2>&1 | tee out.log

Usage

Input data

minimal workflow
- speciesA genome
- speciesA GFF
- speciesA AGO-CLIP-sequencing library/libraries
- speciesB genome
- speciesB GFF
- speciesB microRNA set (mature)
full workflow (in addition to the minimal workflow)
- speciesB non-codingRNA set (without miRNAs)
- speciesB microRNA set (precursor)
- speciesB smallRNA-sequencing library/libraries

PARAMETERS

--version|-V

version of this pipeline
--help|-h

prints a helpful help message
--genomeA and --genomeB

Genome of the species with the CLIP data (species A, --genomeA) and the genome of the species where we want to predict the miRNA targets (species B, --genomeB)
--gffA and --gffB

Genome feature file (GFF) of the species with the CLIP data (species A, --gffA) and the GFF of the species where we want to predict the miRNA targets (species B, --gffB)

--clip

Comma-separated CLIP-seq FASTQ files, in the format

  --clip con1_rep1_clip.fq,con1_rep2_clip.fq,con2_clip.fq
  # OR
  --clip con1_rep1_clip.fq --clip con1_rep2_clip.fq --clip con2_clip.fq
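
The two forms above should yield the same list of files. As a minimal illustration of that equivalence, here is a Python sketch (not the pipeline's own Perl code; the helper name `flatten_clip_args` is hypothetical) that flattens repeated, comma-separated values into one list:

```python
def flatten_clip_args(values):
    """Flatten repeated, comma-separated --clip values into one file list."""
    files = []
    for value in values:
        # Each occurrence of --clip may itself hold several comma-separated files.
        files.extend(part for part in value.split(",") if part)
    return files


# One --clip option carrying three comma-separated files ...
single = flatten_clip_args(["con1_rep1_clip.fq,con1_rep2_clip.fq,con2_clip.fq"])
# ... versus three --clip options carrying one file each.
repeated = flatten_clip_args(["con1_rep1_clip.fq", "con1_rep2_clip.fq", "con2_clip.fq"])

print(single == repeated)
```

Both calls return the same three file names in the same order.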

--adapterclip

Sequencing-adapter of CLIP reads

--smallrnaseq

Comma-separated smallRNA-seq FASTQ files, prefixed with 'condition=', in the format

  --smallrnaseq con1=A.fastq,B.fastq --smallrnaseq con2=C.fq
  # OR
  --smallrnaseq con1=A.fastq --smallrnaseq con1=B.fastq --smallrnaseq con2=C.fq
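
To make the 'condition=' syntax concrete, the following Python sketch groups the files by condition. It is only an illustration of the documented format, not the pipeline's Perl implementation, and the helper name `group_smallrnaseq_args` is an assumption:

```python
from collections import defaultdict


def group_smallrnaseq_args(values):
    """Group 'condition=file1,file2' values by their condition name."""
    by_condition = defaultdict(list)
    for value in values:
        condition, sep, file_list = value.partition("=")
        if not sep or not condition:
            raise ValueError(f"expected 'condition=files', got {value!r}")
        # Files for the same condition accumulate across repeated options.
        by_condition[condition].extend(f for f in file_list.split(",") if f)
    return dict(by_condition)


compact = group_smallrnaseq_args(["con1=A.fastq,B.fastq", "con2=C.fq"])
expanded = group_smallrnaseq_args(["con1=A.fastq", "con1=B.fastq", "con2=C.fq"])

print(compact == expanded)
```

Under this reading, both command-line forms describe two conditions, con1 with two libraries and con2 with one.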

--adaptersmallrnaseq5 and --adaptersmallrnaseq3

5' adapter (--adaptersmallrnaseq5) and 3' adapter (--adaptersmallrnaseq3) of the smallRNA-seq reads
--filterncrnas

Multi-FASTA file of ncRNAs used to filter smallRNA-seq reads. The file must not contain miRNAs.
--threads

Number of threads to be used
--overwrite

set this parameter to overwrite existing files
--testrun

Sets the pipeline to test mode (accounting for the small testset in Piranha). This option should not be used in a real analysis!
--out

output folder
--mirna

miRNA set. If given, mining is disabled and this set is used for prediction.
--speciesBtag

Three-letter code of the species where we want to predict the miRNA targets (species B, --speciesBtag).
--mirbasedir

The folder specified by --mirbasedir is searched for the files organisms.txt.gz, mature.fa.gz, and hairpin.fa.gz. If the files do not exist, they will be downloaded.
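
If you want to check a local folder before a run, a small Python sketch like the one below lists which of the three file names are absent. The function name `missing_mirbase_files` is hypothetical, and the check only mirrors the file names given above; it does not reproduce the pipeline's own download logic:

```python
import tempfile
from pathlib import Path

# File names taken from the --mirbasedir description.
MIRBASE_FILES = ("organisms.txt.gz", "mature.fa.gz", "hairpin.fa.gz")


def missing_mirbase_files(mirbasedir):
    """Return the expected file names that are not present in the folder."""
    folder = Path(mirbasedir)
    return [name for name in MIRBASE_FILES if not (folder / name).is_file()]


# Demonstration on a temporary folder that holds only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "mature.fa.gz").touch()
    missing = missing_mirbase_files(tmp)

print(missing)
```
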
--tempdir

The folder specified by --tempdir is used for temporary files. The default value is tmp/ inside the output folder specified by the --out parameter.
--piranhabinsize

Sets the Piranha bin size and has a default value of 30.
--CLIPminProcessLength and --CLIPmaxProcessLength

Both are integer values and set the lower and upper limit for the processed peak length. Peaks having a width below --CLIPminProcessLength or above --CLIPmaxProcessLength are ignored. Default values are 22 for --CLIPminProcessLength and 50 for --CLIPmaxProcessLength.
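
The length filter described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's code: peaks are represented as (contig, start, end) tuples in BED convention (0-based start, exclusive end, so the width is end minus start), and the function names are hypothetical. The defaults match the documented values of 22 and 50:

```python
def peak_width(start, end):
    """Width of a BED feature: start is 0-based, end is exclusive."""
    return end - start


def filter_peaks(peaks, min_len=22, max_len=50):
    """Keep peaks whose width is neither below min_len nor above max_len."""
    return [p for p in peaks if min_len <= peak_width(p[1], p[2]) <= max_len]


# Made-up coordinates chosen to sit on both sides of each limit.
peaks = [
    ("chr1", 100, 121),  # width 21, below the lower limit
    ("chr1", 200, 222),  # width 22, kept
    ("chr1", 300, 350),  # width 50, kept
    ("chr1", 400, 451),  # width 51, above the upper limit
]
kept = filter_peaks(peaks)

print(kept)
```

Peaks whose width equals one of the limits are kept, since only widths below the minimum or above the maximum are ignored.
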
--CLIPminlength

An integer value specifying the minimal length of a CLIP peak to be processed. Default value is 0, meaning no minimal length for CLIP peaks.

OUTPUT

pseudo mirBASE dat file: final_mirbase_pseudofile.dat

A pseudo mirBASE dat file containing all precursor sequences with their named mature sequences and their coordinates. It contains only the fields:
- ID
- FH and FT
- SQ
mature miRNA set: mature_combined_mirbase_novel.fa

Mature microRNA set containing novel and miRBase-completed miRNAs (if mined), together with the known miRNAs from miRBase
precursor miRNA set: hairpin_combined_mirbase_novel.fa

Precursor microRNA set containing novel miRNAs (if mined), together with the known miRNAs from miRBase
mature miRNA expression per condition: miRNA_expression.csv

Semicolon-separated file containing:
1. rpm
2. condition
3. miRNA
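
For downstream analysis, the three columns can be read with Python's csv module. The sketch below assumes the file has no header line and uses made-up example rows (the miRNA names and values are placeholders, not pipeline results); the function name `read_expression` is hypothetical:

```python
import csv
import io
from collections import defaultdict

# Placeholder rows in the documented column order: rpm;condition;miRNA
example = io.StringIO(
    "120.5;con1;tca-miR-1-3p\n"
    "80.0;con2;tca-miR-1-3p\n"
    "15.25;con1;tca-bantam-3p\n"
)


def read_expression(handle):
    """Return {miRNA: {condition: rpm}} from a semicolon-separated handle."""
    table = defaultdict(dict)
    for row in csv.reader(handle, delimiter=";"):
        if not row:
            continue  # skip empty lines
        rpm, condition, mirna = row
        table[mirna][condition] = float(rpm)
    return dict(table)


expression = read_expression(example)
print(expression["tca-miR-1-3p"])
```

To read a real result, pass an open file handle for miRNA_expression.csv instead of the in-memory example.
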
orthologous prediction file: miRNA_orthologs.csv

Tab-separated file containing:
1. query_id
2. subject_id
3. identity
4. alignment length
5. number mismatches
6. number gap openings
7. start position inside query
8. end position inside query
9. start position inside subject
10. end position inside subject
11. evalue
12. bitscore
13. aligned query sequence
14. aligned subject sequence
15. length query sequence
16. length subject sequence
17. coverage for query sequence
18. coverage for subject sequence
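
With eighteen columns, named access is easier to work with than positional indexing. The Python sketch below maps each tab-separated row to a dictionary. The column keys are shortened forms of the field names listed above, the example row is a made-up placeholder, and the absence of a header line is an assumption:

```python
import csv
import io

# Keys follow the documented column order of miRNA_orthologs.csv.
ORTHOLOG_COLUMNS = [
    "query_id", "subject_id", "identity", "alignment_length",
    "mismatches", "gap_openings", "query_start", "query_end",
    "subject_start", "subject_end", "evalue", "bitscore",
    "aligned_query", "aligned_subject", "query_length",
    "subject_length", "query_coverage", "subject_coverage",
]


def read_orthologs(handle):
    """Return one dict per row, keyed by the documented column order."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip empty lines
        if len(fields) != len(ORTHOLOG_COLUMNS):
            raise ValueError(f"expected 18 columns, got {len(fields)}")
        rows.append(dict(zip(ORTHOLOG_COLUMNS, fields)))
    return rows


# Placeholder row, not a real pipeline result.
placeholder = [
    "queryA", "subjectB", "100.00", "22", "0", "0", "1", "22", "1", "22",
    "1e-05", "44.1", "UGGAAUGUAAAGAAGUAUGGAG", "UGGAAUGUAAAGAAGUAUGGAG",
    "22", "22", "100", "100",
]
rows = read_orthologs(io.StringIO("\t".join(placeholder) + "\n"))
print(rows[0]["query_id"], rows[0]["subject_id"], rows[0]["bitscore"])
```

All values are kept as strings here; convert numeric fields such as identity or evalue as needed.
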
miRDeep2 mining result in HTML/CSV: mirdeep_output.html/csv

the standard output HTML/CSV file of miRDeep2
ISOMIR prediction files: isomir_output_CONDITION.csv

Semicolon-delimited file containing:
1. mirna
2. substitutions
3. added nucleotides on 3' end
4. nucleotides at 5' end different from the annotated sequence
5. nucleotides at 3' end different from the annotated sequence
6. sequence
7. rpm
8. condition
genomic location of miRNAs: miRNA_genomic_position.csv

Tab-delimited file containing:
1. miRNA
2. genomic contig
3. identity
4. length
5. miRNA-length
6. number mismatches
7. number gapopens
8. miRNA-start
9. miRNA-stop
10. genomic-start
11. genomic-stop
12. evalue
13. bitscore
all library support-level target predictions: *_miranda_output.txt

miranda output, reduced to only the lines starting with >
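
The reduction described above amounts to a simple line filter. The Python sketch below shows the idea on placeholder text; the input lines are not real miranda output, and the function name `keep_hit_lines` is hypothetical:

```python
def keep_hit_lines(lines):
    """Keep only the lines that start with '>' and strip the newline."""
    return [line.rstrip("\n") for line in lines if line.startswith(">")]


# Placeholder lines standing in for a raw report.
raw = [
    "some header text\n",
    ">first placeholder hit line\n",
    "  alignment details\n",
    ">second placeholder hit line\n",
]
hits = keep_hit_lines(raw)
print(hits)
```
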
all library support-level CLIP transfer .bed files: *transfered_merged.bed

BED file of the transferred CLIP regions in the speciesB transcriptome

Example

Testset

Feel free to test the pipeline with our microPIECE-testset:

git clone https://github.com/microPIECE-team/microPIECE-testset.git

Alternative

minimal workflow
- speciesA genome AAE genome : ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/204/515/GCF_002204515.2_AaegL5.0/GCF_002204515.2_AaegL5.0_genomic.fna.gz
- speciesA GFF AAE GFF : ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/204/515/GCF_002204515.2_AaegL5.0/GCF_002204515.2_AaegL5.0_genomic.gff.gz
- speciesA AGO-CLIP-sequencing library/libraries AGO-CLIP of AAE
- speciesB genome TCA genome : ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_genomic.fna.gz
- speciesB GFF TCA GFF : ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_genomic.gff.gz
- speciesB microRNA set (mature) TCA mature miRNAs
full workflow (in addition to the minimal workflow)
- speciesB microRNA set (precursor) TCA stem-loop miRNAs
- speciesB smallRNA-sequencing library/libraries TCA smallRNA-sequencing data
- speciesB non-codingRNA set (without miRNAs - to filter smRNA-seq data) (OPTIONAL)

CAVEATS

A complete list of open issues is available on GitHub Issues.

Please report any new issues as a new GitHub issue.

Changelog

scheduled for next release

No features planned
v1.5.2 (2018-04-13)

Refactoring of CLIP_merge_bed_files.pl to reduce memory footprint by a factor of 10x (Fixes #174)

Refactoring of Piranha run to support multithreading (Fixes #177)

Fixing copy process of final files (Fixes #184)

Setting default bin size for Piranha to 30 (Fixes #178)

This version is archived.
v1.5.1 (2018-04-11)

Added optimized pre-binning step for Piranha (Fixes #132)

This version is archived.
v1.5.0 (2018-04-10)

Removing additional length cutoff during CLIP transfer (Fixes #153)

Add command line options --CLIPminProcessLength, --CLIPmaxProcessLength, and --CLIPminlength for length limits used in run_CLIP_process and run_CLIP_clip_mapper steps enabling processing of peaks with user defined widths (Fixes #145)

Dynamic naming of output files based on minlength variable in run_CLIP_clip_mapper (Fixes #146)

Correct calculation of length of a bed feature and moving scripts/CLIP_bedtool_discard_sizes.pl into lib/microPIECE.pm (Fixes #147)

Add an optimized pre-binning step with pseudocounts for bins covered by an exon as preparation for Piranha (Fixes #132 and #155)

This version is archived.

This version was accepted by The Journal of Open Source Software (Review issue #616)
v1.4.0 (2018-03-31)

Copying pseudo mirBASE dat file final_mirbase_pseudofile.dat into output folder (Fixes #131)

Corrected RNA::HairpinFigure output (Fixes #137)

Fix the requirement of an accession inside mirBASE dat file (Fixes #134)

Avoiding error message while copying the out file for genomic location into base folder (Fixes #117)
v1.3.0 (2018-03-29)

Creating all structures on the fly using pseudo-mirBASE-dat as input.

Using miRNA.dat from mirBASE as source for mature/precursor sequence and relationship (Fixes #127)

Fix of division-by-zero bug for empty mapping files (Fixes #118)

Fix of typo in --piranhabinsize option (Fixes #116)
v1.2.3 (2018-03-26)

Fix transformation of precursor sequences based on mirbase #22 precursor sequences with a single mature. (Fixes #109)
v1.2.2 (2018-03-23)

Improved collision detection for newly identified miRNAs, avoiding crashes caused by genomic copies. (Fixes #105)
v1.2.1 (2018-03-23)

Enables stable numbering for newly identified miRNAs based on their precursor and mature sequences (Fixes #101)
v1.2.0 (2018-03-22)

We are using miraligner, which requires Java version 1.7, but 1.8 was installed by default. This was fixed by switching to v1.4 of the docker base image. Additionally, miraligner requires fixed filenames for its databases. Version v1.2.0 therefore solves the miraligner-related bugs and re-enables the isomir detection. (Fixes #97 and #98)
v1.1.0 (2018-03-12)

Add isomir detection and copy the final genomic location file to the output folder (Fixes #34)
v1.0.7 (2018-03-08)

Piranha was lacking a bin_size parameter. Added parameter --piranahbinsize with a default value of 20 (Fixes #66)
v1.0.6 (2018-03-08)

Added parameter --mirbasedir and --tempdir to support local mirbase files and relocation of directory for temporary files (Fixes #66, #73, and #76)
v1.0.5 (2018-03-07)

Update of documentation and correct spelling of --mirna parameter
v1.0.4 (2018-03-07)

Fixes complete mature in final output (Fixes #69)
v1.0.3 (2018-03-06)

Add tests for perl scripts in script folder which ensure the correct handling of BED stop coordinates (Fixes #65)
v1.0.2 (2018-03-05)

Fixes the incorrect sorting of BED files. The result was correct, but the sorting was performed in the wrong order. (Fixes #63)
v1.0.1 (2018-03-05)

Fix an error concerning BED file handling of start and stop coordinates. (Fixes #59)
v1.0.0 (2018-03-05)

This version is archived and was submitted to The Journal of Open Source Software.
v0.9.0 (2018-03-05)

First version archived at Zenodo.

License

This program is released under GPLv2. For further license information, see LICENSE.md shipped with this program. Copyright (c) 2018 Daniel Amsel and Frank Förster (employees of Fraunhofer Institute for Molecular Biology and Applied Ecology IME). All rights reserved.

AUTHORS

Daniel Amsel
Frank Förster
