Platforms

This page describes the linked-read platforms that are, or are not, supported by the pipeline, together with their input requirements and any preprocessing they need.

DBS

Droplet Barcode Sequencing (DBS) is based on the technology described in Redin et al. 2019. Long DNA fragments are tagmented using Tn5-coated beads, which cut, tag and wrap each fragment around a bead. The DNA-wrapped beads are then used in emulsion PCR together with barcoded oligonucleotides. Within each emulsion droplet, the barcode and the tagged fragments are amplified independently and then linked by overlap extension. Barcode-linked fragments are recovered and indexed for Illumina sequencing.

Barcodes: Semi-degenerate sequence of 20 bases with about 3.5 billion possible sequences.
Preprocessing: Barcode sequences are extracted from read 1 and clustered using starcode to error-correct them. Reads 1 and 2 are then trimmed using cutadapt, and the corrected barcode is attached to the read header.
Additional input files: None
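The quoted barcode space can be checked with a little arithmetic. Assuming every one of the 20 positions is a three-base IUPAC code (for example B, D, H or V, each of which excludes one nucleotide; the exact pattern is an assumption, not taken from the text above), there are 3^20 ≈ 3.5 billion possible barcodes. The sketch also shows how a corrected barcode could be attached to a read header given a lookup table derived from starcode clustering; the `BX:Z:` tag, the `corrections` table and all function names are hypothetical illustrations, not the pipeline's actual code.

```python
from __future__ import annotations

# Number of nucleotides each IUPAC code allows.
IUPAC_SIZES = {"A": 1, "C": 1, "G": 1, "T": 1,
               "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
               "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}


def barcode_space(pattern: str) -> int:
    """Count the distinct sequences matching a degenerate IUPAC pattern."""
    total = 1
    for code in pattern:
        total *= IUPAC_SIZES[code]
    return total


def tag_header(header: str, raw_barcode: str, corrections: dict[str, str]) -> str:
    """Append the error-corrected barcode to a FASTQ header line.

    `corrections` maps raw barcodes to their cluster representative
    (hypothetical input, e.g. parsed from starcode output). Barcodes missing
    from the table are kept unchanged. The BX:Z: tag format is an assumption.
    """
    corrected = corrections.get(raw_barcode, raw_barcode)
    name = header.split()[0]  # keep only the read name, drop any old comment
    return f"{name} BX:Z:{corrected}"


# Hypothetical 20-base pattern in which every position allows three bases.
print(barcode_space("BDHV" * 5))  # 3486784401, about 3.5 billion
```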

10x Genomics

10x Genomics linked-read technology comes in two versions: the older GemCode (v1) and the more recent Chromium Genome (v2). Long DNA fragments are combined in droplets with barcode-containing gel beads to create GEMs (Gel Bead-In EMulsions). The fragments are amplified and barcoded using a combination of free random hexamers and barcode-linked random hexamers from the gel beads. The barcoded fragments are then recovered and fragmented before ligation of the 3' sequencing adaptor, and the libraries are sequenced using Illumina sequencing. The commercial version of the technology has been discontinued.

Barcodes: 10x Genomics uses a library of 16-base barcode sequences. GemCode libraries have at most about 737 thousand barcodes, while Chromium Genome has about 4 million.
Preprocessing: Barcodes are extracted from read 1 and error-corrected against a whitelist (Download from here). Reads are then trimmed and the corrected barcode is appended to the header. Preprocessing uses the ema count and ema preproc tools.
Additional input files: A TXT file of valid barcode sequences; see parameter barcode_whitelist in configs.
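To illustrate what whitelist-based correction means, the sketch below takes the first 16 bases of read 1 as the barcode and accepts it if it is on the whitelist, or corrects it if exactly one whitelisted barcode lies a single substitution away. This is a deliberately simplified stand-in and not a description of how ema count and ema preproc correct barcodes; the one-barcode-per-line whitelist layout and all function names are assumptions.

```python
from __future__ import annotations

from typing import Iterable, Optional

BARCODE_LENGTH = 16  # 10x barcodes are 16 bases long
BASES = "ACGT"


def load_whitelist(lines: Iterable[str]) -> set[str]:
    """Read one barcode per line (assumed file layout), skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def split_read1(sequence: str) -> tuple[str, str]:
    """Split read 1 into the leading barcode and the rest of the read.

    Any further trimming of the remaining sequence is left out of this sketch.
    """
    return sequence[:BARCODE_LENGTH], sequence[BARCODE_LENGTH:]


def correct_barcode(barcode: str, whitelist: set[str]) -> Optional[str]:
    """Return the barcode if whitelisted, else the single whitelisted barcode
    one substitution away, else None (no match, or an ambiguous match)."""
    if barcode in whitelist:
        return barcode
    matches = set()
    for i, original in enumerate(barcode):
        for base in BASES:
            if base != original:
                candidate = barcode[:i] + base + barcode[i + 1:]
                if candidate in whitelist:
                    matches.add(candidate)
    return matches.pop() if len(matches) == 1 else None
```

In use, the whitelist would be loaded once (for example `with open(path) as fh: whitelist = load_whitelist(fh)`) and `correct_barcode` applied to the barcode returned by `split_read1` for every read pair. Reads whose barcode comes back as `None` have no unambiguous whitelist match.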

stLFR

stLFR (single-tube long fragment read) is based on the technology described in Wang et al. 2019 and is commercially available from MGI. The technology uses tagmentation to individually cut and hold long DNA fragments in solution. The tagmentase-DNA complexes are then hybridized to barcoded beads through the adaptor introduced by tagmentation, so that each fragment is wrapped around an individual bead. The barcode is then ligated to each subfragment before recovery and final library preparation. Sequencing is performed on the DNBSEQ platforms.

Barcodes: The barcodes are generated using a combinatorial split-and-pool approach. Each barcode is a combination of three barcodes drawn from a library of 1,536 barcodes, each 10 bp long. The total number of possible barcodes is about 3.6 billion (1,536³).
Preprocessing: The barcode needs to be extracted and error-corrected externally using stLFR_read_demux or, alternatively, SuperPlus split_barcode before the reads are input to the pipeline. The reads are then trimmed using cutadapt. stLFR_read_demux inserts an index-based barcode into the read header (e.g. #1024_323_231) indicating which three barcodes were detected. Because this barcode does not consist of IUPAC nucleotide symbols, it is incompatible with some aligners, such as ema. The index barcodes are therefore replaced with either (A) a concatemer of the three detected barcodes or (B) a uniquely generated 16-base sequence (recommended). For option B, a whitelist of barcodes from the stLFR_read_demux tools (Download from here) needs to be provided.
Additional input files: An optional list of barcodes with their corresponding indices; see parameter stlfr_barcodes in configs.
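Option B works because 16 bases are enough to give every index triplet its own sequence: 1,536³ ≈ 3.6 billion possible triplets, while 4¹⁶ ≈ 4.3 billion 16-base sequences exist. The sketch below shows one way to build such a code deterministically, by reading the triplet as a three-digit base-1,536 number and writing that number in base 4 with one nucleotide per digit. It assumes 1-based indices in the range 1 to 1,536 and treats anything else as invalid; the encoding scheme and the function name are hypothetical, and the pipeline's own way of generating the 16-base sequences may differ.

```python
LIBRARY_SIZE = 1536  # barcodes per position
POSITIONS = 3        # three barcodes per bead
BASES = "ACGT"


def index_to_sequence(header_tag: str, length: int = 16) -> str:
    """Map an index barcode such as '#1024_323_231' to a unique DNA string.

    Assumes 1-based indices in 1..1536 (hypothetical convention); any other
    value is rejected as an incomplete or invalid barcode.
    """
    parts = [int(x) for x in header_tag.lstrip("#").split("_")]
    if len(parts) != POSITIONS or not all(1 <= p <= LIBRARY_SIZE for p in parts):
        raise ValueError(f"not a complete stLFR index barcode: {header_tag}")
    # Mixed-radix number: each index is one digit in base 1536.
    value = 0
    for p in parts:
        value = value * LIBRARY_SIZE + (p - 1)
    # Write the number in base 4, one nucleotide per digit, most significant first.
    digits = []
    for _ in range(length):
        value, remainder = divmod(value, 4)
        digits.append(BASES[remainder])
    if value:
        raise ValueError("sequence length too short for this index")
    return "".join(reversed(digits))


total = LIBRARY_SIZE ** POSITIONS
print(total)            # 3623878656, about 3.6 billion
print(4 ** 16 > total)  # True: 16 bases give every triplet a unique code
```

Because both steps are one-to-one, two different triplets can never receive the same 16-base sequence, and the mapping can be reversed if the original indices are needed later.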

TELL-seq

TELL-seq is based on the technology from Chen et al. 2020 and is commercially available from the company Universal Sequencing. The method uses clonally barcoded beads with attached tagmentases to cut and barcode individual long DNA fragments in solution. A second tagmentation is also performed in solution to introduce a second adaptor. The library is sequenced using Illumina sequencing with a special setup in which the barcode is read as index 1.

Barcodes: Semi-degenerate sequence of 18 bases with about 2.4 billion possible sequences.
Preprocessing: Barcodes are either (A) clustered using starcode as for DBS, or (B) corrected so that barcodes seen only once are assigned to any barcode within a Hamming distance of 1, or discarded if there is none. Option B follows the method used in Chen et al. 2020. Reads are then tagged in the header with the corrected barcode.
Additional input files: An index 1 FASTQ file containing the barcode sequences; see parameter tellseq_index in configs.
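Option B can be sketched as follows. Barcodes observed more than once are kept; each barcode observed only once is looked up against its single-substitution neighbours and reassigned to one of them, or discarded if none was seen. Two details are assumptions of this sketch rather than statements from the text: only barcodes seen more than once are used as correction targets, and when several neighbours qualify the most abundant one is chosen. The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator, Optional

BASES = "ACGT"


def hamming1_neighbours(barcode: str) -> Iterator[str]:
    """Yield every sequence that differs from `barcode` at exactly one position."""
    for i, original in enumerate(barcode):
        for base in BASES:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]


def correct_singletons(barcodes: Iterable[str]) -> dict[str, Optional[str]]:
    """Map each observed barcode to its corrected barcode, or None if discarded."""
    counts = Counter(barcodes)
    corrected: dict[str, Optional[str]] = {}
    for barcode, n in counts.items():
        if n > 1:
            corrected[barcode] = barcode  # seen more than once: kept as-is
            continue
        # Singleton: look for neighbours at Hamming distance 1 seen more than once.
        candidates = [nb for nb in hamming1_neighbours(barcode) if counts[nb] > 1]
        if candidates:
            # Tie-break (assumption): prefer the most abundant neighbour.
            corrected[barcode] = max(candidates, key=lambda nb: counts[nb])
        else:
            corrected[barcode] = None  # no neighbour found: discard
    return corrected
```

Generating the 3 × 18 = 54 neighbours of each singleton and checking them against the count table keeps the lookup fast, instead of comparing every singleton with every other barcode.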

CPT-seq

This technology is based on Amini et al. 2014 and its follow-up, CPTv2-seq, from Zhang et al. 2017. These technologies were developed by Illumina but are not commercially available. Because few datasets are available and the technologies have seen limited use, they will not be supported.

Haplotagging

Haplotagging is based on the technology presented in Meier et al. 2021. It uses barcoded beads covered with Tn5 tagmentase to cut and barcode individual long DNA fragments in solution. The beads are coated with a combination of two barcodes, AB and CD, which are inserted at the 5' and 3' ends of each cut fragment. The barcodes are generated combinatorially, giving about 85 million possible combinations in total. This platform is not yet supported because no public data currently exist. Preprocessing is available through evolgenomics/haplotagging.
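The figure of about 85 million combinations follows from the combinatorial design. Assuming four barcode segments (A, B, C and D) with 96 variants each, which is the design reported by Meier et al. 2021 but not spelled out above, the number of combinations is 96⁴:

```python
SEGMENTS = ("A", "B", "C", "D")  # assumption: four combinatorial barcode segments
VARIANTS_PER_SEGMENT = 96        # assumption: 96 variants per segment

total = VARIANTS_PER_SEGMENT ** len(SEGMENTS)
print(total)  # 84934656, about 85 million
```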