
QC TSV output file

martinghunt edited this page Jan 3, 2024 · 3 revisions

This page describes the gzipped, tab-delimited QC file qc.tsv.gz that is made when running viridian run_one_sample.

It contains per-base information on the consensus sequence and how it aligns to the reference genome.

The first 11 columns are:

  1. Ref_pos - reference position (1-based)
  2. Ref_nt - reference nucleotide
  3. Cons_pos - consensus position (1-based)
  4. Cons_nt - consensus nucleotide
  5. Masked_cons_nt - consensus nucleotide after masking
  6. Amplicon - name of amplicon(s) this position belongs to
  7. Primer - name of primer(s) this position belongs to
  8. Mask - list of mask reasons, or PASS if not masked
  9. Total_depth - total read depth at this position
  10. Clean_depth - total "clean" read depth, i.e. excluding primer portions of reads
  11. Cons_depth - the part of the clean depth that supports the consensus call
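As a rough illustration of working with these columns, here is a minimal Python sketch that reads the file into one dict per row. It assumes the file has a header line with the column names above and that "." marks a missing value, as in the toy example below. The function names parse_row, parse_qc_lines and read_qc_tsv are hypothetical and are not part of viridian.

```python
import csv
import gzip

# Text columns from the list above. Every other column (positions, depths and
# the per-strand counts) is assumed to hold integers.
TEXT_COLUMNS = {"Ref_nt", "Cons_nt", "Masked_cons_nt", "Amplicon", "Primer", "Mask"}


def parse_row(row):
    """Convert one row (a dict of strings) to typed values; "." becomes None."""
    parsed = {}
    for key, value in row.items():
        if value == ".":
            parsed[key] = None
        elif key in TEXT_COLUMNS:
            parsed[key] = value
        else:
            parsed[key] = int(value)
    return parsed


def parse_qc_lines(lines):
    """Parse tab-delimited lines, the first being the header, into typed rows."""
    return [parse_row(row) for row in csv.DictReader(lines, delimiter="\t")]


def read_qc_tsv(path):
    """Read a gzipped QC file such as qc.tsv.gz from disk."""
    with gzip.open(path, "rt", newline="") as handle:
        return parse_qc_lines(handle)
```

For example, read_qc_tsv("qc.tsv.gz") returns a list of dicts keyed by column name. Keeping the line parsing separate from the file opening makes it easy to test on a few hand-written lines.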

The remaining columns are per-strand pileup counts of A, C, G, T, insertions and deletions from the reads. The counts are split into "clean" read depth (i.e. excluding primer portions of reads) and "bad" read depth (primer portions of reads). A/a are the clean counts of A from reads on the forward/reverse strand, and similarly for the other nucleotides. I/i and D/d are the clean insertion and deletion counts. They do not include the insertion and deletion lengths, which are in the detailed entries of the self_qc entry of the log JSON file. The "bad" read depths are in columns with the same names but with _X appended: for example, A_X/a_X are the bad read depths of A on the forward and reverse strands.
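The naming scheme above can be captured in a small helper. This is a sketch only. It assumes each row is a dict keyed by column name with integer counts, and it assumes C/c, G/g and T/t follow the same pattern as A/a. The functions strand_counts and depth_summary are hypothetical names.

```python
SYMBOLS = ("A", "C", "G", "T", "I", "D")  # nucleotides, insertion (I), deletion (D)


def strand_counts(row, symbol, bad=False):
    """Return (forward, reverse) counts for one symbol from a parsed row.

    Forward-strand columns use the upper-case name and reverse-strand columns
    the lower-case name; "bad" (primer) counts have _X appended, e.g. A_X/a_X.
    """
    suffix = "_X" if bad else ""
    return row[symbol.upper() + suffix], row[symbol.lower() + suffix]


def depth_summary(row):
    """Sum both strands for each symbol, separately for clean and bad depth."""
    return {
        symbol: {
            "clean": sum(strand_counts(row, symbol)),
            "bad": sum(strand_counts(row, symbol, bad=True)),
        }
        for symbol in SYMBOLS
    }
```

Keeping the forward and reverse counts as a pair in strand_counts leaves room for strand-specific checks, while depth_summary collapses them when only the combined depth matters.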

Here is a toy example, showing only the first 11 columns (otherwise it is far too wide!):

Ref_pos  Ref_nt  Cons_pos  Cons_nt  Masked_cons_nt  Amplicon  Primer  Mask   Total_depth  Clean_depth  Cons_depth
1        A       0         -        -               .         .       .      .            .            .
2        T       0         -        -               .         .       .      .            .            .
3        G       1         G        N               A1        A1_l_0  DEPTH  700          0            0
4        C       2         C        N               A1        A1_l_0  DEPTH  702          0            0
5        G       3         G        N               A1        A1_l_0  DEPTH  705          0            0
6        G       4         G        N               A1        A1_l_0  DEPTH  701          0            0
7        A       5         A        A               A1        .       PASS   800          800          800
8        A       6         A        A               A1        .       PASS   804          804          799
9        C       7         C        C               A1        .       PASS   805          805          804
10       A       8         A        A               A1        .       PASS   800          800          800
11       A       8         -        -               A1        .       .      .            .            .
12       T       9         T        T               A1        .       PASS   800          800          800
13       C       10        C        C               A1;A2     A2_l_0  PASS   1500         802          801
14       G       11        G        G               A1;A2     A2_l_0  PASS   1503         801          799
15       C       12        C        C               A1;A2     A2_l_0  PASS   1501         804          804

In this example, the first amplicon, called A1, starts at reference position 3. Its left primer, called A1_l_0, covers reference positions 3-6. The only read depth there is "bad" (Clean_depth is 0), so those positions are masked in the consensus sequence (Masked_cons_nt is N and Mask is DEPTH).
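Masked positions like these can be pulled out of the parsed rows with a couple of filters. In this sketch, rows are dicts keyed by column name with integer depths, and a missing value may be either "." or None. The functions masked_positions and primer_only_positions are hypothetical. The Mask value is kept as a raw string, because the separator used when there are several mask reasons is not given here.

```python
def masked_positions(rows):
    """Map Ref_pos to the raw Mask value for every masked consensus position."""
    # "PASS" means not masked; "." (or None once parsed) means there is no
    # consensus base at this reference position, so both are skipped.
    return {
        row["Ref_pos"]: row["Mask"]
        for row in rows
        if row["Mask"] not in ("PASS", ".", None)
    }


def primer_only_positions(rows):
    """Ref_pos values that have read depth but no clean depth."""
    return [
        row["Ref_pos"]
        for row in rows
        if isinstance(row["Total_depth"], int)
        and row["Total_depth"] > 0
        and row["Clean_depth"] == 0
    ]
```

On the toy example, both functions pick out reference positions 3-6.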

Reference position 11 is deleted in the consensus: its Cons_nt is -, and its Cons_pos stays at 8, the consensus position of the preceding base.
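A "-" in Cons_nt appears both for deletions and for reference positions outside the consensus, such as positions 1-2 in the toy example. One way to tell them apart is sketched below. It assumes the rows are sorted by Ref_pos for a single reference sequence, and it treats a gap as a deletion only when consensus bases lie on both sides of it. The function classify_gaps is a hypothetical name.

```python
def classify_gaps(rows):
    """Return (outside, deleted) lists of Ref_pos that have no consensus base.

    "deleted" holds gaps flanked by consensus bases on both sides, like
    reference position 11 in the toy example. Every other gap is "outside"
    the consensus, i.e. before its first base or after its last base.
    """
    called = [i for i, row in enumerate(rows) if row["Cons_nt"] != "-"]
    outside, deleted = [], []
    for i, row in enumerate(rows):
        if row["Cons_nt"] != "-":
            continue
        if called and called[0] < i < called[-1]:
            deleted.append(row["Ref_pos"])
        else:
            outside.append(row["Ref_pos"])
    return outside, deleted
```

This uses only the order of the rows, so it does not depend on how Cons_pos is filled in at gaps.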

The second amplicon, called A2, starts at reference position 13. Reference positions 13-15 belong to both amplicons A1 and A2, and also to the left primer A2_l_0 of amplicon A2. This means there is a mix of good and bad coverage at positions 13-15: the good coverage comes from reads from amplicon A1, and the bad coverage comes from the primer portions of reads from amplicon A2. The total depth (Total_depth) is around 1500X, but the clean depth (Clean_depth) is only around 800X, and almost all of it supports the consensus call (Cons_depth).
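Overlap regions like positions 13-15 can be found by splitting the Amplicon column. The sketch below assumes that amplicon names are separated by ";" (as in "A1;A2"). It also assumes that the primer ("bad") depth at a position is Total_depth minus Clean_depth, which matches the toy example but is an inference rather than a documented rule. The function overlap_depths is a hypothetical name.

```python
def overlap_depths(rows):
    """Report clean fraction and primer depth where amplicons overlap."""
    report = []
    for row in rows:
        names, total = row["Amplicon"], row["Total_depth"]
        if names in (None, ".") or not isinstance(total, int):
            continue  # no amplicon, or no depth recorded at this position
        amplicons = names.split(";")
        if len(amplicons) < 2:
            continue  # position covered by a single amplicon
        clean = row["Clean_depth"]
        report.append({
            "Ref_pos": row["Ref_pos"],
            "amplicons": amplicons,
            "clean_fraction": clean / total if total else 0.0,
            "bad_depth": total - clean,  # assumed: total = clean + bad
        })
    return report
```

At reference position 13 of the toy example, this gives a bad depth of 1500 - 802 = 698 and a clean fraction of about 0.53.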
