Amplicon schemes

martinghunt edited this page Aug 23, 2024 · 4 revisions

Viridian workflow can use its built-in schemes, your own custom schemes, or a mix of the two.

When the pipeline runs, it automatically determines which amplicon scheme best matches the reads. You can force the choice with the option --force_amp_scheme NAME, where NAME must exactly match one of the scheme names.

Built-in schemes

Viridian has built-in SARS-CoV-2 amplicon schemes. At the time of writing, these are: ARTIC version 3, ARTIC version 4, and Midnight-1200.

You can find out the current built-in schemes by looking at the help message from viridian_workflow run_one_sample --help. You should see this amongst the output:

--built_in_amp_schemes scheme1,scheme2,...
                        Comma-separated list of built in amplicon schemes to use [COVID-ARTIC-V3,COVID-ARTIC-V4,COVID-MIDNIGHT-1200]

By default, all built-in schemes are used. They are shown in square brackets as the default value of the option. In this case, they are: COVID-ARTIC-V3,COVID-ARTIC-V4,COVID-MIDNIGHT-1200.
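
If you want the scheme names in a script, the bracketed default can be pulled out of the help text. The following is a minimal Python sketch, assuming the help output has the layout shown above. The function name built_in_schemes_from_help is hypothetical and not part of Viridian.

```python
import re


def built_in_schemes_from_help(help_text: str) -> list[str]:
    """Return the names listed in square brackets after --built_in_amp_schemes."""
    # Lazily skip from the option name to the first [...] that follows it.
    match = re.search(
        r"--built_in_amp_schemes\b.*?\[([^\]]*)\]", help_text, flags=re.DOTALL
    )
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# Help text copied from the excerpt above. In practice it could be captured
# from the output of: viridian_workflow run_one_sample --help
help_text = """\
--built_in_amp_schemes scheme1,scheme2,...
                        Comma-separated list of built in amplicon schemes to use [COVID-ARTIC-V3,COVID-ARTIC-V4,COVID-MIDNIGHT-1200]
"""

print(built_in_schemes_from_help(help_text))
```

With the excerpt above, this prints the three scheme names as a Python list.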

You can restrict the workflow to just some of the built-in schemes, or to none of them (see "Use only custom schemes" below). For example, to use only ARTIC versions 3 and 4, add this when running the workflow: --built_in_amp_schemes COVID-ARTIC-V3,COVID-ARTIC-V4.

Custom schemes

File formats

Viridian needs the primer scheme in one of two file formats:

  1. Its own custom TAB-delimited file, described below
  2. The "PrimalScheme" BED format, see for example this SARS-CoV-2/400/v5.3.2 BED file.

If the filename ends with .bed then Viridian assumes it is in the second format. Otherwise it assumes it is in the first format.
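
The filename rule can be written as a one-line check. This is only a sketch that mirrors the sentence above literally. Whether Viridian treats names such as scheme.BED or scheme.bed.gz differently is not stated here, so the case-sensitive match is an assumption, and guess_scheme_format is a hypothetical helper name.

```python
def guess_scheme_format(filename: str) -> str:
    """Return "bed" for names ending in .bed, otherwise "tsv"."""
    # Assumption: the match is case-sensitive and applies to the full filename.
    return "bed" if filename.endswith(".bed") else "tsv"


for name in ("SARS-CoV-2.primer.bed", "my_scheme.tsv", "my_scheme.txt"):
    print(name, "->", guess_scheme_format(name))
```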

Viridian amplicon scheme TSV format

An amplicon scheme needs to be defined in a TAB-delimited file. That file has one primer per line, and must include the following column headings (any other columns are simply ignored):

  • Amplicon_name: the name of the amplicon
  • Primer_name: the name of the primer
  • Left_or_right: must be "left" or "right", indicating if this is the left or right primer for the amplicon.
  • Sequence: the nucleotide sequence of the primer. If it is a left primer, then it must be on the forward strand of the reference genome. If it is a right primer, then it must be on the reverse strand of the reference genome.
  • Position: zero-based position of the start of the primer when aligned to the reference genome, i.e. the leftmost aligned position, as reported for an alignment in a SAM/BAM file (note that the POS field in a text SAM record is 1-based, whereas this column is zero-based). In other words, it should be min(start in ref, end in ref), however you consider the various orientations of the primer and whether or not it needs reverse complementing.

As an example, here are the first four lines of the ARTIC version 3 file that is built in to the pipeline:

Amplicon_name      Primer_name        Left_or_right  Sequence                   Position
nCoV-2019_1_pool1  nCoV-2019_1_LEFT   left           ACCAACCAACTTTCGATCTCTTGT   30
nCoV-2019_1_pool1  nCoV-2019_1_RIGHT  right          CATCTTTAAGATGTTGACGTGCCTC  385
nCoV-2019_2_pool2  nCoV-2019_2_LEFT   left           CTGTTTTACAGGTTCGCGACGT     320
nCoV-2019_2_pool2  nCoV-2019_2_RIGHT  right          TAAGGATCAGTGCCAAGCTCGT     704

Note that left primer sequences are assumed to match the forward strand of the reference, and right primer sequences are assumed to match the reverse strand of the reference. Here is a diagram of the first primer:

ref genome: ...ACCAACCAACTTTCGATCTCTTGT... 
               |
ref position:  30

and the second primer:

                [ rev comp of primer seq]
ref genome:  ...GAGGCACGTCAACATCTTAAAGATG...
                |
ref position:   385

That is, the right-hand primer sequence in the TSV file must be reverse complemented to match the forward strand of the reference.
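
The two diagrams can be reproduced with a short Python sketch. It uses a toy reference made of N padding around the two primer sites, not the real SARS-CoV-2 genome, and an exact string search. Real primers may not match a reference exactly, so treat this as an illustration of the Position convention only. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]


def position_in_reference(reference: str, sequence: str, left_or_right: str) -> int:
    """Zero-based leftmost position of the primer on the forward strand.

    Returns -1 if there is no exact match.
    """
    # Right primers are given on the reverse strand, so flip them first.
    forward = sequence if left_or_right == "left" else reverse_complement(sequence)
    return reference.find(forward)


LEFT = "ACCAACCAACTTTCGATCTCTTGT"    # nCoV-2019_1_LEFT
RIGHT = "CATCTTTAAGATGTTGACGTGCCTC"  # nCoV-2019_1_RIGHT

# Toy reference: N padding places the primer sites at positions 30 and 385.
toy_reference = (
    "N" * 30 + LEFT + "N" * (385 - 30 - len(LEFT)) + reverse_complement(RIGHT)
)

print(reverse_complement(RIGHT))                             # GAGGCACGTCAACATCTTAAAGATG
print(position_in_reference(toy_reference, LEFT, "left"))    # 30
print(position_in_reference(toy_reference, RIGHT, "right"))  # 385
```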

WARNING: currently, the TSV file is not checked for correctness (this will change in the future). It is up to you to make a sensible TSV file.
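
Because the file is not checked, it can be worth running a few sanity checks yourself before using a custom scheme. The sketch below is not part of Viridian. It tests only the rules stated on this page, plus one assumption that is labelled in the code: that every amplicon should have at least one left and one right primer. The function name check_scheme_tsv is hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("Amplicon_name", "Primer_name", "Left_or_right", "Sequence", "Position")


def check_scheme_tsv(handle):
    """Return a list of problems found in a scheme TSV (empty list if none)."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    sides_seen = {}  # amplicon name -> set of sides seen for it
    for row in reader:
        line_number = reader.line_num
        side = row["Left_or_right"] or ""
        if side not in ("left", "right"):
            problems.append(f"line {line_number}: Left_or_right must be 'left' or 'right'")
        if not (row["Sequence"] or "").strip():
            problems.append(f"line {line_number}: empty Sequence")
        if not (row["Position"] or "").isdigit():
            problems.append(f"line {line_number}: Position must be a non-negative integer")
        sides_seen.setdefault(row["Amplicon_name"], set()).add(side)

    # Assumption: every amplicon should have at least one left and one right primer.
    for amplicon, sides in sides_seen.items():
        for side in ("left", "right"):
            if side not in sides:
                problems.append(f"amplicon {amplicon}: no {side} primer")
    return problems


example = "\n".join("\t".join(fields) for fields in [
    REQUIRED_COLUMNS,
    ("nCoV-2019_1_pool1", "nCoV-2019_1_LEFT", "left", "ACCAACCAACTTTCGATCTCTTGT", "30"),
    ("nCoV-2019_1_pool1", "nCoV-2019_1_RIGHT", "right", "CATCTTTAAGATGTTGACGTGCCTC", "385"),
]) + "\n"

# An empty list means none of these checks found a problem.
print(check_scheme_tsv(io.StringIO(example)))
```

Passing these checks does not mean the scheme is correct. For example, it does not confirm that each primer actually aligns at the stated position.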

Custom amplicon schemes index file

Custom schemes must be provided to the workflow using a TAB-delimited file that lists the name of each scheme and the absolute path to the file of that scheme (in the TAB-delimited format described above). It must have two columns: Name and File. This file is required even if you are only using one custom scheme. For example:

Name       File
My_scheme  /path/to/scheme.tsv

This file is then provided to the pipeline using the option --amp_schemes_tsv. See the examples below.
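
An index file like the one above can be generated with a few lines of Python, which also makes sure the paths are absolute. This is a minimal sketch. The function name write_schemes_index and the example scheme name and path are placeholders, not part of Viridian.

```python
import csv
import os
import tempfile


def write_schemes_index(schemes, index_path):
    """Write a Name/File index. `schemes` maps scheme name -> scheme file path."""
    with open(index_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Name", "File"])
        for name, scheme_file in schemes.items():
            # The index needs absolute paths, so resolve relative ones here.
            writer.writerow([name, os.path.abspath(scheme_file)])


# Example: write an index into a temporary directory and show its contents.
with tempfile.TemporaryDirectory() as tmp_dir:
    index_path = os.path.join(tmp_dir, "schemes.tsv")
    write_schemes_index({"My_scheme": "scheme.tsv"}, index_path)
    with open(index_path) as handle:
        print(handle.read(), end="")
```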

Using any combination of schemes

Scheme selection is controlled by three options: --built_in_amp_schemes, --amp_schemes_tsv and --force_amp_scheme. The cases below give an example of each combination.

Default behaviour

Simply use no extra options. The built-in schemes will be used, and the one that best agrees with the reads will be chosen.

Force a particular built-in scheme

Use the option --force_amp_scheme. The value given must exactly match one of the built-in names. For example: --force_amp_scheme COVID-ARTIC-V3.

Use some of the built-in schemes

Use the option --built_in_amp_schemes to list only the ones you want to use. The value must be a comma-separated list of the scheme names. For example, to use only ARTIC versions 3 and 4: --built_in_amp_schemes COVID-ARTIC-V3,COVID-ARTIC-V4.

Use only custom schemes

Use the option --amp_schemes_tsv schemes.tsv. This will use the schemes listed in schemes.tsv only. Using this option on its own disables the use of built-in schemes.

Use only custom schemes, force choice of one scheme

Use the option --amp_schemes_tsv schemes.tsv, and the option --force_amp_scheme scheme1. Note that the value used to force the scheme choice (in this case scheme1) must be the name of a scheme listed in schemes.tsv.

Use a mix of custom and built-in schemes

Use the option --amp_schemes_tsv schemes.tsv to provide your own scheme(s). Additionally, list the built-in schemes you would also like to use, for example: --built_in_amp_schemes COVID-ARTIC-V3,COVID-ARTIC-V4.

You can still force the choice of scheme with the option --force_amp_scheme NAME, as long as NAME is the name of one of your schemes or one of the built-in schemes.
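
The cases above differ only in which of the three options are present. If you launch the workflow from a script, the scheme-related part of the command line could be assembled as in the following sketch. It builds only these three options, since the other arguments to the workflow are not covered on this page. scheme_options is a hypothetical helper, and My_scheme is a placeholder name.

```python
def scheme_options(built_in=None, custom_index=None, force=None):
    """Return the scheme-related command-line arguments as a list of strings."""
    args = []
    if built_in:
        # Comma-separated list of built-in scheme names.
        args += ["--built_in_amp_schemes", ",".join(built_in)]
    if custom_index:
        # Path to the Name/File index of custom schemes.
        args += ["--amp_schemes_tsv", custom_index]
    if force:
        # Must exactly match the name of a scheme that is in use.
        args += ["--force_amp_scheme", force]
    return args


cases = {
    "default": scheme_options(),
    "force built-in": scheme_options(force="COVID-ARTIC-V3"),
    "some built-in": scheme_options(built_in=["COVID-ARTIC-V3", "COVID-ARTIC-V4"]),
    "custom only": scheme_options(custom_index="schemes.tsv"),
    "custom only, forced": scheme_options(custom_index="schemes.tsv", force="My_scheme"),
    "mix": scheme_options(
        built_in=["COVID-ARTIC-V3", "COVID-ARTIC-V4"], custom_index="schemes.tsv"
    ),
}
for label, args in cases.items():
    print(f"{label}: {' '.join(args)}")
```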