Skip to content

Latest commit

 

History

History
65 lines (38 loc) · 3.41 KB

File metadata and controls

65 lines (38 loc) · 3.41 KB

BaseSpace download FASTQ with checksums

Downloads FASTQ files from Illumina BaseSpace via the CLI with md5 checksums.

Requirements

Requires a Conda installation.

Requires bs and bs-cp from BaseSpace Sequence Hub CLI to be on the path - there are no Conda packages available.

For University of Edinburgh users only: Anaconda and the CLI is in the module system.

module load anaconda
module load igmm/apps/BaseSpaceCLI/0.10.7

BaseSpace access token

Log in to BaseSpace Developers with your BaseSpace account credentials. Create a new native application, call it whatever you like. Click on the 'Credentials' tab, and copy the access token from the text box. Save it somewhere, e.g. in a file $HOME/.basespace_access_token. This is the value you pass to the --access_token parameter in this pipeline.

Your Access token

BaseSpace authentication

Authenticate against the account you want to download from. This has to be done interactively because the command generates a URL which you need to copy & paste into a browser. You'll be directed to the Basespace website to sign in.

bs auth

Get the run name

Log in to the BaseSpace website and look in the RUNS section for your run name. Note that this is the run name assigned by the sequencing facility, not the run ID from the machine (e.g. 200423_NB551016_0613_AHLGWCBGXF). If the run name isn't visible in the table, click through to the run page.

Run names

Get the project name or id

The project name can be found either in the PROJECTS section as for the run name, or by clicking through to the project page.

Project name

If your run has gone through basecalling more than once, use the project id (numeric, found in the URL when viewing the project) instead of the name, as the name may be duplicated.

Project id

Download the run

Reads for all the samples in the run will be downloaded to a subfolder of the output folder, named using the run ID from the machine.

The runs will be moved to a sub-folder of <output_dir> named by the run id (e.g. 200423_NB551016_0613_AHLGWCBGXF) as this is more likely to be an invariant format than the run names assigned by the sequencing facility. The output folder will contain the gzipped FASTQ files for each sample, a text file of format sample.md5sum.txt containing the md5 checksums, a text file of format sample.md5_check with the results of checking those, and a text file samples.txt with the list of sample ids.

nextflow run ameynert/base-space-download-fastq-with-checksums \
  --access_token <access_token> \
  --project <project_name> \
  --run <run_name> \
  --outdir <output_dir>

For sequencing from the University of Edinburgh Clinical Research Facility

If you are downloading a NextSeq 550 run, use the command as above. For a NextSeq 2000 run, add the parameter --dragen. For anyone else, you may need to experiment with this flag.