05 Command Line Interface

Guidescanpy provides a command line interface (CLI) for certain tasks, aimed mostly at developers. Invoke it in the following format:

guidescanpy command [options] [arguments]

Web

guidescanpy web

This command starts the Guidescanpy web application on localhost. By default it listens on port 5001 with debug mode on.

guidescanpy worker

Most queries in the application take some time to execute, and the web application should not become unresponsive during that time. This command starts a Guidescanpy "worker" (a Celery task), which processes the jobs submitted through the web application.

Decode

guidescanpy decode [options] [arguments]

This command decodes the given SAM/BAM file into human-readable CSV output.

Positional arguments:
- grna_database: SAM/BAM file containing Guidescan2 processed gRNAs. Commonly found at guidescanpy/docker/snakemake/data/databases/[enzyme]/[organism].bam.sorted.
- fasta_file: FASTA file for resolving off-target sequences. Commonly found at guidescanpy/docker/snakemake/data/raw/[organism].fna.
- chr2acc_file: chr2acc file for chromosome resolution. Commonly found at guidescanpy/docker/snakemake/data/raw/[organism]_chr2acc.
Options and flags:
- -h, --help: Show this help message and exit.
- --region REGION: One or more region strings.
- --mode {succinct,complete}: Succinct or complete off-target information.
Output:

The command prints CSV-formatted output with the following columns:
- id: The identifier for each sequence.
- sequence: The target DNA sequence.
- chromosome: The chromosome where the target sequence is located.
- position: The position of the target sequence on the chromosome.
- sense: The direction of the sequence (+ or -).
- distance_0_matches: The number of perfect matches found.
- distance_1_matches: The number of matches with 1 mismatch found.
- distance_2_matches: The number of matches with 2 mismatches found.
- distance_3_matches: The number of matches with 3 mismatches found.
- specificity: The specificity of the sequence.
- cutting_efficiency: The cutting efficiency of the sequence.

Example:

 guidescanpy decode data/databases/cas9/sacCer3.bam.sorted data/raw/sacCer3.fna data/raw/sacCer3_chr2acc

The output is:

 id,sequence,chromosome,position,sense,distance_0_matches,distance_1_matches,distance_2_matches,distance_3_matches,specificity,cutting_efficiency
 NC_001133.9:44:-,AGGATGTGTGTGTGTGGGTGNGG,NC_001133.9,44,-,0,0,4,20,0.10518199950456619,0.46371179819107056
 NC_001133.9:49:-,GTGTTAGGATGTGTGTGTGTNGG,NC_001133.9,49,-,0,0,2,2,0.54339200258255,0.4851852059364319
 NC_001133.9:50:-,AGTGTTAGGATGTGTGTGTGNGG,NC_001133.9,50,-,0,0,1,2,0.8631880283355713,0.5173904299736023
 NC_001133.9:64:-,GGCTGTGTTAGGGTAGTGTTNGG,NC_001133.9,64,-,0,0,3,5,0.39531800150871277,0.38337841629981995
 NC_001133.9:74:-,GTTAGATTAGGGCTGTGTTANGG,NC_001133.9,74,-,0,0,0,4,0.4179899990558624,0.5174511671066284
 ......
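
Because the output is plain CSV, it can be post-processed with standard tooling. The following is a minimal Python sketch, not part of guidescanpy, that keeps only the guides whose specificity meets a threshold. The function name `select_guides` and the 0.5 cutoff are illustrative assumptions; the sample rows are copied from the output above.

```python
import csv
import io

# Sample rows copied from the example decode output.
SAMPLE = """\
id,sequence,chromosome,position,sense,distance_0_matches,distance_1_matches,distance_2_matches,distance_3_matches,specificity,cutting_efficiency
NC_001133.9:44:-,AGGATGTGTGTGTGTGGGTGNGG,NC_001133.9,44,-,0,0,4,20,0.10518199950456619,0.46371179819107056
NC_001133.9:49:-,GTGTTAGGATGTGTGTGTGTNGG,NC_001133.9,49,-,0,0,2,2,0.54339200258255,0.4851852059364319
NC_001133.9:50:-,AGTGTTAGGATGTGTGTGTGNGG,NC_001133.9,50,-,0,0,1,2,0.8631880283355713,0.5173904299736023
"""


def select_guides(csv_text, min_specificity=0.5):
    """Return the ids of rows whose specificity is at least min_specificity.

    Hypothetical helper for illustration; the threshold is an arbitrary choice.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["id"]
        for row in reader
        if float(row["specificity"]) >= min_specificity
    ]


selected = select_guides(SAMPLE)
print(selected)
```

In practice the CSV text would come from redirecting the command's standard output to a file and reading that file instead of the inline sample.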

Generate Kmers

guidescanpy generate-kmers [options] [arguments]

This command will generate kmers from the given FASTA file, based on the specified PAM sequence.

Positional arguments:
- fasta: FASTA file to use as a reference for kmer generation. Commonly found at guidescanpy/docker/snakemake/data/raw/[organism].fna.
Options and flags:
- -h, --help: Show this help message and exit.
- --pam PAM: Protospacer adjacent motif to match. The default is NGG.
- --kmer-length KMER_LENGTH: Length of kmers to generate. The default is 20.
- --min-chr-length MIN_CHR_LENGTH: Minimum chromosome length to consider for kmer generation. The default is 0.
- --prefix PREFIX: Prefix to use for kmer identifiers. The default is no prefix.
- --start: Match the PAM at the start of the kmer instead of at the end (the default).
- --max-kmers MAX_KMERS: Maximum number of kmers to generate. The default is no limit.
Output:

The command prints CSV-formatted output with the following columns:
- id: The identifier for each sequence.
- sequence: The target DNA sequence.
- pam: The PAM of the target.
- chromosome: The chromosome where the target sequence is located.
- position: The position of the target sequence on the chromosome.
- sense: The direction of the sequence (+ or -).

Example:

 guidescanpy generate-kmers data/raw/sacCer3.fna --max-kmers 100

The output is:

 id,sequence,pam,chromosome,position,sense
 NC_001133.9:882:+,AGAATATTTCGTACTTACAC,NGG,NC_001133.9,882,+
 NC_001133.9:1079:+,ATGTGACACTACTCATACGA,NGG,NC_001133.9,1079,+
 NC_001133.9:1112:+,AGTCAAGACGATACTGTGAT,NGG,NC_001133.9,1112,+
 NC_001133.9:1128:+,TGATAGGTACGTTATTTAAT,NGG,NC_001133.9,1128,+
 NC_001133.9:1340:+,ATTTTACGTGTCAAAAAATG,NGG,NC_001133.9,1340,+
 NC_001133.9:1488:+,CAGCGACTCATTTTTATTTA,NGG,NC_001133.9,1488,+
 ...... (100 records in total)
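
To illustrate what matching a PAM at the end of a kmer means, here is a simplified Python sketch. It is not guidescanpy's implementation: the function name `find_kmers` is hypothetical, it scans the forward strand only, it reports 0-based string indices (which may not match the tool's coordinate convention), and it ignores the --start, --min-chr-length, and --prefix options. `N` in the PAM is treated as a wildcard for any base.

```python
def find_kmers(sequence, pam="NGG", kmer_length=20, max_kmers=None):
    """Return (index, kmer) pairs for kmers immediately followed by the PAM.

    Illustrative sketch only: forward strand, 0-based indices.
    """
    sequence = sequence.upper()
    found = []
    # Last start index at which a kmer plus its PAM still fits in the sequence.
    last_start = len(sequence) - kmer_length - len(pam)
    for start in range(last_start + 1):
        pam_start = start + kmer_length
        site = sequence[pam_start:pam_start + len(pam)]
        # "N" in the PAM pattern matches any base; other letters must match exactly.
        if all(p == "N" or p == base for p, base in zip(pam, site)):
            found.append((start, sequence[start:pam_start]))
            if max_kmers is not None and len(found) >= max_kmers:
                break
    return found


# Toy sequence: a 20-mer followed by TGG (which matches NGG) and two more bases.
toy = "ACGTACGTACGTACGTACGT" + "TGG" + "CC"
hits = find_kmers(toy)
print(hits)
```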

Initialize Database

guidescanpy init-db

This command initializes the database by creating five tables if they do not already exist: libraries, chromosomes, genes, exons, and essential_genes. Using the --force flag with this command wipes any existing data from these tables, so use it with caution!

Add Organism

guidescanpy add-organism [options] [arguments]

This command adds all data for the given organism to the database.

Positional arguments:
- organism: The organism to add to the database.
- gtf_gz: The gtf.gz file for the organism. Commonly found at guidescanpy/docker/snakemake/data/raw/[organism].gtf.gz.
- chr2acc: The chr2acc file for the organism. Commonly found at guidescanpy/docker/snakemake/data/raw/[organism]_chr2acc.
Options and flags:
- -h, --help: Show this help message and exit.

Example:

 guidescanpy add-organism sacCer3 data/raw/sacCer3.gtf.gz data/raw/sacCer3_chr2acc

Filter Tag

guidescanpy filter-tag [options]

This command filters a SAM/BAM file based on the number of off-targets at a given distance.

Options and flags:
- -h, --help: Show this help message and exit.
- --input INPUT, -i INPUT (Required): Path to the input SAM/BAM file.
- --output OUTPUT, -o OUTPUT (Required): Path to the output SAM/BAM file.
- --k0 K0: Max number of off-targets at distance 0. The default is 1.
- --k1 K1: Max number of off-targets at distance 1. The default is 0.
- --k2 K2: Max number of off-targets at distance 2. The default is inf.
- --k3 K3: Max number of off-targets at distance 3. The default is inf.
Output:

A filtered SAM/BAM file.
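
The thresholds can be read as per-distance upper bounds on a guide's off-target counts. The Python sketch below shows that reading with the default values; it is not guidescanpy's code. The helper `passes_filter` is hypothetical, the comparison is assumed to be inclusive (count <= max), and how the counts are extracted from the SAM/BAM records is left out.

```python
import math

# Default limits taken from the option descriptions: k0=1, k1=0, k2=inf, k3=inf.
DEFAULT_LIMITS = {0: 1, 1: 0, 2: math.inf, 3: math.inf}


def passes_filter(counts, limits=DEFAULT_LIMITS):
    """Return True if every per-distance count is within its limit.

    counts maps a mismatch distance to a number of matches; distances
    missing from counts are treated as zero. Inclusive comparison is an
    assumption made for this sketch.
    """
    return all(counts.get(distance, 0) <= limit for distance, limit in limits.items())


kept = passes_filter({0: 1, 1: 0, 2: 5, 3: 40})
dropped = passes_filter({0: 1, 1: 1, 2: 0, 3: 0})
print(kept, dropped)
```

With the defaults, a record is kept only if it has at most one match at distance 0 and none at distance 1, regardless of its counts at distances 2 and 3.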

Example:

 guidescanpy filter-tag --input data/databases/cas9/sacCer3.sam --output data/databases/cas9/sacCer3.bam

Add Tag

guidescanpy add-tag [options] [arguments]

This command is incomplete. It adds new tags to SAM/BAM files and was originally designed to add the ce tag to the SAM files generated by guidescan enumerate, but because of a model version incompatibility it was never put into use. It may become useful in the future.

Positional arguments:
- tag: List of tags to add.
Options and flags:
- -h, --help: Show this help message and exit.
- --input INPUT, -i INPUT (Required): Path to the input SAM/BAM file.
- --output OUTPUT, -o OUTPUT (Required): Path to the output SAM/BAM file.
Output:

The SAM/BAM file with added tag(s).
