SAMsift

SAMsift is a program for advanced filtering and tagging of SAM/BAM alignments using Python expressions.

Getting started

git clone http://github.com/karel-brinda/samsift
cd samsift
# keep only alignments with alignment score >94
samsift/samsift -i tests/test.bam -o filtered.bam -f 'AS>94'
# add tags 'ln' with sequence length and 'ab' with average base quality
samsift/samsift -i tests/test.bam -o with_ln_ab.bam -c 'ln=len(SEQ);ab=1.0*sum(QUAL)/ln'

Installation

Using Bioconda:

# add all necessary Bioconda channels
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

# install samsift
conda install samsift

Using PIP from PyPI:

pip install --upgrade samsift

Using PIP from GitHub:

pip install --upgrade git+https://github.com/karel-brinda/samsift

Command-line parameters

Program: samsift (advanced filtering and tagging of SAM/BAM alignments using Python expressions)
Version: 0.1.0
Author:  Karel Brinda <[email protected]>

Usage:   samsift.py [-h] [-v] [-i FILE] [-o FILE] [-f PY_EXPR] [-c PY_CODE] [-d PY_EXPR] [-t PY_EXPR] [-m STR]

Options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -i FILE               input SAM/BAM file [-]
  -o FILE               output SAM/BAM file [-]
  -f PY_EXPR            filter [True]
  -c PY_CODE            code to be executed (e.g., assigning new tags) [None]
  -d PY_EXPR            debugging expression to print [None]
  -t PY_EXPR            debugging trigger [True]
  -m STR                mode: strict (stop upon first error)
                              nonstop-keep (keep alignments causing errors)
                              nonstop-remove (remove alignments causing errors) [strict]
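
The three modes of the -m option can be pictured with the following minimal sketch. It is not SAMsift's actual implementation: the helper passes and the dict-based alignments are hypothetical, and it assumes that an error simply means the filter expression raised a Python exception (here, a NameError for a missing AS tag).

```python
def passes(filter_expr, namespace, mode="strict"):
    """Hypothetical helper: decide whether one alignment is kept."""
    try:
        return bool(eval(filter_expr, {}, namespace))
    except Exception:
        if mode == "strict":
            raise  # stop upon the first error
        # nonstop-keep keeps the offending alignment, nonstop-remove drops it
        return mode == "nonstop-keep"


# Toy alignments as plain dicts; the second one has no AS tag.
alignments = [{"QNAME": "r1", "AS": 96}, {"QNAME": "r2"}]

kept_keep = [a["QNAME"] for a in alignments if passes("AS>94", a, "nonstop-keep")]
kept_remove = [a["QNAME"] for a in alignments if passes("AS>94", a, "nonstop-remove")]
print(kept_keep)    # ['r1', 'r2']
print(kept_remove)  # ['r1']
```

In strict mode the same call would propagate the NameError raised for r2 instead of returning a decision.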

Algorithm

for ALIGNMENT in ALIGNMENTS:
        if eval(DEBUG_TRIGGER):
                print(eval(DEBUG_EXPR))
        if eval(FILTER):
                exec(CODE)
                print(ALIGNMENT)

Python expression. All expressions should be valid Python 3 expressions. They are evaluated using the eval function.

Python code. Code is executed using the exec function.
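
The difference between the two can be seen in a small, self-contained sketch of the loop above. It is only an illustration under stated assumptions: the function sift and the dict-based alignments are hypothetical stand-ins, whereas SAMsift itself operates on PySAM records.

```python
# Toy alignments as plain dicts (hypothetical data).
alignments = [
    {"QNAME": "r1", "POS": 100, "SEQ": "ACGT", "AS": 96},
    {"QNAME": "r2", "POS": 200, "SEQ": "ACGTAC", "AS": 90},
]


def sift(alignments, filter_expr="True", code=None):
    """Return the namespaces of alignments that pass the filter (illustrative only)."""
    kept = []
    for aln in alignments:
        namespace = dict(aln)  # variables visible to the expression and the code
        # eval takes an expression and returns its value
        if eval(filter_expr, {}, namespace):
            if code is not None:
                # exec takes statements; assignments land in namespace
                exec(code, {}, namespace)
            kept.append(namespace)
    return kept


result = sift(alignments, filter_expr="AS>94", code="ln=len(SEQ)")
print([(r["QNAME"], r["ln"]) for r in result])  # [('r1', 4)]
```

An assignment such as ln=len(SEQ) is a statement, not an expression, which is why it has to go through exec (the -c option) rather than eval (the -f option).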

SAM fields. All Python expressions and code can access variables mirroring all the fields from the alignment section of the SAM specification, i.e., QNAME, FLAG, RNAME, POS (1-based), MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, and QUAL. For instance, we can filter reads, keeping only those with POS smaller than 10000, by

samsift -i tests/test.bam -f 'POS<10000'

The PySAM representation of the current alignment (class pysam.AlignedSegment) is available through the variable a. Therefore, the previous example is equivalent to

samsift -i tests/test.bam -f 'a.reference_start+1<10000'
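
The +1 is needed because POS in SAM is 1-based, whereas reference_start in PySAM is 0-based. The sketch below shows the relationship on a single SAM line using plain Python. The helper sam_line_to_namespace, the example line, and the dict key reference_start (standing in for a.reference_start) are assumptions made for illustration, not part of SAMsift.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def sam_line_to_namespace(line):
    """Hypothetical helper: map the 11 mandatory SAM columns to variables."""
    cols = line.rstrip("\n").split("\t")
    ns = {}
    for name, value in zip(SAM_FIELDS, cols[:11]):
        ns[name] = int(value) if name in INT_FIELDS else value
    # 0-based coordinate, as a stand-in for pysam's a.reference_start
    ns["reference_start"] = ns["POS"] - 1
    return ns


# Made-up alignment starting at 1-based position 9999.
# QUAL stays a raw string here; this sketch does not decode base qualities.
line = "r1\t0\tchr1\t9999\t60\t4M\t*\t0\t0\tACGT\tIIII"
ns = sam_line_to_namespace(line)

by_pos = eval("POS<10000", {}, ns)
by_start = eval("reference_start+1<10000", {}, ns)
print(by_pos, by_start)  # True True
```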

SAM tags. All SAM tags are translated to variables with the same name. For instance, if alignment scores are provided through the AS tag (as defined in the Sequence Alignment/Map Optional Fields Specification), then alignments with a score smaller than or equal to the sequence length can be removed using

samsift -i tests/test.bam -f 'AS>len(SEQ)'

If CODE is provided, all two-letter variables are back-translated to tags. For instance, a tag ab carrying the average base quality can be added by

samsift -i tests/test.bam -c 'ab=1.0*sum(QUAL)/len(QUAL)'
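
One way to picture the back-translation is the sketch below: after the code runs, every variable whose name looks like a two-character SAM tag is collected as a tag, and longer names such as SEQ and QUAL are left alone. The function run_code and the regular expression are assumptions for illustration, and QUAL is assumed to be a list of integer base qualities (which the use of sum(QUAL) above suggests); SAMsift's real logic may differ.

```python
import re

# SAM tag names are two characters: a letter followed by a letter or digit.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]")


def run_code(code, namespace):
    """Hypothetical helper: execute code, then collect tag-like variables."""
    ns = dict(namespace)
    exec(code, {}, ns)
    return {name: value for name, value in ns.items() if TAG_NAME.fullmatch(name)}


# Made-up alignment: two fields plus one existing tag (AS).
variables = {"SEQ": "ACGT", "QUAL": [30, 40, 20, 30], "AS": 96}
tags = run_code("ab=1.0*sum(QUAL)/len(QUAL)", variables)
print(tags)  # {'AS': 96, 'ab': 30.0}
```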

Similar programs

Author

Karel Brinda <[email protected]>
