Common utilities for parsing and handling peptide-spectrum matches and search engine results in Python.
psm_utils is a Python package with utilities for parsing and handling peptide-spectrum matches (PSMs) and proteomics search engine results. It is developed primarily for use in Python packages from CompOmics, such as MS²PIP, DeepLC, and MS²Rescore, but can be useful to anyone working with PSMs and PSM files. It also provides an easy-to-use CLI and web server for converting search engine results from one PSM file format into another.
- To provide an easy-to-use Python API for handling PSMs.
- To provide a unified Python API to the plethora of existing proteomics search engine output formats.
- To follow community standards: psm_utils pragmatically adheres to the standards developed by the HUPO Proteomics Standards Initiative, such as ProForma 2.0, the Universal Spectrum Identifier, and mzIdentML.
- To be open and dynamic: psm_utils is fully open source, under the permissive Apache 2.0 license. New reader and writer modules can easily be added, and we welcome everyone to contribute to the project. See Contributing for more information.
- NOT to reinvent the wheel: Instead, psm_utils makes heavy use of packages such as pyteomics and psims that have existing functionality for reading and/or writing PSM files. `psm_utils.io` provides a unified, higher-level Python API built on top of these packages.
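To give a feel for the ProForma 2.0 notation mentioned in the goals above, here is a minimal sketch that splits a simple peptidoform string into residues, bracketed modification labels, and an optional charge. The function name `parse_simple_proforma` is hypothetical and is not part of psm_utils. It assumes only single-letter residues, `[...]` labels directly after a residue, and a trailing `/charge`; terminal modifications, labile modifications, and the rest of the specification are deliberately out of scope.

```python
import re

# One residue letter, optionally followed by a bracketed modification label.
_RESIDUE = re.compile(r"([A-Z])(?:\[([^\]]+)\])?")
# A whole sequence made only of such residues.
_SEQUENCE = re.compile(r"(?:[A-Z](?:\[[^\]]+\])?)+")


def parse_simple_proforma(text):
    """Parse a small subset of ProForma 2.0 (illustrative only).

    Returns a list of (residue, modification_or_None) tuples and the
    charge as an int, or None if no charge is given.
    """
    # Assumption: "/" only appears as the charge separator.
    sequence, separator, charge_text = text.partition("/")
    if not _SEQUENCE.fullmatch(sequence):
        raise ValueError(f"Unsupported peptidoform notation: {text!r}")
    charge = int(charge_text) if separator else None
    residues = [(m.group(1), m.group(2)) for m in _RESIDUE.finditer(sequence)]
    return residues, charge


residues, charge = parse_simple_proforma("ACDEM[Oxidation]K/2")
print(residues[4], charge)
```

For real data, rely on the ProForma support in psm_utils itself rather than a hand-rolled parser like this one.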
| File format | psm_utils tag | Read support | Write support |
|---|---|---|---|
| AlphaDIA precursors TSV | `alphadia` | ✅ | ❌ |
| DIA-NN TSV | `diann` | ✅ | ❌ |
| FlashLFQ generic TSV | `flashlfq` | ✅ | ✅ |
| FragPipe PSM TSV | `fragpipe` | ✅ | ❌ |
| ionbot CSV | `ionbot` | ✅ | ❌ |
| OpenMS idXML | `idxml` | ✅ | ✅ |
| MaxQuant msms.txt | `msms` | ✅ | ❌ |
| MS Amanda CSV | `msamanda` | ✅ | ❌ |
| mzIdentML | `mzid` | ✅ | ✅ |
| Parquet | `parquet` | ✅ | ✅ |
| Peptide Record | `peprec` | ✅ | ✅ |
| pepXML | `pepxml` | ✅ | ❌ |
| Percolator tab | `percolator` | ✅ | ✅ |
| Proteome Discoverer MSF | `proteome_discoverer` | ✅ | ❌ |
| Sage Parquet | `sage_parquet` | ✅ | ❌ |
| Sage TSV | `sage_tsv` | ✅ | ❌ |
| ProteoScape Parquet | `proteoscape` | ✅ | ❌ |
| TSV | `tsv` | ✅ | ✅ |
| X!Tandem XML | `xtandem` | ✅ | ❌ |
Legend: ✅ Supported, ❌ Unsupported
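Because conversion needs a readable source format and a writable target format, the table above can be treated as a small lookup. The sketch below transcribes the table into a plain dictionary and checks whether a given conversion is possible. The names `SUPPORT`, `can_convert`, and `writable_tags` are hypothetical helpers for illustration, not part of the psm_utils API, and the data reflects only the table as listed here.

```python
# (read_supported, write_supported) per psm_utils tag, copied from the table.
SUPPORT = {
    "alphadia": (True, False),
    "diann": (True, False),
    "flashlfq": (True, True),
    "fragpipe": (True, False),
    "ionbot": (True, False),
    "idxml": (True, True),
    "msms": (True, False),
    "msamanda": (True, False),
    "mzid": (True, True),
    "parquet": (True, True),
    "peprec": (True, True),
    "pepxml": (True, False),
    "percolator": (True, True),
    "proteome_discoverer": (True, False),
    "sage_parquet": (True, False),
    "sage_tsv": (True, False),
    "proteoscape": (True, False),
    "tsv": (True, True),
    "xtandem": (True, False),
}


def can_convert(source_tag, target_tag):
    """Return True if source_tag can be read and target_tag can be written."""
    for tag in (source_tag, target_tag):
        if tag not in SUPPORT:
            raise KeyError(f"Unknown psm_utils tag: {tag!r}")
    return SUPPORT[source_tag][0] and SUPPORT[target_tag][1]


def writable_tags():
    """Return all tags with write support, sorted alphabetically."""
    return sorted(tag for tag, (_, can_write) in SUPPORT.items() if can_write)


print(can_convert("msms", "peprec"))  # MaxQuant msms.txt to Peptide Record
print(writable_tags())
```

Every listed format can be read, so in practice the target format's write support decides whether a conversion is available.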
psm_utils online is a Streamlit-based web server built on top of the psm_utils Python package. It allows you to easily retrieve proteomics PSM statistics for any supported PSM file type, and to convert search engine results from one PSM file format into another.
Install with pip: `pip install psm-utils`

Install with conda, from the bioconda channel: `conda install -c bioconda psm-utils`
The full documentation, including a quickstart guide and Python API reference, is available on psm_utils.readthedocs.io.
If you use psm_utils for your research, please cite the following publication:
psm_utils: A high-level Python API for parsing and handling peptide-spectrum-matches and proteomics search results. Ralf Gabriels, Arthur Declercq, Robbin Bouwmeester, Sven Degroeve, Lennart Martens. Journal of Proteome Research (2022). doi:10.1021/acs.jproteome.2c00609