https://github.com/bigbio/quantms-utils

A Python library with scripts and helper classes for the quantms workflow

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary

Keywords

bigdata cloud mass-spectrometry nextflow proteogenomics proteomics
Last synced: 5 months ago

Repository

A Python library with scripts and helper classes for the quantms workflow

Basic Info
  • Host: GitHub
  • Owner: bigbio
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage: https://quantms.org
  • Size: 80.6 MB
Statistics
  • Stars: 5
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 19
Topics
bigdata cloud mass-spectrometry nextflow proteogenomics proteomics
Created almost 2 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

quantms-utils

Python package with scripts and functions for the quantms workflow, which analyzes quantitative proteomics data.

The package is available on PyPI as `quantms-utils` and can be installed with `pip install quantms-utils`.

Available Scripts

The following functionalities are available in the package:

DIA-NN scripts

  • dianncfg - Create a configuration file for DIA-NN, including enzymes, modifications, and other parameters.
  • diann2mztab - Convert DIA-NN output to mzTab format, as well as to MSstats and Triqler formats. These output formats are used for quality control and downstream analysis in quantms.

SDRF scripts

  • openms2sample - Extract sample information from an OpenMS experimental design file. An example of an OpenMS experimental design file is available here.
  • checksamplesheet - Check the sample sheet for errors and inconsistencies. The experimental design can be either an OpenMS experimental design file or an SDRF file.

Other scripts

  • psmconvert - The convert_psm function converts peptide spectrum matches (PSMs) from an idXML file to a CSV file, optionally filtering out decoy matches. It extracts and processes data from both the idXML and an associated spectra file, handling multiple search engines and scoring systems.
  • mzmlstats - The mzmlstats script processes mass spectrometry data files in either .mzML or Bruker .d format to extract and compile statistics about the spectra. It supports generating detailed or ID-only CSV files based on the spectra data.

mzml statistics

quantms-utils has multiple scripts to generate mzML stats. These files are used by multiple tools and packages within the quantms ecosystem for quality control, mzTab generation, etc. Here are some details about the formats, the fields they contain, and how they are computed.

MS info and details

`mzmlstats` allows the user to produce a file containing features for every signal in the MS/MS experiment. The output is a parquet file named after the original file plus the postfix `_ms_info.parquet` (i.e. `{file_name}_ms_info.parquet`). The definition of each column, and how it is estimated and used:

  • `scan`: The scan accession for each MS and MS/MS signal in the mzML file. Its format depends on the manufacturer; for Thermo it looks like `controllerType=0 controllerNumber=1 scan=43920`. See the definition in [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan).
  • `ms_level`: The MS level of the signal: 1 for MS and 2 for MS/MS.
  • `num_peaks`: The number of peaks in the spectrum, computed with pyopenms via `spectrum.get_peaks()`.
  • `base_peak_intensity`: The maximum intensity in the spectrum (MS or MS/MS).
  • `summed_peak_intensities`: The sum of all intensities in the spectrum (MS or MS/MS).
  • `rt`: The retention time of the spectrum, captured with pyopenms via `spectrum.getRT()`.

For MS/MS signals, the following additional columns are present:

  • `precursor_charge`: The charge of the precursor ion, captured with pyopenms via `spectrum.getPrecursors()[0].getCharge()`.
  • `precursor_mz`: The m/z of the precursor ion, captured with pyopenms via `spectrum.getPrecursors()[0].getMZ()`.
  • `precursor_intensity`: The intensity of the precursor ion, captured with pyopenms via `spectrum.getPrecursors()[0].getIntensity()`. If the precursor is not annotated, the purity object is used to get this information; see the note below.
  • `precursor_rt`: The retention time of the precursor ion; see the note below.
  • `precursor_total_intensity`: The total intensity of the precursor ion; see the note below.
> **NOTE**: For all precursor-related information, the first precursor in the spectrum is used. For the columns `precursor_intensity` (if not annotated), `precursor_rt`, and `precursor_total_intensity`, the following pyopenms code is used:
>
> ```python
> precursor_spectrum = mzml_exp.getSpectrum(precursor_spectrum_index)
> precursor_rt = precursor_spectrum.getRT()
> purity = oms.PrecursorPurity().computePrecursorPurity(precursor_spectrum, precursor, 100, True)
> precursor_intensity = purity.target_intensity
> total_intensity = purity.total_intensity
> ```
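The per-spectrum columns above can be sketched without pyopenms, operating directly on the peak arrays that `spectrum.get_peaks()` would return. This is a minimal sketch of the statistics only; the real script also handles Bruker `.d` input and the precursor-purity fallback described in the note:

```python
import numpy as np

def spectrum_stats(mz_array, intensity_array, rt, ms_level):
    """Compute the per-spectrum columns of the *_ms_info.parquet file
    from raw peak arrays (as returned by pyopenms' spectrum.get_peaks()).
    Sketch only; not the actual quantms-utils implementation."""
    intensity_array = np.asarray(intensity_array, dtype=float)
    return {
        "ms_level": ms_level,
        "rt": rt,
        "num_peaks": int(intensity_array.size),
        "base_peak_intensity": float(intensity_array.max()) if intensity_array.size else 0.0,
        "summed_peak_intensities": float(intensity_array.sum()),
    }

# Example: a tiny MS1 spectrum with three peaks
stats = spectrum_stats([400.2, 512.8, 799.1], [1e4, 5e4, 2e4], rt=123.4, ms_level=1)
print(stats)  # → num_peaks=3, base_peak_intensity=50000.0, summed_peak_intensities=80000.0
```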
MS2 info and details

`mzmlstats` can also produce a file containing all the MS2 spectra, including the intensities and masses of every peak. The output is a parquet file named after the original file plus the postfix `_ms2_info.parquet` (i.e. `{file_name}_ms2_info.parquet`). The definition of each column, and how it is estimated and used:

  • `scan`: The scan accession for each signal in the mzML file. Its format depends on the manufacturer; for Thermo it looks like `controllerType=0 controllerNumber=1 scan=43920`. See the definition in [quantms.io](https://github.com/bigbio/quantms.io/blob/main/docs/README.adoc#scan).
  • `ms_level`: The MS level of the signal; here, always 2.
  • `mz_array`: The m/z array of the peaks in the MS/MS signal, captured with pyopenms via `mz_array, intensity_array = spectrum.get_peaks()`.
  • `intensity_array`: The intensity array of the peaks in the MS/MS signal, captured with the same call.
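The two output names above follow the same pattern: original file name plus a postfix. A small helper illustrating that naming convention (the function name is hypothetical; quantms-utils may derive the names differently internally):

```python
from pathlib import Path

def stats_output_name(input_path: str, kind: str) -> str:
    """Derive the stats file name from an input .mzML (or Bruker .d) path,
    using the postfixes documented above: kind is 'ms_info' or 'ms2_info'.
    Hypothetical helper for illustration only."""
    stem = Path(input_path).stem  # drops the .mzML / .d suffix
    return f"{stem}_{kind}.parquet"

print(stats_output_name("run01.mzML", "ms_info"))   # → run01_ms_info.parquet
print(stats_output_name("sample.d", "ms2_info"))    # → sample_ms2_info.parquet
```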

MS1 Feature Maps

We use the FeatureFinderMultiplexAlgorithm from OpenMS to extract the features from the MS1 spectra. We use an algorithm based on the original implementation by Andy Lin. The output of this algorithm is a feature map, which contains the following information:

  • feature_mz: The m/z of the feature.
  • feature_rt: The retention time of the feature.
  • feature_intensity: The intensity of the feature.
  • feature_charge: The charge of the feature.
  • feature_quality: The quality of the feature.
  • feature_percentile_tic: The percentile of the feature in the total ion current.
  • feature_id: The unique identifier of the feature generated by OpenMS.
  • feature_min_rt: The minimum retention time of the feature within the feature map.
  • feature_min_mz: The minimum m/z of the feature within the feature map.
  • feature_max_rt: The maximum retention time of the feature within the feature map.
  • feature_max_mz: The maximum m/z of the feature within the feature map.
  • feature_num_scans: The number of scans in which the feature is present in the feature map.
  • feature_scans: The scans in which the feature is present in the feature map.

The tool generates a gzip-compressed parquet file named `{file_name}_ms1_feature_info.parquet`.
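One plausible reading of `feature_percentile_tic` is the percentage of the total ion current contributed by features at or below a given feature's intensity. The sketch below implements that interpretation; it is an assumption, and quantms-utils may compute the column differently:

```python
import numpy as np

def feature_percentile_tic(intensities):
    """For each feature intensity, the percentage of the total ion current
    (sum of all feature intensities) contributed by features at or below it.
    NOTE: assumed interpretation of `feature_percentile_tic`, not the
    verified quantms-utils formula."""
    intensities = np.asarray(intensities, dtype=float)
    order = np.argsort(intensities)
    cumulative = np.cumsum(intensities[order]) / intensities.sum() * 100.0
    out = np.empty_like(cumulative)
    out[order] = cumulative  # map back to the original feature order
    return out

pct = feature_percentile_tic([10.0, 30.0, 60.0])  # → [10.0, 40.0, 100.0]
```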

Contributions and issues

Contributions and issues are welcome. Please open an issue or a pull request in the GitHub repository.

Owner

  • Name: BigBio Stack
  • Login: bigbio
  • Kind: organization
  • Email: proteomicsstack@gmail.com
  • Location: Cambridge, UK

Provides big data solutions for bioinformatics

GitHub Events

Total
  • Create event: 10
  • Commit comment event: 1
  • Issues event: 1
  • Release event: 11
  • Watch event: 2
  • Delete event: 3
  • Issue comment event: 39
  • Push event: 238
  • Pull request event: 42
  • Pull request review comment event: 117
  • Pull request review event: 199
  • Fork event: 3
Last Year
  • Create event: 10
  • Commit comment event: 1
  • Issues event: 1
  • Release event: 11
  • Watch event: 2
  • Delete event: 3
  • Issue comment event: 39
  • Push event: 238
  • Pull request event: 42
  • Pull request review comment event: 117
  • Pull request review event: 199
  • Fork event: 3

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 13
  • Average time to close issues: N/A
  • Average time to close pull requests: about 12 hours
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 1.23
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 13
  • Average time to close issues: N/A
  • Average time to close pull requests: about 12 hours
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 1.23
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ypriverol (1)
Pull Request Authors
  • ypriverol (37)
  • daichengxin (12)
  • fabianegli (1)
Top Labels
Issue Labels
Pull Request Labels
  • enhancement (2)
  • Review effort [1-5]: 3 (2)
  • configuration changes (1)
  • Review effort [1-5]: 1 (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 88 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 23
  • Total maintainers: 1
pypi.org: quantms-utils

Python scripts and helpers for the quantMS workflow

  • Versions: 23
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 88 Last month
Rankings
Dependent packages count: 10.9%
Average: 36.1%
Dependent repos count: 61.2%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/python-app.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
.github/workflows/python-package.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
pyproject.toml pypi
  • pytest * develop
  • click >=7.0
  • pydantic >=1.10,<2
  • python ^3.7
  • sdrf_pipelines >=0.0.26
requirements.txt pypi
  • click *
  • sdrf_pipelines *
setup.py pypi
.github/workflows/python-publish.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
environment.yml conda
  • click
  • ms2rescore 3.0.2.*
  • numpy
  • pandas
  • pip
  • protobuf >=3.9.2,<4
  • psm-utils 0.8.0.*
  • pydantic
  • pyopenms
  • sdrf-pipelines