dbdipy

A Python library for the inspection, curation and interpretation of DBDI-MS data.

https://github.com/leopold-weidner/dbdipy

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

bioinformatics datamining ionization-source mass-spectrometry matchms
Last synced: 6 months ago

Repository

A Python library for the inspection, curation and interpretation of DBDI-MS data.

Basic Info
  • Host: GitHub
  • Owner: leopold-weidner
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.62 MB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 1
Topics
bioinformatics datamining ionization-source mass-spectrometry matchms
Created over 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme Citation

README.md

DBDIpy (Version 2.0)

DBDIpy is an open-source Python library for the curation and interpretation of dielectric barrier discharge ionisation mass spectrometric datasets.

Introduction

Mass spectrometric data from direct injection analysis is hard to interpret because the lack of chromatographic separation complicates the identification of fragments and adducts generated during the ionization process.

Here we present an in-silico approach, specially tailored to time-resolved datasets from plasma ionization techniques, that putatively identifies multiple ion species arising from one analyte compound. These techniques are rapidly gaining popularity in applications such as breath analysis, process control and food research.

DBDIpy's core functionality relies on the putative identification of in-source fragments (e.g. [M-H2O+H]+) and in-source generated adducts (e.g. [M+nO+H]+). Custom adduct species can be defined by the user and passed to this open-search algorithm. The identification is performed in a three-step procedure (from V > 2.* on, in preparation):

  • calculation of pointwise correlation identifies features with matching temporal intensity profiles through the experiment.
  • (exact) mass differences are used to refine the nature of potential candidates.
  • calculation of MS2 spectral similarity score by ...
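The first two steps can be sketched with plain pandas/numpy. This is a hypothetical minimal example, not DBDIpy's implementation; the m/z values are taken from the tutorial output below, while the intensity traces are invented:

```python
import pandas as pd

# Two toy XIC traces (intensity per scan); the intensities are invented.
xic = pd.DataFrame({
    "f1": [10.0, 50.0, 200.0, 120.0, 30.0],
    "f2": [12.0, 55.0, 190.0, 115.0, 28.0],
})
mz = {"f1": 215.11789, "f2": 231.11280}

# Step 1: pointwise Pearson correlation of the temporal intensity profiles.
r = xic["f1"].corr(xic["f2"])

# Step 2: does the exact mass difference match an oxygen adduct (+15.99491 Da)?
delta = mz["f2"] - mz["f1"]
is_oxygen = abs(delta - 15.99491) < 0.001

print(round(r, 3), round(delta, 5), is_oxygen)
```

Only feature pairs passing both the correlation threshold and the mass-difference filter are reported as adduct candidates.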

DBDIpy further ships with functions optimized for preprocessing of experimental data and for visualization of identified adducts. The library is integrated into the matchms ecosystem, so DBDIpy's functionality slots into existing workflows.

For details, we invite you to read the tutorial or to try out the functions with our demonstration dataset or your own data!

(Badges: license, PyPI version, downloads, test status, latest commit, Python 3.7–3.10, macOS/Windows, documentation status, supporting data DOI, article DOI.)

Latest Changes (since V 1.2.*)

  • major implementation for V2: modification of the former two-step search algorithm towards refinement by MS2 spectral similarity scoring.
  • plot_adducts() is now called plot_adducts_XIC()
  • addition of utility functions, e.g. calculation of consensus spectra.
  • improved spectral alignment.
  • introduction of an open search option to propose potential adducts / in-source fragments: propose_adducts()

Currently under development:

  • advanced plotting options
  • runtime optimization.

User guide

Installation

Prerequisites:

  • Anaconda (recommended)
  • Python 3.7, 3.8, 3.9 or 3.10

DBDIpy can be installed from PyPI with:

```python
# we recommend installing DBDIpy in a new virtual environment
conda create --name DBDIpy python=3.9
conda activate DBDIpy
pip install DBDIpy
```

Known installation issues: Apple M1 chip users might encounter issues with the automatic installation of matchms. Manually installing the dependency as described on the library's official site solves the issue.

Tutorial

The following tutorial showcases a typical data analysis workflow, going through all functions of DBDIpy from data loading to visualization of correlation results. For this, we supply a demo dataset which is publicly available here.

The demo data come from an experiment in which wheat bread was roasted for 20 min and monitored by DBDI coupled to FT-ICR-MS. The dataset consists of 500 randomly selected features.


Fig.1 - Schematic DBDIpy workflow for in-source adduct and fragment detection: imported MS1 data are aligned, imputed and parsed to combined correlation and mass difference analysis.

1. Importing MS data

DBDIpy's core functions utilize 2D tabular data. Raw mass spectra containing m/z-intensity pairs first need to be aligned into a DataFrame of features. Features are built with the align_spectra() function, which is the interface for loading data from open file formats such as .mgf, .mzML or .mzXML via matchms.importing.

If your data already is formatted accordingly, you can skip this step.
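To illustrate what alignment means here, the following is a hypothetical, simplified sketch (not DBDIpy's actual algorithm): peaks from different scans whose m/z values agree within a ppm window are merged into one feature row, with one intensity column per scan. The scan data are invented:

```python
import pandas as pd

# Peaks from three scans, each a list of (mz, intensity) pairs (invented values).
scans = [
    [(215.11789, 120.0), (231.11280, 80.0)],
    [(215.11795, 130.0)],                      # ~0.3 ppm off the first feature
    [(231.11275, 75.0)],
]

ppm_window = 2.0
features = []   # list of [reference_mz, {scan_index: intensity}]

for i, scan in enumerate(scans):
    for mz, inten in scan:
        for feat in features:
            # merge into an existing feature if within the ppm window
            # (a real implementation would also update a running mean m/z)
            if abs(mz - feat[0]) / feat[0] * 1e6 <= ppm_window:
                feat[1][i] = inten
                break
        else:
            features.append([mz, {i: inten}])

# One row per feature, one column per scan; missing detections become NaN.
table = pd.DataFrame([f[1] for f in features],
                     index=[round(f[0], 5) for f in features]).reindex(columns=range(len(scans)))
print(table.shape)
```

The result has the same shape as the DataFrame returned by align_spectra(): features in rows, scans in columns, NaN where a feature was not detected.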

```python
# loading libraries for the tutorial
import os
import feather
import numpy as np
import pandas as pd
import DBDIpy as dbdi
from matchms.importing import load_from_mgf
from matchms.exporting import save_as_mgf

# importing the downloaded .mgf files from demo data by matchms
demo_path = ""  # enter path to demo dataset
demo_mgf = os.path.join(demo_path, "example_dataset.mgf")
spectrums = list(load_from_mgf(demo_mgf))

# align the listed Spectra
specs_aligned = dbdi.align_spectra(spec = spectrums, ppm_window = 2)
```

We first imported the demo MS1 data into a list of matchms.Spectra objects. At this point you can run your personal matchms preprocessing pipelines or manually apply filters like noise reduction. By applying align_spectra(), we transformed the list of spectrum objects into a two-dimensional pandas.DataFrame. Now there is a column for each mass spectrometric scan and features are aligned to rows. The first column holds the mean m/z of each feature. If a signal was not detected in a scan, the corresponding field is set to np.nan.

Remember to set the ppm_window parameter according to the resolution of your mass spectrometric system.
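As a reminder of what the window means in absolute terms, a ppm tolerance translates into an m/z tolerance that scales with mass:

```python
# tol_Da = mz * ppm / 1e6: at 2 ppm and m/z 250, matches must agree to 0.0005 Da.
mz = 250.0
ppm = 2.0
tol = mz * ppm / 1e6
print(tol)  # 0.0005
```

So high-resolution instruments (e.g. FT-ICR-MS, as in the demo data) can afford a narrow window like 2 ppm, while lower-resolution systems need a wider one.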

We can now inspect the aligned data, e.g. by running:

```python
specs_aligned.describe()
specs_aligned.info()
```

Several metabolomics data processing steps can be applied here if not already performed in matchms. These might include application of noise-cutoffs, feature selection based on missing values, normalization or many others.

specs_aligned.isnull().values.any() will give us an idea if there are missing values in the data. These cannot be handled by successive DBDIpy functions and most machine learning algorithms, so we need to impute them.

2. Imputation of missing values

impute_intensities() will ensure that after imputation we have a set of uniform-length extracted ion chromatograms (XIC) in our DataFrame. This is an important prerequisite for pointwise correlation calculation and for many tools handling time series data.

Missing values in our feature table will be imputed by a two-stage imputation algorithm:

  • First, missing values within the detected signal region are interpolated.
  • Second, a noisy baseline is generated so that all XICs have uniform length, equal to the length of the longest XIC in the dataset.

The function lets the user decide which imputation method to use. The default mode is linear; several others are available.
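The two stages can be illustrated with a toy XIC in plain pandas. This is a hypothetical sketch of the idea, not DBDIpy's implementation, and the series values are invented:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# An XIC with a gap inside the detected region and a shorter span than the run.
xic = pd.Series([np.nan, 5.0, np.nan, 9.0, 11.0, np.nan, np.nan, np.nan])

# Stage 1: interpolate only between the first and last detected points.
first, last = xic.first_valid_index(), xic.last_valid_index()
xic.loc[first:last] = xic.loc[first:last].interpolate(method="linear")

# Stage 2: fill the remaining positions with a low-level noise baseline so
# every XIC spans the full run length.
noise = rng.uniform(0.1, 0.5, size=xic.isna().sum())
xic.loc[xic.isna()] = noise

print(xic.isna().any())  # False
```

Note that the in-signal gap is filled by interpolation (here 7.0 between 5.0 and 9.0), while the leading and trailing gaps receive noise rather than interpolated values, so they do not fake a real signal.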

```python
feature_mz = specs_aligned["mean"]
specs_aligned = specs_aligned.drop("mean", axis = 1)

# impute the dataset
specs_imputed = dbdi.impute_intensities(df = specs_aligned, method = "linear")
```

Now specs_imputed no longer contains any missing values and is ready for adduct and in-source fragment detection.

```python
# check if NaN are present in DataFrame
specs_imputed.isnull().values.any()
Out[]: False
```

3. Detection of adducts and in-source fragments: MS1 data only

Based on specs_imputed, we compute pointwise correlations of XIC traces to identify in-source adducts or in-source fragments generated during the plasma ionization process. The identification is performed in a two-step procedure:

  • First, calculation of pointwise intensity correlation identifies feature groups with matching temporal intensity profiles through the experiment.
  • Second, (exact) mass differences are used to refine the nature of potential candidates.

By default, identify_adducts() searches for [M-H2O+H]+, [M+1O+H]+ and [M+2O+H]+. For demonstration purposes we also search for [M+3O+H]+ in this example. Note that identify_adducts() has a variety of other parameters allowing extensive user customization; see the functions' help files for details.
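The mass shifts behind these rules follow directly from monoisotopic atomic masses (standard values; the dict below is an illustrative sketch, not part of the DBDIpy API):

```python
# Monoisotopic masses of oxygen and hydrogen in Da.
m_O, m_H = 15.9949146, 1.0078250

shifts = {
    "O":   1 * m_O,        # [M+1O+H]+ relative to [M+H]+
    "O2":  2 * m_O,        # [M+2O+H]+
    "O3":  3 * m_O,        # the custom rule used below (~47.98474 Da)
    "H2O": 2 * m_H + m_O,  # water loss/gain (~18.01056 Da)
}
print({k: round(v, 5) for k, v in shifts.items()})
```

This is where the delta mass of 47.984744 for the custom O3 rule in the next code block comes from.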

```python
# prepare a DataFrame to search for O3-adducts
adduct_rule = pd.DataFrame({'delta_mz': [47.984744], 'motive': ["O3"]})

# identify in-source fragments and adducts
search_res = dbdi.identify_adducts(df = specs_imputed, masses = feature_mz,
                                   custom_adducts = adduct_rule, method = "pearson",
                                   threshold = 0.9, mass_error = 2)
```

The function will return a dictionary holding one DataFrame for each adduct type that was defined. A typical output looks like the following:

```python
# output search results
search_res
Out[24]:
{'O':        base_mz  base_index   match_mz match_index   mz_diff      corr
 19        215.11789          24  231.11280        ID40  15.99491  0.963228
 310       224.10699          33  240.10191        ID51  15.99492  0.939139
 605       231.11280          39  215.11789        ID25  15.99491  0.963228
 1413      240.10191          50  224.10699        ID34  15.99492  0.939139
 1668      244.13321          55  260.12812        ID67  15.99491  0.976541
 ...,
 'O2':       base_mz  base_index   match_mz match_index   mz_diff      corr
 1437      240.10191          50  272.09174        ID77  31.98983  0.988866
 1677      244.13321          55  276.12304        ID84  31.98983  0.972251
 2362      260.12812          66  292.11795       ID100  31.98983  0.964096
 3024      272.09174          76  240.10191        ID51  31.98983  0.988866
 3354      276.12304          83  244.13321        ID56  31.98983  0.972251
 ...,
 'H2O':      base_mz  base_index   match_mz match_index   mz_diff      corr
 621       231.11280          39  249.12337        ID60  18.01057  0.933640
 1883      249.12337          59  231.11280        ID40  18.01057  0.933640
 3263      275.13902          82  293.14958       ID102  18.01056  0.948774
 4775      293.14958         101  275.13902        ID83  18.01056  0.948774
 5573      300.08665         112  318.09722       ID140  18.01057  0.905907
 ...,
 'O3':       base_mz  base_index   match_mz match_index   mz_diff      corr
 320       224.10699          33  272.09174        ID77  47.98475  0.924362
 1688      244.13321          55  292.11795       ID100  47.98474  0.964896
 3013      272.09174          76  224.10699        ID34  47.98475  0.924362
 4631      292.11795          99  244.13321        ID56  47.98474  0.964896
 13597     438.28502         308  486.26976       ID356  47.98474  0.935359
 ...}
```

The base_mz and base_index columns identify the feature which correlates with the partner specified in match_mz and match_index. The mass difference between both is given for validation purposes, together with the correlation coefficient of the two features.

Now we can, for example, search for series of oxygen adducts of a single analyte:

```python
# search for oxygenation series
two_adducts = np.intersect1d(search_res["O"]["base_index"],
                             search_res["O2"]["base_index"])
three_adducts = np.intersect1d(two_adducts, search_res["O3"]["base_index"])

three_adducts
Out[33]: array([55, 99], dtype=int64)
```

This tells us that features 55 and 99 both putatively have [M+1-3O+H]+ adduct ions with correlations of r > 0.9 in our dataset. Let's visualize this finding!

4. Detection of adducts and in-source fragments: refined scoring by MS2 similarity matching

...

5. Visualization of correlation results

Now that we have putatively identified some related ions of a single analyte, we want to check their temporal response during the baking experiment. For this, we can use the plot_adducts() function to conveniently draw XICs. The demo dataset even includes annotated metadata for our features, so we can decorate the plot and check our previous results!

```python
# load annotation metadata
demo_path = ""  # enter path to demo dataset
demo_meta = os.path.join(demo_path, "example_metadata.feather")
annotation_metadata = feather.read_dataframe(demo_meta)

# plot the XIC
dbdi.plot_adducts(IDs = [55,66,83,99], df = specs_imputed,
                  metadata = annotation_metadata, transform = True)
```

Fig.2 - XIC plots for features 55, 66, 83 and 99 which have highly correlated intensity profile through the baking experiment.

We see that the XIC traces show similar intensity profiles through the experiment. The plot further reports the correlation coefficients of the identified adducts. From the metadata we can see that the detected mass signals were previously annotated as C15H17O2-5N, which tells us that we most probably found an oxygen-adduct series.

If MS2 data were recorded during the experiment, we can now go further and compare fragment spectra to confirm the identifications. You might find ms2deepscore a useful library for doing so in an automated way.
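The idea behind MS2-based confirmation is that true adducts of one analyte should produce similar fragment spectra. A hypothetical, deliberately naive binned-cosine sketch (real scorers such as those in matchms or ms2deepscore are far more robust; the spectra are invented):

```python
import math

def cosine_score(spec_a, spec_b, bin_width=0.01):
    """Naive cosine similarity between two (mz, intensity) peak lists."""
    # Bin m/z values so near-identical peaks share a key, then take a dot product.
    bins_a = {round(mz / bin_width): i for mz, i in spec_a}
    bins_b = {round(mz / bin_width): i for mz, i in spec_b}
    shared = set(bins_a) & set(bins_b)
    dot = sum(bins_a[k] * bins_b[k] for k in shared)
    norm = (math.sqrt(sum(i**2 for i in bins_a.values()))
            * math.sqrt(sum(i**2 for i in bins_b.values())))
    return dot / norm if norm else 0.0

# Two invented fragment spectra sharing most of their peaks.
a = [(81.07, 100.0), (109.10, 40.0), (135.12, 20.0)]
b = [(81.07, 95.0), (109.10, 42.0), (150.00, 5.0)]
print(round(cosine_score(a, b), 3))
```

A high score between the fragment spectra of two correlated features adds independent evidence that they stem from the same analyte.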

6. Exporting tabular MS data to matchms.Spectra objects

If you want to export your (imputed) tabular data to matchms.Spectra objects, you can do so by calling the export_to_spectra() function. We just need to re-add a column containing the m/z values of the features. This gives you access to the matchms suite and enables you to save your mass spectrometric data to open file formats. Hint: you can manually add metadata after construction of the list of spectra.

```python
# export tabular MS data back to a list of spectra
specs_imputed["mean"] = feature_mz

spec_list = dbdi.export_to_spectra(df = specs_imputed, mz_col = 88)

# write processed data to .mgf file
save_as_mgf(spec_list, "DBDIpy_processed_spectra.mgf")
```
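Conceptually, the export is just the reverse of alignment: each scan column is paired with the feature m/z column. A hypothetical, matchms-free sketch of that round trip (the table values are invented, and a real export would wrap each pair in a matchms Spectrum):

```python
import pandas as pd

# A tiny aligned table: mean m/z per feature plus one intensity column per scan.
table = pd.DataFrame({
    "mean": [215.11789, 231.11280],
    0: [120.0, 80.0],
    1: [130.0, 75.0],
})

spectra = []
for scan in [0, 1]:
    mz = table["mean"].to_numpy()
    intensities = table[scan].to_numpy()
    spectra.append((mz, intensities))  # matchms would build Spectrum objects here

print(len(spectra))
```

Each resulting pair holds the same m/z axis with that scan's intensities, i.e. one spectrum per scan.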

We hope you liked this quick introduction to DBDIpy and will find its functions helpful and inspiring in working through data from direct infusion mass spectrometry. Of course, the functions are applicable to all sorts of ionisation mechanisms, and you can modify the set of adducts to search in accordance with your source.

If you have open questions about the functions, their parameters or the algorithms, we invite you to read through the built-in help files. If this does not clarify the issue, please do not hesitate to get in touch with us!

Contact

leopold.weidner@tum.de

Acknowledgements

We thank Erwin Kupczyk and Nicolas Schmidt for testing the software and their feedback during development.

Owner

  • Name: Leopold Weidner
  • Login: leopold-weidner
  • Kind: user
  • Location: Munich

Dr. rer. nat. - Analytical Food Chemistry

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Weidner"
  given-names: "Leopold"
  orcid: "https://orcid.org/0000-0002-6801-3647"
title: "DBDIpy - A Python library for the inspection, curation and interpretation of DBDI-MS data."
version: 0.8.4
doi: 10.5281/zenodo.7221089
date-released: 2022-10-27
url: "https://github.com/leopold-weidner/DBDIpy"


Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 154
  • Total Committers: 3
  • Avg Commits per committer: 51.333
  • Development Distribution Score (DDS): 0.019
Top Committers
Name Email Commits
Leopold Weidner 8****r@u****m 151
Leopold Weidner lw@M****l 2
Leopold Weidner l****r@t****e 1
Committer Domains (Top 20 + Academic)
tum.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 8 minutes
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • synth-schnitte0o (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 21 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 20
  • Total maintainers: 1
pypi.org: dbdipy

A Python package for the curation and interpretation of datasets from plasma ionisation mass spectrometry.

  • Versions: 20
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 21 Last month
Rankings
Dependent packages count: 6.6%
Downloads: 13.7%
Average: 21.9%
Stargazers count: 28.2%
Forks count: 30.5%
Dependent repos count: 30.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

setup.py pypi
  • feather-format *
  • matchms *
  • matplotlib *
  • numpy *
  • pandas *
  • pytest *
  • scipy *
  • tqdm *