ms-inditification

https://github.com/2022wnbl/ms-inditification

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.4%) to scientific vocabulary

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: 2022wnbl
License: apache-2.0
Language: Python
Default Branch: main
Size: 18.6 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 8 months ago · Last pushed 7 months ago

Metadata Files

Readme, Changelog, Contributing, License, Citation, Zenodo

ms2deepscore

ms2deepscore provides a Siamese neural network that is trained to predict molecular structural similarities (Tanimoto scores) from pairs of mass spectrometry spectra.
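The Tanimoto score that the network learns to predict is a similarity between binary molecular fingerprints: the number of shared "on" bits divided by the number of bits that are "on" in either fingerprint. As a minimal sketch, assuming each fingerprint is given as a set of "on" bit indices (the bit positions below are made-up illustration values, not real fingerprints, and returning 0.0 for two empty fingerprints is our own convention):

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints.

    Each fingerprint is given as the set of indices of its 'on' bits.
    """
    union = bits_a | bits_b
    if not union:
        # Two empty fingerprints: define the similarity as 0.0 to avoid division by zero
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints (bit indices chosen only for illustration)
fp_1 = {1, 5, 9, 12}
fp_2 = {1, 5, 7, 12, 20}
print(tanimoto(fp_1, fp_2))  # 3 shared bits / 6 bits in total = 0.5
```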

The library provides intuitive classes to prepare data, train a Siamese model, and compute similarities between pairs of spectra.

In addition to the prediction of a structural similarity, MS2DeepScore can also make use of Monte-Carlo dropout to assess the model's uncertainty.
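MS2DeepScore's own uncertainty code is not shown here. The following is only a generic toy sketch of the Monte-Carlo dropout idea, using a made-up linear "model" with hypothetical weights: dropout stays active at prediction time, the same input is scored many times, and the spread of the resulting predictions serves as an uncertainty estimate.

```python
import random
import statistics


def toy_forward(features, weights, dropout_rate, rng):
    """One stochastic forward pass of a toy linear model with dropout kept active."""
    keep_prob = 1.0 - dropout_rate
    total = 0.0
    for x, w in zip(features, weights):
        # Inverted dropout: randomly drop each input and rescale the survivors
        if rng.random() < keep_prob:
            total += x * w / keep_prob
    return total


def mc_dropout_predict(features, weights, dropout_rate=0.2, n_samples=100, seed=0):
    """Repeat stochastic passes; return the mean prediction and its standard deviation."""
    rng = random.Random(seed)
    samples = [toy_forward(features, weights, dropout_rate, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical input features and weights
mean, spread = mc_dropout_predict([0.5, 0.1, 0.8, 0.3], [0.4, 0.2, 0.6, 0.1])
print(f"prediction {mean:.3f} +/- {spread:.3f}")
```

A larger spread means the stochastic passes disagree more, which is read as lower confidence in that particular prediction.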

Reference

If you use MS2DeepScore for your research, please cite the following:

"MS2DeepScore - a novel deep learning similarity measure to compare tandem mass spectra". Florian Huber, Sven van der Burg, Justin J.J. van der Hooft, Lars Ridder. Journal of Cheminformatics 13, Article number: 84 (2021). doi: https://doi.org/10.1186/s13321-021-00558-4

If you use MS2DeepScore 2.0 or higher, please also cite: "Reliable cross-ion mode chemical similarity prediction between MS2 spectra". Niek de Jonge, David Joas, Lem-Joe Truong, Justin J.J. van der Hooft, Florian Huber. bioRxiv 2024.03.25.586580; doi: https://doi.org/10.1101/2024.03.25.586580

Setup

Requirements

Python 3.10, 3.11, 3.12 (higher will likely work but is not tested systematically).

Installation

Installation is expected to take 10-20 minutes.

Prepare environment

We recommend creating an Anaconda environment with:

    conda create --name ms2deepscore python=3.10
    conda activate ms2deepscore
    pip install ms2deepscore

Or, installing matchms via conda:

    conda create --name ms2deepscore python=3.10
    conda activate ms2deepscore
    conda install --channel bioconda --channel conda-forge matchms
    pip install ms2deepscore

Alternatively, simply install ms2deepscore in the environment of your choice by running pip install ms2deepscore.
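To check that the installation worked, one option is to query the installed package versions. This is a small sketch using only the Python standard library (importlib.metadata); the helper name installed_version is our own.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages used in this README
for name in ("ms2deepscore", "matchms"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```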

Getting started: How to prepare data, train a model, and compute similarities.

We recommend running the complete tutorial in notebooks/MS2DeepScore_tutorial.ipynb for a more extensive, fully working example on test data. The expected run time on a laptop is less than 5 minutes, including the automatic download of the model and dummy data. Alternatively, there are some example scripts below. If you are not yet familiar with matchms, we also recommend our tutorial on how to get started using matchms.

1) Compute spectral similarities

We provide a model that was trained on more than 500,000 MS/MS spectra combined from GNPS, MoNA, MassBank and MSnLib. This model can be downloaded from Zenodo; only the file ms2deepscore_model.pt is needed. The model works for spectra in both positive and negative ionization mode and can even predict similarities across ionization modes.

To compute the similarities between spectra of your choice, you can run the code below. A small example dataset is available at "./tests/resources/pesticides_processed.mgf". You can of course also use your own spectra; most common formats are supported, e.g. msp, mzml, mgf, mzxml, json and usi.

```python
from ms2deepscore.models import load_model
from matchms.Pipeline import Pipeline, create_workflow
from matchms.filtering.default_pipelines import DEFAULT_FILTERS
from ms2deepscore import MS2DeepScore

model_file_name = "ms2deepscore_model.pt"
spectrum_file_name = "pesticides.mgf"

# Load in the MS2DeepScore model
model = load_model(model_file_name)

# Clean the spectra with the default matchms filters, then compute all-vs-all MS2DeepScore similarities
pipeline = Pipeline(create_workflow(query_filters=DEFAULT_FILTERS,
                                    score_computations=[[MS2DeepScore, {"model": model}]]))
report = pipeline.run(spectrum_file_name)
similarity_matrix = pipeline.scores.to_array()
```

The resulting similarity matrix is a numpy array containing the MS2DeepScore predictions between all pairs of spectra.
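A common next step is to look up the best matches for each spectrum. The sketch below assumes the similarity matrix is a plain, square 2D float array of all-vs-all scores; the helper top_matches and the 4 x 4 example values are hypothetical stand-ins for a real pipeline.scores.to_array() result.

```python
import numpy as np


def top_matches(similarity_matrix: np.ndarray, k: int = 3) -> np.ndarray:
    """For each spectrum (row), return the indices of the k most similar other spectra."""
    scores = np.array(similarity_matrix, dtype=float, copy=True)
    # Ignore self-comparisons on the diagonal (assumes an all-vs-all square matrix)
    np.fill_diagonal(scores, -np.inf)
    # argsort sorts ascending; reverse each row for descending order and keep k columns
    return np.argsort(scores, axis=1)[:, ::-1][:, :k]


# Hypothetical 4 x 4 matrix standing in for pipeline.scores.to_array()
example = np.array([
    [1.0, 0.8, 0.2, 0.4],
    [0.8, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.9],
    [0.4, 0.1, 0.9, 1.0],
])
print(top_matches(example, k=2))
```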

2) Create embeddings

To calculate chemical similarity scores, MS2DeepScore first calculates an embedding (vector) representing each spectrum. This intermediate product can also be used to visualize spectra in "chemical space" with a dimensionality reduction technique such as UMAP.

```python
# Spectra after cleaning by the pipeline from the previous example
cleaned_spectra = pipeline.spectra_queries

ms2ds_model = MS2DeepScore(model)
ms2ds_embeddings = ms2ds_model.get_embedding_array(cleaned_spectra)
```

The tutorial shows how to use these embeddings to create an interactive UMAP overlaid with SMILES.
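To illustrate how embeddings relate to similarity scores, the sketch below assumes, as in the MS2DeepScore paper, that two spectra are compared by the cosine similarity of their embeddings. It uses random vectors as hypothetical stand-ins for ms2ds_embeddings. For the 2D map it swaps in PCA (computed with NumPy's SVD) as a simple, deterministic alternative to the UMAP used in the tutorial; the helper names are our own.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """All-vs-all cosine similarity between the rows of an (n_spectra, dim) array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project embeddings to 2D with PCA (via SVD), a linear stand-in for UMAP."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Random vectors standing in for ms2ds_embeddings (shape: n_spectra x embedding_dim)
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(10, 50))
print(cosine_similarity_matrix(fake_embeddings).shape)  # (10, 10)
print(pca_2d(fake_embeddings).shape)  # (10, 2)
```

PCA only captures linear structure, so clusters that UMAP would separate may overlap in this view; it is meant as a quick sanity check rather than a replacement for the tutorial's plot.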

3) Train your own MS2DeepScore model

Training your own model is only recommended if you have some familiarity with machine learning. You can train a new model on a dataset of your choice. Preferably, however, it should contain a substantial number of spectra so the model can learn relevant features, say more than 100,000 spectra of sufficiently diverse types. Alternatively, you can add your in-house spectra to an already available public library, for instance the data used to train the default MS2DeepScore model. To train your own model, you can run the code below. Please make sure to clean your spectra first; we recommend using the cleaning pipeline in matchms.
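Before training, it can help to check how large and how structurally diverse a dataset is. A minimal sketch, assuming each spectrum's metadata is available as a dict with an "inchikey" entry (the key name matchms typically uses) and using obviously fake placeholder keys: it counts spectra and distinct 2D structures, taking the first 14 characters of the InChIKey as the structure identifier. The helper summarize_training_set is hypothetical.

```python
def summarize_training_set(metadata_records):
    """Count spectra and distinct 2D structures (first 14 characters of the InChIKey)."""
    n_spectra = 0
    structures = set()
    for record in metadata_records:
        n_spectra += 1
        inchikey = record.get("inchikey")
        if inchikey:
            # The first InChIKey block encodes the molecular skeleton (2D structure)
            structures.add(inchikey[:14])
    return n_spectra, len(structures)


# Hypothetical metadata with placeholder InChIKeys (not real compounds)
records = [
    {"inchikey": "AAAAAAAAAAAAAA-BBBBBBBBSA-N"},
    {"inchikey": "AAAAAAAAAAAAAA-CCCCCCCCSA-N"},  # same skeleton, different stereo block
    {"inchikey": "DDDDDDDDDDDDDD-BBBBBBBBSA-N"},
    {},  # spectrum without annotation
]
n_spectra, n_structures = summarize_training_set(records)
print(n_spectra, n_structures)
```

The spectrum count can be compared against the rough 100,000-spectra guideline above, while the structure count gives a first impression of chemical diversity.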

```python
from ms2deepscore import SettingsMS2Deepscore
from ms2deepscore.wrapper_functions.training_wrapper_functions import train_ms2deepscore_wrapper

spectrum_file = "./combined_libraries.mgf"

# The settings below use the default training settings and use precursor m/z and ionmode as additional metadata input.
# Have a look at the SettingsMS2Deepscore class to check other hyperparameters.
settings = SettingsMS2Deepscore(
    additional_metadata=[("CategoricalToBinary", {"metadata_field": "ionmode",
                                                  "entries_becoming_one": "positive",
                                                  "entries_becoming_zero": "negative"}),
                         ("StandardScaler", {"metadata_field": "precursor_mz",
                                             "mean": 0,
                                             "standard_deviation": 1000})],
)

train_ms2deepscore_wrapper(spectrum_file, settings, validation_split_fraction=20)
```
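To make the two additional_metadata entries concrete: ionmode becomes a binary input (positive to 1, negative to 0), and precursor m/z is standardized with mean 0 and standard deviation 1000, so a precursor m/z of 250 becomes (250 - 0) / 1000 = 0.25. The snippet below is an illustrative re-implementation of that arithmetic with hypothetical helper names; it is not ms2deepscore's code, and the library's handling of unexpected values may differ.

```python
def categorical_to_binary(value, entries_becoming_one, entries_becoming_zero):
    """Map a categorical metadata value to 1.0 or 0.0 (illustrative re-implementation)."""
    if value in entries_becoming_one:
        return 1.0
    if value in entries_becoming_zero:
        return 0.0
    raise ValueError(f"Unexpected value: {value!r}")


def standard_scale(value, mean, standard_deviation):
    """Shift by the mean and divide by the standard deviation."""
    return (value - mean) / standard_deviation


# Hypothetical metadata of a single spectrum
metadata = {"ionmode": "positive", "precursor_mz": 250.0}
features = [
    categorical_to_binary(metadata["ionmode"], ["positive"], ["negative"]),
    standard_scale(metadata["precursor_mz"], mean=0, standard_deviation=1000),
]
print(features)  # [1.0, 0.25]
```

Lists are used for the accepted entries here so that membership is an exact match rather than a substring test on a plain string.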

Contributing

We welcome contributions to the development of ms2deepscore! Have a look at the contribution guidelines.

Owner

Login: 2022wnbl
Kind: user

Repositories: 1
Profile: https://github.com/2022wnbl

Citation (CITATION.cff)

# YAML 1.2
---
abstract: "Deep learning similarity measure for comparing MS/MS spectra with respect to their chemical similarity."
authors:
  -
    affiliation: "University of Applied Sciences Düsseldorf"
    family-names: Huber
    given-names: Florian
    orcid: https://orcid.org/0000-0002-3535-9406
  -
    affiliation: "Netherlands eScience Center"
    family-names: "van der Burg"
    given-names: Sven
    orcid: https://orcid.org/0000-0003-1250-6968
  -
    affiliation: "Wageningen University and Research"
    family-names: Hooft
    name-particle: "van der"
    given-names: "Justin J. J."
    orcid: https://orcid.org/0000-0002-9340-5511
  -
    affiliation: "Netherlands eScience Center"
    family-names: Ridder
    given-names: Lars
    orcid: https://orcid.org/0000-0002-7635-9533
  -
    affiliation: "University of Applied Sciences Düsseldorf"
    family-names: Joas
    given-names: David
    orcid: https://orcid.org/0000-0001-9567-2157
  -
    affiliation: "Netherlands eScience Center"
    family-names: de Jonge
    given-names: Niek
    orcid: "https://orcid.org/0000-0002-3054-6210"

cff-version: 1.2.0
license: "Apache-2.0"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/matchms/ms2deepscore"
title: MS2DeepScore
