spec2vec_gnps_data_analysis

Analysis and benchmarking of mass spectra similarity measures using gnps data set.

https://github.com/iomega/spec2vec_gnps_data_analysis

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary
Last synced: 7 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: iomega
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 20.8 MB
Statistics
  • Stars: 24
  • Watchers: 5
  • Forks: 11
  • Open Issues: 2
  • Releases: 0
Created almost 6 years ago · Last pushed over 4 years ago
Metadata Files
Readme License Citation

README.md


spec2vec_gnps_data_analysis

Analysis and benchmarking of mass spectra similarity measures using gnps data set.

If you use spec2vec for your research, please cite the following references:

F Huber, L Ridder, S Verhoeven, JH Spaaks, F Diblen, S Rogers, JJJ van der Hooft, "Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships", bioRxiv, https://doi.org/10.1101/2020.08.11.245928

(and if you use matchms as well: F. Huber, S. Verhoeven, C. Meijer, H. Spreeuw, E. M. Villanueva Castilla, C. Geng, J.J.J. van der Hooft, S. Rogers, A. Belloum, F. Diblen, J.H. Spaaks, (2020). matchms - processing and similarity evaluation of mass spectrometry data. Journal of Open Source Software, 5(52), 2411, https://doi.org/10.21105/joss.02411 )

Thanks!

Tutorial on matchms and Spec2Vec

Possibly the easiest way to learn how to run Spec2Vec is to follow our tutorial on matchms and Spec2Vec.

Create environment

spec2vec currently works with Python 3.7 or 3.8. It might also work with earlier versions, but we haven't tested them.

    conda create --name spec2vec_analysis python=3.7  # or 3.8 if you prefer
    conda activate spec2vec_analysis
    conda install --channel nlesc --channel bioconda --channel conda-forge spec2vec
    pip install jupyter
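Before opening the notebooks, it can help to confirm that the active environment matches the one described above. The following is a minimal sketch using only the standard library. The function name `check_environment` is hypothetical, and the package list is assumed from the dependencies declared in setup.py (jupyter itself is not checked).

```python
import importlib.util
import sys

# Python versions the README states spec2vec was tested with.
TESTED_VERSIONS = {(3, 7), (3, 8)}

# Assumed package list, taken from the setup.py dependencies of this repository.
REQUIRED_PACKAGES = ("spec2vec", "matchms", "gensim", "numpy", "pandas", "networkx")


def check_environment(packages=REQUIRED_PACKAGES, version_info=None):
    """Report whether the Python version was tested and which packages cannot be found."""
    if version_info is None:
        version_info = sys.version_info
    tested = tuple(version_info[:2]) in TESTED_VERSIONS
    # find_spec returns None for top-level modules that are not installed.
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return {"tested_python": tested, "missing_packages": missing}


if __name__ == "__main__":
    print(check_environment())
```

An empty `missing_packages` list together with `tested_python: True` suggests the conda environment was set up as intended.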

Clone this repository and run notebooks

    git clone https://github.com/iomega/spec2vec_gnps_data_analysis
    cd spec2vec_gnps_data_analysis
    jupyter notebook

Download data

  • The original data were obtained from GNPS: https://gnps-external.ucsd.edu/gnpslibrary/ALL_GNPS.json
  • The cleaned and processed GNPS dataset of positive ionization mode spectra (raw data accessed on 2020-05-11) can be found on Zenodo: https://zenodo.org/record/3978072
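For orientation, here is a minimal sketch of reading a GNPS library export and keeping only positive ionization mode spectra with the standard library. The field names `Ion_Mode`, `peaks_json` and `spectrum_id` are assumptions about the ALL_GNPS.json layout, and `load_positive_mode_spectra` is a hypothetical helper. This is not the cleaning pipeline used to produce the Zenodo dataset, and it loads the whole file into memory.

```python
import json


def load_positive_mode_spectra(path):
    """Return (spectrum_id, peaks) tuples for records whose ion mode looks positive."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)  # assumed: a JSON list of spectrum records
    spectra = []
    for record in records:
        # Assumed field name; values such as "Positive" or "positive " are accepted.
        ion_mode = str(record.get("Ion_Mode", "")).strip().lower()
        if not ion_mode.startswith("pos"):
            continue
        # Assumed: "peaks_json" is a JSON string holding [m/z, intensity] pairs.
        raw_peaks = json.loads(record.get("peaks_json", "[]"))
        peaks = [(float(mz), float(intensity)) for mz, intensity in raw_peaks]
        spectra.append((record.get("spectrum_id"), peaks))
    return spectra
```

For actual analyses, matchms provides its own importers (for example `matchms.importing.load_from_json`) together with metadata and peak filters, which are a better fit than a hand-written loader.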

Download pre-trained models

Pretrained Word2Vec models to be used with Spec2Vec can be found on Zenodo:
  • Model trained on the UniqueInchikey subset (12,797 spectra): https://zenodo.org/record/3978054
  • Model trained on the AllPositive set of all positive ionization mode spectra (after filtering): https://zenodo.org/record/4173596
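To illustrate how such a Word2Vec model is used, here is a toy sketch of the idea behind Spec2Vec scoring. Each peak becomes a word such as `peak@100.00`, a spectrum vector is the sum of its word vectors weighted by intensity raised to a power, and two spectra are compared by cosine similarity. The 2-dimensional word vectors below are made up, the function names are hypothetical, and neutral-loss words are ignored. This is not the spec2vec package API.

```python
import math


def spectrum_words(peaks, n_decimals=2):
    """Turn (m/z, intensity) peaks into ('peak@<m/z>', intensity) word/weight pairs."""
    return [(f"peak@{mz:.{n_decimals}f}", intensity) for mz, intensity in peaks]


def spectrum_vector(peaks, word_vectors, intensity_weighting_power=0.5):
    """Intensity-weighted sum of the vectors of all peak words known to the model."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word, intensity in spectrum_words(peaks):
        if word not in word_vectors:
            continue  # skip words the model has never seen
        weight = intensity ** intensity_weighting_power
        total = [t + weight * v for t, v in zip(total, word_vectors[word])]
    return total


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either vector is all zeros)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical word vectors standing in for a trained Word2Vec model.
    word_vectors = {"peak@100.00": [1.0, 0.0], "peak@150.00": [0.0, 1.0], "peak@151.00": [0.1, 0.9]}
    s1 = [(100.0, 1.0), (150.0, 1.0)]
    s2 = [(100.0, 1.0), (151.0, 1.0)]
    print(round(cosine(spectrum_vector(s1, word_vectors), spectrum_vector(s2, word_vectors)), 3))  # prints 0.995
```

The two toy spectra share no peak at 150/151, yet score close to 1 because their differing peaks have similar word vectors. In practice, a downloaded model would be loaded with gensim (`gensim.models.Word2Vec.load`) and passed to the `Spec2Vec` class of the spec2vec package, which handles document creation and scoring.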

Owner

  • Name: Integrated Omics for MEtabolomics and Genomics Annotation
  • Login: iomega
  • Kind: organization

Citation (CITATION.cff)

# YAML 1.2
---
abstract: "Python library for fuzzy comparison of mass spectrum data and other Python objects."
authors:
  -
    affiliation: "Netherlands eScience Center"
    family-names: Huber
    given-names: Florian
    orcid: "https://orcid.org/0000-0002-3535-9406"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Ridder
    given-names: Lars
    orcid: "https://orcid.org/0000-0002-7635-9533"
  -
    affiliation: "University of Glasgow"
    family-names: Rogers
    given-names: Simon
    orcid: "https://orcid.org/0000-0003-3578-4477"
  -
    affiliation: "Wageningen University and Research"
    family-names: Hooft
    name-particle: van der
    given-names: Justin J. J.
    orcid: "https://orcid.org/0000-0002-9340-5511"

cff-version: "1.1.0"
license: "Apache-2.0"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/matchms/matchms"
title: Spec2Vec
...

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Dependencies

setup.py pypi
  • gensim *
  • matchms >=0.6.2
  • networkx *
  • numpy *
  • pandas *
  • spec2vec *
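The only pinned version above is `matchms >=0.6.2`. Below is a minimal sketch of checking such a lower bound with the standard library. `parse_version` and `satisfies_minimum` are hypothetical helpers that compare only the leading numeric release segment, so pre-release tags such as `rc1` are ignored. For full PEP 440 semantics, the `Version` class from the `packaging` library is the safer choice.

```python
import re


def parse_version(text):
    """Parse the leading numeric part of a version string, e.g. '0.6.2' -> (0, 6, 2)."""
    match = re.match(r"(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"Unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def satisfies_minimum(installed, minimum):
    """True if installed >= minimum, padding the shorter version with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

On Python 3.8 and later, the installed version can be read with `importlib.metadata.version("matchms")` and passed as the first argument, for example `satisfies_minimum(importlib.metadata.version("matchms"), "0.6.2")`.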