spec2vec_gnps_data_analysis
Analysis and benchmarking of mass spectra similarity measures using the GNPS dataset.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: iomega
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: master
- Size: 20.8 MB
Statistics
- Stars: 24
- Watchers: 5
- Forks: 11
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
spec2vec_gnps_data_analysis
Analysis and benchmarking of mass spectra similarity measures using the GNPS dataset.
If you use spec2vec for your research, please cite the following references:
F Huber, L Ridder, S Verhoeven, JH Spaaks, F Diblen, S Rogers, JJJ van der Hooft, "Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships", bioRxiv, https://doi.org/10.1101/2020.08.11.245928
(and if you use matchms as well: F. Huber, S. Verhoeven, C. Meijer, H. Spreeuw, E. M. Villanueva Castilla, C. Geng, J.J.J. van der Hooft, S. Rogers, A. Belloum, F. Diblen, J.H. Spaaks, (2020). matchms - processing and similarity evaluation of mass spectrometry data. Journal of Open Source Software, 5(52), 2411, https://doi.org/10.21105/joss.02411 )
Thanks!
Tutorial on matchms and Spec2Vec
Possibly the easiest way to learn how to run Spec2Vec is to follow our tutorial on matchms and Spec2Vec.
- Part I - Import and process MS/MS data using matchms
- Part II - Compute spectral similarity using Spec2Vec
- Part III - Create molecular networks from Spec2Vec similarities
Create environment
The current spec2vec release works with Python 3.7 or 3.8. It might also work with earlier versions, but we haven't tested them.
conda create --name spec2vec_analysis python=3.7 # or 3.8 if you prefer
conda activate spec2vec_analysis
conda install --channel nlesc --channel bioconda --channel conda-forge spec2vec
pip install jupyter
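Before launching the notebooks, it can help to confirm that the active interpreter is one of the versions listed above. The following is only a minimal sketch: the helper name `is_tested_python` is hypothetical and not part of spec2vec.

```python
import sys

# Versions named as tested in the instructions above.
TESTED_VERSIONS = {(3, 7), (3, 8)}


def is_tested_python(version_info=None):
    """Return True if the (major, minor) version is one of the tested ones."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in TESTED_VERSIONS


if not is_tested_python():
    # Only a warning: other versions may still work, they are just untested.
    print("Warning: Python {}.{} is not one of the tested versions (3.7, 3.8).".format(
        *sys.version_info[:2]))
```

Run it inside the activated `spec2vec_analysis` environment so that it checks the interpreter the notebooks will use.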
Clone this repository and run notebooks
git clone https://github.com/iomega/spec2vec_gnps_data_analysis
cd spec2vec_gnps_data_analysis
jupyter notebook
Download data
- Original data was obtained from GNPS: https://gnps-external.ucsd.edu/gnpslibrary/ALL_GNPS.json
- A cleaned and processed GNPS dataset of positive mode spectra (raw data accessed on 2020-05-11) can be found on Zenodo: https://zenodo.org/record/3978072
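The original GNPS file can also be fetched from a script instead of a browser. The sketch below uses only the Python standard library. The helper names `local_filename` and `download` and the `data` target directory are assumptions made for illustration, not part of this repository.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URL taken from the list above.
GNPS_URL = "https://gnps-external.ucsd.edu/gnpslibrary/ALL_GNPS.json"


def local_filename(url, target_dir="."):
    """Derive a local path from the last component of the URL path."""
    name = os.path.basename(urlparse(url).path)
    return os.path.join(target_dir, name)


def download(url, target_dir="."):
    """Download url into target_dir unless the file already exists there."""
    os.makedirs(target_dir, exist_ok=True)
    path = local_filename(url, target_dir)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


# Example call (left commented out, since the download may be large):
# download(GNPS_URL, target_dir="data")
```

Skipping the download when the file already exists keeps repeated notebook runs from fetching the same data again.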
Download pre-trained models
Pretrained Word2Vec models to be used with Spec2Vec can be found on Zenodo.
- Model trained on UniqueInchikey subset (12,797 spectra): https://zenodo.org/record/3978054
- Model trained on AllPositive set of all positive ionization mode spectra (after filtering): https://zenodo.org/record/4173596
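Once a model file has been downloaded, it can presumably be loaded with gensim's `Word2Vec.load`. The sketch below assumes the model was saved in gensim's native format. The helper name `load_pretrained_model` and the file name in the usage comment are hypothetical, so substitute the name of the file obtained from Zenodo.

```python
import os


def load_pretrained_model(model_path):
    """Load a saved Word2Vec model from model_path using gensim."""
    if not os.path.isfile(model_path):
        raise FileNotFoundError("Model file not found: {}".format(model_path))
    # Imported lazily so the path check works even without gensim installed.
    from gensim.models import Word2Vec
    return Word2Vec.load(model_path)


# Hypothetical file name; use the name of the file downloaded from Zenodo.
# model = load_pretrained_model("downloaded_spec2vec.model")
```

gensim may store large arrays in separate files next to the main model file. If the download contains such companion files, keep them in the same directory as the model file.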
Owner
- Name: Integrated Omics for MEtabolomics and Genomics Annotation
- Login: iomega
- Kind: organization
- Website: https://www.esciencecenter.nl/project/integrated-omics-analysis-for-small-molecule-mediated-host-microbiome-inter
- Repositories: 7
- Profile: https://github.com/iomega
Citation (CITATION.cff)
# YAML 1.2
---
abstract: "Python library for fuzzy comparison of mass spectrum data and other Python objects."
authors:
  -
    affiliation: "Netherlands eScience Center"
    family-names: Huber
    given-names: Florian
    orcid: "https://orcid.org/0000-0002-3535-9406"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Ridder
    given-names: Lars
    orcid: "https://orcid.org/0000-0002-7635-9533"
  -
    affiliation: "University of Glasgow"
    family-names: Rogers
    given-names: Simon
    orcid: "https://orcid.org/0000-0003-3578-4477"
  -
    affiliation: "Wageningen University and Research"
    family-names: Hooft
    name-particle: van der
    given-names: Justin J. J.
    orcid: "https://orcid.org/0000-0002-9340-5511"
cff-version: "1.1.0"
license: "Apache-2.0"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/matchms/matchms"
title: Spec2Vec
...
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Dependencies
- gensim *
- matchms >=0.6.2
- networkx *
- numpy *
- pandas *
- spec2vec *
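A quick way to see whether the packages listed above are importable in the current environment is sketched below. It uses only the standard library, and the helper name `missing_packages` is hypothetical. It does not check version constraints such as `matchms >=0.6.2`.

```python
import importlib.util

# Package names from the dependency list above.
REQUIRED = ["gensim", "matchms", "networkx", "numpy", "pandas", "spec2vec"]


def missing_packages(names):
    """Return the names that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
```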