https://github.com/bojarlab/fragmentfactory

Devising human interpretable diagnostic glycan fragments

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: BojarLab
  • Language: Python
  • Default Branch: main
  • Size: 30.3 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

FragmentFactory

Devising human-interpretable diagnostic glycan fragments from MS/MS spectra.

Abstract

Structural details of oligosaccharides, or glycans, often carry biological relevance, which is why they are typically elucidated using tandem mass spectrometry. Common approaches to distinguish isomers rely on diagnostic glycan fragments for annotating topologies or linkages. Diagnostic fragments are often only known informally among practitioners or stem from individual studies, with unclear validity or generalizability, causing annotation heterogeneity and hampering new analysts. Drawing on a curated set of 237,000 O-glycomics spectra, we here present a rule-based machine learning workflow to uncover quantifiably valid and generalizable diagnostic fragments. This results in fragmentation rules to robustly distinguish common O-glycan isomers. We envision this resource to improve glycan annotation accuracy and concomitantly make annotations more transparent and homogeneous across analysts.

Setup

Download the code from GitHub via

    git clone git@github.com:BojarLab/FragmentFactory.git

Then, set up an environment via

    conda create -y -n ff python=3.9
    conda activate ff
    mamba install -c conda-forge -c bioconda -c kalininalab datasail-lite
    pip install -r requirements.txt

Download the dataset from Zenodo here: https://doi.org/10.5281/zenodo.7940046

Usage

Preprocess the downloaded dataset FragmentFactory_dataset.pkl

Note: Make sure the dataset is in the FragmentFactory folder or that the path is set correctly within FF_data_preprocessing.py.

    python FF_data_preprocessing.py
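Because preprocessing depends on the dataset being where the script expects it, a quick check before running the command above can catch a misplaced file early. The following is a minimal sketch using only the standard library; `locate_dataset` is a hypothetical helper that is not part of FragmentFactory, and it assumes the file keeps the name FragmentFactory_dataset.pkl given in this README.

```python
from pathlib import Path

# Dataset file name as given in the README.
DATASET_NAME = "FragmentFactory_dataset.pkl"


def locate_dataset(folder=".", name=DATASET_NAME):
    """Return the dataset path inside `folder`, or raise if it is missing.

    Hypothetical helper for illustration; not part of FragmentFactory.
    """
    path = Path(folder) / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{name} not found in {Path(folder).resolve()}; move it there "
            "or adjust the path inside FF_data_preprocessing.py"
        )
    return path
```

Calling `locate_dataset()` from inside the FragmentFactory folder returns the path if the file is present and otherwise raises an error that points to the two fixes mentioned in the note.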

Inside the FragmentFactory folder, one can run

    python train.py <path/to/spectra_df_processed.pkl> <output-prefix> --weighting --GPID_SIM <val>

to create custom trees and a rough visualization thereof.
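When training is launched repeatedly with different settings, it can help to assemble the command programmatically. The sketch below only mirrors the argument order shown in the usage line; `build_train_command` is a hypothetical helper, and it assumes that `--weighting` is an on/off switch and that `--GPID_SIM` takes a single value, neither of which is confirmed by this README.

```python
import shlex


def build_train_command(processed_pkl, output_prefix, gpid_sim, weighting=True):
    """Assemble the train.py call as an argument list.

    Hypothetical helper; flag semantics are assumptions (see lead-in).
    """
    cmd = ["python", "train.py", str(processed_pkl), str(output_prefix)]
    if weighting:
        # Assumed to be a plain switch without a value.
        cmd.append("--weighting")
    # Assumed to take exactly one value.
    cmd += ["--GPID_SIM", str(gpid_sim)]
    return cmd


# "my_run" and 0.9 are arbitrary placeholders, not recommended settings.
example = build_train_command("spectra_df_processed.pkl", "my_run", 0.9)
print(shlex.join(example))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the FragmentFactory folder; keeping it as a list avoids shell quoting issues with paths that contain spaces.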

Owner

  • Name: BojarLab
  • Login: BojarLab
  • Kind: organization
  • Email: daniel.bojar@gu.se
  • Location: Gothenburg, Sweden

Machine Learning in Glycobiology and Systems Biology

GitHub Events

Total
  • Issues event: 1
  • Watch event: 1
  • Issue comment event: 8
  • Push event: 3
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 1
  • Issue comment event: 8
  • Push event: 3
  • Fork event: 1