glycowork

Package for processing and analyzing glycans and their role in biology.

https://github.com/bojarlab/glycowork

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov, wiley.com, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.5%) to scientific vocabulary

Keywords

bioinformatics computational-biology data-science glycans glycobiology glycomics machine-learning molecular-biology open-source python

Scientific Fields

Economics Social Sciences - 85% confidence
Sociology Social Sciences - 64% confidence
Engineering Computer Science - 60% confidence
Last synced: 6 months ago

Repository

Package for processing and analyzing glycans and their role in biology.

Basic Info
Statistics
  • Stars: 72
  • Watchers: 5
  • Forks: 16
  • Open Issues: 1
  • Releases: 21
Topics
bioinformatics computational-biology data-science glycans glycobiology glycomics machine-learning molecular-biology open-source python
Created about 5 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

glycowork

Glycans are fundamental biological sequences that are as crucial as DNA, RNA, and proteins. As complex carbohydrates forming branched structures, glycans are ubiquitous yet often overlooked in biological research.

Why Glycans are Important

  • Ubiquitous in biology
  • Integral to protein and lipid function
  • Relevant to human diseases

Challenges in Glycan Analysis

Analyzing glycans is complicated due to their non-linear structures and enormous diversity. But that’s where glycowork comes in.

Introducing glycowork: Your Solution for Glycan-Focused Data Science

Glycowork is a Python package specifically designed to simplify glycan sequence processing and analysis. It offers:

  • Functions for glycan analysis
  • Datasets for model training
  • Full support for IUPAC-condensed string representation. Broad support for IUPAC-extended, LinearCode, Oxford, GlycoCT, WURCS, GLYCAM, CSDB-linear, GlycoWorkBench, GlyTouCan IDs, and more.
  • Powerful graph-based architecture for in-depth analysis

Documentation: https://bojarlab.github.io/glycowork/

Contribute: Interested in contributing? Read our Contribution Guidelines

Citation: If glycowork adds value to your project, please cite Thomes et al., 2021

Install

Not familiar with Python? Try our no-code graphical user interface (glycoworkGUI.exe, which can be downloaded at the bottom of the latest Release page) to access some of the most useful glycowork functions!

via pip:
```
pip install glycowork
```

alternatively, directly from GitHub:
```
pip install git+https://github.com/BojarLab/glycowork.git
```

then, in Python:
``` python
import glycowork
```

Note that we offer optional extra installs for specialized use (further instructions can be found in the Examples tab; on Mac, where the default zsh shell treats square brackets specially, you might need to quote the package name, e.g., "glycowork[ml]"):
  • deep learning
    • pip install glycowork[ml]
  • analyzing atomic/chemical properties of glycans
    • pip install glycowork[chem]
  • everything
    • pip install glycowork[all]

Data & Models

Glycowork currently contains the following main datasets that are freely available to everyone:

  • df_glycan
    • contains ~50,500 unique glycan sequences, including labels such as ~39,500 species associations, ~20,000 tissue associations, and ~1,000 disease associations
  • glycan_binding
    • contains >790,000 protein-glycan binding interactions, from >2,000 unique glycan-binding proteins

Additionally, we provide the following trained deep learning models, which can be retrieved with the prep_model function:

  • LectinOracle
    • can be used to predict glycan-binding specificity of a protein, given its ESMC representation; from Lundstrom et al., 2021
  • LectinOracle_flex
    • operates the same as LectinOracle but can directly use the raw protein sequence as input (no ESMC representation required)
  • SweetNet
    • a graph convolutional neural network trained to predict species from glycan, can be used to generate learned glycan representations; from Burkholz et al., 2021
  • NSequonPred
    • given the ESM-1b representation of an N-sequon (+/- 20 AA), this model can predict whether the sequon will be glycosylated
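NSequonPred works on a window of ±20 amino acids around an N-glycosylation sequon. As a rough illustration of what such an input window looks like, here is a minimal sketch that finds canonical N-X-S/T sequons (X ≠ Pro) in a protein sequence and cuts out the flanking window. The function name `sequon_windows`, the `-` padding at the termini, and the exact windowing are assumptions for illustration, not glycowork's own preprocessing; computing the ESM-1b representation that the model actually consumes is not shown.

```python
import re
from typing import List, Tuple

# N-X-S/T with X != P; the zero-width lookahead also reports overlapping sequons (e.g. "NNST").
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def sequon_windows(protein: str, flank: int = 20, pad: str = "-") -> List[Tuple[int, str]]:
    """Hypothetical helper: return (0-based index of the N, window of 2*flank+1 residues centred on it)."""
    protein = protein.upper()
    # Pad both termini so that windows near the ends still have full length (assumed convention).
    padded = pad * flank + protein + pad * flank
    windows = []
    for m in SEQUON.finditer(protein):
        pos = m.start()  # index of the N in `protein`; it sits at pos + flank in `padded`
        windows.append((pos, padded[pos : pos + 2 * flank + 1]))
    return windows


print(sequon_windows("MNGTAPNPSQ", flank=3))  # [(1, '--MNGTA')]; N-P-S is skipped
```

Each window would still have to be embedded with ESM-1b before a model like NSequonPred could score it.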

How to use

Glycowork currently contains four main modules:

  • glycan_data
    • stores several glycan datasets and contains helper functions
  • ml
    • contains all the functions for training and using machine learning models, including train-test splitting and obtaining glycan representations
  • motif
    • contains functions for processing & drawing glycan sequences, identifying motifs and features, and analyzing them
  • network
    • contains functions for constructing and analyzing glycan networks (e.g., biosynthetic networks)

Below are some examples of what you can do with glycowork (a non-exhaustive list); be sure to check out the other examples in the full documentation to learn more.

``` python
# drawing publication-quality glycan figures
from glycowork import GlycoDraw
GlycoDraw("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)][GlcNAc(b1-4)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
          highlight_motif = "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc")
```

``` python
# get motifs, graph features, and sequence features of a set of glycan sequences to train models or analyze glycan properties
glycans = ["Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",  # IUPAC-condensed
           "Ma3(Ma6)Mb4GNb4GN;N",  # LinearCode
           "α-D-Manp-(1→3)[α-D-Manp-(1→6)]-β-D-Manp-(1→4)-β-D-GlcpNAc-(1→4)-β-D-GlcpNAc-(1→",  # IUPAC-extended
           "F(3)XA2",  # Oxford
           "WURCS=2.0/5,11,10/[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5][a1122h-1a_1-5][a2112h-1b_1-5][a1221m-1a_1-5]/1-1-2-3-1-4-3-1-4-5-5/a4-b1_a6-k1_b4-c1_c3-d1_c6-g1_d2-e1_e4-f1_g2-h1_h4-i1_i2-j1",  # WURCS
           """RES
1b:b-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5
4s:n-acetyl
5b:b-dman-HEX-1:5
6b:a-dman-HEX-1:5
7b:b-dglc-HEX-1:5
8s:n-acetyl
9b:b-dgal-HEX-1:5
10s:sulfate
11s:n-acetyl
12b:a-dman-HEX-1:5
13b:b-dglc-HEX-1:5
14s:n-acetyl
15b:b-dgal-HEX-1:5
16s:n-acetyl
LIN
1:1d(2+1)2n
2:1o(4+1)3d
3:3d(2+1)4n
4:3o(4+1)5d
5:5o(3+1)6d
6:6o(2+1)7d
7:7d(2+1)8n
8:7o(4+1)9d
9:9o(-1+1)10n
10:9d(2+1)11n
11:5o(6+1)12d
12:12o(2+1)13d
13:13d(2+1)14n
14:13o(4+1)15d
15:15d(2+1)16n"""]  # GlycoCT
from glycowork.motif.annotate import annotate_dataset
# feature_set picks the feature families: known motifs, terminal epitopes, and exhaustive mono-/disaccharide counts
out = annotate_dataset(glycans, feature_set = ['known', 'terminal', 'exhaustive'], condense = True)
```

| | InternalLewisX | InternalLewisA | Hantigentype2 | Chitobiose | Trimannosylcore | TerminalLacNActype1 | InternalLacNActype2 | TerminalLacNActype2 | TerminalLacdiNActype2 | corefucose | corefucose(a1-3) | Fuc | Gal | GalNAc | GalNAcOS | GlcNAc | Man | Neu5Ac | Xyl | Fuc(a1-2)Gal | Fuc(a1-3)GlcNAc | Fuc(a1-4)GlcNAc | Fuc(a1-6)GlcNAc | Fuc(a1-?)GlcNAc | Gal(b1-3)GlcNAc | Gal(b1-4)GlcNAc | Gal(b1-?)GlcNAc | GalNAc(b1-4)GlcNAc | GalNAcOS(b1-4)GlcNAc | GlcNAc(b1-2)Man | GlcNAc(b1-4)GlcNAc | Man(a1-3)Man | Man(a1-6)Man | Man(a1-?)Man | Man(b1-4)GlcNAc | Neu5Ac(a2-3)Gal | Xyl(b1-2)Man | TerminalFuc(a1-3) | TerminalMan(a1-3) | TerminalMan(a1-6) | TerminalGalNAcOS(b1-4) | TerminalFuc(a1-4) | TerminalGal(b1-4) | TerminalGalNAc(b1-4) | TerminalXyl(b1-2) | TerminalGal(b1-3) | TerminalGlcNAc(b1-2) | TerminalFuc(a1-6) | TerminalNeu5Ac(a2-3) | TerminalFuc(a1-2) | TerminalFuc(a1-?) | TerminalMan(a1-?) | TerminalGal(b1-?) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 3 | 2 | 0 | 0 | 4 | 3 | 1 | 0 | 0 | 1 | 1 | 1 | 3 | 1 | 1 | 2 | 0 | 0 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 1 |
| Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 4 | 3 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 1 | 2 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 |
| Fuc(a1-2)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)[Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 2 | 2 | 0 | 0 | 4 | 3 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 2 | 2 | 0 | 0 | 2 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 0 | 1 |
| GalNAcOS(b1-4)GlcNAc(b1-2)Man(a1-3)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

``` python
# using graphs, you can easily check whether a glycan contains a specific motif; how about internal Lewis A/X motifs?
# termini_list: for each monosaccharide in the motif, whether it must be terminal ('t'), internal ('i'), or either ('f')
from glycowork.motif.graph import subgraph_isomorphism
print(subgraph_isomorphism('Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
                           'Fuc(a1-?)[Gal(b1-?)]GlcNAc', termini_list = ['terminal', 'internal', 'flexible']))
print(subgraph_isomorphism('Neu5Ac(a2-3)Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
                           'Fuc(a1-3/4)[Gal(b1-3/4)]GlcNAc', termini_list = ['t', 'i', 'f']))
print(subgraph_isomorphism('Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
                           'dHex(a1-?)[Hex(b1-?)]GlcNAc', termini_list = ['t', 'i', 'f']))

# or you could find the terminal epitopes of a glycan
from glycowork.motif.annotate import get_terminal_structures
print("\nTerminal structures:")
print(get_terminal_structures('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
```

True
True
False

Terminal structures:
['Man(a1-3)', 'Man(a1-6)', 'Fuc(a1-6)']
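The subgraph_isomorphism and get_terminal_structures examples above rely on glycowork's graph-based view of a glycan (the former is imported from glycowork.motif.graph). To make that idea concrete, here is a minimal, self-contained sketch that turns an IUPAC-condensed string into a residue tree. It is not glycowork's implementation: `parse_iupac_condensed`, `Residue`, `composition`, `disaccharides`, and `terminal_residues` are hypothetical names. The rule it encodes is that each residue attaches to the next residue at its bracket depth, and a bracketed branch attaches to the first residue after the closing bracket. It handles only plain IUPAC-condensed strings and does not validate monosaccharide names or linkages.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

# A token is "[", "]", or one residue with its optional outgoing linkage, e.g. "Man(a1-3)".
TOKEN = re.compile(r"(?P<open>\[)|(?P<close>\])|(?P<name>[^\[\]()]+)(?:\((?P<link>[^()]*)\))?")


@dataclass
class Residue:
    name: str
    link: Optional[str]  # linkage to the parent residue; None for the reducing end
    children: List[int] = field(default_factory=list)  # indices of residues attached to this one


def parse_iupac_condensed(glycan: str) -> List[Residue]:
    """Hypothetical parser: IUPAC-condensed string -> list of residues forming a tree."""
    residues: List[Residue] = []
    waiting: List[List[int]] = [[]]  # residues still looking for a parent, one list per bracket depth
    for m in TOKEN.finditer(glycan):
        if m.group("open"):
            waiting.append([])
        elif m.group("close"):
            if len(waiting) < 2:
                raise ValueError(f"unbalanced ']' in {glycan!r}")
            branch = waiting.pop()
            waiting[-1].extend(branch)  # a closed branch attaches to the next residue outside it
        else:
            idx = len(residues)
            residues.append(Residue(m.group("name"), m.group("link")))
            residues[idx].children.extend(waiting[-1])  # everything waiting at this depth binds here
            waiting[-1] = [idx] if m.group("link") else []
    if len(waiting) != 1:
        raise ValueError(f"unbalanced '[' in {glycan!r}")
    return residues


def composition(residues: List[Residue]) -> Counter:
    """Monosaccharide counts, e.g. Counter({'Man': 3, 'GlcNAc': 2})."""
    return Counter(r.name for r in residues)


def disaccharides(residues: List[Residue]) -> Counter:
    """Counts of child(linkage)parent pairs such as 'Man(a1-3)Man'."""
    return Counter(f"{residues[c].name}({residues[c].link}){p.name}" for p in residues for c in p.children)


def terminal_residues(residues: List[Residue]) -> List[str]:
    """Non-reducing-end residues, i.e. residues that nothing else is attached to."""
    return [f"{r.name}({r.link})" for r in residues if not r.children and r.link]


glycan = parse_iupac_condensed("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
print(terminal_residues(glycan))  # ['Man(a1-3)', 'Man(a1-6)', 'Fuc(a1-6)']
```

For this glycan the sketch returns the same three terminal residues as get_terminal_structures above. For the trimannosyl core, its monosaccharide and exact-linkage disaccharide counts (e.g., Man(a1-3)Man, Man(b1-4)GlcNAc) line up with the corresponding columns of the annotate_dataset table; wildcard features such as Man(a1-?)Man and named motifs are not reproduced.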

``` python
# given a composition, find matching glycan structures in SugarBase; specific for glycan classes and taxonomy
from glycowork.motif.tokenization import compositions_to_structures
print(compositions_to_structures([{'Hex':3, 'HexNAc':4}], glycan_class = 'N'))

# or we could calculate the mass of this composition
from glycowork.motif.tokenization import composition_to_mass
print("\nMass of the composition Hex3HexNAc4")
print(composition_to_mass({'Hex':3, 'HexNAc':4}))
print(composition_to_mass("H3N4"))
print(composition_to_mass("Hex3HexNAc4"))
```

0 compositions could not be matched. Run with verbose = True to see which compositions.
                                               glycan  abundance
0   GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Ma...          0
1   GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-4)][Man(a1-6)]...          0
2   GlcNAc(b1-2)[GlcNAc(b1-4)]Man(a1-3)[Man(a1-6)]...          0
3   GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Ma...          0
4   GalNAc(b1-3/4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]...          0
5   GlcNAc(b1-2)Man(a1-6)[Man(a1-3)][GlcNAc(b1-4)]...          0
6   GlcNAc(b1-2)Man(a1-3/6)[GlcNAc(b1-4)][Man(a1-3...          0
7   Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][GlcNAc(b1-4)]...          0
8   GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-6)Man(a1-6)]Ma...          0
9   GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Ma...          0
10  GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)]Ma...          0
11  GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)[GlcNAc(b1-4...          0
12  GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-6)Man(a1-6)]Ma...          0
13  Man(a1-3)[GlcNAc(b1-2)[GlcNAc(b1-6)]Man(a1-6)]...          0
14  GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Ma...          0

Mass of the composition Hex3HexNAc4
1316.4865545999999
1316.4865545999999
1316.4865545999999
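The mass printed above is consistent with a monoisotopic mass computed as the summed residue masses (each monosaccharide minus one water) plus one water for the free reducing end. Below is a minimal sketch of that arithmetic, computing residue masses from elemental formulas. `composition_mass`, `parse_composition`, the four-residue subset, and the one-letter shorthands (H, N, F, S) are assumptions for illustration rather than glycowork's internals. The result agrees with the value printed above to within 0.0001 Da; the last digits depend on the mass constants used.

```python
import re
from typing import Dict, Union

# Monoisotopic element masses in Da.
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Residue formulas (free monosaccharide minus one H2O); an illustrative subset only.
RESIDUE_FORMULA = {
    "Hex": {"C": 6, "H": 10, "O": 5},
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},
    "dHex": {"C": 6, "H": 10, "O": 4},
    "Neu5Ac": {"C": 11, "H": 17, "N": 1, "O": 8},
}

# Assumed one-letter shorthands, as in "H3N4".
SHORT_NAMES = {"H": "Hex", "N": "HexNAc", "F": "dHex", "S": "Neu5Ac"}

WATER = 2 * ELEMENT_MASS["H"] + ELEMENT_MASS["O"]

# Longest names first so that "HexNAc" is not read as "Hex" or "H".
_NAMES = sorted([*RESIDUE_FORMULA, *SHORT_NAMES], key=len, reverse=True)
_TOKEN = "(" + "|".join(map(re.escape, _NAMES)) + r")(\d+)"


def parse_composition(comp: Union[str, Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical helper: 'Hex3HexNAc4', 'H3N4', or a dict -> {'Hex': 3, 'HexNAc': 4}."""
    if isinstance(comp, dict):
        return dict(comp)
    if not re.fullmatch("(?:" + _TOKEN + ")+", comp):
        raise ValueError(f"cannot parse composition {comp!r}")
    counts: Dict[str, int] = {}
    for name, n in re.findall(_TOKEN, comp):
        name = SHORT_NAMES.get(name, name)
        counts[name] = counts.get(name, 0) + int(n)
    return counts


def composition_mass(comp: Union[str, Dict[str, int]]) -> float:
    """Monoisotopic mass of the free glycan: sum of residue masses plus one water."""
    counts = parse_composition(comp)
    residues = sum(
        n * sum(ELEMENT_MASS[el] * k for el, k in RESIDUE_FORMULA[name].items())
        for name, n in counts.items()
    )
    return residues + WATER


print(round(composition_mass("Hex3HexNAc4"), 4))  # 1316.4865
```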

Owner

  • Name: BojarLab
  • Login: BojarLab
  • Kind: organization
  • Email: daniel.bojar@gu.se
  • Location: Gothenburg, Sweden

Machine Learning in Glycobiology and Systems Biology

Citation (CITATION.bib)

@article{thomes2021glycowork,
  title={Glycowork: A Python package for glycan data science and machine learning},
  author={Thom{\`e}s, Luc and Burkholz, Rebekka and Bojar, Daniel},
  journal={Glycobiology},
  volume={31},
  number={10},
  pages={1240--1244},
  year={2021},
  month={11},
  publisher={Oxford University Press},
  doi={10.1093/glycob/cwab067},
  pmid={34192308},
  pmcid={PMC8600276}
}

GitHub Events

Total
  • Create event: 11
  • Release event: 4
  • Issues event: 8
  • Watch event: 15
  • Delete event: 5
  • Issue comment event: 48
  • Push event: 190
  • Pull request review event: 4
  • Pull request event: 46
  • Fork event: 3
Last Year
  • Create event: 11
  • Release event: 4
  • Issues event: 8
  • Watch event: 15
  • Delete event: 5
  • Issue comment event: 48
  • Push event: 190
  • Pull request review event: 4
  • Pull request event: 46
  • Fork event: 3

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 461
  • Total Committers: 6
  • Avg Commits per committer: 76.833
  • Development Distribution Score (DDS): 0.148
Past Year
  • Commits: 199
  • Committers: 3
  • Avg Commits per committer: 66.333
  • Development Distribution Score (DDS): 0.226
Top Committers
Name Email Commits
Bribak d****l@b****t 393
lundstrm j****m@g****m 48
lthomes l****s@e****r 10
Kathryn k****h@r****m 8
Rebekka Burkholz r****z@g****m 1
viktoriakarlsson 9****n 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 32
  • Total pull requests: 69
  • Average time to close issues: 8 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 17
  • Total pull request authors: 12
  • Average comments per issue: 3.19
  • Average comments per pull request: 0.83
  • Merged pull requests: 51
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 36
  • Average time to close issues: 4 days
  • Average time to close pull requests: about 4 hours
  • Issue authors: 6
  • Pull request authors: 6
  • Average comments per issue: 2.5
  • Average comments per pull request: 1.06
  • Merged pull requests: 28
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mattias-erhardsson (10)
  • TheLostLambda (3)
  • fubin1999 (2)
  • Old-Shatterhand (2)
  • dtchang (2)
  • edwardsnj (2)
  • Qinulinuli (2)
  • peterthorpe5 (1)
  • JacobPFrick (1)
  • benwest-york (1)
  • klarich (1)
  • Ojas-Singh (1)
  • ilsenatorov (1)
  • tomdstanton (1)
  • mobiusklein (1)
Pull Request Authors
  • Bribak (29)
  • Glycocalex (15)
  • Old-Shatterhand (12)
  • urbj (10)
  • Alex-RW-Bennett (5)
  • lthomes (4)
  • mattias-erhardsson (4)
  • viktoriakarlsson (2)
  • TheLostLambda (2)
  • cthoyt (1)
  • DelusionalSimon (1)
  • klarich (1)
Top Labels
Issue Labels
bug (4) enhancement (2) documentation (1)
Pull Request Labels
enhancement (4)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 769 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 1
  • Total versions: 20
  • Total maintainers: 1
pypi.org: glycowork

Package for processing and analyzing glycans

  • Versions: 20
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 769 Last month
Rankings
Dependent packages count: 7.3%
Stargazers count: 10.4%
Forks count: 12.0%
Average: 13.3%
Downloads: 14.7%
Dependent repos count: 22.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

glycowork.egg-info/requires.txt pypi
  • mpld3 *
  • networkx *
  • pandas *
  • regex *
  • requests *
  • scipy *
  • seaborn *
  • sklearn *
  • statsmodels *
  • torch *
  • xgboost *
.github/workflows/deploy.yaml actions
  • fastai/workflows/quarto-ghp master composite
.github/workflows/test.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite