glycowork
Package for processing and analyzing glycans and their role in biology.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to pubmed.ncbi, ncbi.nlm.nih.gov, wiley.com, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (18.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: BojarLab
- License: mit
- Language: Jupyter Notebook
- Default Branch: master
- Homepage: https://Bojarlab.github.io/glycowork
- Size: 705 MB
Statistics
- Stars: 72
- Watchers: 5
- Forks: 16
- Open Issues: 1
- Releases: 21
README.md
glycowork

Glycans are fundamental biological sequences that are as crucial as DNA, RNA, and proteins. As complex carbohydrates forming branched structures, glycans are ubiquitous yet often overlooked in biological research.
Why Glycans are Important
- Ubiquitous in biology
- Integral to protein and lipid function
- Relevant to human diseases
Challenges in Glycan Analysis
Analyzing glycans is complicated by their non-linear, branched structures and their enormous diversity. That is where glycowork comes in.
Introducing glycowork: Your Solution for Glycan-Focused Data Science
Glycowork is a Python package specifically designed to simplify glycan sequence processing and analysis. It offers:
- Functions for glycan analysis
- Datasets for model training
- Full support for IUPAC-condensed string representation. Broad support for IUPAC-extended, LinearCode, Oxford, GlycoCT, WURCS, GLYCAM, CSDB-linear, GlycoWorkBench, GlyTouCan IDs, and more.
- Powerful graph-based architecture for in-depth analysis
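The graph-based view treats a glycan as a tree of monosaccharides connected by glycosidic linkages. As a rough illustration only (a stdlib sketch, not glycowork's implementation), an IUPAC-condensed string can be read from its reducing end, the rightmost residue, toward the non-reducing ends, with a stack remembering where each bracketed branch attaches. The helpers `tokenize` and `glycan_to_graph` are hypothetical and assume simple alphanumeric residue names; glycowork's own graph utilities live in `glycowork.motif.graph`.

``` python
import re

# Hypothetical tokenizer (not glycowork's API): matches '[' / ']' or a
# monosaccharide name followed by an optional linkage in parentheses.
TOKEN = re.compile(r"(\[|\])|([A-Za-z0-9]+)(?:\(([^()]*)\))?")


def tokenize(glycan):
    """Split an IUPAC-condensed string into '[' / ']' and (name, linkage) tuples."""
    return [m.group(1) or (m.group(2), m.group(3)) for m in TOKEN.finditer(glycan)]


def glycan_to_graph(glycan):
    """Hypothetical helper. Returns (nodes, edges): nodes[i] is a monosaccharide
    name, each edge is (child_index, parent_index, linkage), node 0 is the
    reducing-end residue."""
    nodes, edges = [], []
    branch_points = []  # residues that bracketed branches hang off
    parent = None
    # Read right to left: reducing end first, non-reducing ends last.
    for tok in reversed(tokenize(glycan)):
        if tok == "]":
            # Reading backwards, a branch starts here; it attaches to the current parent.
            branch_points.append(parent)
        elif tok == "[":
            # Branch finished; continue from the residue it was attached to.
            parent = branch_points.pop()
        else:
            name, linkage = tok
            nodes.append(name)
            child = len(nodes) - 1
            if parent is not None:
                edges.append((child, parent, linkage))
            parent = child
    return nodes, edges


nodes, edges = glycan_to_graph("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc")
print(nodes)  # ['GlcNAc', 'GlcNAc', 'Man', 'Man', 'Man']
print(edges)  # [(1, 0, 'b1-4'), (2, 1, 'b1-4'), (3, 2, 'a1-6'), (4, 2, 'a1-3')]
```

Once a glycan is in this form, questions such as "which residues carry two branches?" become simple queries over the edge list.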
Documentation: https://bojarlab.github.io/glycowork/
Contribute: Interested in contributing? Read our Contribution Guidelines
Citation: If glycowork adds value to your project, please cite Thomès et al., 2021 (Glycobiology 31(10):1240-1244).
Install
Not familiar with Python? Try our no-code graphical user interface (glycoworkGUI.exe, which can be downloaded at the bottom of the latest Release page) to access some of the most useful glycowork functions!
Via pip:
pip install glycowork
import glycowork
Alternatively, install directly from GitHub:
pip install git+https://github.com/BojarLab/glycowork.git
import glycowork
Note that we offer optional extra installs for specialized use (further instructions can be found in the Examples tab; on Mac you might need to quote the extra, e.g., pip install "glycowork[ml]"), such as:
- deep learning: pip install glycowork[ml]
- analyzing atomic/chemical properties of glycans: pip install glycowork[chem]
- everything: pip install glycowork[all]
Data & Models
Glycowork currently contains the following main datasets, which are freely available to everyone:
- df_glycan: contains ~50,500 unique glycan sequences, including labels such as ~39,500 species associations, ~20,000 tissue associations, and ~1,000 disease associations
- glycan_binding: contains >790,000 protein-glycan binding interactions from >2,000 unique glycan-binding proteins
Additionally, we store the following trained deep learning models for easy use, which can be retrieved with the prep_model function:
- LectinOracle: predicts the glycan-binding specificity of a protein, given its ESMC representation; from Lundstrom et al., 2021
- LectinOracle_flex: works like LectinOracle but directly accepts the raw protein sequence as input (no ESMC representation required)
- SweetNet: a graph convolutional neural network trained to predict species from glycans; it can also be used to generate learned glycan representations; from Burkholz et al., 2021
- NSequonPred: given the ESM-1b representation of an N-sequon (+/- 20 AA), predicts whether the sequon will be glycosylated
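NSequonPred scores N-sequons, i.e. Asn-X-Ser/Thr motifs in which X is any amino acid except proline. The sketch below illustrates only the input-preparation step implied by the model description: locating sequons in a protein sequence and cutting out 20 residues on either side. Centering the window on the Asn and clipping it at the protein termini are assumptions, `sequon_windows` is a hypothetical helper rather than part of glycowork, and computing the ESM-1b representation that the model actually consumes is not shown.

``` python
import re

# N-sequon: N, then any residue except P, then S or T.
# The zero-width lookahead reports overlapping sequons (e.g. both in "NNST").
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_windows(protein, flank=20):
    """Hypothetical helper: return (asn_index, window) pairs, where the window
    spans `flank` residues on each side of the sequon Asn, clipped at the ends."""
    protein = protein.upper()
    windows = []
    for m in SEQUON.finditer(protein):
        pos = m.start()
        start = max(0, pos - flank)
        end = min(len(protein), pos + flank + 1)
        windows.append((pos, protein[start:end]))
    return windows


# "NPT" at index 6 is skipped because X is a proline.
print(sequon_windows("MANGSPNPTQNAT", flank=2))  # [(2, 'MANGS'), (10, 'TQNAT')]
```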
How to use
Glycowork currently contains four main modules:
- glycan_data: stores several glycan datasets and contains helper functions
- ml: functions for training and using machine learning models, including train-test splits, obtaining glycan representations, etc.
- motif: functions for processing and drawing glycan sequences, identifying motifs and features, and analyzing them
- network: functions for constructing and analyzing glycan networks (e.g., biosynthetic networks)
Below are some examples of what you can do with glycowork; be sure to check out the full documentation for everything else. A non-exhaustive list includes:
- using trained AI models for prediction
- training your own AI models
- motif enrichment analyses
- differential glycomics expression analysis
- annotating motifs in glycans
- drawing publication-quality glycan figures
- finding out whether and where glycans describe the same sequence
- m/z to composition to structure to motif mappings
- mass calculation
- visualizing motif distributions, glycan similarities, and sequence properties
- constructing and analyzing biosynthetic networks
``` python
# drawing publication-quality glycan figures
from glycowork import GlycoDraw

GlycoDraw("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)][GlcNAc(b1-4)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
          highlight_motif = "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc")
```
``` python
# get motifs, graph features, and sequence features of a set of glycan sequences to train models or analyze glycan properties
from glycowork.motif.annotate import annotate_dataset

glycans = [
    # IUPAC-condensed
    "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc",
    # LinearCode
    "Ma3(Ma6)Mb4GNb4GN;N",
    # IUPAC-extended
    "α-D-Manp-(1→3)[α-D-Manp-(1→6)]-β-D-Manp-(1→4)-β-D-GlcpNAc-(1→4)-β-D-GlcpNAc-(1→",
    # Oxford
    "F(3)XA2",
    # WURCS
    "WURCS=2.0/5,11,10/[a2122h-1b1-52*NCC/3=O][a1122h-1b1-5][a1122h-1a1-5][a2112h-1b1-5][a1221m-1a1-5]/1-1-2-3-1-4-3-1-4-5-5/a4-b1a6-k1b4-c1c3-d1c6-g1d2-e1e4-f1g2-h1h4-i1i2-j1",
    # GlycoCT (line-based format)
    """RES
1b:b-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5
4s:n-acetyl
5b:b-dman-HEX-1:5
6b:a-dman-HEX-1:5
7b:b-dglc-HEX-1:5
8s:n-acetyl
9b:b-dgal-HEX-1:5
10s:sulfate
11s:n-acetyl
12b:a-dman-HEX-1:5
13b:b-dglc-HEX-1:5
14s:n-acetyl
15b:b-dgal-HEX-1:5
16s:n-acetyl
LIN
1:1d(2+1)2n
2:1o(4+1)3d
3:3d(2+1)4n
4:3o(4+1)5d
5:5o(3+1)6d
6:6o(2+1)7d
7:7d(2+1)8n
8:7o(4+1)9d
9:9o(-1+1)10n
10:9d(2+1)11n
11:5o(6+1)12d
12:12o(2+1)13d
13:13d(2+1)14n
14:13o(4+1)15d
15:15d(2+1)16n""",
]
# out is a table with one row per glycan and one column per feature
out = annotate_dataset(glycans, feature_set = ['known', 'terminal', 'exhaustive'], condense = True)
```
| | InternalLewisX | InternalLewisA | Hantigentype2 | Chitobiose | Trimannosylcore | TerminalLacNActype1 | InternalLacNActype2 | TerminalLacNActype2 | TerminalLacdiNActype2 | corefucose | corefucose(a1-3) | Fuc | Gal | GalNAc | GalNAcOS | GlcNAc | Man | Neu5Ac | Xyl | Fuc(a1-2)Gal | Fuc(a1-3)GlcNAc | Fuc(a1-4)GlcNAc | Fuc(a1-6)GlcNAc | Fuc(a1-?)GlcNAc | Gal(b1-3)GlcNAc | Gal(b1-4)GlcNAc | Gal(b1-?)GlcNAc | GalNAc(b1-4)GlcNAc | GalNAcOS(b1-4)GlcNAc | GlcNAc(b1-2)Man | GlcNAc(b1-4)GlcNAc | Man(a1-3)Man | Man(a1-6)Man | Man(a1-?)Man | Man(b1-4)GlcNAc | Neu5Ac(a2-3)Gal | Xyl(b1-2)Man | TerminalFuc(a1-3) | TerminalMan(a1-3) | TerminalMan(a1-6) | TerminalGalNAcOS(b1-4) | TerminalFuc(a1-4) | TerminalGal(b1-4) | TerminalGalNAc(b1-4) | TerminalXyl(b1-2) | TerminalGal(b1-3) | TerminalGlcNAc(b1-2) | TerminalFuc(a1-6) | TerminalNeu5Ac(a2-3) | TerminalFuc(a1-2) | TerminalFuc(a1-?) | TerminalMan(a1-?) | TerminalGal(b1-?) |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 3 | 2 | 0 | 0 | 4 | 3 | 1 | 0 | 0 | 1 | 1 | 1 | 3 | 1 | 1 | 2 | 0 | 0 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 1 |
| Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][Xyl(b1-2)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 4 | 3 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 1 | 2 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 |
| Fuc(a1-2)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)[Gal(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 2 | 2 | 0 | 0 | 4 | 3 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 2 | 2 | 0 | 0 | 2 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 0 | 1 |
| GalNAcOS(b1-4)GlcNAc(b1-2)Man(a1-3)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
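Some columns of the annotation table, namely the plain monosaccharide counts (Fuc, Gal, GlcNAc, Man, Neu5Ac, ...) and the Terminal* features, can be approximated directly from an IUPAC-condensed string. The stdlib sketch below assumes simple alphanumeric residue names and ignores the other notations and modifications that annotate_dataset handles; `monosaccharide_counts` and `terminal_residues` are hypothetical helpers. A residue is treated as terminal (a non-reducing end) when nothing precedes it within its branch, i.e. it opens the string or directly follows an opening bracket.

``` python
import re
from collections import Counter

# Hypothetical tokenizer: '[' / ']' or a residue name with an optional linkage.
TOKEN = re.compile(r"(\[|\])|([A-Za-z0-9]+)(?:\(([^()]*)\))?")


def tokenize(glycan):
    return [m.group(1) or (m.group(2), m.group(3)) for m in TOKEN.finditer(glycan)]


def monosaccharide_counts(glycan):
    """Hypothetical helper: count each monosaccharide name in the glycan."""
    return Counter(tok[0] for tok in tokenize(glycan) if isinstance(tok, tuple))


def terminal_residues(glycan):
    """Hypothetical helper: non-reducing-end residues with linkage, left to right."""
    toks = tokenize(glycan)
    terminals = []
    for i, tok in enumerate(toks):
        # Terminal: the very first residue, or the first residue inside a branch.
        if isinstance(tok, tuple) and (i == 0 or toks[i - 1] == "["):
            name, linkage = tok
            terminals.append(f"{name}({linkage})" if linkage else name)
    return terminals


g = ("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Gal(b1-3)[Fuc(a1-4)]"
     "GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc")
print(dict(monosaccharide_counts(g)))
# {'Neu5Ac': 1, 'Gal': 2, 'Fuc': 3, 'GlcNAc': 4, 'Man': 3}
print(terminal_residues(g))
# ['Neu5Ac(a2-3)', 'Fuc(a1-3)', 'Gal(b1-3)', 'Fuc(a1-4)', 'Fuc(a1-6)']
```

For this glycan, the counts and terminal residues agree with the first row of the table (e.g., Fuc 3, GlcNAc 4, TerminalFuc(a1-?) 3).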
``` python
# using graphs, you can easily check whether a glycan contains a specific motif; how about internal Lewis A/X motifs?
from glycowork.motif.graph import subgraph_isomorphism

# termini_list sets, for each motif monosaccharide, whether it must be terminal, internal, or flexible (either)
print(subgraph_isomorphism('Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
                           'Fuc(a1-?)[Gal(b1-?)]GlcNAc', termini_list = ['terminal', 'internal', 'flexible']))
print(subgraph_isomorphism('Neu5Ac(a2-3)Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
                           'Fuc(a1-3/4)[Gal(b1-3/4)]GlcNAc', termini_list = ['t', 'i', 'f']))
# False: the Lewis A motif in this glycan is terminal, not internal
print(subgraph_isomorphism('Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc',
                           'dHex(a1-?)[Hex(b1-?)]GlcNAc', termini_list = ['t', 'i', 'f']))

# or you could find the terminal epitopes of a glycan
from glycowork.motif.annotate import get_terminal_structures
print("\nTerminal structures:")
print(get_terminal_structures('Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc'))
```
True
True
False
Terminal structures:
['Man(a1-3)', 'Man(a1-6)', 'Fuc(a1-6)']
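The motifs in the subgraph_isomorphism calls above use ambiguous linkages: `a1-?` leaves the attachment position open, and `b1-3/4` allows either position 3 or 4. One plausible way to decide whether such a motif linkage is compatible with a concrete glycan linkage is sketched below. `linkage_matches` is a hypothetical helper, and treating `?` as a wildcard only on the motif side (so an uncertain glycan linkage never satisfies a specific motif linkage) is an assumption, not necessarily glycowork's rule.

``` python
def parse_linkage(linkage):
    """Hypothetical helper: split e.g. 'b1-3/4' into ('b', '1', {'3', '4'})."""
    donor_part, acceptor_part = linkage.split("-")
    anomer, donor = donor_part[0], donor_part[1:]
    return anomer, donor, set(acceptor_part.split("/"))


def linkage_matches(motif_linkage, glycan_linkage):
    """Hypothetical helper: True if the (possibly ambiguous) motif linkage
    covers the glycan linkage; '?' is a wildcard on the motif side only."""
    m_anomer, m_donor, m_acceptors = parse_linkage(motif_linkage)
    g_anomer, g_donor, g_acceptors = parse_linkage(glycan_linkage)
    anomer_ok = m_anomer == "?" or m_anomer == g_anomer
    donor_ok = m_donor == "?" or m_donor == g_donor
    # Every position the glycan linkage could occupy must be allowed by the motif.
    acceptor_ok = "?" in m_acceptors or g_acceptors <= m_acceptors
    return anomer_ok and donor_ok and acceptor_ok


for motif, glycan in [("a1-?", "a1-3"), ("b1-3/4", "b1-4"), ("b1-3/4", "b1-6"), ("a1-3", "b1-3")]:
    print(motif, glycan, linkage_matches(motif, glycan))
# a1-? a1-3 True
# b1-3/4 b1-4 True
# b1-3/4 b1-6 False
# a1-3 b1-3 False
```

Requiring the glycan's possible positions to be a subset of the motif's keeps the check conservative: an ambiguous glycan linkage such as `b1-3/4` does not match a motif that demands exactly `b1-3`.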
``` python
# given a composition, find matching glycan structures in SugarBase; specific for glycan classes and taxonomy
from glycowork.motif.tokenization import compositions_to_structures
print(compositions_to_structures([{'Hex':3, 'HexNAc':4}], glycan_class = 'N'))

# or we could calculate the mass of this composition
from glycowork.motif.tokenization import composition_to_mass
print("\nMass of the composition Hex3HexNAc4")
print(composition_to_mass({'Hex':3, 'HexNAc':4}))
print(composition_to_mass("H3N4"))
print(composition_to_mass("Hex3HexNAc4"))
```
0 compositions could not be matched. Run with verbose = True to see which compositions.
glycan abundance
0 GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Ma... 0
1 GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-4)][Man(a1-6)]... 0
2 GlcNAc(b1-2)[GlcNAc(b1-4)]Man(a1-3)[Man(a1-6)]... 0
3 GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]Ma... 0
4 GalNAc(b1-3/4)GlcNAc(b1-2)Man(a1-3)[Man(a1-6)]... 0
5 GlcNAc(b1-2)Man(a1-6)[Man(a1-3)][GlcNAc(b1-4)]... 0
6 GlcNAc(b1-2)Man(a1-3/6)[GlcNAc(b1-4)][Man(a1-3... 0
7 Man(a1-3)[GlcNAc(b1-2)Man(a1-6)][GlcNAc(b1-4)]... 0
8 GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-6)Man(a1-6)]Ma... 0
9 GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Ma... 0
10 GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-4)Man(a1-6)]Ma... 0
11 GlcNAc(b1-2)Man(a1-3)[GlcNAc(b1-2)[GlcNAc(b1-4... 0
12 GlcNAc(b1-4)Man(a1-3)[GlcNAc(b1-6)Man(a1-6)]Ma... 0
13 Man(a1-3)[GlcNAc(b1-2)[GlcNAc(b1-6)]Man(a1-6)]... 0
14 GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)[Man(a1-3)]Ma... 0
Mass of the composition Hex3HexNAc4
1316.4865545999999
1316.4865545999999
1316.4865545999999
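The mass printed for Hex3HexNAc4 is the monoisotopic mass of the free glycan: the summed masses of the dehydrated monosaccharide residues plus one water for the reducing end. The stdlib sketch below reproduces that arithmetic, deriving residue masses from their elemental formulas (Hex C6H10O5, HexNAc C8H13NO5, dHex C6H10O4, Neu5Ac C11H17NO8). `composition_mass` and `parse_composition` are hypothetical helpers, and the one-letter codes cover only H and N, as in the example above. The result (about 1316.48653) agrees with the composition_to_mass output to within 0.0001; the small remaining difference presumably comes from the mass constants each uses.

``` python
import re

# Monoisotopic masses of the elements involved (unified atomic mass units).
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Elemental formulas of monosaccharide residues (each minus one water).
RESIDUE_FORMULA = {
    "Hex": {"C": 6, "H": 10, "O": 5},
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},
    "dHex": {"C": 6, "H": 10, "O": 4},
    "Neu5Ac": {"C": 11, "H": 17, "N": 1, "O": 8},
}
WATER = {"H": 2, "O": 1}
SHORT_NAMES = {"H": "Hex", "N": "HexNAc"}  # hypothetical subset of shorthand codes

# Longest names first so that 'HexNAc4' is not read as 'Hex' followed by garbage.
_NAMES = sorted([*RESIDUE_FORMULA, *SHORT_NAMES], key=len, reverse=True)
_COMPONENT = re.compile("(" + "|".join(map(re.escape, _NAMES)) + r")(\d+)")


def formula_mass(formula):
    return sum(ELEMENT_MASS[element] * n for element, n in formula.items())


def parse_composition(text):
    """Hypothetical helper: parse 'Hex3HexNAc4' or 'H3N4' into {'Hex': 3, 'HexNAc': 4}."""
    composition, pos = {}, 0
    for m in _COMPONENT.finditer(text):
        if m.start() != pos:  # unparseable characters before this component
            raise ValueError(f"cannot parse composition {text!r}")
        name = SHORT_NAMES.get(m.group(1), m.group(1))
        composition[name] = composition.get(name, 0) + int(m.group(2))
        pos = m.end()
    if pos != len(text):  # unparseable characters at the end
        raise ValueError(f"cannot parse composition {text!r}")
    return composition


def composition_mass(composition):
    """Hypothetical helper: monoisotopic mass of a free glycan (residues + one water)."""
    if isinstance(composition, str):
        composition = parse_composition(composition)
    mass = formula_mass(WATER)
    for name, count in composition.items():
        mass += count * formula_mass(RESIDUE_FORMULA[name])
    return mass


for comp in ({"Hex": 3, "HexNAc": 4}, "H3N4", "Hex3HexNAc4"):
    print(round(composition_mass(comp), 5))  # 1316.48653 each time
```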
Owner
- Name: BojarLab
- Login: BojarLab
- Kind: organization
- Email: daniel.bojar@gu.se
- Location: Gothenburg, Sweden
- Website: https://dbojar.com/bojar-lab/
- Twitter: daniel_bojar
- Repositories: 4
- Profile: https://github.com/BojarLab
Machine Learning in Glycobiology and Systems Biology
Citation (CITATION.bib)
@article{thomes2021glycowork,
title={Glycowork: A Python package for glycan data science and machine learning},
author={Thom{\`e}s, Luc and Burkholz, Rebekka and Bojar, Daniel},
journal={Glycobiology},
volume={31},
number={10},
pages={1240--1244},
year={2021},
month={11},
publisher={Oxford University Press},
doi={10.1093/glycob/cwab067},
pmid={34192308},
pmcid={PMC8600276}
}
GitHub Events
Total
- Create event: 11
- Release event: 4
- Issues event: 8
- Watch event: 15
- Delete event: 5
- Issue comment event: 48
- Push event: 190
- Pull request review event: 4
- Pull request event: 46
- Fork event: 3
Last Year
- Create event: 11
- Release event: 4
- Issues event: 8
- Watch event: 15
- Delete event: 5
- Issue comment event: 48
- Push event: 190
- Pull request review event: 4
- Pull request event: 46
- Fork event: 3
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Bribak | d****l@b****t | 393 |
| lundstrm | j****m@g****m | 48 |
| lthomes | l****s@e****r | 10 |
| Kathryn | k****h@r****m | 8 |
| Rebekka Burkholz | r****z@g****m | 1 |
| viktoriakarlsson | 9****n | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 32
- Total pull requests: 69
- Average time to close issues: 8 days
- Average time to close pull requests: 1 day
- Total issue authors: 17
- Total pull request authors: 12
- Average comments per issue: 3.19
- Average comments per pull request: 0.83
- Merged pull requests: 51
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 36
- Average time to close issues: 4 days
- Average time to close pull requests: about 4 hours
- Issue authors: 6
- Pull request authors: 6
- Average comments per issue: 2.5
- Average comments per pull request: 1.06
- Merged pull requests: 28
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mattias-erhardsson (10)
- TheLostLambda (3)
- fubin1999 (2)
- Old-Shatterhand (2)
- dtchang (2)
- edwardsnj (2)
- Qinulinuli (2)
- peterthorpe5 (1)
- JacobPFrick (1)
- benwest-york (1)
- klarich (1)
- Ojas-Singh (1)
- ilsenatorov (1)
- tomdstanton (1)
- mobiusklein (1)
Pull Request Authors
- Bribak (29)
- Glycocalex (15)
- Old-Shatterhand (12)
- urbj (10)
- Alex-RW-Bennett (5)
- lthomes (4)
- mattias-erhardsson (4)
- viktoriakarlsson (2)
- TheLostLambda (2)
- cthoyt (1)
- DelusionalSimon (1)
- klarich (1)
Packages
- Total packages: 1
Total downloads:
- pypi 769 last-month
- Total dependent packages: 1
- Total dependent repositories: 1
- Total versions: 20
- Total maintainers: 1
pypi.org: glycowork
Package for processing and analyzing glycans
- Homepage: https://github.com/BojarLab/glycowork
- Documentation: https://glycowork.readthedocs.io/
- License: mit
Latest release: 1.6.3
published 7 months ago
Maintainers (1)
Dependencies
- mpld3 *
- networkx *
- pandas *
- regex *
- requests *
- scipy *
- seaborn *
- sklearn *
- statsmodels *
- torch *
- xgboost *
- fastai/workflows/quarto-ghp master composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
