CAZy-parser a way to extract information from the Carbohydrate-Active enZYmes Database

CAZy-parser a way to extract information from the Carbohydrate-Active enZYmes Database - Published in JOSS (2016)

https://github.com/rvhonorato/cazy-parser

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary

Keywords

carbohydrates cazy data-mining enzymes scrapper text-mining

Keywords from Contributors

blackhole gravitational-lenses meshes pypi annotations simulations hydrology stellar exoplanets pde

Scientific Fields

Biology Life Sciences - 34% confidence
Last synced: 4 months ago

Repository

A way to extract specific information from CAZy

Basic Info
  • Host: GitHub
  • Owner: rvhonorato
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 123 KB
Statistics
  • Stars: 13
  • Watchers: 3
  • Forks: 8
  • Open Issues: 1
  • Releases: 9
Topics
carbohydrates cazy data-mining enzymes scrapper text-mining
Created over 9 years ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License Code of conduct

README.md

cazy-parser

A way to extract specific information from the Carbohydrate-Active enZYmes (CAZy) Database.

Make sure to visit and cite the CAZy website!

  • http://www.cazy.org/
  • Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The Carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495. [PMID: 24270786].

License: GNU GPLv3

RV Honorato. CAZy-parser a way to extract information from the Carbohydrate-Active enZYmes Database. The Journal of Open Source Software, 1(8), dec 2016.

doi: 10.21105/joss.00053

Recommendation

cazy_webscraper provides broader functionality by integrating CAZy data with external resources (protein sequences, taxonomy, and structures) to build comprehensive local databases for large-scale analyses. It also includes additional features for taxonomic exploration and structural analysis that aren't available here.

Introduction

cazy-parser is a tool that extracts information from CAZy in a more usable and readable format. First, a script reads the HTML structure and creates a mirror of the database as a tab-delimited file. Second, information is extracted from that mirror according to user-supplied parameters and presented to the user as a set of accession codes.
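The second step described above can be pictured with a minimal sketch. It is not the tool's actual code: the column names (`class`, `family`, `subfamily`, `accession`), the sample rows, and the helper `select_accessions` are all hypothetical, chosen only to show how a tab-delimited mirror could be filtered down to accession codes.

```python
import csv
import io

# Hypothetical tab-delimited mirror; column names and rows are invented for illustration.
MIRROR_TSV = (
    "class\tfamily\tsubfamily\taccession\n"
    "GH\t43\t1\tABC00001.1\n"
    "GH\t43\t2\tABC00002.1\n"
    "GH\t5\t\tABC00003.1\n"
    "GT\t43\t1\tABC00004.1\n"
)


def select_accessions(tsv_text, enzyme_class, family=None, subfamily=None):
    """Return the accession codes of rows that match the requested filters."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = []
    for row in reader:
        if row["class"] != enzyme_class:
            continue
        # A filter left as None is not applied.
        if family is not None and row["family"] != str(family):
            continue
        if subfamily is not None and row["subfamily"] != str(subfamily):
            continue
        selected.append(row["accession"])
    return selected


print(select_accessions(MIRROR_TSV, "GH", family=43, subfamily=1))
```

Leaving `family` or `subfamily` unset widens the selection, which loosely mirrors how the optional command-line flags narrow a query.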

Install / Upgrade

    pip install --upgrade cazy-parser

Usage (internet connection required)

```text
cazy-parser -h
usage: cazy-parser [-h] [-f FAMILY] [-s SUBFAMILY] [-c CHARACTERIZED] [-v] {GH,GT,PL,CA,AA}

positional arguments:
  {GH,GT,PL,CA,AA}

optional arguments:
  -h, --help            show this help message and exit
  -f FAMILY, --family FAMILY
  -s SUBFAMILY, --subfamily SUBFAMILY
  -c CHARACTERIZED, --characterized CHARACTERIZED
  -v, --version         show version
```
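For scripted use, the command line can be assembled from Python. The sketch below relies only on the flags and class choices listed in the help text above. The helper `build_command` is hypothetical, and the `-c` option is left out because its accepted values are not documented here.

```python
import shlex

# Class choices copied from the help text; nothing else is assumed about the CLI.
ENZYME_CLASSES = ("GH", "GT", "PL", "CA", "AA")


def build_command(enzyme_class, family=None, subfamily=None):
    """Build the argument list for a cazy-parser call (hypothetical helper)."""
    if enzyme_class not in ENZYME_CLASSES:
        raise ValueError(f"unknown enzyme class: {enzyme_class!r}")
    cmd = ["cazy-parser", enzyme_class]
    if family is not None:
        cmd += ["-f", str(family)]
    if subfamily is not None:
        cmd += ["-s", str(subfamily)]
    return cmd


# Show the command as it would be typed in a shell.
print(shlex.join(build_command("GH", family=43, subfamily=1)))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where cazy-parser is installed and an internet connection is available.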

Example

Extract all FASTA sequences from Glycoside Hydrolase family 43, subfamily 1:

    $ cazy-parser GH -f 43 -s 1
    [2022-05-26 16:39:21,511 91 INFO] ------------------------------------------
    [2022-05-26 16:39:21,511 92 INFO]
    [2022-05-26 16:39:21,511 93 INFO] ┌─┐┌─┐┌─┐┬ ┬ ┌─┐┌─┐┬─┐┌─┐┌─┐┬─┐
    [2022-05-26 16:39:21,511 94 INFO] │ ├─┤┌─┘└┬┘───├─┘├─┤├┬┘└─┐├┤ ├┬┘
    [2022-05-26 16:39:21,511 95 INFO] └─┘┴ ┴└─┘ ┴ ┴ ┴ ┴┴└─└─┘└─┘┴└─ v2.0.1
    [2022-05-26 16:39:21,511 96 INFO]
    [2022-05-26 16:39:21,511 97 INFO] ------------------------------------------
    [2022-05-26 16:39:21,511 183 INFO] Fetching links for Glycoside-Hydrolases, url: http://www.cazy.org/Glycoside-Hydrolases.html
    [2022-05-26 16:39:22,454 189 INFO] Only using links of family 43 subfamily 1
    [2022-05-26 16:39:23,029 26 INFO] Dowloading 1415 fasta sequences...
    [2022-05-26 16:40:32,187 51 INFO] Dumping fasta sequences to file GH43_1_26052022.fasta

This generates the file GH43_1_DDMMYYYY.fasta, which contains the FASTA sequences.
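The naming pattern and the file contents can be handled with the standard library alone. The sketch below is inferred from the example file name in the log (`GH43_1_26052022.fasta`), not taken from the tool's source. The helpers `output_filename` and `read_fasta` and the sample records are hypothetical.

```python
from datetime import date


def output_filename(enzyme_class, family, subfamily, day):
    """Reproduce the CLASSFAMILY_SUBFAMILY_DDMMYYYY.fasta pattern seen in the log."""
    return f"{enzyme_class}{family}_{subfamily}_{day.strftime('%d%m%Y')}.fasta"


def read_fasta(text):
    """Parse FASTA text into a dict mapping each header to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may wrap, so append to the current record.
            records[header] += line
    return records


# Invented records, used only to exercise the parser.
SAMPLE = ">seq1 example\nMKT\nAAL\n>seq2 example\nGGS\n"

print(output_filename("GH", 43, 1, date(2022, 5, 26)))
print(len(read_fasta(SAMPLE)))
```

Since biopython is already among the project's dependencies, `Bio.SeqIO.parse(path, "fasta")` is a natural alternative to the hand-written reader for real files.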

To-do and how to contribute

Please refer to CONTRIBUTING 🤓

Owner

  • Name: Rodrigo V Honorato
  • Login: rvhonorato
  • Kind: user
  • Location: Utrecht, Netherlands
  • Company: @UtrechtUniversity

Biologist with a PhD in Bioinformatics working as Research Software and Full Stack developer with a bit of DevOps (:

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 70
  • Total Committers: 5
  • Avg Commits per committer: 14.0
  • Development Distribution Score (DDS): 0.686
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Rodrigo V Honorato r****o@u****l 22
rodrigo honorato r****s 18
rodrigo r****s@i****r 18
dependabot[bot] 4****] 10
Arfon Smith a****n 2

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 12
  • Total pull requests: 31
  • Average time to close issues: 6 months
  • Average time to close pull requests: 4 days
  • Total issue authors: 9
  • Total pull request authors: 3
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.35
  • Merged pull requests: 26
  • Bot issues: 0
  • Bot pull requests: 15
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • rvhonorato (4)
  • phiweger (1)
  • lonsbio (1)
  • geo80 (1)
  • bj600800 (1)
  • BinhongLiu (1)
  • jkahn (1)
  • jtamames (1)
  • Geize (1)
Pull Request Authors
  • rvhonorato (14)
  • dependabot[bot] (14)
  • arfon (2)
Top Labels
Issue Labels
bug (5) enhancement (3) invalid (1) question (1) support (1) testing (1) help wanted (1)
Pull Request Labels
dependencies (14) testing (3) bug (3) enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 97 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 11
  • Total maintainers: 1
pypi.org: cazy-parser

A way to extract specific information from CAZy

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 97 Last month
Rankings
Dependent packages count: 10.1%
Downloads: 28.1%
Average: 35.1%
Dependent repos count: 67.1%
Maintainers (1)
Last synced: 4 months ago

Dependencies

requirements.txt pypi
  • beautifulsoup4 ==4.11.1
  • biopython ==1.79
  • certifi ==2022.5.18.1
  • progressbar2 ==4.0.0
  • python-utils ==3.2.3
  • requests ==2.27.1
  • soupsieve ==2.3.2.post1
.github/workflows/lint.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2.2.2 composite
.github/workflows/unittests.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2.2.2 composite
  • codacy/codacy-coverage-reporter-action v1 composite