CAZy-parser: a way to extract information from the Carbohydrate-Active enZYmes Database - Published in JOSS (2016)
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to ncbi.nlm.nih.gov, joss.theoj.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.4%) to scientific vocabulary
Repository
A way to extract specific information from CAZy
Statistics
- Stars: 13
- Watchers: 3
- Forks: 8
- Open Issues: 1
- Releases: 9
README.md
cazy-parser
A way to extract specific information from the Carbohydrate-Active enZYmes Database (CAZy).
Make sure to visit and cite the CAZy website!
- http://www.cazy.org/
- Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The Carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495. [PMID: 24270786].
License: GNU GPLv3
doi: 10.21105/joss.00053
Recommendation
cazy_webscraper provides broader functionality by integrating CAZy data with external resources (protein sequences, taxonomy, and structures) to build comprehensive local databases for large-scale analyses. It also includes additional features for taxonomic exploration and structural analysis that aren't available here.
Introduction
cazy-parser is a tool that extracts information from CAZy in a more usable and readable format. First, a script reads the HTML structure and creates a mirror of the database as a tab-delimited file. Second, information is extracted from that mirror according to user-supplied parameters and presented to the user as a set of accession codes.
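The two steps described above can be illustrated with a minimal sketch. It is not the project's actual code: the HTML fragment, the column names (`family`, `subfamily`, `organism`, `accession`), and the helper names `html_to_tsv` and `select_accessions` are all hypothetical, and the real CAZy page layout is likely different. The sketch only shows the general idea of turning an HTML table into tab-delimited text and then filtering it down to accession codes.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical HTML fragment; the real CAZy page layout is not reproduced here.
SAMPLE_HTML = """
<table>
  <tr><th>family</th><th>subfamily</th><th>organism</th><th>accession</th></tr>
  <tr><td>43</td><td>1</td><td>Organism A</td><td>ACC0001</td></tr>
  <tr><td>43</td><td>2</td><td>Organism B</td><td>ACC0002</td></tr>
  <tr><td>5</td><td>1</td><td>Organism C</td><td>ACC0003</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_to_tsv(html):
    """Step 1: mirror the HTML table as tab-delimited text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


def select_accessions(tsv_text, family, subfamily=None):
    """Step 2: return the accession codes matching the requested filters."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["accession"]
        for row in reader
        if row["family"] == family
        and (subfamily is None or row["subfamily"] == subfamily)
    ]


tsv = html_to_tsv(SAMPLE_HTML)
print(select_accessions(tsv, family="43", subfamily="1"))
```

The sketch uses only the standard library so that it runs anywhere; the package itself lists beautifulsoup4 among its dependencies.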
Install / Upgrade
```text
pip install --upgrade cazy-parser
```
Usage (internet connection required)
```text
cazy-parser -h
usage: cazy-parser [-h] [-f FAMILY] [-s SUBFAMILY] [-c CHARACTERIZED] [-v] {GH,GT,PL,CA,AA}

positional arguments:
  {GH,GT,PL,CA,AA}

optional arguments:
  -h, --help            show this help message and exit
  -f FAMILY, --family FAMILY
  -s SUBFAMILY, --subfamily SUBFAMILY
  -c CHARACTERIZED, --characterized CHARACTERIZED
  -v, --version         show version
```
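For readers who want to see how an interface like this is typically declared, here is a small `argparse` sketch that mirrors the help text above. It is an illustration, not the project's source: the destination name `enzyme_class`, the placeholder version string, and leaving every option as a plain string are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help output."""
    parser = argparse.ArgumentParser(prog="cazy-parser")
    # Positional argument restricted to the classes shown in the usage line.
    # The name "enzyme_class" is an assumption made for this sketch.
    parser.add_argument("enzyme_class", choices=["GH", "GT", "PL", "CA", "AA"])
    parser.add_argument("-f", "--family")
    parser.add_argument("-s", "--subfamily")
    parser.add_argument("-c", "--characterized")
    parser.add_argument(
        "-v",
        "--version",
        action="version",
        version="%(prog)s (placeholder version)",  # placeholder, not the real one
        help="show version",
    )
    return parser


# Same arguments as in the example invocation: cazy-parser GH -f 43 -s 1
args = build_parser().parse_args(["GH", "-f", "43", "-s", "1"])
print(args.enzyme_class, args.family, args.subfamily)
```

Options that are not given on the command line default to `None`, which makes it easy to treat family and subfamily as optional filters.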
Example
Extract all FASTA sequences from subfamily 1 of Glycoside Hydrolase family 43:
```text
$ cazy-parser GH -f 43 -s 1
[2022-05-26 16:39:21,511 91 INFO] ------------------------------------------
[2022-05-26 16:39:21,511 92 INFO]
[2022-05-26 16:39:21,511 93 INFO] ┌─┐┌─┐┌─┐┬ ┬ ┌─┐┌─┐┬─┐┌─┐┌─┐┬─┐
[2022-05-26 16:39:21,511 94 INFO] │ ├─┤┌─┘└┬┘───├─┘├─┤├┬┘└─┐├┤ ├┬┘
[2022-05-26 16:39:21,511 95 INFO] └─┘┴ ┴└─┘ ┴ ┴ ┴ ┴┴└─└─┘└─┘┴└─ v2.0.1
[2022-05-26 16:39:21,511 96 INFO]
[2022-05-26 16:39:21,511 97 INFO] ------------------------------------------
[2022-05-26 16:39:21,511 183 INFO] Fetching links for Glycoside-Hydrolases, url: http://www.cazy.org/Glycoside-Hydrolases.html
[2022-05-26 16:39:22,454 189 INFO] Only using links of family 43 subfamily 1
[2022-05-26 16:39:23,029 26 INFO] Dowloading 1415 fasta sequences...
[2022-05-26 16:40:32,187 51 INFO] Dumping fasta sequences to file GH43_1_26052022.fasta
```
This generates the file GH43_1_DDMMYYYY.fasta, which contains the FASTA sequences.
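The naming pattern can be read off the log line above, where a run on 2022-05-26 wrote `GH43_1_26052022.fasta`. The sketch below reproduces that pattern so the expected file name can be predicted in a script. The helper `output_name` is hypothetical and only covers the case shown in the example (class, family, and subfamily all given); how the tool names files in other cases is not stated here.

```python
from datetime import date


def output_name(enzyme_class, family, subfamily, day):
    """Rebuild the <CLASS><FAMILY>_<SUBFAMILY>_<DDMMYYYY>.fasta pattern.

    Hypothetical helper inferred from the example log line, not taken
    from the cazy-parser source.
    """
    return f"{enzyme_class}{family}_{subfamily}_{day.strftime('%d%m%Y')}.fasta"


# Date taken from the example log output above.
print(output_name("GH", 43, 1, date(2022, 5, 26)))  # GH43_1_26052022.fasta
```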
To-do and how to contribute
Please refer to CONTRIBUTING 🤓
Owner
- Name: Rodrigo V Honorato
- Login: rvhonorato
- Kind: user
- Location: Utrecht, Netherlands
- Company: @UtrechtUniversity
- Repositories: 7
- Profile: https://github.com/rvhonorato
Biologist with a PhD in Bioinformatics working as Research Software and Full Stack developer with a bit of DevOps (:
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Rodrigo V Honorato | r****o@u****l | 22 |
| rodrigo honorato | r****s | 18 |
| rodrigo | r****s@i****r | 18 |
| dependabot[bot] | 4****] | 10 |
| Arfon Smith | a****n | 2 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 12
- Total pull requests: 31
- Average time to close issues: 6 months
- Average time to close pull requests: 4 days
- Total issue authors: 9
- Total pull request authors: 3
- Average comments per issue: 1.5
- Average comments per pull request: 0.35
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 15
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- rvhonorato (4)
- phiweger (1)
- lonsbio (1)
- geo80 (1)
- bj600800 (1)
- BinhongLiu (1)
- jkahn (1)
- jtamames (1)
- Geize (1)
Pull Request Authors
- rvhonorato (14)
- dependabot[bot] (14)
- arfon (2)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 97 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 11
- Total maintainers: 1
pypi.org: cazy-parser
A way to extract specific information from CAZy
- Documentation: https://cazy-parser.readthedocs.io/
- License: GPLv3
- Latest release: 2.0.3 (published about 2 years ago)
Dependencies
- beautifulsoup4 ==4.11.1
- biopython ==1.79
- certifi ==2022.5.18.1
- progressbar2 ==4.0.0
- python-utils ==3.2.3
- requests ==2.27.1
- soupsieve ==2.3.2.post1
- actions/checkout v2 composite
- actions/setup-python v2.2.2 composite
- actions/checkout v2 composite
- actions/setup-python v2.2.2 composite
- codacy/codacy-coverage-reporter-action v1 composite