chemdataextractor
Automatically extract chemical information from scientific documents
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.6%) to scientific vocabulary
Keywords
chemistry
information-extraction
natural-language-processing
nlp
python
text-mining
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: mcs07
- License: mit
- Language: Python
- Default Branch: master
- Homepage: http://chemdataextractor.org
- Size: 542 KB
Statistics
- Stars: 333
- Watchers: 17
- Forks: 120
- Open Issues: 23
- Releases: 7
Created over 9 years ago · Last pushed over 2 years ago
Metadata Files
Readme
Changelog
Contributing
License
README.rst
ChemDataExtractor
=================
.. image:: http://img.shields.io/pypi/v/ChemDataExtractor.svg?style=flat-square
    :target: https://pypi.python.org/pypi/ChemDataExtractor

.. image:: http://img.shields.io/pypi/l/ChemDataExtractor.svg?style=flat-square
    :target: https://github.com/mcs07/ChemDataExtractor/blob/master/LICENSE

.. image:: http://img.shields.io/travis/mcs07/ChemDataExtractor.svg?style=flat-square
    :target: https://travis-ci.org/mcs07/ChemDataExtractor

ChemDataExtractor is a toolkit for extracting chemical information from the scientific literature.
Features
--------
- HTML, XML and PDF document readers
- Chemistry-aware natural language processing pipeline
- Chemical named entity recognition
- Rule-based parsing grammars for property and spectra extraction
- Table parser for extracting tabulated data
- Document processing to resolve data interdependencies
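
To make the idea of rule-based property extraction concrete, here is a minimal sketch that uses only the Python standard library. It is not ChemDataExtractor's grammar or API: the pattern ``MELTING_POINT`` and the helper ``extract_melting_points`` are hypothetical names introduced for illustration, and the rule covers only simple phrasings such as "m.p. 182-184 °C".

```python
import re

# Hypothetical rule, for illustration only (not ChemDataExtractor's grammar).
# It matches a melting point cue, a number or a hyphenated range, and a unit.
MELTING_POINT = re.compile(
    r"\b(?:m\.?p\.?|melting point)\s*(?:of|:|=)?\s*"
    r"(?P<value>\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?)\s*"
    r"°?\s*(?P<units>[CK])\b",
    re.IGNORECASE,
)


def extract_melting_points(text):
    """Return a list of {'value': str, 'units': str} dicts found in text."""
    results = []
    for match in MELTING_POINT.finditer(text):
        # Drop whitespace inside ranges such as "182 - 184".
        value = re.sub(r"\s+", "", match.group("value"))
        results.append({"value": value, "units": match.group("units").upper()})
    return results


if __name__ == "__main__":
    sentence = "The product was obtained as a white solid, m.p. 182-184 °C."
    print(extract_melting_points(sentence))
```

A single regular expression like this breaks down quickly on real papers, which is presumably why the toolkit combines parsing grammars with a chemistry-aware NLP pipeline and named entity recognition rather than relying on isolated patterns.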
Installation
------------
To install ChemDataExtractor, simply run::

    pip install chemdataextractor

Or if you are an Anaconda user, run::

    conda install -c chemdataextractor chemdataextractor

Alternatively, try one of the other `installation options`_.
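
After installing, one way to confirm that the package is visible to the current Python environment is to query its distribution metadata. The sketch below assumes Python 3.8 or later (for ``importlib.metadata``); the helper name ``installed_version`` is hypothetical and not part of ChemDataExtractor.

```python
from importlib import metadata


def installed_version(distribution="chemdataextractor"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "chemdataextractor is not installed")
```

This checks only that the distribution is installed; it does not verify that any additional data files the toolkit may need are present.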
Documentation
-------------
Full documentation is available at http://chemdataextractor.org/docs
License
-------
ChemDataExtractor is licensed under the `MIT license`_, a permissive, business-friendly license for open source
software.
.. _`installation options`: http://chemdataextractor.org/docs/install
.. _`MIT license`: https://github.com/mcs07/ChemDataExtractor/blob/master/LICENSE
Owner
- Name: Matt Swain
- Login: mcs07
- Kind: user
- Location: New York, NY
- Company: @DEShawResearch
- Website: https://matt-swain.com
- Twitter: mattswain123
- Repositories: 76
- Profile: https://github.com/mcs07
Developing software for drug discovery
GitHub Events
Total
- Issues event: 1
- Watch event: 23
- Fork event: 5
Last Year
- Issues event: 1
- Watch event: 23
- Fork event: 5
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Matt Swain | m****n@m****m | 97 |
| roselyne@uchicago.edu | r****e@u****u | 1 |
Committer Domains (Top 20 + Academic)
uchicago.edu: 1
me.com: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 29
- Total pull requests: 16
- Average time to close issues: 2 months
- Average time to close pull requests: 11 days
- Total issue authors: 25
- Total pull request authors: 7
- Average comments per issue: 1.52
- Average comments per pull request: 0.56
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- gihanpanapitiya (2)
- ejmurray (2)
- chemlynx (2)
- dzhang228 (1)
- nnarahari-tech (1)
- sgbaird (1)
- user-agent-eng (1)
- giordan12 (1)
- sophiatabchouri (1)
- ravila4 (1)
- zmzeng (1)
- OlgaGKononova (1)
- scicontent (1)
- dan2097 (1)
- chrismattmann (1)
Pull Request Authors
- mcs07 (7)
- JeffersonH44 (4)
- pbulsink (1)
- ralic (1)
- kunlu-ou (1)
- rtchoua (1)
- rseragon (1)
Top Labels
Issue Labels
bug (5)
question (1)
Pull Request Labels
bug (5)
enhancement (4)
Packages
- Total packages: 3
- Total downloads: pypi 944 last-month
- Total dependent packages: 1 (may contain duplicates)
- Total dependent repositories: 15 (may contain duplicates)
- Total versions: 10
- Total maintainers: 2
pypi.org: chemdataextractor
A toolkit for extracting chemical information from the scientific literature.
- Homepage: https://github.com/mcs07/ChemDataExtractor
- Documentation: https://chemdataextractor.readthedocs.io/
- License: MIT
- Latest release: 1.3.0 (published about 9 years ago)
Rankings
Dependent repos count: 3.9%
Stargazers count: 3.9%
Forks count: 4.4%
Average: 6.3%
Downloads: 9.1%
Dependent packages count: 10.1%
Maintainers (1)
Last synced: 6 months ago
pypi.org: chemdataextractor-c
A toolkit for extracting chemical information from the scientific literature.
- Homepage: https://github.com/mcs07/ChemDataExtractor
- Documentation: https://chemdataextractor-c.readthedocs.io/
- License: MIT
- Latest release: 1.0.0 (published almost 3 years ago)
Rankings
Stargazers count: 4.0%
Forks count: 4.5%
Dependent packages count: 7.3%
Average: 14.1%
Dependent repos count: 40.8%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: chemdataextractor
- Homepage: http://chemdataextractor.org/
- License: MIT
- Latest release: 1.3.0 (published almost 8 years ago)
Rankings
Forks count: 18.1%
Average: 24.0%
Dependent repos count: 24.4%
Stargazers count: 24.5%
Dependent packages count: 29.0%
Last synced: 6 months ago
Dependencies
requirements/development.txt
pypi
- pytest >=3.0.6 development
- twine >=1.8.1 development
- wheel >=0.29.0 development
requirements/production.txt
pypi
- DAWG >=0.7.8
- PyYAML >=3.12
- appdirs >=1.4.0
- beautifulsoup4 >=4.5.3
- click >=6.7
- cssselect >=1.0.1
- lxml >=3.7.2
- nltk >=3.2.2
- pdfminer.six >=20160614
- python-crfsuite >=0.9.1
- python-dateutil >=2.6.0
- requests >=2.12.5
- six >=1.10.0
setup.py
pypi
- DAWG *
- PyYAML *
- appdirs *
- beautifulsoup4 *
- click *
- cssselect *
- lxml *
- nltk *
- pdfminer.six *
- python-crfsuite *
- python-dateutil *
- requests *
- six *