chemdataextractor

Automatically extract chemical information from scientific documents

https://github.com/mcs07/chemdataextractor

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary

Keywords

chemistry information-extraction natural-language-processing nlp python text-mining
Last synced: 6 months ago

Repository

Statistics
  • Stars: 333
  • Watchers: 17
  • Forks: 120
  • Open Issues: 23
  • Releases: 7
Created over 9 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog Contributing License

README.rst

ChemDataExtractor
=================

.. image:: http://img.shields.io/pypi/v/ChemDataExtractor.svg?style=flat-square
    :target: https://pypi.python.org/pypi/ChemDataExtractor

.. image:: http://img.shields.io/pypi/l/ChemDataExtractor.svg?style=flat-square
    :target: https://github.com/mcs07/ChemDataExtractor/blob/master/LICENSE

.. image:: http://img.shields.io/travis/mcs07/ChemDataExtractor.svg?style=flat-square
    :target: https://travis-ci.org/mcs07/ChemDataExtractor

ChemDataExtractor is a toolkit for extracting chemical information from the scientific literature.


Features
--------

- HTML, XML and PDF document readers
- Chemistry-aware natural language processing pipeline
- Chemical named entity recognition
- Rule-based parsing grammars for property and spectra extraction
- Table parser for extracting tabulated data
- Document processing to resolve data interdependencies
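As a rough illustration of the named entity recognition feature, the sketch below collects the chemical mentions found in a sentence. It assumes the ``Document`` class and its ``cems`` attribute from the package's documented quick-start, and that the trained models have already been fetched (for example with ``cde data download``). The helper name ``chemical_mentions`` and the sample sentence are hypothetical, introduced only for this example.

```python
def chemical_mentions(text):
    """Return the chemical mention strings found in `text`.

    Hypothetical helper: returns None when ChemDataExtractor is not
    importable, so the sketch degrades gracefully on machines without it.
    """
    try:
        # Imported lazily so the function can be defined without the package.
        from chemdataextractor import Document
    except ImportError:
        return None

    # A plain string passed to Document is treated as a paragraph of text.
    doc = Document(text)

    # `cems` holds the chemical entity mentions; each span exposes `.text`.
    return [span.text for span in doc.cems]


sentence = "UV-vis spectrum of porphyrin in tetrahydrofuran (THF)."
print(chemical_mentions(sentence))
```

The exact mentions returned depend on the installed version and its models, so no expected output is shown here.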


Installation
------------

To install ChemDataExtractor, simply run::

    pip install chemdataextractor

Or, if you use Anaconda, run::

    conda install -c chemdataextractor chemdataextractor

Alternatively, try one of the other `installation options`_.
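After installing, one way to confirm that the package is visible to the current Python environment is to ask the standard library for its installed version. This is a minimal sketch using only ``importlib.metadata`` (Python 3.8+). The helper name ``installed_version`` is an assumption made for this example and is not part of ChemDataExtractor.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


found = installed_version("ChemDataExtractor")
if found is None:
    print("ChemDataExtractor is not installed in this environment")
else:
    print("ChemDataExtractor", found)
```

This only checks that the distribution metadata is present. It does not verify that optional data files or models are available.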


Documentation
-------------

Full documentation is available at http://chemdataextractor.org/docs


License
-------

ChemDataExtractor is licensed under the `MIT license`_, a permissive, business-friendly license for open source
software.


.. _`installation options`: http://chemdataextractor.org/docs/install
.. _`MIT license`: https://github.com/mcs07/ChemDataExtractor/blob/master/LICENSE

Owner

  • Name: Matt Swain
  • Login: mcs07
  • Kind: user
  • Location: New York, NY
  • Company: @DEShawResearch

Developing software for drug discovery

GitHub Events

Total
  • Issues event: 1
  • Watch event: 23
  • Fork event: 5
Last Year
  • Issues event: 1
  • Watch event: 23
  • Fork event: 5

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 98
  • Total Committers: 2
  • Avg Commits per committer: 49.0
  • Development Distribution Score (DDS): 0.01
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name                   Email          Commits
Matt Swain             m****n@m****m  97
roselyne@uchicago.edu  r****e@u****u  1
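
The all-time figures above can be reproduced from the per-committer commit counts. The sketch below assumes the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is an assumption, since this page does not spell it out, and the function name is hypothetical.

```python
def development_distribution_score(commit_counts):
    """Return 1 - (top committer's commits / total commits).

    Assumed definition of DDS; returns 0.0 when there are no commits.
    """
    total = sum(commit_counts)
    if total == 0:
        return 0.0
    return 1 - max(commit_counts) / total


# Commit counts taken from the Top Committers table.
commits = [97, 1]

average = sum(commits) / len(commits)
dds = development_distribution_score(commits)

print(f"Avg commits per committer: {average:.1f}")  # 49.0
print(f"DDS: {dds:.2f}")                            # 0.01
```

Under that assumption, a score near zero reflects that a single committer authored almost every commit (97 of 98).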

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 29
  • Total pull requests: 16
  • Average time to close issues: 2 months
  • Average time to close pull requests: 11 days
  • Total issue authors: 25
  • Total pull request authors: 7
  • Average comments per issue: 1.52
  • Average comments per pull request: 0.56
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gihanpanapitiya (2)
  • ejmurray (2)
  • chemlynx (2)
  • dzhang228 (1)
  • nnarahari-tech (1)
  • sgbaird (1)
  • user-agent-eng (1)
  • giordan12 (1)
  • sophiatabchouri (1)
  • ravila4 (1)
  • zmzeng (1)
  • OlgaGKononova (1)
  • scicontent (1)
  • dan2097 (1)
  • chrismattmann (1)
Pull Request Authors
  • mcs07 (7)
  • JeffersonH44 (4)
  • pbulsink (1)
  • ralic (1)
  • kunlu-ou (1)
  • rtchoua (1)
  • rseragon (1)
Top Labels
Issue Labels
  • bug (5)
  • question (1)
Pull Request Labels
  • bug (5)
  • enhancement (4)

Packages

  • Total packages: 3
  • Total downloads: 944 last month (pypi)
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 15
    (may contain duplicates)
  • Total versions: 10
  • Total maintainers: 2
pypi.org: chemdataextractor

A toolkit for extracting chemical information from the scientific literature.

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 14
  • Downloads: 927 Last month
Rankings
Dependent repos count: 3.9%
Stargazers count: 3.9%
Forks count: 4.4%
Average: 6.3%
Downloads: 9.1%
Dependent packages count: 10.1%
Maintainers (1)
Last synced: 6 months ago
pypi.org: chemdataextractor-c

A toolkit for extracting chemical information from the scientific literature.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 17 Last month
Rankings
Stargazers count: 4.0%
Forks count: 4.5%
Dependent packages count: 7.3%
Average: 14.1%
Dependent repos count: 40.8%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: chemdataextractor
  • Versions: 1
  • Dependent Packages: 1
  • Dependent Repositories: 1
Rankings
Forks count: 18.1%
Average: 24.0%
Dependent repos count: 24.4%
Stargazers count: 24.5%
Dependent packages count: 29.0%
Last synced: 6 months ago

Dependencies

requirements/development.txt pypi
  • pytest >=3.0.6 development
  • twine >=1.8.1 development
  • wheel >=0.29.0 development
requirements/production.txt pypi
  • DAWG >=0.7.8
  • PyYAML >=3.12
  • appdirs >=1.4.0
  • beautifulsoup4 >=4.5.3
  • click >=6.7
  • cssselect >=1.0.1
  • lxml >=3.7.2
  • nltk >=3.2.2
  • pdfminer.six >=20160614
  • python-crfsuite >=0.9.1
  • python-dateutil >=2.6.0
  • requests >=2.12.5
  • six >=1.10.0
setup.py pypi
  • DAWG *
  • PyYAML *
  • appdirs *
  • beautifulsoup4 *
  • click *
  • cssselect *
  • lxml *
  • nltk *
  • pdfminer.six *
  • python-crfsuite *
  • python-dateutil *
  • requests *
  • six *