Adeft

Adeft: Acromine-based Disambiguation of Entities from Text with applications to the biomedical literature - Published in JOSS (2020)

https://github.com/gyorilab/adeft

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    2 of 9 committers (22.2%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary

Keywords

acronym-disambiguation

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 83% confidence
Last synced: 4 months ago

Repository

Tool for disambiguating acronyms and abbreviations in text for NLP applications

Basic Info
  • Host: GitHub
  • Owner: gyorilab
  • License: bsd-2-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 11.1 MB
Statistics
  • Stars: 22
  • Watchers: 5
  • Forks: 10
  • Open Issues: 1
  • Releases: 14
Topics
acronym-disambiguation
Created about 7 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License

README.md

Adeft


Adeft (Acromine based Disambiguation of Entities From Text context) is a utility for building models that disambiguate acronyms and other abbreviations of biological terms in the scientific literature. It uses an implementation of the Acromine algorithm, developed by NaCTeM at the University of Manchester, to identify possible longform expansions for shortforms in a text corpus. Users can then build models that disambiguate shortforms based on their text context. A growing number of pretrained disambiguation models are publicly available to download through Adeft.
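To make the idea of longform detection concrete, here is a minimal sketch that counts word sequences appearing directly before a parenthesized shortform, as in "estrogen receptor (ER)". It is not the Acromine algorithm and not Adeft's implementation; the function name `candidate_longforms`, the `max_words` parameter, and the example sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def candidate_longforms(texts, shortform, max_words=4):
    """Count word sequences that directly precede '(shortform)' in texts.

    Hypothetical helper for illustration only; Acromine scores
    candidates far more carefully than this raw count does.
    """
    pattern = re.compile(r"\(\s*" + re.escape(shortform) + r"\s*\)")
    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            # Words to the left of the parenthesized shortform, lowercased.
            preceding = re.findall(r"[\w-]+", text[:match.start()].lower())
            # Each run of up to max_words trailing words is a candidate.
            for n in range(1, min(max_words, len(preceding)) + 1):
                counts[" ".join(preceding[-n:])] += 1
    return counts


# Made-up example sentences, not drawn from any real corpus.
texts = [
    "The estrogen receptor (ER) is a nuclear receptor.",
    "Stress in the endoplasmic reticulum (ER) triggers a response.",
    "Expression of the estrogen receptor (ER) was measured.",
]
counts = candidate_longforms(texts, "ER")
print(counts.most_common(5))
```

Raw counts like these over-reward short fragments such as "receptor", which is one reason a real system needs a scoring step to pick out complete longforms.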

Citation

If you use Adeft in your research, please cite the paper in the Journal of Open Source Software:

Steppi A, Gyori BM, Bachman JA (2020). Adeft: Acromine-based Disambiguation of Entities from Text with applications to the biomedical literature. Journal of Open Source Software, 5(45), 1708, https://doi.org/10.21105/joss.01708

Installation

Adeft works with Python versions 3.5 and above. It is available on PyPI and can be installed with the command

$ pip install adeft

Adeft's pretrained machine learning models can then be downloaded with the command

$ python -m adeft.download

If you instead install by cloning this repository

$ git clone https://github.com/indralab/adeft.git

you should also run

$ python setup.py build_ext --inplace

at the top level of your local repository to build the extension module for alignment-based longform detection and scoring.

Using Adeft

A dictionary of available models can be imported with `from adeft import available_models`.

The dictionary maps shortforms to model names. It's possible for multiple equivalent shortforms to map to the same model.
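Because several shortforms can point to the same model, it can help to invert the mapping and see which shortforms each model covers. The sketch below assumes only what is stated above, that `available_models` behaves like a plain dictionary from shortform to model name. The helper `shortforms_by_model` and the sample entries are hypothetical stand-ins so the example runs without Adeft installed.

```python
from collections import defaultdict


def shortforms_by_model(models):
    """Invert a shortform -> model name mapping into model name -> shortforms."""
    grouped = defaultdict(list)
    for shortform, model_name in models.items():
        grouped[model_name].append(shortform)
    # Sort the shortforms so the output is deterministic.
    return {name: sorted(forms) for name, forms in grouped.items()}


# Hypothetical stand-in for adeft's available_models; apart from 'ER',
# the entries are made up to show two shortforms sharing one model.
sample_models = {"ER": "ER", "XYZ": "XYZ", "XYZs": "XYZ"}

grouped = shortforms_by_model(sample_models)
print(grouped)
```

With Adeft installed, the same function could be called on the imported `available_models` in place of `sample_models`.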

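Here's an example of running a disambiguator for ER on a list of texts:

```python
from adeft.disambiguate import load_disambiguator

# Load the pretrained disambiguator for the shortform 'ER'
er_dd = load_disambiguator('ER')

...

# texts is a list of strings containing the shortform
er_dd.disambiguate(texts)
```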
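The README does not describe what `disambiguate` returns. The sketch below assumes each result is a tuple of a grounding, a standard name, and a dictionary of predicted probabilities per grounding; please check the Adeft documentation before relying on that shape. Under that assumption, the hypothetical helper `confident_groundings` keeps only predictions whose probability meets a threshold. The sample results are made up and are not real model output.

```python
def confident_groundings(results, threshold=0.8):
    """Return (index, grounding, name) for sufficiently confident results.

    Assumes each result is (grounding, name, probabilities), where
    probabilities maps groundings to predicted probabilities.
    """
    kept = []
    for index, (grounding, name, probabilities) in enumerate(results):
        if probabilities.get(grounding, 0.0) >= threshold:
            kept.append((index, grounding, name))
    return kept


# Made-up results in the assumed format, for illustration only.
sample_results = [
    ("HGNC:3467", "ESR1", {"HGNC:3467": 0.95, "GO:0005783": 0.05}),
    ("GO:0005783", "endoplasmic reticulum",
     {"HGNC:3467": 0.40, "GO:0005783": 0.60}),
]
print(confident_groundings(sample_results))
```

Filtering by a threshold trades coverage for precision, so a suitable value depends on how costly a wrong grounding is downstream.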

Users may also build and train their own disambiguators. See the documentation for more info.

Documentation

Documentation is available at https://adeft.readthedocs.io

Jupyter notebooks illustrating Adeft workflows are available under notebooks:

- Introduction
- Model building

Testing

Adeft uses pytest for unit testing and GitHub Actions for continuous integration. To run tests locally, install the test-specific requirements listed in setup.py with

$ pip install adeft[test]

and download all pre-trained models as shown above. Then run pytest in the top-level adeft folder.

Funding

Development of this software was supported by the Defense Advanced Research Projects Agency under awards W911NF-18-1-0124 and W911NF-15-1-0544, and the National Cancer Institute under award U54-CA225088.

Owner

  • Name: Gyori Lab for Computational Biomedicine
  • Login: gyorilab
  • Kind: organization
  • Email: indra.sysbio@gmail.com
  • Location: United States of America

Accelerating discovery in biomedicine using AI @ Northeastern University

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 931
  • Total Committers: 9
  • Avg Commits per committer: 103.444
  • Development Distribution Score (DDS): 0.151
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
steppi a****i@h****u 790
steppi a****i@h****s 43
Ben Gyori b****i@g****m 26
Albert Steppi a****i@A****l 25
John Bachman b****n@g****m 22
Albert Steppi a****i@g****m 15
Charles Tapley Hoyt c****t@g****m 8
Kyle Niemeyer k****r@g****m 1
Ubuntu u****u@i****l 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 10
  • Total pull requests: 71
  • Average time to close issues: 26 days
  • Average time to close pull requests: 5 days
  • Total issue authors: 6
  • Total pull request authors: 5
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.28
  • Merged pull requests: 67
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • bgyori (3)
  • kkaris (2)
  • cthoyt (2)
  • arvindpdmn (1)
  • teja0508 (1)
  • izikeros (1)
Pull Request Authors
  • steppi (60)
  • cthoyt (8)
  • bgyori (4)
  • jim-sheldon (2)
  • kyleniemeyer (1)

Dependencies

doc/requirements.txt pypi
  • appdirs *
  • boto3 *
  • flask *
  • nltk *
  • scikit-learn >=0.20.0
  • sphinx *
  • sphinx_rtd_theme *
setup.py pypi
  • appdirs *
  • boto3 *
  • flask *
  • nltk *
  • scikit-learn >=0.20.0
.github/workflows/tests.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
pyproject.toml pypi
  • appdirs *
  • boto3 *
  • flask *
  • nltk *
  • scikit-learn >=1.0