bio-epidemiology-ner

Recognize bio-medical entities from a text corpus

https://github.com/dreji18/bio-epidemiology-ner

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: plos.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

biomedical epidemiology ner nlp transformers
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: dreji18
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 16.3 MB
Statistics
  • Stars: 120
  • Watchers: 2
  • Forks: 9
  • Open Issues: 5
  • Releases: 0
Created over 3 years ago · Last pushed over 2 years ago

README.md

Bio-Epidemiology-NER is a Python library built on top of the biomedical-ner-all model to recognize bio-medical entities from a corpus or a medical report.


| Feature | Output |
|---|---|
| Named Entity Recognition | Recognize 84 bio-medical entities |
| PDF Input | Read PDF and tabulate the entities |
| PDF Annotation | Annotate entities in a medical PDF report |

Tutorial

Installation

Use the package manager pip to install Bio-Epidemiology-NER

```bash
pip install Bio-Epidemiology-NER
```

This package depends on PyTorch. Install the configuration that matches your system from https://pytorch.org/get-started/locally/
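Since the usage examples take a `compute` argument of `'cpu'` or `'gpu'`, it can help to check what the local PyTorch install supports before choosing. The following is a minimal sketch, not part of the package: `pick_compute` is a hypothetical helper name, and it falls back to `'cpu'` whenever PyTorch is missing or reports no CUDA device.

```python
import importlib.util


def pick_compute() -> str:
    """Return 'gpu' if PyTorch is installed and sees a CUDA device, else 'cpu'.

    Hypothetical helper; the 'cpu'/'gpu' strings mirror the values used
    in the usage examples of this README.
    """
    # Avoid an ImportError when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "gpu" if torch.cuda.is_available() else "cpu"


print(pick_compute())
```

The returned string could then be passed as the `compute` argument.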

Usage

NER with Bio-Epidemiology-NER

```python
# load all the functions
from Bio_Epidemiology_NER.bio_recognizer import ner_prediction

# text to analyse (excerpt from a clinical case report)
doc = """ CASE: A 28-year-old previously healthy man presented with a 6-week history of palpitations. The symptoms occurred during rest, 2–3 times per week, lasted up to 30 minutes at a time and were associated with dyspnea. Except for a grade 2/6 holosystolic tricuspid regurgitation murmur (best heard at the left sternal border with inspiratory accentuation), physical examination yielded unremarkable findings. """

# returns a dataframe output
ner_prediction(corpus=doc, compute='cpu')  # pass compute='gpu' if using gpu
```
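Because the prediction call returns a dataframe, a common next step is to group the recognized spans by entity type. The sketch below works on plain row dictionaries (for example what pandas' `DataFrame.to_dict("records")` produces) so that it runs without the model. The column names `entity_group` and `value`, the helper name `group_entities`, and the sample rows are all assumptions for illustration, not output from the package; adjust the keys to match the dataframe you actually get back.

```python
from collections import defaultdict


def group_entities(records, label_key="entity_group", text_key="value"):
    """Group entity texts by their label.

    `records` is a list of row dicts; the default key names are assumed,
    not confirmed against the package's dataframe columns.
    """
    grouped = defaultdict(list)
    for row in records:
        grouped[row[label_key]].append(row[text_key])
    return dict(grouped)


# Hand-written sample rows for illustration only (not real model output).
sample_rows = [
    {"entity_group": "Age", "value": "28-year-old"},
    {"entity_group": "Sign_symptom", "value": "palpitations"},
    {"entity_group": "Sign_symptom", "value": "dyspnea"},
]

for label, texts in group_entities(sample_rows).items():
    print(label, texts)
```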

Annotate the entities in a Medical Report and export as pdf/csv format

```python
# load all the functions
from Bio_Epidemiology_NER.bio_recognizer import pdf_annotate

# enter pdf file name
pdffile = 'Alhashash-2020-Emergency surgical management.pdf'

# returns an annotated pdf file
pdf_annotate(pdffile, compute='cpu', output_format='pdf')  # pass compute='gpu' if using gpu

# returns a csv file with entities
pdf_annotate(pdffile, compute='cpu', output_format='csv')  # pass compute='gpu' if using gpu

# returns both the annotated pdf and the csv file
pdf_annotate(pdffile, compute='cpu', output_format='all')  # pass compute='gpu' if using gpu
```
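The example above annotates a single report. To process a folder of reports, one option is to collect the PDF paths first and loop over them. This is a minimal sketch using only the standard library; `find_pdfs` is a hypothetical helper, not something the package provides.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the sorted paths of all PDF files directly inside `folder`.

    Hypothetical helper; matches the extension case-insensitively and
    does not descend into subfolders.
    """
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


# Each path can then be passed as the first argument of the annotation
# call shown above, one file at a time.
for pdf_path in find_pdfs("."):
    print(pdf_path)
```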

About the Model

The model within this package is an English Named Entity Recognition model, trained on Maccrobat to recognize bio-medical entities (84 entities) from a given text corpus (case reports etc.). The model was built on top of distilbert-base-uncased.

  • Dataset : Maccrobat https://figshare.com/articles/dataset/MACCROBAT2018/9764942
  • Carbon emission : 0.0279399890043426 kg
  • Training time : 30.16527 minutes
  • GPU used : 1 x GeForce RTX 3060 Laptop GPU

For more details on the supported entities, check the config file: https://huggingface.co/d4data/biomedical-ner-all/blob/main/config.json
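Hugging Face token-classification configs usually list labels in an `id2label` mapping. The sketch below shows one way to reduce such a mapping to distinct entity type names. It assumes the labels use `B-`/`I-` prefixes plus an `O` tag, which should be checked against the actual config file. `entity_types` is a hypothetical helper, and the sample mapping is a hand-written excerpt for illustration, not a copy of the real config.

```python
def entity_types(id2label):
    """Return the sorted entity type names found in an id2label mapping.

    Assumes BIO-style labels ('O', 'B-Name', 'I-Name'); labels without
    a B-/I- prefix are kept unchanged.
    """
    types = set()
    for label in id2label.values():
        if label == "O":
            continue  # the 'outside' tag is not an entity type
        prefix, sep, name = label.partition("-")
        types.add(name if sep and prefix in ("B", "I") else label)
    return sorted(types)


# Illustrative excerpt only; with a downloaded config.json you could pass
# json.load(open("config.json"))["id2label"] instead.
sample_id2label = {"0": "O", "1": "B-Age", "2": "I-Age", "3": "B-Sign_symptom"}
print(entity_types(sample_id2label))
```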

Ownership & License

This package is part of the research topic "AI in Biomedical field" conducted by Deepak John Reji and Shaina Raza. If you use this work (code, model or dataset), please cite our research paper and star the repository at: https://github.com/dreji18/biomedicalNER

MIT License


Owner

  • Name: Deepak John Reji
  • Login: dreji18
  • Kind: user
  • Location: Bangalore
  • Company: ERM

I am an NLP practitioner with experience in developing and structuring solutions for the data science environment

Citation (CITATION.cff)

@article{raza2022large,
  title={Large-scale application of named entity recognition to biomedicine and epidemiology},
  author={Raza, Shaina and Reji, Deepak John and Shajan, Femi and Bashir, Syed Raza},
  journal={PLOS Digital Health},
  volume={1},
  number={12},
  pages={e0000152},
  year={2022},
  publisher={Public Library of Science San Francisco, CA USA}
}
https://journals.plos.org/digitalhealth/article/metrics?id=10.1371/journal.pdig.0000152

GitHub Events

Total
  • Watch event: 14
  • Issue comment event: 3
  • Fork event: 1
Last Year
  • Watch event: 14
  • Issue comment event: 3
  • Fork event: 1

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 50
  • Total Committers: 3
  • Avg Commits per committer: 16.667
  • Development Distribution Score (DDS): 0.1
Past Year
  • Commits: 20
  • Committers: 1
  • Avg Commits per committer: 20.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Deepak John Reji | 4****8 | 45 |
| shainaraza | 3****a | 3 |
| deepak | d****4@g****m | 2 |

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 12
  • Total pull requests: 0
  • Average time to close issues: 3 months
  • Average time to close pull requests: N/A
  • Total issue authors: 12
  • Total pull request authors: 0
  • Average comments per issue: 3.42
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: 3 months
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mysteryjeans (1)
  • juanwisz (1)
  • hopez2024 (1)
  • parthplc (1)
  • yonigottesman (1)
  • BhavyaShah1234 (1)
  • dustn1259 (1)
  • paritoshk (1)
  • karla-desouza (1)
  • savarander (1)
  • Arjuman23 (1)
  • WhiteWolf47 (1)
Pull Request Authors
  • kamelCased (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • nltk *
  • pandas *
  • transformers *
setup.py pypi