bio-epidemiology-ner
Recognize bio-medical entities from a text corpus
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to plos.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.4%) to scientific vocabulary
Repository
Statistics
- Stars: 120
- Watchers: 2
- Forks: 9
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
Bio-Epidemiology-NER is a Python library built on top of the biomedical-ner-all model to recognize bio-medical entities from a corpus or a medical report.

| Feature | Output |
|---|---|
| Named Entity Recognition | Recognize 84 bio-medical entities |
| PDF Input | Read a PDF and tabulate the entities |
| PDF Annotation | Annotate entities in a medical PDF report |
Tutorial
Installation
Use the package manager pip to install Bio-Epidemiology-NER
```bash
pip install Bio-Epidemiology-NER
```
This package depends on PyTorch. Install the build that matches your system configuration from https://pytorch.org/get-started/locally/
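The usage examples take a `compute` argument set to `'cpu'` or `'gpu'`. One way to choose it is to ask PyTorch whether a CUDA device is visible. This is a minimal sketch: `pick_compute` is a hypothetical helper, not part of the package, and it assumes that a missing PyTorch install should fall back to `'cpu'`.

```python
import importlib.util


def pick_compute() -> str:
    """Return 'gpu' if PyTorch can see a CUDA device, otherwise 'cpu'."""
    # PyTorch may not be installed yet; treat that case as CPU-only.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "gpu" if torch.cuda.is_available() else "cpu"


print(pick_compute())
```

The returned string can then be passed as the `compute` argument instead of hard-coding it.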
Usage
NER with Bio-Epidemiology-NER
```python
# load the prediction function
from Bio_Epidemiology_NER.bio_recognizer import ner_prediction

# example case-report text to analyze
doc = """ CASE: A 28-year-old previously healthy man presented with a 6-week history of palpitations. The symptoms occurred during rest, 2–3 times per week, lasted up to 30 minutes at a time and were associated with dyspnea. Except for a grade 2/6 holosystolic tricuspid regurgitation murmur (best heard at the left sternal border with inspiratory accentuation), physical examination yielded unremarkable findings. """

# returns a dataframe output
ner_prediction(corpus=doc, compute='cpu')  # pass compute='gpu' if using gpu
```
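The returned dataframe can be post-processed like any other tabular result, for example by collecting the recognized values per entity type. The sketch below uses only the standard library and works on a list of row dictionaries. The column names `entity_group` and `value` are assumptions not confirmed by this README, `group_entities` is a hypothetical helper, and the sample rows are written by hand for illustration rather than taken from model output.

```python
from collections import defaultdict


def group_entities(records):
    """Group entity values by entity type, keeping first-seen order."""
    grouped = defaultdict(list)
    for row in records:
        # Column names "entity_group" and "value" are assumptions.
        label, value = row["entity_group"], row["value"]
        if value not in grouped[label]:
            grouped[label].append(value)
    return dict(grouped)


# Hand-written rows standing in for the dataframe's records.
sample = [
    {"entity_group": "Age", "value": "28-year-old"},
    {"entity_group": "Sign_symptom", "value": "palpitations"},
    {"entity_group": "Sign_symptom", "value": "dyspnea"},
    {"entity_group": "Sign_symptom", "value": "palpitations"},
]
print(group_entities(sample))
```

With a real result, the rows could be obtained through pandas' `DataFrame.to_dict("records")`, after adjusting the keys to the actual column names.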
Annotate the entities in a Medical Report and export as pdf/csv format
```python
# load the annotation function
from Bio_Epidemiology_NER.bio_recognizer import pdf_annotate

# enter pdf file name
pdf_file = 'Alhashash-2020-Emergency surgical management.pdf'

# returns an annotated pdf file
pdf_annotate(pdf_file, compute='cpu', output_format='pdf')  # pass compute='gpu' if using gpu

# returns a csv file with entities
pdf_annotate(pdf_file, compute='cpu', output_format='csv')  # pass compute='gpu' if using gpu

# returns both an annotated pdf and a csv file
pdf_annotate(pdf_file, compute='cpu', output_format='all')  # pass compute='gpu' if using gpu
```
About the Model
The model within this package is an English Named Entity Recognition model trained on Maccrobat to recognize 84 bio-medical entities in a given text corpus (case reports, etc.). The model is built on top of distilbert-base-uncased.
- Dataset : Maccrobat https://figshare.com/articles/dataset/MACCROBAT2018/9764942
- Carbon emission : 0.0279399890043426 kg
- Training time : 30.16527 minutes
- GPU used : 1 x GeForce RTX 3060 Laptop GPU
For more details regarding the supported entities, check the config file: https://huggingface.co/d4data/biomedical-ner-all/blob/main/config.json
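To list the entity types from a downloaded copy of that config file, one option is to read its label mapping. The sketch below assumes the file follows the usual Hugging Face layout, with an `id2label` mapping whose values are BIO-style tags (`B-`/`I-` prefixes plus `O`). `entity_types` is a hypothetical helper, and the excerpt is illustrative only, not a copy of the real file.

```python
import json


def entity_types(config_text: str) -> list[str]:
    """List the distinct entity types found in a config's id2label mapping."""
    id2label = json.loads(config_text)["id2label"]
    types = set()
    for label in id2label.values():
        # Skip the "outside" tag, which marks tokens that are not entities.
        if label == "O":
            continue
        # Strip the B-/I- prefix so both tags map to one entity type.
        if label[:2] in ("B-", "I-"):
            label = label[2:]
        types.add(label)
    return sorted(types)


# Illustrative excerpt only; the real file lists many more labels.
excerpt = '{"id2label": {"0": "O", "1": "B-Age", "2": "I-Age", "3": "B-Sex"}}'
print(entity_types(excerpt))
```

For the real file, the text could be read with `pathlib.Path("config.json").read_text()` after downloading it from the link above.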
Ownership & License
This package is part of the research topic "AI in Biomedical field" conducted by Deepak John Reji and Shaina Raza. If you use this work (code, model or dataset), please cite our research paper and star the repository at: https://github.com/dreji18/biomedicalNER
MIT License
Owner
- Name: Deepak John Reji
- Login: dreji18
- Kind: user
- Location: Bangalore
- Company: ERM
- Website: https://www.youtube.com/channel/UCgOwsx5injeaB_TKGsVD5GQ
- Repositories: 6
- Profile: https://github.com/dreji18
I am an NLP practitioner with experience in developing and structuring solutions for the data science environment
Citation (CITATION.cff)
@article{raza2022large,
title={Large-scale application of named entity recognition to biomedicine and epidemiology},
author={Raza, Shaina and Reji, Deepak John and Shajan, Femi and Bashir, Syed Raza},
journal={PLOS Digital Health},
volume={1},
number={12},
pages={e0000152},
year={2022},
publisher={Public Library of Science San Francisco, CA USA}
}
https://journals.plos.org/digitalhealth/article/metrics?id=10.1371/journal.pdig.0000152
GitHub Events
Total
- Watch event: 14
- Issue comment event: 3
- Fork event: 1
Last Year
- Watch event: 14
- Issue comment event: 3
- Fork event: 1
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Deepak John Reji | 4****8 | 45 |
| shainaraza | 3****a | 3 |
| deepak | d****4@g****m | 2 |
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 12
- Total pull requests: 0
- Average time to close issues: 3 months
- Average time to close pull requests: N/A
- Total issue authors: 12
- Total pull request authors: 0
- Average comments per issue: 3.42
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 0
- Average time to close issues: 3 months
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mysteryjeans (1)
- juanwisz (1)
- hopez2024 (1)
- parthplc (1)
- yonigottesman (1)
- BhavyaShah1234 (1)
- dustn1259 (1)
- paritoshk (1)
- karla-desouza (1)
- savarander (1)
- Arjuman23 (1)
- WhiteWolf47 (1)
Pull Request Authors
- kamelCased (1)
Dependencies
- nltk *
- pandas *
- transformers *
