zensols-mednlp

Medical natural language parsing and utility library

https://github.com/plandes/mednlp

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary

Keywords

medical medical-natural-language-processing nlp nlp-parsing
Last synced: 7 months ago

Repository

Statistics
  • Stars: 11
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Medical natural language parsing and utility library

A natural language medical domain parsing library. This library:

  • Provides an interface to the UTS RESTful service with data caching (NIH login needed).
  • Wraps the MedCAT library, parsing medical and clinical text into first-class Python objects that reflect the structure of the natural language, complete with UMLS entity linking with CUIs and other domain-specific features.
  • Combines non-medical features (such as POS and NER tags) and medical features (such as CUIs) in one API, available as a single data structure and/or as a Pandas data frame.
  • Provides cui2vec as a word embedding model, either for fast indexing and access or for direct use as features in a Zensols Deep NLP embedding layer model.
  • Provides access to cTAKES as a dictionary-like Stash abstraction.
  • Includes a command line program to access all of these features without having to write any code.

Documentation

See the full documentation. The API reference is also available.

Installing

Install the library using a Python package manager such as pip:

```bash
pip3 install zensols.mednlp
```

CUI Embeddings

To use the cui2vec functionality, the embeddings must be downloaded manually. Start with these commands:

```bash
mkdir -p ~/.cache/zensols/mednlp
wget -O ~/.cache/zensols/mednlp/cui2vec.zip 'https://figshare.com/ndownloader/files/10959626?private_link=00d69861786cd0156d81'
```

If the download fails, or the resulting file is not a zip file (for example, it contains the text of an HTML error message), download the file manually by browsing to it, and then move it to ~/.cache/zensols/mednlp/cui2vec.zip.
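Because a failed download can leave an HTML error page where the archive should be, it can help to check the file before relying on it. The following is a minimal sketch that uses only the Python standard library; the helper name `is_valid_cui2vec_download` is hypothetical and is not part of zensols.mednlp.

```python
from pathlib import Path
import zipfile

# Location used by the download commands above.
CUI2VEC_PATH = Path.home() / '.cache' / 'zensols' / 'mednlp' / 'cui2vec.zip'


def is_valid_cui2vec_download(path: Path = CUI2VEC_PATH) -> bool:
    """Return True only if ``path`` exists and is a readable zip archive.

    Hypothetical helper (not part of zensols.mednlp): an HTML error page
    saved under the expected file name fails this check.
    """
    return path.is_file() and zipfile.is_zipfile(path)


if __name__ == '__main__':
    if is_valid_cui2vec_download():
        print(f'{CUI2VEC_PATH} looks like a zip archive')
    else:
        print(f'{CUI2VEC_PATH} is missing or not a zip file; download it manually')
```

The check only confirms that the file is a zip archive; it does not verify that the archive contains the expected embeddings.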

Usage

To parse text, create features, and extract clinical concept identifiers:

```python
>>> from zensols.mednlp import ApplicationFactory
>>> doc_parser = ApplicationFactory.get_doc_parser()
>>> doc = doc_parser('John was diagnosed with kidney failure')
>>> for tok in doc.tokens: print(tok.norm, tok.pos_, tok.tag_, tok.cui_, tok.detected_name_)
John PROPN NNP -- --
was AUX VBD -- --
diagnosed VERB VBN -- --
with ADP IN -- --
kidney NOUN NN C0035078 kidney~failure
failure NOUN NN C0035078 kidney~failure
>>> print(doc.entities)
(<John>, <kidney failure>)
```

See the full example, and for other functionality, see the examples.
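In the output above, adjacent tokens that belong to the same concept share a CUI. As an illustration of how such token-level features could be collapsed into concept mentions, here is a minimal sketch that works on plain `(norm, CUI)` tuples copied from that output rather than on the library's token objects. The function name `concept_mentions` and the tuple layout are assumptions made for this example, not part of the zensols.mednlp API.

```python
from itertools import groupby
from typing import List, Tuple

# (norm, CUI) pairs copied from the printed output above;
# '--' marks a token with no linked concept.
ROWS: List[Tuple[str, str]] = [
    ('John', '--'),
    ('was', '--'),
    ('diagnosed', '--'),
    ('with', '--'),
    ('kidney', 'C0035078'),
    ('failure', 'C0035078'),
]


def concept_mentions(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge runs of adjacent tokens that share a CUI into (CUI, text) pairs."""
    mentions = []
    # groupby only groups consecutive rows, which is what we want here
    for cui, run in groupby(rows, key=lambda row: row[1]):
        if cui != '--':
            mentions.append((cui, ' '.join(norm for norm, _ in run)))
    return mentions


print(concept_mentions(ROWS))  # [('C0035078', 'kidney failure')]
```

One limitation of this approach is that two distinct, adjacent mentions of the same concept would be merged into one.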

MedCAT Models

By default, this library uses the small MedCAT model intended for tutorials, which is not sufficient for any serious project. To get the UMLS-trained model, the MedCAT UMLS request form must be filled out (see the MedCAT repository).

After you obtain access and download the new model, add the following to ~/.mednlprc:

```ini
[medcat_status_resource]
url = file:///location/to/the/downloaded/file/umls_sm_wstatus_2021_oct.zip
```
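A mistyped `file://` URL in this setting is easy to miss, so it can be worth checking that it resolves to the downloaded archive. The sketch below reads the setting with Python's standard `configparser`. It assumes the file is plain INI as shown here (the library's own configuration handling may support more than that), and the function name `medcat_model_path` is hypothetical.

```python
import configparser
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

# Same content as the ~/.mednlprc snippet shown above.
EXAMPLE = """
[medcat_status_resource]
url = file:///location/to/the/downloaded/file/umls_sm_wstatus_2021_oct.zip
"""


def medcat_model_path(config_text: str) -> Path:
    """Return the local path that the ``url`` option points to.

    Hypothetical helper: raises ValueError unless the URL uses the
    ``file`` scheme.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    url = parser['medcat_status_resource']['url']
    parsed = urlparse(url)
    if parsed.scheme != 'file':
        raise ValueError(f'expected a file:// URL, got: {url}')
    return Path(url2pathname(parsed.path))


path = medcat_model_path(EXAMPLE)
print(path.name, 'exists' if path.is_file() else 'not found')
```

To check a real configuration, pass the contents of the file instead, for example `medcat_model_path((Path.home() / '.mednlprc').read_text())`.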

Attribution

This API utilizes the following frameworks:

  • MedCAT: used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS.
  • cTAKES: a natural language processing system for extraction of information from electronic medical record clinical free-text.
  • cui2vec: a new set of (like word) embeddings for medical concepts learned using an extremely large collection of multimodal medical data.
  • Zensols Deep NLP library: a deep learning utility library for natural language processing that aids in feature engineering and embedding layers.
  • ctakes-parser: parses cTAKES output into a Pandas data frame.

Citation

If you use this project in your research please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul and
      Di Eugenio, Barbara and
      Caragea, Cornelia",
    editor = "Tan, Liling and
      Milajevs, Dmitrijs and
      Chauhan, Geeticka and
      Gwinnup, Jeremy and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```

Community

Please star the project and let me know how and where you use this API. Contributions as pull requests, feedback, and any other input are welcome.

Changelog

An extensive changelog is available here.

License

MIT License

Copyright (c) 2021 - 2025 Paul Landes

Owner

  • Name: Paul Landes
  • Login: plandes
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
title: >-
  DeepZensols: Deep Learning Framework
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
date-released: 2023-12-05
repository-code: https://github.com/plandes/deepnlp
authors:
  - given-names: Paul
    family-names: Landes
    email: landes@mailc.net
    affiliation: University of Illinois at Chicago
    orcid: 'https://orcid.org/0000-0003-0985-0864'
preferred-citation:
  type: conference-paper
  authors:
    - given-names: Paul
      family-names: Landes
      email: landes@mailc.net
      affiliation: University of Illinois at Chicago
      orcid: 'https://orcid.org/0000-0003-0985-0864'
    - given-names: Barbara
      family-names: Di Eugenio
      affiliation: University of Illinois at Chicago
    - given-names: Cornelia
      family-names: Caragea
      affiliation: University of Illinois at Chicago
  title: >-
    DeepZensols: A Deep Learning Natural Language Processing Framework for
    Experimentation and Reproducibility
  url: https://aclanthology.org/2023.nlposs-1.16/
  year: 2023
  conference:
    name: >-
      Proceedings of the 3rd Workshop for Natural Language Processing Open
      Source Software, Empirical Methods in Natural Language Processing
    city: Singapore
    country: SG
    date-start: 2023-12-05
    date-end: 2023-12-05

GitHub Events

Total
  • Issues event: 1
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 21
  • Create event: 4
Last Year
  • Issues event: 1
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 21
  • Create event: 4

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: 6 months
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • stardust-xc (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 87 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 19
  • Total maintainers: 1
pypi.org: zensols-mednlp

A natural language medical domain parsing library.

  • Versions: 19
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 87 Last month
Rankings
Dependent packages count: 8.9%
Downloads: 14.3%
Average: 24.5%
Dependent repos count: 50.4%
Maintainers (1)
Last synced: 8 months ago

Dependencies

src/python/requirements.txt pypi
  • ctakes-parser ==0.1.0
  • medcat ==1.2.5
  • numpy <1.22.0,>=1.19.0
  • pandas >=1.2.4
  • scispacy ==0.4.0
  • zensols.install *
  • zensols.nlp *
.github/workflows/test.yml actions
  • actions/checkout v2.4.0 composite
  • actions/setup-python v2 composite
resources/requirements/model.txt pypi
resources/requirements/scispacy.txt pypi
  • scispacy ==0.5.3
src/python/setup.py pypi
resources/requirements/model-extra.txt pypi