medkit-lib

Toolkit for a learning health system

https://github.com/medkit-lib/medkit

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary

Keywords

bert digital-health electronic-health-records nlp umls
Last synced: 6 months ago

Repository

Toolkit for a learning health system

Basic Info
  • Host: GitHub
  • Owner: medkit-lib
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage: https://medkit-lib.org/
  • Size: 9.26 MB
Statistics
  • Stars: 18
  • Watchers: 4
  • Forks: 13
  • Open Issues: 7
  • Releases: 8
Topics
bert digital-health electronic-health-records nlp umls
Created over 2 years ago · Last pushed 11 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

medkit


| | |
|---------|------|
| CI | docs status, pre-commit status, test status |
| Package | PyPI version, PyPI Python versions |
| Project | License: MIT, Formatter: Ruff, Project: Hatch |


medkit is a toolkit for a learning health system, developed by the HeKA research team.

This Python library aims to:

  1. Facilitate the manipulation of healthcare data of various modalities (e.g., structured, text, audio data) for the extraction of relevant features.

  2. Support the development of supervised models from these modalities for decision support in healthcare.

Installation

To install medkit with basic functionality:

    pip install medkit-lib

To install medkit with all its optional features:

    pip install 'medkit-lib[all]'
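To confirm that the installation succeeded, one option is to query the installed distribution's metadata from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of medkit.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "medkit-lib" is the distribution name used in the pip commands above.
print(installed_version("medkit-lib"))
```

A result of `None` means the distribution is not installed in the current environment.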

Example

A basic named-entity recognition pipeline using medkit:

```python
# 1. Define individual operations.
from medkit.text.preprocessing import CharReplacer, LIGATURE_RULES, SIGN_RULES
from medkit.text.segmentation import SentenceTokenizer, SyntagmaTokenizer
from medkit.text.context.negation_detector import NegationDetector
from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

# Preprocessing
char_replacer = CharReplacer(rules=LIGATURE_RULES + SIGN_RULES)

# Segmentation
sent_tokenizer = SentenceTokenizer(output_label="sentence")
synt_tokenizer = SyntagmaTokenizer(output_label="syntagma")

# Negation detection
neg_detector = NegationDetector(output_label="is_negated")

# Entity recognition
entity_matcher = HFEntityMatcher(model="my-BERT-model", attrs_to_copy=["is_negated"])

# 2. Combine operations into a pipeline.
from medkit.core.pipeline import Pipeline, PipelineStep

ner_pipeline = Pipeline(
    input_keys=["full_text"],
    output_keys=["entities"],
    steps=[
        PipelineStep(char_replacer, input_keys=["full_text"], output_keys=["clean_text"]),
        PipelineStep(sent_tokenizer, input_keys=["clean_text"], output_keys=["sentences"]),
        PipelineStep(synt_tokenizer, input_keys=["sentences"], output_keys=["syntagmas"]),
        PipelineStep(neg_detector, input_keys=["syntagmas"], output_keys=[]),
        PipelineStep(entity_matcher, input_keys=["syntagmas"], output_keys=["entities"]),
    ],
)

# 3. Run the NER pipeline on a BRAT document.
from medkit.io import BratInputConverter

docs = BratInputConverter().load(path="/path/to/dataset/")
entities = ner_pipeline.run([doc.raw_segment for doc in docs])
```
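The example above wires steps together through named keys: each step reads the data stored under its input keys and publishes its result under its output keys. The following is a simplified, dependency-free sketch of that idea, not medkit's actual implementation. `Step`, `run_steps`, and the two toy functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """Hypothetical stand-in for a pipeline step: a callable plus the keys it reads and writes."""

    func: Callable[..., object]
    input_keys: List[str]
    output_keys: List[str]


def run_steps(steps: List[Step], inputs: Dict[str, object], output_keys: List[str]) -> List[object]:
    """Run steps in order, passing intermediate results between them by key."""
    data = dict(inputs)
    for step in steps:
        args = [data[key] for key in step.input_keys]
        result = step.func(*args)
        # One output key: store the result as-is. Several: unpack in order.
        # No output keys: the step is run only for its side effects.
        if len(step.output_keys) == 1:
            data[step.output_keys[0]] = result
        elif len(step.output_keys) > 1:
            data.update(zip(step.output_keys, result))
    return [data[key] for key in output_keys]


# Toy operations (plain functions, not medkit operations).
def replace_ligatures(text: str) -> str:
    return text.replace("œ", "oe")


def split_sentences(text: str) -> List[str]:
    return [part.strip() for part in text.split(".") if part.strip()]


steps = [
    Step(replace_ligatures, input_keys=["full_text"], output_keys=["clean_text"]),
    Step(split_sentences, input_keys=["clean_text"], output_keys=["sentences"]),
]
(sentences,) = run_steps(steps, {"full_text": "Pas d'œdème. Toux sèche."}, ["sentences"])
print(sentences)
```

In this sketch a step declared with an empty list of output keys publishes nothing, which mirrors how the negation detector step in the example is declared with `output_keys=[]`.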

Getting started

To get started with medkit, please check out our documentation.

This documentation also contains tutorials and examples showcasing the use of medkit for different tasks.

Contributing

Thank you for your interest in medkit!

We'll be happy to get your input!

If your problem has not been reported by another user, please open an issue, whether it's for:

  • reporting a bug,
  • discussing the current state of the code,
  • submitting a fix,
  • proposing new features,
  • or contributing to the documentation.

If you want to propose a pull request, you can read CONTRIBUTING.md.

Contact

Feel free to contact us by sending an email to medkit-maintainers@inria.fr.

Owner

  • Name: medkit-lib
  • Login: medkit-lib
  • Kind: organization

Citation (CITATION.cff)

# Citation File Format <https://doi.org/10.5281/zenodo.1003149>
abstract: 'Phenotyping consists in applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically out of a collection of Electronic Health Records (EHRs). Because much of the clinical information in EHRs lies in text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs. However, the heterogeneity and highly specialized aspect of both the content and form of clinical texts make this task particularly tedious and are a source of time and cost constraints in observational studies. To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables composing data processing pipelines made of easy-to-reuse software bricks, named medkit operations. In addition to the core of the library, we share the operations and pipelines we already developed and invite the phenotyping community to reuse and enrich them.'
authors:
  - family-names: Neuraz
    given-names: Antoine
    orcid: 0000-0001-7142-6728
  - family-names: Vaillant
    given-names: Ghislain
    orcid: 0000-0003-0267-3033
  - family-names: Arias
    given-names: Camila
  - family-names: Birot
    given-names: Olivier
  - family-names: Huynh
    given-names: Kim-Tam
  - family-names: Fabacher
    given-names: Thibaut
  - family-names: Rogier
    given-names: Alice
    orcid: 0000-0002-5499-3197
  - family-names: Garcelon
    given-names: Nicolas
    orcid: 0000-0002-3326-2811
  - family-names: Lerner
    given-names: Ivan
    orcid: 0000-0002-5466-1707
  - family-names: Rance
    given-names: Bastien
    orcid: 0000-0003-4417-1197
  - family-names: Coulet
    given-names: Adrien
    orcid: 0000-0002-1466-062X
cff-version: '1.2.0'
date-released: '2024-08-30'
doi: 10.48550/arXiv.2409.00164
keywords:
  - clinical texts
  - feature extraction
  - open science
  - phenotyping
  - reproducible computing
license: MIT
message: 'If you use this software, please cite it using the metadata from this file.'
repository-code: https://github.com/medkit-lib/medkit
title: 'Facilitating phenotyping from clinical texts: the medkit library'
type: software
url: https://www.medkit-lib.org/

GitHub Events

Total
  • Issues event: 4
  • Watch event: 4
  • Delete event: 2
  • Issue comment event: 4
  • Push event: 12
  • Pull request event: 6
  • Pull request review event: 1
  • Fork event: 5
  • Create event: 3
Last Year
  • Issues event: 4
  • Watch event: 4
  • Delete event: 2
  • Issue comment event: 4
  • Push event: 12
  • Pull request event: 6
  • Pull request review event: 1
  • Fork event: 5
  • Create event: 3

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 24
  • Total pull requests: 74
  • Average time to close issues: 7 days
  • Average time to close pull requests: 9 days
  • Total issue authors: 6
  • Total pull request authors: 5
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.39
  • Merged pull requests: 62
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 24
  • Pull requests: 74
  • Average time to close issues: 7 days
  • Average time to close pull requests: 9 days
  • Issue authors: 6
  • Pull request authors: 5
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.39
  • Merged pull requests: 62
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
  • ghisvail (5)
  • coulet (3)
  • DrFabach (3)
  • Thibeb (2)
  • vincentzossou (1)
  • valentin-marie (1)
  • nourG22 (1)
Pull Request Authors
  • ghisvail (74)
  • cpetresc (6)
  • Thibeb (5)
  • DrFabach (4)
  • dependabot[bot] (3)
  • val461 (1)
Top Labels
Issue Labels
bug (1)
Pull Request Labels
dependencies (3)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 120 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 19
  • Total maintainers: 2
pypi.org: medkit-lib

A Python library for a learning health system

  • Versions: 19
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 120 Last month
Rankings
Dependent packages count: 7.0%
Average: 18.7%
Dependent repos count: 30.5%
Maintainers (2)
Last synced: 6 months ago

Dependencies

.github/workflows/lint.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • pre-commit/action v3.0.0 composite
.github/workflows/test.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • snok/install-poetry v1 composite
poetry.lock pypi
  • 317 dependencies
pyproject.toml pypi
  • PyRuSH ^1.0
  • Unidecode *
  • duptextfinder *
  • edsnlp ^0.9
  • feather-format ^0.4
  • flashtext ^2.7
  • iamsystem >=0.3
  • intervaltree *
  • numpy *
  • packaging *
  • pandas ^1.4
  • pyaml *
  • pyannote-audio ^3.1
  • pyannote-core ^5.0.0
  • pyannote-metrics ^3.2.0
  • pysrt ^1.1.2
  • python >=3.8, <4.0
  • quickumls ^1.4
  • requests *
  • resampy ^0.4
  • sacremoses *
  • scikit-learn ^1.3.2
  • sentencepiece *
  • seqeval ^1.2.2
  • smart-open *
  • soundfile *
  • spacy ^3.4
  • speechbrain ^0.5
  • torch ^2.1.1
  • torchaudio ^2.1.1
  • tqdm *
  • transformers ^4.21
  • typing-extensions *
  • unqlite ^0.9.6
  • webrtcvad ^2.0
  • wheel *
.github/workflows/release.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite