Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.8%) to scientific vocabulary
Keywords
Repository
Toolkit for a learning health system
Basic Info
- Host: GitHub
- Owner: medkit-lib
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://medkit-lib.org/
- Size: 9.26 MB
Statistics
- Stars: 18
- Watchers: 4
- Forks: 13
- Open Issues: 7
- Releases: 8
Topics
Metadata Files
README.md
medkit

medkit is a toolkit for a learning health system, developed by the HeKA research team.
This Python library aims at:
- Facilitating the manipulation of healthcare data of various modalities (e.g., structured, text, audio data) for the extraction of relevant features.
- Developing supervised models from these various modalities for decision support in healthcare.
Installation
To install medkit with basic functionalities:
pip install medkit-lib
To install medkit with all its optional features:
pip install 'medkit-lib[all]'
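To confirm that the installation succeeded, a minimal check along these lines may help. It only uses the standard library; the helper name `installed_version` is hypothetical and not part of medkit, and the distribution name `medkit-lib` is taken from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if medkit-lib is not installed.
print(installed_version("medkit-lib"))
```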
Example
A basic named-entity recognition pipeline using medkit:
```python
# 1. Define individual operations.
from medkit.text.preprocessing import CharReplacer, LIGATURE_RULES, SIGN_RULES
from medkit.text.segmentation import SentenceTokenizer, SyntagmaTokenizer
from medkit.text.context.negation_detector import NegationDetector
from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

# Preprocessing
char_replacer = CharReplacer(rules=LIGATURE_RULES + SIGN_RULES)
# Segmentation
sent_tokenizer = SentenceTokenizer(output_label="sentence")
synt_tokenizer = SyntagmaTokenizer(output_label="syntagma")
# Negation detection
neg_detector = NegationDetector(output_label="is_negated")
# Entity recognition
entity_matcher = HFEntityMatcher(model="my-BERT-model", attrs_to_copy=["is_negated"])

# 2. Combine operations into a pipeline.
from medkit.core.pipeline import Pipeline, PipelineStep

ner_pipeline = Pipeline(
    input_keys=["full_text"],
    output_keys=["entities"],
    steps=[
        PipelineStep(char_replacer, input_keys=["full_text"], output_keys=["clean_text"]),
        PipelineStep(sent_tokenizer, input_keys=["clean_text"], output_keys=["sentences"]),
        PipelineStep(synt_tokenizer, input_keys=["sentences"], output_keys=["syntagmas"]),
        PipelineStep(neg_detector, input_keys=["syntagmas"], output_keys=[]),
        PipelineStep(entity_matcher, input_keys=["syntagmas"], output_keys=["entities"]),
    ],
)

# 3. Run the NER pipeline on a BRAT document.
from medkit.io import BratInputConverter

docs = BratInputConverter().load(path="/path/to/dataset/")
entities = ner_pipeline.run([doc.raw_segment for doc in docs])
```
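To make the key-based wiring in the example easier to follow, here is a toy model of the idea in plain Python. It is a sketch for illustration only, not medkit's implementation: `ToyStep`, `run_toy_pipeline`, and the four small functions are hypothetical names, and the keyword matching stands in for the real tokenizers and entity matcher. Each step reads its inputs from a shared dictionary by key and stores its result under its output key; a step with no output keys, like the negation detector above, only annotates its input in place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyStep:
    """Hypothetical stand-in for a pipeline step: a function plus its key wiring."""
    func: Callable
    input_keys: List[str]
    output_keys: List[str]


def run_toy_pipeline(steps, input_keys, output_keys, *inputs):
    # Shared store mapping each key to the data produced so far.
    data: Dict[str, object] = dict(zip(input_keys, inputs))
    for step in steps:
        result = step.func(*[data[key] for key in step.input_keys])
        # A step without output keys only mutates its inputs in place.
        if step.output_keys:
            data[step.output_keys[0]] = result
    return [data[key] for key in output_keys]


def clean(text):
    return text.replace("œ", "oe")


def split_sentences(text):
    return [{"text": s.strip(), "is_negated": False} for s in text.split(".") if s.strip()]


def flag_negation(sentences):
    # Naive rule for illustration: a sentence starting with "no " is negated.
    for sentence in sentences:
        sentence["is_negated"] = sentence["text"].lower().startswith("no ")


def match_entities(sentences):
    return [
        {"label": term, "is_negated": sentence["is_negated"]}
        for sentence in sentences
        for term in ("fever", "oedema")
        if term in sentence["text"].lower()
    ]


steps = [
    ToyStep(clean, ["full_text"], ["clean_text"]),
    ToyStep(split_sentences, ["clean_text"], ["sentences"]),
    ToyStep(flag_negation, ["sentences"], []),
    ToyStep(match_entities, ["sentences"], ["entities"]),
]
(entities,) = run_toy_pipeline(steps, ["full_text"], ["entities"], "No fever. Mild œdema.")
print(entities)
```

The negation flag set by the third step reaches the entities because the fourth step copies it from the sentence it matched in, which is the role `attrs_to_copy` plays in the medkit example.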
Getting started
To get started with medkit, please check out our documentation.
This documentation also contains tutorials and examples showcasing the use of medkit for different tasks.
Contributing
Thank you for your interest in medkit!
We'd be happy to get your input!
If your problem has not been reported by another user, please open an issue, whether it's for:
- reporting a bug,
- discussing the current state of the code,
- submitting a fix,
- proposing new features,
- or contributing to the documentation.
If you want to propose a pull request, you can read CONTRIBUTING.md.
Contact
Feel free to contact us by sending an email to medkit-maintainers@inria.fr.
Owner
- Name: medkit-lib
- Login: medkit-lib
- Kind: organization
- Repositories: 2
- Profile: https://github.com/medkit-lib
Citation (CITATION.cff)
# Citation File Format <https://doi.org/10.5281/zenodo.1003149>
abstract: 'Phenotyping consists in applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically out of a collection of Electronic Health Records (EHRs). Because a lot of the clinical information of EHRs are lying in texts, phenotyping from text takes an important role in studies that rely on the secondary use of EHRs. However, the heterogeneity and highly specialized aspect of both the content and form of clinical texts makes this task particularly tedious, and is the source of time and cost constraints in observational studies. To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables composing data processing pipelines made of easy-to-reuse software bricks, named medkit operations. In addition to the core of the library, we share the operations and pipelines we already developed and invite the phenotyping community for their reuse and enrichment.'
authors:
- family-names: Neuraz
given-names: Antoine
orcid: 0000-0001-7142-6728
- family-names: Vaillant
given-names: Ghislain
orcid: 0000-0003-0267-3033
- family-names: Arias
given-names: Camila
- family-names: Birot
given-names: Olivier
- family-names: Huynh
given-names: Kim-Tam
- family-names: Fabacher
given-names: Thibaut
- family-names: Rogier
given-names: Alice
orcid: 0000-0002-5499-3197
- family-names: Garcelon
given-names: Nicolas
orcid: 0000-0002-3326-2811
- family-names: Lerner
given-names: Ivan
orcid: 0000-0002-5466-1707
- family-names: Rance
given-names: Bastien
orcid: 0000-0003-4417-1197
- family-names: Coulet
given-names: Adrien
orcid: 0000-0002-1466-062X
cff-version: '1.2.0'
date-released: '2024-08-30'
doi: 10.48550/arXiv.2409.00164
keywords:
- clinical texts
- feature extraction
- open science
- phenotyping
- reproducible computing
license: MIT
message: 'If you use this software, please cite it using the metadata from this file.'
repository-code: https://github.com/medkit-lib/medkit
title: 'Facilitating phenotyping from clinical texts: the medkit library'
type: software
url: https://www.medkit-lib.org/
GitHub Events
Total
- Issues event: 4
- Watch event: 4
- Delete event: 2
- Issue comment event: 4
- Push event: 12
- Pull request event: 6
- Pull request review event: 1
- Fork event: 5
- Create event: 3
Last Year
- Issues event: 4
- Watch event: 4
- Delete event: 2
- Issue comment event: 4
- Push event: 12
- Pull request event: 6
- Pull request review event: 1
- Fork event: 5
- Create event: 3
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 24
- Total pull requests: 74
- Average time to close issues: 7 days
- Average time to close pull requests: 9 days
- Total issue authors: 6
- Total pull request authors: 5
- Average comments per issue: 1.0
- Average comments per pull request: 0.39
- Merged pull requests: 62
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 24
- Pull requests: 74
- Average time to close issues: 7 days
- Average time to close pull requests: 9 days
- Issue authors: 6
- Pull request authors: 5
- Average comments per issue: 1.0
- Average comments per pull request: 0.39
- Merged pull requests: 62
- Bot issues: 0
- Bot pull requests: 2
Top Authors
Issue Authors
- ghisvail (5)
- coulet (3)
- DrFabach (3)
- Thibeb (2)
- vincentzossou (1)
- valentin-marie (1)
- nourG22 (1)
Pull Request Authors
- ghisvail (74)
- cpetresc (6)
- Thibeb (5)
- DrFabach (4)
- dependabot[bot] (3)
- val461 (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 120 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 19
- Total maintainers: 2
pypi.org: medkit-lib
A Python library for a learning health system
- Documentation: https://medkit.readthedocs.io
- License: MIT License
- Latest release: 0.17.0 (published 11 months ago)
Rankings
Maintainers (2)
Dependencies
- actions/checkout v4 composite
- actions/setup-python v4 composite
- pre-commit/action v3.0.0 composite
- actions/checkout v4 composite
- actions/setup-python v4 composite
- snok/install-poetry v1 composite
- 317 dependencies
- PyRuSH ^1.0
- Unidecode *
- duptextfinder *
- edsnlp ^0.9
- feather-format ^0.4
- flashtext ^2.7
- iamsystem >=0.3
- intervaltree *
- numpy *
- packaging *
- pandas ^1.4 (optional; python >=3.8, <4.0)
- pyaml *
- pyannote-audio ^3.1
- pyannote-core ^5.0.0
- pyannote-metrics ^3.2.0
- pysrt ^1.1.2
- python >=3.8, <4.0
- quickumls ^1.4
- requests *
- resampy ^0.4
- sacremoses *
- scikit-learn ^1.3.2
- sentencepiece *
- seqeval ^1.2.2
- smart-open *
- soundfile *
- spacy ^3.4
- speechbrain ^0.5
- torch ^2.1.1
- torchaudio ^2.1.1
- tqdm *
- transformers ^4.21
- typing-extensions *
- unqlite ^0.9.6
- webrtcvad ^2.0
- wheel *
- actions/checkout v4 composite
- actions/setup-python v4 composite
- pypa/gh-action-pypi-publish release/v1 composite