transformer-deid
Deidentify medical data with transformers
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: kind-lab
- Language: Python
- Default Branch: main
- Size: 253 KB
Statistics
- Stars: 6
- Watchers: 2
- Forks: 5
- Open Issues: 2
- Releases: 1
Metadata Files
README.md
transformer-deid
Fine-tune transformer models to deidentify clinical medical data.
Setup
Install dependencies in a conda environment:
conda env create -n transformer_deid --file environment.yml
Data
Data must be in CSV stand-off format. One subfolder (txt/) holds the documents as individual text files, each named with the document identifier as the file stem and .txt as the extension. A second subfolder (ann/) holds the annotations as CSV files, each named with the same document identifier as the file stem and .gs as the extension. The tests/data subfolder contains an example dataset stored in this format.
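The layout described above can be checked with a short loader. This is a minimal sketch, not code from the repository: the function name load_standoff_dataset is hypothetical, and it assumes each .gs file starts with a header row, so the column names come from the file itself rather than from anything stated here.

```python
import csv
from pathlib import Path


def load_standoff_dataset(root):
    """Pair each txt/<id>.txt document with its ann/<id>.gs annotation file.

    Hypothetical helper for illustration; not part of transformer-deid.
    """
    root = Path(root)
    dataset = {}
    for txt_path in sorted((root / "txt").glob("*.txt")):
        doc_id = txt_path.stem  # the document identifier is the file stem
        ann_path = root / "ann" / f"{doc_id}.gs"
        annotations = []
        if ann_path.exists():
            with ann_path.open(newline="", encoding="utf-8") as handle:
                # Assumption: the first row of the CSV is a header row.
                annotations = list(csv.DictReader(handle))
        dataset[doc_id] = {
            "text": txt_path.read_text(encoding="utf-8"),
            "annotations": annotations,
        }
    return dataset
```

Calling load_standoff_dataset("tests/data") from the repository directory should return one entry per document, which makes it easy to spot documents that have no matching annotation file.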
Training
Models supported:
- BERT
- DistilBERT
- RoBERTa
Run training from the repository directory:
python transformer_deid/train.py -m <model_architecture> -i <dataset path> -o <output path> -e <number of epochs>
Options:
- -m, --model_architecture: Name of the model, one of {bert | distilbert | roberta}.
- -i, --train_path: Path to the dataset directory.
- -o, --output_path: Directory in which to save the model.
- -e, --epochs: Number of training epochs.
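When training is launched from a script, for example to compare architectures, the command can be assembled programmatically. The sketch below uses only the flags documented above. The helper name build_train_command and the up-front check on the model name are illustrative assumptions, not part of the repository.

```python
import sys

# Architectures listed under "Models supported" in the README.
SUPPORTED_MODELS = ("bert", "distilbert", "roberta")


def build_train_command(model, train_path, output_path, epochs):
    """Return the argument list for transformer_deid/train.py.

    Hypothetical helper for illustration; not part of transformer-deid.
    """
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model architecture: {model!r}")
    return [
        sys.executable,  # the Python interpreter of the active environment
        "transformer_deid/train.py",
        "-m", model,
        "-i", str(train_path),
        "-o", str(output_path),
        "-e", str(epochs),
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) with the repository directory as the working directory, since the script path is relative.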
Evaluation
For evaluation, see Pyclipse.
Owner
- Name: KinD lab
- Login: kind-lab
- Kind: organization
- Repositories: 14
- Profile: https://github.com/kind-lab
Citation (CITATION.cff)
cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Moore
given-names: Callandra
orcid: https://orcid.org/0009-0008-3801-3137
- family-names: Bulgarelli
given-names: Lucas
orcid: https://orcid.org/0000-0001-5456-2170
- family-names: Pollard
given-names: Tom
orcid: https://orcid.org/0000-0002-5676-7898
- family-names: Johnson
given-names: Alistair
orcid: https://orcid.org/0000-0002-8735-3014
title: "Transformer-DeID"
version: 1.0.0
doi:
date-released: 2023-09-08
GitHub Events
Total
- Issues event: 1
- Watch event: 4
Last Year
- Issues event: 1
- Watch event: 4
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- gubowen2 (1)
Dependencies
- seqeval *