transformer-deid

Deidentify medical data with transformers

https://github.com/kind-lab/transformer-deid

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.4%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: kind-lab
  • Language: Python
  • Default Branch: main
  • Size: 253 KB
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 5
  • Open Issues: 2
  • Releases: 1
Created about 5 years ago · Last pushed over 2 years ago
Metadata Files
Readme Citation

README.md

transformer-deid

Fine-tune transformer models to deidentify clinical medical data.

Setup

Install dependencies in a conda environment:

    conda env create -n transformer_deid --file environment.yml

Data

Data must be in CSV stand-off format. One subfolder (txt/) holds the documents as individual text files, each named with the document identifier as the file stem and .txt as the extension. A second subfolder (ann/) holds the annotations as CSV files, one per document, named with the same document identifier as the file stem and .gs as the extension. The tests/data subfolder contains an example of documents stored in this format.
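As a minimal sketch of how a dataset in this layout might be checked before training, the snippet below pairs each document with its annotation file by file stem. The helper name pair_documents is hypothetical and not part of transformer-deid; it relies only on the txt/ and ann/ folder names and the .txt and .gs extensions described above. It does not parse the annotation CSVs, whose column layout is not specified here.

```python
from pathlib import Path


def pair_documents(dataset_dir):
    """Map each document identifier to its (text file, annotation file) pair.

    Hypothetical helper: assumes <dataset_dir>/txt/*.txt and <dataset_dir>/ann/*.gs,
    with the document identifier as the file stem in both folders.
    """
    root = Path(dataset_dir)
    txt_files = {p.stem: p for p in (root / "txt").glob("*.txt")}
    ann_files = {p.stem: p for p in (root / "ann").glob("*.gs")}

    # Identifiers present in only one of the two folders indicate a broken dataset.
    unmatched = sorted(set(txt_files) ^ set(ann_files))
    if unmatched:
        raise ValueError(f"Documents without a matching counterpart: {unmatched}")

    return {doc_id: (txt_files[doc_id], ann_files[doc_id]) for doc_id in sorted(txt_files)}
```

Calling pair_documents("tests/data") should then return one entry per example document, assuming the example folder follows the layout above.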

Training

Models supported:
  • BERT
  • DistilBERT
  • RoBERTa

To run from the repository directory:

    python transformer_deid/train.py -m <model_architecture> -i <dataset path> -o <output path> -e <number of epochs>

Options:
  • -m, --model_architecture: Name of model {bert | distilbert | roberta}.
  • -i, --train_path: Path to dataset directory.
  • -o, --output_path: Model save directory.
  • -e, --epochs: Number of epochs.
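To make the option list concrete, here is an illustrative argparse sketch of a command line with the same four flags. It is not the repository's actual train.py: the required=True settings, the integer type for epochs, and the model_out directory name are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Fine-tune a transformer model for deidentification (sketch)."
    )
    parser.add_argument(
        "-m", "--model_architecture",
        choices=["bert", "distilbert", "roberta"],
        required=True,  # assumption: the real script may provide a default
        help="Name of model.",
    )
    parser.add_argument("-i", "--train_path", required=True,
                        help="Path to dataset directory.")
    parser.add_argument("-o", "--output_path", required=True,
                        help="Model save directory.")
    parser.add_argument("-e", "--epochs", type=int, required=True,
                        help="Number of epochs.")
    return parser


# Parse an example argument list instead of sys.argv so the sketch runs anywhere.
# "model_out" is a hypothetical output directory.
args = build_parser().parse_args(
    ["-m", "bert", "-i", "tests/data", "-o", "model_out", "-e", "3"]
)
print(args.model_architecture, args.train_path, args.output_path, args.epochs)
```

Restricting -m with choices means an unsupported architecture name is rejected at parse time rather than partway through training.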

Evaluation

For evaluation, see Pyclipse.

Owner

  • Name: KinD lab
  • Login: kind-lab
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Moore
    given-names: Callandra
    orcid: https://orcid.org/0009-0008-3801-3137
  - family-names: Bulgarelli
    given-names: Lucas
    orcid: https://orcid.org/0000-0001-5456-2170
  - family-names: Pollard
    given-names: Tom
    orcid: https://orcid.org/0000-0002-5676-7898
  - family-names: Johnson
    given-names: Alistair
    orcid: https://orcid.org/0000-0002-8735-3014
title: "Transformer-DeID"
version: 1.0.0
doi: 
date-released: 2023-09-08

GitHub Events

Total
  • Issues event: 1
  • Watch event: 4
Last Year
  • Issues event: 1
  • Watch event: 4

Issues and Pull Requests

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gubowen2 (1)

Dependencies

environment.yml pypi
  • seqeval *
setup.py pypi