brain-mri-nlp

This repository contains a project files, scripts and configs written during the development of a spaCy span categorisation NLP project for the automated extraction of mentions of volumetric assessment from brain MRI reports in a memory clinic context.

https://github.com/adam-mayers/brain-mri-nlp

Last synced: 10 months ago · JSON representation ·

Repository

This repository contains a project files, scripts and configs written during the development of a spaCy span categorisation NLP project for the automated extraction of mentions of volumetric assessment from brain MRI reports in a memory clinic context.

Basic Info

Host: GitHub
Owner: adam-mayers
Language: Jupyter Notebook
Default Branch: master
Homepage: https://github.com/adam-mayers/brain-mri-nlp
Size: 431 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created about 3 years ago · Last pushed 12 months ago

Metadata Files

Readme Citation

README.md

brain-mri-nlp

This repository contains a project files, scripts and configs written during the development of a spaCy span categorisation NLP project for the automated extraction of mentions of volumetric assessment from brain MRI reports in a memory clinic context. This includes scripts and project.yml options for custom visualisation using adapted spacy-streamlit, kfold cross-validation with per-label performance, and directly calculated precision and recall with confidence intervals (including on a per-label basis)

project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

Commands

The following new commands are defined within the project.yml project file. They can be executed using spacy project run [name].

visualize-model
- Visualize the model's output interactively using Streamlit. This is based on the spacy-streamlit source, with additions to customise colours for each category (with matching formatting in the output table below), add the spancat component scores to the output table with formatting options, navigation through records and also allows dynamic text input (enter text in the left hand box and then Ctrl+Enter).
- Once run from the command line this will provide a URL to view the streamlit output.
- Example output (NB: not real patient data!):
evaluate-kfold
- Evaluate using k-fold cross validation, with the number of folds set in project.yml. This is heavily based on this GitHub project.
- This has been updated to output all of the metrics for each span category for each fold to ./metrics/kfold.jsonl (once all folds are completed), not just the average of the top-level precision/recall/f-score. This is useful if wanting to assess performance on a per-fold and/or per-category basis.
- Also added further information at the end of each fold to output the appended metrics and add a timestamp (this is useful for debugging and sanity checking in situations where running the full number of folds may take a non-trivial amount of time).
- Also added wandb (weights & biases) integration, although this only outputs the final scores for each fold and does not do the dynamic tracking while each fold is training. However, this is not really required and is still useful for running experiments / hyperparameter tuning.
confidence
- Output the confusion matrix and calculate the precision and recall with 95% confidence intervals.
- See the project.yml for an example of command line usage. Arguments needed are the test data, the spacy config file, the model, the per-label boolean and then a comma-separated string of labels.
- This will output to stdout each of true positive, false positive, false negative, true negative, and both precision and recall (with 95% confidence interval in brackets)
- If only overall performance is required then please change the --per-label option in the project.yml (or at the command line if running directly) to False.
agreement
- Evaluates inter-annotator agreement for dual annotations, assuming one annotator is 'gold standard' and the other annotator is the 'model'. Outputs to the command line the precision, recall and F1 score with confidence intervals, for both standard matching (i.e. exact match of boundaries for a given span) and relaxed matching (allows overlap but does not require the start/end boundaries to be identical). Takes dual.json as the input, and the script which was used to generate this file is provided as dual.py.
dual-annotations
- This simply runs dual.py to generate the JSON containing the dual annotations. This is not currently implemented for direct CLI usage and the variables for this need to be changed in the dual.py file directly.
find-threshold
- This adds the new spacy find-threshold CLI command to the project, purely to ensure consistent usage.

Scripts

visualize-model.py
kfold.py
confidence.py
agreement.py
dual.py
functions.py
- Added for the purposes of defining a custom logging function (which here is simply spaCy's wandbloggerv5) for usage with the kfold evaluation, ultimately not needed but included for completeness

Configs

These are included for the purposes of making them publically available.

Owner

Name: Adam Mayers
Login: adam-mayers
Kind: user
Company: Guy's and St Thomas' Hospitals & King's College London / South London and Maudsley NHS Foundation Trust

Repositories: 1
Profile: https://github.com/adam-mayers

Radiology Registrar / Fellowship in Clinical Artificial Intelligence

Citation (citation.cff)

cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Mayers"
  given-names: "Adam"
  orcid: "https://orcid.org/0009-0008-1948-657X"
title: "Brain MRI NLP"
version: 1.0.0
doi: 10.5281/zenodo.15790492
date-released: 2025-07-02
url: "https://github.com/adam-mayers/brain-mri-nlp"

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

brain-mri-nlp

Science Score: 44.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

brain-mri-nlp

project.yml

Commands

Scripts

Configs

Owner

Citation (citation.cff)

GitHub Events

Total

Last Year