https://github.com/asreview/paper-guidelines-kifms

Scripts to run simulations of systematic reviews with ASReview for 14 datasets openly published on the Dutch database for medical guidelines.

Keywords

asreview medical medical-guidelines python systematic-reviews systematic-reviews-datasets utrecht-university

Repository

Basic Info
  • Host: GitHub
  • Owner: asreview
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 23.9 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed over 4 years ago

https://github.com/asreview/paper-guidelines-KIFMS/blob/main/

# Scripts for paper on "Towards up-to-date medical guidelines"

[![DOI](https://zenodo.org/badge/379924501.svg)](https://zenodo.org/badge/latestdoi/379924501)

The purpose of this study was to evaluate the performance and feasibility of active learning to support the selection of relevant publications within the context of medical guideline development. This repository contains scripts to run and analyze simulations for 14 datasets openly published on the Dutch database for [medical guidelines](https://www.richtlijnendatabase.nl). The results are published in the paper "Artificial intelligence supports literature screening in medical guideline development: towards up-to-date medical guidelines". 


## Installation

The scripts in this repository require Python 3.6+. Install the dependencies from the command line with:

```
pip install -r requirements.txt
```

## Datasets

The raw data can be obtained via the Open Science Framework [OSF](https://osf.io/vt3n4/) and contains 14 published guidelines from the [Dutch Medical Guideline Database](https://richtlijnendatabase.nl/). The following files should be obtained from OSF and put in a folder `raw_data`:

```
Distal_radius_fractures_approach.csv
Distal_radius_fractures_closed_reduction.csv
Hallux_valgus_prognostic.csv
Head_and_neck_cancer_bone.csv
Head_and_neck_cancer_imaging.csv
Obstetric_emergency_training.csv
Post_intensive_care_treatment.csv
Pregnancy_medication.csv
Shoulder_replacement_diagnostic.csv
Shoulder_replacement_surgery.csv
Shoulderdystocia_positioning.csv
Shoulderdystocia_recurrence.csv
Total_knee_replacement.csv
Vascular_access.csv
```

Each dataset contains

```
title
abstract
```

and three columns with labeling decisions titled:

```
noisy_inclusion
expert_inclusion
fulltext_inclusion
```

Each dataset in *raw_data* is split by its three labeling columns, yielding three derived datasets per guideline. The resulting 42 datasets are generated by executing `job_splitfiles.sh` and stored in the subfolder *data*.
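
The split performed by `job_splitfiles.sh` can be sketched in plain Python. This is an illustrative reconstruction, not the actual script: the column names come from the README above, but the I/O handling and function name are assumptions.

```python
import csv
import io

# The three labeling columns listed above; each raw dataset is split into
# one derived dataset per column (illustrative sketch, not the real script).
LABEL_COLUMNS = ["noisy_inclusion", "expert_inclusion", "fulltext_inclusion"]

def split_dataset(raw_csv_text):
    """Return {label_column: csv_text}, keeping title/abstract plus one label."""
    rows = list(csv.DictReader(io.StringIO(raw_csv_text)))
    outputs = {}
    for label in LABEL_COLUMNS:
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["title", "abstract", label])
        writer.writeheader()
        for row in rows:
            writer.writerow({"title": row["title"],
                             "abstract": row["abstract"],
                             label: row[label]})
        outputs[label] = buf.getvalue()
    return outputs
```

Applied to the 14 raw files, this subsetting yields the 42 datasets (14 × 3).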


## Descriptive dataset statistics

To create descriptive statistics for each dataset run:

```
sh generate_dataset_characteristics.sh
```

The results are stored in `output/simulation/[NAME_DATASET]/descriptives/*.json`. Running `python scripts/merge_descriptives.py` merges them into one table (*csv* and *excel*), stored in `output/table/data_descriptives.*`.
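
The merge step can be approximated as follows; the JSON structure, field names, and function name are assumptions for illustration only, not taken from `scripts/merge_descriptives.py` itself:

```python
import csv
import io

def merge_descriptives(json_by_dataset):
    """Merge per-dataset descriptive stats (as dicts) into one CSV table.

    json_by_dataset: {dataset_name: dict of descriptive statistics},
    e.g. one dict per descriptives/*.json file. Returns CSV text.
    """
    # Union of all stat keys, so datasets with missing fields still align.
    fieldnames = ["dataset"] + sorted({k for d in json_by_dataset.values() for k in d})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for name, stats in sorted(json_by_dataset.items()):
        writer.writerow({"dataset": name, **stats})
    return buf.getvalue()
```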


## Create wordclouds

To create wordclouds for each dataset run:

```
sh wordcloud_jobs.sh
```

The results are stored in `output/simulation/[NAME_DATASET]/descriptives/wordcloud`. 
Three versions of the wordcloud are available, each based on the title/abstract words for:

- the entire set of records;
- for the relevant records only;
- for the irrelevant records only. 
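
The subsetting behind the three variants can be illustrated with stdlib word frequencies, which are the usual input to a wordcloud library (the plotting itself is omitted here; the label encoding and tokenization are assumptions):

```python
import re
from collections import Counter

def wordcloud_frequencies(records):
    """Word frequencies for the three wordcloud variants.

    records: list of (text, label) pairs, where text is the title/abstract
    and label is 1 for relevant, 0 for irrelevant (assumed encoding).
    """
    def freq(texts):
        # Naive lowercase tokenization; a real pipeline may also drop stopwords.
        return Counter(w for t in texts for w in re.findall(r"[a-z]+", t.lower()))
    return {
        "all": freq(t for t, _ in records),
        "relevant": freq(t for t, lab in records if lab == 1),
        "irrelevant": freq(t for t, lab in records if lab == 0),
    }
```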


## Simulation

The simulation was conducted for each dataset with as many runs as there are relevant records in that dataset: in each run, one relevant record serves as the prior inclusion, complemented by 10 randomly chosen irrelevant records. In each run, and for every dataset, the same 10 irrelevant records were used. To extract information about the records used as prior knowledge, run `python scripts/get_prior_knowledge.py`; the result is stored in `output/tables`.
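
The prior-knowledge design described above can be sketched as follows (function name, seed handling, and label encoding are assumptions, not the project's actual code):

```python
import random

def build_prior_sets(labels, n_irrelevant=10, seed=42):
    """One prior set per relevant record: that record plus a shared
    sample of irrelevant records.

    labels: list of 0/1 labels per record index (1 = relevant).
    Returns a list of index lists, one per simulation run.
    """
    relevant = [i for i, lab in enumerate(labels) if lab == 1]
    irrelevant = [i for i, lab in enumerate(labels) if lab == 0]
    rng = random.Random(seed)  # fixed seed -> same sample in every run
    shared_irrelevant = rng.sample(irrelevant, n_irrelevant)
    return [[rel] + shared_irrelevant for rel in relevant]
```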


To obtain the result of the simulation, run: 

```
sh run_simulation.sh
```

The results are stored in `output/simulation`. The dataset characteristics are obtained with `python scripts/merge_descriptives.py` and stored in `output/tables`. The per-run metrics from the simulation study can be obtained with `python scripts/merge_metrics.py` and are stored in `output/tables`.

The raw `h5` files are 28.4 GB and are available on request; see the contact details below. However, it is straightforward to reproduce the results by rerunning the simulation with ASReview v0.16. Seed values are set in `run_simulation.sh`.

## Analyses

The Jupyter notebook [analyses/analyses_guidelines_KIFMS.ipynb](analyses/analyses_guidelines_KIFMS.ipynb) contains a detailed, step-by-step analysis of the simulations performed in this project. For more information about the analysis, read the [README](analyses).

## License

The content in this repository is published under the MIT license.

## Contact

For any questions or remarks, please send an email to asreview@uu.nl.

Owner

  • Name: ASReview
  • Login: asreview
  • Kind: organization
  • Email: asreview@uu.nl
  • Location: Utrecht University

ASReview - Active learning for Systematic Reviews
