reproin
A setup for automatic generation of shareable, version-controlled BIDS datasets from MR scanners
Science Score: 46.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ✓ Committers with academic emails: 3 of 5 committers (60.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.4%) to scientific vocabulary
Repository
A setup for automatic generation of shareable, version-controlled BIDS datasets from MR scanners
Basic Info
- Host: GitHub
- Owner: ReproNim
- License: mit
- Language: Shell
- Default Branch: master
- Size: 1.68 MB
Statistics
- Stars: 49
- Watchers: 11
- Forks: 15
- Open Issues: 40
- Releases: 6
Metadata Files
README.md
ReproIn
This project is a part of the ReproNim Center suite of tools and frameworks. Its goal is to provide a turnkey, flexible setup for automatic generation of shareable, version-controlled BIDS datasets from MR scanners. To avoid reinventing the wheel, actual software development is largely done through contributions to existing software projects:
- HeuDiConv: a flexible DICOM converter for organizing brain imaging data into structured directory layouts. The ReproIn heuristic was developed and is now shipped within HeuDiConv, so it can be used independently of the ReproIn setup on any HeuDiConv installation (specify `-f reproin` in the heudiconv call).
- DataLad: a modular version control platform and distribution for both code and data. DataLad support was contributed to HeuDiConv and can be enabled by adding the `--datalad` option to the `heudiconv` call.
Specification
The header of the heuristic file describes the details of the specification for how to organize and name study sequences at the MR console.
If you would like to use a GUI for crafting the names, consider using @NPACore's ReproIn namer website.
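To give a flavor of the specification: console sequence names combine a sequence type with underscore-separated BIDS-style entities. The examples below are illustrative only; the heuristic header remains the authoritative grammar.

```
anat-T1w                    T1-weighted anatomical
func-bold_task-rest_run+    resting-state BOLD ("run+" auto-increments the run)
fmap-epi_dir-AP             fieldmap EPI, AP phase-encoding direction
dwi                         diffusion-weighted acquisition
```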
Overall workflow
Schematic description of the overall setup:

Note: for your own setup, the dcm2niix author recommends avoiding dcm4che and choosing another PACS.

Tutorial/HOWTO
Data collection
Making your sequence compatible with ReproIn heuristic
- Walkthrough #1: guides you through the ReproIn approach to organizing exam cards and managing canceled runs/sessions on Siemens scanner(s)
Renaming sequences to conform to the specification needed by ReproIn
TODO: describe how sequences could be renamed per study by creating a derived heuristic
Conversion
- Install HeuDiConv and DataLad: e.g. `apt-get update; apt-get install heudiconv datalad` in any NeuroDebian environment. If you do not have one, you could get either of:
  - NeuroDebian Virtual Machine
  - ReproIn Docker image: `docker run -it --rm -v $PWD:$PWD repronim/reproin`
  - ReproIn Singularity image: you can either
    - convert from the docker image: `singularity pull docker://repronim/reproin`
    - download the most recent version from http://datasets.datalad.org/?dir=/repronim/containers/images/repronim, which is a DataLad dataset you can install via `datalad install ///repronim/containers` (see/subscribe to https://github.com/ReproNim/reproin/issues/64 for a HOWTO on setting up a YODA-style dataset)
Collect a subject/session (or multiple of them) while placing and naming sequences in the scanner following the specification. But for now we will assume that you have no such dataset yet, and want to try on phantom data:
datalad install -J3 -r -g ///dicoms/dartmouth-phantoms/bids_test4-20161014
to get all subdatasets recursively, while also getting the data in 3 parallel streams. This dataset is a sample of a multi-session acquisition with anatomical and functional sequences on a friendly phantom impersonating two different subjects (note: the fieldmaps were deficient, lacking magnitude images). You could also try other datasets such as ///dbic/QA
We are ready to convert all the data at once (heudiconv will sort it into accessions) or one accession at a time. The recommended invocation for heudiconv is
heudiconv -f reproin --bids --datalad -o OUTPUT --files INPUT
to convert all DICOMs found under INPUT and place them within the
hierarchy of DataLad datasets rooted at OUTPUT. So we will start
with a single accession of phantom-1/
heudiconv -f reproin --bids --datalad -o OUTPUT --files bids_test4-20161014/phantom-1
and inspect the result under OUTPUT, probably best with the datalad ls
command:
... WiP ...
HeuDiConv options to override autodetected variables: `--subject`, `--session`, `--locator`
Sample converted datasets
You can find sample datasets with original DICOMs:
- ///dbic/QA is a publicly available DataLad dataset with historical data on QA scans from DBIC. You could use the DICOM tarballs under `sourcedata/` for your sample conversions. TODO: add information from which date it is with scout DICOMs having session identifier
- ///dicoms/dartmouth-phantoms provides a collection of datasets acquired at DBIC to establish the ReproIn specification. Some earlier accessions might not follow the specification. bids_test4-20161014 provides a basic example of a multi-subject and multi-session acquisition.
Containers/Images etc
This repository provides a Singularity environment definition file used to generate a complete environment needed to run a conversion. Since all functionality is integrated within the tools themselves, any environment providing them would also suffice: NeuroDebian Docker or Singularity images, virtual appliances, and other Debian-based systems with NeuroDebian repositories configured, all of which provide the components necessary for the ReproIn setup.
Getting started from scratch
Setup environment
The reproin script relies on having datalad, datalad-container, and singularity available. The simplest way to get them all is to install a conda distribution, e.g. miniforge (link for amd64), and set up the environment with all components installed:
mamba create -n reproin -y datalad datalad-container singularity
Note that in future sessions you will need to activate this environment:
mamba activate reproin
Then make sure you have your git configured. If `git config --list` does not
include these entries, add them (adjusted to fit your persona):
git config --global user.name "My Name"
git config --global user.email "MyName@example.com"
and install the ReproNim/containers dataset:
datalad clone https://github.com/ReproNim/containers repronim-containers
cd repronim-containers
This clones the dataset from GitHub and auto-enables the datasets.datalad.org remote to actually get the annexed content of the images. Now fetch the image for the most recent version of reproin from under images/repronim, e.g.
datalad get images/repronim/repronim-reproin--0.13.1.sing
cd ..
"Install" reproin script
The singularity image we fetched already comes with reproin installed inside,
but to "drive" the conversion we need to have reproin available in the base
environment. Because we do not (yet) have it packaged for a conda
distribution, we will just clone this repository to gain access to the script:
git clone https://github.com/ReproNim/reproin
To avoid typing the full path to the reproin script, you can run
export "PATH=$PWD/reproin/bin/:$PATH"
to place it in the PATH.
NB: at the moment it is important not to just `cp` the reproin script elsewhere,
because it relies on being able to find other resources made available in this
repository (e.g., cfg_reproin_bids.py).
"Configure" the reproin setup
Currently the reproin script hardcodes the paths: DICOMs are expected to reside under
/inbox/DICOM, and extracted lists and converted data under /inbox/BIDS.
It is possible to override the location for BIDS via the BIDS_DIR environment variable, so
we can do e.g.
export BIDS_DIR=$HOME/BIDS-demo
and then create the top-level datalad dataset to contain all converted data, configured to store text files in git rather than git-annex:
datalad create -c text2git "$BIDS_DIR"
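The override described above can be sketched as a tiny shell helper. Note that `bids_root` is a hypothetical name, not part of reproin; the sketch only mirrors the fallback behavior described here.

```shell
# Hypothetical helper mirroring the fallback described above:
# use $BIDS_DIR when set and non-empty, otherwise the hardcoded /inbox/BIDS.
bids_root() {
    echo "${BIDS_DIR:-/inbox/BIDS}"
}
```

For example, `BIDS_DIR=$HOME/BIDS-demo bids_root` prints `$HOME/BIDS-demo`, while with `BIDS_DIR` unset it prints `/inbox/BIDS`.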
Collect DICOMs listing
At the moment the reproin container ships an older version of the script, so to use the newer version we just bind-mount our cloned script inside:
singularity run -e -c \
--env BIDS_DIR=$BIDS_DIR \
-B $HOME/reproin/bin/reproin:/usr/local/bin/reproin \
-B /inbox/DICOM:/inbox/DICOM:ro \
-B $BIDS_DIR:$BIDS_DIR \
~/repronim-containers/images/repronim/repronim-reproin--0.13.1.sing lists-update-study-shows
which should output a summary of the studies it found under /inbox/DICOM, e.g.
dbic/QA: new=16 no studydir yet
PI/Researcher/1110_SuperCool: new=12 no studydir yet
and you should see a file appear for the current year and month under $BIDS_DIR/reproin/lists.
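Summary lines like the ones above are easy to post-process. As an illustration, here is a hypothetical filter (not part of reproin) that picks out the locators of studies still lacking a studydir:

```shell
# Hypothetical filter: given summary lines such as
#   dbic/QA: new=16 no studydir yet
# print the locators of studies that still need `reproin study-create`.
needs_studydir() {
    grep 'no studydir yet' | cut -d: -f1
}
```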
Create target dataset
Now we can create "studydir" for the study of interest, e.g.
reproin study-create dbic/QA
which would:
- create the target BIDS dataset within the hierarchy
- install repronim/containers, borrowing the image from ~/repronim-containers
- rerun `study-show` to output a summary of the current state, like `todo=4 done=0 /afs/.dbic.dartmouth.edu/usr/haxby/yoh/BIDS-demo/dbic/QA/.git/study-show.sh 2024-11-11`
Convert the dataset
Go to the folder of the dataset, e.g.
cd "$BIDS_DIR/dbic/QA"
to see that reproin has pre-set up everything needed to run the conversion (cat .datalad/config).
And now you should be able to run the conversion for your study via the "datalad-container"
extension:
datalad containers-run -n repronim-reproin study-convert dbic/QA
Gotchas
Complete setup at DBIC
It relies on the organization of DICOMs and the location for converted BIDS datasets, currently hardcoded in reproin:
/inbox/DICOM/{YEAR}/{MONTH}/{DAY}/A00{ACCESSION}
/inbox/BIDS/{PI}/{RESEARCHER}/{ID}_{name}/
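The incoming DICOM layout above can be spelled out as a small helper. `dicom_dir` is a hypothetical illustration, not part of the reproin script:

```shell
# Hypothetical helper: build the expected incoming DICOM directory at DBIC,
# following the /inbox/DICOM/{YEAR}/{MONTH}/{DAY}/A00{ACCESSION} layout above.
dicom_dir() {
    # args: YEAR MONTH DAY ACCESSION
    printf '/inbox/DICOM/%s/%s/%s/A00%s\n' "$1" "$2" "$3" "$4"
}
```

For example, `dicom_dir 2016 10 14 12345` yields `/inbox/DICOM/2016/10/14/A0012345`.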
CRON job
```
# m h dom mon dow command
55 */12 * * * $HOME/reproin-env-0.9.0 -c '~/proj/reproin/bin/reproin lists-update-study-shows' && curl -fsS -m 10 --retry 5 -o /dev/null https://hc-ping.com/61dfdedd-SENSORED
```
NB: the curl at the end makes use of https://healthchecks.io
to ensure that the CRON job ran as expected.
At the moment we reuse a singularity environment based on reproin 0.9.0 produced from this repo and shipped within ReproNim/containers. For completeness' sake:

```shell
(reproin-3.8) [bids@rolando lists] > cat $HOME/reproin-env-0.9.0
#!/bin/sh
env -i /usr/local/bin/singularity exec -B /inbox -B /afs -H $HOME/singularityhome $(dirname $0)/reproin0.9.0.simg /bin/bash "$@"
```
which produces emails with content like
Wager/Wager/1102_MedMap: new=92 todo=5 done=102 /inbox/BIDS/Wager/Wager/1102_MedMap/.git/study-show.sh 2023-03-30
PI/Researcher/ID_name: new=32 no studydir yet
Haxby/Jane/1073_MonkeyKingdom: new=4 todo=39 done=8 fixups=6 /inbox/BIDS/Haxby/Jane/1073_MonkeyKingdom/.git/study-show.sh 2023-03-30
which, as you can see, reports the status of each study scanned since the
beginning of the current month. Each line ends with a pointer to the study-show.sh script,
which provides details on what has already been converted and the heudiconv invocations for what remains to do.
reproin study-create
For the "no studydir yet" cases we first need to generate the study dataset (and
possibly all leading PI/Researcher super-datasets) via

```shell
reproin study-create PI/Researcher/ID_name
```
reproin study-convert
Unless some warnings/conflicts (subject/session already converted, etc.) are found,

```shell
reproin study-convert PI/Researcher/ID_name
```
could be used to convert all new subject/sessions for that study.
XNAT
Anonymization or other scripts might obfuscate the "Study Description", thus ruining the "locator" assignment. See issue #57 for more information.
TODOs/WiP/Related
- [ ] add a pre-configured DICOM receiver for fully turnkey deployments
- to fully automate conversion of the incoming data
- [ ] BIDS dataset manipulation helper
Owner
- Name: Center for Reproducible Neuroimaging Computation
- Login: ReproNim
- Kind: organization
- Website: http://repronim.org
- Repositories: 75
- Profile: https://github.com/ReproNim
GitHub Events
Total
- Issues event: 1
- Watch event: 2
- Issue comment event: 2
- Push event: 13
- Pull request event: 3
Last Year
- Issues event: 1
- Watch event: 2
- Issue comment event: 2
- Push event: 13
- Pull request event: 3
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yaroslav Halchenko | d****n@o****m | 95 |
| DBIC BIDS Team | b****s@d****u | 24 |
| Matteo Visconti dOC | m****r@d****u | 13 |
| jcf2 | 4****2 | 2 |
| Bennet Fauber | b****t@u****u | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 62
- Total pull requests: 11
- Average time to close issues: 2 months
- Average time to close pull requests: 21 days
- Total issue authors: 15
- Total pull request authors: 5
- Average comments per issue: 2.63
- Average comments per pull request: 1.45
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.5
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- yarikoptic (30)
- dlevitas (8)
- jmatthews-rotman (4)
- dkp (4)
- mvdoc (3)
- satra (3)
- tsalo (2)
- mirestrepo (1)
- neurolabusc (1)
- jdkent (1)
- feilong (1)
- Evgeniia-Gapontseva (1)
- mjstarrett (1)
- asmacdo (1)
- cni-md (1)
Pull Request Authors
- yarikoptic (4)
- mvdoc (3)
- jcf2 (3)
- mgxd (1)
- justbennet (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- codespell-project/actions-codespell v1 composite
- neurodebian bullseye build