opfvta-reexecution

Container-based Replication of https://doi.org/10.1038/s41398-022-01812-5

https://github.com/con/opfvta-reexecution

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference in README
- ✓ Academic publication links: links to frontiersin.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.3%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: con
License: apache-2.0
Language: TeX
Default Branch: master
Size: 8.49 MB

Statistics

Stars: 1
Watchers: 3
Forks: 1
Open Issues: 0
Releases: 2

Created about 3 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License Citation

OPFVTA Reexecution Study

This repository contains the code, data, and prose required to:

* re-execute the analysis and production of the OPFVTA article in a container environment;
* generate a meta-article detailing the reexecution environment, including a reference example of how to inspect reproducibility across reexecutions.

How to clone this repository

This repository contains submodules/subdatasets, which should be installed alongside the parent repository:

```console
datalad install -r https://gin.g-node.org/TheChymera/opfvta-reexecution.git
cd opfvta-reexecution
```
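If the recursive install is interrupted, some subdatasets may be left uninitialized. DataLad records subdatasets as ordinary Git submodules, so one way to spot them is to filter the output of git submodule status. The helper below is a hypothetical sketch, not part of the repository:

```bash
# Hypothetical helper (not part of the repository): read the output of
# `git submodule status --recursive` on stdin and print the path of every
# submodule that has not been initialized. Git marks such lines with a
# leading "-".
uninitialized_submodules() {
    awk '/^-/ { print $2 }'
}

# Usage, from the top of the clone:
#   git submodule status --recursive | uninitialized_submodules
```

Any path it prints can then be fetched again with datalad get -n followed by that path.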

How to re-run

This reexecution system has two targets: the original article, and the publishing documents for the “meta-article”. The latter can be built without the former, as it draws on the cumulative reexecution record found in the outputs/ directory. Building the former will produce a new reexecution entry of your own, which you can contribute to the record, and which all publishing documents created on your system will take into account. For a graphical breakdown, see the following workflow figure:

[Workflow figure]

I. Reexecuting the OPFVTA Article

Warnings:

1. We estimate that the analysis requires more than 500 GB of disk space, 400 GB of which will be used by a scratch directory. This is ./scratch/ by default and can be configured with the SCRATCH_PATH variable.
2. The analysis self-limits its RAM usage so that it can run on less powerful systems.
3. Reexecuting the computation as well as the article is time-consuming and resource-intensive; we recommend using a tool such as tmux or screen to preserve long-running processes.
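Given these disk requirements, it can help to check free space before starting a run. The function below is a hypothetical pre-flight helper; the 400 GB figure comes from the warning above, and passing SCRATCH_PATH on the make command line in the usage comment assumes the Makefile reads it as a make variable.

```bash
# Hypothetical pre-flight check: succeed only if the filesystem holding DIR
# has at least REQUIRED_GB gigabytes (counted as GiB) available.
check_free_space() {
    local dir="$1" required_gb="$2" avail_kb
    # -P: POSIX output format (no line wrapping); -k: 1024-byte blocks.
    # Field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if (( avail_kb >= required_gb * 1024 * 1024 )); then
        echo "OK: $(( avail_kb / 1024 / 1024 )) GiB available under $dir"
    else
        echo "Insufficient space under $dir: ${required_gb} GiB required" >&2
        return 1
    fi
}

# Usage (assumes SCRATCH_PATH can be overridden on the make command line):
#   scratch=/big/disk/opfvta-scratch
#   mkdir -p "$scratch"
#   check_free_space "$scratch" 400 &&
#       tmux new-session -s opfvta "make analysis-oci SCRATCH_PATH=$scratch"
```

Starting the run inside a named tmux session means it survives a dropped SSH connection; tmux attach -t opfvta reconnects to it.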

First, retrieve the data and other large files:

```console
make submodule-data
```

Note that the outputs/ directory will already contain some content from other executions (organized by datetime) for comparison.

Once the required content has been fetched, you can reexecute the OPFVTA article via Singularity or OCI containers. This step generates intermediate results in the scratch directory, which are not preserved in this repository by default, as configured in scratch/.gitignore. The final result is a PDF article and its associated elements (mainly volumetric binary data, .nii.gz files), which will be stored in a datestamped and annotated directory under outputs/. Most large files, including the results, are stored and versioned via git-annex and are therefore tracked in this repository, even when their content is not present locally. After the analysis has run, outputs/ will contain a new entry for the data produced, which can be saved and recorded with datalad save.
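Recording a new reexecution requires the path of the directory the run just created. A minimal sketch, assuming the newest entry under outputs/ by modification time is the one just produced (a heuristic, not something the repository guarantees); the helper name is hypothetical:

```bash
# Hypothetical helper: print the most recently modified directory under
# outputs/ (or under the directory given as $1).
latest_output_dir() {
    local root="${1:-outputs}"
    # -t: sort by modification time, newest first; -d: list the directories
    # themselves rather than their contents; -1: one entry per line.
    ls -1td "$root"/*/ 2>/dev/null | head -n 1
}

# Usage, after the analysis has finished:
#   latest=$(latest_output_dir)
#   datalad save -m "Add reexecution record ${latest}" "$latest"
```

Restricting datalad save to that one path keeps unrelated working-tree changes out of the commit.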

For Apptainer/Singularity:

```console
make analysis-singularity
```

Alternatively, with Docker or Podman, you can execute the analysis inside an OCI container:

```console
make analysis-oci
```

The results of the analysis will be stored under outputs/<execution-environment>_<datestamp>.
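When comparing reexecutions it can be useful to tabulate the record by environment and date. The sketch below splits an entry name at its last underscore, which assumes the datestamp itself contains no underscore; the function name is hypothetical.

```bash
# Hypothetical helper: split an outputs/ entry named
# <execution-environment>_<datestamp> into its two parts, tab-separated.
split_output_name() {
    local name
    name=$(basename "$1")          # also drops any trailing slash
    # %_* strips the shortest "_..." suffix (leaving the environment);
    # ##*_ strips the longest "..._" prefix (leaving the datestamp).
    printf '%s\t%s\n' "${name%_*}" "${name##*_}"
}

# Usage: list all recorded reexecutions by environment and date.
#   for d in outputs/*/; do split_output_name "$d"; done | sort
```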

II. Reexecuting the Meta-Article

To avoid confusion, we use the term 'article' to refer to a version of the OPFVTA article, and 'meta-article' to refer to the paper regarding the reexecution process and findings.

Generation of the meta-article uses files generated by the OPFVTA analysis which are expected to be in the outputs/ directory. Prior to generating the meta-article, outputs/ must contain the data from previous analyses, which is not locally available by default.

Note: Regenerating the OPFVTA article will create an additional PDF, but the previous PDFs are still required for comparison.

To fetch the OPFVTA analysis outputs:

```console
datalad get outputs/*/article.pdf
```
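To confirm that the fetch succeeded, one can check which article PDFs actually have content locally. A minimal sketch, assuming git-annex's default (non-adjusted) mode, in which an annexed file whose content is absent appears as a dangling symlink; the helper name is hypothetical:

```bash
# Hypothetical helper: report which outputs/*/article.pdf files have their
# content available locally.
report_article_pdfs() {
    local root="${1:-outputs}" pdf
    for pdf in "$root"/*/article.pdf; do
        if [ -e "$pdf" ]; then
            echo "present: $pdf"
        elif [ -L "$pdf" ]; then
            # The symlink exists, but its target (the annexed content) does not.
            echo "missing: $pdf"
        fi
    done
}
```

Alternatively, git annex find --not --in=here outputs lists annexed files under outputs/ whose content is not present in the local repository.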

Finally, new graphical elements are generated and the text is compiled via LaTeX into a new meta-article PDF.

This can be done in a container with all of the dependencies preinstalled, using:

```console
make container-article
```

If you prefer to run the generation outside of a container, you will need to install the following dependencies (we suggest using your distribution's package manager; the names below refer to Debian packages):

- LaTeX
- biber
- datalad
- diff-pdf
- graphviz
- matplotlib
- pandas
- seaborn
- sklearn
- statsmodels
- yaml

You will also need to install the sourceserifpro font using tlmgr (e.g. tlmgr install sourceserifpro). The meta-article can then be built with:

```console
make article
```
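Before building outside a container, a quick check for missing dependencies can save a failed build. The helpers below are a hypothetical sketch; the executable names in the usage comment are assumptions (dot is the Graphviz binary, and pdflatex stands in for whichever LaTeX engine the build actually uses).

```bash
# Hypothetical pre-flight checks for building the meta-article outside a
# container. Each function prints the names it could not find.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

missing_python_modules() {
    local mod
    for mod in "$@"; do
        python3 -c "import $mod" >/dev/null 2>&1 || echo "$mod"
    done
}

# Usage (executable names are assumptions, see above):
#   missing_commands pdflatex biber datalad diff-pdf dot
#   missing_python_modules matplotlib pandas seaborn sklearn statsmodels yaml
#   kpsewhich sourceserifpro.sty >/dev/null || echo "sourceserifpro"
```

Empty output from all three lines means every checked dependency was found.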

In either case, you can then view the produced meta-article at TODO.

Cleaning up between runs

The steps are designed to be idempotent, so some dynamically generated components will not be regenerated on subsequent runs. If you are working on the figure-generating code, rather than merely trying to get a PDF to read or editing the human-readable text, it is advisable to always deep-clean the dynamic elements before re-making the article:

```console
make article-clean && make article
```

Internal

We openly share all code and data via the GIN repository referenced above. This open infrastructure is, however, slow, which may be particularly inconvenient for prolonged development work. Trusted collaborators may instead prefer to use the smaug.dartmouth.edu remote.

To use this remote you should:

Make sure you have SSH access to smaug.dartmouth.edu, and configure the host in your SSH config file, e.g. by running:

```console
cat >> ~/.ssh/config << EOF
Host smaug
    Hostname smaug.dartmouth.edu
    AddKeysToAgent yes
    Port <secret_port>
    User <your_username_for_which_smaug_has_your_SSH_public_key>
EOF
```
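Running that command a second time appends a duplicate stanza. A minimal guard, written as a hypothetical helper, checks whether a Host entry already exists; ssh -G, shown in the usage comment, prints the settings OpenSSH resolves for a host, which is a convenient way to confirm the entry took effect.

```bash
# Hypothetical helper: succeed if CONFIG already contains a "Host NAME"
# line (ssh_config keywords are case-insensitive, hence grep -i).
has_ssh_host() {
    local config="$1" name="$2"
    [ -f "$config" ] &&
        grep -Eiq "^[[:space:]]*Host[[:space:]]+${name}([[:space:]]|\$)" "$config"
}

# Usage:
#   has_ssh_host ~/.ssh/config smaug || echo "smaug is not configured yet"
#   ssh -G smaug | grep -E '^(hostname|port|user) '   # resolved settings
```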

Add the remote to the Git repository, and make sure you are synced up:

```console
cd path/to/your/repo
git remote add smaug smaug:/mnt/btrfs/datasets/incoming/con/opfvta-replication-2023.git
datalad get . -s smaug
```

Optionally, and only if you are an internal contributor looking to push, it makes sense to automatically keep our three remotes (gin, smaug, GitHub) in sync; for this, you should make use
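One possible way to push to several remotes in one go, offered as an assumption rather than the authors' documented setup, is to loop over them with datalad push. The remote names gin and github in the usage comment are guesses, so check git remote -v first.

```bash
# Hypothetical helper: push the dataset to each named remote in turn,
# continuing past failures and returning non-zero if any push failed.
push_to_all() {
    local remote status=0
    for remote in "$@"; do
        echo "Pushing to ${remote}"
        datalad push --to "$remote" || status=1
    done
    return "$status"
}

# Usage (remote names are assumptions):
#   push_to_all gin smaug github
```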

Owner

Name: Center for Open Neuroscience
Login: con
Kind: organization
Email: debian@oneukrainian.com
Location: Dartmouth College, USA

Website: http://centerforopenneuroscience.org
Repositories: 13
Profile: https://github.com/con

Citation (CITATION.cff)

cff-version: 1.2.0
title: "Neuroimaging Article Reexecution and Reproduction Assessment System"
authors:
  - family-names: Ioanas
    given-names: Horea-Ioan
    orcid: https://orcid.org/0000-0001-7037-2449
  - family-names: Macdonald
    given-names: Austin
    orcid: https://orcid.org/0000-0002-8124-807X
  - family-names: Halchenko
    given-names: Yaroslav O.
    orcid: https://orcid.org/0000-0003-3456-2493
license: Apache-2.0
