DataLad

DataLad: distributed system for joint management of code, data, and their relationship - Published in JOSS (2021)

https://github.com/datalad/datalad

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    11 of 60 committers (18.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

closember data-storage dataset git-annex python usable

Keywords from Contributors

bids neuroimaging eeg neuroscience meg magnetoencephalography electroencephalography electrocorticography ecog singularity
Last synced: 4 months ago

Repository

Keep code, data, containers under control with git and git-annex

Basic Info
  • Host: GitHub
  • Owner: datalad
  • License: other
  • Language: Python
  • Default Branch: maint
  • Homepage: http://datalad.org
  • Size: 40.4 MB
Statistics
  • Stars: 603
  • Watchers: 23
  • Forks: 114
  • Open Issues: 560
  • Releases: 114
Topics
closember data-storage dataset git-annex python usable
Created about 12 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Zenodo

README.md

 ____            _             _                   _
|  _ \    __ _  | |_    __ _  | |       __ _    __| |
| | | |  / _` | | __|  / _` | | |      / _` |  / _` |
| |_| | | (_| | | |_  | (_| | | |___  | (_| | | (_| |
|____/   \__,_|  \__|  \__,_| |_____|  \__,_|  \__,_|
                                              Read me

Distribution

DataLad is packaged for Anaconda, Arch Linux (AUR), Debian (stable and unstable), Fedora Rawhide, Gentoo (::science overlay), and PyPI.

10000-ft. overview

DataLad's purpose is to make data management and data distribution more accessible. To do so, it stands on the shoulders of Git and Git-annex to deliver a decentralized system for data exchange. This includes automated ingestion of data from online portals and exposing it in readily usable form as Git(-annex) repositories, so-called datasets. The actual data storage and permission management, however, remain with the original data provider(s).
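Git can track data it does not itself store. The sketch below shows one way this works by deriving a content-addressed key in the style of git-annex's MD5E backend. Git records only this small key, while the file content can live with whichever provider holds it. The key layout `BACKEND-sSIZE--HASH.ext` reflects our understanding of git-annex, and the extension handling is simplified so that only the last suffix is kept. The function name `annex_style_key` is hypothetical and not part of DataLad's API.

```python
import hashlib
import os


def annex_style_key(data: bytes, filename: str) -> str:
    """Build a git-annex-style MD5E key: MD5E-s<size>--<md5 hex><extension>.

    Simplified sketch: real git-annex has its own rules for multi-part and
    long extensions; here only the final suffix from os.path.splitext is kept.
    """
    digest = hashlib.md5(data).hexdigest()  # content hash identifies the bytes
    _, ext = os.path.splitext(filename)     # e.g. ".txt", or "" if none
    return f"MD5E-s{len(data)}--{digest}{ext}"


print(annex_style_key(b"hello\n", "greeting.txt"))
# MD5E-s6--b1946ac92492d2347c6235b4d2611184.txt
```

Identical bytes with the same suffix always map to the same key. This is why such a key can refer to content independently of where that content is stored.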

The full documentation is available at http://docs.datalad.org, and http://handbook.datalad.org provides a hands-on crash course on DataLad.

Extensions

A number of extensions are available that provide additional functionality for DataLad. Extensions are separate packages that are to be installed in addition to DataLad. In order to install DataLad customized for a particular domain, one can simply install an extension directly, and DataLad itself will be automatically installed with it. An annotated list of extensions is available in the DataLad handbook.
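Extensions are ordinary Python packages, so one way to see which of them are present in an environment is to query the installed entry points. This sketch assumes that extensions register under the `datalad.extensions` entry-point group. That is our understanding of how DataLad discovers them, so treat the group name as an assumption. The helper `installed_extensions` is hypothetical.

```python
import sys
from importlib.metadata import entry_points
from typing import List


def installed_extensions(group: str = "datalad.extensions") -> List[str]:
    """Return the sorted names of entry points registered under `group`.

    The default group name is an assumption about how DataLad extensions
    advertise themselves; DataLad itself does not need to be importable.
    """
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)      # selectable API (Python 3.10+)
    else:
        eps = entry_points().get(group, [])  # older API: dict of group -> entry points
    return sorted(ep.name for ep in eps)


print(installed_extensions())
```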

Support

The documentation for this project is found here: http://docs.datalad.org

All bugs, concerns, and enhancement requests for this software can be submitted here: https://github.com/datalad/datalad/issues

If you have a problem or would like to ask a question about how to use DataLad, please submit a question to NeuroStars.org tagged datalad. NeuroStars.org is a platform similar to Stack Overflow but dedicated to neuroinformatics.

All previous DataLad questions are available here: http://neurostars.org/tags/datalad/

Installation

Debian-based systems

On Debian-based systems, we recommend enabling NeuroDebian, via which we provide recent releases of DataLad. Once enabled, just do:

apt-get install datalad

Gentoo-based systems

On Gentoo-based systems (i.e. all systems whose package manager can parse ebuilds as per the Package Manager Specification), we recommend enabling the ::science overlay, via which we provide recent releases of DataLad. Once enabled, just run:

emerge datalad

Other Linux distributions via conda

conda install -c conda-forge datalad

will install the most recently released version, and release candidates are available via

conda install -c conda-forge/label/rc datalad

Other Linux distributions and macOS via pip

Before you install this package, please make sure that you install a recent version of git-annex. Afterwards, install the latest version of datalad from PyPI. It is recommended to use a dedicated virtualenv:

# Create and enter a new virtual environment (optional)
virtualenv --python=python3 ~/env/datalad
. ~/env/datalad/bin/activate

# Install from PyPI
pip install datalad
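The instructions above ask for git-annex to be installed before DataLad. A small pre-flight check can confirm that it is on the PATH. This sketch runs `git-annex version` and parses the `git-annex version:` line it prints. The helper names and the minimum version are hypothetical; consult the DataLad Handbook for the actual requirement.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# HYPOTHETICAL minimum for illustration only; not DataLad's documented requirement.
HYPOTHETICAL_MINIMUM = "8.20200309"


def parse_annex_version(output: str) -> Optional[str]:
    """Extract the version from `git-annex version` output, e.g. '10.20230126'."""
    match = re.search(r"git-annex version:\s*(\S+)", output)
    return match.group(1) if match else None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '10.20230126' (or '10.20230126-g1234abc') into (10, 20230126)."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)  # keep only leading digits of each piece
        if digits is None:
            break
        parts.append(int(digits.group()))
    return tuple(parts)


def is_at_least(version: str, minimum: str) -> bool:
    """Compare two git-annex version strings numerically."""
    return version_tuple(version) >= version_tuple(minimum)


def installed_annex_version() -> Optional[str]:
    """Return the installed git-annex version, or None if it cannot be found."""
    exe = shutil.which("git-annex")
    if exe is None:
        return None
    result = subprocess.run([exe, "version"], capture_output=True, text=True, check=False)
    return parse_annex_version(result.stdout)


if __name__ == "__main__":
    found = installed_annex_version()
    if found is None:
        print("git-annex not found; install it before running pip install datalad")
    else:
        status = "ok" if is_at_least(found, HYPOTHETICAL_MINIMUM) else "older than minimum"
        print(f"git-annex {found}: {status}")
```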

By default, installation via pip installs DataLad's core functionality, which covers managing datasets and related core tasks. Additional installation schemes can be requested with pip install datalad[SCHEME], where SCHEME could be:

  • tests to also install dependencies used by DataLad's battery of unit tests
  • full to install all dependencies.
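In an interactive shell such as zsh, an argument like `datalad[full]` must be quoted (e.g. `pip install 'datalad[full]'`), because the square brackets are otherwise treated as a glob pattern. When the installation is scripted, passing pip's arguments as a list avoids the shell entirely, so no quoting is needed. The helper below is a hypothetical sketch that only builds the command. It accepts just the two schemes listed above, although the package may define others.

```python
import sys
from typing import Iterable, List

KNOWN_SCHEMES = {"tests", "full"}  # the two schemes described above


def pip_install_command(schemes: Iterable[str] = ()) -> List[str]:
    """Build a `python -m pip install datalad[...]` argument list."""
    chosen = sorted(set(schemes))
    unknown = [s for s in chosen if s not in KNOWN_SCHEMES]
    if unknown:
        raise ValueError(f"unknown installation scheme(s): {unknown}")
    # Multiple extras are comma-separated inside one pair of brackets.
    requirement = "datalad" + (f"[{','.join(chosen)}]" if chosen else "")
    return [sys.executable, "-m", "pip", "install", requirement]


cmd = pip_install_command(["full"])
print(cmd[-1])  # datalad[full]
# To actually install, run the list directly so no shell sees the brackets:
# import subprocess; subprocess.run(cmd, check=True)
```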

More details on installation and initial configuration can be found in the DataLad Handbook: Installation.

License

MIT/Expat

Contributing

See CONTRIBUTING.md if you are interested in internals or contributing to the project.

Acknowledgements

The DataLad project received support through the following grants:

  • US-German collaboration in computational neuroscience (CRCNS) project "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411).

  • CRCNS US-German Data Sharing "DataLad - a decentralized system for integrated discovery, management, and publication of digital objects of science" (Halchenko/Pestilli/Hanke), co-funded by the US National Science Foundation (NSF 1912266) and the German Federal Ministry of Education and Research (BMBF 01GQ1905).

  • Helmholtz Research Center Jülich, FDM challenge 2022

  • German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences, Imaging Platform

  • ReproNim project (NIH 1P41EB019936-01A1).

  • Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant SFB 1451 (431549029, INF project)

  • European Union’s Horizon 2020 research and innovation programme under grant agreements:

A Mac mini instance for development is provided by MacStadium.

Contributors ✨

Thanks go to these wonderful people, all credited with code contributions (💻):

glalteva, adswa, chrhaeusler, soichih, mvdoc, mih, yarikoptic, loj, feilong, jhpoelen, andycon, nicholsn, adelavega, kskyten, TheChymera, effigies, jgors, debanjum, nellh, emdupre, aqw, vsoch, kyleam, driusan, overlake333, akeshavan, jwodder, bpoldrack, yetanothertestuser, Christian Mönch, Matt Cieslak, Mika Pflüger, Robin Schneider, Sin Kim, Michael Burgardt, Remi Gau, Michał Szczepanik, Basile, Taylor Olson, James Kent, xgui3783, tstoeter, Stephan Heunis, Matt McCormick, Vicky C Lau, Chris Lamb, Austin Macdonald, Yann Büchau, Matthias Riße, Aksoo, David Guibert, Alex Shields-Weber

Owner

  • Name: DataLad
  • Login: datalad
  • Kind: organization
  • Email: team@datalad.org
  • Location: USA & Germany

Data distribution and management platform

JOSS Publication

DataLad: distributed system for joint management of code, data, and their relationship
Published
July 01, 2021
Volume 6, Issue 63, Page 3262
Authors
Yaroslav O. Halchenko ORCID
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Kyle Meyer
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Benjamin Poldrack ORCID
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Debanjum Singh Solanky ORCID
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Adina S. Wagner ORCID
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Jason Gors
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Dave MacFarlane
McGill Center for Integrative Neuroscience, Montreal, Canada
Dorian Pustina ORCID
CHDI Management/CHDI Foundation, Princeton, NJ, USA
Vanessa Sochat ORCID
Lawrence Livermore National Lab, Livermore, CA, USA
Satrajit S. Ghosh ORCID
Massachusetts Institute of Technology, Cambridge, MA, USA
Christian Mönch ORCID
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Christopher J. Markiewicz ORCID
Stanford University, Stanford, CA, USA
Laura Waite ORCID
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Ilya Shlyakhter ORCID
Quest Diagnostics, Marlborough, MA, USA
Alejandro de la Vega ORCID
The University of Texas at Austin, Austin, TX, USA
Soichi Hayashi ORCID
Indiana University, Bloomington, IN, USA
Christian Olaf Häusler ORCID
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany, Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Jean-Baptiste Poline ORCID
Faculty of Medicine and Health Sciences, McConnell Brain Imaging Center, McGill University, Montreal, Canada
Tobias Kadelka ORCID
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Kusti Skytén ORCID
University of Oslo, Oslo, Norway
Dorota Jarecka ORCID
Massachusetts Institute of Technology, Cambridge, MA, USA
David Kennedy ORCID
University of Massachusetts Medical School, Worcester, MA, USA
Ted Strauss ORCID
Montreal Neurological Institute, McGill University, Montreal, Canada
Matt Cieslak ORCID
University of Pennsylvania, Philadelphia, PA
Peter Vavra ORCID
Department of Biological Psychology, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
Horea-Ioan Ioanas ORCID
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, USA
Robin Schneider ORCID
Independent Developer, Germany
Mika Pflüger ORCID
Potsdam Institute for Climate Impact Research (PIK) e. V., Potsdam, Germany
James V. Haxby
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Simon B. Eickhoff ORCID
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany, Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Michael Hanke ORCID
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany, Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Editor
Ariel Rokem ORCID
Tags
Python command line version control data management data distribution data provenance reproducibility

Citation (CITATION.cff)

cff-version: 1.1.0
message: Please cite the following works when using this software.
authors:
  - family-names: Halchenko
    given-names: Yaroslav
  - family-names: Meyer
    given-names: Kyle
  - family-names: Poldrack
    given-names: Benjamin
  - family-names: Solanky
    given-names: Debanjum
  - family-names: Wagner
    given-names: Adina
  - family-names: Gors
    given-names: Jason
  - family-names: MacFarlane
    given-names: Dave
  - family-names: Pustina
    given-names: Dorian
  - family-names: Sochat
    given-names: Vanessa
  - family-names: Ghosh
    given-names: Satrajit
  - family-names: Mönch
    given-names: Christian
  - family-names: Markiewicz
    given-names: Christopher
  - family-names: Waite
    given-names: Laura
  - family-names: Shlyakhter
    given-names: Ilya
  - family-names: Vega
    given-names: Alejandro
    name-particle: de la
  - family-names: Hayashi
    given-names: Soichi
  - family-names: Häusler
    given-names: Christian
  - family-names: Poline
    given-names: Jean-Baptiste
  - family-names: Kadelka
    given-names: Tobias
  - family-names: Skytén
    given-names: Kusti
  - family-names: Jarecka
    given-names: Dorota
  - family-names: Kennedy
    given-names: David
  - family-names: Strauss
    given-names: Ted
  - family-names: Cieslak
    given-names: Matt
  - family-names: Vavra
    given-names: Peter
  - family-names: Ioanas
    given-names: Horea-Ioan
  - family-names: Schneider
    given-names: Robin
  - family-names: Pflüger
    given-names: Mika
  - family-names: Haxby
    given-names: James
  - family-names: Eickhoff
    given-names: Simon
  - family-names: Hanke
    given-names: Michael
doi: 10.21105/JOSS.03262
identifiers:
  - type: doi
    value: 10.21105/JOSS.03262
  - type: other
    value: urn:issn:2475-9066
keywords:
  - Computational reproducibility
  - reproducibility
  - Python
  - data management
  - workflow
title: >-
  DataLad: distributed system for joint management of code, data, and their
  relationship
version: 1.2.1

GitHub Events

Total
  • Create event: 17
  • Release event: 4
  • Issues event: 37
  • Watch event: 64
  • Delete event: 11
  • Issue comment event: 159
  • Push event: 48
  • Pull request review event: 5
  • Pull request review comment event: 3
  • Pull request event: 59
  • Fork event: 7
Last Year
  • Create event: 17
  • Release event: 4
  • Issues event: 37
  • Watch event: 64
  • Delete event: 11
  • Issue comment event: 159
  • Push event: 48
  • Pull request review event: 5
  • Pull request review comment event: 3
  • Pull request event: 59
  • Fork event: 7

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 13,421
  • Total Committers: 60
  • Avg Commits per committer: 223.683
  • Development Distribution Score (DDS): 0.684
Past Year
  • Commits: 92
  • Committers: 10
  • Avg Commits per committer: 9.2
  • Development Distribution Score (DDS): 0.478
Top Committers
Name Email Commits
Yaroslav Halchenko d****n@o****m 4,242
Michael Hanke m****e@g****m 3,845
Kyle Meyer k****e@k****m 1,778
Benjamin Poldrack b****k@g****m 1,686
Adina Wagner a****r@t****e 361
Christian Mönch c****h@w****e 324
DataLad Bot b****t@d****g 231
John T. Wodder II g****t@v****g 222
Debanjum Singh Solanky d****m@g****m 188
Gergana Alteva g****a@g****m 168
Michał Szczepanik m****k@f****e 61
Jason Gors j****k@g****m 58
github-actions g****s 36
Dave MacFarlane d****n@g****m 30
vsoch v****t@s****u 17
Stephan Heunis s****s@f****e 16
Christopher J. Markiewicz m****z@s****u 13
dependabot[bot] 4****] 13
Alex Waite a****5@g****m 11
Sin Kim k****8@g****m 11
Christian Olaf Häusler d****r@g****t 8
Horea Christian c****r@c****u 8
basile b****d@g****m 7
Taylor Olson t****e@g****m 7
Andy Connolly a****y@d****u 7
Yann Büchau n****n@p****e 6
Mika Pflüger m****r@p****e 6
Matthias Riße m****e@f****e 5
Nolan Nichols n****s@m****m 5
Michael Burgardt m****t@g****m 4
and 30 more...
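The committer statistics above can be reproduced with a little arithmetic. This sketch assumes that DDS is defined as 1 - (top committer's commits / total commits). That definition is used by some repository-metrics services, but the page does not state it. With the all-time figures it gives 0.684, which matches the reported value.

```python
def average_commits(total_commits: int, committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(top_committer_commits: int, total_commits: int) -> float:
    """DDS under the assumed definition: 1 - (top committer's commits / total commits).

    0 means one person made every commit; values closer to 1 mean commits
    are spread across more people.
    """
    return 1 - top_committer_commits / total_commits


# All-time figures from the page: 13,421 commits by 60 committers,
# of which the top committer made 4,242.
print(round(average_commits(13_421, 60), 3))                    # 223.683
print(round(development_distribution_score(4_242, 13_421), 3))  # 0.684
# Past-year figures: 92 commits by 10 committers.
print(average_commits(92, 10))                                  # 9.2
```

The past-year DDS of 0.478 cannot be checked the same way, because the page does not list the past-year top committer's count.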

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 305
  • Total pull requests: 297
  • Average time to close issues: 6 months
  • Average time to close pull requests: 26 days
  • Total issue authors: 60
  • Total pull request authors: 21
  • Average comments per issue: 3.23
  • Average comments per pull request: 3.89
  • Merged pull requests: 228
  • Bot issues: 0
  • Bot pull requests: 22
Past Year
  • Issues: 43
  • Pull requests: 62
  • Average time to close issues: 10 days
  • Average time to close pull requests: 18 days
  • Issue authors: 18
  • Pull request authors: 12
  • Average comments per issue: 1.12
  • Average comments per pull request: 1.68
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 9
Top Authors
Issue Authors
  • yarikoptic (106)
  • mih (49)
  • mlell (13)
  • adswa (10)
  • anikfal (9)
  • bpinsard (8)
  • matrss (7)
  • mslw (7)
  • bpoldrack (7)
  • TheChymera (6)
  • jwodder (5)
  • psadil (4)
  • asmacdo (3)
  • ddeepwell (3)
  • JohannesWiesner (3)
Pull Request Authors
  • yarikoptic (178)
  • jwodder (42)
  • adswa (28)
  • mih (18)
  • github-actions[bot] (16)
  • christian-monch (15)
  • mslw (13)
  • dependabot[bot] (13)
  • bpoldrack (10)
  • bpinsard (7)
  • effigies (4)
  • jsheunis (3)
  • alliesw (2)
  • malikwirin (2)
  • asmacdo (2)
Top Labels
Issue Labels
UX (19) release automation (17) DX (11) enhancement (11) question (10) cmd-run/rerun (8) answered (8) spurious-test-failure (7) RIA/ORA (7) severity-normal (6) test-failure (6) platform-windows (5) easy (5) documentation (5) fix-implemented (5) annex (4) stale-issue-closed-without-resolution (4) corpse-in-basement (3) tests (3) cmd-clone (3) performance (3) for our information (3) cmd-diff (3) bare-mode (2) good-for-hackathon (2) cmd-status (2) team-core (2) cmd-create-sibling-gitlab (2) severity-wishlist (2) cmd-siblings (2)
Pull Request Labels
semver-patch (90) semver-internal (79) semver-tests (79) release (38) semver-documentation (28) release automation (20) semver-minor (19) merge-if-ok (16) tests (12) CHANGELOG-missing (10) UX (7) annex (7) released (5) documentation (4) semver-performance (4) cmd-run/rerun (3) semver-major (3) stale-PR-closed-without-merge (3) severity-minor (2) semver-dependencies (2) team-gitannex (2) severity-normal (1) cmd-save (1) spurious-test-failure (1) cmd-foreach-dataset (1) team-remotes (1) cmd-clone (1) adjusted-branches (1) DX (1) platform-windows (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 18,125 last-month
  • Total docker downloads: 10,390
  • Total dependent packages: 45
    (may contain duplicates)
  • Total dependent repositories: 85
    (may contain duplicates)
  • Total versions: 176
  • Total maintainers: 5
pypi.org: datalad

data distribution geared toward scientific datasets

  • Versions: 124
  • Dependent Packages: 43
  • Dependent Repositories: 78
  • Downloads: 18,125 Last month
  • Docker Downloads: 10,390
Rankings
Dependent packages count: 0.3%
Docker downloads count: 1.2%
Dependent repos count: 1.7%
Average: 2.1%
Downloads: 2.1%
Stargazers count: 2.9%
Forks count: 4.4%
Last synced: 4 months ago
conda-forge.org: datalad

DataLad aims to make data management and data distribution more accessible. To do that it stands on the shoulders of Git and Git-annex to deliver a decentralized system for data exchange. This includes automated ingestion of data from online portals, and exposing it in readily usable form as Git(-annex) repositories, so-called datasets. The actual data storage and permission management, however, remains with the original data providers.

  • Homepage: http://datalad.org
  • License: MIT
  • Latest release: 0.17.9
    published about 3 years ago
  • Versions: 52
  • Dependent Packages: 2
  • Dependent Repositories: 7
Rankings
Dependent repos count: 12.8%
Average: 17.4%
Forks count: 17.5%
Dependent packages count: 19.6%
Stargazers count: 19.6%
Last synced: 4 months ago
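In each Rankings block, the "Average" row appears to be the arithmetic mean of the other percentile ranks listed with it. This reading is inferred from the numbers and is not stated on the page. A quick check:

```python
from statistics import mean

# Percentile ranks copied from the two Rankings blocks above (excluding "Average").
pypi_ranks = {
    "Dependent packages count": 0.3,
    "Docker downloads count": 1.2,
    "Dependent repos count": 1.7,
    "Downloads": 2.1,
    "Stargazers count": 2.9,
    "Forks count": 4.4,
}
conda_ranks = {
    "Dependent repos count": 12.8,
    "Forks count": 17.5,
    "Dependent packages count": 19.6,
    "Stargazers count": 19.6,
}

for name, ranks in (("pypi.org", pypi_ranks), ("conda-forge.org", conda_ranks)):
    print(f"{name}: {round(mean(ranks.values()), 1)}%")
# pypi.org: 2.1%
# conda-forge.org: 17.4%
```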

Dependencies

.github/workflows/add-changelog-snippet.yml actions
  • actions/checkout v3 composite
  • datalad/release-action/add-changelog-snippet v1 composite
.github/workflows/benchmarks.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/docbuild.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/lint.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/release.yml actions
  • actions/checkout v3 composite
  • datalad/release-action/release v1 composite
.github/workflows/shellcheck.yml actions
  • actions/checkout v3 composite
.github/workflows/test_crippled.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/test_extensions.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
.github/workflows/test_macos.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
.github/workflows/typing.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/update-contributors.yml actions
  • actions/checkout v3 composite
  • con/tributors 0.0.21 composite
  • vsoch/pull-request-action 1.0.23 composite
.github/workflows/test-label.yml actions
_datalad_build_support/setup.py pypi
pyproject.toml pypi
requirements-devel.txt pypi
requirements.txt pypi
setup.py pypi