DataLad
DataLad: distributed system for joint management of code, data, and their relationship - Published in JOSS (2021)
Science Score: 100.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org, zenodo.org
- ✓ Committers with academic emails: 11 of 60 committers (18.3%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Keep code, data, containers under control with git and git-annex
Basic Info
- Host: GitHub
- Owner: datalad
- License: other
- Language: Python
- Default Branch: maint
- Homepage: http://datalad.org
- Size: 40.4 MB
Statistics
- Stars: 603
- Watchers: 23
- Forks: 114
- Open Issues: 560
- Releases: 114
Metadata Files
README.md
____ _ _ _
| _ \ __ _ | |_ __ _ | | __ _ __| |
| | | | / _` | | __| / _` | | | / _` | / _` |
| |_| | | (_| | | |_ | (_| | | |___ | (_| | | (_| |
|____/ \__,_| \__| \__,_| |_____| \__,_| \__,_|
Read me
Distribution
10000-ft. overview
DataLad's purpose is to make data management and data distribution more accessible. To do so, it stands on the shoulders of Git and Git-annex to deliver a decentralized system for data exchange. This includes automated ingestion of data from online portals and exposing it in readily usable form as Git(-annex) repositories - or datasets. However, the actual data storage and permission management remains with the original data provider(s).
The full documentation is available at http://docs.datalad.org, and http://handbook.datalad.org provides a hands-on crash course on DataLad.
Extensions
A number of extensions are available that provide additional functionality for DataLad. Extensions are separate packages that are to be installed in addition to DataLad. In order to install DataLad customized for a particular domain, one can simply install an extension directly, and DataLad itself will be automatically installed with it. An annotated list of extensions is available in the DataLad handbook.
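To see which extensions are present in a given Python environment, one option is to inspect the installed package metadata. The sketch below assumes that extensions register themselves under an entry-point group named `datalad.extensions`; that group name and the helper `installed_extensions` are assumptions made for illustration, not something stated in this README.

```python
from importlib.metadata import entry_points

# Assumption: DataLad extensions register under this entry-point group.
EXTENSION_GROUP = "datalad.extensions"


def installed_extensions(group=EXTENSION_GROUP):
    """Return the sorted names of entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain mapping of group name to entry points.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    names = installed_extensions()
    print(names if names else "no DataLad extensions found")
```

An empty result only means that nothing is registered under that group in the current environment.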
Support
The documentation for this project is found here: http://docs.datalad.org
All bugs, concerns, and enhancement requests for this software can be submitted here: https://github.com/datalad/datalad/issues
If you have a problem or would like to ask a question about how to use DataLad, please submit a question to NeuroStars.org with a datalad tag. NeuroStars.org is a platform similar to StackOverflow but dedicated to neuroinformatics.
All previous DataLad questions are available here: http://neurostars.org/tags/datalad/
Installation
Debian-based systems
On Debian-based systems, we recommend enabling NeuroDebian, via which we provide recent releases of DataLad. Once enabled, just do:
apt-get install datalad
Gentoo-based systems
On Gentoo-based systems (i.e. all systems whose package manager can parse ebuilds as per the Package Manager Specification), we recommend enabling the ::science overlay, via which we provide recent releases of DataLad. Once enabled, just run:
emerge datalad
Other Linux'es via conda
conda install -c conda-forge datalad
will install the most recently released version, and release candidates are available via
conda install -c conda-forge/label/rc datalad
Other Linux'es, macOS via pip
Before you install this package, please make sure that you install a recent version of git-annex. Afterwards, install the latest version of datalad from PyPI. It is recommended to use a dedicated virtualenv:
# Create and enter a new virtual environment (optional)
virtualenv --python=python3 ~/env/datalad
. ~/env/datalad/bin/activate
# Install from PyPI
pip install datalad
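Because the pip route expects git-annex to be installed separately, it can help to check what is actually visible from the active environment. The following is a minimal sketch using only the Python standard library. The helper name `check_prerequisites` is hypothetical, and the check only reports whether each tool can be found, not whether its version is recent enough.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether git, git-annex and the datalad package can be found."""
    return {
        # Executables are looked up on PATH.
        "git": shutil.which("git") is not None,
        "git-annex": shutil.which("git-annex") is not None,
        # The Python package is looked up without importing it.
        "datalad": importlib.util.find_spec("datalad") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```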
By default, installation via pip installs the core functionality of DataLad, allowing for managing datasets etc. Additional installation schemes are available, so you can request enhanced installation via pip install datalad[SCHEME], where SCHEME could be:
- tests: to also install dependencies used by DataLad's battery of unit tests
- full: to install all dependencies.
More details on installation and initial configuration can be found in the DataLad Handbook: Installation.
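For scripted setups, the scheme-based installation described above can be expressed as a small helper that builds the pip invocation. This is a sketch only: `pip_install_args` and `KNOWN_SCHEMES` are hypothetical names, and the list of schemes contains just the two mentioned in this README.

```python
import sys

# Schemes named in this README; the package may define others.
KNOWN_SCHEMES = ("tests", "full")


def pip_install_args(scheme=None):
    """Build the argument list for installing datalad, optionally with a scheme."""
    if scheme is None:
        requirement = "datalad"
    elif scheme in KNOWN_SCHEMES:
        requirement = f"datalad[{scheme}]"
    else:
        raise ValueError(f"unknown installation scheme: {scheme!r}")
    # Run pip through the current interpreter so the active environment is used.
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    print(" ".join(pip_install_args("full")))
```

The returned list can be passed to `subprocess.run(..., check=True)` to perform the installation. Building the command as a list avoids shell quoting problems with the square brackets.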
License
MIT/Expat
Contributing
See CONTRIBUTING.md if you are interested in internals or contributing to the project.
Acknowledgements
The DataLad project received support through the following grants:
US-German collaboration in computational neuroscience (CRCNS) project "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411).
CRCNS US-German Data Sharing "DataLad - a decentralized system for integrated discovery, management, and publication of digital objects of science" (Halchenko/Pestilli/Hanke), co-funded by the US National Science Foundation (NSF 1912266) and the German Federal Ministry of Education and Research (BMBF 01GQ1905).
Helmholtz Research Center Jülich, FDM challenge 2022
German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences, Imaging Platform
ReproNim project (NIH 1P41EB019936-01A1).
Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant SFB 1451 (431549029, INF project)
European Union’s Horizon 2020 research and innovation programme under grant agreements:
Mac mini instance for development is provided by MacStadium.
Contributors ✨
Thanks goes to these wonderful people (emoji key):
Owner
- Name: DataLad
- Login: datalad
- Kind: organization
- Email: team@datalad.org
- Location: USA&Germany
- Website: http://datalad.org
- Repositories: 97
- Profile: https://github.com/datalad
Data distribution and management platform
JOSS Publication
DataLad: distributed system for joint management of code, data, and their relationship
Authors
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
McGill Center for Integrative Neuroscience, Montreal, Canada
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany, Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Faculty of Medicine and Health Sciences, McConnell Brain Imaging Center, McGill University, Montreal, Canada
Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
Department of Biological Psychology, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, USA
Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Tags
Python, command line, version control, data management, data distribution, data provenance, reproducibility
Citation (CITATION.cff)
cff-version: 1.1.0
message: Please cite the following works when using this software.
authors:
  - family-names: Halchenko
    given-names: Yaroslav
  - family-names: Meyer
    given-names: Kyle
  - family-names: Poldrack
    given-names: Benjamin
  - family-names: Solanky
    given-names: Debanjum
  - family-names: Wagner
    given-names: Adina
  - family-names: Gors
    given-names: Jason
  - family-names: MacFarlane
    given-names: Dave
  - family-names: Pustina
    given-names: Dorian
  - family-names: Sochat
    given-names: Vanessa
  - family-names: Ghosh
    given-names: Satrajit
  - family-names: Mönch
    given-names: Christian
  - family-names: Markiewicz
    given-names: Christopher
  - family-names: Waite
    given-names: Laura
  - family-names: Shlyakhter
    given-names: Ilya
  - family-names: Vega
    given-names: Alejandro
    name-particle: de la
  - family-names: Hayashi
    given-names: Soichi
  - family-names: Häusler
    given-names: Christian
  - family-names: Poline
    given-names: Jean-Baptiste
  - family-names: Kadelka
    given-names: Tobias
  - family-names: Skytén
    given-names: Kusti
  - family-names: Jarecka
    given-names: Dorota
  - family-names: Kennedy
    given-names: David
  - family-names: Strauss
    given-names: Ted
  - family-names: Cieslak
    given-names: Matt
  - family-names: Vavra
    given-names: Peter
  - family-names: Ioanas
    given-names: Horea-Ioan
  - family-names: Schneider
    given-names: Robin
  - family-names: Pflüger
    given-names: Mika
  - family-names: Haxby
    given-names: James
  - family-names: Eickhoff
    given-names: Simon
  - family-names: Hanke
    given-names: Michael
doi: 10.21105/JOSS.03262
identifiers:
  - type: doi
    value: 10.21105/JOSS.03262
  - type: other
    value: urn:issn:2475-9066
keywords:
  - Computational reproducibility
  - reproducibility
  - Python
  - data management
  - workflow
title: >-
  DataLad: distributed system for joint management of code, data, and their
  relationship
version: 1.2.1
GitHub Events
Total
- Create event: 17
- Release event: 4
- Issues event: 37
- Watch event: 64
- Delete event: 11
- Issue comment event: 159
- Push event: 48
- Pull request review event: 5
- Pull request review comment event: 3
- Pull request event: 59
- Fork event: 7
Last Year
- Create event: 17
- Release event: 4
- Issues event: 37
- Watch event: 64
- Delete event: 11
- Issue comment event: 159
- Push event: 48
- Pull request review event: 5
- Pull request review comment event: 3
- Pull request event: 59
- Fork event: 7
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yaroslav Halchenko | d****n@o****m | 4,242 |
| Michael Hanke | m****e@g****m | 3,845 |
| Kyle Meyer | k****e@k****m | 1,778 |
| Benjamin Poldrack | b****k@g****m | 1,686 |
| Adina Wagner | a****r@t****e | 361 |
| Christian Mönch | c****h@w****e | 324 |
| DataLad Bot | b****t@d****g | 231 |
| John T. Wodder II | g****t@v****g | 222 |
| Debanjum Singh Solanky | d****m@g****m | 188 |
| Gergana Alteva | g****a@g****m | 168 |
| Michał Szczepanik | m****k@f****e | 61 |
| Jason Gors | j****k@g****m | 58 |
| github-actions | g****s | 36 |
| Dave MacFarlane | d****n@g****m | 30 |
| vsoch | v****t@s****u | 17 |
| Stephan Heunis | s****s@f****e | 16 |
| Christopher J. Markiewicz | m****z@s****u | 13 |
| dependabot[bot] | 4****] | 13 |
| Alex Waite | a****5@g****m | 11 |
| Sin Kim | k****8@g****m | 11 |
| Christian Olaf Häusler | d****r@g****t | 8 |
| Horea Christian | c****r@c****u | 8 |
| basile | b****d@g****m | 7 |
| Taylor Olson | t****e@g****m | 7 |
| Andy Connolly | a****y@d****u | 7 |
| Yann Büchau | n****n@p****e | 6 |
| Mika Pflüger | m****r@p****e | 6 |
| Matthias Riße | m****e@f****e | 5 |
| Nolan Nichols | n****s@m****m | 5 |
| Michael Burgardt | m****t@g****m | 4 |
| and 30 more... | ||
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 305
- Total pull requests: 297
- Average time to close issues: 6 months
- Average time to close pull requests: 26 days
- Total issue authors: 60
- Total pull request authors: 21
- Average comments per issue: 3.23
- Average comments per pull request: 3.89
- Merged pull requests: 228
- Bot issues: 0
- Bot pull requests: 22
Past Year
- Issues: 43
- Pull requests: 62
- Average time to close issues: 10 days
- Average time to close pull requests: 18 days
- Issue authors: 18
- Pull request authors: 12
- Average comments per issue: 1.12
- Average comments per pull request: 1.68
- Merged pull requests: 40
- Bot issues: 0
- Bot pull requests: 9
Top Authors
Issue Authors
- yarikoptic (106)
- mih (49)
- mlell (13)
- adswa (10)
- anikfal (9)
- bpinsard (8)
- matrss (7)
- mslw (7)
- bpoldrack (7)
- TheChymera (6)
- jwodder (5)
- psadil (4)
- asmacdo (3)
- ddeepwell (3)
- JohannesWiesner (3)
Pull Request Authors
- yarikoptic (178)
- jwodder (42)
- adswa (28)
- mih (18)
- github-actions[bot] (16)
- christian-monch (15)
- mslw (13)
- dependabot[bot] (13)
- bpoldrack (10)
- bpinsard (7)
- effigies (4)
- jsheunis (3)
- alliesw (2)
- malikwirin (2)
- asmacdo (2)
Packages
- Total packages: 2
- Total downloads: pypi 18,125 last-month
- Total docker downloads: 10,390
- Total dependent packages: 45 (may contain duplicates)
- Total dependent repositories: 85 (may contain duplicates)
- Total versions: 176
- Total maintainers: 5
pypi.org: datalad
data distribution geared toward scientific datasets
- Homepage: https://www.datalad.org
- Documentation: https://datalad.readthedocs.io/
- License: DFSG approved,MIT License
- Latest release: 1.2.1 (published 6 months ago)
conda-forge.org: datalad
DataLad aims to make data management and data distribution more accessible. To do that it stands on the shoulders of Git and Git-annex to deliver a decentralized system for data exchange. This includes automated ingestion of data from online portals, and exposing it in readily usable form as Git(-annex) repositories, so-called datasets. The actual data storage and permission management, however, remains with the original data providers.
- Homepage: http://datalad.org
- License: MIT
- Latest release: 0.17.9 (published about 3 years ago)
Dependencies
- actions/checkout v3 composite
- datalad/release-action/add-changelog-snippet v1 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- datalad/release-action/release v1 composite
- actions/checkout v3 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- con/tributors 0.0.21 composite
- vsoch/pull-request-action 1.0.23 composite
