Recent Releases of DataLad
DataLad - 1.2.1
🐛 Bug Fixes
Resolve datetime library warnings. PR #7714 (by @emmanuel-ferdman)
BF: Prevent infinite recursion in clone when a remote points to itself. Fixes #7721 via PR #7731 (by @yarikoptic)
Scientific Software - Peer-reviewed
- Python
Published by yarikoptic-gitmate 11 months ago
DataLad - 1.2.0
🚀 Enhancements and New Features
- Remove import of GITSSHCOMMAND within .consts (import directly from datalad.runner.gitrunner). PR #7656 (by @yarikoptic)
🏠 Internal
- Add a GitHub workflow that ensures a "release" PR against `master` has everything from `maint` merged. PR #7590 (by @yarikoptic)
Published by yarikoptic-gitmate about 1 year ago
DataLad - 1.1.6
📝 Documentation
- enh: update acknowledgments with additional grants. PR #7716 (by @yarikoptic)
🧪 Tests
- chore: for github tests CI - py 3.13 matrix should also (as for 3.12) use most recent available git-annex. PR #7710 (by @yarikoptic)
Published by yarikoptic-gitmate about 1 year ago
DataLad - 1.1.5
🧪 Tests
test: xfail testsubsuperdatasetsave on newer gits. PR #7687 (by @yarikoptic)
test: refactor test_parallel.py to be a bit more pytest'y. PR #7690 (by @yarikoptic)
BF: use datalad/packages method of installing git-annex. PR #7692 (by @yarikoptic)
Published by yarikoptic-gitmate over 1 year ago
DataLad - 1.1.4
🐛 Bug Fixes
Exit with the original non-0 exit code if underlying functionality, in particular "datalad run", returned incomplete result record with a non-0 exit code. Fixes #7504 via PR #7641 (by @yarikoptic)
Provide detail on why CHECKURL failed for datalad and archive special remotes (would require new AnnexRemote release (above 1.6.5) to take advantage of this PR). PR #7648 (by @yarikoptic)
BF: allow for empty output directory to be specified to run. Fixes #7653 via PR #7654 (by @yarikoptic)
fix: do load extension interfaces if early parsing errors out. Fixes #7678 via PR #7679 (by @yarikoptic)
Drop Python 3.8. Fixes #7678 via PR #7682 (by @yarikoptic)
🏎 Performance
- OPT: use set O(log(n)) instead of list O(n) for checking if modified in checkfiles. PR #7655 (by @yarikoptic)
🧪 Tests
Declare minimal compat version of pytest to be 7.0. Fixes #7555 via PR #7645 (by @yarikoptic)
BF: replace url for codecov uploader for macos to versioned one with two archs. Fixes #7642 via PR #7649 (by @yarikoptic)
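The performance entry in this release replaces a list with a set for membership checks. A minimal sketch of that general technique follows; the function name `find_modified` and the sample paths are hypothetical and are not taken from the DataLad code base. Note that CPython sets are hash tables, so a lookup is O(1) on average, whereas a list lookup is O(n).

```python
def find_modified(candidates, modified):
    """Return the candidates that also appear in `modified`, keeping order."""
    # Build the set once; each `in` check is then O(1) on average,
    # instead of scanning the whole list for every candidate.
    modified_set = set(modified)
    return [path for path in candidates if path in modified_set]


# Hypothetical sample data for illustration only
modified = ["a.txt", "sub/b.dat", "c.csv"]
result = find_modified(["c.csv", "x.txt", "a.txt"], modified)
print(result)
```

The gain only matters when many candidates are checked against a large collection; for a handful of items a list is just as good.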
Published by yarikoptic-gitmate over 1 year ago
DataLad - 1.1.3
🧪 Tests
- Account for the fix in git-annex behavior in testadddeleteafteranddropsubdir. PR #7640 (by @yarikoptic)
Published by yarikoptic-gitmate almost 2 years ago
DataLad - 1.1.2
🐛 Bug Fixes
Correct remote OS detection when working with RIA (ORA) stores: this should enable RIA operations, including push, from Mac clients to Linux hosts (and likely vice versa). Fixes #7536 via PR #7549 (by @mslw)
Allow only one thread in S3 downloader's progress report callback. PR #7636 (by @christian-monch)
Published by yarikoptic-gitmate almost 2 years ago
DataLad - 1.1.1
🐛 Bug Fixes
- Ensure timestamps of files in ZIP archives are within years 1980-2107. Fixes #3753 via PR #7450 (by @adswa)
📝 Documentation
🏠 Internal
- Add codespell and minor fixuppers to pre-commit configuration and apply it to non-`datalad/` components. PR #7621 (by @yarikoptic)
🧪 Tests
For the AppVeyor SSH setup, set MaxSessions to 100 to avoid 'channel 22: open failed: connect failed: open failed'. PR #7617 (by @yarikoptic)
testgracefulldeath: raise testgracefulldeath threshold to 300 from 100. PR #7619 (by @yarikoptic)
Make test for presence of max_path in partitions not run for current psutil 6.0.0. PR #7622 (by @yarikoptic)
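The ZIP timestamp fix in this release concerns the limited date range of the ZIP format (years 1980 to 2107). The sketch below shows one way to clamp a timestamp into that range before building a `zipfile.ZipInfo`; the helper `clamp_zip_datetime` and the bounds `ZIP_MIN`/`ZIP_MAX` are hypothetical names, and this is not necessarily how DataLad implements the fix.

```python
import io
import zipfile
from datetime import datetime

# Bounds of what a ZIP entry can store (seconds have 2-second resolution)
ZIP_MIN = datetime(1980, 1, 1, 0, 0, 0)
ZIP_MAX = datetime(2107, 12, 31, 23, 59, 58)


def clamp_zip_datetime(dt):
    """Clamp a naive datetime into the ZIP range; return a 6-tuple."""
    clamped = min(max(dt, ZIP_MIN), ZIP_MAX)
    # (year, month, day, hour, minute, second), the shape ZipInfo expects
    return clamped.timetuple()[:6]


# A file dated 1970 would be rejected by zipfile without clamping
info = zipfile.ZipInfo("old.txt", date_time=clamp_zip_datetime(datetime(1970, 1, 1)))
with zipfile.ZipFile(io.BytesIO(), "w") as zf:
    zf.writestr(info, "payload")
```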
Published by yarikoptic-gitmate almost 2 years ago
DataLad - 1.1.0
🔩 Dependencies
- Deprecated `boto` is replaced with `boto3` (used to handle AWS S3 downloads). Fixes #5597 via PR #7340 (by @mslw, @effigies, and @yarikoptic). Remaining issues:
  - no download progress indication,
  - no "Range" support (for partial downloads).
🏠 Internal
- Retry logic for S3 connections is now handed over to Boto3 and its standard mode, removing our custom method. PR #7340
Published by yarikoptic-gitmate almost 2 years ago
DataLad - 1.0.3
🐛 Bug Fixes
Raise exception if an annex remote process without console tries to interact with the user, e.g. prompt for a password. PR #7578 (by @christian-monch)
Fix add-archive-content for patool>=2.0. PR #7603 (by @dguibert)
🏠 Internal
- Fixup minor typos in documentation/comments using fresh codespell. PR #7610 (by @yarikoptic)
🧪 Tests
Stop testing on Python 3.7. Switch MacOS tests to 3.11, include 3.11 in Appveyor, and use 3.8 for other tests. Fixes #7584 via PR #7585 (by @mslw)
Convert `.travis.yml` to GitHub Actions workflow. Fixes #7574 via PR #7600 (by @jwodder)
Cancel lengthy running workflows if a new commit is pushed. PR #7601 (by @jwodder)
Published by yarikoptic-gitmate almost 2 years ago
DataLad - 1.0.2
🧪 Tests
- Relax condition in `test_force_checkdatapresent` to avoid flaky test failures. PR #7581 (by @christian-monch)
Published by yarikoptic-gitmate about 2 years ago
DataLad - 1.0.1
🏠 Internal
- The main entrypoint for annex remotes now also runs the standard extension load hook. This enables extensions to alter annex remote implementation behavior in the same way as other DataLad components. (by @mih)
Published by yarikoptic-gitmate about 2 years ago
DataLad - 1.0.0
💥 Breaking Changes
- Merging maint to make the first major release. PR #7577 (by @yarikoptic)
🚀 Enhancements and New Features
Published by yarikoptic-gitmate about 2 years ago
DataLad - 0.19.6
🚀 Enhancements and New Features
- Add the "http_token" authentication mechanism which provides 'Authentication: Token {TOKEN}' header. PR #7551 (by @yarikoptic)
🏠 Internal
Update `pytest_ignore_collect()` for pytest 8.0. PR #7546 (by @jwodder)
Add manual triggering support/documentation for release workflow. PR #7553 (by @yarikoptic)
Published by yarikoptic-gitmate over 2 years ago
DataLad - 0.19.5
🧪 Tests
- Fix text to account for a recent change in git-annex dropping sub-second clock precision. As a result we might not report push of git-annex branch since there would be none. PR #7544 (by @yarikoptic)
Published by yarikoptic-gitmate over 2 years ago
DataLad - 0.19.4
🐛 Bug Fixes
Update target detection for adjusted mode datasets has been improved. Fixes #7507 via PR #7522 (by @mih)
Fix typos found by new codespell 2.2.6 and also add checking/fixing "hidden files". PR #7530 (by @yarikoptic)
📝 Documentation
- Improve threaded-runner documentation. Fixes #7498 via PR #7500 (by @christian-monch)
🏠 Internal
Fix timediff* and timeremove benchmarks to account for long RFed interfaces. PR #7502 (by @yarikoptic)
🧪 Tests
Cache value of the hassymlinkcapability to spare some cycles. PR #7471 (by @yarikoptic)
RF(TST): use setupmethod and teardownmethod in TestAddArchiveOptions. PR #7488 (by @yarikoptic)
Announce testclonedatasets_root xfail on github osx. PR #7489 (by @yarikoptic)
Inform asv that there should be no warmup runs for time_remove benchmark. PR #7505 (by @yarikoptic)
BF(TST): Relax matching of git-annex error message about unsafe drop, which was changed in 10.20231129-18-gfd0b510573. PR #7541 (by @yarikoptic)
Published by yarikoptic-gitmate over 2 years ago
DataLad - 0.19.3
🐛 Bug Fixes
Type annotate getstatusdict and note that we can pass Exception or CapturedException which is not subclass. PR #7403 (by @yarikoptic)
BF: create-sibling-gitlab used to raise a TypeError when attempting a recursive operation in a dataset with uninstalled subdatasets. It now raises an impossible result instead. PR #7430 (by @adswa)
Pass branch option into recursive call within Install - for the cases whenever install is invoked with URL(s). Fixes #7461 via PR #7463 (by @yarikoptic)
Allow for reckless=ephemeral clone using relative path for the original location. Fixes #7469 via PR #7472 (by @yarikoptic)
📝 Documentation
- Fix a property name and default costs described in "getting subdatasets" section of `get` documentation. Fixes #7458 via PR #7460 (by @mslw)
🏠 Internal
Copy an adjusted environment only if requested to do so. PR #7399 (by @christian-monch)
Eliminate uses of `pkg_resources`. Fixes #7435 via PR #7439 (by @jwodder)
🧪 Tests
- Disable some S3 tests of their VCR taping where they fail for known issues. PR #7467 (by @yarikoptic)
Published by yarikoptic-gitmate almost 3 years ago
DataLad - 0.19.2
🐛 Bug Fixes
- Remove surrounding quotes in output filenames even for newer version of annex. Fixes #7440 via PR #7443 (by @yarikoptic)
📝 Documentation
- DOC: clarify description of the "install" interface to reflect its convoluted behavior. PR #7445 (by @yarikoptic)
Published by yarikoptic-gitmate almost 3 years ago
DataLad - 0.19.1
🏠 Internal
- Make compatible with upcoming release of git-annex (next after 10.20230407) and pass explicit core.quotepath=false to all git calls. Also added `tools/find-hanged-tests` helper. PR #7372 (by @yarikoptic)
🧪 Tests
- Adjust tests for upcoming release of git-annex (next after 10.20230407) and ignore DeprecationWarning for pkg_resources for now. PR #7372 (by @yarikoptic)
Published by yarikoptic-gitmate almost 3 years ago
DataLad - 0.19.0
🚀 Enhancements and New Features
Address gitlab API special character restrictions. PR #7407 (by @jsheunis)
BF: The default layout of create-sibling-gitlab is now `collection`. The previous default, `hierarchy`, has been removed as it failed in --recursive mode in different edge cases. For single-level datasets, the outcome of `collection` and `hierarchy` is identical. PR #7410 (by @jsheunis and @adswa)
🐛 Bug Fixes
WTF - bring back and extend information on metadata extractors etc, and allow for sections to have subsections and be selected at both levels PR #7309 (by @yarikoptic)
BF: Run an actual git invocation with interactive commit config. PR #7398 (by @adswa)
🔩 Dependencies
📝 Documentation
🧪 Tests
- Remove nose-based testing utils and possibility to test extensions using nose. PR #7261 (by @yarikoptic)
Published by yarikoptic-gitmate almost 3 years ago
DataLad - 0.18.5
🐛 Bug Fixes
More correct summary reporting for relaxed (no size) --annex. PR #7050 (by @yarikoptic)
ENH: minor tune up of addurls to be more tolerant and "informative". PR #7388 (by @yarikoptic)
Ensure that data generated by timeout handlers in the asynchronous runner are accessible via the result generator, even if no other events occur. PR #7390 (by @christian-monch)
Do not map (leave as is) trailing / or \ in github URLs. PR #7418 (by @yarikoptic)
📝 Documentation
🏠 Internal
- Discontinue ConfigManager abuse for Git identity warning. PR #7378 (by @mih) and PR #7392 (by @yarikoptic)
🧪 Tests
Boost python to 3.8 during extensions testing. PR #7413 (by @yarikoptic)
Skip testsystemssh_version if no ssh found + split parsing into separate test. PR #7422 (by @yarikoptic)
Published by yarikoptic-gitmate almost 3 years ago
DataLad - 0.18.4
🐛 Bug Fixes
- Provider config files were ignored, when CWD changed between different datasets during runtime. Fixes #7347 via PR #7357 (by @bpoldrack)
📝 Documentation
- Added a workaround for an issue with documentation theme (search function not working on Read the Docs). Fixes #7374 via PR #7385 (by @mslw)
🏠 Internal
🧪 Tests
- Fix failing testing on CI. PR #7379 (by @yarikoptic)
  - use sample S3 url DANDI archive,
  - use our copy of old .deb from datasets.datalad.org instead of snapshots.d.o
  - use specific miniconda installer for py 3.7.
Published by yarikoptic-gitmate about 3 years ago
DataLad - 0.18.3
🐛 Bug Fixes
Fixed that the `get` command would fail when subdataset source-candidate-templates were using the `path` property from `.gitmodules`. Also enhance the respective documentation for the `get` command. Fixes #7274 via PR #7280 (by @bpoldrack)
Improve up-to-dateness of config reports across manager instances. Fixes #7299 via PR #7301 (by @mih)
BF: GitRepo.merge do not allow merging unrelated unconditionally. PR #7312 (by @yarikoptic)
Do not render (empty) WTF report on other records. PR #7322 (by @yarikoptic)
Fixed a bug where changing DataLad's log level could lead to failing git-annex calls. Fixes #7328 via PR #7329 (by @bpoldrack)
Fix an issue with uninformative error reporting by the datalad special remote. Fixes #7332 via PR #7333 (by @bpoldrack)
Fix save to not force committing into git if reference dataset is pure git (not git-annex). Fixes #7351 via PR #7355 (by @yarikoptic)
📝 Documentation
🏠 Internal
Type-annotate almost all of `datalad/utils.py`; add `datalad/typing.py`. PR #7317 (by @jwodder)
Type-annotate and fix `datalad/support/strings.py`. PR #7318 (by @jwodder)
Type-annotate `datalad/support/globbedpaths.py`. PR #7327 (by @jwodder)
Extend type-annotations for `datalad/support/path.py`. PR #7336 (by @jwodder)
Type-annotate various things in `datalad/runner/`. PR #7337 (by @jwodder)
Type-annotate some more files in `datalad/support/`. PR #7339 (by @jwodder)
🧪 Tests
Skip or xfail some currently failing or stalling tests. PR #7331 (by @yarikoptic)
Skip withsameasremote when rsync and annex are incompatible. Fixes #7320 via PR #7342 (by @bpoldrack)
Fix testing assumption - do create pure GitRepo superdataset and test against it. PR #7353 (by @yarikoptic)
Published by yarikoptic-gitmate about 3 years ago
DataLad - 0.18.2
🐛 Bug Fixes
Fix `create-sibling` for non-English SSH remotes by providing `LC_ALL=C` for the `ls` call. PR #7265 (by @nobodyinperson)
Fix EnsureListOf() and EnsureTupleOf() for string inputs. PR #7267 (by @nobodyinperson)
create-sibling: Use C.UTF-8 locale instead of C on the remote end. PR #7273 (by @nobodyinperson)
Address compatibility with most recent git-annex where info would exit with non-0. PR #7292 (by @yarikoptic)
🔩 Dependencies
- Revert "Revert "Remove chardet version upper limit"". PR #7263 (by @yarikoptic)
🏠 Internal
- Codespell more (CHANGELOGs etc) and remove custom CLI options from tox.ini. PR #7271 (by @yarikoptic)
🧪 Tests
- Use older python 3.8 in testing nose utils in github-action test-nose. Fixes #7259 via PR #7260 (by @yarikoptic)
Published by yarikoptic-gitmate about 3 years ago
DataLad - 0.18.1
🐛 Bug Fixes
- Fixes crashes on windows where DataLad was mistaking git-annex 10.20221212 for a not yet released git-annex version and trying to use a new feature. Fixes #7248 via PR #7249 (by @bpoldrack)
📝 Documentation
🏎 Performance
- Integrate buffer size optimization from datalad-next, leading to significant performance improvement for status and diff. Fixes #7190 via PR #7250 (by @bpoldrack)
Published by yarikoptic-gitmate over 3 years ago
DataLad - 0.18.0
💥 Breaking Changes
- Automatic reconfiguration of the ORA special remote when cloning from RIA stores now only applies locally rather than being committed. PR #7235 (by @bpoldrack)
🚀 Enhancements and New Features
Saving removed dataset content was sped up, and reporting of types of removed content now accurately states `dataset` for added and removed subdatasets, instead of `file`. Moreover, saving previously staged deletions is now also reported. PR #6784 (by @mih)
`foreach-dataset` command got a new possible value for the --output-streams|--o-s option 'relpath' to capture and pass-through prefixing with path to subds. Very handy for e.g. running `git grep` command across subdatasets. PR #7071 (by @yarikoptic)
New config `datalad.create-sibling-ghlike.extra-remote-settings.NETLOC.KEY=VALUE` allows adding and/or overwriting local configuration for the created sibling by the commands `create-sibling-<gin|gitea|github|gitlab|gogs>`. PR #7213 (by @matrss)
The `siblings` command does not concern the user with messages about inconsequential failure to annex-enable a remote anymore. PR #7217 (by @bpoldrack)
ORA special remote now allows overriding its configuration locally. PR #7235 (by @bpoldrack)
Added a 'ria' special remote to provide backwards compatibility with datasets that were set up with the deprecated ria-remote. PR #7235 (by @bpoldrack)
🐛 Bug Fixes
- When `create-sibling-ria` was invoked with a sibling name of a pre-existing sibling, a duplicate key in the result record caused a crash. Fixes #6950 via PR #6952 (by @adswa)
📝 Documentation
create-sibling-ria's docstring now defines the schema of RIA URLs and clarifies internal layout of a RIA store. PR #6861 (by @adswa)
Move maintenance team info from issue to CONTRIBUTING. PR #6904 (by @adswa)
Describe specifications for a DataLad GitHub Action. PR #6931 (by @thewtex)
Fix capitalization of some service names. PR #6936 (by @aqw)
Command categories in help text are more consistently named. PR #7027 (by @aqw)
DOC: Add design document on Tests and CI. PR #7195 (by @adswa)
CONTRIBUTING.md was extended with up-to-date information on CI logging, changelog and release procedures. PR #7204 (by @yarikoptic)
🏠 Internal
Use `looseversion.LooseVersion` as drop-in replacement for `distutils.version.LooseVersion`. Fixes #6307 via PR #6839 (by @effigies)
Use --pathspec-from-file where possible instead of passing long lists of paths to git/git-annex calls. Fixes #6922 via PR #6932 (by @yarikoptic)
Make clone_dataset() better patchable by extensions and less monolithic. PR #7017 (by @mih)
Remove `simplejson` in favor of using `json`. Fixes #7034 via PR #7035 (by @christian-monch)
Fix an error in the command group names-test. PR #7044 (by @christian-monch)
Move eval_results() into interface.base to simplify imports for command implementations. Deprecate use from interface.utils accordingly. Fixes #6694 via PR #7170 (by @adswa)
🏎 Performance
Use regular dicts instead of OrderedDicts for speedier operations. Fixes #6566 via PR #7174 (by @adswa)
Reimplement `get_submodules_()` without `get_content_info()` for substantial performance boosts especially for large datasets with few subdatasets. Originally proposed in PR #6942 by @mih, fixing #6940. PR #7189 (by @adswa). Complemented with PR #7220 (by @yarikoptic) to avoid `O(N^2)` (instead of `O(N*log(N))`) performance in some cases.
Use --include=* or --anything instead of --copies 0 to speed up getcontentannexinfo. PR #7230 (by @yarikoptic)
🧪 Tests
Reenable two now-passing core tests on Windows CI. PR #7152 (by @adswa)
Remove the `with_testrepos` decorator and associated tests for it. Fixes #6752 via PR #7176 (by @adswa)
Breaking Changes
- Move all old-style metadata commands `aggregate_metadata`, `search`, `metadata` and `extract-metadata`, as well as the `cfg_metadatatypes` procedure and the old metadata extractors into the datalad-deprecated extension. The recommended way of handling metadata is now to install the datalad-metalad extension instead. Fixes #7012 via PR #7014
Internal
- Allow EnsureDataset constraint to handle Path instances. Fixes #7069 via PR #7133 (by @bpoldrack)
Enhancements and New Features
A repository description can be specified with a new `--description` option when creating siblings using `create-sibling-[gin|gitea|github|gogs]`. Fixes #6816 via PR #7109 (by @mslw)
Make validation failure of alternative constraints more informative. Fixes #7092 via PR #7132 (by @bpoldrack)
Published by yarikoptic-gitmate over 3 years ago
DataLad - 0.17.10
🚀 Enhancements and New Features
Enhance concurrent invocation behavior of `ThreadedRunner.run()`. If possible, invocations are serialized instead of raising re-enter runtime errors. Deadlock situations are detected and runtime errors are raised instead of deadlocking. Fixes #7138 via PR #7201 (by @christian-monch)
Exceptions bubbling up through CLI are now reported on including their chain of cause. Fixes #7163 via PR #7210 (by @bpoldrack)
🐛 Bug Fixes
BF: read RIA config from stdin instead of temporary file. Fixes #6514 via PR #7147 (by @adswa)
Prevent doomed annex calls on files we already know are untracked. Fixes #7032 via PR #7166 (by @adswa)
Comply to Posix-like clone URL formats on Windows. Fixes #7180 via PR #7181 (by @adswa)
Ensure that paths used in the datalad-url field of .gitmodules are posix. Fixes #7182 via PR #7183 (by @adswa)
Bandaids for export-to-figshare to restore functionality. PR #7188 (by @adswa)
Fixes hanging threads when `close()` or `del` were called in `BatchedCommand` instances. That could lead to hanging tests if the tests used the `@serve_path_via_http()` decorator. Fixes #6804 via PR #7201 (by @christian-monch)
Interpret file-URL path components according to the local operating system as described in RFC 8089. With this fix, `datalad.network.RI('file:...').localpath` returns a correct local path on Windows if the RI is constructed with a file-URL. Fixes #7186 via PR #7206 (by @christian-monch)
Fix a bug when retrieving several files from a RIA store via SSH, when the annex key does not contain size information. Fixes #7214 via PR #7215 (by @mslw)
Interface-specific (python vs CLI) doc generation for commands and their parameters was broken when brackets were used within the interface markups. Fixes #7225 via PR #7226 (by @bpoldrack)
📝 Documentation
- Fix documentation of `Runner.run()` to not accept strings. Instead, encoding must be ensured by the caller. Fixes #7145 via PR #7155 (by @bpoldrack)
🏠 Internal
Fix import of the `ls` command from datalad-deprecated for benchmarks. Fixes #7149 via PR #7154 (by @bpoldrack)
Unify definition of parameter choices with `datalad clean`. Fixes #7026 via PR #7161 (by @bpoldrack)
🧪 Tests
Fix test failure with old annex. Fixes #7157 via PR #7159 (by @bpoldrack)
Reenable now passing testpathdiff test on Windows. Fixes #3725 via PR #7194 (by @yarikoptic)
Use Plaintext keyring backend in tests to avoid the need for (interactive) authentication to unlock the keyring during (CI-) test runs. Fixes #6623 via PR #7209 (by @bpoldrack)
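Two of the bug fixes in this release deal with keeping paths in POSIX form even when DataLad runs on Windows. As a rough illustration of the idea (not DataLad's actual code; the helper name `to_posix` is hypothetical), the standard library's pure path classes can convert a Windows-style relative path on any platform:

```python
from pathlib import PureWindowsPath


def to_posix(relpath):
    """Render a Windows-style relative path with forward slashes."""
    # PureWindowsPath parses backslash separators regardless of the host OS
    return PureWindowsPath(relpath).as_posix()


converted = to_posix(r"sub\ds\data")
print(converted)
```

Because only pure paths are used, nothing touches the file system, so the conversion behaves the same on Linux, macOS, and Windows.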
Published by yarikoptic-gitmate over 3 years ago
DataLad - 0.17.9
🐛 Bug Fixes
Various small fixups ran after looking post-release and trying to build Debian package. PR #7112 (by @yarikoptic)
BF: Fix add-archive-contents try-finally statement by defining variable earlier. PR #7117 (by @adswa)
Fix RIA file URL reporting in exception handling. PR #7123 (by @adswa)
HTTP download treated '429 - too many requests' as an authentication issue and was consequently trying to obtain credentials. Fixes #7129 via PR #7129 (by @bpoldrack)
🔩 Dependencies
Unrestrict pytest and pytest-cov versions. PR #7125 (by @jwodder)
Remove remaining references to `nose` and the implied requirement for building the documentation. Fixes #7100 via PR #7136 (by @bpoldrack)
🏠 Internal
Use datalad/release-action. Fixes #7110. PR #7111 (by @jwodder)
Fix all logging to use %-interpolation and not .format, sort imports in touched files, add pylint-ing for % formatting in log messages to `tox -e lint`. PR #7118 (by @yarikoptic)
🧪 Tests
Increase the upper time limit after which we assume that a process is stalling. That should reduce false positives from `datalad.support.tests.test_parallel.py::test_stalling`, without impacting the runtime of passing tests. PR #7119 (by @christian-monch)
XFAIL a check on length of results in testgracefulldeath. PR #7126 (by @yarikoptic)
Configure Git to allow for "file" protocol in tests. PR #7130 (by @yarikoptic)
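The internal change above about %-interpolation in log calls reflects a general property of Python's `logging` module: when the template and its arguments are passed separately, formatting is deferred until a record is actually emitted. The sketch below demonstrates this with standard-library calls only; `ListHandler`, `Costly`, and the logger name are hypothetical helpers for the demonstration.

```python
import logging


class ListHandler(logging.Handler):
    """Collect emitted records so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


class Costly:
    """Object whose string conversion we want to avoid when not needed."""

    calls = 0

    def __str__(self):
        Costly.calls += 1
        return "costly"


lgr = logging.getLogger("changelog.example")  # hypothetical logger name
lgr.setLevel(logging.INFO)
handler = ListHandler()
lgr.addHandler(handler)

# %-interpolation: template and arguments travel separately in the record
lgr.info("Found %d files in %s", 3, "/tmp/ds")
record = handler.records[0]

# DEBUG is disabled, so the argument is never converted to a string
lgr.debug("value: %s", Costly())
calls_lazy = Costly.calls

# .format() builds the string eagerly, even though the message is dropped
lgr.debug("value: {}".format(Costly()))
calls_eager = Costly.calls
```

Keeping the template separate also lets log aggregation tools group messages by template rather than by fully rendered text.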
Published by yarikoptic-gitmate over 3 years ago
DataLad - 0.17.8
Bug Fixes
Prevent adding duplicate entries to .gitmodules. PR #7088 (by @yarikoptic)
[BF] Prevent double yielding of impossible get result Fixes #5537. PR #7093 (by @jsheunis)
Stop rendering the output of internal `subdatasets()` call in the results of `run_procedure()`. Fixes #7091 via PR #7094 (by @mslw & @mih)
Improve handling of `--existing reconfigure` in `create-sibling-ria`: previously, the command would not make the underlying `git init` call for existing local repositories, leading to some configuration updates not being applied. Partially addresses https://github.com/datalad/datalad/issues/6967 via https://github.com/datalad/datalad/pull/7095 (by @mslw)
Ensure subprocess environments have a valid path in `os.environ['PWD']`, even if a Path-like object was given to the runner on subprocess creation or invocation. Fixes #7040 via PR #7107 (by @christian-monch)
Published by yarikoptic-gitmate over 3 years ago
DataLad - 0.17.7
Bug Fixes
Let `EnsureChoice` report the value it failed validating. PR #7067 (by @mih)
Avoid writing to stdout/stderr from within datalad sshrun. This could lead to broken pipe errors when cloning via SSH and was superfluous to begin with. Fixes https://github.com/datalad/datalad/issues/6599 via https://github.com/datalad/datalad/pull/7072 (by @bpoldrack)
BF: lock across threads check/instantiation of Flyweight instances. Fixes #6598 via PR #7075 (by @yarikoptic)
Internal
Do not use `gen4`-metadata methods in `datalad metadata`-command. PR #7001 (by @christian-monch)
Revert "Remove chardet version upper limit" (introduced in 0.17.6~11^2) to bring back upper limit <= 5.0.0 on chardet. Otherwise we can get some deprecation warnings from requests. PR #7057 (by @yarikoptic)
Ensure that `BatchedCommandError` is raised if the subprocess of `BatchedCommand` fails or raises a `CommandError`. PR #7068 (by @christian-monch)
RF: remove unused code str-ing PurePath. PR #7073 (by @yarikoptic)
Update GitHub Actions action versions. PR #7082 (by @jwodder)
Tests
- Fix broken test helpers for result record testing that would falsely pass. PR #7002 (by @bpoldrack)
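One bug fix in this release locks the check-and-instantiate step of Flyweight instances across threads. The sketch below illustrates the general pattern with a metaclass and a `threading.Lock`; it is a simplified assumption of how such a cache could look, not DataLad's Flyweight implementation, and the class names `Flyweight` and `Repo` are used here only for illustration.

```python
import threading


class Flyweight(type):
    """Metaclass that returns one shared instance per constructor key."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instances = {}
        cls._lock = threading.Lock()

    def __call__(cls, key):
        # Hold the lock across both the lookup and the creation, so two
        # threads cannot each miss the cache and build separate instances.
        with cls._lock:
            if key not in cls._instances:
                cls._instances[key] = super().__call__(key)
            return cls._instances[key]


class Repo(metaclass=Flyweight):
    def __init__(self, path):
        self.path = path


# Request the same key from several threads at once
results = []
threads = [
    threading.Thread(target=lambda: results.append(Repo("/a")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads could both see the key as missing and create distinct objects, which defeats the purpose of sharing one instance per key.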
Published by yarikoptic-gitmate over 3 years ago
DataLad - 0.17.6
Bug Fixes
UX: push - provide specific error with details if push failed due to permission issue. PR #7011 (by @yarikoptic)
Fix datalad --help to not have Global options empty with python 3.10 and list options in "options:" section. PR #7028 (by @yarikoptic)
Let `create` touch the dataset root, if not saving in parent dataset. PR #7036 (by @mih)
Let getstatusdict() use exception message if none is passed. PR #7037 (by @mih)
Make choices for `status|diff --annex` and `status|diff --untracked` visible. PR #7039 (by @mih)
push: Assume 0 bytes pushed if git-annex does not provide bytesize. PR #7049 (by @yarikoptic)
Internal
Tests
- Allow for any 2 from first 3 to be consumed in testgracefulldeath. PR #7041 (by @yarikoptic)
note: was retagged manually with changelog from non-annotated 0.17.6 minted by github
Published by yarikoptic-gitmate over 3 years ago
DataLad - 0.17.5
🐛 Bug Fix
- BF: blacklist 23.9.0 of keyring as introduces regression #7003 (@yarikoptic)
- Make the manpages build reproducible via datalad.source.epoch (to be used in Debian packaging) #6997 (@lamby bot@datalad.org @yarikoptic)
- BF: backquote path/drive in Changelog #6997 (@yarikoptic)
Authors: 3
- Chris Lamb (@lamby)
- DataLad Bot (bot@datalad.org)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] over 3 years ago
DataLad - 0.17.4
🐛 Bug Fix
- BF: make logic more consistent for files=[] argument (which is False but not None) #6976 (@yarikoptic)
- Run pytests in parallel (-n 2) on appveyor #6987 (@yarikoptic)
- Add workflow for autogenerating changelog snippets #6981 (@jwodder)
- Provide "/dev/null" (b:\nul on 💾 Windows) instead of empty string as a git-repo to avoid reading local repo configuration #6986 (@yarikoptic)
- RF: callfromparser - move code into "else" to simplify reading etc #6982 (@yarikoptic)
- BF: if early attempt to parse resulted in error, setup subparsers #6980 (@yarikoptic)
- Run pytests in parallel (-n 2) on Travis #6915 (@yarikoptic)
- Send one character (no newline) to stdout in protocol test to guarantee a single "message" and thus a single custom value #6978 (@christian-monch)
🧪 Tests
- TST: test_stalling -- wait x10 not just x5 time #6995 (@yarikoptic)
Authors: 3
- Christian Mönch (@christian-monch)
- John T. Wodder II (@jwodder)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] over 3 years ago
DataLad - 0.17.2
🐛 Bug Fix
- BF(TST): do proceed to proper test for error being caught for recent git-annex on windows with symlinks #6850 (@yarikoptic)
- Addressing problem testing against python 3.10 on Travis (skip more annex versions) #6842 (@yarikoptic)
- XFAIL testrunnerparametrized_protocol on python3.8 when getting duplicate output #6837 (@yarikoptic)
- BF: Make create's check for procedures work with several again #6841 (@adswa)
- Support older pytests #6836 (@jwodder)
Authors: 3
- Adina Wagner (@adswa)
- John T. Wodder II (@jwodder)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] almost 4 years ago
DataLad - 0.17.1
🐛 Bug Fix
- DOC: minor fix - consistent DataLad (not Datalad) in docs and CHANGELOG #6830 (@yarikoptic)
- DOC: fixup/harmonize Changelog for 0.17.0 a little #6828 (@yarikoptic)
- BF: use --python-match minor option in new datalad-installer release to match outside version of Python #6827 (@christian-monch @yarikoptic)
- Do not quote paths for ssh >= 9 #6826 (@christian-monch @yarikoptic)
- Suppress DeprecationWarning to allow for distutils to be used #6819 (@yarikoptic)
- RM(TST): remove testing of datalad.test which was removed from 0.17.0 #6822 (@yarikoptic)
- Avoid import of nose-based tests.utils, make skipifnomodule() and skipifnonetwork() allowed at module level #6817 (@jwodder)
- BF(TST): use higher level asyncio.run instead of asyncio.geteventloop in testinsideasync #6808 (@yarikoptic)
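The last entry in the list above switches a test to the higher-level `asyncio.run`. As a small illustration of that standard-library call (the coroutine `compute` is a hypothetical stand-in, not code from the DataLad test suite): `asyncio.run` creates a fresh event loop, runs the coroutine to completion, and closes the loop afterwards, so the caller does not have to obtain or manage a loop itself.

```python
import asyncio


async def compute():
    """Hypothetical coroutine standing in for the code under test."""
    await asyncio.sleep(0)  # yield control to the event loop once
    return 42


# asyncio.run() sets up and tears down the event loop for us
result = asyncio.run(compute())
print(result)
```

In recent Python versions, calling `asyncio.get_event_loop()` when no loop is running can emit a DeprecationWarning, which is one reason to prefer `asyncio.run` in synchronous entry points.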
Authors: 3
- Christian Mönch (@christian-monch)
- John T. Wodder II (@jwodder)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] almost 4 years ago
DataLad - 0.17.0 (Thu Jul 7 2022) -- pytest migration
💫 Enhancements and new features
- "log" progress bar now reports about starting a specific action as well. #6756 (by @yarikoptic)
- Documentation and behavior of traceback reporting for log messages via `DATALAD_LOG_TRACEBACK` was improved to yield a more compact report. The documentation for this feature has been clarified. #6746 (by @mih)
- `datalad unlock` gained a progress bar. #6704 (by @adswa)
- When `create-sibling-gitlab` is called on non-existing subdatasets or paths it now returns an impossible result instead of no feedback at all. #6701 (by @adswa)
- `datalad wtf` includes a report on file system types of commonly used paths. #6664 (by @adswa)
- Use next generation metadata code in search, if it is available. #6518 (by @christian-monch)
🪓 Deprecations and removals
- Remove unused and untested log helpers `NoProgressLog` and `OnlyProgressLog`. #6747 (by @mih)
- Remove unused `sorted_files()` helper. #6722 (by @adswa)
- Discontinued the value `stdout` for use with the config variable `datalad.log.target` as its use would inevitably break special remote implementations. #6675 (by @bpoldrack)
- `AnnexRepo.add_urls()` is deprecated in favor of `AnnexRepo.add_url_to_file()` or a direct call to `AnnexRepo.call_annex()`. #6667 (by @mih)
- `datalad test` command and supporting functionality (e.g., `datalad.test`) were removed. # (by @jwodder)
🐛 Bug Fixes
- `export-archive` does not rely on `normalize_path()` methods anymore and became more robust when called from subdirectories. #6745 (by @adswa)
- Sanitize keys before checking content availability to ensure that the content availability of files with URL- or custom backend keys is correctly determined and marked. #6663 (by @adswa)
- Ensure saving a new subdataset to a superdataset yields a valid `.gitmodules` record regardless of whether and how a path constraint is given to the `save()` call. Fixes #6547 #6790 (by @mih)
- `save` now repairs annex symlinks broken by a `git-mv` operation prior to recording a new dataset state. Fixes #4967 #6795 (by @mih)
📝 Documentation
- API documentation for log helpers, like log_progress() is now included in the renderer documentation. #6746 (by @mih)
- New design document on progress reporting. #6734 (by @mih)
- Explain downstream consequences of using --fast option in addurls. #6684 (by @jdkent)
🏠 Internal
- Inline code of create-sibling-ria has been refactored to an internal helper to check for siblings with particular names across dataset hierarchies in datalad-next, and is reintroduced into core to modularize the code base further. #6706 (by @adswa)
- get_initialized_logger now lets a given logtarget take precedence over datalad.log.target. #6675 (by @bpoldrack)
- Many uses of deprecated call options were replaced with the recommended ones. #6273 (by @jwodder)
- Get rid of asyncio import by defining a few no-op methods from asyncio.protocols.SubprocessProtocol directly in WitlessProtocol. #6648 (by @yarikoptic)
- Consolidate GitRepo.remove() and AnnexRepo.remove() into a single implementation. #6783 (by @mih)
🛡 Tests
- Discontinue use of with_testrepos decorator other than for the deprecation cycle for nose. #6690 (by @mih @bpoldrack) See #6144 for full list of changes.
- Remove usage of deprecated AnnexRepo.add_urls in tests. #6683 (by @bpoldrack)
- Minimalistic (adapters, no assert changes, etc) migration from nose to pytest. Support functionality possibly used by extensions and relying on nose helpers is left in place to avoid affecting their run time and defer migration of their test setups. #6273 (by @jwodder)
Authors: 7
- Yaroslav Halchenko (@yarikoptic)
- Michael Hanke (@mih)
- Benjamin Poldrack (@bpoldrack)
- Adina Wagner (@adswa)
- John T. Wodder (@jwodder)
- Christian Mönch (@christian-monch)
- James Kent (@jdkent)
Published by bpoldrack almost 4 years ago
DataLad - 0.16.7
🐛 Bug Fix
- Fix broken annex symlink after git-mv before saving + fix a race condition in ssh copy test #6809 (@christian-monch @mih @yarikoptic)
- Do not ignore already known status info on submodules #6790 (@mih)
- Fix "common data source" test to use a valid URL (maint-based & extended edition) #6788 (@mih @yarikoptic)
- Upload coverage from extension tests to Codecov #6781 (@jwodder)
- Clean up line end handling in GitRepo #6768 (@christian-monch)
- Do not skip file-URL tests on windows #6772 (@christian-monch)
- Fix test errors caused by updated chardet v5 release #6777 (@christian-monch)
- Preserve final trailing slash in call_git() output #6754 (@adswa @yarikoptic @christian-monch)
⚠️ Pushed to maint
- Make sure a subdataset is saved with a complete .gitmodules record (@mih)
Authors: 5
- Adina Wagner (@adswa)
- Christian Mönch (@christian-monch)
- John T. Wodder II (@jwodder)
- Michael Hanke (@mih)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] almost 4 years ago
DataLad - 0.16.6
🐛 Bug Fix
- Prevent duplicated result rendering when searching in default datasets #6765 (@christian-monch)
- BF(workaround): skip test_ria_postclonecfg on OSX for now (@yarikoptic)
- BF(workaround to #6759): if saving credential failed, just log error and continue #6762 (@yarikoptic)
- Prevent reentry of a runner instance #6737 (@christian-monch)
Authors: 2
- Christian Mönch (@christian-monch)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] almost 4 years ago
DataLad - 0.16.5
🐛 Bug Fix
- BF: push to github - remove datalad-push-default-first config only in non-dry run to ensure we push default branch separately in next step #6750 (@yarikoptic)
- In addition to default (system) ssh version, report configured ssh; fix ssh version parsing on Windows #6729 (@yarikoptic)
Authors: 1
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] almost 4 years ago
DataLad - 0.16.4
🐛 Bug Fix
- BF(TST): RO operations - add test directory into git safe.directory #6726 (@yarikoptic)
- DOC: fixup of docstring for skip_ssh #6727 (@yarikoptic)
- DOC: Set language in Sphinx config to en #6727 (@adswa)
- BF: Catch KeyErrors from unavailable WTF infos #6712 (@adswa)
- Add annex.private to ephemeral clones. That would make git-annex not assign shared (in git-annex branch) annex uuid. #6702 (@bpoldrack @adswa)
- BF: require argcomplete version at least 1.12.3 to test/operate correctly #6693 (@yarikoptic)
- Replace Zenodo DOI with JOSS for due credit #6725 (@adswa)
Authors: 3
- Adina Wagner (@adswa)
- Benjamin Poldrack (@bpoldrack)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] almost 4 years ago
DataLad - 0.16.3
🐛 Bug Fix
- No change for a PR to trigger release #6692 (@yarikoptic)
- Sanitize keys before checking content availability to ensure correct value for keys with URL or custom backend #6665 (@adswa @yarikoptic)
- Change a key-value pair in drop result record #6625 (@mslw)
- Link docs of datalad-next #6677 (@mih)
- Fix GitRepo.get_branch_commits_() to handle branch names conflicts with paths #6661 (@mih)
- OPT: AnnexJsonProtocol - avoid dragging possibly long data around #6660 (@yarikoptic)
- Remove two too prominent create() INFO log message that duplicate DEBUG log and harmonize some other log messages #6638 (@mih @yarikoptic)
- Remove unsupported parameter create_sibling_ria(existing=None) #6637 (@mih)
- Add released plugin to .autorc to annotate PRs on when released #6639 (@yarikoptic)
Authors: 4
- Adina Wagner (@adswa)
- Michael Hanke (@mih)
- Michał Szczepanik (@mslw)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] about 4 years ago
DataLad - 0.16.2
🐛 Bug Fix
- Demote (to level 1 from DEBUG) and speed-up API doc logging (parseParameters) #6635 (@mih)
- Factor out actual data transfer in push #6618 (@christian-monch)
- ENH: include version of datalad in tests teardown Versions: report #6628 (@yarikoptic)
- MNT: Require importlib-metadata >=3.6 for Python < 3.10 for entry_points taking kwargs #6631 (@effigies)
- Factor out credential handling of create-sibling-ghlike #6627 (@mih)
- BF: Fix wrong key name of annex' JSON records #6624 (@bpoldrack)
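The importlib-metadata requirement above concerns calling entry_points() with keyword arguments. A minimal sketch of that call style follows; the helper name extension_names is hypothetical, and the group name used as its default is an assumption made for illustration.

```python
import sys

if sys.version_info >= (3, 10):
    # The standard library accepts selection keywords from Python 3.10 on.
    from importlib.metadata import entry_points
else:
    # Backport; keyword selection needs importlib-metadata >= 3.6.
    from importlib_metadata import entry_points


def extension_names(group: str = "datalad.extensions") -> list:
    # Hypothetical helper. The default group name is assumed for illustration.
    # entry_points(group=...) returns only the entry points of that group.
    return sorted(ep.name for ep in entry_points(group=group))


names = extension_names()
```

On interpreters older than 3.10 the backport provides the same keyword interface, which is why the version bound is conditional on the Python version.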
⚠️ Pushed to maint
- Fix typo in changelog (@mih)
- [ci skip] minor typo fix (@yarikoptic)
Authors: 5
- Benjamin Poldrack (@bpoldrack)
- Chris Markiewicz (@effigies)
- Christian Mönch (@christian-monch)
- Michael Hanke (@mih)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] about 4 years ago
DataLad - 0.16.1 (Fr Apr 8 2022) -- April Fools' Release
- Fixes forgotten changelog in docs
Published by bpoldrack about 4 years ago
DataLad - 0.16.0 (Fr Apr 8 2022) -- Spring cleaning!
💫 Enhancements and new features
- A new set of create-sibling-* commands reimplements the GitHub-platform support of create-sibling-github and adds support to interface three new platforms in a unified fashion: GIN (create-sibling-gin), GOGS (create-sibling-gogs), and Gitea (create-sibling-gitea). All commands rely on personal access tokens only for authentication, allow for specifying one of several stored credentials via a uniform --credential parameter, and support a uniform --dry-run mode for testing without network. #5949 (by @mih)
- create-sibling-github now supports direct specification of organization repositories via a [<org>/]repo syntax #5949 (by @mih)
- create-sibling-gitlab gained a --dry-run parameter to match the corresponding parameters in create-sibling-{github,gin,gogs,gitea} #6013 (by @adswa)
- The --new-store-ok parameter of create-sibling-ria only creates new RIA stores when explicitly provided #6045 (by @adswa)
- The default performance of status() and diff() commands is improved by up to 700% by removing file-type evaluation as a default operation, and simplifying the type reporting rule #6097 (by @mih)
- drop() and remove() were reimplemented in full, conceptualized as the antagonist commands to get() and clone(). A new, harmonized set of parameters (--what ['filecontent', 'allkeys', 'datasets', 'all'], --reckless ['modification', 'availability', 'undead', 'kill']) simplifies their API. Both commands include additional safeguards. uninstall is replaced with a thin shim command around drop() #6111 (by @mih)
- add_archive_content() was refactored into a dataset method and gained progress bars #6105 (by @adswa)
- The datalad and datalad-archives special remotes have been reimplemented based on AnnexRemote #6165 (by @mih)
- The result_renderer() semantics were decomplexified and harmonized. The previous default result renderer was renamed to generic. #6174 (by @mih)
- get_status_dict learned to include exit codes in the case of CommandErrors #5642 (by @yarikoptic)
- datalad clone can now pass options to git-clone, adding support for cloning specific tags or branches, naming siblings other names than origin, and exposing git clone's optimization arguments #6218 (by @kyleam and @mih)
- Inactive BatchedCommands are cleaned up #6206 (by @jwodder)
- export-archive-ora learned to filter files exported to 7z archives #6234 (by @mih and @bpinsard)
- datalad run learned to glob recursively #6262 (by @AKSoo)
- The ORA remote learned to recover from interrupted uploads #6267 (by @mih)
- A new threaded runner with support for timeouts and generator-based subprocess communication is introduced and used in BatchedCommand and AnnexRepo #6244 (by @christian-monch)
- A new switch allows to enable librarymode and queries for the effective API in use #6213 (by @mih)
- run and rerun now support parallel jobs via --jobs #6279 (by @AKSoo)
- A new foreach-dataset plumbing command allows to run commands on each (sub)dataset, similar to git submodule foreach #5517 (by @yarikoptic)
- The dataset parameter is not restricted to only locally resolvable file-URLs anymore #6276 (by @christian-monch)
- DataLad's credential system is now able to query git-credential by specifying credential type git in the respective provider configuration #5796 (by @bpoldrack)
- DataLad now comes with a git credential helper git-credential-datalad allowing Git to query DataLad's credential system #5796 (by @bpoldrack and @mih)
- The new runner now allows for multiple threads #6371 (by @christian-monch)
- A new configuration command provides an interface to manipulate and query the DataLad configuration. #6306 (by @mih)
  - Unlike the global Python-only datalad.cfg or dataset-specific Dataset.config configuration managers, this command offers a uniform API across the Python and the command line interfaces.
  - This command was previously available in the mihextras extension as x-configuration, and has been merged into the core package in an improved version. #5489 (by @mih)
  - In its default dump mode, the command provides an annotated list of the effective configuration after considering all configuration sources, including hints on additional configuration settings and their supported values.
- The command line interface help-reporting has been sped up by ~20% #6370 #6378 (by @mih)
- ConfigManager now supports reading committed dataset configuration in bare repositories. Analog to reading .datalad/config from a worktree, blob:HEAD:.datalad/config is read (e.g., the config committed in the default branch). The support includes reload() change detection using the gitsha of this file. The behavior for non-bare repositories is unchanged. #6332 (by @mih)
- The CLI help generation has been sped up, and now also supports the completion of parameter values for a fixed set of choices #6415 (by @mih)
- Individual command implementations can now declare a specific "on-failure" behavior by defining Interface.on_failure to be one of the supported modes (stop, continue, ignore). Previously, such a modification was only possible on a per-call basis. #6430 (by @mih)
- The run command changed its default "on-failure" behavior from continue to stop. This change prevents the execution of a command in case a declared input can not be obtained. Previously, only an error result was yielded (and run eventually yielded a non-zero exit code or an IncompleteResultsError), but the execution proceeded and potentially saved a dataset modification despite incomplete inputs, in case the command succeeded. This previous default behavior can still be achieved by calling run with the equivalent of --on-failure continue #6430 (by @mih)
- The run command now provides readily executable, API-specific instructions how to save the results of a command execution that failed expectedly #6434 (by @mih)
- create-sibling --since=^ mode will now be as fast as push --since=^ to figure out for which subdatasets to create siblings #6436 (by @yarikoptic)
- When file names contain illegal characters or reserved file names that are incompatible with Windows systems a configurable check for save (datalad.save.windows-compat-warning) will either do nothing (none), emit an incompatibility warning (warning, default), or cause save to error (error) #6291 (by @adswa)
- Improve responsiveness of datalad drop in datasets with a large annex. #6580 (by @christian-monch)
- save code might operate faster on heavy file trees #6581 (by @yarikoptic)
- Removed a per-file overhead cost for ORA when downloading over HTTP #6609 (by @bpoldrack)
- A new module datalad.support.extensions offers the utility functions register_config() and has_config() that allow extension developers to announce additional configuration items to the central configuration management. #6601 (by @mih)
- When operating in a dirty dataset, export-to-figshare now yields an impossible result instead of raising a RuntimeError #6543 (by @adswa)
- Loading DataLad extension packages has been sped-up leading to between 2x and 4x faster run times for loading individual extensions and reporting help output across all installed extensions. #6591 (by @mih)
- Introduces the configuration key datalad.ssh.executable. This key allows specifying an ssh-client executable that should be used by datalad to establish ssh-connections. The default value is ssh unless on a Windows system where $WINDIR\System32\OpenSSH\ssh.exe exists. In this case, the value defaults to $WINDIR\System32\OpenSSH\ssh.exe. #6553 (by @christian-monch)
- create-sibling should perform much faster in case of --since specification since it would consider only submodules related to the changes since that point. #6528 (by @yarikoptic)
- A new configuration setting datalad.ssh.try-use-annex-bundled-git=yes|no can be used to influence the default remote git-annex bundle sensing for SSH connections. This was previously done unconditionally for any call to datalad sshrun (which is also used for any SSH-related Git or git-annex functionality triggered by DataLad-internal processing) and could incur a substantial per-call runtime cost. The new default is to not perform this sensing, because for, e.g., use as GIT_SSH_COMMAND there is no expectation to have a remote git-annex installation, and even with an existing git-annex/Git bundle on the remote, it is not certain that the bundled Git version is to be preferred over any other Git installation in a user's PATH. #6533 (by @mih)
- run now yields a result record immediately after executing a command. This allows callers to use the standard --on-failure switch to control whether dataset modifications will be saved for a command that exited with an error. #6447 (by @mih)
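The default described for datalad.ssh.executable can be sketched as follows. This is a minimal illustration of the selection rule as worded in the entry above, not DataLad's implementation; the function name default_ssh_executable and its parameters are hypothetical.

```python
import os
import sys
from pathlib import PureWindowsPath


def default_ssh_executable(platform=None, environ=None, exists=os.path.exists) -> str:
    # Hypothetical helper: return "ssh" unless running on Windows and the
    # OpenSSH client shipped under %WINDIR% is present.
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    if platform.startswith("win"):
        windir = environ.get("WINDIR")
        if windir:
            candidate = str(
                PureWindowsPath(windir) / "System32" / "OpenSSH" / "ssh.exe"
            )
            if exists(candidate):
                return candidate
    return "ssh"


# Platform, environment, and existence check are injectable for testing.
on_linux = default_ssh_executable("linux", {})
on_windows = default_ssh_executable(
    "win32", {"WINDIR": r"C:\Windows"}, exists=lambda p: True
)
```

Passing the platform and the existence check as parameters keeps the rule testable on any operating system.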
🪓 Deprecations and removals
- The --pbs-runner commandline option (deprecated in 0.15.0) was removed #5981 (by @mih)
- The dependency to PyGithub was dropped #5949 (by @mih)
- create-sibling-github's credential handling was trimmed down to only allow personal access tokens, because GitHub discontinued user/password based authentication #5949 (by @mih)
- create-sibling-gitlab's --dryrun parameter is deprecated in favor of --dry-run #6013 (by @adswa)
- Internal obsolete GitRepo.*_submodule methods were moved to datalad-deprecated #6010 (by @mih)
- datalad/support/versions.py is unused in DataLad core and removed #6115 (by @yarikoptic)
- Support for the undocumented datalad.api.result-renderer config setting has been dropped #6174 (by @mih)
- Undocumented use of result_renderer=None is replaced with result_renderer='disabled' #6174 (by @mih)
- remove's --recursive argument has been deprecated #6257 (by @mih)
- The use of the internal helper get_repo_instance() is discontinued and deprecated #6268 (by @mih)
- Support for Python 3.6 has been dropped (#6286 (by @christian-monch) and #6364 (by @yarikoptic))
- All but one Singularity recipe flavor have been removed due to their limited value with the end of life of Singularity Hub #6303 (by @mih)
- All code in module datalad.cmdline was (re)moved, only datalad.cmdline.helpers.get_repo_instance is kept for a deprecation period (by @mih)
- datalad.interface.common_opts.eval_default has been deprecated. All (command-specific) defaults for common interface parameters can be read from Interface class attributes #6391 (by @mih)
- Remove unused and untested datalad.interface.utils helpers cls2cmdlinename and path_is_under #6392 (by @mih)
- An unused code path for result rendering was removed from the CLI main() #6394 (by @mih)
- create-sibling now requires "^" instead of an empty string for the since option #6436 (by @yarikoptic)
- run no longer raises a CommandError exception for failed commands, but yields an error result that includes a superset of the information provided by the exception. This change impacts command line usage insofar as the exit code of the underlying command is no longer relayed as the exit code of the run command call, although run continues to exit with a non-zero exit code in case of an error. For Python API users, the nature of the raised exception changes from CommandError to IncompleteResultsError, and the exception handling is now configurable using the standard on_failure command argument. The original CommandError exception remains available via the exception property of the newly introduced result record for the command execution, and this result record is available via IncompleteResultsError.failed, if such an exception is raised. #6447 (by @mih)
- Custom cast helpers were removed from datalad core and migrated to a standalone repository https://github.com/datalad/screencaster #6516 (by @adswa)
- The bundled parameter of get_connection_hash() is now ignored and will be removed with a future release. #6532 (by @mih)
- BaseDownloader.fetch() is logging download attempts on DEBUG (previously INFO) level to avoid polluting output of higher-level commands. #6564 (by @mih)
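The run-related changes in this release (the three on-failure modes, and IncompleteResultsError with its failed list replacing CommandError for Python callers) can be modeled in a few lines. The sketch below is a simplified stand-alone model, not DataLad code: the exception class is a stand-in that only mirrors the name used in the changelog, the process() helper is hypothetical, and the result-record keys are assumptions made for illustration.

```python
class IncompleteResultsError(RuntimeError):
    """Stand-in for the exception named in the changelog (not DataLad's class)."""

    def __init__(self, failed):
        super().__init__(f"{len(failed)} result(s) failed")
        self.failed = failed  # list of failed result records


def process(results, on_failure="continue"):
    # Hypothetical helper modeling the three modes: stop, continue, ignore.
    if on_failure not in ("stop", "continue", "ignore"):
        raise ValueError(f"unknown on_failure mode: {on_failure!r}")
    seen, failed = [], []
    for res in results:
        seen.append(res)
        if res["status"] in ("error", "impossible"):
            failed.append(res)
            if on_failure == "stop":
                break  # do not consume (and thereby execute) later steps
    if failed and on_failure != "ignore":
        raise IncompleteResultsError(failed)
    return seen


executed = []


def steps():
    # Lazily "executes" two steps; the first one fails.
    for name, status in [("get-input", "error"), ("run-command", "ok")]:
        executed.append(name)
        record = {"action": name, "status": status}
        if status == "error":
            # Assumed layout: the original exception travels in the record.
            record["exception"] = RuntimeError(f"{name} failed")
        yield record


first_failure = None
try:
    process(steps(), on_failure="stop")
except IncompleteResultsError as exc:
    first_failure = exc.failed[0]
```

With stop, the second step never runs because the generator is not advanced past the failure; with continue, both steps run and the error is raised at the end; with ignore, no exception is raised.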
🐛 Bug Fixes
- create-sibling-gitlab erroneously overwrote existing sibling configurations. A safeguard will now prevent overwriting and exit with an error result #6015 (by @adswa)
- create-sibling-gogs now relays HTTP500 errors, such as "no space left on device" #6019 (by @mih)
- annotate_paths() is removed from the last parts of code base that still contained it #6128 (by @mih)
- add_archive_content() doesn't crash with --key and --use-current-dir anymore #6105 (by @adswa)
- run-procedure now returns an error result when a non-existent procedure name is specified #6143 (by @mslw)
- A fix for a silent failure of download-url --archive when extracting the archive #6172 (by @adswa)
- Uninitialized AnnexRepos can now be dropped #6183 (by @mih)
- Instead of raising an error, the formatters tests are skipped when the formatters module is not found #6212 (by @adswa)
- create-sibling-gin does not disable git-annex availability on Gin remotes anymore #6230 (by @mih)
- The ORA special remote messaging is fixed to not break the special remote protocol anymore and to better relay messages from exceptions to communicate underlying causes #6242 (by @mih)
- A keyring.delete() call was fixed to not call an uninitialized private attribute anymore #6253 (by @bpoldrack)
- An erroneous placement of result keyword arguments into a format() method instead of get_status_dict() of create-sibling-ria has been fixed #6256 (by @adswa)
- status, run-procedure, and metadata are no longer swallowing result-related messages in renderers #6280 (by @mih)
- uninstall now recommends the new --reckless parameter instead of the deprecated --nocheck parameter when reporting hints #6277 (by @adswa)
- download-url learned to handle Path objects #6317 (by @adswa)
- Restore default result rendering behavior broken by Key interface documentation #6394 (by @mih)
- Fix a broken check for file presence in the ConfigManager that could have caused a crash in rare cases when a config file is removed during the process runtime #6332 (by @mih)
- ConfigManager.get_from_source() now accesses the correct information when using the documented source='local', avoiding a crash #6332 (by @mih)
- run no longer lets the internal call to save render its results unconditionally, but the parameterization of run determines the effective rendering format. #6421 (by @mih)
- Remove an unnecessary and misleading warning from the runner #6425 (by @christian-monch)
- A number of commands stopped double-reporting results #6446 (by @adswa)
- create-sibling-ria no longer creates an annex/objects directory in-store, when called with --no-storage-sibling. #6495 (by @bpoldrack)
- Improve error message when an invalid URL is given to clone. #6500 (by @mih)
- DataLad declares a minimum version dependency to keyring >= 20.0 to ensure that token-based authentication can be used. #6515 (by @adswa)
- ORA special remote tries to obtain permissions when dropping a key from a RIA store rather than just failing. Thus having the same permissions in the store's object trees as one directly managed by git-annex would have, works just fine now. #6493 (by @bpoldrack)
- require_dataset() now uniformly raises NoDatasetFound when no dataset was found. Implementations that catch the previously documented InsufficientArgumentsError or the actually raised ValueError will continue to work, because NoDatasetFound is derived from both types. #6521 (by @mih)
- Keyboard-interactive authentication is now possible with non-multiplexed SSH connections (i.e., when no connection sharing is possible, due to lack of socket support, for example on Windows). Previously, it was disabled forcefully by DataLad for no valid reason. #6537 (by @mih)
- Remove duplicate exception type in reporting of top-level CLI exception handler. #6563 (by @mih)
- Fixes DataLad's parsing of git-annex' reporting on unknown paths depending on its version and the value of the annex.skipunknown config. #6550 (by @bpoldrack)
- Fix ORA special remote not properly reporting on HTTP failures. #6535 (by @bpoldrack)
- ORA special remote didn't show per-file progress bars when downloading over HTTP #6609 (by @bpoldrack)
- save now can commit the change where file becomes a directory with a staged for commit file. #6581 (by @yarikoptic)
- create-sibling will no longer create siblings for not yet saved new subdatasets, and will now create sub-datasets nested in the subdatasets which did not yet have those siblings. #6603 (by @yarikoptic)
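The backward compatibility of the require_dataset() change rests on multiple inheritance: an exception derived from two types is caught by handlers for either. A minimal sketch with stand-in classes follows. These are not DataLad's definitions; only the class names are taken from the entry above, and require_dataset_stub is hypothetical.

```python
class InsufficientArgumentsError(Exception):
    """Stand-in for the previously documented exception type."""


class NoDatasetFound(InsufficientArgumentsError, ValueError):
    """Stand-in: derived from both types, so both kinds of handlers match."""


def require_dataset_stub(path):
    # Hypothetical function that always fails, to exercise the handlers.
    raise NoDatasetFound(f"No dataset found at {path!r}")


caught_by = []
for handler_type in (InsufficientArgumentsError, ValueError, NoDatasetFound):
    try:
        require_dataset_stub("/tmp/nowhere")
    except handler_type:
        caught_by.append(handler_type.__name__)
```

All three handler types catch the exception, so existing except clauses written against either older type need no change.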
📝 Documentation
- A new design document sheds light on result records #6167 (by @mih)
- The disabled result renderer mode is documented #6174 (by @mih)
- A new design document sheds light on the datalad and datalad-archives special remotes #6181 (by @mih)
- A new design document sheds light on BatchedCommand and BatchedAnnex #6203 (by @christian-monch)
- A new design document sheds light on standard parameters #6214 (by @adswa)
- The DataLad project adopted the Contributor Covenant COC v2.1 #6236 (by @adswa)
- Docstrings learned to include Sphinx' "version added" and "deprecated" directives #6249 (by @mih)
- A design document sheds light on basic docstring handling and formatting #6249 (by @mih)
- A new design document sheds light on position versus keyword parameter usage #6261 (by @yarikoptic)
- create-sibling-gin's examples have been improved to suggest push as an additional step to ensure proper configuration #6289 (by @mslw)
- A new document describes the credential system from a user's perspective #5796 (by @bpoldrack)
- Enhance the design document on DataLad's credential system #5796 (by @bpoldrack)
- The documentation of the configuration command now details all locations DataLad is reading configuration items from, and their respective rules of precedence #6306 (by @mih)
- API docs for datalad.interface.base are now included in the documentation #6378 (by @mih)
- A new design document is provided that describes the basics of the command line interface implementation #6382 (by @mih)
- The datalad.interface.base.Interface class, the basis of all DataLad command implementations, has been extensively documented to provide an overview of basic principles and customization possibilities #6391 (by @mih)
- --since=^ mode of operation of create-sibling is documented now #6436 (by @yarikoptic)
🏠 Internal
- The internal status() helper was equipped with docstrings and promotes "breadth-first" reporting with a new parameter reporting_order #6006 (by @mih)
- AnnexRepo.get_file_annexinfo() is introduced for more convenient queries for single files and replaces a now deprecated AnnexRepo.get_file_key() to receive information with fewer calls to Git #6104 (by @mih)
- A new get_paths_by_ds() helper exposes status' path normalization and sorting #6110 (by @mih)
- status is optimized with a cache for dataset roots #6137 (by @yarikoptic)
- The internal get_func_args_doc() helper with Python 2 is removed from DataLad core #6175 (by @yarikoptic)
- Further restructuring of the source tree to better reflect the internal dependency structure of the code: AddArchiveContent is moved from datalad/interface to datalad/local (#6188 (by @mih)), Clean is moved from datalad/interface to datalad/local (#6191 (by @mih)), Unlock is moved from datalad/interface to datalad/local (#6192 (by @mih)), DownloadURL is moved from datalad/interface to datalad/local (#6217 (by @mih)), Rerun is moved from datalad/interface to datalad/local (#6220 (by @mih)), RunProcedure is moved from datalad/interface to datalad/local (#6222 (by @mih)). The interface command list is restructured and resorted #6223 (by @mih)
- wrapt is replaced with functools' wraps #6190 (by @yarikoptic)
- The unmaintained appdirs library has been replaced with platformdirs #6198 (by @adswa)
- Modelines mismatching the code style in source files were fixed #6263 (by @AKSoo)
- datalad/__init__.py has been cleaned up #6271 (by @mih)
- GitRepo.call_git_items is implemented with a generator-based runner #6278 (by @christian-monch)
- Separate positional from keyword arguments in the Python API to match CLI with * #6176 (by @yarikoptic), #6304 (by @christian-monch)
- GitRepo.bare does not require the ConfigManager anymore #6323 (by @mih)
- _get_dot_git() was reimplemented to be more efficient and consistent, by testing for common scenarios first and introducing a consistently applied resolved flag for result path reporting #6325 (by @mih)
- All data files under datalad are now included when installing DataLad #6336 (by @jwodder)
- Add internal method for non-interactive provider/credential storing #5796 (by @bpoldrack)
- Allow credential classes to have a context set, consisting of a URL they are to be used with and a dataset DataLad is operating on, allowing to consider "local" and "dataset" config locations #5796 (by @bpoldrack)
- The Interface method get_refds_path() was deprecated #6387 (by @adswa)
- datalad.interface.base.Interface is now an abstract class #6391 (by @mih)
- Simplified the decision making for result rendering, and reduced code complexity #6394 (by @mih)
- Reduce code duplication in datalad.support.json_py #6398 (by @mih)
- Use public ArgumentParser.parse_known_args instead of protected _parse_known_args #6414 (by @yarikoptic)
- add-archive-content does not rely on the deprecated tempfile.mktemp anymore, but uses the more secure tempfile.mkdtemp #6428 (by @adswa)
- AnnexRepo's internal annexstatus is deprecated. In its place, a new test helper assists the few tests that rely on it #6413 (by @adswa)
- config has been refactored from where[="dataset"] to scope[="branch"] #5969 (by @yarikoptic)
- Common command arguments are now uniformly and exhaustively passed to result renderers and filters for decision making. Previously, the presence of a particular argument depended on the respective API and circumstances of a command call. #6440 (by @mih)
- Entrypoint processing for extensions and metadata extractors has been consolidated on a uniform helper that is about twice as fast as the previous implementations. #6591 (by @mih)
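The move from tempfile.mktemp to tempfile.mkdtemp mentioned above can be illustrated with the standard library alone. mktemp only returns an unused name, leaving a window in which another process can create that path first, whereas mkdtemp creates the directory itself, accessible only to the creating user. The sketch below is illustrative; the function name, prefix, and file name are hypothetical and unrelated to add-archive-content's code.

```python
import os
import shutil
import tempfile


def write_in_private_dir(prefix="example-extract-"):
    # mkdtemp() creates the directory atomically and returns its path;
    # the caller is responsible for removing it.
    workdir = tempfile.mkdtemp(prefix=prefix)
    try:
        target = os.path.join(workdir, "payload.txt")
        with open(target, "w") as f:
            f.write("content")
        with open(target) as f:
            return workdir, f.read()
    finally:
        # Clean up the whole directory tree, even if writing failed.
        shutil.rmtree(workdir)


workdir, content = write_in_private_dir()
```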
🛡 Tests
- A range of Windows tests pass and were enabled #6136 (by @adswa)
- Invalid escape sequences in some tests were fixed #6147 (by @mih)
- A cross-platform compatible HTTP-serving test environment is introduced #6153 (by @mih)
- A new helper exposes serve_path_via_http to the command line to deploy an ad-hoc instance of the HTTP server used for internal testing, with SSL and auth, if desired. #6169 (by @mih)
- Windows tests were redistributed across worker runs to harmonize runtime #6200 (by @adswa)
- BatchedCommand gained a basic test #6203 (by @christian-monch)
- The use of with_testrepo is discontinued in all core tests #6224 (by @mih)
- The new git-annex.filter.annex.process configuration is enabled by default on Windows to speed up the test suite #6245 (by @mih)
- If the available Git version supports it, the test suite now uses GIT_CONFIG_GLOBAL to configure a fake home directory instead of overwriting HOME on OSX (#6251 (by @bpoldrack)) and HOME and USERPROFILE on Windows #6260 (by @adswa)
- Windows test timeouts of runners were addressed #6311 (by @christian-monch)
- A handful of Windows tests were fixed (#6352 (by @yarikoptic)) or disabled (#6353 (by @yarikoptic))
- download-url's tests under http_proxy are skipped when a session can't be established #6361 (by @yarikoptic)
- A test for datalad clean was fixed to be invoked within a dataset #6359 (by @yarikoptic)
- The new datalad.cli.tests have an improved module coverage of 80% #6378 (by @mih)
- The test_source_candidate_subdataset has been marked as @slow #6429 (by @yarikoptic)
- Dedicated CLI benchmarks exist now #6381 (by @mih)
- Enable code coverage report for subprocesses #6546 (by @adswa)
- Skip a test on annex>=10.20220127 due to a bug in annex. See https://git-annex.branchable.com/bugs/Change_to_annex.largefiles_leaves_repo_modified/
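The GIT_CONFIG_GLOBAL approach from the list above can be sketched as follows: rather than redirecting HOME, a test harness points Git at a dedicated global configuration file. This is a minimal illustration, not the DataLad test setup; the function name isolated_git_env and the identity values are hypothetical, and it assumes a Git version that honors GIT_CONFIG_GLOBAL.

```python
import os
import tempfile


def isolated_git_env(base_env=None):
    # Copy the environment so the caller's (or the process') one is untouched.
    env = dict(os.environ if base_env is None else base_env)
    cfgdir = tempfile.mkdtemp(prefix="fake-gitconfig-")
    cfgfile = os.path.join(cfgdir, "gitconfig")
    with open(cfgfile, "w") as f:
        # Hypothetical test identity, written in git-config file syntax.
        f.write("[user]\n\tname = Example Tester\n\temail = tester@example.com\n")
    # Git reads this file instead of ~/.gitconfig when the variable is set.
    env["GIT_CONFIG_GLOBAL"] = cfgfile
    return env


env = isolated_git_env({"HOME": "/home/real-user"})
```

The returned mapping can be passed as the env argument of subprocess.run() when invoking git, leaving HOME and USERPROFILE as they were.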
🚧 Infra
- A new issue template using GitHub forms prestructures bug reports #6048 (by @Remi-Gau)
- DataLad and its dependency stack were packaged for Gentoo Linux #6088 (by @TheChymera)
- The readthedocs configuration is modernized to version 2 #6207 (by @adswa)
- The Windows CI setup now runs on Appveyor's Visual Studio 2022 configuration #6228 (by @adswa)
- The readthedocs-theme and Sphinx versions were pinned to reenable rendering of bullet points in the documentation #6346 (by @adswa)
- The PR template was updated with a CHANGELOG template. Future PRs should use it to include a summary for the CHANGELOG #6396 (by @mih)
Authors: 11
- Michael Hanke (@mih)
- Yaroslav Halchenko (@yarikoptic)
- Adina Wagner (@adswa)
- Remi Gau (@Remi-Gau)
- Horea Christian (@TheChymera)
- Michał Szczepanik (@mslw)
- Christian Mönch (@christian-monch)
- John T. Wodder (@jwodder)
- Benjamin Poldrack (@bpoldrack)
- Sin Kim (@AKSoo)
- Basile Pinsard (@bpinsard)
Published by bpoldrack about 4 years ago
DataLad - 0.15.6
🐛 Bug Fix
- BF: do not use BaseDownloader instance wide InterProcessLock - resolves stalling or errors during parallel installs #6507 (@yarikoptic)
- release workflow: add -vv to auto invocation (@yarikoptic)
- Fix version incorrectly incremented by release process in CHANGELOGs #6459 (@yarikoptic)
- BF(TST): add another condition to skip under http_proxy set #6459 (@yarikoptic)
Authors: 1
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] about 4 years ago
DataLad - 0.15.5
🚀 Enhancement
🐛 Bug Fix
- Fix AnnexRepo.whereis key=True mode operation, and add batch mode support #6379 (@yarikoptic)
- DOC: run - adjust description for -i/-o to mention that it could be a directory #6416 (@yarikoptic)
- BF: ORA over HTTP tried to check archive #6355 (@bpoldrack @yarikoptic)
- BF: condition access to isatty to have stream eval to True #6360 (@yarikoptic)
- BF: python 3.10 compatibility fixes #6363 (@yarikoptic)
- Remove two(!) copies of a test #6374 (@mih)
- Warn just once about incomplete git config #6343 (@yarikoptic)
- Make version detection robust to GIT_DIR specification #6341 (@effigies @mih)
- BF(Q&D): do not crash - issue warning - if template fails to format #6319 (@yarikoptic)
Authors: 5
- Adina Wagner (@adswa)
- Benjamin Poldrack (@bpoldrack)
- Chris Markiewicz (@effigies)
- Michael Hanke (@mih)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] over 4 years ago
DataLad - 0.15.4
🐛 Bug Fix
- BF: autorc - replace incorrect releaseTypes with "none" #6320 (@yarikoptic)
- Minor enhancement to CONTRIBUTING.md #6309 (@bpoldrack)
- UX: If a clean repo is dirty after a failed run, give clean-up hints #6112 (@adswa)
- Stop using distutils #6113 (@jwodder)
- BF: RIARemote - set UI backend to annex to make it interactive #6287 (@yarikoptic @bpoldrack)
- Fix invalid escape sequences #6293 (@jwodder)
- CI: Update environment for windows CI builds #6292 (@bpoldrack)
- bump the python version used for mac os tests #6288 (@christian-monch @bpoldrack)
- ENH(UX): log a hint to use ulimit command in case of "Too long" exception #6173 (@yarikoptic)
- Report correct HTTP URL for RIA store content #6091 (@mih)
- BF: Don't overwrite subdataset source candidates #6168 (@bpoldrack)
- Bump sphinx requirement to bypass readthedocs defaults #6189 (@mih)
- infra: Provide custom prefix to auto-related labels #6151 (@adswa)
- Remove all usage of exc_str() #6142 (@mih)
- BF: obtain information about annex special remotes also from annex journal #6135 (@yarikoptic @mih)
- BF: clone tried to save new subdataset despite failing to clone #6140 (@bpoldrack)
🧪 Tests
- RF+BF: use skip_if_no_module helper instead of try/except for libxmp and boto #6148 (@yarikoptic)
- git://github.com -> https://github.com #6134 (@mih)
Authors: 6
- Adina Wagner (@adswa)
- Benjamin Poldrack (@bpoldrack)
- Christian Mönch (@christian-monch)
- John T. Wodder II (@jwodder)
- Michael Hanke (@mih)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] over 4 years ago
DataLad - 0.15.3
🐛 Bug Fix
- BF: Don't make create-sibling recursive by default #6116 (@adswa)
- BF: Add dashes to 'force' option in non-empty directory error message #6078 (@DisasterMo)
- DOC: Add supported URL types to download-url's docstring #6098 (@adswa)
- BF: Retain git-annex error messages & don't show them if operation successful #6070 (@DisasterMo)
- Remove uses of __full_version__ and datalad.version #6073 (@jwodder)
- BF: ORA shouldn't crash while handling a failure #6063 (@bpoldrack)
- DOC: Refine --reckless docstring on usage and wording #6043 (@adswa)
- BF: archives upon strip - use rmtree which retries etc instead of rmdir #6064 (@yarikoptic)
- BF: do not leave test in a tmp dir destined for removal #6059 (@yarikoptic)
- Next wave of exc_str() removals #6022 (@mih)
⚠️ Pushed to maint
- CI: Enable new codecov uploader in Appveyor CI (@adswa)
🏠 Internal
- UX: Log clone-candidate number and URLs #6092 (@adswa)
- UX/ENH: Disable reporting, and don't do superfluous internal subdatasets calls #6094 (@adswa)
- Update codecov action to v2 #6072 (@jwodder)
📝 Documentation
🧪 Tests
- BF(TST): remove reuse of the same tape across unrelated tests #6127 (@yarikoptic)
- Fail Travis tests on deprecation warnings #6074 (@jwodder)
- Ux get result handling broken #6052 (@christian-monch)
- enable metalad tests again #6060 (@christian-monch)
Authors: 7
- Adina Wagner (@adswa)
- Benjamin Poldrack (@bpoldrack)
- Christian Mönch (@christian-monch)
- John T. Wodder II (@jwodder)
- Michael Burgardt (@DisasterMo)
- Michael Hanke (@mih)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] over 4 years ago
DataLad - 0.15.2
🐛 Bug Fix
- BF: Don't suppress datalad subdatasets output #6035 (@DisasterMo @mih)
- Honor datalad.runtime.use-patool if set regardless of OS (was Windows only) #6033 (@mih)
- Discontinue usage of deprecated (public) helper #6032 (@mih)
- BF: ProgressHandler - close the other handler if was specified #6020 (@yarikoptic)
- UX: Report GitLab weburl of freshly created projects in the result #6017 (@adswa)
- Ensure there's a blank line between the class __doc__ and "Parameters" in build_doc docstrings #6004 (@jwodder)
- Large code-reorganization of everything runner-related #6008 (@mih)
- Discontinue exc_str() in all modern parts of the code base #6007 (@mih)
🧪 Tests
- TST: Add test to ensure functionality with subdatasets starting with a hyphen (-) #6042 (@DisasterMo)
- BF(TST): filter away warning from coverage from analysis of stderr of --help #6028 (@yarikoptic)
- BF: disable outdated SSL root certificate breaking chain on older/buggy clients #6027 (@yarikoptic)
- BF: start global testhttpserver only if not running already #6023 (@yarikoptic)
Authors: 5
- Adina Wagner (@adswa)
- John T. Wodder II (@jwodder)
- Michael Burgardt (@DisasterMo)
- Michael Hanke (@mih)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] over 4 years ago
DataLad - 0.15.1
🐛 Bug Fix
- BF: downloader - fail to download even on non-crippled FS if symlink exists #5991 (@yarikoptic)
- ENH: import datalad.api to bind extensions methods for discovery of dataset methods #5999 (@yarikoptic)
- Restructure cmdline API presentation #5988 (@mih)
- Close file descriptors after process exit #5983 (@mih)
⚠️ Pushed to maint
- Discontinue testing of hirni extension (@mih)
🏠 Internal
📝 Documentation
🧪 Tests
- BF(TST): use sys.executable, mark testriabasics.testurlkeys as requiring network #5986 (@yarikoptic)
Authors: 3
- John T. Wodder II (@jwodder)
- Michael Hanke (@mih)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] over 4 years ago
DataLad - 0.15.0 (Tue Sep 14 2021) -- We miss you Kyle!
Enhancements and new features
- Command execution is now performed by a new Runner implementation that is no longer based on the asyncio framework, which was found to exhibit fragile performance in interaction with other asyncio-using code, such as Jupyter notebooks. The new implementation is based on threads. It also supports the specification of "protocols" that were introduced with the switch to the asyncio implementation in 0.14.0. (#5667)
- clone now supports arbitrary URL transformations based on regular expressions. One or more transformation steps can be defined via datalad.clone.url-substitute.<label> configuration settings. The feature can be (and is now) used to support convenience mappings, such as https://osf.io/q8xnk/ (displayed in a browser window) to osf://q8xnk (clonable via the datalad-osf extension). (#5749)
- Homogenize SSH use and configurability between DataLad and git-annex, by instructing git-annex to use DataLad's sshrun for SSH calls (instead of SSH directly). (#5389)
- The ORA special remote has received several new features:
  - It now supports a push-url setting as an alternative to url for write access. An analog parameter was also added to create-sibling-ria. (#5420, #5428)
  - Access of RIA stores now performs homogeneous availability checks, regardless of access protocol. Before, broken HTTP-based access due to misspecified URLs could have gone unnoticed. (#5459, #5672)
  - Error reporting was introduced to inform about undesirable conditions in remote RIA stores. (#5683)
- create-sibling-ria now supports --alias for the specification of a convenience dataset alias name in a RIA store. (#5592)
- Analog to git commit, save now features an --amend mode to support incremental updates of a dataset state. (#5430)
- run now supports a dry-run mode that can be used to inspect the result of parameter expansion on the effective command to ease the composition of more complicated command lines. (#5539)
- run now supports a --assume-ready switch to avoid the (possibly expensive) preparation of inputs and outputs with large datasets that have already been readied through other means. (#5431)
- update now features --how and --how-subds parameters to configure how an update shall be performed. Supported modes are fetch (unchanged default), and merge (previously also possible via --merge), but also new strategies like reset or checkout. (#5534)
- update has a new --follow=parentds-lazy mode that only performs a fetch operation in subdatasets when the desired commit is not yet present. During recursive updates involving many subdatasets this can substantially speed up performance. (#5474)
- DataLad's command line API can now report the version for individual commands via datalad <cmd> --version. The output has been homogenized to <providing package> <version>. (#5543)
- create-sibling now logs information on an auto-generated sibling name, in the case that no --name/-s was provided. (#5550)
- create-sibling-github has been updated to emit result records like any standard DataLad command. Previously it was implemented as a "plugin", which did not support all standard API parameters. (#5551)
- copy-file now also works with content-less files in datasets on crippled filesystems (adjusted mode), when a recent enough git-annex (8.20210428 or later) is available. (#5630)
- addurls can now be instructed how to behave in the event of file name collision via a new parameter --on-collision. (#5675)
- addurls reporting now informs which particular subdatasets were created. (#5689)
- Credentials can now be provided or overwritten via all means supported by ConfigManager. Importantly, datalad.credential.<name>.<field> configuration settings and analog specification via environment variables are now supported (rather than custom environment variables only). Previous specification methods are still supported too. (#5680)
- A new datalad.credentials.force-ask configuration flag can now be used to force re-entry of already known credentials. This simplifies credential updates without having to use an approach native to individual credential stores. (#5777)
- Suppression of rendering repeated similar results is now configurable via the configuration switches datalad.ui.suppress-similar-results (bool), and datalad.ui.suppress-similar-results-threshold (int). (#5681)
- The performance of status and similar functionality when determining local file availability has been improved. (#5692)
- push now renders a result summary on completion. (#5696)
- A dedicated info log message indicates when dataset repositories are subjected to an annex version upgrade. (#5698)
- Error reporting improvements:
  - The NoDatasetFound exception now provides information for which purpose a dataset is required. (#5708)
  - Wording of the MissingExternalDependency error was rephrased to account for cases of non-functional installations. (#5803)
  - push reports when a --to parameter specification was (likely) forgotten. (#5726)
  - Detailed information is now given when DataLad fails to obtain a lock for credential entry in a timely fashion. Previously only a generic debug log message was emitted. (#5884)
  - Clarified error message when create-sibling-gitlab was called without --project. (#5907)
- add-readme now provides a README template with more information on the nature and use of DataLad datasets. A README file is no longer annex'ed by default, but can be using the new --annex switch. (#5723, #5725)
- clean now supports a --dry-run mode to inform about cleanable content. (#5738)
- A new configuration setting datalad.locations.locks can be used to control the placement of lock files. (#5740)
- wtf now also reports branch names and states. (#5804)
- AnnexRepo.whereis() now supports batch mode. (#5533)
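The regular-expression URL transformation that clone gained in this release can be pictured with a small standalone sketch. It is an illustration only: the rule tuple, OSF_RULE, and substitute_url are hypothetical names, and the sketch does not read or reproduce the actual datalad.clone.url-substitute.<label> configuration format. It only shows how a chain of pattern/replacement steps could turn the OSF browser URL from the example above into its osf:// form.

```python
import re

# Hypothetical rule: (regular expression, replacement). It mirrors the OSF
# example from the release notes and is NOT read from DataLad's configuration.
OSF_RULE = (r"^https://osf\.io/([^/]+)/*$", r"osf://\1")


def substitute_url(url, rules):
    """Apply each (pattern, replacement) step in order and return the result."""
    for pattern, replacement in rules:
        url = re.sub(pattern, replacement, url)
    return url


# A browser URL is mapped to its clonable counterpart; URLs that match
# no rule pass through unchanged.
print(substitute_url("https://osf.io/q8xnk/", [OSF_RULE]))
print(substitute_url("https://example.com/data", [OSF_RULE]))
```

Keeping each step as an independent pattern/replacement pair means several transformations can be chained, with the output of one step feeding the next.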
Deprecations and removals
- The minimum supported git-annex version is now 8.20200309. (#5512)
- ORA special remote configuration items ssh-host, and base-path are deprecated. They are completely replaced by ria+<protocol>:// URL specifications. (#5425)
- The deprecated no_annex parameter of create() was removed from the Python API. (#5441)
- The unused GitRepo.pull() method has been removed. (#5558)
- Residual support for "plugins" (a mechanism used before DataLad supported extensions) was removed. This includes the configuration switches datalad.locations.{system,user}-plugins. (#5554, #5564)
- Several features and commands have been moved to the datalad-deprecated package. This package must now be installed to be able to keep using this functionality.
  - The publish command. Use push instead. (#5837)
  - The ls command. (#5569)
  - The web UI that is deployable via datalad create-sibling --ui. (#5555)
  - The "automagic IO" feature. (#5577)
- AnnexRepo.copy_to() has been deprecated. The push command should be used instead. (#5560)
- AnnexRepo.sync() has been deprecated. AnnexRepo.call_annex(['sync', ...]) should be used instead. (#5461)
- All GitRepo.*_submodule() methods have been deprecated and will be removed in a future release. (#5559)
- create-sibling-github's --dryrun switch was deprecated, use --dry-run instead. (#5551)
- The datalad --pbs-runner option has been deprecated, use condor_run (or similar) instead. (#5956)
🐛 Fixes
- Prevent invalid declaration of publication dependencies for 'origin' on any auto-detected ORA special remotes, when cloning from a RIA store. An ORA remote is now checked whether it actually points to the RIA store the clone was made from. (#5415)
- The ORA special remote implementation has received several fixes:
  - It can now handle HTTP redirects. (#5792)
  - Prevents failure when URL-type annex keys contain the '/' character. (#5823)
  - Properly support the specification of usernames, passwords and ports in ria+<protocol>:// URLs. (#5902)
- It is now possible to specifically select the default (or generic) result renderer via datalad -f default and with that override a tailored result renderer that may be preconfigured for a particular command. (#5476)
- Starting with 0.14.0, original URLs given to clone were recorded in a subdataset record. This was initially done in a second commit, leading to inflation of commits and slowdown in superdatasets with many subdatasets. Such subdataset record annotation is now collapsed into a single commit. (#5480)
- run no longer removes leading empty directories as part of the output preparation. This was surprising behavior for commands that do not ensure on their own that output directories exist. (#5492)
- A potentially existing message property is no longer removed when using the json or json_pp result renderer to avoid undesired withholding of relevant information. (#5536)
- subdatasets now reports state=present, rather than state=clean, for installed subdatasets to complement state=absent reports for uninstalled datasets. (#5655)
- create-sibling-ria now executes commands with a consistent environment setup that matches all other command execution in other DataLad commands. (#5682)
- save no longer saves unspecified subdatasets when called with an explicit path (list). The fix required a behavior change of GitRepo.get_content_info() in its interpretation of None vs. [] path argument values that now aligns the behavior of GitRepo.diff|status() with their respective documentation. (#5693)
- get now prefers the location of a subdataset that is recorded in a superdataset's .gitmodules record. Previously, DataLad tried to obtain a subdataset from an assumed checkout of the superdataset's origin. This new default order is (re-)configurable via the datalad.get.subdataset-source-candidate-<priority-label> configuration mechanism. (#5760)
- create-sibling-gitlab no longer skips the root dataset when . is given as a path. (#5789)
- siblings now rejects a value given to --as-common-datasrc that clashes with the respective Git remote. (#5805)
- The usage synopsis reported by siblings now lists all supported actions. (#5913)
- siblings now renders non-ok results to avoid silent failure. (#5915)
- .gitattribute file manipulations no longer leave the file without a trailing newline. (#5847)
- Prevent crash when trying to delete a non-existing keyring credential field. (#5892)
- git-annex is no longer called with an unconditional annex.retry=3 configuration. Instead, this parameterization is now limited to annex get and annex copy calls. (#5904)
🧪 Tests
- file:// URLs are no longer the predominant test case for AnnexRepo functionality. A built-in HTTP server is now used in most cases. (#5332)
Published by yarikoptic over 4 years ago
DataLad - 0.14.8
🐛 Bug Fix
- BF: add-archive-content on .xz and other non-.gz stream compressed files #5930 (@yarikoptic)
- BF(UX): do not keep logging ERROR possibly present in progress records #5936 (@yarikoptic)
- Annotate datalad_core as not needing actual data -- just uses annex whereis #5971 (@yarikoptic)
- BF: limit CMD_MAX_ARG if obnoxious value is encountered. #5945 (@yarikoptic)
- Download session/credentials locking -- inform user if locking is "failing" to be obtained, fail upon ~5min timeout #5884 (@yarikoptic)
- Render siblings()'s non-ok results with the default renderer #5915 (@mih)
- BF: do not crash, just skip whenever trying to delete non existing field in the underlying keyring #5892 (@yarikoptic)
- Fix argument-spec for siblings and improve usage synopsis #5913 (@mih)
- Clarify error message re unspecified gitlab project #5907 (@mih)
- Support username, password and port specification in RIA URLs #5902 (@mih)
- BF: take path from SSHRI, test URLs not only on Windows #5881 (@yarikoptic)
- ENH(UX): warn user if keyring returned a "null" keyring #5875 (@yarikoptic)
- ENH(UX): state original purpose in NoDatasetFound exception + detail it for get #5708 (@yarikoptic)
⚠️ Pushed to maint
- Merge branch 'bf-http-headers-agent' into maint (@yarikoptic)
- RF(BF?)+DOC: provide User-Agent to entire session headers + use those if provided (@yarikoptic)
🏠 Internal
- Pass --no-changelog to auto shipit if changelog already has entry #5952 (@jwodder)
- Add isort config to match current convention + run isort via pre-commit (if configured) #5923 (@jwodder)
- .travis.yml: use python -m {nose,coverage} invocations, and always show combined report #5888 (@yarikoptic)
- Add project URLs into the package metadata for convenience links on Pypi #5866 (@adswa @yarikoptic)
🧪 Tests
- BF: do use OBSCURE_FILENAME instead of hardcoded unicode #5944 (@yarikoptic)
- BF(TST): Skip testing for having PID listed if no psutil #5920 (@yarikoptic)
- BF(TST): Boost version of git-annex to 8.20201129 to test an error message #5894 (@yarikoptic)
Authors: 4
- Adina Wagner (@adswa)
- John T. Wodder II (@jwodder)
- Michael Hanke (@mih)
- Yaroslav Halchenko (@yarikoptic)
Published by yarikoptic over 4 years ago
DataLad - 0.14.7
🐛 Bug Fix
- UX: When two or more clone URL templates are found, error out more gracefully #5839 (@adswa)
- BF: http_auth - follow redirect (just 1) to re-authenticate after initial attempt #5852 (@yarikoptic)
- addurls Formatter - provide value repr in exception #5850 (@yarikoptic)
- ENH: allow for "patch" level semver for "master" branch #5839 (@yarikoptic)
- BF: Report info from annex JSON error message in CommandError #5809 (@mih)
- RF(TST): do not test for no EASY and pkg_resources in shims #5817 (@yarikoptic)
- http downloaders: Provide custom informative User-Agent, do not claim to be "Authenticated access" #5802 (@yarikoptic)
- ENH(UX,DX): inform user with a warning if version is 0+unknown #5787 (@yarikoptic)
- shell-completion: add argcomplete to 'misc' extra_depends, log an ERROR if argcomplete fails to import #5781 (@yarikoptic)
- ENH (UX): add python-gitlab dependency #5776 (s.heunis@fz-juelich.de)
🏠 Internal
- BF: Fix reported paths in ORA remote #5821 (@adswa)
- BF: import importlib.metadata not importlib_metadata whenever available #5818 (@yarikoptic)
🧪 Tests
- TST: set --allow-unrelated-histories in the mkpushtarget setup for Windows #5855 (@adswa)
- Tests: Allow for version to contain + as a separator and provide more information for version related comparisons #5786 (@yarikoptic)
Authors: 4
- Adina Wagner (@adswa)
- Michael Hanke (@mih)
- Stephan Heunis (@jsheunis)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] almost 5 years ago
DataLad - 0.14.6
🏠 Internal
- BF: update changelog conversion from .md to .rst (for sphinx) #5757 (@yarikoptic @jwodder)
Authors: 2
- John T. Wodder II (@jwodder)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] almost 5 years ago
DataLad - 0.14.5
🐛 Bug Fix
- BF(TST): parallel - take longer for producer to produce #5747 (@yarikoptic)
- add --on-failure default value and document it #5690 (@christian-monch @yarikoptic)
- ENH: harmonize "purpose" statements to imperative form #5733 (@yarikoptic)
- ENH(TST): populate heavy tree with 100 unique keys (not just 1) among 10,000 #5734 (@yarikoptic)
- BF: do not use .acquired - just get state from acquire() #5718 (@yarikoptic)
- BF: account for annex now "scanning for annexed" instead of "unlocked" files #5705 (@yarikoptic)
- interface: Don't repeat custom summary for non-generator results #5688 (@kyleam)
- RF: just pip install datalad-installer #5676 (@yarikoptic)
- DOC: addurls.extract: Drop mention of removed 'stream' parameter #5690 (@kyleam)
- Merge pull request #5674 from kyleam/test-addurls-copy-fix #5674 (@kyleam)
- Merge pull request #5663 from kyleam/status-ds-equal-path #5663 (@kyleam)
- Merge pull request #5671 from kyleam/update-fetch-fail #5671 (@kyleam)
- BF: update: Honor --on-failure if fetch fails #5671 (@kyleam)
- RF: update: Avoid fetch's deprecated kwargs #5671 (@kyleam)
- CLN: update: Drop an unused import #5671 (@kyleam)
- Merge pull request #5664 from kyleam/addurls-better-url-parts-error #5664 (@kyleam)
- Merge pull request #5661 from kyleam/sphinx-fix-plugin-refs #5661 (@kyleam)
- BF: status: Provide special treatment of "this dataset" path #5663 (@kyleam)
- BF: addurls: Provide better placeholder error for special keys #5664 (@kyleam)
- RF: addurls: Simplify construction of placeholder exception message #5664 (@kyleam)
- RF: addurls.getplaceholder_exception: Rename a parameter #5664 (@kyleam)
- RF: status: Avoid repeated Dataset.path access #5663 (@kyleam)
- DOC: Reference plugins via datalad.api #5661 (@kyleam)
- download-url: Set up datalad special remote if needed #5648 (@kyleam @yarikoptic)
⚠️ Pushed to maint
- MNT: Post-release dance (@kyleam)
🏠 Internal
- Switch to versioneer and auto #5669 (@jwodder @yarikoptic)
- MNT: setup.py: Temporarily avoid Sphinx 4 #5649 (@kyleam)
🧪 Tests
- BF(TST): skip testing for showing "Scanning for ..." since not shown if too quick #5727 (@yarikoptic)
- Revert "TST: testpartialunlocked: Document and avoid recent git-annex failure" #5651 (@kyleam)
Authors: 4
- Christian Mönch (@christian-monch)
- John T. Wodder II (@jwodder)
- Kyle Meyer (@kyleam)
- Yaroslav Halchenko (@yarikoptic)
Published by github-actions[bot] almost 5 years ago
DataLad - 0.14.4 (May 10, 2021)
Fixes
- Following an internal call to git-clone, clone assumed that the remote name was "origin", but this may not be the case if clone.defaultRemoteName is configured (available as of Git 2.30). #5572
- Several test fixes, including updates for changes in git-annex. #5612 #5632 #5639
Published by kyleam about 5 years ago
DataLad - 0.14.3 (April 28, 2021)
Fixes
- For outputs that include a glob, run didn't re-glob after executing the command, which is necessary to catch changes if --explicit or --expand={outputs,both} is specified. #5594
- run now gives an error result rather than a warning when an input glob doesn't match. #5594
- The procedure for creating a RIA store checks for an existing ria-layout-version file and makes sure its version matches the desired version. This check wasn't done correctly for SSH hosts. #5607
- A helper for transforming git-annex JSON records into DataLad results didn't account for the unusual case where the git-annex record doesn't have a "file" key. #5580
- The test suite required updates for recent changes in PyGithub and git-annex. #5603 #5609
Enhancements and new features
- The DataLad source repository has long had a tools/cmdline-completion helper. This functionality is now exposed as a command, datalad shell-completion. #5544
Published by kyleam about 5 years ago
DataLad - 0.14.2 (April 14, 2021)
Fixes
- push now works bottom-up, pushing submodules first so that hooks on the remote can aggregate updated subdataset information. #5416
- run-procedure didn't ensure that the configuration of subdatasets was reloaded. #5552
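The bottom-up order that push now follows can be sketched in a few lines. This is not DataLad's implementation: bottom_up is a hypothetical helper, and it assumes datasets are identified by POSIX-style relative paths in which a subdataset always lives below its superdataset.

```python
from pathlib import PurePosixPath


def bottom_up(dataset_paths):
    """Order dataset paths so that every subdataset precedes its superdataset.

    A subdataset path always has more components than the path of any
    dataset that contains it, so sorting by depth (deepest first) suffices.
    """
    return sorted(
        dataset_paths,
        key=lambda p: len(PurePosixPath(p).parts),
        reverse=True,
    )


# The deepest dataset comes first, the top-level superdataset last.
print(bottom_up(["super", "super/sub", "super/sub/subsub"]))
```

Processing in this order means a hook on the remote only sees a superdataset after all of its subdatasets have already arrived.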
Published by kyleam about 5 years ago
DataLad - 0.14.1 (April 01, 2021)
Fixes
- The recent default branch changes on GitHub's side can lead to "git-annex" being selected over "master" as the default branch on GitHub when setting up a sibling with create-sibling-github. To work around this, the current branch is now pushed first. #5010
- The logic for reading in a JSON line from git-annex failed if the response exceeded the buffer size (256 KB on *nix systems).
- Calling unlock with a path of "." from within an untracked subdataset incorrectly aborted, complaining that the "dataset containing given paths is not underneath the reference dataset". #5458
- clone didn't account for the possibility of multiple accessible ORA remotes or the fact that none of them may be associated with the RIA store being cloned. #5488
- create-sibling-ria didn't call git update-server-info after setting up the remote repository and, as a result, the repository couldn't be fetched until something else (e.g., a push) triggered a call to git update-server-info. #5531
- The parser for git-config output didn't properly handle multi-line values and got thrown off by unexpected and unrelated lines. #5509
- The 0.14 release introduced regressions in the handling of progress bars for git-annex actions, including collapsing progress bars for concurrent operations. #5421 #5438
- save failed if the user configured Git's diff.ignoreSubmodules to a non-default value. #5453
- An interprocess lock is now used to prevent a race between checking for an SSH socket's existence and creating it. #5466
- If a Python procedure script is executable, run-procedure invokes it directly rather than passing it to sys.executable. The non-executable Python procedures that ship with DataLad now include shebangs so that invoking them has a chance of working on file systems that present all files as executable. #5436
- DataLad's wrapper around argparse failed if an underscore was used in a positional argument. #5525
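The JSON-line fix above concerns records that arrive split across several reads. A generic way to handle that situation, shown here as a sketch rather than as DataLad's actual code, is to buffer incoming bytes and only decode once a newline terminates a record. JSONLineAssembler is a hypothetical name used for illustration.

```python
import json


class JSONLineAssembler:
    """Collect byte chunks and decode each newline-terminated JSON record."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        """Add a chunk of bytes and return the records it completed."""
        self._pending += chunk
        # Everything before the last newline is complete; the remainder
        # (possibly empty) is kept until more data arrives.
        *complete, self._pending = self._pending.split(b"\n")
        return [json.loads(line) for line in complete if line.strip()]


assembler = JSONLineAssembler()
print(assembler.feed(b'{"file": "a'))          # record not finished yet
print(assembler.feed(b'.txt"}\n{"fi'))         # first record completed
print(assembler.feed(b'le": "b.txt"}\n'))      # second record completed
```

Because nothing is decoded before its terminating newline is seen, the size of a single read no longer limits the size of a record.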
Enhancements and new features
- DataLad's method for mapping environment variables to configuration options (e.g., DATALAD_FOO_X__Y to datalad.foo.x-y) doesn't work if the subsection name ("FOO") has an underscore. This limitation can be sidestepped with the new DATALAD_CONFIG_OVERRIDES_JSON environment variable, which can be set to a JSON record of configuration values. #5505
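The mapping described above, and why an underscore in a subsection name cannot survive it, can be illustrated with a standalone sketch. The functions env_to_config_name and collect_overrides are hypothetical, the mapping rule is inferred from the DATALAD_FOO_X__Y example, and the precedence between the JSON record and individual variables is an assumption of the sketch.

```python
import json


def env_to_config_name(env_name):
    """Map an environment variable name to a configuration option name.

    Illustrative rule: a double underscore becomes a dash, a single
    underscore becomes a dot, and the result is lowercased.
    """
    return env_name.replace("__", "-").replace("_", ".").lower()


def collect_overrides(environ):
    """Gather configuration overrides from an environment mapping."""
    overrides = {}
    # Assumption: the JSON record is applied first, so that individual
    # DATALAD_* variables can still refine it.
    raw = environ.get("DATALAD_CONFIG_OVERRIDES_JSON")
    if raw:
        overrides.update(json.loads(raw))
    for name, value in environ.items():
        if name.startswith("DATALAD_") and name != "DATALAD_CONFIG_OVERRIDES_JSON":
            overrides[env_to_config_name(name)] = value
    return overrides


env = {
    # Every single underscore turns into a dot, so a subsection such as
    # "my_section" cannot be expressed through a variable name ...
    "DATALAD_FOO_X__Y": "2",
    # ... but it can be spelled out literally in the JSON record.
    "DATALAD_CONFIG_OVERRIDES_JSON": '{"datalad.my_section.x": "1"}',
}
print(collect_overrides(env))
```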
Published by kyleam about 5 years ago
DataLad - 0.14.0 (February 02, 2021)
Major refactoring and deprecations
- Git versions below v2.19.1 are no longer supported. #4650
- The minimum git-annex version is still 7.20190503, but, if you're on Windows (or use adjusted branches in general), please upgrade to at least 8.20200330 but ideally 8.20210127 to get subdataset-related fixes. #4292 #5290
- The minimum supported version of Python is now 3.6. #4879
- publish is now deprecated in favor of push. It will be removed in the 0.15.0 release at the earliest.
- A new command runner was added in v0.13. Functionality related to the old runner has now been removed: Runner, GitRunner, and run_gitcommand_on_file_list_chunks from the datalad.cmd module along with the datalad.tests.protocolremote, datalad.cmd.protocol, and datalad.cmd.protocol.prefix configuration options. #5229
- The --no-storage-sibling switch of create-sibling-ria is deprecated in favor of --storage-sibling=off and will be removed in a later release. #5090
- The get_git_dir static method of GitRepo is deprecated and will be removed in a later release. Use the dot_git attribute of an instance instead. #4597
- The ProcessAnnexProgressIndicators helper from datalad.support.annexrepo has been removed. #5259
- The save argument of install, a noop since v0.6.0, has been dropped. #5278
- The get_URLS method of AnnexCustomRemote is deprecated and will be removed in a later release. #4955
- ConfigManager.get now returns a single value rather than a tuple when there are multiple values for the same key, as very few callers correctly accounted for the possibility of a tuple return value. Callers can restore the old behavior by passing get_all=True. #4924
- In 0.12.0, all of the assure_* functions in datalad.utils were renamed as ensure_*, keeping the old names around as compatibility aliases. The assure_* variants are now marked as deprecated and will be removed in a later release. #4908
- The datalad.interface.run module, which was deprecated in 0.12.0 and kept as a compatibility shim for datalad.core.local.run, has been removed. #4583
- The saver argument of datalad.core.local.run.run_command, marked as obsolete in 0.12.0, has been removed. #4583
- The dataset_only argument of the ConfigManager class was deprecated in 0.12 and has now been removed. #4828
- The linux_distribution_name, linux_distribution_release, and on_debian_wheezy attributes in datalad.utils are no longer set at import time and will be removed in a later release. Use datalad.utils.get_linux_distribution instead. #4696
- datalad.distribution.clone, which was marked as obsolete in v0.12 in favor of datalad.core.distributed.clone, has been removed. #4904
- datalad.support.annexrepo.N_AUTO_JOBS, announced as deprecated in v0.12.6, has been removed. #4904
- The compat parameter of GitRepo.get_submodules, added in v0.12 as a temporary compatibility layer, has been removed. #4904
- The long-deprecated (and non-functional) url parameter of GitRepo.__init__ has been removed. #5342
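The change to ConfigManager.get noted above, a single value by default with get_all=True restoring the tuple, can be mimicked with a toy lookup. This sketch does not use DataLad: get_value and the plain dict store are hypothetical, and which of several values is returned by default is an assumption here (the last one, following Git's own convention).

```python
def get_value(store, key, default=None, get_all=False):
    """Toy lookup over a dict whose values may be tuples of several settings."""
    value = store.get(key, default)
    if isinstance(value, tuple) and not get_all:
        # Assumption of this sketch: the last value wins.
        return value[-1]
    return value


store = {"user.name": ("first", "second"), "core.bare": "false"}
print(get_value(store, "user.name"))                # a single value
print(get_value(store, "user.name", get_all=True))  # the full tuple
```

Callers that only ever expect one value no longer need to special-case tuples, while callers that care about every setting opt in explicitly.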
Fixes
- Cloning onto a system that enters adjusted branches by default (as Windows does) did not properly record the clone URL. #5128
- The RIA-specific handling after calling clone was correctly triggered by ria+http URLs but not ria+https URLs. #4977
- If the registered commit wasn't found when cloning a subdataset, the failed attempt was left around. #5391
- The remote calls to cp and chmod in create-sibling were not portable and failed on macOS. #5108
- A more reliable check is now done to decide if configuration files need to be reloaded. #5276
- The internal command runner's handling of the event loop has been improved to play nicer with outside applications and scripts that use asyncio. #5350 #5367
Enhancements and new features
- The subdataset handling for adjusted branches, which is particularly important on Windows where git-annex enters an adjusted branch by default, has been improved. A core piece of the new approach is registering the commit of the primary branch, not its checked out adjusted branch, in the superdataset. Note: This means that git status will always consider a subdataset on an adjusted branch as dirty while datalad status will look more closely and see if the tip of the primary branch matches the registered commit. #5241
- The performance of the subdatasets command has been improved, with substantial speedups for recursive processing of many subdatasets. #4868 #5076
- Adding new subdatasets via save has been sped up. #4793
get, save, and addurls gained support for parallel operations that can be enabled via the
--jobscommand-line option or the newdatalad.runtime.max-jobsconfiguration option. #5022-
- learned how to read data from standard input. #4669
- now supports tab-separated input. #4845
- now lets Python callers pass in a list of records rather than a file name. #5285
- gained a
--drop-afterswitch that signals to drop a file's content after downloading and adding it to the annex. #5081 - is now able to construct a tree of files from known checksums without downloading content via its new
--keyoption. #5184 - records the URL file in the commit message as provided by the caller rather than using the resolved absolute path. #5091
- is now speedier. #4867 #5022
create-sibling-github learned how to create private repositories (thanks to Nolan Nichols). #4769
create-sibling-ria gained a --storage-sibling option. When --storage-sibling=only is specified, the storage sibling is created without an accompanying Git sibling. This enables using hosts without Git installed for storage. #5090
The download machinery (and thus the datalad special remote) gained support for a new scheme, shub://, which follows the same format used by singularity run and friends. In contrast to the short-lived URLs obtained by querying Singularity Hub directly, shub:// URLs are suitable for registering with git-annex. #4816
A provider is now included for https://registry-1.docker.io URLs. This is useful for storing an image's blobs in a dataset and registering the URLs with git-annex. #5129
The add-readme command now links to the DataLad handbook rather than http://docs.datalad.org. #4991
New option datalad.locations.extra-procedures specifies an additional location that should be searched for procedures. #5156
The class for handling configuration values, ConfigManager, now takes a lock before writes to allow for multiple processes to modify the configuration of a dataset. #4829
clone now records the original, unresolved URL for a subdataset under submodule.<name>.datalad-url in the parent's .gitmodules, enabling later get calls to use the original URL. This is particularly useful for ria+ URLs. #5346
Installing a subdataset now uses custom handling rather than calling git submodule update --init. This avoids some locking issues when running get in parallel and enables more accurate source URLs to be recorded. #4853
GitRepo.get_content_info, a helper that gets triggered by many commands, got faster by tweaking its git ls-files call. #5067
wtf now includes credentials-related information (e.g. active backends) in its output. #4982
The call_git* methods of GitRepo now have a read_only parameter. Callers can set this to True to promise that the provided command does not write to the repository, bypassing the cost of some checks and locking. #5070
New call_annex* methods in the AnnexRepo class provide an interface for running git-annex commands similar to that of the GitRepo.call_git* methods. #5163
It's now possible to register a custom metadata indexer that is discovered by search and used to generate an index. #4963
The ConfigManager methods get, getbool, getfloat, and getint now return a single value (with same precedence as git config --get) when there are multiple values for the same key (in the non-committed git configuration, if the key is present there, or in the dataset configuration). For get, the old behavior can be restored by specifying get_all=True. #4924
Command-line scripts are now defined via the entry_points argument of setuptools.setup instead of the scripts argument. #4695
Interactive use of --help on the command-line now invokes a pager on more systems and installation setups. #5344
The datalad special remote now tries to eliminate some unnecessary interactions with git-annex by being smarter about how it queries for URLs associated with a key. #4955
The GitRepo class now does a better job of handling bare repositories, a step towards bare repositories support in DataLad. #4911
More internal work to move the code base over to the new command runner. #4699 #4855 #4900 #4996 #5002 #5141 #5142 #5229
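The single-value behavior described for the ConfigManager getters can be pictured with a small stand-in. This is only a sketch, not DataLad's implementation: the TinyConfig class and its storage layout are hypothetical, and it assumes the rule used by git config --get, which reports the last value when a key is set more than once.

```python
from typing import Dict, List, Optional, Tuple, Union

Value = Union[str, Tuple[str, ...], None]


class TinyConfig:
    """Hypothetical stand-in for a multi-valued configuration store."""

    def __init__(self) -> None:
        # Each key maps to its values in the order they were added.
        self._store: Dict[str, List[str]] = {}

    def add(self, key: str, value: str) -> None:
        self._store.setdefault(key, []).append(value)

    def get(self, key: str, default: Optional[str] = None,
            get_all: bool = False) -> Value:
        values = self._store.get(key)
        if not values:
            return default
        if get_all:
            # Opt in to seeing every value for the key.
            return tuple(values)
        # Single-value mode: the last value wins, as with `git config --get`.
        return values[-1]


cfg = TinyConfig()
cfg.add("remote.origin.fetch", "refs/heads/*")
cfg.add("remote.origin.fetch", "refs/tags/*")

single = cfg.get("remote.origin.fetch")
everything = cfg.get("remote.origin.fetch", get_all=True)
```

A caller that only ever expects one value no longer has to guard against a tuple, while a caller that needs all values asks for them explicitly.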
Published by kyleam over 5 years ago
DataLad - First release candidate for 0.14.0 (January 26, 2021)
Major refactoring and deprecations
Git versions below v2.19.1 are no longer supported. #4650
The minimum supported version of Python is now 3.6. #4879
publish is now deprecated in favor of push. It will be removed in the 0.15.0 release at the earliest.
A new command runner was added in v0.13. Functionality related to the old runner has now been removed:
Runner, GitRunner, and run_gitcommand_on_file_list_chunks from the datalad.cmd module along with the datalad.tests.protocolremote, datalad.cmd.protocol, and datalad.cmd.protocol.prefix configuration options. #5229
The --no-storage-sibling switch of create-sibling-ria is deprecated in favor of --storage-sibling=off and will be removed in a later release. #5090
The get_git_dir static method of GitRepo is deprecated and will be removed in a later release. Use the dot_git attribute of an instance instead. #4597
The ProcessAnnexProgressIndicators helper from datalad.support.annexrepo has been removed. #5259
The save argument of install, a noop since v0.6.0, has been dropped. #5278
The get_URLS method of AnnexCustomRemote is deprecated and will be removed in a later release. #4955
ConfigManager.get now returns a single value rather than a tuple when there are multiple values for the same key, as very few callers correctly accounted for the possibility of a tuple return value. Callers can restore the old behavior by passing get_all=True. #4924
In 0.12.0, all of the assure_* functions in datalad.utils were renamed as ensure_*, keeping the old names around as compatibility aliases. The assure_* variants are now marked as deprecated and will be removed in a later release. #4908
The datalad.interface.run module, which was deprecated in 0.12.0 and kept as a compatibility shim for datalad.core.local.run, has been removed. #4583
The saver argument of datalad.core.local.run.run_command, marked as obsolete in 0.12.0, has been removed. #4583
The dataset_only argument of the ConfigManager class was deprecated in 0.12 and has now been removed. #4828
The linux_distribution_name, linux_distribution_release, and on_debian_wheezy attributes in datalad.utils are no longer set at import time and will be removed in a later release. Use datalad.utils.get_linux_distribution instead. #4696
datalad.distribution.clone, which was marked as obsolete in v0.12 in favor of datalad.core.distributed.clone, has been removed. #4904
datalad.support.annexrepo.N_AUTO_JOBS, announced as deprecated in v0.12.6, has been removed. #4904
The compat parameter of GitRepo.get_submodules, added in v0.12 as a temporary compatibility layer, has been removed. #4904
The long-deprecated (and non-functional) url parameter of GitRepo.__init__ has been removed. #5342
Fixes
Cloning onto a system that enters adjusted branches by default (as Windows does) did not properly record the clone URL. #5128
The RIA-specific handling after calling clone was correctly triggered by ria+http URLs but not ria+https URLs. #4977
The remote calls to cp and chmod in create-sibling were not portable and failed on macOS. #5108
A more reliable check is now done to decide if the configuration files need to be reloaded. #5276
The internal command runner's handling of the event loop has been improved to play nicer with outside applications and scripts that use asyncio. #5350 #5367
Enhancements and new features
The subdataset handling for adjusted branches, which is particularly important on Windows where git-annex enters an adjusted branch by default, has been improved. A core piece of the new approach is registering the commit of the primary branch, not its checked out adjusted branch, in the superdataset. Note: This means that
git status will always consider a subdataset on an adjusted branch as dirty while datalad status will look more closely and see if the tip of the primary branch matches the registered commit. #5241
create-sibling-github learned how to create private repositories (thanks to Nolan Nichols). #4769
create-sibling-ria gained a --storage-sibling option. When --storage-sibling=only is specified, the storage sibling is created without an accompanying Git sibling. This enables using hosts without Git installed for storage. #5090
get, save, and addurls gained support for parallel operations that can be enabled via the --jobs command-line option or the new datalad.runtime.max-jobs configuration option. #5022
The download machinery (and thus the datalad special remote) gained support for a new scheme, shub://, which follows the same format used by singularity run and friends. In contrast to the short-lived URLs obtained by querying Singularity Hub directly, shub:// URLs are suitable for registering with git-annex. #4816
A provider is now included for https://registry-1.docker.io URLs. This is useful for storing an image's blobs in a dataset and registering the URLs with git-annex. #5129
addurls
- learned how to read data from standard input. #4669
- now supports tab-separated input. #4845
- now lets Python callers pass in a list of records rather than a file name. #5285
- gained a --drop-after switch that signals to drop a file's content after downloading and adding it to the annex. #5081
- is now able to construct a tree of files from known checksums without downloading content via its new --key option. #5184
- records the URL file in the commit message as provided by the caller rather than using the resolved absolute path. #5091
- is now speedier. #4867 #5022
The add-readme command now links to the DataLad handbook rather than http://docs.datalad.org. #4991
DataLad now ships with a module that is capable of installing git-annex via various methods. See python -m datalad.install -h. #5098 #5139
New option datalad.locations.extra-procedures specifies an additional location that should be searched for procedures. #5156
The class for handling configuration values, ConfigManager, now takes a lock before writes to allow for multiple processes to modify the configuration of a dataset. #4829
clone now records the original, unresolved URL for a subdataset under submodule.<name>.datalad-url in the parent's .gitmodules, enabling later get calls to use the original URL. This is particularly useful for ria+ URLs. #5346
Installing a subdataset now uses custom handling rather than calling git submodule update --init. This avoids some locking issues when running get in parallel and enables more accurate source URLs to be recorded. #4853
The performance of the subdatasets command has been improved, with substantial speedups for recursive processing of many subdatasets. #4868 #5076
Adding new subdatasets via save has been sped up. #4793
GitRepo.get_content_info, a helper that gets triggered by many commands, got faster by tweaking its git ls-files call. #5067
wtf now includes credentials-related information (e.g. active backends) in its output. #4982
The call_git* methods of GitRepo now have a read_only parameter. Callers can set this to True to promise that the provided command does not write to the repository, bypassing the cost of some checks and locking. #5070
New call_annex* methods in the AnnexRepo class provide an interface for running git-annex commands similar to that of the GitRepo.call_git* methods. #5163
It's now possible to register a custom metadata indexer that is discovered by search and used to generate an index. #4963
The ConfigManager methods get, getbool, getfloat, and getint now return a single value (with same precedence as git config --get) when there are multiple values for the same key (in the non-committed git configuration, if the key is present there, or in the dataset configuration). For get, the old behavior can be restored by specifying get_all=True. #4924
Command-line scripts are now defined via the entry_points argument of setuptools.setup instead of the scripts argument. #4695
Interactive use of --help on the command-line now invokes a pager on more systems and installation setups. #5344
The datalad special remote now tries to eliminate some unnecessary interactions with git-annex by being smarter about how it queries for URLs associated with a key. #4955
The GitRepo class now does a better job of handling bare repositories, a step towards bare repositories support in DataLad. #4911
More internal work to move the code base over to the new command runner. #4699 #4855 #4900 #4996 #5002 #5141 #5142 #5229
Published by kyleam over 5 years ago
DataLad - 0.13.7 (January 04, 2021)
Fixes
Cloning from a RIA store on the local file system initialized annex in the Git sibling of the RIA source, which is problematic because all annex-related functionality should go through the storage sibling. clone now sets
remote.origin.annex-ignore to true after cloning from RIA stores to prevent this. #5255
create-sibling invoked cp in a way that was not compatible with macOS. #5269
Due to a bug in older Git versions (before 2.25), calling status with a file under .git/ (e.g., datalad status .git/config) incorrectly reported the file as untracked. A workaround has been added. #5258
Update tests for compatibility with latest git-annex. #5254
Enhancements and new features
- copy-file now aborts if .git/ is in the target directory, adding to its existing .git/ safety checks. #5258
Published by kyleam over 5 years ago
DataLad - 0.13.6 (December 14, 2020)
Fixes
An assortment of fixes for Windows compatibility. #5113 #5119 #5125 #5127 #5136 #5201 #5200 #5214
Adding a subdataset on a system that defaults to using an adjusted branch (i.e. doesn't support symlinks) didn't properly set up the submodule URL if the source dataset was not in an adjusted state. #5127
push failed to push to a remote that did not have an annex-uuid value in the local .git/config. #5148
The default renderer has been improved to avoid a spurious leading space, which led to the displayed path being incorrect in some cases. #5121
siblings showed an uninformative error message when asked to configure an unknown remote. #5146
drop confusingly relayed a suggestion from git annex drop to use --force, an option that does not exist in datalad drop. #5194
create-sibling-github no longer offers user/password authentication because it is no longer supported by GitHub. #5218
The internal command runner's handling of the event loop has been tweaked to hopefully fix issues with running DataLad from IPython. #5106
SSH cleanup wasn't reliably triggered by the ORA special remote on failure, leading to a stall with a particular version of git-annex, 8.20201103. (This is also resolved on git-annex's end as of 8.20201127.) #5151
Enhancements and new features
The credential helper no longer asks the user to repeat tokens or AWS keys. #5219
The new option datalad.locations.sockets controls where DataLad stores SSH sockets, allowing users to more easily work around file system and path length restrictions. #5238
Published by kyleam over 5 years ago
DataLad - 0.13.5 (October 30, 2020)
Fixes
SSH connection handling has been reworked to fix cloning on Windows. A new configuration option, datalad.ssh.multiplex-connections, defaults to false on Windows. #5042
The ORA special remote and post-clone RIA configuration now provide authentication via DataLad's credential mechanism and better handling of HTTP status codes. #5025 #5026
By default, if a git executable is present in the same location as git-annex, DataLad modifies PATH when running git and git-annex so that the bundled git is used. This logic has been tightened to avoid unnecessarily adjusting the path, reducing the cases where the adjustment interferes with the local environment, such as special remotes in a virtual environment being masked by the system-wide variants. #5035
git-annex is now consistently invoked as "git annex" rather than "git-annex" to work around failures on Windows. #5001
push called git annex sync ... on plain git repositories. #5051
save in general doesn't support registering multiple levels of untracked subdatasets, but it can now properly register nested subdatasets when all of the subdataset paths are passed explicitly (e.g., datalad save -d. sub-a sub-a/sub-b). #5049
When called with --sidecar and --explicit, run didn't save the sidecar. #5017
A couple of spots didn't properly quote format fields when combining substrings into a format string. #4957
The default credentials configured for indi-s3 prevented anonymous access. #5045
Enhancements and new features
Messages about suppressed similar results are now rate limited to improve performance when there are many similar results coming through quickly. #5060
create-sibling-github can now be told to replace an existing sibling by passing --existing=replace. #5008
Progress bars now react to changes in the terminal's width (requires tqdm 2.1 or later). #5057
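The rate limiting of "suppressed similar results" messages mentioned above can be sketched generically. The helper below is hypothetical and is not DataLad's code: the class name, the half-second interval, and the message wording are all assumptions chosen for illustration. The clock is injectable so that the behavior can be checked without sleeping.

```python
import time
from typing import Callable, List, Optional


class RateLimitedReporter:
    """Emit at most one message per `interval` seconds (hypothetical helper)."""

    def __init__(self, emit: Callable[[str], None], interval: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._emit = emit
        self._interval = interval
        self._clock = clock
        self._last: Optional[float] = None  # time of the last emitted message

    def report(self, suppressed: int) -> bool:
        """Emit a status message unless one was emitted too recently."""
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            return False  # too soon since the last message, skip this one
        self._last = now
        self._emit(f"[{suppressed} similar messages have been suppressed]")
        return True


# Drive the reporter with a fake clock: five calls within 0.7 seconds.
ticks = iter([0.0, 0.1, 0.2, 0.6, 0.7])
shown: List[str] = []
reporter = RateLimitedReporter(shown.append, interval=0.5,
                               clock=lambda: next(ticks))
results = [reporter.report(n) for n in range(1, 6)]
```

Only the first call and the call at 0.6 seconds produce output; the other three are skipped, which is where the saving comes from when results arrive in quick succession.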
Published by kyleam over 5 years ago
DataLad - 0.13.4 (October 6, 2020)
Fixes
Ephemeral clones mishandled bare repositories. #4899
The post-clone logic for configuring RIA stores didn't consider https:// URLs. #4977
DataLad custom remotes didn't escape newlines in messages sent to git-annex. #4926
The datalad-archives special remote incorrectly treated file names as percent-encoded. #4953
The result handler didn't properly escape "%" when constructing its message template. #4953
In v0.13.0, the tailored rendering for specific subtypes of external command failures (e.g., "out of space" or "remote not available") was unintentionally switched to the default rendering. #4966
Various fixes and updates for the NDA authenticator. #4824
The helper for getting a versioned S3 URL did not support anonymous access or buckets with "." in their name. #4985
Several issues with the handling of S3 credentials and token expiration have been addressed. #4927 #4931 #4952
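The "%" escaping fix listed above concerns a general pitfall with printf-style templates: a literal "%" that ends up inside the template is read as the start of a format directive. The sketch below illustrates the pitfall and the usual remedy; build_template is a hypothetical helper, not a DataLad function.

```python
def build_template(prefix: str, path: str) -> str:
    """Return a %-style template with `path` embedded literally.

    Doubling each "%" in the embedded text keeps it from being
    interpreted as a format directive when the template is filled in.
    """
    safe_path = path.replace("%", "%%")
    return f"{prefix} {safe_path}: %s"


template = build_template("get", "data/100%_done.txt")
message = template % "ok"

# Without the escaping step, the same input makes the template invalid.
try:
    ("get data/100%_done.txt: %s") % "ok"
    unescaped_failed = False
except (ValueError, TypeError):
    unescaped_failed = True
```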
Enhancements and new features
A warning is now given if the detected Git is below v2.13.0 to let users that run into problems know that their Git version is likely the culprit. #4866
A fix to push in v0.13.2 introduced a regression that surfaces when push.default is configured to "matching" and prevents the git-annex branch from being pushed. Note that, as part of the fix, the current branch is now always pushed even when it wouldn't be based on the configured refspec or push.default value. #4896
-
- now allows spelling the empty string value of --since= as ^ for consistency with push. #4683
- compares a revision given to --since= with HEAD rather than the working tree to speed up the operation. #4448
rerun emits more INFO-level log messages. #4764
The archives are handled with p7zip, if available, since DataLad v0.12.0. This implementation now supports .tgz and .tbz2 archives. #4877
Published by kyleam over 5 years ago
DataLad - 0.13.3 (August 28, 2020)
Fixes
Work around a Python bug that led to our asyncio-based command runner intermittently failing to capture the output of commands that exit very quickly. #4835
push displayed an overestimate of the transfer size when multiple files pointed to the same key. #4821
When download-url calls git annex addurl, it catches and reports any failures rather than crashing. A change in v0.12.0 broke this handling in a particular case. #4817
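The transfer-size overestimate noted for push arises whenever sizes are summed per file even though several files share one annex key, since the content behind a key only needs to be transferred once. A minimal sketch of the corrected accounting follows; the (path, key, size) tuple layout and the placeholder key names are assumptions made for illustration, not DataLad's internal data structures.

```python
from typing import Iterable, Tuple


def transfer_size(files: Iterable[Tuple[str, str, int]]) -> int:
    """Sum content sizes, counting each key only once.

    Each item is (path, key, size_in_bytes); this layout is an
    assumption made for the sketch.
    """
    seen = set()
    total = 0
    for _path, key, size in files:
        if key in seen:
            continue  # content for this key is already accounted for
        seen.add(key)
        total += size
    return total


files = [
    ("a.dat", "KEY-1", 100),
    ("copy-of-a.dat", "KEY-1", 100),  # same content, same key
    ("b.dat", "KEY-2", 50),
]
naive = sum(size for _path, _key, size in files)
deduplicated = transfer_size(files)
```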
Enhancements and new features
- The wrapper functions returned by decorators are now given more meaningful names to hopefully make tracebacks easier to digest. #4834
Published by kyleam over 5 years ago
DataLad - 0.13.2 (August 10, 2020)
Deprecations
- The allow_quick parameter of AnnexRepo.file_has_content and AnnexRepo.is_under_annex is now ignored and will be removed in a later release. This parameter was only relevant for git-annex versions before 7.20190912. #4736
Fixes
Updates for compatibility with recent git and git-annex releases. #4746 #4760 #4684
push didn't sync the git-annex branch when --data=nothing was specified. #4786
The datalad.clone.reckless configuration wasn't stored in non-annex datasets, preventing the values from being inherited by annex subdatasets. #4749
Running the post-update hook installed by create-sibling --ui could overwrite web log files from previous runs in the unlikely event that the hook was executed multiple times in the same second. #4745
clone inspected git's standard error in a way that could cause an attribute error. #4775
When cloning a repository whose HEAD points to a branch without commits, clone tries to find a more useful branch to check out. It unwisely considered adjusted branches. #4792
Since v0.12.0, SSHManager.close hasn't closed connections when the ctrl_path argument was explicitly given. #4757
When working in a dataset in which git annex init had not yet been called, the file_has_content and is_under_annex methods of AnnexRepo incorrectly took the "allow quick" code path on file systems that did not support it. #4736
Enhancements
create now assigns version 4 (random) UUIDs instead of version 1 UUIDs that encode the time and hardware address. #4790
The documentation for create now does a better job of describing the interaction between --dataset and PATH. #4763
The format_commit and get_hexsha methods of GitRepo have been sped up. #4807 #4806
A better error message is now shown when the ^ or ^. shortcuts for --dataset do not resolve to a dataset. #4759
A more helpful error message is now shown if a caller tries to download an ftp:// link but does not have requests_ftp installed. #4788
clone now tries harder to get up-to-date availability information after auto-enabling type=git special remotes. #2897
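The difference between the two UUID versions mentioned for create can be seen with Python's standard uuid module. This only illustrates the two formats; how DataLad itself generates dataset IDs is not shown here.

```python
import uuid

# Version 1: built from a timestamp and a node identifier, so the value
# reveals when (and often on which machine) it was generated.
v1 = uuid.uuid1()

# Version 4: built from random bits, so it carries no such information.
v4 = uuid.uuid4()

print("v1:", v1, "version", v1.version)
print("    timestamp field:", v1.time)
print("    node field:     ", hex(v1.node))
print("v4:", v4, "version", v4.version)
```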
Published by kyleam almost 6 years ago
DataLad - 0.13.1 (July 17, 2020)
Fixes
Cloning a subdataset should inherit the parent's datalad.clone.reckless value, but that did not happen when cloning via datalad get rather than datalad install or datalad clone. #4657
The default result renderer crashed when the result did not have a path key. #4666 #4673
datalad push didn't show information about git push errors when the output was not in the format that it expected. #4674
datalad push silently accepted an empty string for --since even though it is an invalid value. #4682
Our JavaScript testing setup on Travis grew stale and has now been updated. (Thanks to Xiao Gui.) #4687
The new class for running Git commands (added in v0.13.0) ignored any changes to the process environment that occurred after instantiation. #4703
Enhancements and new features
datalad push now avoids unnecessary git push dry runs and pushes all refspecs with a single git push call rather than invoking git push for each one. #4692 #4675
The readability of SSH error messages has been improved. #4729
datalad.support.annexrepo avoids calling datalad.utils.get_linux_distribution at import time and caches the result once it is called because, as of Python 3.8, the function uses distro underneath, adding noticeable overhead. #4696
Third-party code should be updated to use get_linux_distribution directly in the unlikely event that the code relied on the import-time call to get_linux_distribution setting the linux_distribution_name, linux_distribution_release, or on_debian_wheezy attributes in datalad.utils.
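The pattern behind this change, deferring an expensive probe until first use and caching its result, can be sketched with functools.lru_cache. The names below are hypothetical stand-ins and the returned values are placeholders; this is not the code of datalad.utils.get_linux_distribution.

```python
import functools

# Counts how often the expensive probe actually runs.
calls = {"count": 0}


def _probe_distribution():
    """Hypothetical stand-in for a slow lookup of distribution details."""
    calls["count"] += 1
    return ("debian", "10")  # placeholder values for illustration


@functools.lru_cache(maxsize=None)
def get_distribution():
    """Run the probe on first call only; later calls reuse the result."""
    return _probe_distribution()


before = calls["count"]      # nothing has run at definition/import time
first = get_distribution()   # triggers the probe
second = get_distribution()  # served from the cache
```

Code that previously read module-level attributes populated at import time would instead call the function when it needs the values, paying the cost at most once.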
Published by kyleam almost 6 years ago
DataLad - 0.13.0 (June 23, 2020)
A handful of new commands, including copy-file, push, and create-sibling-ria, along with various fixes and enhancements
Changes since rc2
- git-annex-remote-ora has been updated for compatibility with annexremote v1.4.2. #4573
- A progress bar fix from rc2 led to unintended messages when not attached to a tty. #4575
- `publish` is no longer marked as deprecated. #4578
- `push` #4620
  - `--force` no longer takes "no-datatransfer" as a value. There is instead a `--data` option that takes the values "anything", "nothing", "auto", "auto-if-wanted". "auto-if-wanted" (the default) results in `--auto` being added to `git annex copy` calls if the sibling was configured to prefer content via `git annex wanted`.
  - The "pushall" and "datatransfer" values of `--force` have been renamed to "all" and "checkdatapresent", respectively.
- The `--since=` option of `push` now takes '^', not an empty string, to mean "the last known state of the matching branch on the sibling". #4617
- `datalad.get.subdataset-source-candidate-NAME` can now include a cost value by appending three digits to `NAME`. #4619
Major refactoring and deprecations
The no_annex parameter of create, which is exposed in the Python API but not the command line, is deprecated and will be removed in a later release. Use the new annex argument instead, flipping the value. Command-line callers that use --no-annex are unaffected. #4321
datalad add, which was deprecated in 0.12.0, has been removed. #4158 #4319
The following GitRepo and AnnexRepo methods have been removed: get_changed_files, get_missing_files, and get_deleted_files. #4169 #4158
The get_branch_commits method of GitRepo and AnnexRepo has been renamed to get_branch_commits_. #3834
The custom commit method of AnnexRepo has been removed, and AnnexRepo.commit now resolves to the parent method, GitRepo.commit. #4168
GitPython's git.repo.base.Repo class is no longer available via the .repo attribute of GitRepo and AnnexRepo. #4172
AnnexRepo.get_corresponding_branch now returns None rather than the current branch name when a managed branch is not checked out. #4274
The special UUID for git-annex web remotes is now available as datalad.consts.WEB_SPECIAL_REMOTE_UUID. It remains accessible as AnnexRepo.WEB_UUID for compatibility, but new code should use consts.WEB_SPECIAL_REMOTE_UUID. #4460
Fixes
Widespread improvements in functionality and test coverage on Windows and crippled file systems in general. #4057 #4245 #4268 #4276 #4291 #4296 #4301 #4303 #4304 #4305 #4306
AnnexRepo.get_size_from_key incorrectly handled file chunks. #4081
create-sibling would too readily clobber existing paths when called with --existing=replace. It now gets confirmation from the user before doing so if running interactively and unconditionally aborts when running non-interactively. #4147
update #4159
- queried the incorrect branch configuration when updating non-annex repositories.
- didn't account for the fact that the local repository can be configured as the upstream "remote" for a branch.
When the caller included --bare as a git init option, create crashed creating the bare repository, which is currently unsupported, rather than aborting with an informative error message. #4065
The logic for automatically propagating the 'origin' remote when cloning a local source could unintentionally trigger a fetch of a non-local remote. #4196
All remaining get_submodules() call sites that relied on the temporary compatibility layer added in v0.12.0 have been updated. #4348
The custom result summary renderer for get, which was visible with --output-format=tailored, displayed incorrect and confusing information in some cases. The custom renderer has been removed entirely. #4471
The documentation for the Python interface of a command listed an incorrect default when the command overrode the value of command parameters such as result_renderer. #4480
Enhancements and new features
The default result renderer learned to elide a chain of results after seeing ten consecutive results that it considers similar, which improves the display of actions that have many results (e.g., saving hundreds of files). #4337
The default result renderer, in addition to the "tailored" result renderer, now triggers the custom summary renderer, if any. #4338
The new command create-sibling-ria provides support for creating a sibling in a RIA store. #4124
DataLad ships with a new special remote, git-annex-remote-ora, for interacting with RIA stores and a new command export-archive-ora for exporting an archive from a local annex object store. #4260 #4203
The new command push provides an alternative interface to publish for pushing a dataset hierarchy to a sibling. #4206 #4581 #4617 #4620
The new command copy-file copies files and associated availability information from one dataset to another. #4430
The command examples have been expanded and improved. #4091 #4314 #4464
The tooling for linking to the DataLad Handbook from DataLad's documentation has been improved. #4046
The --reckless parameter of clone and install learned two new modes:
- "ephemeral", where the .git/annex/ of the cloned repository is symlinked to the local source repository's. #4099
- "shared-{group|all|...}" that can be used to set up datasets for collaborative write access. #4324
-
- learned to handle dataset aliases in RIA stores when given a URL of the form ria+<protocol>://<storelocation>#~<aliasname>. #4459
- now checks datalad.get.subdataset-source-candidate-NAME to see if NAME starts with three digits, which is taken as a "cost". Sources with lower costs will be tried first. #4619
update #4167
- learned to disallow non-fast-forward updates when ff-only is given to the --merge option.
- gained a --follow option that controls how --merge behaves, adding support for merging in the revision that is registered in the parent dataset rather than merging in the configured branch from the sibling.
- now provides a result record for merge events.
create-sibling now supports local paths as targets in addition to SSH URLs. #4187
siblings now
- shows a warning if the caller requests to delete a sibling that does not exist. #4257
- phrases its warning about non-annex repositories in a less alarming way. #4323
The rendering of command errors has been improved. #4157
save now
- displays a message to signal that the working tree is clean, making it more obvious that no results being rendered corresponds to a clean state. #4106
- provides a stronger warning against using --to-git. #4290
diff and save learned about scenarios where they could avoid unnecessary and expensive work. #4526 #4544 #4549
Calling diff without
--recursivebut with a path constraint within a subdataset ("/ ") now traverses into the subdataset, as " /" would, restricting its report to " / ". #4235 New option
datalad.annex.retrycontrols how many times git-annex will retry on a failed transfer. It defaults to 3 and can be set to 0 to restore the previous behavior. #4382wtf now warns when the specified dataset does not exist. #4331
The
reprandstroutput of the dataset and repo classes got a facelift. #4420 #4435 #4439The DataLad Singularity container now comes with p7zip-full.
DataLad emits a log message when the current working directory is resolved to a different location due to a symlink. This is now logged at the DEBUG rather than WARNING level, as it typically does not indicate a problem. #4426
DataLad now lets the caller know that git annex init is scanning for unlocked files, as this operation can be slow in some repositories. #4316
The log_progress helper learned how to set the starting point to a non-zero value and how to update the total of an existing progress bar, two features needed for planned improvements to how some commands display their progress. #4438
The ExternalVersions object, which is used to check versions of Python modules and external tools (e.g., git-annex), gained an add method that enables DataLad extensions and other third-party code to include other programs of interest. #4441
All of the remaining spots that use GitPython have been rewritten without it. Most notably, this includes rewrites of the clone, fetch, and push methods of GitRepo. #4080 #4087 #4170 #4171 #4175 #4172
When GitRepo.commit splits its operation across multiple calls to avoid exceeding the maximum command line length, it now amends to the initial commit rather than creating multiple commits. #4156
GitRepo gained a get_corresponding_branch method (which always returns None), allowing a caller to invoke the method without needing to check if the underlying repo class is GitRepo or AnnexRepo. #4274
A new helper function datalad.core.local.repo.repo_from_path returns a repo class for a specified path. #4273
New AnnexRepo method localsync performs a git annex sync that disables external interaction and is particularly useful for propagating changes on an adjusted branch back to the main branch. #4243
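The note on GitRepo.commit amending to the initial commit can be illustrated with a minimal sketch. It is not DataLad's implementation: the function name plan_chunked_commit and the fixed chunk_size are hypothetical, and the sketch only builds the git command lines rather than running them. It assumes that the first call creates the commit and every later call uses git commit --amend --no-edit so that a single commit results.

```python
from typing import List


def plan_chunked_commit(message: str, files: List[str], chunk_size: int) -> List[List[str]]:
    """Build git command lines that commit `files` in several calls.

    Hypothetical helper: the first call creates the commit, each following
    call amends it, so the history ends up with one commit instead of many.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    commands = []
    for start in range(0, len(files), chunk_size):
        chunk = files[start:start + chunk_size]
        if start == 0:
            # Initial call: create the commit with the caller's message.
            cmd = ["git", "commit", "-m", message, "--"] + chunk
        else:
            # Later calls: fold the next chunk into the commit just created.
            cmd = ["git", "commit", "--amend", "--no-edit", "--"] + chunk
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in plan_chunked_commit("Save files", ["a", "b", "c"], chunk_size=2):
        print(" ".join(cmd))
```

A real implementation would derive the chunk size from the platform's command line limit instead of taking a fixed number.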
Published by kyleam almost 6 years ago
DataLad - Second release candidate for 0.13.0 (May 22, 2020)
Notable changes since rc1
create-sibling-ria produced results with an incorrect "dataset" value. (#4486)
clone did not correctly handle RIA datasets that were not annex repositories. (#4487)
-
- now fails earlier if given an unknown target. (#4517)
- got some optimizations and progress bar improvements. (#4545) (#4546) (#4547) (#4548)
- now only warns about unavailable content when given explicit paths. (#4547)
The documentation for publish has been updated to mark it as deprecated, pointing to push as its replacement. (#4515)
Fixes for progress bar glitches. (#4503) (#4555)
For a description of changes in the 0.13.0 release, see https://github.com/datalad/datalad/releases/tag/0.13.0rc1.
Published by kyleam about 6 years ago
DataLad - 0.12.7 (May 22, 2020)
Fixes
Requesting tailored output (--output=tailored) from a command with a custom result summary renderer produced repeated output. (#4463)
A longstanding regression in argcomplete-based command-line completion for Bash has been fixed. You can enable completion by configuring a Bash startup file to run eval "$(register-python-argcomplete datalad)" or source DataLad's tools/cmdline-completion. The latter should work for Zsh as well. (#4477)
publish didn't prevent git-fetch from recursing into submodules, leading to a failure when the registered submodule was not present locally and the submodule did not have a remote named 'origin'. (#4560)
addurls botched path handling when the file name format started with "./" and the call was made from a subdirectory of the dataset. (#4504)
Double dash options in manpages were unintentionally escaped. (#4332)
The check for HTTP authentication failures crashed in situations where content came in as bytes rather than unicode. (#4543)
A check in AnnexRepo.whereis could lead to a type error. (#4552)
When installing a dataset to obtain a subdataset, get confusingly displayed a message that described the containing dataset as "underneath" the subdataset. (#4456)
A couple of Makefile rules didn't properly quote paths. (#4481)
With DueCredit support enabled (DUECREDIT_ENABLE=1), the query for metadata information could flood the output with warnings if datasets didn't have aggregated metadata. The warnings are now silenced, with the overall failure of a metadata call logged at the debug level. (#4568)
Enhancements and new features
The resource identifier helper learned to recognize URLs with embedded Git transport information, such as gcrypt::https://example.com. (#4529)
When running non-interactively, a more informative error is now signaled when the UI backend, which cannot display a question, is asked to do so. (#4553)
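As an illustration of what recognizing embedded Git transport information involves, here is a minimal sketch that splits a "transport::URL" specification such as gcrypt::https://example.com into its two parts. The function name split_transport and the regular expression are assumptions made for this example; they are not taken from DataLad's resource identifier code.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a transport name followed by "::" and the actual URL.
_TRANSPORT_RE = re.compile(r"^(?P<transport>[A-Za-z][A-Za-z0-9+.-]*)::(?P<url>.+)$")


def split_transport(spec: str) -> Tuple[Optional[str], str]:
    """Return (transport, url); transport is None if no "::" prefix is present."""
    match = _TRANSPORT_RE.match(spec)
    if match is None:
        return None, spec
    return match.group("transport"), match.group("url")


if __name__ == "__main__":
    print(split_transport("gcrypt::https://example.com"))
    print(split_transport("https://example.com"))
```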
Published by kyleam about 6 years ago
DataLad - First release candidate for 0.13.0 (May 05, 2020)
A handful of new commands, including copy-file, push, and create-sibling-ria, along with various fixes and enhancements
Major refactoring and deprecations
The no_annex parameter of create, which is exposed in the Python API but not the command line, is deprecated and will be removed in a later release. Use the new annex argument instead, flipping the value. Command-line callers that use --no-annex are unaffected. (#4321)
datalad add, which was deprecated in 0.12.0, has been removed. (#4158) (#4319)
The following GitRepo and AnnexRepo methods have been removed: get_changed_files, get_missing_files, and get_deleted_files. (#4169) (#4158)
The get_branch_commits method of GitRepo and AnnexRepo has been renamed to get_branch_commits_. (#3834)
The custom commit method of AnnexRepo has been removed, and AnnexRepo.commit now resolves to the parent method, GitRepo.commit. (#4168)
GitPython's git.repo.base.Repo class is no longer available via the .repo attribute of GitRepo and AnnexRepo. (#4172)
AnnexRepo.get_corresponding_branch now returns None rather than the current branch name when a managed branch is not checked out. (#4274)
The special UUID for git-annex web remotes is now available as datalad.consts.WEB_SPECIAL_REMOTE_UUID. It remains accessible as AnnexRepo.WEB_UUID for compatibility, but new code should use consts.WEB_SPECIAL_REMOTE_UUID. (#4460)
Fixes
Widespread improvements in functionality and test coverage on Windows and crippled file systems in general. (#4057) (#4245) (#4268) (#4276) (#4291) (#4296) (#4301) (#4303) (#4304) (#4305) (#4306)
AnnexRepo.get_size_from_key incorrectly handled file chunks. (#4081)
create-sibling would too readily clobber existing paths when called with --existing=replace. It now gets confirmation from the user before doing so if running interactively and unconditionally aborts when running non-interactively. (#4147)
-
- queried the incorrect branch configuration when updating non-annex repositories.
- didn't account for the fact that the local repository can be configured as the upstream "remote" for a branch.
When the caller included --bare as a git init option, create crashed creating the bare repository, which is currently unsupported, rather than aborting with an informative error message. (#4065)
The logic for automatically propagating the 'origin' remote when cloning a local source could unintentionally trigger a fetch of a non-local remote. (#4196)
All remaining get_submodules() call sites that relied on the temporary compatibility layer added in v0.12.0 have been updated. (#4348)
The custom result summary renderer for get, which was visible with --output-format=tailored, displayed incorrect and confusing information in some cases. The custom renderer has been removed entirely. (#4471)
Enhancements and new features
The default result renderer learned to elide a chain of results after seeing ten consecutive results that it considers similar, which improves the display of actions that have many results (e.g., saving hundreds of files). (#4337)
The default result renderer, in addition to "tailored" result renderer, now triggers the custom summary renderer, if any. (#4338)
The new command create-sibling-ria provides support for creating a sibling in a RIA store. (#4124)
DataLad ships with a new special remote, git-annex-remote-ora, for interacting with RIA stores and a new command export-archive-ora for exporting an archive from a local annex object store. (#4260) (#4203)
The new command push provides an alternative interface to publish for pushing a dataset hierarchy to a sibling. (#4206)
The new command copy-file copies files and associated availability information from one dataset to another. (#4430)
The command examples have been expanded and improved. (#4091) (#4314) (#4464)
The tooling for linking to the DataLad Handbook from DataLad's documentation has been improved. (#4046)
The --reckless parameter of clone and install learned two new modes:
- "ephemeral", where the .git/annex/ of the cloned repository is symlinked to the local source repository's. (#4099)
- "shared-{group|all|...}" that can be used to set up datasets for collaborative write access. (#4324)
clone learned to handle dataset aliases in RIA stores when given a URL of the form ria+<protocol>://<storelocation>#~<aliasname>. (#4459)
-
- learned to disallow non-fast-forward updates when ff-only is given to the --merge option.
- gained a --follow option that controls how --merge behaves, adding support for merging in the revision that is registered in the parent dataset rather than merging in the configured branch from the sibling.
- now provides a result record for merge events.
create-sibling now supports local paths as targets in addition to SSH URLs. (#4187)
siblings now
- shows a warning if the caller requests to delete a sibling that does not exist. (#4257)
- phrases its warning about non-annex repositories in a less alarming way. (#4323)
The rendering of command errors has been improved. (#4157)
save now
- displays a message to signal that the working tree is clean, making it more obvious that no results being rendered corresponds to a clean state. (#4106)
- provides a stronger warning against using --to-git. (#4290)
Calling diff without --recursive but with a path constraint within a subdataset ("<subdataset>/<path>") now traverses into the subdataset, as "<subdataset>/" would, restricting its report to "<subdataset>/<path>". (#4235)
New option datalad.annex.retry controls how many times git-annex will retry on a failed transfer. It defaults to 3 and can be set to 0 to restore the previous behavior. (#4382)
wtf now warns when the specified dataset does not exist. (#4331)
The repr and str output of the dataset and repo classes got a facelift. (#4420) (#4435) (#4439)
The DataLad Singularity container now comes with p7zip-full.
DataLad shows a log message when the current working directory is resolved to a different location due to a symlink. This is now logged at the DEBUG rather than WARNING level, as it typically does not indicate a problem. (#4426)
DataLad now lets the caller know that git annex init is scanning for unlocked files, as this operation can be slow in some repositories. (#4316)
The log_progress helper learned how to set the starting point to a non-zero value and how to update the total of an existing progress bar, two features needed for planned improvements to how some commands display their progress. (#4438)
The ExternalVersions object, which is used to check versions of Python modules and external tools (e.g., git-annex), gained an add method that enables DataLad extensions and other third-party code to include other programs of interest. (#4441)
All of the remaining spots that use GitPython have been rewritten without it. Most notably, this includes rewrites of the clone, fetch, and push methods of GitRepo. (#4080) (#4087) (#4170) (#4171) (#4175) (#4172)
When GitRepo.commit splits its operation across multiple calls to avoid exceeding the maximum command line length, it now amends to the initial commit rather than creating multiple commits. (#4156)
GitRepo gained a get_corresponding_branch method (which always returns None), allowing a caller to invoke the method without needing to check if the underlying repo class is GitRepo or AnnexRepo. (#4274)
A new helper function datalad.core.local.repo.repo_from_path returns a repo class for a specified path. (#4273)
New AnnexRepo method localsync performs a git annex sync that disables external interaction and is particularly useful for propagating changes on an adjusted branch back to the main branch. (#4243)
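The elision behavior of the default result renderer described in this release (#4337) can be sketched as follows. This is an illustration only: the function name render_with_elision, the output format, and the notion of "similar" used here (same action, status, and type) are assumptions, not DataLad's actual criteria.

```python
from typing import Dict, Iterable, Iterator


def render_with_elision(results: Iterable[Dict], max_similar: int = 10) -> Iterator[str]:
    """Yield one line per result, suppressing long runs of similar results.

    Assumption for this sketch: two results are "similar" when their action,
    status, and type fields match.
    """
    last_key = None
    run_length = 0
    suppressed = 0
    for res in results:
        key = (res.get("action"), res.get("status"), res.get("type"))
        if key == last_key:
            run_length += 1
        else:
            # A new kind of result ends the current run; report what was hidden.
            if suppressed:
                yield f"  [{suppressed} similar messages have been suppressed]"
            last_key, run_length, suppressed = key, 1, 0
        if run_length <= max_similar:
            yield f"{res.get('action')}({res.get('status')}): {res.get('path')}"
        else:
            suppressed += 1
    if suppressed:
        yield f"  [{suppressed} similar messages have been suppressed]"


if __name__ == "__main__":
    records = [
        {"action": "add", "status": "ok", "type": "file", "path": f"file{i}.dat"}
        for i in range(12)
    ]
    records.append({"action": "save", "status": "ok", "type": "dataset", "path": "."})
    for line in render_with_elision(records):
        print(line)
```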
Published by kyleam about 6 years ago
DataLad - 0.12.6 (April 23, 2020)
Major refactoring and deprecations
- The value of datalad.support.annexrepo.N_AUTO_JOBS is no longer considered. The variable will be removed in a later release. (#4409)
Fixes
Starting with v0.12.0, datalad save recorded the current branch of a parent dataset as the branch value in the .gitmodules entry for a subdataset. This behavior is problematic for a few reasons and has been reverted. (#4375)
The default for the --jobs option, "auto", instructed DataLad to pass a value to git-annex's --jobs equal to min(8, max(3, <number of CPUs>)), which could lead to issues due to the large number of child processes spawned and file descriptors opened. To avoid this behavior, --jobs=auto now results in git-annex being called with --jobs=1 by default. Configure the new option datalad.runtime.max-annex-jobs to control the maximum value that will be considered when --jobs='auto'. (#4409)
Various commands have been adjusted to better handle the case where a remote's HEAD ref points to an unborn branch. (#4370)
search
- learned to use the query as a regular expression that restricts the keys that are shown for --show-keys short. (#4354)
- gives a more helpful message when query is an invalid regular expression. (#4398)
The code for parsing Git configuration did not follow Git's behavior of accepting a key with no value as shorthand for key=true. (#4421)
AnnexRepo.info needed a compatibility update for a change in how git-annex reports file names. (#4431)
create-sibling-github did not gracefully handle a token that did not have the necessary permissions. (#4400)
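To make the --jobs=auto change in this release (#4409) concrete, the sketch below compares the old formula with the new default. The old formula is quoted from the note above. How datalad.runtime.max-annex-jobs interacts with it is an assumption here: the sketch treats the option as a cap on the old value, and the function names are hypothetical.

```python
def old_auto_jobs(n_cpus: int) -> int:
    """Old behavior: between 3 and 8 jobs, depending on the CPU count."""
    return min(8, max(3, n_cpus))


def new_auto_jobs(n_cpus: int, max_annex_jobs: int = 1) -> int:
    """New behavior as sketched here: the configured maximum (default 1)
    caps the value. Treating the option as a cap on the old formula is an
    assumption of this example."""
    return min(max_annex_jobs, old_auto_jobs(n_cpus))


if __name__ == "__main__":
    for cpus in (1, 4, 64):
        print(cpus, old_auto_jobs(cpus), new_auto_jobs(cpus), new_auto_jobs(cpus, 4))
```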
Enhancements and new features
search learned to use the query as a regular expression that restricts the keys that are shown for --show-keys short. (#4354)
datalad <subcommand> learned to point to the datalad-container extension when a subcommand from that extension is given but the extension is not installed. (#4400) (#4174)
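The Git configuration parsing fix in this release (#4421) concerns keys that appear without a value, which Git treats as shorthand for key=true. A minimal sketch of such parsing is given below. It assumes input in the format printed by git config --list -z, where each record ends with a NUL byte and the key is separated from its value by a newline; the function name parse_config_z and the choice to represent the shorthand as Python True are assumptions of this example, not DataLad's code.

```python
from typing import Dict, Union


def parse_config_z(output: str) -> Dict[str, Union[str, bool]]:
    """Parse NUL-separated "key\\nvalue" records into a dictionary.

    A record without a newline has no value; following Git, it is read as
    a boolean true. Later records for the same key overwrite earlier ones,
    a simplification compared to Git's multi-valued keys.
    """
    config: Dict[str, Union[str, bool]] = {}
    for record in output.split("\0"):
        if not record:
            continue  # skip the empty piece after the final NUL
        key, sep, value = record.partition("\n")
        config[key] = value if sep else True
    return config


if __name__ == "__main__":
    sample = "core.bare\nfalse\0annex.private\0user.name\nJane Doe\0"
    print(parse_config_z(sample))
```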
Published by kyleam about 6 years ago
DataLad - A small step for datalad ...
Fix some bugs and make the world an even better place.
- Our log_progress helper mishandled the initial display and step of the progress bar. (#4326)
AnnexRepo.get_content_annexinfo is designed to accept init=None, but passing that led to an error. (#4330)
Update a regular expression to handle an output change in Git v2.26.0. (#4328)
We now set LC_MESSAGES to 'C' while running git to avoid failures when parsing output that is marked for translation. (#4342)
The helper for decoding JSON streams loaded the last line of input without decoding it if the line didn't end with a new line, a regression introduced in the 0.12.0 release. (#4361)
The clone command failed to git-annex-init a fresh clone whenever it considered adding the origin of the origin as a remote. (#4367)
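The JSON stream fix above (#4361) is about treating the last line like any other line, whether or not it ends with a newline. A minimal sketch of a line-based decoder with that property follows; the function name decode_json_stream is hypothetical and the code is not DataLad's helper.

```python
import json
from typing import Any, List


def decode_json_stream(text: str) -> List[Any]:
    """Decode one JSON document per line.

    str.splitlines() returns the final line even when it lacks a trailing
    newline, so every line goes through json.loads().
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        records.append(json.loads(line))
    return records


if __name__ == "__main__":
    print(decode_json_stream('{"file": "a"}\n{"file": "b"}'))
```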
Published by bpoldrack about 6 years ago
DataLad - 0.12.4 (Mar 19, 2020) -- Windows?!
The main purpose of this release is to have one on PyPI that has no associated wheel to enable a working installation on Windows (#4315).
Fixes
- The description of the log.outputs config switch did not keep up with code changes and incorrectly stated that the output would be logged at the DEBUG level; logging actually happens at a lower level. (#4317)
Published by mih about 6 years ago
DataLad - 0.12.3 (March 16, 2020)
Updates for compatibility with the latest git-annex, along with a few miscellaneous fixes
Major refactoring and deprecations
- All spots that raised a NoDatasetArgumentFound exception now raise a NoDatasetFound exception to better reflect the situation: it is the dataset rather than the argument that is not found. For compatibility, the latter inherits from the former, but new code should prefer the latter. (#4285)
Fixes
Updates for compatibility with git-annex version 8.20200226. (#4214)
datalad export-to-figshare failed to export if the generated title was fewer than three characters. It now queries the caller for the title and guards against titles that are too short. (#4140)
Authentication was requested multiple times when git-annex launched parallel downloads from the datalad special remote. (#4308)
At verbose logging levels, DataLad requests that git-annex display debugging information too. Work around a bug in git-annex that prevented that from happening. (#4212)
The internal command runner looked in the wrong place for some configuration variables, including datalad.log.outputs, resulting in the default value always being used. (#4194)
publish failed when trying to publish to a git-lfs special remote for the first time. (#4200)
AnnexRepo.set_remote_url is supposed to establish shared SSH connections but failed to do so. (#4262)
Enhancements and new features
The message provided when a command cannot determine what dataset to operate on has been improved. (#4285)
The "aws-s3" authentication type now allows specifying the host through "aws-s3_host", which was needed to work around an authorization error due to a longstanding upstream bug. (#4239)
The xmp metadata extractor now recognizes ".wav" files.
Published by kyleam about 6 years ago
DataLad - 0.12.2 (Jan 28, 2020) -- Smoothen the ride
Mostly a bugfix release with various robustifications, but also makes the first step towards versioned dataset installation requests.
Major refactoring and deprecations
- The minimum required version for GitPython is now 2.1.12. #4070
Fixes
The class for handling configuration values, ConfigManager, inappropriately considered the current working directory's dataset, if any, for both reading and writing when instantiated with dataset=None. This misbehavior is fairly inaccessible through typical use of DataLad. It affects datalad.cfg, the top-level configuration instance that should not consider repository-specific values. It also affects Python users that call Dataset with a path that does not yet exist and persists until that dataset is created. #4078
update saved the dataset when called with --merge, which is unnecessary and risks committing unrelated changes. #3996
Confusing and irrelevant information about Python defaults has been dropped from the command-line help. #4002
The logic for automatically propagating the 'origin' remote when cloning a local source didn't properly account for relative paths. #4045
Various fixes to file name handling and quoting on Windows. #4049 #4050
When cloning failed, error lines were not bubbled up to the user in some scenarios. #4060
Enhancements and new features
clone (and thus install)
- now propagates the reckless mode from the superdataset when cloning a dataset into it. #4037
- gained support for ria+<protocol>:// URLs that point to RIA stores. #4022
- learned to read "@version" from ria+ URLs and install that version of a dataset #4036 and to apply URL rewrites configured through Git's url.*.insteadOf mechanism #4064.
- now copies datalad.get.subdataset-source-candidate-<name> options configured within the superdataset into the subdataset. This is particularly useful for RIA data stores. #4073
Archives are now (optionally) handled with 7-Zip instead of patool. 7-Zip will be used by default, but patool will be used on non-Windows systems if the datalad.runtime.use-patool option is set or the 7z executable is not found. #4041
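The selection rule for the archive backend stated above can be written down as a small decision function. This is a sketch of the rule as described in the note, not DataLad's code: the function names are hypothetical, and the use_patool argument stands in for the datalad.runtime.use-patool option.

```python
import shutil
import sys


def choose_archiver(use_patool: bool, on_windows: bool, have_7z: bool) -> str:
    """Return "patool" or "7z" following the rule in the release note.

    patool is used only on non-Windows systems, and only if it was
    requested or no 7z executable is available.
    """
    if not on_windows and (use_patool or not have_7z):
        return "patool"
    return "7z"


def choose_archiver_here(use_patool: bool = False) -> str:
    """Apply the rule to the current platform and PATH."""
    return choose_archiver(
        use_patool=use_patool,
        on_windows=sys.platform.startswith("win"),
        have_7z=shutil.which("7z") is not None,
    )


if __name__ == "__main__":
    print(choose_archiver_here())
```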
Published by mih over 6 years ago
DataLad - Small bump after big bang
0.12.1 (Jan 15, 2020) -- Small bump after big bang
Fix some fallout after major release.
Fixes
Revert incorrect relative path adjustment to URLs in clone. #3538
Various small fixes to internal helpers and test to run on Windows #2566 #2534
Published by mih over 6 years ago
DataLad - 0.12.0 (Jan 11, 2020) -- Krakatoa
This release is the result of more than a year of development that includes
fixes for a large number of issues, yielding more robust behavior across a
wider range of use cases, and introduces major changes in API and behavior. It
is the first release for which extensive user documentation is available in a
dedicated DataLad Handbook. Python 3 (3.5 and later) is now the
only supported Python flavor.
Major changes 0.12 vs 0.11
save fully replaces add (which is obsolete now, and will be removed in a future release).
A new Git-annex aware status command enables detailed inspection of dataset hierarchies. The previously available diff command has been adjusted to match status in argument semantics and behavior.
The ability to configure dataset procedures before and after the execution of particular commands has been replaced by a flexible "hook" mechanism that is able to run arbitrary DataLad commands whenever command results are detected that match a specification.
Support of the Windows platform has been improved substantially. While performance and feature coverage on Windows still falls behind Unix-like systems, typical data consumer use cases, and standard dataset operations, such as create and save, are now working. Basic support for data provenance capture via run is also functional.
Support for Git-annex direct mode repositories has been removed, following the end of support in Git-annex itself.
The semantics of relative paths in command line arguments have changed. Previously, a call datalad save --dataset /tmp/myds some/relpath would have been interpreted as saving a file at /tmp/myds/some/relpath into dataset /tmp/myds. This has changed to saving $PWD/some/relpath into dataset /tmp/myds. More generally, relative paths are now always treated as relative to the current working directory, except for path arguments of Dataset class instance methods of the Python API. The resulting partial duplication of path specifications between path and dataset arguments is mitigated by the introduction of two special symbols that can be given as dataset argument: ^ and ^., which identify the topmost superdataset and the closest dataset that contains the working directory, respectively.
The concept of a "core API" has been introduced. Commands situated in the module datalad.core (such as create, save, run, status, and diff) receive additional scrutiny regarding API and implementation, and are meant to provide longer-term stability. Application developers are encouraged to preferentially build on these commands.
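The change in relative path semantics described above can be illustrated with plain path arithmetic. The two helper functions below are hypothetical and only mirror the example from the note; they do not call DataLad.

```python
from pathlib import PurePosixPath


def resolve_old(dataset: str, relpath: str, cwd: str) -> PurePosixPath:
    """Old interpretation: a relative path is relative to the dataset root."""
    return PurePosixPath(dataset) / relpath


def resolve_new(dataset: str, relpath: str, cwd: str) -> PurePosixPath:
    """New interpretation: a relative path is relative to the current
    working directory, regardless of the --dataset argument."""
    return PurePosixPath(cwd) / relpath


if __name__ == "__main__":
    # Mirrors: datalad save --dataset /tmp/myds some/relpath, run from /home/me
    print(resolve_old("/tmp/myds", "some/relpath", cwd="/home/me"))
    print(resolve_new("/tmp/myds", "some/relpath", cwd="/home/me"))
```

Under the new rule, the file only ends up inside /tmp/myds if the working directory is itself inside that dataset.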
Major refactoring and deprecations since 0.12.0rc6
clone has been incorporated into the growing core API. The public --alternative-source parameter has been removed, and a clone_dataset function with multi-source capabilities is provided instead. The --reckless parameter can now take literal mode labels instead of just being a binary flag, but backwards compatibility is maintained.
The get_file_content method of GitRepo was no longer used internally or in any known DataLad extensions and has been removed. #3812
The function get_dataset_root has been replaced by rev_get_dataset_root. rev_get_dataset_root remains as a compatibility alias and will be removed in a later release. #3815
The add_sibling module, marked obsolete in v0.6.0, has been removed. #3871
mock is no longer declared as an external dependency because we can rely on it being in the standard library now that our minimum required Python version is 3.5. #3860
download-url now requires that directories be indicated with a trailing slash rather than interpreting a path as directory when it doesn't exist. This avoids confusion that can result from typos and makes it possible to support directory targets that do not exist. #3854
The dataset_only argument of the ConfigManager class is deprecated. Use source="dataset" instead. #3907
The --proc-pre and --proc-post options have been removed, and configuration values for datalad.COMMAND.proc-pre and datalad.COMMAND.proc-post are no longer honored. The new result hook mechanism provides an alternative for proc-post procedures. #3963
Fixes since 0.12.0rc6
publish crashed when called with a detached HEAD. It now aborts with an informative message. #3804
Since 0.12.0rc6 the call to update in siblings resulted in a spurious warning. #3877
siblings crashed if it encountered an annex repository that was marked as dead. #3892
The update of rerun in v0.12.0rc3 for the rewritten diff command didn't account for a change in the output of diff, leading to rerun --report unintentionally including unchanged files in its diff values. #3873
In 0.12.0rc5 download-url was updated to follow the new path handling logic, but its calls to AnnexRepo weren't properly adjusted, resulting in incorrect path handling when called from a dataset subdirectory. #3850
download-url called git annex addurl in a way that failed to register a URL when its header didn't report the content size. #3911
With Git v2.24.0, saving new subdatasets failed due to a bug in that Git release. #3904
With DataLad configured to stop on failure (e.g., specifying --on-failure=stop from the command line), a failing result record was not rendered. #3863
Installing a subdataset yielded an "ok" status in cases where the repository was not yet in its final state, making it ineffective for a caller to operate on the repository in response to the result. #3906
The internal helper for converting git-annex's JSON output did not relay information from the "error-messages" field. #3931
[run-procedure] reported relative paths that were confusingly not relative to the current directory in some cases. It now always reports absolute paths. #3959
diff inappropriately reported files as deleted in some cases when to was a value other than None. #3999
An assortment of fixes for Windows compatibility. #3971 #3974 #3975 #3976 #3979
Subdatasets installed from a source given by relative path will now have this relative path used as 'url' in their .gitmodules record, instead of an absolute path generated by Git. #3538
clone will now correctly interpret '~/...' paths as absolute path specifications. #3958
Enhancements and new features since 0.12.0rc6
By default, datasets cloned from local source paths will now get a configured remote for any recursively discoverable 'origin' sibling that is also available from a local path in order to maximize automatic file availability across local annexes. #3926
The new result hooks mechanism allows callers to specify, via local Git configuration values, DataLad command calls that will be triggered in response to matching result records (i.e., what you see when you call a command with -f json_pp). #3903
The command interface classes learned to use a new _examples_ attribute to render documentation examples for both the Python and command-line API. #3821
Candidate URLs for cloning a submodule can now be generated based on configured templates that have access to various properties of the submodule, including its dataset ID. #3828
DataLad's check that the user's Git identity is configured has been sped up and now considers the appropriate environment variables as well. #3807
The tag method of GitRepo can now tag revisions other than HEAD and accepts a list of arbitrary git tag options. #3787
When get clones a subdataset and the subdataset's HEAD differs from the commit that is registered in the parent, the active branch of the subdataset is moved to the registered commit if the registered commit is an ancestor of the subdataset's HEAD commit. This handling has been moved to a more central location within GitRepo, and now applies to any update_submodule(..., init=True) call. #3831
The output of datalad -h has been reformatted to improve readability. #3862
unlock has been sped up. #3880
run-procedure learned to provide and render more information about discovered procedures, including whether the procedure is overridden by another procedure with the same base name. #3960
save now #3817
- records the active branch in the superdataset when registering a new subdataset.
- calls git annex sync when saving a dataset on an adjusted branch so that the changes are brought into the mainline branch.
subdatasets now aborts when its dataset argument points to a non-existent dataset. #3940
wtf now
- reports the dataset ID if the current working directory is visiting a dataset. #3888
- outputs entries deterministically. #3927
The ConfigManager class
- learned to exclude .datalad/config as a source of configuration values, restricting the sources to standard Git configuration files, when called with source="local". #3907
- accepts a value of "override" for its where argument to allow Python callers to more conveniently override configuration. #3970
Commands now accept a dataset value of "^." as shorthand for "the dataset to which the current directory belongs". #3242
Published by mih over 6 years ago
DataLad - the revolution is over
With the replacement of the save command implementation with rev-save
the revolution effort is now over, and the set of key commands for
local dataset operations (create, run, save, status, diff) is
now complete. This new core API is available from datalad.core.local
(and also via datalad.api, as any other command).
Major refactoring and deprecations
- The add command is now deprecated. It will be removed in a future release.
Fixes
Remove hard-coded dependencies on POSIX path conventions in SSH support code (#3400)
Emit an add result when adding a new subdataset during save (#3398)
SSH file transfer now actually opens a shared connection, if none exists yet (#3403)
Enhancements and new features
SSHConnection now offers methods for file upload and download (get(), put()). The previous copy() method only supported upload and was discontinued (#3401)
Published by mih about 7 years ago
DataLad - 0.11.2 (Feb 07, 2019) -- live-long-and-prosper
A variety of bugfixes and enhancements
Major refactoring and deprecations
- All extracted metadata is now placed under git-annex by default. Previously files smaller than 20 kb were stored in git. (#3109)
- The function datalad.cmd.get_runner has been removed. (#3104)
Fixes
- Improved handling of long commands:
  - The code that inspected SC_ARG_MAX didn't check that the reported value was a sensible, positive number. (#3025)
  - More commands that invoke git and git-annex with file arguments learned to split up the command calls when it is likely that the command would fail due to exceeding the maximum supported length. (#3138)
- The setup_yoda_dataset procedure created a malformed .gitattributes line. (#3057)
- download-url unnecessarily tried to infer the dataset when --no-save was given. (#3029)
- rerun aborted too late and with a confusing message when a ref specified via --onto didn't exist. (#3019)
- run:
  - run didn't preserve the current directory prefix ("./") on inputs and outputs, which is problematic if the caller relies on this representation when formatting the command. (#3037)
  - Fixed a number of unicode py2-compatibility issues. (#3035) (#3046)
  - To proceed with a failed command, the user was confusingly instructed to use save instead of add even though run uses add underneath. (#3080)
- Fixed a case where the helper class for checking external modules incorrectly reported a module as unknown. (#3051)
- add-archive-content mishandled the archive path when the leading path contained a symlink. (#3058)
- Following denied access, the credential code failed to consider a scenario, leading to a type error rather than an appropriate error message. (#3091)
- Some tests failed when executed from a git worktree checkout of the source repository. (#3129)
- During metadata extraction, batched annex processes weren't properly terminated, leading to issues on Windows. (#3137)
- add incorrectly handled an "invalid repository" exception when trying to add a submodule. (#3141)
- Pass GIT_SSH_VARIANT=ssh to git processes to be able to specify alternative ports in SSH urls
Enhancements and new features
- search learned to suggest closely matching keys if there are no hits. (#3089)
- create-sibling
- Interface classes can now override the default renderer for summarizing results. (#3061)
- run:
  - --input and --output can now be shortened to -i and -o. (#3066)
  - Placeholders such as "{inputs}" are now expanded in the command that is shown in the commit message subject. (#3065)
  - interface.run.run_command gained an extra_inputs argument so that wrappers like datalad-container can specify additional inputs that aren't considered when formatting the command string. (#3038)
  - "--" can now be used to separate options for run and those for the command in ambiguous cases. (#3119)
- The utilities create_tree and ok_file_has_content now support ".gz" files. (#3049)
- The Singularity container for 0.11.1 now uses nd_freeze to make its builds reproducible.
- A publications page has been added to the documentation. (#3099)
- GitRepo.set_gitattributes now accepts a mode argument that controls whether the .gitattributes file is appended to (default) or overwritten. (#3115)
- datalad --help now avoids using man so that the list of subcommands is shown. (#3124)
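The placeholder expansion mentioned for run in the list above can be sketched with Python's string formatting. This is a simplified illustration, not the code behind datalad run: the function name expand_command is hypothetical, only the "{inputs}" placeholder named in the note and an assumed "{outputs}" counterpart are handled, and quoting each path with shlex.quote is an assumption of this example.

```python
import shlex
from typing import Sequence


def expand_command(template: str, inputs: Sequence[str], outputs: Sequence[str]) -> str:
    """Replace "{inputs}" and "{outputs}" with space-separated, quoted paths."""
    return template.format(
        inputs=" ".join(shlex.quote(p) for p in inputs),
        outputs=" ".join(shlex.quote(p) for p in outputs),
    )


if __name__ == "__main__":
    print(expand_command("cat {inputs} > {outputs}", ["a.txt", "b c.txt"], ["out.txt"]))
```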
Published by yarikoptic over 7 years ago
DataLad - 0.11.1 (Nov 26, 2018) -- v7-better-than-v6
A rushed-out bugfix release to stay fully compatible with recent git-annex, which introduced v7 to replace v6.
Fixes
- install: be able to install recursively into a dataset (#2982)
- save: be able to commit/save changes whenever files potentially could have swapped their storage between git and annex (#1651) (#2752) (#3009)
- aggregate-metadata:
  - the dataset itself is now not "aggregated" if specific paths are provided for aggregation (#3002). That resolves the issue of -r invocation aggregating all subdatasets of the specified dataset as well
  - also compare/verify the actual content checksum of aggregated metadata while considering subdataset metadata for re-aggregation (#3007)
- annex commands are now chunked assuming 50% "safety margin" on the maximal command line length. Should resolve crashes while operating on too many files at once (#3001)
- run sidecar config processing (#2991)
- no double trailing period in docs (#2984)
- correct identification of the repository with symlinks in the paths in the tests (#2972)
- re-evaluation of dataset properties in case of dataset changes (#2946)
- text2git procedure to use ds.repo.set_gitattributes (#2974) (#2954)
- Switch to use plain os.getcwd() if inconsistency with env var $PWD is detected (#2914)
- Make sure that credential defined in env var takes precedence (#2960) (#2950)
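The chunking of annex commands mentioned in the fixes above can be sketched as follows. This is an illustration under stated assumptions rather than DataLad's implementation: chunk_args is a hypothetical helper, the maximal command line length is passed in explicitly instead of being queried from the platform, and each argument is counted as its length plus one separator.

```python
def chunk_args(files, base_cmd, max_len, margin=0.5):
    """Yield lists of files so each command line stays under margin * max_len."""
    # Budget left for file arguments after the fixed part of the command.
    budget = int(max_len * margin) - sum(len(a) + 1 for a in base_cmd)
    chunk, used = [], 0
    for f in files:
        cost = len(f) + 1  # +1 for the separating space
        if chunk and used + cost > budget:
            yield chunk
            chunk, used = [], 0
        # A single over-long argument still gets a chunk of its own.
        chunk.append(f)
        used += cost
    if chunk:
        yield chunk


files = [f"file{i:03d}.dat" for i in range(10)]  # each name is 11 characters
chunks = list(chunk_args(files, ["git", "annex", "add"], max_len=120))
```

With a limit of 120 characters and a 50% margin, 46 characters remain for file names after the base command, so the ten files are split into groups of at most three.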
Enhancements and new features
- shub://datalad/datalad:git-annex-dev provides a Debian buster Singularity image with build environment for git-annex. tools/bisect-git-annex provides a helper for running git bisect on git-annex using that Singularity container (#2995)
- Added .zenodo.json for better integration with Zenodo for citation
- run-procedure now provides names and help messages with a custom renderer for (#2993)
- Documentation: point to datalad-revolution extension (prototype of the greater DataLad future)
- run
  - support injecting of a detached command (#2937)
- annex metadata extractor now extracts annex.key metadata record. Should allow now to identify uses of specific files etc (#2952)
- Test that we can install from http://datasets.datalad.org
- Proper rendering of CommandError (e.g. in case of "out of space" error) (#2958)
Published by yarikoptic over 7 years ago
DataLad - 0.11.0 (Oct 23, 2018) -- Soon-to-be-perfect
git-annex 6.20180913 (or later) is now required - provides a number of fixes for v6 mode operations etc.
Major refactoring and deprecations
- datalad.consts.LOCAL_CENTRAL_PATH constant was deprecated in favor of datalad.locations.default-dataset configuration variable (#2835)
Minor refactoring
- "notneeded" messages are no longer reported by default results renderer
- run no longer shows commit instructions upon command failure when explicit is true and no outputs are specified (#2922)
- get_git_dir moved into GitRepo (#2886)
- _gitpy_custom_call removed from GitRepo (#2894)
- GitRepo.get_merge_base argument is now called commitishes instead of treeishes (#2903)
Fixes
- update should not leave the dataset in non-clean state (#2858) and some other enhancements (#2859)
- Fixed chunking of the long command lines to account for decorators and other arguments (#2864)
- Progress bar should not crash the process on some missing progress information (#2891)
- Default value for jobs set to be "auto" (not None) to take advantage of possible parallel get if in -g mode (#2861)
- wtf must not crash if git-annex is not installed etc (#2865), (#2918), (#2917)
- Fixed paths (with spaces etc) handling while reporting annex error output (#2892), (#2893)
- __del__ should not access .repo but ._repo to avoid attempts for reinstantiation etc (#2901)
- Fix up submodule .git right in GitRepo.add_submodule to avoid added submodules being non git-annex friendly (#2909), (#2904)
- run-procedure
  - now will provide dataset into the procedure if called within dataset
  - will not crash if procedure is an executable without .py or .sh suffixes
- Use centralized .gitattributes handling while setting annex backend (#2912)
- GlobbedPaths.expand(..., full=True) incorrectly returned relative paths when called more than once (#2921)
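The __del__ fix above (use ._repo rather than .repo) concerns objects whose repository is created lazily by a property. A toy sketch of why the distinction matters, with a hypothetical LazyHolder class standing in for whichever DataLad class was affected:

```python
class LazyHolder:
    """Toy object with a lazily created, cached repo attribute."""

    instantiations = 0  # counts how often a repo object was created

    def __init__(self, path):
        self.path = path
        self._repo = None

    @property
    def repo(self):
        # Accessing .repo creates the repository object on first use.
        if self._repo is None:
            LazyHolder.instantiations += 1
            self._repo = object()
        return self._repo

    def __del__(self):
        # Inspect only the cached attribute: going through .repo here would
        # instantiate a repository just to tear it down again.
        if getattr(self, "_repo", None) is not None:
            self._repo = None


holder = LazyHolder("/tmp/demo")
holder.__del__()  # call the finalizer directly for a deterministic demo
```

Because the finalizer never touches the property, cleaning up an object whose repository was never requested creates nothing.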
Enhancements and new features
- Report progress on clone when installing from "smart" git servers (#2876)
- Stale/unused sth_like_file_has_content was removed (#2860)
- Enhancements to search to operate on "improved" metadata layouts (#2878)
- Output of git annex init operation is now logged (#2881)
- New
- run-procedure
  - procedures can now recursively be discovered in subdatasets as well. The uppermost has highest priority
  - Procedures in user and system locations now take precedence over those in datasets.
Published by yarikoptic over 7 years ago
DataLad - 0.10.3.1: Nothing is perfect - rushed bugfix with correct __version__
Published by yarikoptic over 7 years ago
DataLad - 0.10.3: Almost-perfect
This is largely a bugfix release which addressed many (but not yet all)
issues of working with git-annex direct and version 6 modes, and operation
on Windows in general. Among enhancements you will see the
support of public S3 buckets (even with periods in their names),
ability to configure new providers interactively, and improved egrep
search backend.
Although this release does not require it, we recommend making sure
that you are using a recent git-annex, since it has also received a variety
of fixes and enhancements in the past months.
Fixes
- Parsing of combined short options has been broken since DataLad v0.10.0. (#2710)
- The datalad save instructions shown by datalad run for a command with a non-zero exit were incorrectly formatted. (#2692)
- Decompression of zip files (e.g., through datalad add-archive-content) failed on Python 3. (#2702)
- Windows:
- Internal git fetch calls have been updated to work around a GitPython BadName issue. (#2712), (#2794)
- The progress bar for annex file transferring was unable to handle an empty file. (#2717)
- datalad add-readme halted when no aggregated metadata was found rather than displaying a warning. (#2731)
- datalad rerun failed if --onto was specified and the history contained no run commits. (#2761)
- Processing of a command's results failed on a result record with a missing value (e.g., absent field or subfield in metadata). Now the missing value is rendered as "N/A". (#2725)
- A couple of documentation links in the "Delineation from related solutions" were misformatted. (#2773)
- With the latest git-annex, several known V6 failures are no longer an issue. (#2777)
- In direct mode, commit changes would often commit annexed content as regular Git files. A new approach fixes this and resolves a good number of known failures. (#2770)
- The reporting of command results failed if the current working directory was removed (e.g., after an unsuccessful install). (#2788)
- When installing into an existing empty directory, datalad install removed the directory after a failed clone. (#2788)
- datalad run incorrectly handled inputs and outputs for paths with spaces and other characters that require shell escaping. (#2798)
- Globbing inputs and outputs for datalad run didn't work correctly if a subdataset wasn't installed. (#2796)
- Minor (in)compatibility with git 2.19 - (no) trailing period in an error message now. (#2815)
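The "N/A" rendering of missing values mentioned in the fixes above can be sketched with a dict subclass and str.format_map. This is only an illustration: the class NAdict, the render helper, and the record layout are hypothetical and do not reproduce DataLad's result renderer.

```python
class NAdict(dict):
    """dict that renders absent keys as "N/A", also inside nested dicts."""

    def __init__(self, data):
        # Wrap nested dicts so that absent subfields are handled as well.
        super().__init__(
            (k, NAdict(v) if isinstance(v, dict) else v) for k, v in data.items()
        )

    def __missing__(self, key):
        return "N/A"


def render(template, record):
    """Fill a format template from a result record, tolerating gaps."""
    return template.format_map(NAdict(record))


record = {"path": "sub-01/anat.nii.gz", "metadata": {"modality": "T1w"}}
line = render("{path}: {metadata[modality]} by {metadata[author]} ({status})", record)
```

The sketch covers an absent top-level field and an absent subfield of an existing field; indexing into a field that is itself absent is not handled.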
Enhancements and new features
- Anonymous access is now supported for S3 and other downloaders. (#2708)
- A new interface is available to ease setting up new providers. (#2708)
- Metadata: changes to egrep mode search (#2735)
  - Queries in egrep mode are now case-sensitive when the query contains any uppercase letters and are case-insensitive otherwise. The new mode egrepcs can be used to perform a case-sensitive query with all lower-case letters.
  - Search can now be limited to a specific key.
  - Multiple queries (list of expressions) are evaluated using AND to determine whether something is a hit.
  - A single multi-field query (e.g., pa*:findme) is a hit, when any matching field matches the query.
  - All matching key/value combinations across all (multi-field) queries are reported in the query_matched result field.
  - egrep mode now shows all hits rather than limiting the results to the top 20 hits.
- The documentation on how to format commands for datalad run has been improved. (#2703)
- The method for determining the current working directory on Windows has been improved. (#2707)
- datalad --version now simply shows the version without the license. (#2733)
- datalad export-archive learned to export under an existing directory via its --filename option. (#2723)
- datalad export-to-figshare now generates the zip archive in the root of the dataset unless --filename is specified. (#2723)
- After importing datalad.api, help(datalad.api) (or datalad.api? in IPython) now shows a summary of the available DataLad commands. (#2728)
- Support for using datalad from IPython has been improved. (#2722)
- datalad wtf now returns structured data and reports the version of each extension. (#2741)
- The internal handling of gitattributes information has been improved. A user-visible consequence is that datalad create --force no longer duplicates existing attributes. (#2744)
- The "annex" metadata extractor can now be used even when no content is present. (#2724)
- The add_url_to_file method (called by commands like datalad download-url and datalad add-archive-content) learned how to display a progress bar. (#2738)
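The case handling and AND semantics described above for egrep mode search can be sketched with Python's re module. This is an assumption-laden illustration, not the search backend: compile_query and is_hit are hypothetical names, matching is done on a plain string rather than on metadata key/value pairs, and any uppercase character in the query (including one inside a regex escape) counts as uppercase.

```python
import re


def compile_query(query, mode="egrep"):
    """Compile a query with egrep-style "smart case" handling.

    In "egrep" mode a query is case-sensitive only if it contains an
    uppercase letter; "egrepcs" is always case-sensitive.
    """
    case_sensitive = mode == "egrepcs" or any(c.isupper() for c in query)
    return re.compile(query, 0 if case_sensitive else re.IGNORECASE)


def is_hit(text, queries, mode="egrep"):
    """A text is a hit only if every query matches somewhere (AND)."""
    return all(compile_query(q, mode).search(text) for q in queries)


text = "Mouse brain MRI"
lower_hit = is_hit(text, ["mri"])                    # case-insensitive match
mixed_hit = is_hit(text, ["Mri"])                    # uppercase forces exact case
cs_hit = is_hit(text, ["mri"], mode="egrepcs")       # lower-case, case-sensitive
and_hit = is_hit(text, ["mouse", "mri"])             # both queries must match
and_miss = is_hit(text, ["mouse", "human"])          # one query fails
```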
Published by yarikoptic over 7 years ago
DataLad - OHBM polish
0.10.1 (Jun 17, 2018) -- OHBM polish
This is a minor bugfix release.
Fixes
- Be able to use backports.lzma as a drop-in replacement for pyliblzma.
- Give help when not specifying a procedure name in run-procedure.
- Abort early when a downloader received no filename.
- Avoid rerun error when trying to unlock non-available files.
Published by mih almost 8 years ago