Change log
1.1.3 (2024-08-08)
Tests
Account for the fix in git-annex behavior in test_add_delete_after_and_drop_subdir. PR #7640 (by @yarikoptic)
1.1.2 (2024-07-25)
Bug Fixes
Correct remote OS detection when working with RIA (ORA) stores: this should enable RIA operations, including push, from Mac clients to Linux hosts (and likely vice versa). Fixes #7536 via PR #7549 (by @mslw)
Allow only one thread in S3 downloader’s progress report callback. PR #7636 (by @christian-monch)
1.1.1 (2024-07-03)
Bug Fixes
Documentation
Internal
Add codespell and minor fixuppers to pre-commit configuration and apply it to non-datalad/ components. PR #7621 (by @yarikoptic)
Tests
For appveyor ssh setup, setup MaxSessions 100 to avoid ‘channel 22: open failed: connect failed: open failed’. PR #7617 (by @yarikoptic)
test_gracefull_death: raise test_gracefull_death threshold to 300 from 100. PR #7619 (by @yarikoptic)
Make test for presence of max_path in partitions not run for current psutil 6.0.0. PR #7622 (by @yarikoptic)
1.1.0 (2024-06-06)
Dependencies
Deprecated boto is replaced with boto3 (used to handle AWS S3 downloads). Fixes #5597 via PR #7340 (by @mslw, @effigies, and @yarikoptic). Remaining issues: no download progress indication, no “Range” support (for partial downloads).
Internal
Retry logic for S3 connections is now handed over to Boto3 and its standard mode, removing our custom method. PR #7340
1.0.3 (2024-06-06)
Bug Fixes
Raise exception if an annex remote process without console tries to interact with the user, e.g. prompt for a password. PR #7578 (by @christian-monch)
Fix add-archive-content for patool>=2.0. PR #7603 (by @dguibert)
Internal
Fixup minor typos in documentation/comments using fresh codespell. PR #7610 (by @yarikoptic)
Tests
Stop testing on Python 3.7. Switch MacOS tests to 3.11, include 3.11 in Appveyor, and use 3.8 for other tests. Fixes #7584 via PR #7585 (by @mslw)
Convert .travis.yml to GitHub Actions workflow. Fixes #7574 via PR #7600 (by @jwodder)
Cancel lengthy running workflows if a new commit is pushed. PR #7601 (by @jwodder)
1.0.2 (2024-04-19)
Tests
Relax condition in test_force_checkdatapresent to avoid flaky test failures. PR #7581 (by @christian-monch)
1.0.1 (2024-04-17)
Internal
The main entrypoint for annex remotes now also runs the standard extension load hook. This enables extensions to alter annex remote implementation behavior in the same way as other DataLad components. (by @mih)
1.0.0 (2024-04-06)
Breaking Changes
Merging maint to make the first major release. PR #7577 (by @yarikoptic)
Enhancements and New Features
0.19.6 (2024-02-02)
Enhancements and New Features
Add the “http_token” authentication mechanism which provides ‘Authentication: Token {TOKEN}’ header. PR #7551 (by @yarikoptic)
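As a rough illustration of what the http_token mechanism above puts on the wire, the following sketch builds a request carrying that header with the Python standard library only. The function name build_token_request, the URL, and the token value are hypothetical, and this is not DataLad's own implementation.

```python
from urllib.request import Request


def build_token_request(url: str, token: str) -> Request:
    """Return a request carrying the header named in the entry above."""
    # Header name and value layout are copied from the changelog entry.
    return Request(url, headers={"Authentication": f"Token {token}"})


# Hypothetical URL and token, for illustration only; nothing is sent.
req = build_token_request("https://example.com/data.bin", "s3cr3t")
print(req.get_header("Authentication"))
```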
Internal
Update pytest_ignore_collect() for pytest 8.0. PR #7546 (by @jwodder)
Add manual triggering support/documentation for release workflow. PR #7553 (by @yarikoptic)
0.19.5 (2023-12-28)
Tests
Fix text to account for a recent change in git-annex dropping sub-second clock precision. As a result we might not report push of git-annex branch since there would be none. PR #7544 (by @yarikoptic)
0.19.4 (2023-12-13)
Bug Fixes
Update target detection for adjusted mode datasets has been improved. Fixes #7507 via PR #7522 (by @mih)
Fix typos found by new codespell 2.2.6 and also add checking/fixing “hidden files”. PR #7530 (by @yarikoptic)
Documentation
Improve threaded-runner documentation. Fixes #7498 via PR #7500 (by @christian-monch)
Internal
Fix time_diff* and time_remove benchmarks to account for long RFed interfaces. PR #7502 (by @yarikoptic)
Tests
Cache value of the has_symlink_capability to spare some cycles. PR #7471 (by @yarikoptic)
RF(TST): use setup_method and teardown_method in TestAddArchiveOptions. PR #7488 (by @yarikoptic)
Announce test_clone_datasets_root xfail on github osx. PR #7489 (by @yarikoptic)
Inform asv that there should be no warmup runs for time_remove benchmark. PR #7505 (by @yarikoptic)
BF(TST): Relax matching of git-annex error message about unsafe drop, which was changed in 10.20231129-18-gfd0b510573. PR #7541 (by @yarikoptic)
0.19.3 (2023-08-10)
Bug Fixes
Type annotate get_status_dict and note that we can pass Exception or CapturedException which is not subclass. PR #7403 (by @yarikoptic)
BF: create-sibling-gitlab used to raise a TypeError when attempting a recursive operation in a dataset with uninstalled subdatasets. It now raises an impossible result instead. PR #7430 (by @adswa)
Pass branch option into recursive call within Install - for the cases whenever install is invoked with URL(s). Fixes #7461 via PR #7463 (by @yarikoptic)
Allow for reckless=ephemeral clone using relative path for the original location. Fixes #7469 via PR #7472 (by @yarikoptic)
Documentation
Internal
Copy an adjusted environment only if requested to do so. PR #7399 (by @christian-monch)
Eliminate uses of pkg_resources. Fixes #7435 via PR #7439 (by @jwodder)
Tests
Disable some S3 tests of their VCR taping where they fail for known issues. PR #7467 (by @yarikoptic)
0.19.2 (2023-07-03)
Bug Fixes
Remove surrounding quotes in output filenames even for newer version of annex. Fixes #7440 via PR #7443 (by @yarikoptic)
Documentation
DOC: clarify description of the “install” interface to reflect its convoluted behavior. PR #7445 (by @yarikoptic)
0.19.1 (2023-06-26)
Internal
Make compatible with upcoming release of git-annex (next after 10.20230407) and pass explicit core.quotepath=false to all git calls. Also added tools/find-hanged-tests helper. PR #7372 (by @yarikoptic)
Tests
Adjust tests for upcoming release of git-annex (next after 10.20230407) and ignore DeprecationWarning for pkg_resources for now. PR #7372 (by @yarikoptic)
0.19.0 (2023-06-14)
Enhancements and New Features
Address gitlab API special character restrictions. PR #7407 (by @jsheunis)
BF: The default layout of create-sibling-gitlab is now collection. The previous default, hierarchy, has been removed as it failed in --recursive mode in different edge cases. For single-level datasets, the outcome of collection and hierarchy is identical. PR #7410 (by @jsheunis and @adswa)
Bug Fixes
WTF - bring back and extend information on metadata extractors etc, and allow for sections to have subsections and be selected at both levels PR #7309 (by @yarikoptic)
BF: Run an actual git invocation with interactive commit config. PR #7398 (by @adswa)
Dependencies
Documentation
Tests
Remove nose-based testing utils and possibility to test extensions using nose. PR #7261 (by @yarikoptic)
0.18.5 (2023-06-13)
Bug Fixes
More correct summary reporting for relaxed (no size) --annex. PR #7050 (by @yarikoptic)
ENH: minor tune up of addurls to be more tolerant and “informative”. PR #7388 (by @yarikoptic)
Ensure that data generated by timeout handlers in the asynchronous runner are accessible via the result generator, even if no other events occur. PR #7390 (by @christian-monch)
Do not map (leave as is) trailing / or in github URLs. PR #7418 (by @yarikoptic)
Documentation
Internal
Discontinue ConfigManager abuse for Git identity warning. PR #7378 (by @mih) and PR #7392 (by @yarikoptic)
Tests
Boost python to 3.8 during extensions testing. PR #7413 (by @yarikoptic)
Skip test_system_ssh_version if no ssh found + split parsing into separate test. PR #7422 (by @yarikoptic)
0.18.4 (2023-05-16)
Bug Fixes
Provider config files were ignored, when CWD changed between different datasets during runtime. Fixes #7347 via PR #7357 (by @bpoldrack)
Documentation
Internal
Tests
Fix failing testing on CI PR #7379 (by @yarikoptic)
use sample S3 url DANDI archive,
use our copy of old .deb from datasets.datalad.org instead of snapshots.d.o
use specific miniconda installer for py 3.7.
0.18.3 (2023-03-25)
Bug Fixes
Fixed that the get command would fail when subdataset source-candidate-templates were using the path property from .gitmodules. Also enhance the respective documentation for the get command. Fixes #7274 via PR #7280 (by @bpoldrack)
Improve up-to-dateness of config reports across manager instances. Fixes #7299 via PR #7301 (by @mih)
BF: GitRepo.merge do not allow merging unrelated unconditionally. PR #7312 (by @yarikoptic)
Do not render (empty) WTF report on other records. PR #7322 (by @yarikoptic)
Fixed a bug where changing DataLad’s log level could lead to failing git-annex calls. Fixes #7328 via PR #7329 (by @bpoldrack)
Fix an issue with uninformative error reporting by the datalad special remote. Fixes #7332 via PR #7333 (by @bpoldrack)
Fix save to not force committing into git if reference dataset is pure git (not git-annex). Fixes #7351 via PR #7355 (by @yarikoptic)
Documentation
Internal
Type-annotate almost all of datalad/utils.py; add datalad/typing.py. PR #7317 (by @jwodder)
Type-annotate and fix datalad/support/strings.py. PR #7318 (by @jwodder)
Type-annotate datalad/support/globbedpaths.py. PR #7327 (by @jwodder)
Extend type-annotations for datalad/support/path.py. PR #7336 (by @jwodder)
Type-annotate various things in datalad/runner/. PR #7337 (by @jwodder)
Type-annotate some more files in datalad/support/. PR #7339 (by @jwodder)
Tests
Skip or xfail some currently failing or stalling tests. PR #7331 (by @yarikoptic)
Skip with_sameas_remote when rsync and annex are incompatible. Fixes #7320 via PR #7342 (by @bpoldrack)
Fix testing assumption - do create pure GitRepo superdataset and test against it. PR #7353 (by @yarikoptic)
0.18.2 (2023-02-27)
Bug Fixes
Fix create-sibling for non-English SSH remotes by providing LC_ALL=C for the ls call. PR #7265 (by @nobodyinperson)
Fix EnsureListOf() and EnsureTupleOf() for string inputs. PR #7267 (by @nobodyinperson)
create-sibling: Use C.UTF-8 locale instead of C on the remote end. PR #7273 (by @nobodyinperson)
Address compatibility with most recent git-annex where info would exit with non-0. PR #7292 (by @yarikoptic)
Dependencies
Revert “Revert “Remove chardet version upper limit””. PR #7263 (by @yarikoptic)
Internal
Codespell more (CHANGELOGs etc) and remove custom CLI options from tox.ini. PR #7271 (by @yarikoptic)
Tests
Use older python 3.8 in testing nose utils in github-action test-nose. Fixes #7259 via PR #7260 (by @yarikoptic)
0.18.1 (2023-01-16)
Bug Fixes
Fixes crashes on windows where DataLad was mistaking git-annex 10.20221212 for a not yet released git-annex version and trying to use a new feature. Fixes #7248 via PR #7249 (by @bpoldrack)
Documentation
Performance
Integrate buffer size optimization from datalad-next, leading to significant performance improvement for status and diff. Fixes #7190 via PR #7250 (by @bpoldrack)
0.18.0 (2022-12-31)
Breaking Changes
Move all old-style metadata commands aggregate_metadata, search, metadata and extract-metadata, as well as the cfg_metadatatypes procedure and the old metadata extractors into the datalad-deprecated extension. The now recommended way of handling metadata is to install the datalad-metalad extension instead. Fixes #7012 via PR #7014
Automatic reconfiguration of the ORA special remote when cloning from RIA stores now only applies locally rather than being committed. PR #7235 (by @bpoldrack)
Enhancements and New Features
A repository description can be specified with a new --description option when creating siblings using create-sibling-[gin|gitea|github|gogs]. Fixes #6816 via PR #7109 (by @mslw)
Make validation failure of alternative constraints more informative. Fixes #7092 via PR #7132 (by @bpoldrack)
Saving removed dataset content was sped-up, and reporting of types of removed content now accurately states dataset for added and removed subdatasets, instead of file. Moreover, saving previously staged deletions is now also reported. PR #6784 (by @mih)
foreach-dataset command got a new possible value for the --output-streams|--o-s option ‘relpath’ to capture and pass-through prefixing with path to subds. Very handy for e.g. running git grep command across subdatasets. PR #7071 (by @yarikoptic)
New config datalad.create-sibling-ghlike.extra-remote-settings.NETLOC.KEY=VALUE allows to add and/or overwrite local configuration for the created sibling by the commands create-sibling-<gin|gitea|github|gitlab|gogs>. PR #7213 (by @matrss)
The siblings command does not concern the user with messages about inconsequential failure to annex-enable a remote anymore. PR #7217 (by @bpoldrack)
ORA special remote now allows to override its configuration locally. PR #7235 (by @bpoldrack)
Added a ‘ria’ special remote to provide backwards compatibility with datasets that were set up with the deprecated ria-remote. PR #7235 (by @bpoldrack)
Bug Fixes
Documentation
create-sibling-ria’s docstring now defines the schema of RIA URLs and clarifies internal layout of a RIA store. PR #6861 (by @adswa)
Move maintenance team info from issue to CONTRIBUTING. PR #6904 (by @adswa)
Describe specifications for a DataLad GitHub Action. PR #6931 (by @thewtex)
Fix capitalization of some service names. PR #6936 (by @aqw)
Command categories in help text are more consistently named. PR #7027 (by @aqw)
DOC: Add design document on Tests and CI. PR #7195 (by @adswa)
CONTRIBUTING.md was extended with up-to-date information on CI logging, changelog and release procedures. PR #7204 (by @yarikoptic)
Internal
Allow EnsureDataset constraint to handle Path instances. Fixes #7069 via PR #7133 (by @bpoldrack)
Use looseversion.LooseVersion as drop-in replacement for distutils.version.LooseVersion. Fixes #6307 via PR #6839 (by @effigies)
Use --pathspec-from-file where possible instead of passing long lists of paths to git/git-annex calls. Fixes #6922 via PR #6932 (by @yarikoptic)
Make clone_dataset() better patchable by extensions and less monolithic. PR #7017 (by @mih)
Remove simplejson in favor of using json. Fixes #7034 via PR #7035 (by @christian-monch)
Fix an error in the command group names-test. PR #7044 (by @christian-monch)
Move eval_results() into interface.base to simplify imports for command implementations. Deprecate use from interface.utils accordingly. Fixes #6694 via PR #7170 (by @adswa)
Performance
Use regular dicts instead of OrderedDicts for speedier operations. Fixes #6566 via PR #7174 (by @adswa)
Reimplement get_submodules_() without get_content_info() for substantial performance boosts especially for large datasets with few subdatasets. Originally proposed in PR #6942 by @mih, fixing #6940. PR #7189 (by @adswa). Complemented with PR #7220 (by @yarikoptic) to avoid O(N^2) (instead of O(N*log(N))) performance in some cases.
Use --include=* or --anything instead of --copies 0 to speed up get_content_annexinfo. PR #7230 (by @yarikoptic)
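The switch from OrderedDicts to regular dicts noted in this section rests on a language guarantee: since Python 3.7 the built-in dict preserves insertion order. A minimal sketch with made-up keys (not taken from DataLad's code) shows why a plain dict can stand in when only ordering is needed:

```python
from collections import OrderedDict

# Hypothetical result record; the keys are illustrative only.
pairs = [("path", "/tmp/ds"), ("type", "dataset"), ("status", "ok")]

plain = dict(pairs)
ordered = OrderedDict(pairs)

# Both containers iterate in insertion order ...
same_order = list(plain) == list(ordered)
# ... and compare equal, so a plain dict is enough unless
# OrderedDict-specific methods such as move_to_end() are required.
interchangeable = plain == ordered
print(same_order, interchangeable)
```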
Tests
0.17.10 (2022-12-14)
Enhancements and New Features
Enhance concurrent invocation behavior of ThreadedRunner.run(). If possible invocations are serialized instead of raising re-enter runtime errors. Deadlock situations are detected and runtime errors are raised instead of deadlocking. Fixes #7138 via PR #7201 (by @christian-monch)
Exceptions bubbling up through CLI are now reported on including their chain of cause. Fixes #7163 via PR #7210 (by @bpoldrack)
Bug Fixes
BF: read RIA config from stdin instead of temporary file. Fixes #6514 via PR #7147 (by @adswa)
Prevent doomed annex calls on files we already know are untracked. Fixes #7032 via PR #7166 (by @adswa)
Comply to Posix-like clone URL formats on Windows. Fixes #7180 via PR #7181 (by @adswa)
Ensure that paths used in the datalad-url field of .gitmodules are posix. Fixes #7182 via PR #7183 (by @adswa)
Bandaids for export-to-figshare to restore functionality. PR #7188 (by @adswa)
Fixes hanging threads when close() or del were called in BatchedCommand instances. That could lead to hanging tests if the tests used the @serve_path_via_http()-decorator. Fixes #6804 via PR #7201 (by @christian-monch)
Interpret file-URL path components according to the local operating system as described in RFC 8089. With this fix, datalad.network.RI('file:...').localpath returns a correct local path on Windows if the RI is constructed with a file-URL. Fixes #7186 via PR #7206 (by @christian-monch)
Fix a bug when retrieving several files from a RIA store via SSH, when the annex key does not contain size information. Fixes #7214 via PR #7215 (by @mslw)
Interface-specific (python vs CLI) doc generation for commands and their parameters was broken when brackets were used within the interface markups. Fixes #7225 via PR #7226 (by @bpoldrack)
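To make the file-URL entry in this list more concrete, here is a small sketch of OS-dependent interpretation of a file-URL path. It is an assumption-laden illustration, not DataLad's RI implementation: the helper name file_url_to_path and the windows flag are hypothetical, and only the drive-letter form is handled (no UNC hosts).

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import unquote, urlparse


def file_url_to_path(url: str, windows: bool):
    """Map a file-URL to a pure path object for the given platform flavor."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file-URL: {url!r}")
    path = unquote(parsed.path)  # decode percent-escapes such as %20
    if not windows:
        return PurePosixPath(path)
    # On Windows, "/C:/data/x" carries a drive letter after the leading slash.
    if len(path) >= 3 and path[0] == "/" and path[1].isalpha() and path[2] == ":":
        path = path[1:]
    return PureWindowsPath(path)


win = file_url_to_path("file:///C:/data/my%20file.txt", windows=True)
posix = file_url_to_path("file:///home/user/my%20file.txt", windows=False)
print(win, posix)
```

Pure path classes are used so the sketch behaves the same regardless of the platform it runs on.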
Documentation
Fix documentation of Runner.run() to not accept strings. Instead, encoding must be ensured by the caller. Fixes #7145 via PR #7155 (by @bpoldrack)
Internal
Fix import of the ls command from datalad-deprecated for benchmarks. Fixes #7149 via PR #7154 (by @bpoldrack)
Unify definition of parameter choices with datalad clean. Fixes #7026 via PR #7161 (by @bpoldrack)
Tests
Fix test failure with old annex. Fixes #7157 via PR #7159 (by @bpoldrack)
Re-enable now passing test_path_diff test on Windows. Fixes #3725 via PR #7194 (by @yarikoptic)
Use Plaintext keyring backend in tests to avoid the need for (interactive) authentication to unlock the keyring during (CI-) test runs. Fixes #6623 via PR #7209 (by @bpoldrack)
0.17.9 (2022-11-07)
Bug Fixes
Various small fixups ran after looking post-release and trying to build Debian package. PR #7112 (by @yarikoptic)
BF: Fix add-archive-contents try-finally statement by defining variable earlier. PR #7117 (by @adswa)
Fix RIA file URL reporting in exception handling. PR #7123 (by @adswa)
HTTP download treated ‘429 - too many requests’ as an authentication issue and was consequently trying to obtain credentials. Fixes #7129 via PR #7129 (by @bpoldrack)
Dependencies
Unrestrict pytest and pytest-cov versions. PR #7125 (by @jwodder)
Remove remaining references to nose and the implied requirement for building the documentation. Fixes #7100 via PR #7136 (by @bpoldrack)
Internal
Use datalad/release-action. Fixes #7110. PR #7111 (by @jwodder)
Fix all logging to use %-interpolation and not .format, sort imports in touched files, add pylint-ing for % formatting in log messages to tox -e lint. PR #7118 (by @yarikoptic)
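The difference between %-interpolation and .format in log calls, as referenced in the logging entry above, can be sketched with the standard logging module. The logger name and the ListHandler class are hypothetical helpers for this illustration and are not part of DataLad.

```python
import logging


class ListHandler(logging.Handler):
    """Collect emitted records so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


lgr = logging.getLogger("changelog.example")  # hypothetical logger name
lgr.setLevel(logging.INFO)
lgr.propagate = False
handler = ListHandler()
lgr.addHandler(handler)

path = "/tmp/ds"
# Discouraged: the string is formatted eagerly, even though DEBUG is disabled
# and the message is thrown away.
lgr.debug("Processing {}".format(path))
# Preferred: arguments are interpolated only if the record is actually emitted.
lgr.info("Processing %s", path)

record = handler.records[0]
print(record.msg, record.args, record.getMessage())
```

With %-style arguments the template and its arguments stay separate on the record, which also lets handlers group messages by template.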
Tests
Increase the upper time limit after which we assume that a process is stalling. That should reduce false positives from datalad.support.tests.test_parallel.py::test_stalling, without impacting the runtime of passing tests. PR #7119 (by @christian-monch)
XFAIL a check on length of results in test_gracefull_death. PR #7126 (by @yarikoptic)
Configure Git to allow for “file” protocol in tests. PR #7130 (by @yarikoptic)
0.17.8 (2022-10-24)
Bug Fixes
Prevent adding duplicate entries to .gitmodules. PR #7088 (by @yarikoptic)
[BF] Prevent double yielding of impossible get result Fixes #5537. PR #7093 (by @jsheunis)
Stop rendering the output of internal subdatasets() call in the results of run_procedure(). Fixes #7091 via PR #7094 (by @mslw & @mih)
Improve handling of --existing reconfigure in create-sibling-ria: previously, the command would not make the underlying git init call for existing local repositories, leading to some configuration updates not being applied. Partially addresses https://github.com/datalad/datalad/issues/6967 via https://github.com/datalad/datalad/pull/7095 (by @mslw)
Ensure subprocess environments have a valid path in os.environ['PWD'], even if a Path-like object was given to the runner on subprocess creation or invocation. Fixes #7040 via PR #7107 (by @christian-monch)
Improved reporting when using dry-run with github-like create-sibling* commands (-gin, -gitea, -github, -gogs). The result messages will now display names of the repositories which would be created (useful for recursive operations). PR #7103 (by @mslw)
0.17.7 (2022-10-14)
Bug Fixes
Let EnsureChoice report the value that failed validation. PR #7067 (by @mih)
Avoid writing to stdout/stderr from within datalad sshrun. This could lead to broken pipe errors when cloning via SSH and was superfluous to begin with. Fixes https://github.com/datalad/datalad/issues/6599 via https://github.com/datalad/datalad/pull/7072 (by @bpoldrack)
BF: lock across threads check/instantiation of Flyweight instances. Fixes #6598 via PR #7075 (by @yarikoptic)
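The idea behind locking the check/instantiation step of flyweight objects across threads, as in the last entry above, can be sketched as follows. This is a simplified stand-in under stated assumptions, not DataLad's Flyweight implementation: the class layout and the key value are hypothetical.

```python
import threading


class Flyweight:
    """One shared instance per key, created under a lock."""

    _instances = {}
    _lock = threading.Lock()

    def __new__(cls, key):
        # The lookup and the instantiation happen while holding the lock, so
        # two threads asking for the same key cannot both create an instance.
        with cls._lock:
            inst = cls._instances.get(key)
            if inst is None:
                inst = super().__new__(cls)
                inst.key = key
                cls._instances[key] = inst
            return inst


def worker(out, idx):
    out[idx] = Flyweight("/tmp/repo")  # hypothetical key


results = [None] * 8
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

unique_ids = {id(obj) for obj in results}
print(len(unique_ids))
```

Without the lock, two threads could both see a missing key and each build their own object, which defeats the purpose of sharing instances.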
Internal
Do not use gen4-metadata methods in datalad metadata-command. PR #7001 (by @christian-monch)
Revert “Remove chardet version upper limit” (introduced in 0.17.6~11^2) to bring back upper limit <= 5.0.0 on chardet. Otherwise we can get some deprecation warnings from requests. PR #7057 (by @yarikoptic)
Ensure that BatchedCommandError is raised if the subprocess of BatchedCommand fails or raises a CommandError. PR #7068 (by @christian-monch)
RF: remove unused code str-ing PurePath. PR #7073 (by @yarikoptic)
Update GitHub Actions action versions. PR #7082 (by @jwodder)
Tests
Fix broken test helpers for result record testing that would falsely pass. PR #7002 (by @bpoldrack)
0.17.6 (2022-09-21)
Bug Fixes
UX: push - provide specific error with details if push failed due to permission issue. PR #7011 (by @yarikoptic)
Fix datalad --help to not have Global options empty with python 3.10 and list options in “options:” section. PR #7028 (by @yarikoptic)
Let create touch the dataset root, if not saving in parent dataset. PR #7036 (by @mih)
Let get_status_dict() use exception message if none is passed. PR #7037 (by @mih)
Make choices for status|diff --annex and status|diff --untracked visible. PR #7039 (by @mih)
push: Assume 0 bytes pushed if git-annex does not provide bytesize. PR #7049 (by @yarikoptic)
Internal
Tests
Allow for any 2 from first 3 to be consumed in test_gracefull_death. PR #7041 (by @yarikoptic)
0.17.5 (Fri Sep 02 2022)
Bug Fix
BF: blacklist 23.9.0 of keyring as introduces regression #7003 (@yarikoptic)
Make the manpages build reproducible via datalad.source.epoch (to be used in Debian packaging) #6997 (@lamby bot@datalad.org @yarikoptic)
BF: backquote path/drive in Changelog #6997 (@yarikoptic)
0.17.4 (Tue Aug 30 2022)
Bug Fix
BF: make logic more consistent for files=[] argument (which is False but not None) #6976 (@yarikoptic)
Run pytests in parallel (-n 2) on appveyor #6987 (@yarikoptic)
Add workflow for autogenerating changelog snippets #6981 (@jwodder)
Provide /dev/null (b:\nul on Windows) instead of empty string as a git-repo to avoid reading local repo configuration #6986 (@yarikoptic)
RF: call_from_parser - move code into “else” to simplify reading etc #6982 (@yarikoptic)
BF: if early attempt to parse resulted in error, setup subparsers #6980 (@yarikoptic)
Run pytests in parallel (-n 2) on Travis #6915 (@yarikoptic)
Send one character (no newline) to stdout in protocol test to guarantee a single “message” and thus a single custom value #6978 (@christian-monch)
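The files=[] entry in this list touches a common Python pitfall: an empty list is falsy but is not None. The sketch below shows one way to keep the two cases apart. The helper select_files and its semantics (None means "everything", [] means "nothing") are assumptions for illustration, since the entry does not spell out DataLad's actual behavior.

```python
from typing import List, Optional


def select_files(files: Optional[List[str]], known: List[str]) -> List[str]:
    """Hypothetical helper: None selects everything, [] selects nothing."""
    if files is None:
        return list(known)
    # An empty list is falsy, but it is still an explicit (empty) selection,
    # so it must not be treated like "no argument given".
    return [f for f in files if f in known]


known = ["a.txt", "b.txt"]
print(select_files(None, known))
print(select_files([], known))
```

Testing with "if not files" instead of "if files is None" would silently merge the two cases.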
Tests
TST: test_stalling – wait x10 not just x5 time #6995 (@yarikoptic)
0.17.3 (Tue Aug 23 2022)
Bug Fix
BF: git_ignore_check do not overload possible value of stdout/err if present #6937 (@yarikoptic)
DOCfix: fix docstring GeneratorStdOutErrCapture to say that treats both stdout and stderr identically #6930 (@yarikoptic)
Explain purpose of create-sibling-ria’s --post-update-hook #6958 (@mih)
ENH+BF: get_parent_paths - make / into sep option and consistently use “/” as path separator #6963 (@yarikoptic)
BF(TEMP): use git-annex from neurodebian -devel to gain fix for bug detected with datalad-crawler #6965 (@yarikoptic)
BF(TST): make tests use path helper for Windows “friendliness” of the tests #6955 (@yarikoptic)
BF(TST): prevent auto-upgrade of “remote” test sibling, do not use local path for URL #6957 (@yarikoptic)
Forbid drop operation from symlink’ed annex (e.g. due to being cloned with --reckless=ephemeral) to prevent data-loss #6959 (@mih)
Acknowledge git-config comment chars #6944 (@mih @yarikoptic)
Minor tuneups to please updated codespell #6956 (@yarikoptic)
BF+ENH(TST): fix typo in code of wtf filesystems reports #6920 (@yarikoptic)
BF: fix typo which silently prevented showing details of filesystems #6930 (@yarikoptic)
BF(TST): allow for a annex repo version to upgrade if running in adjusted branches #6927 (@yarikoptic)
RF extensions github action to centralize configuration for extensions etc, use pytest for crawler #6914 (@yarikoptic)
BF: travis - mark our directory as safe to interact with as root #6919 (@yarikoptic)
BF: do not pretend we know what repo version git-annex would upgrade to #6902 (@yarikoptic)
BF(TST): do not expect log message for guessing Path to be possibly a URL on windows #6911 (@yarikoptic)
ENH(TST): Disable coverage reporting on travis while running pytest #6898 (@yarikoptic)
RF: just rename internal variable from unclear “op” to “io” #6907 (@yarikoptic)
DX: Demote loglevel of message on url parameters to DEBUG while guessing RI #6891 (@adswa @yarikoptic)
Fix and expand datalad.runner type annotations #6893 (@christian-monch @yarikoptic)
Use pytest to test datalad-metalad in test_extensions-workflow #6892 (@christian-monch)
Let push honor multiple publication dependencies declared via siblings #6869 (@mih @yarikoptic)
ENH: upgrade versioneer from versioneer-0.20.dev0 to versioneer-0.23.dev0 #6888 (@yarikoptic)
ENH: introduce typing checking and GitHub workflow #6885 (@yarikoptic)
RF,ENH(TST): future proof testing of git annex version upgrade + test annex init on all supported versions #6880 (@yarikoptic)
ENH(TST): test against supported git annex repo version 10 + make it a full sweep over tests #6881 (@yarikoptic)
BF: RF f-string uses in logger to %-interpolations #6886 (@yarikoptic)
Merge branch ‘bf-sphinx-5.1.0’ into maint #6883 (@yarikoptic)
BF(DOC): workaround for #10701 of sphinx in 5.1.0 #6883 (@yarikoptic)
Clarify confusing INFO log message from get() on dataset installation #6871 (@mih)
Protect against failing to load a command interface from an extension #6879 (@mih)
Support unsetting config via datalad -c :<name> #6864 (@mih)
Fix DOC string typo in the path within AnnexRepo.annexstatus, and replace with proper sphinx reference #6858 (@christian-monch)
Pushed to maint
Tests
BF(TST,workaround): just xfail failing archives test on NFS #6912 (@yarikoptic)
0.17.2 (Sat Jul 16 2022)
Bug Fix
BF(TST): do proceed to proper test for error being caught for recent git-annex on windows with symlinks #6850 (@yarikoptic)
Addressing problem testing against python 3.10 on Travis (skip more annex versions) #6842 (@yarikoptic)
XFAIL test_runner_parametrized_protocol on python3.8 when getting duplicate output #6837 (@yarikoptic)
BF: Make create’s check for procedures work with several again #6841 (@adswa)
0.17.1 (Mon Jul 11 2022)
Bug Fix
DOC: minor fix - consistent DataLad (not Datalad) in docs and CHANGELOG #6830 (@yarikoptic)
DOC: fixup/harmonize Changelog for 0.17.0 a little #6828 (@yarikoptic)
BF: use --python-match minor option in new datalad-installer release to match outside version of Python #6827 (@christian-monch @yarikoptic)
Do not quote paths for ssh >= 9 #6826 (@christian-monch @yarikoptic)
Suppress DeprecationWarning to allow for distutils to be used #6819 (@yarikoptic)
RM(TST): remove testing of datalad.test which was removed from 0.17.0 #6822 (@yarikoptic)
Avoid import of nose-based tests.utils, make skip_if_no_module() and skip_if_no_network() allowed at module level #6817 (@jwodder)
BF(TST): use higher level asyncio.run instead of asyncio.get_event_loop in test_inside_async #6808 (@yarikoptic)
0.17.0 (Thu Jul 7 2022) – pytest migration
Enhancements and new features
“log” progress bar now reports about starting a specific action as well. #6756 (by @yarikoptic)
Documentation and behavior of traceback reporting for log messages via DATALAD_LOG_TRACEBACK was improved to yield a more compact report. The documentation for this feature has been clarified. #6746 (by @mih)
datalad unlock gained a progress bar. #6704 (by @adswa)
When create-sibling-gitlab is called on non-existing subdatasets or paths it now returns an impossible result instead of no feedback at all. #6701 (by @adswa)
datalad wtf includes a report on file system types of commonly used paths. #6664 (by @adswa)
Use next generation metadata code in search, if it is available. #6518 (by @christian-monch)
Deprecations and removals
Remove unused and untested log helpers NoProgressLog and OnlyProgressLog. #6747 (by @mih)
Remove unused sorted_files() helper. #6722 (by @adswa)
Discontinued the value stdout for use with the config variable datalad.log.target as its use would inevitably break special remote implementations. #6675 (by @bpoldrack)
AnnexRepo.add_urls() is deprecated in favor of AnnexRepo.add_url_to_file() or a direct call to AnnexRepo.call_annex(). #6667 (by @mih)
datalad test command and supporting functionality (e.g., datalad.test) were removed. #6273 (by @jwodder)
Bug Fixes
export-archive does not rely on normalize_path() methods anymore and became more robust when called from subdirectories. #6745 (by @adswa)
Sanitize keys before checking content availability to ensure that the content availability of files with URL- or custom backend keys is correctly determined and marked. #6663 (by @adswa)
Ensure saving a new subdataset to a superdataset yields a valid .gitmodules record regardless of whether and how a path constraint is given to the save() call. Fixes #6547 #6790 (by @mih)
save now repairs annex symlinks broken by a git-mv operation prior to recording a new dataset state. Fixes #4967 #6795 (by @mih)
Documentation
Internal
Inline code of create-sibling-ria has been refactored to an internal helper to check for siblings with particular names across dataset hierarchies in datalad-next, and is reintroduced into core to modularize the code base further. #6706 (by @adswa)
get_initialized_logger now lets a given logtarget take precedence over datalad.log.target. #6675 (by @bpoldrack)
Many uses of deprecated call options were replaced with the recommended ones. #6273 (by @jwodder)
Get rid of asyncio import by defining few noops methods from asyncio.protocols.SubprocessProtocol directly in WitlessProtocol. #6648 (by @yarikoptic)
Consolidate GitRepo.remove() and AnnexRepo.remove() into a single implementation. #6783 (by @mih)
Tests
Discontinue use of with_testrepos decorator other than for the deprecation cycle for nose. #6690 (by @mih @bpoldrack) See #6144 for full list of changes.
Remove usage of deprecated AnnexRepo.add_urls in tests. #6683 (by @bpoldrack)
Minimalistic (adapters, no assert changes, etc) migration from nose to pytest. Support functionality possibly used by extensions and relying on nose helpers is left in place to avoid affecting their run time and defer migration of their test setups. #6273 (by @jwodder)
0.16.7 (Wed Jul 06 2022)
Bug Fix
Fix broken annex symlink after git-mv before saving + fix a race condition in ssh copy test #6809 (@christian-monch @mih @yarikoptic)
Do not ignore already known status info on submodules #6790 (@mih)
Fix “common data source” test to use a valid URL (maint-based & extended edition) #6788 (@mih @yarikoptic)
Upload coverage from extension tests to Codecov #6781 (@jwodder)
Clean up line end handling in GitRepo #6768 (@christian-monch)
Do not skip file-URL tests on windows #6772 (@christian-monch)
Fix test errors caused by updated chardet v5 release #6777 (@christian-monch)
Preserve final trailing slash in call_git() output #6754 (@adswa @yarikoptic @christian-monch)
Pushed to maint
Make sure a subdataset is saved with a complete .gitmodules record (@mih)
0.16.6 (Tue Jun 14 2022)
Bug Fix
Prevent duplicated result rendering when searching in default datasets #6765 (@christian-monch)
BF(workaround): skip test_ria_postclonecfg on OSX for now (@yarikoptic)
BF(workaround to #6759): if saving credential failed, just log error and continue #6762 (@yarikoptic)
Prevent reentry of a runner instance #6737 (@christian-monch)
0.16.5 (Wed Jun 08 2022)
Bug Fix
BF: push to github - remove datalad-push-default-first config only in non-dry run to ensure we push default branch separately in next step #6750 (@yarikoptic)
In addition to default (system) ssh version, report configured ssh; fix ssh version parsing on Windows #6729 (@yarikoptic)
0.16.4 (Thu Jun 02 2022)
Bug Fix
BF(TST): RO operations - add test directory into git safe.directory #6726 (@yarikoptic)
DOC: fixup of docstring for skip_ssh #6727 (@yarikoptic)
BF: Catch KeyErrors from unavailable WTF infos #6712 (@adswa)
Add annex.private to ephemeral clones. That would make git-annex not assign shared (in git-annex branch) annex uuid. #6702 (@bpoldrack @adswa)
BF: require argcomplete version at least 1.12.3 to test/operate correctly #6693 (@yarikoptic)
0.16.3 (Thu May 12 2022)
Bug Fix
No change for a PR to trigger release #6692 (@yarikoptic)
Sanitize keys before checking content availability to ensure correct value for keys with URL or custom backend #6665 (@adswa @yarikoptic)
Fix GitRepo.get_branch_commits_() to handle branch names that conflict with paths #6661 (@mih)
OPT: AnnexJsonProtocol - avoid dragging possibly long data around #6660 (@yarikoptic)
Remove two overly prominent create() INFO log messages that duplicate DEBUG log messages, and harmonize some other log messages #6638 (@mih @yarikoptic)
Remove unsupported parameter create_sibling_ria(existing=None) #6637 (@mih)
Add released plugin to .autorc to annotate PRs on when released #6639 (@yarikoptic)
0.16.2 (Thu Apr 21 2022)
Bug Fix
Demote (to level 1 from DEBUG) and speed-up API doc logging (parseParameters) #6635 (@mih)
Factor out actual data transfer in push #6618 (@christian-monch)
ENH: include version of datalad in tests teardown Versions: report #6628 (@yarikoptic)
MNT: Require importlib-metadata >=3.6 for Python < 3.10 for entry_points taking kwargs #6631 (@effigies)
Factor out credential handling of create-sibling-ghlike #6627 (@mih)
BF: Fix wrong key name of annex’ JSON records #6624 (@bpoldrack)
Pushed to maint
Fix typo in changelog (@mih)
[ci skip] minor typo fix (@yarikoptic)
0.16.1 (Fr Apr 8 2022) – April Fools’ Release
Fixes forgotten changelog in docs
0.16.0 (Fr Apr 8 2022) – Spring cleaning!
Enhancements and new features
A new set of create-sibling-* commands reimplements the GitHub-platform support of create-sibling-github and adds support to interface three new platforms in a unified fashion: GIN (create-sibling-gin), GOGS (create-sibling-gogs), and Gitea (create-sibling-gitea). All commands rely on personal access tokens only for authentication, allow for specifying one of several stored credentials via a uniform --credential parameter, and support a uniform --dry-run mode for testing without network. #5949 (by @mih)
create-sibling-github now supports direct specification of organization repositories via a [<org>/]repo syntax #5949 (by @mih)
create-sibling-gitlab gained a --dry-run parameter to match the corresponding parameters in create-sibling-{github,gin,gogs,gitea} #6013 (by @adswa)
The --new-store-ok parameter of create-sibling-ria only creates new RIA stores when explicitly provided #6045 (by @adswa)
The default performance of status() and diff() commands is improved by up to 700% by removing file-type evaluation as a default operation, and simplifying the type reporting rule #6097 (by @mih)
drop() and remove() were reimplemented in full, conceptualized as the antagonist commands to get() and clone(). A new, harmonized set of parameters (--what ['filecontent', 'allkeys', 'datasets', 'all'], --reckless ['modification', 'availability', 'undead', 'kill']) simplifies their API. Both commands include additional safeguards. uninstall is replaced with a thin shim command around drop() #6111 (by @mih)
add_archive_content() was refactored into a dataset method and gained progress bars #6105 (by @adswa)
The datalad and datalad-archives special remotes have been reimplemented based on AnnexRemote #6165 (by @mih)
The result_renderer() semantics were decomplexified and harmonized. The previous default result renderer was renamed to generic. #6174 (by @mih)
get_status_dict learned to include exit codes in the case of CommandErrors #5642 (by @yarikoptic)
datalad clone can now pass options to git-clone, adding support for cloning specific tags or branches, naming siblings with names other than origin, and exposing git clone’s optimization arguments #6218 (by @kyleam and @mih)
Inactive BatchedCommands are cleaned up #6206 (by @jwodder)
export-archive-ora learned to filter files exported to 7z archives #6234 (by @mih and @bpinsard)
datalad run learned to glob recursively #6262 (by @AKSoo)
The ORA remote learned to recover from interrupted uploads #6267 (by @mih)
A new threaded runner with support for timeouts and generator-based subprocess communication is introduced and used in BatchedCommand and AnnexRepo #6244 (by @christian-monch)
A new switch allows enabling library mode and querying the effective API in use #6213 (by @mih)
run and rerun now support parallel jobs via --jobs #6279 (by @AKSoo)
A new foreach-dataset plumbing command allows running commands on each (sub)dataset, similar to git submodule foreach #5517 (by @yarikoptic)
The dataset parameter is not restricted to only locally resolvable file-URLs anymore #6276 (by @christian-monch)
DataLad’s credential system is now able to query git-credential by specifying credential type git in the respective provider configuration #5796 (by @bpoldrack)
DataLad now comes with a git credential helper git-credential-datalad allowing Git to query DataLad’s credential system #5796 (by @bpoldrack and @mih)
The new runner now allows for multiple threads #6371 (by @christian-monch)
A new configuration command provides an interface to manipulate and query the DataLad configuration. #6306 (by @mih)
Unlike the global Python-only datalad.cfg or dataset-specific Dataset.config configuration managers, this command offers a uniform API across the Python and the command line interfaces.
This command was previously available in the mihextras extension as x-configuration, and has been merged into the core package in an improved version. #5489 (by @mih)
In its default dump mode, the command provides an annotated list of the effective configuration after considering all configuration sources, including hints on additional configuration settings and their supported values.
The command line interface help-reporting has been sped up by ~20% #6370 #6378 (by @mih)
ConfigManager now supports reading committed dataset configuration in bare repositories. Analogous to reading .datalad/config from a worktree, blob:HEAD:.datalad/config is read (e.g., the config committed in the default branch). The support includes reload() change detection using the gitsha of this file. The behavior for non-bare repositories is unchanged. #6332 (by @mih)
The CLI help generation has been sped up, and now also supports the completion of parameter values for a fixed set of choices #6415 (by @mih)
Individual command implementations can now declare a specific “on-failure” behavior by defining Interface.on_failure to be one of the supported modes (stop, continue, ignore). Previously, such a modification was only possible on a per-call basis. #6430 (by @mih)
The run command changed its default “on-failure” behavior from continue to stop. This change prevents the execution of a command in case a declared input cannot be obtained. Previously, only an error result was yielded (and run eventually yielded a non-zero exit code or an IncompleteResultsError), but the execution proceeded and potentially saved a dataset modification despite incomplete inputs, in case the command succeeded. This previous default behavior can still be achieved by calling run with the equivalent of --on-failure continue #6430 (by @mih)
The run command now provides readily executable, API-specific instructions on how to save the results of a command execution that failed expectedly #6434 (by @mih)
create-sibling --since=^ mode will now be as fast as push --since=^ to figure out for which subdatasets to create siblings #6436 (by @yarikoptic)
When file names contain illegal characters or reserved file names that are incompatible with Windows systems, a configurable check for save (datalad.save.windows-compat-warning) will either do nothing (none), emit an incompatibility warning (warning, default), or cause save to error (error) #6291 (by @adswa)
Improve responsiveness of datalad drop in datasets with a large annex. #6580 (by @christian-monch)
save code might operate faster on heavy file trees #6581 (by @yarikoptic)
Removed a per-file overhead cost for ORA when downloading over HTTP #6609 (by @bpoldrack)
A new module datalad.support.extensions offers the utility functions register_config() and has_config() that allow extension developers to announce additional configuration items to the central configuration management. #6601 (by @mih)
When operating in a dirty dataset, export-to-figshare now yields an impossible result instead of raising a RuntimeError #6543 (by @adswa)
Loading DataLad extension packages has been sped up, leading to between 2x and 4x faster run times for loading individual extensions and reporting help output across all installed extensions. #6591 (by @mih)
Introduces the configuration key datalad.ssh.executable. This key allows specifying an ssh-client executable that should be used by datalad to establish ssh-connections. The default value is ssh unless on a Windows system where $WINDIR\System32\OpenSSH\ssh.exe exists. In this case, the value defaults to $WINDIR\System32\OpenSSH\ssh.exe. #6553 (by @christian-monch)
create-sibling should perform much faster when --since is specified, because it now considers only submodules related to the changes since that point. #6528 (by @yarikoptic)
A new configuration setting datalad.ssh.try-use-annex-bundled-git=yes|no can be used to influence the default remote git-annex bundle sensing for SSH connections. This was previously done unconditionally for any call to datalad sshrun (which is also used for any SSH-related Git or git-annex functionality triggered by DataLad-internal processing) and could incur a substantial per-call runtime cost. The new default is to not perform this sensing, because for, e.g., use as GIT_SSH_COMMAND there is no expectation to have a remote git-annex installation, and even with an existing git-annex/Git bundle on the remote, it is not certain that the bundled Git version is to be preferred over any other Git installation in a user’s PATH. #6533 (by @mih)
run now yields a result record immediately after executing a command. This allows callers to use the standard --on-failure switch to control whether dataset modifications will be saved for a command that exited with an error. #6447 (by @mih)
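The default described above for datalad.ssh.executable can be pictured with a small, self-contained sketch. This is not DataLad's implementation: the function name default_ssh_executable and its injectable platform, environ, and exists parameters are assumptions made purely so the rule can be exercised on any operating system.

```python
import os
import sys
from pathlib import PureWindowsPath


def default_ssh_executable(platform=None, environ=None, exists=os.path.exists):
    """Return the default ssh client as described in the changelog entry.

    Hypothetical helper: "ssh" everywhere, except on Windows when
    $WINDIR\\System32\\OpenSSH\\ssh.exe exists.
    """
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    if platform.startswith("win"):
        windir = environ.get("WINDIR")
        if windir:
            # Build the candidate path with Windows semantics on any host OS.
            candidate = str(
                PureWindowsPath(windir, "System32", "OpenSSH", "ssh.exe")
            )
            if exists(candidate):
                return candidate
    return "ssh"
```

Injecting the platform and the existence check keeps the sketch testable without a Windows machine; a real configuration default would simply consult the running system.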
Deprecations and removals
The --pbs-runner commandline option (deprecated in 0.15.0) was removed #5981 (by @mih)
The dependency to PyGithub was dropped #5949 (by @mih)
create-sibling-github’s credential handling was trimmed down to only allow personal access tokens, because GitHub discontinued user/password based authentication #5949 (by @mih)
create-sibling-gitlab’s --dryrun parameter is deprecated in favor of --dry-run #6013 (by @adswa)
Internal obsolete GitRepo.*_submodule methods were moved to datalad-deprecated #6010 (by @mih)
datalad/support/versions.py is unused in DataLad core and removed #6115 (by @yarikoptic)
Support for the undocumented datalad.api.result-renderer config setting has been dropped #6174 (by @mih)
Undocumented use of result_renderer=None is replaced with result_renderer='disabled' #6174 (by @mih)
remove’s --recursive argument has been deprecated #6257 (by @mih)
The use of the internal helper get_repo_instance() is discontinued and deprecated #6268 (by @mih)
Support for Python 3.6 has been dropped (#6286 (by @christian-monch) and #6364 (by @yarikoptic))
All but one Singularity recipe flavor have been removed due to their limited value with the end of life of Singularity Hub #6303 (by @mih)
All code in module datalad.cmdline was (re)moved, only datalad.cmdline.helpers.get_repo_instance is kept for a deprecation period (by @mih)
datalad.interface.common_opts.eval_default has been deprecated. All (command-specific) defaults for common interface parameters can be read from Interface class attributes #6391 (by @mih)
Remove unused and untested datalad.interface.utils helpers cls2cmdlinename and path_is_under #6392 (by @mih)
An unused code path for result rendering was removed from the CLI main() #6394 (by @mih)
create-sibling now requires "^" instead of an empty string for the since option #6436 (by @yarikoptic)
run no longer raises a CommandError exception for failed commands, but yields an error result that includes a superset of the information provided by the exception. This change impacts command line usage insofar as the exit code of the underlying command is no longer relayed as the exit code of the run command call – although run continues to exit with a non-zero exit code in case of an error. For Python API users, the nature of the raised exception changes from CommandError to IncompleteResultsError, and the exception handling is now configurable using the standard on_failure command argument. The original CommandError exception remains available via the exception property of the newly introduced result record for the command execution, and this result record is available via IncompleteResultsError.failed, if such an exception is raised. #6447 (by @mih)
Custom cast helpers were removed from datalad core and migrated to a standalone repository https://github.com/datalad/screencaster #6516 (by @adswa)
The bundled parameter of get_connection_hash() is now ignored and will be removed with a future release. #6532 (by @mih)
BaseDownloader.fetch() is logging download attempts on DEBUG (previously INFO) level to avoid polluting output of higher-level commands. #6564 (by @mih)
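The change in how run reports failures, listed above, can be illustrated with stand-in classes. Nothing here imports DataLad: CommandError, IncompleteResultsError, and fake_run are local, hypothetical stand-ins. The assumption that a result record is a dict with an "exception" entry is made only to mirror the described relationship (a failed result record reachable via .failed, carrying the original exception).

```python
class CommandError(Exception):
    """Stand-in for the runner exception; carries the exit code."""

    def __init__(self, cmd, code):
        super().__init__(f"{cmd!r} failed with exit code {code}")
        self.cmd = cmd
        self.code = code


class IncompleteResultsError(Exception):
    """Stand-in for the exception raised when error results were yielded."""

    def __init__(self, failed):
        super().__init__(f"{len(failed)} result(s) failed")
        self.failed = failed


def fake_run(cmd, exit_code):
    """Hypothetical replacement for a run() call in the Python API."""
    if exit_code == 0:
        return [{"action": "run", "status": "ok"}]
    record = {
        "action": "run",
        "status": "error",
        # Assumed layout: the original exception travels with the record.
        "exception": CommandError(cmd, exit_code),
    }
    raise IncompleteResultsError(failed=[record])


def exit_codes_of_failures(cmd, exit_code):
    """Recover the underlying exit codes from the failed result records."""
    try:
        fake_run(cmd, exit_code)
    except IncompleteResultsError as exc:
        return [
            rec["exception"].code
            for rec in exc.failed
            if isinstance(rec.get("exception"), CommandError)
        ]
    return []
```

Code that previously caught CommandError directly would, under this pattern, catch the outer exception and inspect its failed records instead.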
Bug Fixes
create-sibling-gitlab erroneously overwrote existing sibling configurations. A safeguard will now prevent overwriting and exit with an error result #6015 (by @adswa)
create-sibling-gogs now relays HTTP500 errors, such as “no space left on device” #6019 (by @mih)
annotate_paths() is removed from the last parts of code base that still contained it #6128 (by @mih)
add_archive_content() doesn’t crash with --key and --use-current-dir anymore #6105 (by @adswa)
run-procedure now returns an error result when a non-existent procedure name is specified #6143 (by @mslw)
A fix for a silent failure of download-url --archive when extracting the archive #6172 (by @adswa)
Uninitialized AnnexRepos can now be dropped #6183 (by @mih)
Instead of raising an error, the formatters tests are skipped when the formatters module is not found #6212 (by @adswa)
create-sibling-gin does not disable git-annex availability on Gin remotes anymore #6230 (by @mih)
The ORA special remote messaging is fixed to not break the special remote protocol anymore and to better relay messages from exceptions to communicate underlying causes #6242 (by @mih)
A keyring.delete() call was fixed to not call an uninitialized private attribute anymore #6253 (by @bpoldrack)
An erroneous placement of result keyword arguments into a format() method instead of get_status_dict() of create-sibling-ria has been fixed #6256 (by @adswa)
status, run-procedure, and metadata are no longer swallowing result-related messages in renderers #6280 (by @mih)
uninstall now recommends the new --reckless parameter instead of the deprecated --nocheck parameter when reporting hints #6277 (by @adswa)
download-url learned to handle Path objects #6317 (by @adswa)
Restore default result rendering behavior broken by Key interface documentation #6394 (by @mih)
Fix a broken check for file presence in the ConfigManager that could have caused a crash in rare cases when a config file is removed during the process runtime #6332 (by @mih)
ConfigManager.get_from_source() now accesses the correct information when using the documented source='local', avoiding a crash #6332 (by @mih)
run no longer lets the internal call to save render its results unconditionally, but the parameterization of run determines the effective rendering format. #6421 (by @mih)
Remove an unnecessary and misleading warning from the runner #6425 (by @christian-monch)
A number of commands stopped double-reporting results #6446 (by @adswa)
create-sibling-ria no longer creates an annex/objects directory in-store, when called with --no-storage-sibling. #6495 (by @bpoldrack)
Improve error message when an invalid URL is given to clone. #6500 (by @mih)
DataLad declares a minimum version dependency to keyring >= 20.0 to ensure that token-based authentication can be used. #6515 (by @adswa)
ORA special remote tries to obtain permissions when dropping a key from a RIA store rather than just failing. Thus having the same permissions in the store’s object trees as one directly managed by git-annex would have, works just fine now. #6493 (by @bpoldrack)
require_dataset() now uniformly raises NoDatasetFound when no dataset was found. Implementations that catch the previously documented InsufficientArgumentsError or the actually raised ValueError will continue to work, because NoDatasetFound is derived from both types. #6521 (by @mih)
Keyboard-interactive authentication is now possible with non-multiplexed SSH connections (i.e., when no connection sharing is possible, due to lack of socket support, for example on Windows). Previously, it was disabled forcefully by DataLad for no valid reason. #6537 (by @mih)
Remove duplicate exception type in reporting of top-level CLI exception handler. #6563 (by @mih)
Fixes DataLad’s parsing of git-annex’ reporting on unknown paths depending on its version and the value of the annex.skipunknown config. #6550 (by @bpoldrack)
Fix ORA special remote not properly reporting on HTTP failures. #6535 (by @bpoldrack)
ORA special remote didn’t show per-file progress bars when downloading over HTTP #6609 (by @bpoldrack)
save can now commit a change where a file becomes a directory containing a file staged for commit. #6581 (by @yarikoptic)
create-sibling will no longer create siblings for not yet saved new subdatasets, and will now create sub-datasets nested in the subdatasets which did not yet have those siblings. #6603 (by @yarikoptic)
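The backward-compatibility argument made above for require_dataset() rests on multiple inheritance, which a short sketch can make concrete. The classes below are local stand-ins that only share names with the ones in the entry, and require_dataset_stub is a hypothetical function that always fails. The point is that an exception derived from two types is caught by handlers written for either.

```python
class InsufficientArgumentsError(Exception):
    """Stand-in for the previously documented exception type."""


class NoDatasetFound(InsufficientArgumentsError, ValueError):
    """Stand-in: derived from both types, as the changelog entry states."""


def require_dataset_stub(path):
    """Hypothetical helper that never finds a dataset."""
    raise NoDatasetFound(f"No dataset found at {path!r}")


def caught_as(exc_type):
    """Report whether a handler for exc_type catches the new exception."""
    try:
        require_dataset_stub("/tmp/nowhere")
    except exc_type:
        return True
    return False
```

Existing handlers for ValueError or InsufficientArgumentsError therefore keep working unchanged, while new code can catch the more specific type.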
Documentation
A new design document sheds light on result records #6167 (by @mih)
The disabled result renderer mode is documented #6174 (by @mih)
A new design document sheds light on the datalad and datalad-archives special remotes #6181 (by @mih)
A new design document sheds light on BatchedCommand and BatchedAnnex #6203 (by @christian-monch)
A new design document sheds light on standard parameters #6214 (by @adswa)
The DataLad project adopted the Contributor Covenant COC v2.1 #6236 (by @adswa)
Docstrings learned to include Sphinx’ “version added” and “deprecated” directives #6249 (by @mih)
A design document sheds light on basic docstring handling and formatting #6249 (by @mih)
A new design document sheds light on position versus keyword parameter usage #6261 (by @yarikoptic)
create-sibling-gin’s examples have been improved to suggest push as an additional step to ensure proper configuration #6289 (by @mslw)
A new document describes the credential system from a user’s perspective #5796 (by @bpoldrack)
Enhance the design document on DataLad’s credential system #5796 (by @bpoldrack)
The documentation of the configuration command now details all locations DataLad is reading configuration items from, and their respective rules of precedence #6306 (by @mih)
API docs for datalad.interface.base are now included in the documentation #6378 (by @mih)
A new design document is provided that describes the basics of the command line interface implementation #6382 (by @mih)
The datalad.interface.base.Interface class, the basis of all DataLad command implementations, has been extensively documented to provide an overview of basic principles and customization possibilities #6391 (by @mih)
--since=^ mode of operation of create-sibling is documented now #6436 (by @yarikoptic)
Internal
The internal status() helper was equipped with docstrings and promotes “breadth-first” reporting with a new parameter reporting_order #6006 (by @mih)
AnnexRepo.get_file_annexinfo() is introduced for more convenient queries for single files and replaces a now deprecated AnnexRepo.get_file_key() to receive information with fewer calls to Git #6104 (by @mih)
A new get_paths_by_ds() helper exposes status’ path normalization and sorting #6110 (by @mih)
status is optimized with a cache for dataset roots #6137 (by @yarikoptic)
The internal get_func_args_doc() helper with Python 2 is removed from DataLad core #6175 (by @yarikoptic)
Further restructuring of the source tree to better reflect the internal dependency structure of the code: AddArchiveContent is moved from datalad/interface to datalad/local (#6188 (by @mih)), Clean is moved from datalad/interface to datalad/local (#6191 (by @mih)), Unlock is moved from datalad/interface to datalad/local (#6192 (by @mih)), DownloadURL is moved from datalad/interface to datalad/local (#6217 (by @mih)), Rerun is moved from datalad/interface to datalad/local (#6220 (by @mih)), RunProcedure is moved from datalad/interface to datalad/local (#6222 (by @mih)). The interface command list is restructured and resorted #6223 (by @mih)
wrapt is replaced with functools’ wraps #6190 (by @yarikoptic)
The unmaintained appdirs library has been replaced with platformdirs #6198 (by @adswa)
Modelines mismatching the code style in source files were fixed #6263 (by @AKSoo)
datalad/__init__.py has been cleaned up #6271 (by @mih)
GitRepo.call_git_items is implemented with a generator-based runner #6278 (by @christian-monch)
Separate positional from keyword arguments in the Python API to match CLI with * #6176 (by @yarikoptic), #6304 (by @christian-monch)
GitRepo.bare does not require the ConfigManager anymore #6323 (by @mih)
_get_dot_git() was reimplemented to be more efficient and consistent, by testing for common scenarios first and introducing a consistently applied resolved flag for result path reporting #6325 (by @mih)
All data files under datalad are now included when installing DataLad #6336 (by @jwodder)
Add internal method for non-interactive provider/credential storing #5796 (by @bpoldrack)
Allow credential classes to have a context set, consisting of a URL they are to be used with and a dataset DataLad is operating on, allowing to consider “local” and “dataset” config locations #5796 (by @bpoldrack)
The Interface method get_refds_path() was deprecated #6387 (by @adswa)
datalad.interface.base.Interface is now an abstract class #6391 (by @mih)
Simplified the decision making for result rendering, and reduced code complexity #6394 (by @mih)
Reduce code duplication in datalad.support.json_py #6398 (by @mih)
Use public ArgumentParser.parse_known_args instead of protected _parse_known_args #6414 (by @yarikoptic)
add-archive-content does not rely on the deprecated tempfile.mktemp anymore, but uses the more secure tempfile.mkdtemp #6428 (by @adswa)
AnnexRepo’s internal annexstatus is deprecated. In its place, a new test helper assists the few tests that rely on it #6413 (by @adswa)
config has been refactored from where[="dataset"] to scope[="branch"] #5969 (by @yarikoptic)
Common command arguments are now uniformly and exhaustively passed to result renderers and filters for decision making. Previously, the presence of a particular argument depended on the respective API and circumstances of a command call. #6440 (by @mih)
Entrypoint processing for extensions and metadata extractors has been consolidated on a uniform helper that is about twice as fast as the previous implementations. #6591 (by @mih)
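The switch from tempfile.mktemp to tempfile.mkdtemp mentioned in the list above is worth a brief illustration. mktemp only returns a name, leaving a window in which another process can create that path first, whereas mkdtemp creates the directory itself with owner-only permissions. The sketch below uses only the standard library; the helper name make_extraction_dir and the prefix are assumptions and do not reflect how add-archive-content is actually written.

```python
import os
import shutil
import tempfile


def make_extraction_dir(prefix="datalad-sketch-"):
    """Create and return a private scratch directory (hypothetical helper)."""
    # mkdtemp creates the directory before returning, so there is no gap
    # between choosing a name and owning the path.
    return tempfile.mkdtemp(prefix=prefix)


workdir = make_extraction_dir()
try:
    created = os.path.isdir(workdir)
    # Stand-in for extracting an archive member into the scratch directory.
    with open(os.path.join(workdir, "member.txt"), "w") as handle:
        handle.write("payload")
finally:
    # The caller is responsible for cleanup; mkdtemp never removes anything.
    shutil.rmtree(workdir)
```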
Tests
A range of Windows tests pass and were enabled #6136 (by @adswa)
Invalid escape sequences in some tests were fixed #6147 (by @mih)
A cross-platform compatible HTTP-serving test environment is introduced #6153 (by @mih)
A new helper exposes serve_path_via_http to the command line to deploy an ad-hoc instance of the HTTP server used for internal testing, with SSL and auth, if desired. #6169 (by @mih)
Windows tests were redistributed across worker runs to harmonize runtime #6200 (by @adswa)
BatchedCommand gained a basic test #6203 (by @christian-monch)
The use of with_testrepo is discontinued in all core tests #6224 (by @mih)
The new git-annex.filter.annex.process configuration is enabled by default on Windows to speed up the test suite #6245 (by @mih)
If the available Git version supports it, the test suite now uses GIT_CONFIG_GLOBAL to configure a fake home directory instead of overwriting HOME on OSX (#6251 (by @bpoldrack)) and HOME and USERPROFILE on Windows #6260 (by @adswa)
Windows test timeouts of runners were addressed #6311 (by @christian-monch)
A handful of Windows tests were fixed (#6352 (by @yarikoptic)) or disabled (#6353 (by @yarikoptic))
download-url’s tests under http_proxy are skipped when a session can’t be established #6361 (by @yarikoptic)
A test for datalad clean was fixed to be invoked within a dataset #6359 (by @yarikoptic)
The new datalad.cli.tests have an improved module coverage of 80% #6378 (by @mih)
The test_source_candidate_subdataset has been marked as @slow #6429 (by @yarikoptic)
Dedicated CLI benchmarks exist now #6381 (by @mih)
Enable code coverage report for subprocesses #6546 (by @adswa)
Skip a test on annex>=10.20220127 due to a bug in annex. See https://git-annex.branchable.com/bugs/Change_to_annex.largefiles_leaves_repo_modified/
Infra
A new issue template using GitHub forms prestructures bug reports #6048 (by @Remi-Gau)
DataLad and its dependency stack were packaged for Gentoo Linux #6088 (by @TheChymera)
The readthedocs configuration is modernized to version 2 #6207 (by @adswa)
The Windows CI setup now runs on Appveyor’s Visual Studio 2022 configuration #6228 (by @adswa)
The readthedocs-theme and Sphinx versions were pinned to re-enable rendering of bullet points in the documentation #6346 (by @adswa)
The PR template was updated with a CHANGELOG template. Future PRs should use it to include a summary for the CHANGELOG #6396 (by @mih)
0.15.6 (Sun Feb 27 2022)
Bug Fix
BF: do not use BaseDownloader instance wide InterProcessLock - resolves stalling or errors during parallel installs #6507 (@yarikoptic)
release workflow: add -vv to auto invocation (@yarikoptic)
Fix version incorrectly incremented by release process in CHANGELOGs #6459 (@yarikoptic)
BF(TST): add another condition to skip under http_proxy set #6459 (@yarikoptic)
0.15.5 (Wed Feb 09 2022)
Enhancement
Bug Fix
Fix AnnexRepo.whereis key=True mode operation, and add batch mode support #6379 (@yarikoptic)
DOC: run - adjust description for -i/-o to mention that it could be a directory #6416 (@yarikoptic)
BF: ORA over HTTP tried to check archive #6355 (@bpoldrack @yarikoptic)
BF: condition access to isatty to have stream eval to True #6360 (@yarikoptic)
BF: python 3.10 compatibility fixes #6363 (@yarikoptic)
Warn just once about incomplete git config #6343 (@yarikoptic)
Make version detection robust to GIT_DIR specification #6341 (@effigies @mih)
BF(Q&D): do not crash - issue warning - if template fails to format #6319 (@yarikoptic)
0.15.4 (Thu Dec 16 2021)
Bug Fix
BF: autorc - replace incorrect releaseTypes with “none” #6320 (@yarikoptic)
Minor enhancement to CONTRIBUTING.md #6309 (@bpoldrack)
UX: If a clean repo is dirty after a failed run, give clean-up hints #6112 (@adswa)
BF: RIARemote - set UI backend to annex to make it interactive #6287 (@yarikoptic @bpoldrack)
CI: Update environment for windows CI builds #6292 (@bpoldrack)
bump the python version used for mac os tests #6288 (@christian-monch @bpoldrack)
ENH(UX): log a hint to use ulimit command in case of “Too long” exception #6173 (@yarikoptic)
BF: Don’t overwrite subdataset source candidates #6168 (@bpoldrack)
Bump sphinx requirement to bypass readthedocs defaults #6189 (@mih)
infra: Provide custom prefix to auto-related labels #6151 (@adswa)
BF: obtain information about annex special remotes also from annex journal #6135 (@yarikoptic @mih)
BF: clone tried to save new subdataset despite failing to clone #6140 (@bpoldrack)
Tests
RF+BF: use skip_if_no_module helper instead of try/except for libxmp and boto #6148 (@yarikoptic)
0.15.3 (Sat Oct 30 2021)
Bug Fix
BF: Don’t make create-sibling recursive by default #6116 (@adswa)
BF: Add dashes to ‘force’ option in non-empty directory error message #6078 (@DisasterMo)
DOC: Add supported URL types to download-url’s docstring #6098 (@adswa)
BF: Retain git-annex error messages & don’t show them if operation successful #6070 (@DisasterMo)
Remove uses of __full_version__ and datalad.version #6073 (@jwodder)
BF: ORA shouldn’t crash while handling a failure #6063 (@bpoldrack)
DOC: Refine --reckless docstring on usage and wording #6043 (@adswa)
BF: archives upon strip - use rmtree which retries etc instead of rmdir #6064 (@yarikoptic)
BF: do not leave test in a tmp dir destined for removal #6059 (@yarikoptic)
Pushed to maint
CI: Enable new codecov uploader in Appveyor CI (@adswa)
Internal
Documentation
Tests
BF(TST): remove reuse of the same tape across unrelated tests #6127 (@yarikoptic)
Ux get result handling broken #6052 (@christian-monch)
enable metalad tests again #6060 (@christian-monch)
0.15.2 (Wed Oct 06 2021)
Bug Fix
BF: Don’t suppress datalad subdatasets output #6035 (@DisasterMo @mih)
Honor datalad.runtime.use-patool if set regardless of OS (was Windows only) #6033 (@mih)
Discontinue usage of deprecated (public) helper #6032 (@mih)
BF: ProgressHandler - close the other handler if was specified #6020 (@yarikoptic)
UX: Report GitLab weburl of freshly created projects in the result #6017 (@adswa)
Ensure there’s a blank line between the class __doc__ and “Parameters” in build_doc docstrings #6004 (@jwodder)
Large code-reorganization of everything runner-related #6008 (@mih)
Discontinue exc_str() in all modern parts of the code base #6007 (@mih)
Tests
TST: Add test to ensure functionality with subdatasets starting with a hyphen (-) #6042 (@DisasterMo)
BF(TST): filter away warning from coverage from analysis of stderr of --help #6028 (@yarikoptic)
BF: disable outdated SSL root certificate breaking chain on older/buggy clients #6027 (@yarikoptic)
BF: start global test_http_server only if not running already #6023 (@yarikoptic)
0.15.1 (Fri Sep 24 2021)
Bug Fix
BF: downloader - fail to download even on non-crippled FS if symlink exists #5991 (@yarikoptic)
ENH: import datalad.api to bind extensions methods for discovery of dataset methods #5999 (@yarikoptic)
Pushed to maint
Discontinue testing of hirni extension (@mih)
Internal
Documentation
Tests
BF(TST): use sys.executable, mark test_ria_basics.test_url_keys as requiring network #5986 (@yarikoptic)
0.15.0 (Tue Sep 14 2021) – We miss you Kyle!
Enhancements and new features
Command execution is now performed by a new Runner implementation that is no longer based on the asyncio framework, which was found to exhibit fragile performance in interaction with other asyncio-using code, such as Jupyter notebooks. The new implementation is based on threads. It also supports the specification of “protocols” that were introduced with the switch to the asyncio implementation in 0.14.0. (#5667)
clone now supports arbitrary URL transformations based on regular expressions. One or more transformation steps can be defined via datalad.clone.url-substitute.<label> configuration settings. The feature can be (and is now) used to support convenience mappings, such as https://osf.io/q8xnk/ (displayed in a browser window) to osf://q8xnk (clonable via the datalad-osf extension). (#5749)
Homogenize SSH use and configurability between DataLad and git-annex, by instructing git-annex to use DataLad’s sshrun for SSH calls (instead of SSH directly). (#5389)
The ORA special remote has received several new features:
It now supports a push-url setting as an alternative to url for write access. An analogous parameter was also added to create-sibling-ria. (#5420, #5428)
Access of RIA stores now performs homogeneous availability checks, regardless of access protocol. Before, broken HTTP-based access due to misspecified URLs could have gone unnoticed. (#5459, #5672)
Error reporting was introduced to inform about undesirable conditions in remote RIA stores. (#5683)
create-sibling-ria now supports --alias for the specification of a convenience dataset alias name in a RIA store. (#5592)
Analog to git commit, save now features an --amend mode to support incremental updates of a dataset state. (#5430)
run now supports a dry-run mode that can be used to inspect the result of parameter expansion on the effective command to ease the composition of more complicated command lines. (#5539)
run now supports a --assume-ready switch to avoid the (possibly expensive) preparation of inputs and outputs with large datasets that have already been readied through other means. (#5431)
update now features --how and --how-subds parameters to configure how an update shall be performed. Supported modes are fetch (unchanged default), and merge (previously also possible via --merge), but also new strategies like reset or checkout. (#5534)
update has a new --follow=parentds-lazy mode that only performs a fetch operation in subdatasets when the desired commit is not yet present. During recursive updates involving many subdatasets this can substantially speed up performance. (#5474)
DataLad’s command line API can now report the version for individual commands via datalad <cmd> --version. The output has been homogenized to <providing package> <version>. (#5543)
create-sibling now logs information on an auto-generated sibling name, in the case that no --name/-s was provided. (#5550)
create-sibling-github has been updated to emit result records like any standard DataLad command. Previously it was implemented as a “plugin”, which did not support all standard API parameters. (#5551)
copy-file now also works with content-less files in datasets on crippled filesystems (adjusted mode), when a recent enough git-annex (8.20210428 or later) is available. (#5630)
addurls can now be instructed how to behave in the event of file name collision via a new parameter --on-collision. (#5675)
addurls reporting now informs which particular subdatasets were created. (#5689)
Credentials can now be provided or overwritten via all means supported by ConfigManager. Importantly, datalad.credential.<name>.<field> configuration settings and analog specification via environment variables are now supported (rather than custom environment variables only). Previous specification methods are still supported too. (#5680)
A new datalad.credentials.force-ask configuration flag can now be used to force re-entry of already known credentials. This simplifies credential updates without having to use an approach native to individual credential stores. (#5777)
Suppression of rendering repeated similar results is now configurable via the configuration switches datalad.ui.suppress-similar-results (bool), and datalad.ui.suppress-similar-results-threshold (int). (#5681)
The performance of status and similar functionality when determining local file availability has been improved. (#5692)
push now renders a result summary on completion. (#5696)
A dedicated info log message indicates when dataset repositories are subjected to an annex version upgrade. (#5698)
Error reporting improvements:
The NoDatasetFound exception now provides information for which purpose a dataset is required. (#5708)
Wording of the MissingExternalDependency error was rephrased to account for cases of non-functional installations. (#5803)
push reports when a --to parameter specification was (likely) forgotten. (#5726)
Detailed information is now given when DataLad fails to obtain a lock for credential entry in a timely fashion. Previously only a generic debug log message was emitted. (#5884)
Clarified error message when create-sibling-gitlab was called without --project. (#5907)
add-readme now provides a README template with more information on the nature and use of DataLad datasets. A README file is no longer annex’ed by default, but can be using the new --annex switch. (#5723, #5725)
clean now supports a --dry-run mode to inform about cleanable content. (#5738)
A new configuration setting datalad.locations.locks can be used to control the placement of lock files. (#5740)
wtf now also reports branch names and states. (#5804)
AnnexRepo.whereis() now supports batch mode. (#5533)
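As an illustration of the regular-expression URL transformations that clone gained in this release, the following minimal sketch applies an ordered list of substitution rules to a URL. The rule list, the function name substitute_url, and the pattern are hypothetical and do not reproduce DataLad's configuration syntax or implementation; only the OSF mapping itself is taken from the entry above.

```python
import re

# Hypothetical rule list: (pattern, replacement) pairs, applied in order.
# Only the OSF browser-URL -> osf:// mapping comes from the changelog entry.
RULES = [
    (r"^https://osf\.io/([^/]+)/?$", r"osf://\1"),
]


def substitute_url(url, rules=RULES):
    """Apply each regex substitution step in turn and return the result."""
    for pattern, replacement in rules:
        url = re.sub(pattern, replacement, url)
    return url


print(substitute_url("https://osf.io/q8xnk/"))
```

URLs that match no rule pass through unchanged, which is why several transformation steps can be chained safely.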
Deprecations and removals
The minimum supported git-annex version is now 8.20200309. (#5512)
ORA special remote configuration items ssh-host, and base-path are deprecated. They are completely replaced by ria+<protocol>:// URL specifications. (#5425)
The deprecated no_annex parameter of create() was removed from the Python API. (#5441)
The unused GitRepo.pull() method has been removed. (#5558)
Residual support for “plugins” (a mechanism used before DataLad supported extensions) was removed. This includes the configuration switches datalad.locations.{system,user}-plugins. (#5554, #5564)
Several features and comments have been moved to the datalad-deprecated package. This package must now be installed to keep using this functionality.
AnnexRepo.copy_to() has been deprecated. The push command should be used instead. (#5560)
AnnexRepo.sync() has been deprecated. AnnexRepo.call_annex(['sync', ...]) should be used instead. (#5461)
All GitRepo.*_submodule() methods have been deprecated and will be removed in a future release. (#5559)
create-sibling-github’s --dryrun switch was deprecated, use --dry-run instead. (#5551)
The datalad --pbs-runner option has been deprecated, use condor_run (or similar) instead. (#5956)
Fixes
Prevent invalid declaration of a publication dependency for ‘origin’ on any auto-detected ORA special remotes, when cloning from a RIA store. An ORA remote is now checked whether it actually points to the RIA store the clone was made from. (#5415)
The ORA special remote implementation has received several fixes:
It is now possible to specifically select the default (or generic) result renderer via datalad -f default and with that override a tailored result renderer that may be preconfigured for a particular command. (#5476)
Starting with 0.14.0, original URLs given to clone were recorded in a subdataset record. This was initially done in a second commit, leading to inflation of commits and slowdown in superdatasets with many subdatasets. Such subdataset record annotation is now collapsed into a single commit. (#5480)
run no longer removes leading empty directories as part of the output preparation. This was surprising behavior for commands that do not ensure on their own that output directories exist. (#5492)
A potentially existing message property is no longer removed when using the json or json_pp result renderer to avoid undesired withholding of relevant information. (#5536)
subdatasets now reports state=present, rather than state=clean, for installed subdatasets to complement state=absent reports for uninstalled datasets. (#5655)
create-sibling-ria now executes commands with a consistent environment setup that matches all other command execution in other DataLad commands. (#5682)
save no longer saves unspecified subdatasets when called with an explicit path (list). The fix required a behavior change of GitRepo.get_content_info() in its interpretation of None vs. [] path argument values that now aligns the behavior of GitRepo.diff|status() with their respective documentation. (#5693)
get now prefers the location of a subdataset that is recorded in a superdataset’s .gitmodules record. Previously, DataLad tried to obtain a subdataset from an assumed checkout of the superdataset’s origin. This new default order is (re-)configurable via the datalad.get.subdataset-source-candidate-<priority-label> configuration mechanism. (#5760)
create-sibling-gitlab no longer skips the root dataset when . is given as a path. (#5789)
siblings now rejects a value given to --as-common-datasrc that clashes with the respective Git remote. (#5805)
The usage synopsis reported by siblings now lists all supported actions. (#5913)
siblings now renders non-ok results to avoid silent failure. (#5915)
.gitattributes file manipulations no longer leave the file without a trailing newline. (#5847)
Prevent crash when trying to delete a non-existing keyring credential field. (#5892)
git-annex is no longer called with an unconditional annex.retry=3 configuration. Instead, this parameterization is now limited to annex get and annex copy calls. (#5904)
Tests
file:// URLs are no longer the predominant test case for AnnexRepo functionality. A built-in HTTP server is now used in most cases. (#5332)
0.14.8 (Sun Sep 12 2021)
Bug Fix
BF: add-archive-content on .xz and other non-.gz stream compressed files #5930 (@yarikoptic)
BF(UX): do not keep logging ERROR possibly present in progress records #5936 (@yarikoptic)
Annotate datalad_core as not needing actual data – just uses annex whereis #5971 (@yarikoptic)
BF: limit CMD_MAX_ARG if obnoxious value is encountered. #5945 (@yarikoptic)
Download session/credentials locking – inform user if locking is “failing” to be obtained, fail upon ~5min timeout #5884 (@yarikoptic)
Render siblings()’s non-ok results with the default renderer #5915 (@mih)
BF: do not crash, just skip whenever trying to delete non existing field in the underlying keyring #5892 (@yarikoptic)
Fix argument-spec for siblings and improve usage synopsis #5913 (@mih)
Clarify error message re unspecified gitlab project #5907 (@mih)
Support username, password and port specification in RIA URLs #5902 (@mih)
BF: take path from SSHRI, test URLs not only on Windows #5881 (@yarikoptic)
ENH(UX): warn user if keyring returned a “null” keyring #5875 (@yarikoptic)
ENH(UX): state original purpose in NoDatasetFound exception + detail it for get #5708 (@yarikoptic)
Pushed to maint
Merge branch ‘bf-http-headers-agent’ into maint (@yarikoptic)
RF(BF?)+DOC: provide User-Agent to entire session headers + use those if provided (@yarikoptic)
Internal
Pass --no-changelog to auto shipit if changelog already has entry #5952 (@jwodder)
Add isort config to match current convention + run isort via pre-commit (if configured) #5923 (@jwodder)
.travis.yml: use python -m {nose,coverage} invocations, and always show combined report #5888 (@yarikoptic)
Add project URLs into the package metadata for convenience links on Pypi #5866 (@adswa @yarikoptic)
Tests
BF: do use OBSCURE_FILENAME instead of hardcoded unicode #5944 (@yarikoptic)
BF(TST): Skip testing for having PID listed if no psutil #5920 (@yarikoptic)
BF(TST): Boost version of git-annex to 8.20201129 to test an error message #5894 (@yarikoptic)
0.14.7 (Tue Aug 03 2021)
Bug Fix
UX: When two or more clone URL templates are found, error out more gracefully #5839 (@adswa)
BF: http_auth - follow redirect (just 1) to re-authenticate after initial attempt #5852 (@yarikoptic)
addurls Formatter - provide value repr in exception #5850 (@yarikoptic)
ENH: allow for “patch” level semver for “master” branch #5839 (@yarikoptic)
BF: Report info from annex JSON error message in CommandError #5809 (@mih)
RF(TST): do not test for no EASY and pkg_resources in shims #5817 (@yarikoptic)
http downloaders: Provide custom informative User-Agent, do not claim to be “Authenticated access” #5802 (@yarikoptic)
ENH(UX,DX): inform user with a warning if version is 0+unknown #5787 (@yarikoptic)
shell-completion: add argcomplete to ‘misc’ extra_depends, log an ERROR if argcomplete fails to import #5781 (@yarikoptic)
ENH (UX): add python-gitlab dependency #5776 (s.heunis@fz-juelich.de)
Internal
BF: import importlib.metadata not importlib_metadata whenever available #5818 (@yarikoptic)
Tests
TST: set --allow-unrelated-histories in the mk_push_target setup for Windows #5855 (@adswa)
Tests: Allow for version to contain + as a separator and provide more information for version related comparisons #5786 (@yarikoptic)
0.14.6 (Sun Jun 27 2021)
Internal
BF: update changelog conversion from .md to .rst (for sphinx) #5757 (@yarikoptic @jwodder)
0.14.5 (Mon Jun 21 2021)
Bug Fix
BF(TST): parallel - take longer for producer to produce #5747 (@yarikoptic)
add --on-failure default value and document it #5690 (@christian-monch @yarikoptic)
ENH: harmonize “purpose” statements to imperative form #5733 (@yarikoptic)
ENH(TST): populate heavy tree with 100 unique keys (not just 1) among 10,000 #5734 (@yarikoptic)
BF: do not use .acquired - just get state from acquire() #5718 (@yarikoptic)
BF: account for annex now “scanning for annexed” instead of “unlocked” files #5705 (@yarikoptic)
interface: Don’t repeat custom summary for non-generator results #5688 (@kyleam)
RF: just pip install datalad-installer #5676 (@yarikoptic)
DOC: addurls.extract: Drop mention of removed ‘stream’ parameter #5690 (@kyleam)
Merge pull request #5674 from kyleam/test-addurls-copy-fix #5674 (@kyleam)
Merge pull request #5663 from kyleam/status-ds-equal-path #5663 (@kyleam)
Merge pull request #5671 from kyleam/update-fetch-fail #5671 (@kyleam)
BF: update: Honor --on-failure if fetch fails #5671 (@kyleam)
Merge pull request #5664 from kyleam/addurls-better-url-parts-error #5664 (@kyleam)
Merge pull request #5661 from kyleam/sphinx-fix-plugin-refs #5661 (@kyleam)
BF: status: Provide special treatment of “this dataset” path #5663 (@kyleam)
BF: addurls: Provide better placeholder error for special keys #5664 (@kyleam)
RF: addurls: Simply construction of placeholder exception message #5664 (@kyleam)
RF: addurls._get_placeholder_exception: Rename a parameter #5664 (@kyleam)
RF: status: Avoid repeated Dataset.path access #5663 (@kyleam)
download-url: Set up datalad special remote if needed #5648 (@kyleam @yarikoptic)
Pushed to maint
MNT: Post-release dance (@kyleam)
Internal
Switch to versioneer and auto #5669 (@jwodder @yarikoptic)
Tests
BF(TST): skip testing for showing “Scanning for …” since not shown if too quick #5727 (@yarikoptic)
Revert “TST: test_partial_unlocked: Document and avoid recent git-annex failure” #5651 (@kyleam)
0.14.4 (May 10, 2021) – .
Fixes
0.14.3 (April 28, 2021) – .
Fixes
For outputs that include a glob, run didn’t re-glob after executing the command, which is necessary to catch changes if --explicit or --expand={outputs,both} is specified. (#5594)
run now gives an error result rather than a warning when an input glob doesn’t match. (#5594)
The procedure for creating a RIA store checks for an existing ria-layout-version file and makes sure its version matches the desired version. This check wasn’t done correctly for SSH hosts. (#5607)
A helper for transforming git-annex JSON records into DataLad results didn’t account for the unusual case where the git-annex record doesn’t have a “file” key. (#5580)
The test suite required updates for recent changes in PyGithub and git-annex. (#5603) (#5609)
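The re-globbing fix for run listed above can be pictured with a small sketch: output patterns are expanded once before the command and again afterwards, so that files created by the command are picked up. The helper names expand_globs and run_and_collect_outputs are hypothetical, and the command is modelled as a plain Python callable; this is not DataLad's implementation.

```python
import glob
import os
import tempfile


def expand_globs(patterns, root):
    """Return sorted matches for each pattern, relative to root."""
    matches = []
    for pattern in patterns:
        hits = glob.glob(os.path.join(root, pattern))
        matches.extend(sorted(os.path.relpath(h, root) for h in hits))
    return matches


def run_and_collect_outputs(command, output_patterns, root):
    """Run `command` (a callable here), then glob again for its outputs."""
    before = expand_globs(output_patterns, root)
    command()
    after = expand_globs(output_patterns, root)  # second pass sees new files
    return before, after


def demo():
    """Create a file from within the 'command' and compare both expansions."""
    with tempfile.TemporaryDirectory() as root:
        def command():
            with open(os.path.join(root, "result.csv"), "w") as f:
                f.write("a,b\n")
        return run_and_collect_outputs(command, ["*.csv"], root)
```

Relying only on the first expansion would miss result.csv, which did not exist before the command ran.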
Enhancements and new features
The DataLad source repository has long had a tools/cmdline-completion helper. This functionality is now exposed as a command, datalad shell-completion. (#5544)
0.14.2 (April 14, 2021) – .
Fixes
push now works bottom-up, pushing submodules first so that hooks on the remote can aggregate updated subdataset information. (#5416)
run-procedure didn’t ensure that the configuration of subdatasets was reloaded. (#5552)
0.14.1 (April 01, 2021) – .
Fixes
The recent default branch changes on GitHub’s side can lead to “git-annex” being selected over “master” as the default branch on GitHub when setting up a sibling with create-sibling-github. To work around this, the current branch is now pushed first. (#5010)
The logic for reading in a JSON line from git-annex failed if the response exceeded the buffer size (256 KB on *nix systems).
Calling unlock with a path of “.” from within an untracked subdataset incorrectly aborted, complaining that the “dataset containing given paths is not underneath the reference dataset”. (#5458)
clone didn’t account for the possibility of multiple accessible ORA remotes or the fact that none of them may be associated with the RIA store being cloned. (#5488)
create-sibling-ria didn’t call git update-server-info after setting up the remote repository and, as a result, the repository couldn’t be fetched until something else (e.g., a push) triggered a call to git update-server-info. (#5531)
The parser for git-config output didn’t properly handle multi-line values and got thrown off by unexpected and unrelated lines. (#5509)
The 0.14 release introduced regressions in the handling of progress bars for git-annex actions, including collapsing progress bars for concurrent operations. (#5421) (#5438)
save failed if the user configured Git’s diff.ignoreSubmodules to a non-default value. (#5453)
An interprocess lock is now used to prevent a race between checking for an SSH socket’s existence and creating it. (#5466)
If a Python procedure script is executable, run-procedure invokes it directly rather than passing it to sys.executable. The non-executable Python procedures that ship with DataLad now include shebangs so that invoking them has a chance of working on file systems that present all files as executable. (#5436)
DataLad’s wrapper around argparse failed if an underscore was used in a positional argument. (#5525)
Enhancements and new features
DataLad’s method for mapping environment variables to configuration options (e.g., DATALAD_FOO_X__Y to datalad.foo.x-y) doesn’t work if the subsection name (“FOO”) has an underscore. This limitation can be sidestepped with the new DATALAD_CONFIG_OVERRIDES_JSON environment variable, which can be set to a JSON record of configuration values. (#5505)
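The naming rule and the JSON override described above can be sketched as follows. This is a minimal illustration based on the example in the entry (DATALAD_FOO_X__Y maps to datalad.foo.x-y), not DataLad's actual implementation. The function names are hypothetical, and the choice to let individual variables override the JSON record is an assumption made only for this sketch.

```python
import json

JSON_VAR = "DATALAD_CONFIG_OVERRIDES_JSON"


def env_to_config_name(var):
    """Map DATALAD_FOO_X__Y to datalad.foo.x-y.

    A double underscore stands for "-", a single one for ".". A subsection
    that itself contains "_" therefore cannot be expressed this way.
    """
    return var.lower().replace("__", "-").replace("_", ".")


def collect_overrides(environ):
    """Combine the JSON record with individually named variables."""
    overrides = dict(json.loads(environ.get(JSON_VAR, "{}")))
    for name, value in environ.items():
        if name.startswith("DATALAD_") and name != JSON_VAR:
            # Assumption of this sketch: individual variables win.
            overrides[env_to_config_name(name)] = value
    return overrides


env = {
    "DATALAD_FOO_X__Y": "1",
    JSON_VAR: '{"datalad.my_section.x-y": "2"}',
}
print(collect_overrides(env))
```

The JSON record carries the configuration name verbatim, so a subsection such as my_section survives, whereas the variable-name mapping would have turned its underscore into a dot.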
0.14.0 (February 02, 2021) – .
Major refactoring and deprecations
Git versions below v2.19.1 are no longer supported. (#4650)
The minimum git-annex version is still 7.20190503, but, if you’re on Windows (or use adjusted branches in general), please upgrade to at least 8.20200330 but ideally 8.20210127 to get subdataset-related fixes. (#4292) (#5290)
The minimum supported version of Python is now 3.6. (#4879)
publish is now deprecated in favor of push. It will be removed in the 0.15.0 release at the earliest.
A new command runner was added in v0.13. Functionality related to the old runner has now been removed: Runner, GitRunner, and run_gitcommand_on_file_list_chunks from the datalad.cmd module along with the datalad.tests.protocolremote, datalad.cmd.protocol, and datalad.cmd.protocol.prefix configuration options. (#5229)
The --no-storage-sibling switch of create-sibling-ria is deprecated in favor of --storage-sibling=off and will be removed in a later release. (#5090)
The get_git_dir static method of GitRepo is deprecated and will be removed in a later release. Use the dot_git attribute of an instance instead. (#4597)
The ProcessAnnexProgressIndicators helper from datalad.support.annexrepo has been removed. (#5259)
The save argument of install, a noop since v0.6.0, has been dropped. (#5278)
The get_URLS method of AnnexCustomRemote is deprecated and will be removed in a later release. (#4955)
ConfigManager.get now returns a single value rather than a tuple when there are multiple values for the same key, as very few callers correctly accounted for the possibility of a tuple return value. Callers can restore the old behavior by passing get_all=True. (#4924)
In 0.12.0, all of the assure_* functions in datalad.utils were renamed as ensure_*, keeping the old names around as compatibility aliases. The assure_* variants are now marked as deprecated and will be removed in a later release. (#4908)
The datalad.interface.run module, which was deprecated in 0.12.0 and kept as a compatibility shim for datalad.core.local.run, has been removed. (#4583)
The saver argument of datalad.core.local.run.run_command, marked as obsolete in 0.12.0, has been removed. (#4583)
The dataset_only argument of the ConfigManager class was deprecated in 0.12 and has now been removed. (#4828)
The linux_distribution_name, linux_distribution_release, and on_debian_wheezy attributes in datalad.utils are no longer set at import time and will be removed in a later release. Use datalad.utils.get_linux_distribution instead. (#4696)
datalad.distribution.clone, which was marked as obsolete in v0.12 in favor of datalad.core.distributed.clone, has been removed. (#4904)
datalad.support.annexrepo.N_AUTO_JOBS, announced as deprecated in v0.12.6, has been removed. (#4904)
The compat parameter of GitRepo.get_submodules, added in v0.12 as a temporary compatibility layer, has been removed. (#4904)
The long-deprecated (and non-functional) url parameter of GitRepo.__init__ has been removed. (#5342)
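The change to ConfigManager.get listed above (a single value by default, all values with get_all=True) can be pictured with a toy lookup function. This is a sketch of the described semantics only: get_value and the store layout are hypothetical, and "last value wins" is assumed here to mirror what git config --get does.

```python
def get_value(store, key, default=None, get_all=False):
    """Look up `key` in `store`, a dict mapping keys to lists of values.

    Values are kept in the order they were read. By default only the last
    one is returned; with get_all=True multiple values come back as a tuple.
    """
    values = store.get(key)
    if not values:
        return default
    if get_all:
        return tuple(values) if len(values) > 1 else values[0]
    return values[-1]


store = {"remote.origin.url": ["https://example.com/a", "https://example.com/b"]}
print(get_value(store, "remote.origin.url"))
print(get_value(store, "remote.origin.url", get_all=True))
```

Callers that only ever expect a string no longer need to special-case a tuple, while code that needs every value opts in explicitly.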
Fixes
Cloning onto a system that enters adjusted branches by default (as Windows does) did not properly record the clone URL. (#5128)
The RIA-specific handling after calling clone was correctly triggered by ria+http URLs but not ria+https URLs. (#4977)
If the registered commit wasn’t found when cloning a subdataset, the failed attempt was left around. (#5391)
The remote calls to cp and chmod in create-sibling were not portable and failed on macOS. (#5108)
A more reliable check is now done to decide if configuration files need to be reloaded. (#5276)
The internal command runner’s handling of the event loop has been improved to play nicer with outside applications and scripts that use asyncio. (#5350) (#5367)
Enhancements and new features
The subdataset handling for adjusted branches, which is particularly important on Windows where git-annex enters an adjusted branch by default, has been improved. A core piece of the new approach is registering the commit of the primary branch, not its checked out adjusted branch, in the superdataset. Note: This means that git status will always consider a subdataset on an adjusted branch as dirty while datalad status will look more closely and see if the tip of the primary branch matches the registered commit. (#5241)
The performance of the subdatasets command has been improved, with substantial speedups for recursive processing of many subdatasets. (#4868) (#5076)
get, save, and addurls gained support for parallel operations that can be enabled via the --jobs command-line option or the new datalad.runtime.max-jobs configuration option. (#5022)
addurls learned how to read data from standard input. (#4669)
addurls now supports tab-separated input. (#4845)
addurls now lets Python callers pass in a list of records rather than a file name. (#5285)
addurls gained a --drop-after switch that signals to drop a file’s content after downloading and adding it to the annex. (#5081)
addurls is now able to construct a tree of files from known checksums without downloading content via its new --key option. (#5184)
addurls records the URL file in the commit message as provided by the caller rather than using the resolved absolute path. (#5091)
create-sibling-github learned how to create private repositories (thanks to Nolan Nichols). (#4769)
create-sibling-ria gained a --storage-sibling option. When --storage-sibling=only is specified, the storage sibling is created without an accompanying Git sibling. This enables using hosts without Git installed for storage. (#5090)
The download machinery (and thus the datalad special remote) gained support for a new scheme, shub://, which follows the same format used by singularity run and friends. In contrast to the short-lived URLs obtained by querying Singularity Hub directly, shub:// URLs are suitable for registering with git-annex. (#4816)
A provider is now included for https://registry-1.docker.io URLs. This is useful for storing an image’s blobs in a dataset and registering the URLs with git-annex. (#5129)
The add-readme command now links to the DataLad handbook rather than http://docs.datalad.org. (#4991)
New option datalad.locations.extra-procedures specifies an additional location that should be searched for procedures. (#5156)
The class for handling configuration values, ConfigManager, now takes a lock before writes to allow for multiple processes to modify the configuration of a dataset. (#4829)
clone now records the original, unresolved URL for a subdataset under submodule.<name>.datalad-url in the parent’s .gitmodules, enabling later get calls to use the original URL. This is particularly useful for ria+ URLs. (#5346)
Installing a subdataset now uses custom handling rather than calling git submodule update --init. This avoids some locking issues when running get in parallel and enables more accurate source URLs to be recorded. (#4853)
GitRepo.get_content_info, a helper that gets triggered by many commands, got faster by tweaking its git ls-files call. (#5067)
wtf now includes credentials-related information (e.g. active backends) in its output. (#4982)
The call_git* methods of GitRepo now have a read_only parameter. Callers can set this to True to promise that the provided command does not write to the repository, bypassing the cost of some checks and locking. (#5070)
New call_annex* methods in the AnnexRepo class provide an interface for running git-annex commands similar to that of the GitRepo.call_git* methods. (#5163)
It’s now possible to register a custom metadata indexer that is discovered by search and used to generate an index. (#4963)
The ConfigManager methods get, getbool, getfloat, and getint now return a single value (with same precedence as git config --get) when there are multiple values for the same key (in the non-committed git configuration, if the key is present there, or in the dataset configuration). For get, the old behavior can be restored by specifying get_all=True. (#4924)
Command-line scripts are now defined via the entry_points argument of setuptools.setup instead of the scripts argument. (#4695)
Interactive use of --help on the command-line now invokes a pager on more systems and installation setups. (#5344)
The datalad special remote now tries to eliminate some unnecessary interactions with git-annex by being smarter about how it queries for URLs associated with a key. (#4955)
The GitRepo class now does a better job of handling bare repositories, a step towards bare repositories support in DataLad. (#4911)
More internal work to move the code base over to the new command runner. (#4699) (#4855) (#4900) (#4996) (#5002) (#5141) (#5142) (#5229)
0.13.7 (January 04, 2021) – .
Fixes
Cloning from a RIA store on the local file system initialized annex in the Git sibling of the RIA source, which is problematic because all annex-related functionality should go through the storage sibling. clone now sets remote.origin.annex-ignore to true after cloning from RIA stores to prevent this. (#5255)
create-sibling invoked cp in a way that was not compatible with macOS. (#5269)
Due to a bug in older Git versions (before 2.25), calling status with a file under .git/ (e.g., datalad status .git/config) incorrectly reported the file as untracked. A workaround has been added. (#5258)
Update tests for compatibility with latest git-annex. (#5254)
Enhancements and new features
0.13.6 (December 14, 2020) – .
Fixes
An assortment of fixes for Windows compatibility. (#5113) (#5119) (#5125) (#5127) (#5136) (#5201) (#5200) (#5214)
Adding a subdataset on a system that defaults to using an adjusted branch (i.e. doesn’t support symlinks) didn’t properly set up the submodule URL if the source dataset was not in an adjusted state. (#5127)
push failed to push to a remote that did not have an annex-uuid value in the local .git/config. (#5148)
The default renderer has been improved to avoid a spurious leading space, which led to the displayed path being incorrect in some cases. (#5121)
siblings showed an uninformative error message when asked to configure an unknown remote. (#5146)
drop confusingly relayed a suggestion from git annex drop to use --force, an option that does not exist in datalad drop. (#5194)
create-sibling-github no longer offers user/password authentication because it is no longer supported by GitHub. (#5218)
The internal command runner’s handling of the event loop has been tweaked to hopefully fix issues with running DataLad from IPython. (#5106)
SSH cleanup wasn’t reliably triggered by the ORA special remote on failure, leading to a stall with a particular version of git-annex, 8.20201103. (This is also resolved on git-annex’s end as of 8.20201127.) (#5151)
Enhancements and new features
0.13.5 (October 30, 2020) – .
Fixes
SSH connection handling has been reworked to fix cloning on Windows. A new configuration option, datalad.ssh.multiplex-connections, defaults to false on Windows. (#5042)
The ORA special remote and post-clone RIA configuration now provide authentication via DataLad’s credential mechanism and better handling of HTTP status codes. (#5025) (#5026)
By default, if a git executable is present in the same location as git-annex, DataLad modifies PATH when running git and git-annex so that the bundled git is used. This logic has been tightened to avoid unnecessarily adjusting the path, reducing the cases where the adjustment interferes with the local environment, such as special remotes in a virtual environment being masked by the system-wide variants. (#5035)
git-annex is now consistently invoked as “git annex” rather than “git-annex” to work around failures on Windows. (#5001)
push called git annex sync ... on plain git repositories. (#5051)
save in general doesn’t support registering multiple levels of untracked subdatasets, but it can now properly register nested subdatasets when all of the subdataset paths are passed explicitly (e.g., datalad save -d. sub-a sub-a/sub-b). (#5049)
When called with --sidecar and --explicit, run didn’t save the sidecar. (#5017)
A couple of spots didn’t properly quote format fields when combining substrings into a format string. (#4957)
The default credentials configured for indi-s3 prevented anonymous access. (#5045)
Enhancements and new features
Messages about suppressed similar results are now rate limited to improve performance when there are many similar results coming through quickly. (#5060)
create-sibling-github can now be told to replace an existing sibling by passing --existing=replace. (#5008)
Progress bars now react to changes in the terminal’s width (requires tqdm 2.1 or later). (#5057)
0.13.4 (October 6, 2020) – .
Fixes
Ephemeral clones mishandled bare repositories. (#4899)
The post-clone logic for configuring RIA stores didn’t consider https:// URLs. (#4977)
DataLad custom remotes didn’t escape newlines in messages sent to git-annex. (#4926)
The datalad-archives special remote incorrectly treated file names as percent-encoded. (#4953)
The result handler didn’t properly escape “%” when constructing its message template. (#4953)
In v0.13.0, the tailored rendering for specific subtypes of external command failures (e.g., “out of space” or “remote not available”) was unintentionally switched to the default rendering. (#4966)
Various fixes and updates for the NDA authenticator. (#4824)
The helper for getting a versioned S3 URL did not support anonymous access or buckets with “.” in their name. (#4985)
Several issues with the handling of S3 credentials and token expiration have been addressed. (#4927) (#4931) (#4952)
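The “%” escaping problem mentioned for the result handler is a general pitfall of building %-style templates from arbitrary text. The sketch below shows the pattern in isolation; build_message is a hypothetical helper and not code from DataLad.

```python
def build_message(prefix, path):
    """Combine free-form text with a %-style placeholder.

    Any literal "%" in the free-form part must be doubled before the
    combined string is used as a template, otherwise "%" followed by
    other characters is misread as a format field.
    """
    template = prefix.replace("%", "%%") + ": %s"
    return template % (path,)


print(build_message("100% done", "data/file.dat"))
```

Without the replace() call, formatting "100% done: %s" raises an error, because "% d" is parsed as a numeric format field and receives a string argument.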
Enhancements and new features
A warning is now given if the detected Git is below v2.13.0 to let users that run into problems know that their Git version is likely the culprit. (#4866)
A fix to push in v0.13.2 introduced a regression that surfaces when push.default is configured to “matching” and prevents the git-annex branch from being pushed. Note that, as part of the fix, the current branch is now always pushed even when it wouldn’t be based on the configured refspec or push.default value. (#4896)
The archives are handled with p7zip, if available, since DataLad v0.12.0. This implementation now supports .tgz and .tbz2 archives. (#4877)
0.13.3 (August 28, 2020) – .
Fixes
Work around a Python bug that led to our asyncio-based command runner intermittently failing to capture the output of commands that exit very quickly. (#4835)
push displayed an overestimate of the transfer size when multiple files pointed to the same key. (#4821)
When download-url calls git annex addurl, it catches and reports any failures rather than crashing. A change in v0.12.0 broke this handling in a particular case. (#4817)
Enhancements and new features
The wrapper functions returned by decorators are now given more meaningful names to hopefully make tracebacks easier to digest. (#4834)
0.13.2 (August 10, 2020) – .
Deprecations
The allow_quick parameter of AnnexRepo.file_has_content and AnnexRepo.is_under_annex is now ignored and will be removed in a later release. This parameter was only relevant for git-annex versions before 7.20190912. (#4736)
Fixes
Updates for compatibility with recent git and git-annex releases. (#4746) (#4760) (#4684)
push didn’t sync the git-annex branch when --data=nothing was specified. (#4786)
The datalad.clone.reckless configuration wasn’t stored in non-annex datasets, preventing the values from being inherited by annex subdatasets. (#4749)
Running the post-update hook installed by create-sibling --ui could overwrite web log files from previous runs in the unlikely event that the hook was executed multiple times in the same second. (#4745)
clone inspected git’s standard error in a way that could cause an attribute error. (#4775)
When cloning a repository whose HEAD points to a branch without commits, clone tries to find a more useful branch to check out. It unwisely considered adjusted branches. (#4792)
Since v0.12.0, SSHManager.close hasn’t closed connections when the ctrl_path argument was explicitly given. (#4757)
When working in a dataset in which git annex init had not yet been called, the file_has_content and is_under_annex methods of AnnexRepo incorrectly took the “allow quick” code path on file systems that did not support it. (#4736)
Enhancements
create now assigns version 4 (random) UUIDs instead of version 1 UUIDs that encode the time and hardware address. (#4790)
The documentation for create now does a better job of describing the interaction between --dataset and PATH. (#4763)
The format_commit and get_hexsha methods of GitRepo have been sped up. (#4807) (#4806)
A better error message is now shown when the ^ or ^. shortcuts for --dataset do not resolve to a dataset. (#4759)
A more helpful error message is now shown if a caller tries to download an ftp:// link but does not have request_ftp installed. (#4788)
clone now tries harder to get up-to-date availability information after auto-enabling type=git special remotes. (#2897)
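The difference between the two UUID versions mentioned for create can be seen with Python's standard uuid module. This is only an illustration of the two UUID flavors, not DataLad's own code:

```python
import uuid

# Version 1 UUIDs are derived from a timestamp and a node ID (typically the
# network hardware address), so they reveal when and where they were made.
time_based = uuid.uuid1()

# Version 4 UUIDs are built from random bits and carry no such information.
random_based = uuid.uuid4()

print(time_based.version)    # 1
print(random_based.version)  # 4
```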
0.13.1 (July 17, 2020) – .
Fixes
Cloning a subdataset should inherit the parent’s datalad.clone.reckless value, but that did not happen when cloning via datalad get rather than datalad install or datalad clone. (#4657)
The default result renderer crashed when the result did not have a path key. (#4666) (#4673)
datalad push didn’t show information about git push errors when the output was not in the format that it expected. (#4674)
datalad push silently accepted an empty string for --since even though it is an invalid value. (#4682)
Our JavaScript testing setup on Travis grew stale and has now been updated. (Thanks to Xiao Gui.) (#4687)
The new class for running Git commands (added in v0.13.0) ignored any changes to the process environment that occurred after instantiation. (#4703)
Enhancements and new features
datalad push now avoids unnecessary git push dry runs and pushes all refspecs with a single git push call rather than invoking git push for each one. (#4692) (#4675)
The readability of SSH error messages has been improved. (#4729)
datalad.support.annexrepo avoids calling datalad.utils.get_linux_distribution at import time and caches the result once it is called because, as of Python 3.8, the function uses distro underneath, adding noticeable overhead. (#4696)
Third-party code should be updated to use get_linux_distribution directly in the unlikely event that the code relied on the import-time call to get_linux_distribution setting the linux_distribution_name, linux_distribution_release, or on_debian_wheezy attributes in datalad.utils.
0.13.0 (June 23, 2020) – .
A handful of new commands, including copy-file, push, and create-sibling-ria, along with various fixes and enhancements
Major refactoring and deprecations
The no_annex parameter of create, which is exposed in the Python API but not the command line, is deprecated and will be removed in a later release. Use the new annex argument instead, flipping the value. Command-line callers that use --no-annex are unaffected. (#4321)
datalad add, which was deprecated in 0.12.0, has been removed. (#4158) (#4319)
The following GitRepo and AnnexRepo methods have been removed: get_changed_files, get_missing_files, and get_deleted_files. (#4169) (#4158)
The get_branch_commits method of GitRepo and AnnexRepo has been renamed to get_branch_commits_. (#3834)
The custom commit method of AnnexRepo has been removed, and AnnexRepo.commit now resolves to the parent method, GitRepo.commit. (#4168)
GitPython’s git.repo.base.Repo class is no longer available via the .repo attribute of GitRepo and AnnexRepo. (#4172)
AnnexRepo.get_corresponding_branch now returns None rather than the current branch name when a managed branch is not checked out. (#4274)
The special UUID for git-annex web remotes is now available as datalad.consts.WEB_SPECIAL_REMOTE_UUID. It remains accessible as AnnexRepo.WEB_UUID for compatibility, but new code should use consts.WEB_SPECIAL_REMOTE_UUID. (#4460)
Fixes
Widespread improvements in functionality and test coverage on Windows and crippled file systems in general. (#4057) (#4245) (#4268) (#4276) (#4291) (#4296) (#4301) (#4303) (#4304) (#4305) (#4306)
AnnexRepo.get_size_from_key incorrectly handled file chunks. (#4081)
create-sibling would too readily clobber existing paths when called with --existing=replace. It now gets confirmation from the user before doing so if running interactively and unconditionally aborts when running non-interactively. (#4147)
-
queried the incorrect branch configuration when updating non-annex repositories.
didn’t account for the fact that the local repository can be configured as the upstream “remote” for a branch.
When the caller included --bare as a git init option, create crashed creating the bare repository, which is currently unsupported, rather than aborting with an informative error message. (#4065)
The logic for automatically propagating the ‘origin’ remote when cloning a local source could unintentionally trigger a fetch of a non-local remote. (#4196)
All remaining get_submodules() call sites that relied on the temporary compatibility layer added in v0.12.0 have been updated. (#4348)
The custom result summary renderer for get, which was visible with --output-format=tailored, displayed incorrect and confusing information in some cases. The custom renderer has been removed entirely. (#4471)
The documentation for the Python interface of a command listed an incorrect default when the command overrode the value of command parameters such as result_renderer. (#4480)
Enhancements and new features
The default result renderer learned to elide a chain of results after seeing ten consecutive results that it considers similar, which improves the display of actions that have many results (e.g., saving hundreds of files). (#4337)
The default result renderer, in addition to the “tailored” result renderer, now triggers the custom summary renderer, if any. (#4338)
The new command create-sibling-ria provides support for creating a sibling in a RIA store. (#4124)
DataLad ships with a new special remote, git-annex-remote-ora, for interacting with RIA stores and a new command export-archive-ora for exporting an archive from a local annex object store. (#4260) (#4203)
The new command push provides an alternative interface to publish for pushing a dataset hierarchy to a sibling. (#4206) (#4581) (#4617) (#4620)
The new command copy-file copies files and associated availability information from one dataset to another. (#4430)
The command examples have been expanded and improved. (#4091) (#4314) (#4464)
The tooling for linking to the DataLad Handbook from DataLad’s documentation has been improved. (#4046)
The --reckless parameter of clone and install learned two new modes:
-
learned to handle dataset aliases in RIA stores when given a URL of the form ria+<protocol>://<storelocation>#~<aliasname>. (#4459)
now checks datalad.get.subdataset-source-candidate-NAME to see if NAME starts with three digits, which is taken as a “cost”. Sources with lower costs will be tried first. (#4619)
-
learned to disallow non-fast-forward updates when ff-only is given to the --merge option.
gained a --follow option that controls how --merge behaves, adding support for merging in the revision that is registered in the parent dataset rather than merging in the configured branch from the sibling.
now provides a result record for merge events.
create-sibling now supports local paths as targets in addition to SSH URLs. (#4187)
siblings now
The rendering of command errors has been improved. (#4157)
save now
diff and save learned about scenarios where they could avoid unnecessary and expensive work. (#4526) (#4544) (#4549)
Calling diff without --recursive but with a path constraint within a subdataset (“<subdataset>/<path>”) now traverses into the subdataset, as “<subdataset>/” would, restricting its report to “<subdataset>/<path>”. (#4235)
New option datalad.annex.retry controls how many times git-annex will retry on a failed transfer. It defaults to 3 and can be set to 0 to restore the previous behavior. (#4382)
wtf now warns when the specified dataset does not exist. (#4331)
The repr and str output of the dataset and repo classes got a facelift. (#4420) (#4435) (#4439)
The DataLad Singularity container now comes with p7zip-full.
DataLad emits a log message when the current working directory is resolved to a different location due to a symlink. This is now logged at the DEBUG rather than WARNING level, as it typically does not indicate a problem. (#4426)
DataLad now lets the caller know that git annex init is scanning for unlocked files, as this operation can be slow in some repositories. (#4316)
The log_progress helper learned how to set the starting point to a non-zero value and how to update the total of an existing progress bar, two features needed for planned improvements to how some commands display their progress. (#4438)
The ExternalVersions object, which is used to check versions of Python modules and external tools (e.g., git-annex), gained an add method that enables DataLad extensions and other third-party code to include other programs of interest. (#4441)
All of the remaining spots that use GitPython have been rewritten without it. Most notably, this includes rewrites of the clone, fetch, and push methods of GitRepo. (#4080) (#4087) (#4170) (#4171) (#4175) (#4172)
When GitRepo.commit splits its operation across multiple calls to avoid exceeding the maximum command line length, it now amends the initial commit rather than creating multiple commits. (#4156)
GitRepo gained a get_corresponding_branch method (which always returns None), allowing a caller to invoke the method without needing to check if the underlying repo class is GitRepo or AnnexRepo. (#4274)
A new helper function datalad.core.local.repo.repo_from_path returns a repo class for a specified path. (#4273)
New AnnexRepo method localsync performs a git annex sync that disables external interaction and is particularly useful for propagating changes on an adjusted branch back to the main branch. (#4243)
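The elision behavior of the default result renderer described at the top of this list (#4337) can be pictured with a small sketch. The function name render_with_elision, the record layout, and the notion that "similar" means an equal (action, status) pair are all assumptions made for illustration. Unlike a real renderer, the sketch also buffers each run instead of streaming it:

```python
from itertools import groupby


def render_with_elision(results, limit=10):
    """Yield display lines, collapsing long runs of similar results.

    "Similar" is taken to mean an equal (action, status) pair, which is an
    assumption of this sketch rather than DataLad's actual criterion.
    """
    def kind(res):
        return (res["action"], res["status"])

    for (action, status), run in groupby(results, key=kind):
        run = list(run)
        # Show the first `limit` results of the run in full.
        for res in run[:limit]:
            yield f"{action}({status}): {res['path']}"
        # Summarize whatever is left of the run in a single line.
        hidden = len(run) - limit
        if hidden > 0:
            yield f"  [{hidden} similar {action}({status}) results suppressed]"


results = [{"action": "add", "status": "ok", "path": f"file{i}"} for i in range(25)]
results.append({"action": "save", "status": "ok", "path": "."})
lines = list(render_with_elision(results))
print("\n".join(lines))
```

With 25 consecutive "add" results, ten are shown, the remaining fifteen are summarized in one line, and the final "save" result is rendered normally.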
0.12.7 (May 22, 2020) – .
Fixes
Requesting tailored output (--output=tailored) from a command with a custom result summary renderer produced repeated output. (#4463)
A longstanding regression in argcomplete-based command-line completion for Bash has been fixed. You can enable completion by configuring a Bash startup file to run eval "$(register-python-argcomplete datalad)" or source DataLad’s tools/cmdline-completion. The latter should work for Zsh as well. (#4477)
publish didn’t prevent git-fetch from recursing into submodules, leading to a failure when the registered submodule was not present locally and the submodule did not have a remote named ‘origin’. (#4560)
addurls botched path handling when the file name format started with “./” and the call was made from a subdirectory of the dataset. (#4504)
Double dash options in manpages were unintentionally escaped. (#4332)
The check for HTTP authentication failures crashed in situations where content came in as bytes rather than unicode. (#4543)
A check in AnnexRepo.whereis could lead to a type error. (#4552)
When installing a dataset to obtain a subdataset, get confusingly displayed a message that described the containing dataset as “underneath” the subdataset. (#4456)
A couple of Makefile rules didn’t properly quote paths. (#4481)
With DueCredit support enabled (DUECREDIT_ENABLE=1), the query for metadata information could flood the output with warnings if datasets didn’t have aggregated metadata. The warnings are now silenced, with the overall failure of a metadata call logged at the debug level. (#4568)
Enhancements and new features
0.12.6 (April 23, 2020) – .
Major refactoring and deprecations
The value of datalad.support.annexrepo.N_AUTO_JOBS is no longer considered. The variable will be removed in a later release. (#4409)
Fixes
Starting with v0.12.0, datalad save recorded the current branch of a parent dataset as the branch value in the .gitmodules entry for a subdataset. This behavior is problematic for a few reasons and has been reverted. (#4375)
The default for the --jobs option, “auto”, instructed DataLad to pass a value to git-annex’s --jobs equal to min(8, max(3, <number of CPUs>)), which could lead to issues due to the large number of child processes spawned and file descriptors opened. To avoid this behavior, --jobs=auto now results in git-annex being called with --jobs=1 by default. Configure the new option datalad.runtime.max-annex-jobs to control the maximum value that will be considered when --jobs='auto'. (#4409)
Various commands have been adjusted to better handle the case where a remote’s HEAD ref points to an unborn branch. (#4370)
The code for parsing Git configuration did not follow Git’s behavior of accepting a key with no value as shorthand for key=true. (#4421)
AnnexRepo.info needed a compatibility update for a change in how git-annex reports file names. (#4431)
create-sibling-github did not gracefully handle a token that did not have the necessary permissions. (#4400)
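To see why the previous “auto” default for --jobs could be heavy, the formula quoted above can be evaluated for a few CPU counts. The function name legacy_auto_jobs is made up for this sketch; only the formula itself comes from the entry above:

```python
def legacy_auto_jobs(n_cpus):
    """Value formerly passed to git-annex's --jobs for --jobs=auto."""
    # At least 3 jobs, at most 8, otherwise one job per CPU.
    return min(8, max(3, n_cpus))


for n_cpus in (1, 2, 4, 16, 64):
    print(n_cpus, legacy_auto_jobs(n_cpus))
```

Even a single-CPU machine ended up with three parallel git-annex jobs, whereas the new default is one.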
Enhancements and new features
search learned to use the query as a regular expression that restricts the keys that are shown for --show-keys short. (#4354)
datalad <subcommand> learned to point to the datalad-container extension when a subcommand from that extension is given but the extension is not installed. (#4400) (#4174)
0.12.5 (Apr 02, 2020) – a small step for datalad …
Fix some bugs and make the world an even better place.
Fixes
Our log_progress helper mishandled the initial display and step of the progress bar. (#4326)
AnnexRepo.get_content_annexinfo is designed to accept init=None, but passing that led to an error. (#4330)
Update a regular expression to handle an output change in Git v2.26.0. (#4328)
We now set LC_MESSAGES to ‘C’ while running git to avoid failures when parsing output that is marked for translation. (#4342)
The helper for decoding JSON streams loaded the last line of input without decoding it if the line didn’t end with a new line, a regression introduced in the 0.12.0 release. (#4361)
The clone command failed to git-annex-init a fresh clone whenever it considered adding the origin of the origin as a remote. (#4367)
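The JSON stream issue above (#4361) concerns input whose final line lacks a trailing newline. A minimal sketch of a line-oriented decoder that handles that case is shown below. The name decode_json_stream is hypothetical, and this is not DataLad's helper:

```python
import json


def decode_json_stream(chunks):
    """Yield one decoded object per line from an iterable of text chunks."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline consists of complete lines;
        # the final piece may be a partial line and is carried over.
        *complete, pending = pending.split("\n")
        for line in complete:
            if line.strip():
                yield json.loads(line)
    # The stream may end without a trailing newline. The remainder still
    # needs to be decoded rather than handed on as a raw string.
    if pending.strip():
        yield json.loads(pending)


records = list(decode_json_stream(['{"a": 1}\n{"b"', ': 2}']))
print(records)  # [{'a': 1}, {'b': 2}]
```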
0.12.4 (Mar 19, 2020) – Windows?!
The main purpose of this release is to have one on PyPI that has no associated wheel to enable a working installation on Windows (#4315).
Fixes
The description of the log.outputs config switch did not keep up with code changes and incorrectly stated that the output would be logged at the DEBUG level; logging actually happens at a lower level. (#4317)
0.12.3 (March 16, 2020) – .
Updates for compatibility with the latest git-annex, along with a few miscellaneous fixes
Major refactoring and deprecations
All spots that raised a NoDatasetArgumentFound exception now raise a NoDatasetFound exception to better reflect the situation: it is the dataset rather than the argument that is not found. For compatibility, the latter inherits from the former, but new code should prefer the latter. (#4285)
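The compatibility arrangement between the two exceptions can be illustrated with stand-in classes. The classes and the require_dataset helper below are defined locally for the sketch and are not imported from DataLad. Only the inheritance direction is taken from the entry above:

```python
class NoDatasetArgumentFound(Exception):
    """Stand-in for the older exception name."""


class NoDatasetFound(NoDatasetArgumentFound):
    """Stand-in for the newer exception, which subclasses the older one."""


def require_dataset(path):
    # Hypothetical helper that always fails, to exercise the handlers.
    raise NoDatasetFound(f"No dataset found at {path!r}")


try:
    require_dataset("/tmp/not-a-dataset")
except NoDatasetArgumentFound as exc:
    # A handler written against the old name still catches the new exception.
    caught = type(exc).__name__

print(caught)  # NoDatasetFound
```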
Fixes
Updates for compatibility with git-annex version 8.20200226. (#4214)
datalad export-to-figshare failed to export if the generated title was fewer than three characters. It now queries the caller for the title and guards against titles that are too short. (#4140)
Authentication was requested multiple times when git-annex launched parallel downloads from the datalad special remote. (#4308)
At verbose logging levels, DataLad requests that git-annex display debugging information too. Work around a bug in git-annex that prevented that from happening. (#4212)
The internal command runner looked in the wrong place for some configuration variables, including datalad.log.outputs, resulting in the default value always being used. (#4194)
publish failed when trying to publish to a git-lfs special remote for the first time. (#4200)
AnnexRepo.set_remote_url is supposed to establish shared SSH connections but failed to do so. (#4262)
Enhancements and new features
The message provided when a command cannot determine what dataset to operate on has been improved. (#4285)
The “aws-s3” authentication type now allows specifying the host through “aws-s3_host”, which was needed to work around an authorization error due to a longstanding upstream bug. (#4239)
The xmp metadata extractor now recognizes “.wav” files.
0.12.2 (Jan 28, 2020) – Smoothen the ride
Mostly a bugfix release with various robustifications, but also makes the first step towards versioned dataset installation requests.
Major refactoring and deprecations
The minimum required version for GitPython is now 2.1.12. (#4070)
Fixes
The class for handling configuration values, ConfigManager, inappropriately considered the current working directory’s dataset, if any, for both reading and writing when instantiated with dataset=None. This misbehavior is fairly inaccessible through typical use of DataLad. It affects datalad.cfg, the top-level configuration instance that should not consider repository-specific values. It also affects Python users that call Dataset with a path that does not yet exist and persists until that dataset is created. (#4078)
update saved the dataset when called with --merge, which is unnecessary and risks committing unrelated changes. (#3996)
Confusing and irrelevant information about Python defaults has been dropped from the command-line help. (#4002)
The logic for automatically propagating the ‘origin’ remote when cloning a local source didn’t properly account for relative paths. (#4045)
Various fixes to file name handling and quoting on Windows. (#4049) (#4050)
When cloning failed, error lines were not bubbled up to the user in some scenarios. (#4060)
Enhancements and new features
-
now propagates the reckless mode from the superdataset when cloning a dataset into it. (#4037)
gained support for ria+<protocol>:// URLs that point to RIA stores. (#4022)
learned to read “@version” from ria+ URLs and install that version of a dataset (#4036) and to apply URL rewrites configured through Git’s url.*.insteadOf mechanism (#4064).
now copies datalad.get.subdataset-source-candidate-<name> options configured within the superdataset into the subdataset. This is particularly useful for RIA data stores. (#4073)
Archives are now (optionally) handled with 7-Zip instead of patool. 7-Zip will be used by default, but patool will be used on non-Windows systems if the datalad.runtime.use-patool option is set or the 7z executable is not found. (#4041)
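The selection rule for the archive backend in the last entry can be written out as a small decision function. This is a sketch of the rule as worded above, not DataLad's code. The function name and its parameters are hypothetical:

```python
import shutil


def choose_archive_backend(on_windows, use_patool=False, have_7z=None):
    """Return "7z" or "patool" following the rule described above."""
    if have_7z is None:
        # Probe for the 7z executable on PATH when the caller did not say.
        have_7z = shutil.which("7z") is not None
    # patool is only considered on non-Windows systems, either on request
    # (the use-patool option) or as a fallback when 7z is missing.
    if not on_windows and (use_patool or not have_7z):
        return "patool"
    return "7z"


print(choose_archive_backend(on_windows=False, have_7z=True))                   # 7z
print(choose_archive_backend(on_windows=False, use_patool=True, have_7z=True))  # patool
print(choose_archive_backend(on_windows=False, have_7z=False))                  # patool
print(choose_archive_backend(on_windows=True, use_patool=True, have_7z=False))  # 7z
```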
0.12.1 (Jan 15, 2020) – Small bump after big bang
Fix some fallout after major release.
Fixes
0.12.0 (Jan 11, 2020) – Krakatoa
This release is the result of more than a year of development that includes fixes for a large number of issues, yielding more robust behavior across a wider range of use cases, and introduces major changes in API and behavior. It is the first release for which extensive user documentation is available in a dedicated DataLad Handbook. Python 3 (3.5 and later) is now the only supported Python flavor.
Major changes 0.12 vs 0.11
save fully replaces add (which is obsolete now, and will be removed in a future release).
A new Git-annex aware status command enables detailed inspection of dataset hierarchies. The previously available diff command has been adjusted to match status in argument semantics and behavior.
The ability to configure dataset procedures prior and after the execution of particular commands has been replaced by a flexible “hook” mechanism that is able to run arbitrary DataLad commands whenever command results are detected that match a specification.
Support of the Windows platform has been improved substantially. While performance and feature coverage on Windows still falls behind Unix-like systems, typical data consumer use cases, and standard dataset operations, such as create and save, are now working. Basic support for data provenance capture via run is also functional.
Support for Git-annex direct mode repositories has been removed, following the end of support in Git-annex itself.
The semantics of relative paths in command line arguments have changed. Previously, a call datalad save --dataset /tmp/myds some/relpath would have been interpreted as saving a file at /tmp/myds/some/relpath into dataset /tmp/myds. This has changed to saving $PWD/some/relpath into dataset /tmp/myds. More generally, relative paths are now always treated as relative to the current working directory, except for path arguments of Dataset class instance methods of the Python API. The resulting partial duplication of path specifications between path and dataset arguments is mitigated by the introduction of two special symbols that can be given as dataset argument: ^ and ^., which identify the topmost superdataset and the closest dataset that contains the working directory, respectively.
The concept of a “core API” has been introduced. Commands situated in the module datalad.core (such as create, save, run, status, diff) receive additional scrutiny regarding API and implementation, and are meant to provide longer-term stability. Application developers are encouraged to preferentially build on these commands.
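The change in relative path semantics for command line arguments can be made concrete with a small sketch. The two functions are hypothetical stand-ins that only model how a relative path argument is anchored. The working directory /home/me/work is an assumed example value:

```python
from pathlib import PurePosixPath


def resolve_pre_0_12(dataset, relpath, cwd):
    """Old rule: a relative path is anchored at the --dataset location."""
    return PurePosixPath(dataset) / relpath


def resolve_0_12(dataset, relpath, cwd):
    """New rule: a relative path is anchored at the working directory."""
    return PurePosixPath(cwd) / relpath


# Models: datalad save --dataset /tmp/myds some/relpath, run from /home/me/work
args = ("/tmp/myds", "some/relpath", "/home/me/work")
print(resolve_pre_0_12(*args))  # /tmp/myds/some/relpath
print(resolve_0_12(*args))      # /home/me/work/some/relpath
```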
Major refactoring and deprecations since 0.12.0rc6
clone has been incorporated into the growing core API. The public --alternative-source parameter has been removed, and a clone_dataset function with multi-source capabilities is provided instead. The --reckless parameter can now take literal mode labels instead of just being a binary flag, but backwards compatibility is maintained.
The get_file_content method of GitRepo was no longer used internally or in any known DataLad extensions and has been removed. (#3812)
The function get_dataset_root has been replaced by rev_get_dataset_root. rev_get_dataset_root remains as a compatibility alias and will be removed in a later release. (#3815)
The add_sibling module, marked obsolete in v0.6.0, has been removed. (#3871)
mock is no longer declared as an external dependency because we can rely on it being in the standard library now that our minimum required Python version is 3.5. (#3860)
download-url now requires that directories be indicated with a trailing slash rather than interpreting a path as directory when it doesn’t exist. This avoids confusion that can result from typos and makes it possible to support directory targets that do not exist. (#3854)
The dataset_only argument of the ConfigManager class is deprecated. Use source="dataset" instead. (#3907)
The --proc-pre and --proc-post options have been removed, and configuration values for datalad.COMMAND.proc-pre and datalad.COMMAND.proc-post are no longer honored. The new result hook mechanism provides an alternative for proc-post procedures. (#3963)
Fixes since 0.12.0rc6
publish crashed when called with a detached HEAD. It now aborts with an informative message. (#3804)
Since 0.12.0rc6 the call to update in siblings resulted in a spurious warning. (#3877)
siblings crashed if it encountered an annex repository that was marked as dead. (#3892)
The update of rerun in v0.12.0rc3 for the rewritten diff command didn’t account for a change in the output of diff, leading to rerun --report unintentionally including unchanged files in its diff values. (#3873)
In 0.12.0rc5 download-url was updated to follow the new path handling logic, but its calls to AnnexRepo weren’t properly adjusted, resulting in incorrect path handling when called from a dataset subdirectory. (#3850)
download-url called git annex addurl in a way that failed to register a URL when its header didn’t report the content size. (#3911)
With Git v2.24.0, saving new subdatasets failed due to a bug in that Git release. (#3904)
With DataLad configured to stop on failure (e.g., specifying --on-failure=stop from the command line), a failing result record was not rendered. (#3863)
Installing a subdataset yielded an “ok” status in cases where the repository was not yet in its final state, making it ineffective for a caller to operate on the repository in response to the result. (#3906)
The internal helper for converting git-annex’s JSON output did not relay information from the “error-messages” field. (#3931)
run-procedure reported relative paths that were confusingly not relative to the current directory in some cases. It now always reports absolute paths. (#3959)
diff inappropriately reported files as deleted in some cases when to was a value other than None. (#3999)
An assortment of fixes for Windows compatibility. (#3971) (#3974) (#3975) (#3976) (#3979)
Subdatasets installed from a source given by relative path will now have this relative path used as ‘url’ in their .gitmodules record, instead of an absolute path generated by Git. (#3538)
clone will now correctly interpret ‘~/…’ paths as absolute path specifications. (#3958)
run-procedure mistakenly reported a directory as a procedure. (#3793)
The cleanup for batched git-annex processes has been improved. (#3794) (#3851)
The function for adding a version ID to an AWS S3 URL doesn’t support URLs with an “s3://” scheme and raises a NotImplementedError exception when it encounters one. The function learned to return a URL untouched if an “s3://” URL comes in with a version ID. (#3842)
A few spots needed to be adjusted for compatibility with git-annex’s new --sameas feature, which allows special remotes to share a data store. (#3856)
The swallow_logs utility failed to capture some log messages due to an incompatibility with Python 3.7. (#3935)
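The handling of “s3://” URLs described above (#3842) amounts to a pass-through check. The sketch below is not DataLad's function. The name with_version_id, the lookup_version callback for other schemes, and the assumption that the version ID travels in a versionId query parameter are all illustrative:

```python
from urllib.parse import parse_qs, urlsplit


def with_version_id(url, lookup_version):
    """Return `url` carrying a version ID, following the rule above."""
    parts = urlsplit(url)
    if parts.scheme == "s3":
        # Assumed convention: the version ID is the "versionId" parameter.
        if "versionId" in parse_qs(parts.query):
            return url  # already versioned, hand it back untouched
        raise NotImplementedError("cannot add a version ID to s3:// URLs")
    # Other schemes are delegated to a caller-supplied (hypothetical) lookup.
    return lookup_version(url)


print(with_version_id("s3://bucket/key?versionId=abc", None))
```

An “s3://” URL without a version ID still ends in NotImplementedError, which matches the limitation stated in the entry.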
Enhancements and new features since 0.12.0rc6
By default, datasets cloned from local source paths will now get a configured remote for any recursively discoverable ‘origin’ sibling that is also available from a local path in order to maximize automatic file availability across local annexes. (#3926)
The new result hooks mechanism allows callers to specify, via local Git configuration values, DataLad command calls that will be triggered in response to matching result records (i.e., what you see when you call a command with -f json_pp). (#3903)
The command interface classes learned to use a new _examples_ attribute to render documentation examples for both the Python and command-line API. (#3821)
Candidate URLs for cloning a submodule can now be generated based on configured templates that have access to various properties of the submodule, including its dataset ID. (#3828)
DataLad’s check that the user’s Git identity is configured has been sped up and now considers the appropriate environment variables as well. (#3807)
The tag method of GitRepo can now tag revisions other than HEAD and accepts a list of arbitrary git tag options. (#3787)
When get clones a subdataset and the subdataset’s HEAD differs from the commit that is registered in the parent, the active branch of the subdataset is moved to the registered commit if the registered commit is an ancestor of the subdataset’s HEAD commit. This handling has been moved to a more central location within GitRepo, and now applies to any update_submodule(..., init=True) call. (#3831)
The output of datalad -h has been reformatted to improve readability. (#3862)
run-procedure learned to provide and render more information about discovered procedures, including whether the procedure is overridden by another procedure with the same base name. (#3960)
-
records the active branch in the superdataset when registering a new subdataset.
calls git annex sync when saving a dataset on an adjusted branch so that the changes are brought into the mainline branch.
subdatasets now aborts when its dataset argument points to a non-existent dataset. (#3940)
wtf now
The ConfigManager class
learned to exclude .datalad/config as a source of configuration values, restricting the sources to standard Git configuration files, when called with source="local". (#3907)
accepts a value of “override” for its where argument to allow Python callers to more conveniently override configuration. (#3970)
Commands now accept a dataset value of “^.” as shorthand for “the dataset to which the current directory belongs”. (#3242)
0.12.0rc6 (Oct 19, 2019) – some releases are better than the others
bet we will fix some bugs and make a world even a better place.
Major refactoring and deprecations
DataLad no longer supports Python 2. The minimum supported version of Python is now 3.5. (#3629)
Much of the user-focused content at http://docs.datalad.org has been removed in favor of more up to date and complete material available in the DataLad Handbook. Going forward, the plan is to restrict http://docs.datalad.org to technical documentation geared at developers. (#3678)
update used to allow the caller to specify which dataset(s) to update as a PATH argument or via the --dataset option; now only the latter is supported. Path arguments only serve to restrict which subdatasets are updated when operating recursively. (#3700)
Result records from a get call no longer have a “state” key. (#3746)
update and get no longer support operating on independent hierarchies of datasets. (#3700) (#3746)
The run update in 0.12.0rc4 for the new path resolution logic broke the handling of inputs and outputs for calls from a subdirectory. (#3747)
The is_submodule_modified method of GitRepo as well as two helper functions in gitrepo.py, kwargs_to_options and split_remote_branch, were no longer used internally or in any known DataLad extensions and have been removed. (#3702) (#3704)
The only_remote option of GitRepo.is_with_annex was not used internally or in any known extensions and has been dropped. (#3768)
The get_tags method of GitRepo used to sort tags by committer date. It now sorts them by the tagger date for annotated tags and the committer date for lightweight tags. (#3715)
The rev_resolve_path substituted resolve_path helper. (#3797)
Fixes
Do not erroneously discover directory as a procedure. (#3793)
Correctly extract version from manpage to trigger use of manpages for --help. (#3798)
The cfg_yoda procedure saved all modifications in the repository rather than saving only the files it modified. (#3680)
Some spots in the documentation that were supposed to appear as two hyphens were incorrectly rendered in the HTML output as en-dashes. (#3692)
create, install, and clone treated paths as relative to the dataset even when the string form was given, violating the new path handling rules. (#3749) (#3777) (#3780)
Providing the “^” shortcut to --dataset didn’t work properly when called from a subdirectory of a subdataset. (#3772)
We failed to propagate some errors from git-annex when working with its JSON output. (#3751)
With the Python API, callers are allowed to pass a string or list of strings as the cfg_proc argument to create, but the string form was mishandled. (#3761)
Incorrect command quoting for SSH calls on Windows rendered basic SSH-related functionality (e.g., sshrun) on Windows unusable. (#3688)
Annex JSON result handling assumed platform-specific paths on Windows instead of the POSIX-style paths that are used across all platforms. (#3719)
path_is_under() was incapable of comparing Windows paths with different drive letters. (#3728)
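The drive-letter case in the last entry is awkward because no relative path exists between two Windows drives, so code that computes one has to expect an error. A minimal sketch of a containment test that tolerates this is shown below. The function is_under is hypothetical and is not DataLad's path_is_under:

```python
from pathlib import PureWindowsPath


def is_under(path, prefix):
    """Report whether `path` lies at or below `prefix` (Windows semantics)."""
    try:
        # relative_to() raises ValueError when `path` is not below `prefix`,
        # which includes the case of two different drive letters.
        PureWindowsPath(path).relative_to(PureWindowsPath(prefix))
    except ValueError:
        return False
    return True


print(is_under(r"C:\data\ds\file.txt", r"C:\data"))  # True
print(is_under(r"D:\data\ds\file.txt", r"C:\data"))  # False
```

PureWindowsPath applies Windows path rules on any platform, so the sketch can be tried on Linux or macOS as well.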
Enhancements and new features
Provide a collection of “public” call_git* helpers within GitRepo and replace use of “private” and less specific _git_custom_command calls. (#3791)
status gained a --report-filetype option. Setting it to “raw” can give a performance boost for the price of no longer distinguishing symlinks that point to annexed content from other symlinks. (#3701)
save disables file type reporting by status to improve performance. (#3712)
-
now extends its result records with a contains field that lists which contains arguments matched a given subdataset.
yields an ‘impossible’ result record when a contains argument wasn’t matched to any of the reported subdatasets.
install now shows more readable output when cloning fails. (#3775)
SSHConnection now displays a more informative error message when it cannot start the ControlMaster process. (#3776)
If the new configuration option datalad.log.result-level is set to a single level, all result records will be logged at that level. If you’ve been bothered by DataLad’s double reporting of failures, consider setting this to “debug”. (#3754)
Configuration values from datalad -c OPTION=VALUE ... are now validated to provide better errors. (#3695)
rerun learned how to handle history with merges. As was already the case when cherry picking non-run commits, re-creating merges may result in conflicts, and rerun does not yet provide an interface to let the user handle these. (#2754)
The fsck method of AnnexRepo has been enhanced to expose more features of the underlying git fsck command. (#3693)
GitRepo now has a for_each_ref_ method that wraps git for-each-ref, which is used in various spots that used to rely on GitPython functionality. (#3705)
Do not pretend to be able to work in optimized (python -O) mode, crash early with an informative message. (#3803)
0.12.0rc5 (September 04, 2019)
Various fixes and enhancements that bring the 0.12.0 release closer.
Major refactoring and deprecations
The two modules below have a new home. The old locations still exist as compatibility shims and will be removed in a future release.
The lock method of AnnexRepo and the options parameter of AnnexRepo.unlock were unused internally and have been removed. (#3459)
The get_submodules method of GitRepo has been rewritten without GitPython. When the new compat flag is true (the current default), the method returns a value that is compatible with the old return value. This backwards-compatible return value and the compat flag will be removed in a future release. (#3508)
The logic for resolving relative paths given to a command has changed (#3435). The new rule is that relative paths are taken as relative to the dataset only if a dataset instance is passed by the caller. In all other scenarios they’re considered relative to the current directory.
The main user-visible difference from the command line is that using the --dataset argument does not result in relative paths being taken as relative to the specified dataset. (The undocumented distinction between “rel/path” and “./rel/path” no longer exists.)
All commands under datalad.core and datalad.local, as well as unlock and addurls, follow the new logic. The goal is for all commands to eventually do so.
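The path resolution rule above can be sketched in a few lines. This is an illustration only, not DataLad's implementation: DatasetStub and resolve_relative are hypothetical names, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


class DatasetStub:
    """Hypothetical stand-in for a dataset instance; only carries a path."""

    def __init__(self, path):
        self.path = PurePosixPath(path)


def resolve_relative(path, dataset, cwd):
    """Resolve `path` following the rule described above (sketch only)."""
    path = PurePosixPath(path)
    if path.is_absolute():
        return path
    if isinstance(dataset, DatasetStub):
        # A dataset *instance* was passed: relative to the dataset root.
        return dataset.path / path
    # A plain string (or None) was passed: relative to the current directory.
    return PurePosixPath(cwd) / path


from_instance = resolve_relative("sub/file", DatasetStub("/ds"), cwd="/ds/code")
from_string = resolve_relative("sub/file", "/ds", cwd="/ds/code")
```

Here from_instance is anchored at the dataset root, while from_string is anchored at the current directory, which mirrors what happens on the command line when --dataset is given as a path.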
Fixes
The function for loading JSON streams wasn’t clever enough to handle content that included a Unicode line separator like U+2028. (#3524)
When unlock was called without an explicit target (i.e., a directory or no paths at all), the call failed if any of the files did not have content present. (#3459)
AnnexRepo.get_content_info failed in the rare case of a key without size information. (#3534)
save ignored --on-failure in its underlying call to status. (#3470)
Calling remove with a subdirectory displayed spurious warnings about the subdirectory files not existing. (#3586)
Our processing of git-annex --json output mishandled info messages from special remotes. (#3546)
The base downloader had some error handling that wasn’t compatible with Python 3. (#3622)
Fixed a number of Unicode py2-compatibility issues. (#3602)
AnnexRepo.get_content_annexinfo did not properly chunk file arguments to avoid exceeding the command-line character limit. (#3587)
Enhancements and new features
New command create-sibling-gitlab provides an interface for creating a publication target on a GitLab instance. (#3447)
-
now supports path-constrained queries in the same manner as commands like save and status
gained a --contains=PATH option that can be used to restrict the output to datasets that include a specific path.
now narrows the listed subdatasets to those underneath the current directory when called with no arguments
status learned to accept a plain --annex (no value) as shorthand for --annex basic. (#3534)
The .dirty property of GitRepo and AnnexRepo has been sped up. (#3460)
The get_content_info method of GitRepo, used by status and commands that depend on status, now restricts its git calls to a subset of files, if possible, for a performance gain in repositories with many files. (#3508)
Extensions that do not provide a command, such as those that provide only metadata extractors, are now supported. (#3531)
When calling git-annex with --json, we log standard error at the debug level rather than the warning level if a non-zero exit is expected behavior. (#3518)
create no longer refuses to create a new dataset in the odd scenario of an empty .git/ directory upstairs. (#3475)
As of v2.22.0 Git treats a sub-repository on an unborn branch as a repository rather than as a directory. Our documentation and tests have been updated appropriately. (#3476)
addurls learned to accept a --cfg-proc value and pass it to its create calls. (#3562)
0.12.0rc4 (May 15, 2019) – the revolution is over
With the replacement of the save command implementation with rev-save the revolution effort is now over, and the set of key commands for local dataset operations (create, run, save, status, diff) is now complete. This new core API is available from datalad.core.local (and also via datalad.api, as any other command).
Major refactoring and deprecations
The add command is now deprecated. It will be removed in a future release.
Fixes
Enhancements and new features
SSHConnection now offers methods for file upload and download (get(), put()). The previous copy() method only supported upload and was discontinued. (#3401)
0.12.0rc3 (May 07, 2019) – the revolution continues
Continues API consolidation and replaces the create and diff commands with more performant implementations.
Major refactoring and deprecations
The previous diff command has been replaced by the diff variant from the datalad-revolution extension. (#3366)
rev-create has been renamed to create, and the previous create has been removed. (#3383)
The procedure setup_yoda_dataset has been renamed to cfg_yoda (#3353).
The --nosave option of addurls now affects only added content, not newly created subdatasets (#3259).
Dataset.get_subdatasets (deprecated since v0.9.0) has been removed. (#3336)
The .is_dirty method of GitRepo and AnnexRepo has been replaced by .status or, for a subset of cases, the .dirty property. (#3330)
AnnexRepo.get_status has been replaced by AnnexRepo.status. (#3330)
Fixes
-
reported on directories that contained only ignored files (#3238)
gave a confusing failure when called from a subdataset with an explicitly specified dataset argument and “.” as a path (#3325)
misleadingly claimed that the locally present content size was zero when --annex basic was specified (#3378)
An informative error wasn’t given when a download provider was invalid. (#3258)
Calling rev-save PATH saved unspecified untracked subdatasets. (#3288)
The available choices for command-line options that take values are now displayed more consistently in the help output. (#3326)
The new pathlib-based code had various encoding issues on Python 2. (#3332)
Enhancements and new features
wtf now includes information about the Python version. (#3255)
When operating in an annex repository, checking whether git-annex is available is now delayed until a call to git-annex is actually needed, allowing systems without git-annex to operate on annex repositories in a restricted fashion. (#3274)
The load_stream helper now supports auto-detection of compressed files. (#3289)
create (formerly rev-create)
AnnexRepo.set_metadata now returns a list while AnnexRepo.set_metadata_ returns a generator, a behavior which is consistent with the add and add_ method pair. (#3298)
AnnexRepo.get_metadata now supports batch querying of known annex files. Note, however, that callers should carefully validate the input paths because the batch call will silently hang if given non-annex files. (#3364)
-
now reports a “bytesize” field for files tracked by Git (#3299)
gained a new option eval_subdataset_state that controls how the subdataset state is evaluated. Depending on the information you need, you can select a less expensive mode to make status faster. (#3324)
colors deleted files “red” (#3334)
Querying repository content is faster due to batching of git cat-file calls. (#3301)
The dataset ID of a subdataset is now recorded in the superdataset. (#3304)
GitRepo.diffstatus
GitRepo.get_content_info now supports disabling the file type evaluation, which gives a performance boost in cases where this information isn’t needed. (#3362)
The XMP metadata extractor now filters based on file name to improve its performance. (#3329)
0.12.0rc2 (Mar 18, 2019) – revolution!
Fixes
GitRepo.dirty does not report on nested empty directories (#3196).
GitRepo.save() reports results on deleted files.
Enhancements and new features
Absorb a new set of core commands from the datalad-revolution extension:
rev-status: like git status, but simpler and working with dataset hierarchies
rev-save: a 2-in-1 replacement for save and add
rev-create: a ~30% faster create
JSON support tools can now read and write compressed files.
0.12.0rc1 (Mar 03, 2019) – to boldly go …
Major refactoring and deprecations
Discontinued support for git-annex direct-mode (also no longer supported upstream).
Enhancements and new features
Dataset and Repo object instances are now hashable, and can be created based on pathlib Path object instances
Imported various additional methods for the Repo classes to query information and save changes.
0.11.8 (Oct 11, 2019) – annex-we-are-catching-up
Fixes
Enhancements and new features
0.11.7 (Sep 06, 2019) – python2-we-still-love-you-but-…
Primarily bugfixes with some optimizations and refactorings.
Fixes
-
now provides better handling when the URL file isn’t in the expected format. (#3579)
always considered a relative file for the URL file argument as relative to the current working directory, which goes against the convention used by other commands of taking relative paths as relative to the dataset argument. (#3582)
-
hard coded “python” when formatting the command for non-executable procedures ending with “.py”. sys.executable is now used. (#3624)
failed if arguments needed more complicated quoting than simply surrounding the value with double quotes. This has been resolved for systems that support shlex.quote, but note that on Windows values are left unquoted. (#3626)
siblings now displays an informative error message if a local path is given to --url but --name isn’t specified. (#3555)
sshrun, the command DataLad uses for GIT_SSH_COMMAND, didn’t support all the parameters that Git expects it to. (#3616)
Fixed a number of Unicode py2-compatibility issues. (#3597)
download-url now will create leading directories of the output path if they do not exist (#3646)
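The two procedure-related fixes above (using sys.executable instead of a hard-coded “python”, and quoting arguments with shlex.quote) can be sketched as follows. The function name and the script name are hypothetical, and the sketch covers POSIX shells only; as noted above, values are left unquoted on Windows.

```python
import shlex
import sys


def format_procedure_call(script, args):
    """Build a shell command string for a non-executable ``.py`` procedure."""
    # Use the running interpreter rather than a hard-coded "python".
    parts = [sys.executable, script] + list(args)
    # shlex.quote handles spaces, quotes, and other shell metacharacters.
    return " ".join(shlex.quote(part) for part in parts)


# Arguments that plain double-quote wrapping would not survive.
cmd = format_procedure_call("cfg_example.py", ["my dataset", 'say "hi"'])
```

Splitting cmd again with shlex.split yields the original argument list, which is the property that simple double-quote wrapping failed to guarantee.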
Enhancements and new features
The annotate-paths helper now caches subdatasets it has seen to avoid unnecessary calls. (#3570)
A repeated configuration query has been dropped from the handling of --proc-pre and --proc-post. (#3576)
Calls to git annex find now use --in=. instead of the alias --in=here to take advantage of an optimization that git-annex (as of the current release, 7.20190730) applies only to the former. (#3574)
addurls now suggests close matches when the URL or file format contains an unknown field. (#3594)
Shared logic used in the setup.py files of DataLad and its extensions has been moved to modules in the _datalad_build_support/ directory. (#3600)
Get ready for upcoming git-annex dropping support for direct mode (#3631)
0.11.6 (Jul 30, 2019) – am I the last of 0.11.x?
Primarily bug fixes to achieve more robust performance
Fixes
Our tests needed various adjustments to keep up with upstream changes in Travis and Git. (#3479) (#3492) (#3493)
AnnexRepo.is_special_annex_remote was too selective in what it considered to be a special remote. (#3499)
We now provide information about unexpected output when git-annex is called with --json. (#3516)
Exception logging in the __del__ method of GitRepo and AnnexRepo no longer fails if the names it needs are no longer bound. (#3527)
addurls botched the construction of subdataset paths that were more than two levels deep and failed to create datasets in a reliable, breadth-first order. (#3561)
Cloning a type=git special remote showed a spurious warning about the remote not being enabled. (#3547)
Enhancements and new features
For calls to git and git-annex, we disable automatic garbage collection due to past issues with GitPython’s state becoming stale, but doing so results in a larger .git/objects/ directory that isn’t cleaned up until garbage collection is triggered outside of DataLad. Tests with the latest GitPython didn’t reveal any state issues, so we’ve re-enabled automatic garbage collection. (#3458)
rerun learned an --explicit flag, which it relays to its calls to run. This makes it possible to call rerun in a dirty working tree (#3498).
The metadata command aborts earlier if a metadata extractor is unavailable. (#3525)
0.11.5 (May 23, 2019) – stability is not overrated
Should be faster and less buggy, with a few enhancements.
Fixes
-
Siblings are no longer configured with a post-update hook unless a web interface is requested with --ui.
git submodule update --init is no longer called from the post-update hook.
If --inherit is given for a dataset without a superdataset, a warning is now given instead of raising an error.
The internal command runner failed on Python 2 when its env argument had unicode values. (#3332)
The safeguard that prevents creating a dataset in a subdirectory that already contains tracked files for another repository failed on Git versions before 2.14. For older Git versions, we now warn the caller that the safeguard is not active. (#3347)
A regression introduced in v0.11.1 prevented save from committing changes under a subdirectory when the subdirectory was specified as a path argument. (#3106)
A workaround introduced in v0.11.1 made it possible for save to do a partial commit with an annex file that has gone below the annex.largefiles threshold. The logic of this workaround was faulty, leading to files being displayed as typechanged in the index following the commit. (#3365)
The resolve_path() helper confused paths that had a semicolon for SSH RIs. (#3425)
The detection of SSH RIs has been improved. (#3425)
Enhancements and new features
The internal command runner was too aggressive in its decision to sleep. (#3322)
The “INFO” label in log messages now retains the default text color for the terminal rather than using white, which only worked well for terminals with dark backgrounds. (#3334)
A short flag -R is now available for the --recursion-limit flag, a flag shared by several subcommands. (#3340)
The authentication logic for create-sibling-github has been revamped and now supports 2FA. (#3180)
New configuration option datalad.ui.progressbar can be used to configure the default backend for progress reporting (“none”, for example, results in no progress bars being shown). (#3396)
A new progress backend, available by setting datalad.ui.progressbar to “log”, replaces progress bars with a log message upon completion of an action. (#3396)
DataLad learned to consult the NO_COLOR environment variable and the new datalad.ui.color configuration option when deciding to color output. The default value, “auto”, retains the current behavior of coloring output if attached to a TTY (#3407).
clean now removes annex transfer directories, which is useful for cleaning up failed downloads. (#3374)
clone no longer refuses to clone into a local path that looks like a URL, making its behavior consistent with git clone. (#3425)
-
Learned to fall back to the dist package if platform.dist, which has been removed in the yet-to-be-released Python 3.8, does not exist. (#3439)
Gained a --section option for limiting the output to specific sections and a --decor option, which currently knows how to format the output as GitHub’s <details> section. (#3440)
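The interplay between NO_COLOR and datalad.ui.color mentioned in this release can be sketched roughly as below. Only the “auto” value is documented above; the “on” and “off” values, the helper name, and the exact precedence are assumptions made for illustration.

```python
def use_color(setting="auto", environ=None, is_tty=False):
    """Decide whether to color output (hypothetical helper, sketch only)."""
    environ = {} if environ is None else environ
    if setting == "on":    # assumed value: always color
        return True
    if setting == "off":   # assumed value: never color
        return False
    # "auto": color only when attached to a TTY and NO_COLOR is not set.
    return is_tty and "NO_COLOR" not in environ


on_terminal = use_color("auto", environ={}, is_tty=True)
no_color_set = use_color("auto", environ={"NO_COLOR": "1"}, is_tty=True)
piped = use_color("auto", environ={}, is_tty=False)
```

In real use the environment mapping would be os.environ and is_tty would come from something like sys.stdout.isatty(); both are passed in explicitly here so the decision logic stays easy to test.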
0.11.4 (Mar 18, 2019) – get-ready
Largely a bug fix release with a few enhancements
Important
0.11.x series will be the last one with support for direct mode of git-annex which is used on crippled (no symlinks and no locking) filesystems. v7 repositories should be used instead.
Fixes
Extraction of .gz files is broken without p7zip installed. We now abort with an informative error in this situation. (#3176)
Committing failed in some cases because we didn’t ensure that the path passed to git read-tree --index-output=... resided on the same filesystem as the repository. (#3181)
Some pointless warnings during metadata aggregation have been eliminated. (#3186)
With Python 3 the LORIS token authenticator did not properly decode a response (#3205).
With Python 3 downloaders unnecessarily decoded the response when getting the status, leading to an encoding error. (#3210)
In some cases, our internal command Runner did not adjust the environment’s PWD to match the current working directory specified with the cwd parameter. (#3215)
The specification of the pyliblzma dependency was broken. (#3220)
search displayed an uninformative blank log message in some cases. (#3222)
The logic for finding the location of the aggregate metadata DB anchored the search path incorrectly, leading to a spurious warning. (#3241)
Some progress bars were still displayed when stdout and stderr were not attached to a tty. (#3281)
Check for stdin/out/err to not be closed before checking for .isatty. (#3268)
Enhancements and new features
Creating a new repository now aborts if any of the files in the directory are tracked by a repository in a parent directory. (#3211)
run learned to replace the {tmpdir} placeholder in commands with a temporary directory. (#3223)
duecredit support has been added for citing DataLad itself as well as datasets that an analysis uses. (#3184)
The eval_results interface helper unintentionally modified one of its arguments. (#3249)
A few DataLad constants have been added, changed, or renamed (#3250):
HANDLE_META_DIR is now DATALAD_DOTDIR. The old name should be considered deprecated.
METADATA_DIR now refers to DATALAD_DOTDIR/metadata rather than DATALAD_DOTDIR/meta (which is still available as OLDMETADATA_DIR).
The new DATASET_METADATA_FILE refers to METADATA_DIR/dataset.json.
The new DATASET_CONFIG_FILE refers to DATALAD_DOTDIR/config.
METADATA_FILENAME has been renamed to OLDMETADATA_FILENAME.
0.11.3 (Feb 19, 2019) – read-me-gently
Just a few important fixes and minor enhancements.
Fixes
The logic for setting the maximum command line length now works around Python 3.4 returning an unreasonably high value for SC_ARG_MAX on Debian systems. (#3165)
DataLad commands that are conceptually “read-only”, such as datalad ls -L, can fail when the caller lacks write permissions because git-annex tries merging remote git-annex branches to update information about availability. DataLad now disables annex.merge-annex-branches in some common “read-only” scenarios to avoid these failures. (#3164)
Enhancements and new features
Accessing an “unbound” dataset method now automatically imports the necessary module rather than requiring an explicit import from the Python caller. For example, calling Dataset.add no longer needs to be preceded by from datalad.distribution.add import Add or an import of datalad.api. (#3156)
Configuring the new variable datalad.ssh.identityfile instructs DataLad to pass a value to the -i option of ssh. (#3149) (#3168)
0.11.2 (Feb 07, 2019) – live-long-and-prosper
A variety of bugfixes and enhancements
Major refactoring and deprecations
Fixes
Improved handling of long commands:
The code that inspected SC_ARG_MAX didn’t check that the reported value was a sensible, positive number. (#3025)
More commands that invoke git and git-annex with file arguments learned to split up the command calls when it is likely that the command would fail due to exceeding the maximum supported length. (#3138)
The setup_yoda_dataset procedure created a malformed .gitattributes line. (#3057)
download-url unnecessarily tried to infer the dataset when --no-save was given. (#3029)
rerun aborted too late and with a confusing message when a ref specified via --onto didn’t exist. (#3019)
run:
run didn’t preserve the current directory prefix (“./”) on inputs and outputs, which is problematic if the caller relies on this representation when formatting the command. (#3037)
Fixed a number of unicode py2-compatibility issues. (#3035) (#3046)
To proceed with a failed command, the user was confusingly instructed to use save instead of add even though run uses add underneath. (#3080)
Fixed a case where the helper class for checking external modules incorrectly reported a module as unknown. (#3051)
add-archive-content mishandled the archive path when the leading path contained a symlink. (#3058)
Following denied access, the credential code failed to consider a scenario, leading to a type error rather than an appropriate error message. (#3091)
Some tests failed when executed from a git worktree checkout of the source repository. (#3129)
During metadata extraction, batched annex processes weren’t properly terminated, leading to issues on Windows. (#3137)
add incorrectly handled an “invalid repository” exception when trying to add a submodule. (#3141)
Pass GIT_SSH_VARIANT=ssh to git processes to be able to specify alternative ports in SSH urls
Enhancements and new features
search learned to suggest closely matching keys if there are no hits. (#3089)
-
gained a --group option so that the caller can specify the file system group for the repository. (#3098)
now understands SSH URLs that have a port in them (i.e. the “ssh://[user@]host.xz[:port]/path/to/repo.git/” syntax mentioned in man git-fetch). (#3146)
Interface classes can now override the default renderer for summarizing results. (#3061)
run:
--input and --output can now be shortened to -i and -o. (#3066)
Placeholders such as “{inputs}” are now expanded in the command that is shown in the commit message subject. (#3065)
interface.run.run_command gained an extra_inputs argument so that wrappers like datalad-container can specify additional inputs that aren’t considered when formatting the command string. (#3038)
“--” can now be used to separate options for run and those for the command in ambiguous cases. (#3119)
The utilities create_tree and ok_file_has_content now support “.gz” files. (#3049)
The Singularity container for 0.11.1 now uses nd_freeze to make its builds reproducible.
A publications page has been added to the documentation. (#3099)
GitRepo.set_gitattributes now accepts a mode argument that controls whether the .gitattributes file is appended to (default) or overwritten. (#3115)
datalad --help now avoids using man so that the list of subcommands is shown. (#3124)
0.11.1 (Nov 26, 2018) – v7-better-than-v6
Rushed out bugfix release to stay fully compatible with recent git-annex which introduced v7 to replace v6.
Fixes
install: be able to install recursively into a dataset (#2982)
save: be able to commit/save changes whenever files potentially could have swapped their storage between git and annex (#1651) (#2752) (#3009)
aggregate-metadata:
the dataset itself is no longer “aggregated” if specific paths are provided for aggregation (#3002). That resolves the issue of -r invocation aggregating all subdatasets of the specified dataset as well
also compare/verify the actual content checksum of aggregated metadata while considering subdataset metadata for re-aggregation (#3007)
annex commands are now chunked assuming 50% “safety margin” on the maximal command line length. Should resolve crashes while operating on too many files at once (#3001)
run sidecar config processing (#2991)
no double trailing period in docs (#2984)
correct identification of the repository with symlinks in the paths in the tests (#2972)
re-evaluation of dataset properties in case of dataset changes (#2946)
text2git procedure to use ds.repo.set_gitattributes (#2974) (#2954)
Switch to use plain os.getcwd() if inconsistency with env var $PWD is detected (#2914)
Make sure that credential defined in env var takes precedence (#2960) (#2950)
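The chunking of annex commands with a 50% safety margin, mentioned in the fixes above, can be approximated by the sketch below. It is not DataLad's code: the function names, the 2048 fallback, and the per-argument cost model are assumptions; only the idea of halving the system limit and splitting the file list comes from the entry.

```python
import os


def max_cmdline_length(default=2048):
    """Return a conservative limit: 50% of SC_ARG_MAX when it looks sane."""
    try:
        arg_max = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        arg_max = -1  # not available on this platform
    if arg_max <= 0:
        arg_max = default  # assumed fallback, not a DataLad constant
    return arg_max // 2


def chunk_files(files, base_len, limit):
    """Split `files` so that each command line stays within `limit`."""
    chunks, current, used = [], [], base_len
    for name in files:
        cost = len(name) + 1  # one separator per argument
        if current and used + cost > limit:
            chunks.append(current)
            current, used = [], base_len
        current.append(name)
        used += cost
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_files(["a", "bb", "ccc", "dddd"], base_len=10, limit=16)
```

Each chunk would then be passed to a separate invocation of the underlying command, with base_len set to the length of the fixed part of that command.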
Enhancements and new features
shub://datalad/datalad:git-annex-dev provides a Debian buster Singularity image with build environment for git-annex. tools/bisect-git-annex provides a helper for running git bisect on git-annex using that Singularity container (#2995)
Added .zenodo.json for better integration with Zenodo for citation
run-procedure now provides names and help messages with a custom renderer for (#2993)
Documentation: point to datalad-revolution extension (prototype of the greater DataLad future)
-
support injecting of a detached command (#2937)
annex metadata extractor now extracts annex.key metadata record. Should allow now to identify uses of specific files etc (#2952)
Test that we can install from http://datasets.datalad.org
Proper rendering of CommandError (e.g. in case of “out of space” error) (#2958)
0.11.0 (Oct 23, 2018) – Soon-to-be-perfect
git-annex 6.20180913 (or later) is now required - provides a number of fixes for v6 mode operations etc.
Major refactoring and deprecations
datalad.consts.LOCAL_CENTRAL_PATH constant was deprecated in favor of datalad.locations.default-dataset configuration variable (#2835)
Minor refactoring
"notneeded"
messages are no longer reported by default results renderer
run no longer shows commit instructions upon command failure when explicit is true and no outputs are specified (#2922)
get_git_dir moved into GitRepo (#2886)
_gitpy_custom_call removed from GitRepo (#2894)
GitRepo.get_merge_base argument is now called commitishes instead of treeishes (#2903)
Fixes
update should not leave the dataset in non-clean state (#2858) and some other enhancements (#2859)
Fixed chunking of the long command lines to account for decorators and other arguments (#2864)
Progress bar should not crash the process on some missing progress information (#2891)
Default value for jobs set to be "auto" (not None) to take advantage of possible parallel get if in -g mode (#2861)
wtf must not crash if git-annex is not installed etc (#2865), (#2865), (#2918), (#2917)
Fixed paths (with spaces etc) handling while reporting annex error output (#2892), (#2893)
__del__ should not access .repo but ._repo to avoid attempts for reinstantiation etc (#2901)
Fix up submodule .git right in GitRepo.add_submodule to avoid added submodules being non git-annex friendly (#2909), (#2904)
-
now will provide dataset into the procedure if called within dataset
will not crash if procedure is an executable without .py or .sh suffixes
Use centralized .gitattributes handling while setting annex backend (#2912)
GlobbedPaths.expand(..., full=True) incorrectly returned relative paths when called more than once (#2921)
Enhancements and new features
Report progress on clone when installing from “smart” git servers (#2876)
Stale/unused sth_like_file_has_content was removed (#2860)
Enhancements to search to operate on “improved” metadata layouts (#2878)
Output of git annex init operation is now logged (#2881)
New
-
procedures can now recursively be discovered in subdatasets as well. The uppermost has highest priority
Procedures in user and system locations now take precedence over those in datasets.
0.10.3.1 (Sep 13, 2018) – Nothing-is-perfect
Emergency bugfix to address forgotten boost of version in datalad/version.py.
0.10.3 (Sep 13, 2018) – Almost-perfect
This is largely a bugfix release which addressed many (but not yet all) issues of working with git-annex direct and version 6 modes, and operation on Windows in general. Among enhancements you will see the support of public S3 buckets (even with periods in their names), ability to configure new providers interactively, and improved egrep search backend.
Although we do not require it with this release, it is recommended to make sure that you are using a recent git-annex since it also had a variety of fixes and enhancements in the past months.
Fixes
Parsing of combined short options has been broken since DataLad v0.10.0. (#2710)
The datalad save instructions shown by datalad run for a command with a non-zero exit were incorrectly formatted. (#2692)
Decompression of zip files (e.g., through datalad add-archive-content) failed on Python 3. (#2702)
Windows:
Internal git fetch calls have been updated to work around a GitPython BadName issue. (#2712), (#2794)
The progress bar for annex file transferring was unable to handle an empty file. (#2717)
datalad add-readme halted when no aggregated metadata was found rather than displaying a warning. (#2731)
datalad rerun failed if --onto was specified and the history contained no run commits. (#2761)
Processing of a command’s results failed on a result record with a missing value (e.g., absent field or subfield in metadata). Now the missing value is rendered as “N/A”. (#2725).
A couple of documentation links in the “Delineation from related solutions” were misformatted. (#2773)
With the latest git-annex, several known V6 failures are no longer an issue. (#2777)
In direct mode, commit changes would often commit annexed content as regular Git files. A new approach fixes this and resolves a good number of known failures. (#2770)
The reporting of command results failed if the current working directory was removed (e.g., after an unsuccessful install). (#2788)
When installing into an existing empty directory, datalad install removed the directory after a failed clone. (#2788)
datalad run incorrectly handled inputs and outputs for paths with spaces and other characters that require shell escaping. (#2798)
Globbing inputs and outputs for datalad run didn’t work correctly if a subdataset wasn’t installed. (#2796)
Minor (in)compatibility with git 2.19 - (no) trailing period in an error message now. (#2815)
Enhancements and new features
Anonymous access is now supported for S3 and other downloaders. (#2708)
A new interface is available to ease setting up new providers. (#2708)
Metadata: changes to egrep mode search (#2735)
Queries in egrep mode are now case-sensitive when the query contains any uppercase letters and are case-insensitive otherwise. The new mode egrepcs can be used to perform a case-sensitive query with all lower-case letters.
Search can now be limited to a specific key.
Multiple queries (list of expressions) are evaluated using AND to determine whether something is a hit.
A single multi-field query (e.g., pa*:findme) is a hit, when any matching field matches the query.
All matching key/value combinations across all (multi-field) queries are reported in the query_matched result field.
egrep mode now shows all hits rather than limiting the results to the top 20 hits.
The documentation on how to format commands for datalad run has been improved. (#2703)
The method for determining the current working directory on Windows has been improved. (#2707)
datalad --version now simply shows the version without the license. (#2733)
datalad export-archive learned to export under an existing directory via its --filename option. (#2723)
datalad export-to-figshare now generates the zip archive in the root of the dataset unless --filename is specified. (#2723)
After importing datalad.api, help(datalad.api) (or datalad.api? in IPython) now shows a summary of the available DataLad commands. (#2728)
Support for using datalad from IPython has been improved. (#2722)
datalad wtf now returns structured data and reports the version of each extension. (#2741)
The internal handling of gitattributes information has been improved. A user-visible consequence is that datalad create --force no longer duplicates existing attributes. (#2744)
The “annex” metadata extractor can now be used even when no content is present. (#2724)
The add_url_to_file method (called by commands like datalad download-url and datalad add-archive-content) learned how to display a progress bar. (#2738)
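The egrep-mode rules listed in this release (case sensitivity decided by the query, the egrepcs mode, and AND across multiple queries) can be illustrated with a minimal sketch. This is not DataLad's implementation; the helper names egrep_match and is_hit are hypothetical, and the uppercase check is deliberately naive (a regex escape such as \S would also count as uppercase).

```python
import re


def egrep_match(query, text, mode="egrep"):
    """Hypothetical helper: match one regex query against a metadata string."""
    # egrepcs is always case-sensitive; egrep is case-sensitive only when
    # the query itself contains an uppercase letter.
    case_sensitive = mode == "egrepcs" or any(c.isupper() for c in query)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(query, text, flags) is not None


def is_hit(queries, text, mode="egrep"):
    """Hypothetical helper: several queries are combined with AND."""
    return all(egrep_match(q, text, mode) for q in queries)
```

With these helpers, the all-lowercase query mri matches "Functional MRI" in egrep mode but not in egrepcs mode.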
0.10.2 (Jul 09, 2018) – Thesecuriestever
Primarily a bugfix release to accommodate recent git-annex release forbidding file:// and http://localhost/ URLs which might lead to revealing private files if annex is publicly shared.
Fixes
fixed testing to be compatible with recent git-annex (6.20180626)
download-url will now download to current directory instead of the top of the dataset
Enhancements and new features
do not quote ~ in URLs to be consistent with quote implementation in Python 3.7 which now follows RFC 3986
run support for user-configured placeholder values
documentation on native git-annex metadata support
handle 401 errors from LORIS tokens
yoda procedure will instantiate README.md
--discover option added to run-procedure to list available procedures
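The note above about not quoting ~ in URLs refers to the behavior of quote in the Python standard library. As a small illustration (not DataLad's code; quote_keep_tilde is a hypothetical name), listing ~ explicitly among the safe characters gives the same result on interpreters before and after 3.7:

```python
from urllib.parse import quote


def quote_keep_tilde(path):
    """Hypothetical helper: percent-encode a URL path but leave '~' as-is."""
    # Listing '~' in `safe` makes the result independent of the Python
    # version; since 3.7 quote() treats '~' as unreserved on its own.
    return quote(path, safe="/~")
```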
0.10.1 (Jun 17, 2018) – OHBM polish
This is a minor bugfix release.
Fixes
Be able to use backports.lzma as a drop-in replacement for pyliblzma.
Give help when not specifying a procedure name in run-procedure.
Abort early when a downloader received no filename.
Avoid rerun error when trying to unlock non-available files.
0.10.0 (Jun 09, 2018) – The Release
This release is a major leap forward in metadata support.
Major refactoring and deprecations
Metadata
Prior metadata provided by datasets under .datalad/meta is no longer used or supported. Metadata must be reaggregated using 0.10 version
Metadata extractor types are no longer auto-guessed and must be explicitly specified in datalad.metadata.nativetype config (could contain multiple values)
Metadata aggregation of a dataset hierarchy no longer updates all datasets in the tree with new metadata. Instead, only the target dataset is updated. This behavior can be changed via the --update-mode switch. The new default prevents needless modification of (3rd-party) subdatasets.
Neuroimaging metadata support has been moved into a dedicated extension: https://github.com/datalad/datalad-neuroimaging
Crawler moved into a dedicated extension: https://github.com/datalad/datalad-crawler
export_tarball plugin has been generalized to export_archive and can now also generate ZIP archives.
By default a dataset X is now only considered to be a super-dataset of another dataset Y, if Y is also a registered subdataset of X.
Fixes
A number of fixes did not make it into the 0.9.x series:
Dynamic configuration overrides via the -c option were not in effect.
save is now more robust with respect to invocation in subdirectories of a dataset.
unlock now reports correct paths when running in a dataset subdirectory.
get is more robust to paths that contain symbolic links.
Symlinks to subdatasets of a dataset are now correctly treated as a symlink, and not as a subdataset.
add now correctly saves staged subdataset additions.
Running datalad save in a dataset no longer adds untracked content to the dataset. In order to add content a path has to be given, e.g. datalad save .
wtf now works reliably with a DataLad that wasn’t installed from Git (but, e.g., via pip)
More robust URL handling in simple_with_archives crawler pipeline.
Enhancements and new features
Support for DataLad extension that can contribute API components from 3rd-party sources, incl. commands, metadata extractors, and test case implementations. See https://github.com/datalad/datalad-extension-template for a demo extension.
Metadata (everything has changed!)
Metadata extraction and aggregation is now supported for datasets and individual files.
Metadata query via search can now discover individual files.
Extracted metadata can now be stored in XZ compressed files, is optionally annexed (when exceeding a configurable size threshold), and obtained on demand (new configuration option datalad.metadata.create-aggregate-annex-limit).
Status and availability of aggregated metadata can now be reported via metadata --get-aggregates
New configuration option datalad.metadata.maxfieldsize to exclude too large metadata fields from aggregation.
The type of metadata is no longer guessed during metadata extraction. A new configuration option datalad.metadata.nativetype was introduced to enable one or more particular metadata extractors for a dataset.
New configuration option datalad.metadata.store-aggregate-content to enable the storage of aggregated metadata for dataset content (i.e. file-based metadata) in contrast to just metadata describing a dataset as a whole.
search was completely reimplemented. It offers three different modes now:
‘egrep’ (default): expression matching in a plain string version of metadata
‘textblob’: search a text version of all metadata using a fully featured query language (fast indexing, good for keyword search)
‘autofield’: search an auto-generated index that preserves individual fields of metadata that can be represented in a tabular structure (substantial indexing cost, enables the most detailed queries of all modes)
New extensions:
addurls, an extension for creating a dataset (and possibly subdatasets) from a list of URLs.
export_to_figshare
extract_metadata
add_readme makes use of available metadata
By default the wtf extension now hides sensitive information, which can be included in the output by passing --sensitive=some or --sensitive=all.
Reduced startup latency by only importing commands necessary for a particular command line call.
-
-d <parent> --nosave now registers subdatasets, when possible.
--fake-dates configures dataset to use fake-dates
run now provides a way for the caller to save the result when a command has a non-zero exit status.
datalad rerun now has a --script option that can be used to extract previous commands into a file.
A DataLad Singularity container is now available on Singularity Hub.
More casts have been embedded in the use case section of the documentation.
datalad --report-status has a new value ‘all’ that can be used to temporarily re-enable reporting that was disabled by configuration settings.
0.9.3 (Mar 16, 2018) – pi+0.02 release
Some important bug fixes which should improve usability
Fixes
datalad-archives special remote now will lock on acquiring or extracting an archive - this allows for it to be used with -J flag for parallel operation
relax introduced in 0.9.2 demand on git being configured for datalad operation - now we will just issue a warning
datalad ls should now list “authored date” and work also for datasets in detached HEAD mode
datalad save will now save original file as well, if file was “git mv”ed, so you can now datalad run git mv old new and have changes recorded
Enhancements and new features
--jobs argument now could take auto value which would decide on # of jobs depending on the # of available CPUs. git-annex > 6.20180314 is recommended to avoid regression with -J.
memoize calls to RI meta-constructor – should speed up operation a bit
DATALAD_SEED environment variable could be used to seed Python RNG and provide reproducible UUIDs etc (useful for testing and demos)
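The idea behind DATALAD_SEED, seeding the Python RNG so that generated UUIDs repeat across runs, can be sketched as follows. This is an illustration only, not DataLad's code: make_rng and seeded_uuid are hypothetical names, and reading the seed as an integer is an assumption.

```python
import os
import random
import uuid


def make_rng(env=os.environ):
    """Hypothetical helper: return an RNG seeded from DATALAD_SEED if set."""
    # Assumption: the seed is interpreted as an integer; when the variable
    # is unset, the RNG is seeded from the operating system as usual.
    seed = env.get("DATALAD_SEED")
    return random.Random(int(seed)) if seed is not None else random.Random()


def seeded_uuid(rng):
    """Hypothetical helper: build a version-4 UUID from the given RNG."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)
```

Two RNGs created with the same seed yield the same UUID, which is what makes seeded test runs and demos reproducible.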
0.9.2 (Mar 04, 2018) – it is (again) better than ever
Largely a bugfix release with a few enhancements.
Fixes
Execution of external commands (git) should no longer get stuck when there is a lot of output on both stdout and stderr, and should no longer lose remaining output in some cases
Config overrides provided in the command line (-c) should now be handled correctly
Consider more remotes (not just tracking one, which might be none) while installing subdatasets
Compatibility with git 2.16 with some changed behaviors/annotations for submodules
Fail remove if annex drop failed
Do not fail operating on files which start with dash (-)
URL unquote paths within S3, URLs and DataLad RIs (///)
In non-interactive mode fail if authentication/access fails
Web UI:
refactored a little to fix incorrect listing of submodules in subdirectories
now auto-focuses on search edit box upon entering the page
Ensure that directories extracted from tarballs have the executable bit set
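The first fix in this list concerns a classic pipe deadlock: a child process that fills one pipe blocks while the parent waits on the other. A minimal sketch of the usual remedy with the standard library follows. It is not DataLad's runner, and run_captured is a hypothetical name.

```python
import subprocess
import sys


def run_captured(cmd):
    """Hypothetical helper: run cmd and return (exit code, stdout, stderr)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() drains both pipes concurrently until EOF, so a child
    # that fills one pipe cannot block while we wait on the other, and no
    # trailing output is lost.
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # Child writes far more than a typical pipe buffer to both streams.
    child = "import sys; sys.stdout.write('o' * 200000); sys.stderr.write('e' * 200000)"
    code, out, err = run_captured([sys.executable, "-c", child])
    print(code, len(out), len(err))
```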
Enhancements and new features
A log message and progress bar will now inform if a tarball is to be downloaded while getting specific files (requires git-annex > 6.20180206)
A dedicated datalad rerun command capable of rerunning entire sequences of previously run commands. Reproducibility through VCS. Use run even if not interested in rerun
Alert the user if git is not yet configured but git operations are requested
Delay collection of previous ssh connections until it is actually needed. Also do not require ‘:’ while specifying ssh host
AutomagicIO: Added proxying of isfile, lzma.LZMAFile and io.open
Testing:
added DATALAD_DATASETS_TOPURL=http://datasets-tests.datalad.org to run tests against another website to not obscure access stats
tests run against temporary HOME to avoid side-effects
better unit-testing of interactions with special remotes
CONTRIBUTING.md describes how to setup and use git-hub tool to “attach” commits to an issue making it into a PR
DATALAD_USE_DEFAULT_GIT env variable could be used to cause DataLad to use default (not the one possibly bundled with git-annex) git
Be more robust while handling not supported requests by annex in special remotes
Use of swallow_logs in the code was refactored away – less mysteries now, just increase logging level
wtf plugin will report more information about environment, externals and the system
0.9.1 (Oct 01, 2017) – “DATALAD!”(JBTM)
Minor bugfix release
Fixes
Should work correctly with subdatasets named as numbers or bool values (requires also GitPython >= 2.1.6)
Custom special remotes should work without crashing with git-annex >= 6.20170924
0.9.0 (Sep 19, 2017) – isn’t it a lucky day even though not a Friday?
Major refactoring and deprecations
the files argument of save has been renamed to path to be uniform with any other command
all major commands now implement more uniform API semantics and result reporting. Functionality for modification detection of dataset content has been completely replaced with a more efficient implementation
publish now features a --transfer-data switch that allows for an unambiguous specification of whether to publish data, independent of the selection of which datasets to publish (which is done via their paths). Moreover, publish now transfers data before repository content is pushed.
Fixes
drop no longer errors when some subdatasets are not installed
install will no longer report nothing when a Dataset instance was given as a source argument, but rather perform as expected
remove doesn’t remove when some files of a dataset could not be dropped
-
no longer hides error during a repository push
publish behaves “correctly” for --since= in considering only the differences since the last “pushed” state
data transfer handling while publishing with dependencies, to github
improved robustness with broken Git configuration
search should search for unicode strings correctly and not crash
robustify git-annex special remotes protocol handling to allow for spaces in the last argument
UI credentials interface should now allow to Ctrl-C the entry
should not fail while operating on submodules named with numerics only or by bool (true/false) names
crawl templates should not now override settings for largefiles if specified in .gitattributes
Enhancements and new features
Exciting new feature run command to protocol execution of an external command and rerun computation if desired. See screencast
save now uses Git for detecting which subdatasets need to be inspected for potential changes, instead of performing a complete traversal of a dataset tree
add looks for changes relative to the last committed state of a dataset to discover files to add more efficiently
diff can now report untracked files in addition to modified files
uninstall will check itself whether a subdataset is properly registered in a superdataset, even when no superdataset is given in a call
subdatasets can now configure subdatasets for exclusion from recursive installation (datalad-recursiveinstall submodule configuration property)
precrafted pipelines of crawl now will not override annex.largefiles setting if any was set within .gitattributes (e.g. by datalad create --text-no-annex)
framework for screencasts: tools/cast* tools and sample cast scripts under doc/casts which are published at datalad.org/features.html
tests failing in direct and/or v6 modes marked explicitly
0.8.1 (Aug 13, 2017) – the best birthday gift
Bugfixes
Fixes
Enhancements and new features
0.8.0 (Jul 31, 2017) – it is better than ever
A variety of fixes and enhancements
Fixes
Enhancements and new features
plugin mechanism came to replace export. See export_tarball for the replacement of export. Now it should be easy to extend datalad’s interface with custom functionality to be invoked along with other commands.
Minimalistic coloring of the results rendering
publish/copy_to got progress bar report now and support of --jobs
minor fixes and enhancements to crawler (e.g. support of recursive removes)
0.7.0 (Jun 25, 2017) – when it works - it is quite awesome!
New features, refactorings, and bug fixes.
Major refactoring and deprecations
add-sibling has been fully replaced by the siblings command
create-sibling, and unlock have been re-written to support the same common API as most other commands
Enhancements and new features
siblings can now be used to query and configure a local repository by using the sibling name here
siblings can now query and set annex preferred content configuration. This includes wanted (as previously supported in other commands), and now also required
New metadata command to interface with datasets/files meta-data
Documentation for all commands is now built in a uniform fashion
Significant parts of the documentation have been updated
Instantiate GitPython’s Repo instances lazily
Fixes
API documentation is now rendered properly as HTML, and is easier to browse by having more compact pages
Closed files left open on various occasions (Popen PIPEs, etc)
Restored basic (consumer mode of operation) compatibility with Windows OS
0.6.0 (Jun 14, 2017) – German perfectionism
This release includes a huge refactoring to make code base and functionality more robust and flexible
outputs from API commands could now be highly customized. See --output-format, --report-status, and --report-type options for datalad command.
effort was made to refactor code base so that underlying functions behave as generators where possible
input paths/arguments analysis was redone for majority of the commands to provide unified behavior
Major refactoring and deprecations
add-sibling and rewrite-urls were refactored in favor of new siblings command which should be used for siblings manipulations
‘datalad.api.alwaysrender’ config setting/support is removed in favor of new outputs processing
Fixes
Do not flush manually git index in pre-commit to avoid “Death by the Lock” issue
Deployed by publish post-update hook script now should be more robust (tolerate directory names with spaces, etc.)
A variety of fixes, see list of pull requests and issues closed for more information
Enhancements and new features
new annotate-paths plumbing command to inspect and annotate provided paths. Use --modified to summarize changes between different points in the history
new clone plumbing command to provide a subset (install a single dataset from a URL) functionality of install
new diff plumbing command
new siblings command to list or manipulate siblings
new subdatasets command to list subdatasets and their properties
benchmarks/ collection of Airspeed velocity benchmarks initiated. See reports at http://datalad.github.io/datalad/
crawler would try to download a new url multiple times increasing delay between attempts. Helps to resolve problems with extended crawls of Amazon S3
CRCNS crawler pipeline now also fetches and aggregates meta-data for the datasets from datacite
overall optimisations to benefit from the aforementioned refactoring and improve user-experience
a few stub and not (yet) implemented commands (e.g. move) were removed from the interface
Web frontend got proper coloring for the breadcrumbs and some additional caching to speed up interactions. See http://datasets.datalad.org
Small improvements to the online documentation. See e.g. summary of differences between git/git-annex/datalad
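The crawler behavior noted in this list, retrying a download several times with an increasing delay between attempts, follows a common pattern that can be sketched as below. This is not the crawler's actual code: fetch_with_retries, its parameters, and the choice of OSError as the retryable error are all assumptions made for illustration.

```python
import time


def fetch_with_retries(fetch, url, attempts=4, first_delay=1.0, factor=2.0,
                       sleep=time.sleep):
    """Hypothetical helper: call fetch(url), retrying with growing delays."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            # Give up after the last attempt; otherwise wait and retry.
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= factor  # each wait is longer than the previous one
```

Passing sleep as a parameter keeps the helper testable without real waiting.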
0.5.1 (Mar 25, 2017) – cannot stop the progress
A bugfix release
Fixes
add was forcing addition of files to annex regardless of settings in .gitattributes. Now that decision is left to annex by default
tools/testing/run_doc_examples used to run doc examples as tests, fixed up to provide status per each example and not fail at once
doc/examples
3rdparty_analysis_workflow.sh was fixed up to reflect changes in the API of 0.5.0.
progress bars
should no longer crash datalad and report correct sizes and speeds
should provide progress reports while using Python 3.x
Enhancements and new features
doc/examples
nipype_workshop_dataset.sh new example to demonstrate how new super- and sub- datasets were established as a part of our datasets collection
0.5.0 (Mar 20, 2017) – it’s huge
This release includes an avalanche of bug fixes, enhancements, and additions which at large should stay consistent with previous behavior but provide better functioning. Lots of code was refactored to provide more consistent code-base, and some API breakage has happened. Further work is ongoing to standardize output and results reporting (#1350)
Most notable changes
requires git-annex >= 6.20161210 (or better even >= 6.20161210 for improved functionality)
commands should now operate on paths specified (if any), without causing side-effects on other dirty/staged files
-
-a is deprecated in favor of -u or --all-updates so only changes to known components get saved, and no new files automagically added
-S does no longer store the originating dataset in its commit message
-
can specify commit/save message with -m
add-sibling and create-sibling
now take the name of the sibling (remote) as a -s (--name) option, not a positional argument
--publish-depends to setup publishing data and code to multiple repositories (e.g. github + webserver) should now be functional see this comment
got --publish-by-default to specify what refs should be published by default
got --annex-wanted, --annex-groupwanted and --annex-group settings which would be used to instruct annex about preferred content. publish then will publish data using those settings if wanted is set.
got --inherit option to automagically figure out url/wanted and other git/annex settings for new remote sub-dataset to be constructed
-
got --skip-failing refactored into --missing option which could use new feature of create-sibling --inherit
Fixes
Enhancements and new features
-
got --what to specify explicitly what cleaning steps to perform and now could be invoked with -r
datalad and git-annex-remote* scripts now do not use setuptools entry points mechanism and rely on simple import to shorten start up time
Dataset is also now using Flyweight pattern, so the same instance is reused for the same dataset
progressbars should not add more empty lines
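The Flyweight pattern mentioned above for Dataset means that constructing an object twice for the same dataset returns one shared instance. A minimal sketch of the pattern using a metaclass follows. It is not DataLad's implementation: the names Flyweight and DatasetSketch are hypothetical, and keying instances on the absolute path is an assumption.

```python
import os
import weakref


class Flyweight(type):
    """Hypothetical metaclass: one live instance per normalized path."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Weak references let unused instances be garbage collected.
        cls._instances = weakref.WeakValueDictionary()

    def __call__(cls, path):
        key = os.path.abspath(path)  # assumption: identity is the absolute path
        obj = cls._instances.get(key)
        if obj is None:
            obj = super().__call__(key)
            cls._instances[key] = obj
        return obj


class DatasetSketch(metaclass=Flyweight):
    def __init__(self, path):
        self.path = path
```

Here DatasetSketch("x") and DatasetSketch("./x") resolve to the same key and therefore return the same object.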
Internal refactoring
Majority of the commands now go through _prep for arguments validation and pre-processing to avoid recursive invocations
0.4.1 (Nov 10, 2016) – CA release
Requires now GitPython >= 2.1.0
Fixes
Enhancements and new features
New rfc822-compliant metadata format
-
-S to save the change also within all super-datasets
add now has progress-bar reporting
create-sibling-github to create a sibling of a dataset on github
OpenfMRI crawler and datasets were enriched with URLs to separate files where also available from openfmri s3 bucket (if upgrading your datalad datasets, you might need to run git annex enableremote datalad to make them available)
various enhancements to log messages
web interface
populates “install” box first thus making UX better over slower connections
0.4 (Oct 22, 2016) – Paris is waiting
Primarily it is a bugfix release but because of significant refactoring of the install and get implementation, it gets a new minor release.
Fixes
Enhancements and new features
interface changes
more (unit-)testing
documentation: see http://docs.datalad.org/en/latest/basics.html for basic principles and useful shortcuts in referring to datasets
various webface improvements: breadcrumb paths, instructions how to install dataset, show version from the tags, etc.
0.3.1 (Oct 1, 2016) – what a wonderful week
Primarily bugfixes but also a number of enhancements and core refactorings
Fixes
do not build manpages and examples during installation to avoid problems with possibly previously outdated dependencies
install can be called on already installed dataset (with -r or -g)
Enhancements and new features
complete overhaul of datalad configuration settings handling (see Configuration documentation), so the majority of the environment variables we have used were renamed to match configuration names. Now uses git format and stores persistent configuration settings under .datalad/config and local within .git/config
create-sibling does not now by default upload web front-end
export command with a plug-in interface and tarball plugin to export datasets
in Python, .api functions with rendering of results in command line got a _-suffixed sibling, which would render results in Python as well (e.g., using search_ instead of search would also render results, not only output them back as Python objects)
-
--jobs option (passed to annex get) for parallel downloads
total and per-download (with git-annex >= 6.20160923) progress bars (note that if content is to be obtained from an archive, no progress will be reported yet)
install --reckless mode option
-
highlights locations and fieldmaps for better readability
supports -d^ or -d/// to point to top-most or centrally installed meta-datasets
“complete” paths to the datasets are reported now
-s option to specify which fields (only) to search
various enhancements and small fixes to meta-data handling, ls, custom remotes, code-base formatting, downloaders, etc
completely switched to tqdm library (progressbar is no longer used/supported)
0.3 (Sep 23, 2016) – winter is coming
Lots of everything, including but not limited to
enhanced index viewer, as the one on http://datasets.datalad.org
initial new data providers support: Kaggle, BALSA, NDA, NITRC
initial meta-data support and management
new and/or improved crawler pipelines for BALSA, CRCNS, OpenfMRI
some other commands renaming/refactoring (e.g., create-sibling)
datalad search would give you an option to install datalad’s super-dataset under ~/datalad if run outside of a dataset
0.2.3 (Jun 28, 2016) – busy OHBM
New features and bugfix release
support of /// urls to point to http://datasets.datalad.org
variety of fixes and enhancements throughout
0.2.2 (Jun 20, 2016) – OHBM we are coming!
New feature and bugfix release
greatly improved documentation
publish command API RFing allows for custom options to annex, and uses --to REMOTE for consistency with annex invocation
variety of fixes and enhancements throughout
0.2.1 (Jun 10, 2016)
variety of fixes and enhancements throughout
0.2 (May 20, 2016)
Major RFing to switch from relying on rdf to git native submodules etc
0.1 (Oct 14, 2015)
Release primarily focusing on interface functionality including initial publishing