datalad

Synopsis

datalad [-h] [-l LEVEL] [--pbs-runner {condor}] [-C PATH] [--version] [--dbg] [--idbg] [-c KEY=VALUE] [--output-format {default,json,json_pp,tailored,'<template>'}] [--report-status {success,failure,ok,notneeded,impossible,error}] [--report-type {dataset,file}] [--on-failure {ignore,continue,stop}] [--run-before PLUGINSPEC [PLUGINSPEC ...]] [--run-after PLUGINSPEC [PLUGINSPEC ...]] [--cmd] {create,install,get,add,publish,uninstall,drop,remove,update,create-sibling,create-sibling-github,unlock,save,plugin,search,metadata,aggregate-metadata,test,crawl,crawl-init,ls,clean,add-archive-content,download-url,run,rerun,annotate-paths,clone,create-test-dataset,diff,siblings,sshrun,subdatasets} ...

Description

DataLad provides a unified data distribution with the convenience of git-annex repositories as a backend. The DataLad command line tools let you manipulate (obtain, create, update, publish, etc.) datasets and collections of datasets.

Commands for dataset operations

create
Create a new dataset from scratch
install
Install a dataset from a (remote) source
get
Get any dataset content (files/directories/subdatasets)
add
Add files/directories to an existing dataset
publish
Publish a dataset to a known sibling
uninstall
Uninstall subdatasets
drop
Drop file content from datasets
remove
Remove components from datasets
update
Update a dataset from a sibling
create-sibling
Create a dataset sibling on a UNIX-like SSH-accessible machine
create-sibling-github
Create a dataset sibling on GitHub
unlock
Unlock file(s) of a dataset
save
Save the current state of a dataset
plugin
Generic plugin interface

Commands for metadata handling

search
Search within the metadata available in datasets
metadata
Metadata manipulation for files and whole datasets
aggregate-metadata
Aggregate metadata of a dataset for later querying

Miscellaneous commands

test
Run internal DataLad (unit)tests
crawl
Crawl online resource to create or update a dataset
crawl-init
Initialize crawling configuration
ls
List summary information about URLs and dataset(s)
clean
Clean up after DataLad (possible temporary files etc.)
add-archive-content
Add the content of an archive under git-annex control
download-url
Download content
run
Run an arbitrary command and record its impact on a dataset
rerun
Re-execute previous datalad run commands

Plumbing commands

annotate-paths
Analyze and act upon input paths
clone
Obtain a dataset copy from a URL or local source (path)
create-test-dataset
Create test (meta-)dataset
diff
Report changes of dataset components
siblings
Manage sibling configuration
sshrun
Run command on remote machines via SSH
subdatasets
Report subdatasets and their properties

General information

Detailed usage information for individual commands is available via the command-specific --help, i.e.: datalad <command> --help

Options

{create,install,get,add,publish,uninstall,drop,remove,update,create-sibling,create-sibling-github,unlock,save,plugin,search,metadata,aggregate-metadata,test,crawl,crawl-init,ls,clean,add-archive-content,download-url,run,rerun,annotate-paths,clone,create-test-dataset,diff,siblings,sshrun,subdatasets}

-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-l LEVEL, --log-level LEVEL

set logging verbosity level. Choose among critical, error, warning, info, debug. You can also specify an integer below 10 to get even more debugging information

--pbs-runner {condor}

execute the command by scheduling it via an available PBS. Settings are read from the configuration file

-C PATH

run as if datalad was started in <path> instead of the current working directory. When multiple -C options are given, each subsequent non-absolute -C <path> is interpreted relative to the preceding -C <path>. This option also affects how other path names are interpreted: they are taken relative to the working directory that results from the -C option(s)
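
The chaining rule for repeated -C options can be illustrated with a short Python sketch. It only models the behavior described above and is not DataLad's implementation; the function name resolve_workdir and the example directories are hypothetical, and POSIX-style paths are assumed.

```python
import posixpath


def resolve_workdir(start_dir, c_paths):
    """Fold a sequence of -C values into one effective working directory.

    Hypothetical helper: models the documented rule, not DataLad's code.
    """
    workdir = start_dir
    for path in c_paths:
        # posixpath.join drops `workdir` when `path` is absolute, so an
        # absolute -C value restarts the chain and a relative one extends it.
        workdir = posixpath.normpath(posixpath.join(workdir, path))
    return workdir


# Two relative -C values are applied one after the other.
relative_chain = resolve_workdir("/home/user", ["data", "study1"])

# An absolute -C value in the middle discards everything before it.
absolute_reset = resolve_workdir("/home/user", ["data", "/srv/archive", "raw"])

print(relative_chain)
print(absolute_reset)
```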

--version

show the program’s version and license information

--dbg

enter the Python debugger when an uncaught exception occurs

--idbg

enter the IPython debugger when an uncaught exception occurs

-c KEY=VALUE

configuration variable setting. Overrides any configuration read from a file, but is potentially overridden itself by configuration variables in the process environment.
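
The precedence described here (configuration file, then -c, then process environment) can be sketched in Python as follows. This is an illustration under assumptions, not DataLad's configuration code: the function effective_config and the key example.option are hypothetical, and the environment settings are passed in as an already prepared dictionary because this page does not describe how environment variables map to configuration keys.

```python
def effective_config(file_cfg, cli_overrides, env_cfg):
    """Merge three configuration sources, lowest precedence first.

    Hypothetical helper illustrating the documented override order.
    """
    merged = dict(file_cfg)
    for item in cli_overrides:
        # Each -c argument has the form KEY=VALUE; split at the first '='.
        key, _, value = item.partition("=")
        merged[key] = value
    # Settings from the process environment win over -c settings.
    merged.update(env_cfg)
    return merged


file_cfg = {"example.option": "from-file", "example.other": "from-file"}

only_cli = effective_config(file_cfg, ["example.option=from-cli"], {})
cli_and_env = effective_config(
    file_cfg, ["example.option=from-cli"], {"example.option": "from-env"}
)

print(only_cli["example.option"])
print(cli_and_env["example.option"])
```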

--output-format {default,json,json_pp,tailored,'<template>'}

select format for returned command results. ‘default’ gives one line per result, reporting action, status, path and an optional message; ‘json’ renders a JSON object with all properties for each result (one per line); ‘json_pp’ pretty-prints JSON spanning multiple lines; ‘tailored’ enables a command-specific rendering style that is typically tailored to human consumption (no result output otherwise); ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices).
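
The difference between the ‘json’, ‘json_pp’ and ‘<template>’ choices can be sketched in Python. This is a rough model rather than DataLad's renderer: the render function and the sample result record are hypothetical, and treating the template as a Python str.format pattern is an assumption suggested by the ‘{path}’ example. The ‘default’ and ‘tailored’ styles are not modeled.

```python
import json


def render(result, output_format):
    """Render one result record (hypothetical helper, not DataLad code)."""
    if output_format == "json":
        # One compact JSON object per result, on a single line.
        return json.dumps(result)
    if output_format == "json_pp":
        # The same object, pretty-printed across multiple lines.
        return json.dumps(result, indent=2)
    if output_format in ("default", "tailored"):
        raise NotImplementedError("not modeled in this sketch")
    # Assumption: any other value is a str.format-style template whose
    # placeholders name result properties.
    return output_format.format(**result)


# A made-up result record; real records may carry other properties.
result = {"action": "get", "status": "ok", "path": "/tmp/ds/file.txt", "type": "file"}

print(render(result, "json"))
print(render(result, "json_pp"))
print(render(result, "{status}: {path}"))
```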

--report-status {success,failure,ok,notneeded,impossible,error}

constrain command result report to records matching the given status. ‘success’ is a synonym for ‘ok’ OR ‘notneeded’, ‘failure’ stands for ‘impossible’ OR ‘error’.
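
The synonym rule can be expressed as a small filter. The sketch below is written in Python and only mirrors the description above; the names STATUS_GROUPS and filter_by_status and the sample records are hypothetical and do not come from DataLad.

```python
# 'success' and 'failure' are shorthands for pairs of concrete statuses.
STATUS_GROUPS = {
    "success": {"ok", "notneeded"},
    "failure": {"impossible", "error"},
}


def filter_by_status(results, report_status):
    """Keep only the records whose status matches the requested value."""
    # A concrete status such as 'ok' matches only itself.
    wanted = STATUS_GROUPS.get(report_status, {report_status})
    return [r for r in results if r["status"] in wanted]


# Made-up result records for illustration.
results = [
    {"path": "a", "status": "ok"},
    {"path": "b", "status": "notneeded"},
    {"path": "c", "status": "impossible"},
    {"path": "d", "status": "error"},
]

print([r["path"] for r in filter_by_status(results, "success")])
print([r["path"] for r in filter_by_status(results, "failure")])
print([r["path"] for r in filter_by_status(results, "ok")])
```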

--report-type {dataset,file}

constrain command result report to records matching the given type. Can be given more than once to match multiple types.

--on-failure {ignore,continue,stop}

when an operation fails: ‘ignore’ and continue with remaining operations, the error is logged but does not lead to a non-zero exit code of the command; ‘continue’ works like ‘ignore’, but an error causes a non-zero exit code; ‘stop’ halts on first failure and yields non-zero exit code. A failure is any result with status ‘impossible’ or ‘error’.
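
The three modes differ in two respects: whether processing continues after a failure, and whether the exit code becomes non-zero. A minimal Python simulation of that logic follows. It is a sketch, not DataLad's code: run_operations is a hypothetical name, operations are reduced to their result statuses, and the concrete exit code 1 is an assumption (the text above only says "non-zero").

```python
FAILURE_STATUSES = {"impossible", "error"}


def run_operations(statuses, on_failure):
    """Walk through result statuses and return (processed, exit_code)."""
    processed = []
    failed = False
    for status in statuses:
        processed.append(status)
        if status in FAILURE_STATUSES:
            failed = True
            if on_failure == "stop":
                # 'stop' halts at the first failure.
                break
    # 'ignore' never turns a failure into a non-zero exit code.
    # Assumption: 1 stands in for "non-zero".
    exit_code = 1 if failed and on_failure != "ignore" else 0
    return processed, exit_code


statuses = ["ok", "error", "ok"]
for mode in ("ignore", "continue", "stop"):
    print(mode, run_operations(statuses, mode))
```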

--run-before PLUGINSPEC [PLUGINSPEC ...]

DataLad plugin to run before the command. PLUGINSPEC is a list comprising a plugin name plus optional key=value pairs with arguments for the plugin call (see plugin command documentation for details). This option can be given more than once to run multiple plugins in the order in which they were given. For plugins that require a --dataset argument, it is important to provide the respective dataset as the --dataset argument of the main command if it is not in the list of plugin arguments.

--run-after PLUGINSPEC [PLUGINSPEC ...]

Like --run-before, but plugins are executed after the main command has finished.

--cmd

syntactical helper that can be used to end the list of global command line options before the subcommand label. Options like --run-before can take an arbitrary number of arguments and may need to be followed by a single --cmd so that the subcommand can be identified.
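
Why such a terminator is useful can be shown with a toy parser built on Python's argparse. This is not DataLad's actual parser: the program name toy, the plugin name example_plugin and its key=value argument are hypothetical, and only one option and one subcommand are modeled.

```python
import argparse


def build_parser():
    """Toy parser with one greedy option, a --cmd flag and a subcommand."""
    parser = argparse.ArgumentParser(prog="toy")
    # nargs="+" makes the option swallow every following non-option argument.
    parser.add_argument("--run-before", nargs="+", action="append")
    # A flag without arguments is enough to end the greedy option's list.
    parser.add_argument("--cmd", action="store_true")
    subparsers = parser.add_subparsers(dest="subcommand")
    subparsers.add_parser("save")
    return parser


parser = build_parser()

# With --cmd, the plugin arguments end before the subcommand label.
with_cmd = parser.parse_args(
    ["--run-before", "example_plugin", "key=value", "--cmd", "save"]
)

# Without it, "save" is absorbed as one more plugin argument.
without_cmd = parser.parse_args(
    ["--run-before", "example_plugin", "key=value", "save"]
)

print(with_cmd.run_before, with_cmd.subcommand)
print(without_cmd.run_before, without_cmd.subcommand)
```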

“Control Your Data”

Authors

datalad is developed by The DataLad Team and Contributors <team@datalad.org>.