datalad.api.add(path=None, dataset=None, to_git=None, save=True, message=None, recursive=False, recursion_limit=None, ds2super=False, git_opts=None, annex_opts=None, annex_add_opts=None, jobs=None)

Add files/directories to an existing dataset.

Typically, files and directories are first placed into a directory of a dataset, and this command is then used to register the new content with the dataset. With recursion enabled, files will be added to their respective subdatasets as well.

By default, all files are added to the dataset’s annex, i.e. only their content identity and availability information is tracked with Git. This results in lightweight datasets. If desired, the to_git flag can be used to tell DataLad to inject files directly into Git. While this is not recommended for binary data or large files, it can be used for source code and metadata to benefit from Git’s tracking and merging capabilities. Files checked directly into Git are always and unconditionally available immediately after installation of a dataset.
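The split between Git and the annex described above can be sketched as two separate calls. This is a minimal illustration, not part of DataLad itself: `add_code_and_data` is a hypothetical helper, the paths and messages are made up, and the `add` callable is passed in so that the sketch also runs without DataLad installed (here with a stand-in that only records its arguments). It uses only the keyword arguments listed in the signature above.

```python
def add_code_and_data(add, dataset, code_paths, data_paths):
    """Register new content in two steps (hypothetical helper).

    `add` is expected to be datalad.api.add, or any callable that accepts
    the same keyword arguments and returns a list of result dictionaries.
    """
    results = []
    # Source code goes directly into Git to benefit from tracking and merging.
    results.extend(add(path=code_paths, dataset=dataset, to_git=True,
                       message="Add analysis code"))
    # Data files keep the default (to_git=None), leaving the decision
    # to git-annex.
    results.extend(add(path=data_paths, dataset=dataset,
                       message="Add data files"))
    return results


# Stand-in for datalad.api.add that only records how it was called.
calls = []


def fake_add(**kwargs):
    calls.append(kwargs)
    return [{"path": p, "status": "ok"} for p in kwargs["path"]]


results = add_code_and_data(fake_add, "/tmp/ds",
                            ["code/run.py"], ["data/scan.nii.gz"])
```

With DataLad available, the same helper would presumably be called with `datalad.api.add` in place of `fake_add`.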


Power-user info: This command uses git annex add, or git add to incorporate new dataset content.

  • path (non-empty sequence of str or None, optional) – path/name of the component to be added. The component must either exist on the filesystem already, or a source has to be provided. [Default: None]
  • dataset (Dataset or None, optional) – specify the dataset to perform the add operation on. If no dataset is given, an attempt is made to identify the dataset based on the current working directory and/or the path given. [Default: None]
  • to_git (bool, optional) – flag whether to add data directly to Git, instead of tracking data identity only. Usually this is not desired, as it inflates dataset sizes and impacts flexibility of data transport. If not specified, it is up to git-annex to decide, possibly based on .gitattributes options. [Default: None]
  • save (bool, optional) – by default all modifications to a dataset are immediately saved. Setting this option to False disables this behavior. [Default: True]
  • message (str or None, optional) – a description of the state or the changes made to a dataset. [Default: None]
  • recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]
  • recursion_limit (int or None, optional) – limit recursion into subdatasets to the given number of levels. [Default: None]
  • ds2super (bool, optional) – given paths of dataset (toplevel) locations will cause these datasets to be added to their respective superdatasets underneath a given base dataset (instead of all their content to themselves). If no base dataset is provided, this flag has no effect. Regular files and directories are always added to their respective datasets, regardless of this setting. [Default: False]
  • git_opts (str or None, optional) – option string to be passed to git calls. [Default: None]
  • annex_opts (str or None, optional) – option string to be passed to git annex calls. [Default: None]
  • annex_add_opts (str or None, optional) – option string to be passed to git annex add calls. [Default: None]
  • jobs (int or None or {'auto'}, optional) – how many parallel jobs (where possible) to use. [Default: None]
  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’: any failure is reported, but does not cause an exception; ‘continue’: if any failure occurs, an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable neither returns a value that evaluates to False nor raises a ValueError exception. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
  • result_renderer ({'default', 'json', 'json_pp', 'tailored'} or None, optional) – format of return value rendering on stdout. [Default: None]
  • result_xfm ({'paths', 'relpaths', 'datasets', 'successdatasets-or-none'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’, a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: ‘list’]
  • run_after – Like run_before, but plugins are executed after the main command has finished. [Default: None]
  • run_before – DataLad plugin to run before the command. PLUGINSPEC is a list consisting of a plugin name plus optional 2-tuples of key-value pairs with arguments for the plugin call (see plugin command documentation for details). PLUGINSPECs must be wrapped in a list where each item configures one plugin call. Plugins are called in the order defined by this list. For running plugins that require a dataset argument, it is important to provide the respective dataset as the dataset argument of the main command, if it is not in the list of plugin arguments. [Default: None]
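
The result-handling parameters above (on_failure, result_filter, return_type) can be illustrated on hand-made result dictionaries. The following is a plain-Python sketch of the documented behavior, not DataLad's actual implementation: the helper names are hypothetical, and the 'path' and 'type' keys in the sample dictionaries are assumptions made for illustration. Only the 'status' values ‘impossible’ and ‘error’ are taken from the description of on_failure.

```python
def is_failure(res):
    """Per the on_failure description: status 'impossible' or 'error'."""
    return res.get("status") in ("impossible", "error")


def only_files(res, **kwargs):
    """Example result_filter; accepts **kwargs so it could also receive
    the keyword arguments of the original API call.
    The 'type' key is an assumption made for this sketch."""
    return res.get("type") == "file"


def item_or_list(results):
    """Mimic the described 'item-or-list' return behavior."""
    results = list(results)
    if not results:
        return None          # empty list -> None
    if len(results) == 1:
        return results[0]    # one item -> the item itself
    return results           # several items -> the list


# Hand-made result dictionaries standing in for what a command might report.
sample = [
    {"path": "a.txt", "type": "file", "status": "ok"},
    {"path": "sub", "type": "dataset", "status": "ok"},
    {"path": "missing.txt", "type": "file", "status": "impossible"},
]

kept = [r for r in sample if only_files(r)]
failed = [r for r in sample if is_failure(r)]
```

A callable such as `only_files` is the kind of object that could be passed as result_filter, while `is_failure` restates the failure criterion that on_failure acts upon.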