datalad.api.add_archive_content

datalad.api.add_archive_content(archive, *, dataset=None, annex=None, add_archive_leading_dir=False, strip_leading_dirs=False, leading_dirs_depth=None, leading_dirs_consider=None, use_current_dir=False, delete=False, key=False, exclude=None, rename=None, existing='fail', annex_options=None, copy=False, commit=True, allow_dirty=False, stats=None, drop_after=False, delete_after=False)

Add content of an archive under git annex control.

Given an already annex’ed archive, extract and add its files to the dataset, and reference the original archive as a custom special remote.

Examples

Add files from the archive ‘big_tarball.tar.gz’, but keep big_tarball.tar.gz in the index:

> add_archive_content(archive='big_tarball.tar.gz')

Add files from the archive ‘big_tarball.tar.gz’, and remove big_tarball.tar.gz from the index:

> add_archive_content(archive='big_tarball.tar.gz', delete=True)

Add files from the archive ‘s3.zip’ but remove the leading directory:

> add_archive_content(archive='s3.zip', strip_leading_dirs=True)
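
What stripping leading directories means for the extracted layout can be sketched in plain Python. This is only an illustration under stated assumptions, not DataLad's implementation: the helper name strip_leading_dirs_sketch and the member paths are hypothetical, and it assumes that a "leading directory" is one shared by every member of the archive, with leading_dirs_depth acting as a cap on how many are removed.

```python
from pathlib import PurePosixPath


def strip_leading_dirs_sketch(members, depth=None):
    """Remove the directories shared by every member path.

    Hypothetical helper: `members` are archive member paths and `depth`
    caps how many leading directories are removed (None means no limit).
    """
    parts = [PurePosixPath(m).parts for m in members]
    # Only directory components qualify, so leave out each file name.
    dir_parts = [p[:-1] for p in parts]
    common = 0
    for level in zip(*dir_parts):
        if len(set(level)) != 1:
            break  # members diverge at this level
        common += 1
    if depth is not None:
        common = min(common, depth)
    return [str(PurePosixPath(*p[common:])) for p in parts]


# Made-up archive listing for illustration.
members = ["s3/data/a.txt", "s3/data/b.txt", "s3/data/sub/c.txt"]
stripped_all = strip_leading_dirs_sketch(members)
stripped_one = strip_leading_dirs_sketch(members, depth=1)
print(stripped_all)
print(stripped_one)
```

With no depth limit both shared directories (s3/ and data/) are removed; with depth=1 only s3/ is.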

Parameters:

archive (str) – archive file or a key (if key=True specified).
dataset (Dataset or None, optional) – specify the dataset to save. [Default: None]
annex – DEPRECATED. Use the ‘dataset’ parameter instead. [Default: None]
add_archive_leading_dir (bool, optional) – place extracted content under a directory which would correspond to the archive name with all suffixes stripped. E.g. the content of archive.tar.gz will be extracted under archive/. [Default: False]
strip_leading_dirs (bool, optional) – remove one or more leading directories from the archive layout on extraction. [Default: False]
leading_dirs_depth – maximum depth of leading directories to strip. If not specified (None), no limit. [Default: None]
leading_dirs_consider (list of str or None, optional) – regular expression(s) for directories to consider to strip away. [Default: None]
use_current_dir (bool, optional) – extract the archive under the current directory, not the directory where the archive is located. This parameter is applied automatically if key=True was used. [Default: False]
delete (bool, optional) – delete original archive from the filesystem/Git in current tree. Note that it will be of no effect if key=True is given. [Default: False]
key (bool, optional) – signal if provided archive is not actually a filename on its own but an annex key. The archive will be extracted in the current directory. [Default: False]
exclude (list of str or None, optional) – regular expressions for filenames to exclude from being added to annex. Applied after rename if that one is specified. For exact matching, use anchoring. [Default: None]
rename (list of str or None, optional) – regular expressions to rename files before adding them to Git. The first character of each expression defines how to split the provided string into two parts: a Python regular expression (with groups) and a replacement string. [Default: None]
existing – what to do if a file from an archive would overwrite an existing file with the same name. ‘fail’ (default) leads to an error result; ‘overwrite’ silently replaces the existing file; ‘archive-suffix’ adds a suffix (prefixed with a ‘-’) matching the name of the archive from which the file is extracted and, if that name is taken as well, ‘numeric-suffix’ is in effect in addition; ‘numeric-suffix’ adds an incremental numeric suffix (prefixed with a ‘.’) until no name collision is detected. [Default: ‘fail’]
annex_options (str or None, optional) – additional options to pass to git-annex. [Default: None]
copy (bool, optional) – copy the content of the archive instead of moving. [Default: False]
commit (bool, optional) – commit upon completion; pass False to skip the commit. [Default: True]
allow_dirty (bool, optional) – flag that operating on a dirty repository (uncommitted or untracked content) is ok. [Default: False]
stats – ActivityStats instance for global tracking. [Default: None]
drop_after (bool, optional) – drop extracted files after adding to annex. [Default: False]
delete_after (bool, optional) – extract under a temporary directory, git-annex add, and delete afterwards. To be used to “index” files within annex without actually creating corresponding files under git. Note that annex dropunused would later remove that load. [Default: False]
on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’: any failure is reported, but does not cause an exception; ‘continue’: if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is returned only if the callable’s return value does not evaluate to False; a result is also discarded if the callable raises a ValueError exception. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
result_renderer – select rendering mode for command results. ‘tailored’ enables a command-specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message; ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]
result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’ a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: ‘list’]
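
The interplay of rename and exclude described above can be sketched with the standard re module. This is a minimal illustration, not DataLad's code: the helper names apply_rename and is_excluded are hypothetical, the rule layout <sep><regex><sep><replacement> is an assumption derived from the rename description, and the file names are made up. Matching for exclude is assumed to be unanchored unless the pattern anchors itself.

```python
import re


def apply_rename(rules, name):
    """Apply rename rules of the assumed form <sep><regex><sep><replacement>."""
    for rule in rules:
        sep = rule[0]  # the first character names the separator
        pattern, replacement = rule[1:].split(sep)
        name = re.sub(pattern, replacement, name)
    return name


def is_excluded(patterns, name):
    """Return True if any exclude pattern matches anywhere in the name."""
    return any(re.search(p, name) for p in patterns)


rename = [r"/^raw_(.*)\.txt$/\1.tsv"]
exclude = [r"^notes\.tsv$"]  # anchored, so only the exact name matches

names = ["raw_subjects.txt", "raw_notes.txt", "README"]
# rename runs first, so exclude sees the already renamed names
renamed = [apply_rename(rename, n) for n in names]
kept = [n for n in renamed if not is_excluded(exclude, n)]
print(kept)
```

Because exclude is applied after rename, the pattern has to target the new name (notes.tsv), not the name stored in the archive (raw_notes.txt).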
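
The collision modes of the existing parameter can be modeled as a small name-resolution function. The sketch below is an assumption-laden illustration, not DataLad's implementation: resolve_name is a hypothetical helper, the suffix is assumed to be appended to the end of the file name, the numeric counter is assumed to start at 1, and ‘numeric-suffix’ is assumed to be usable as a value on its own.

```python
def resolve_name(target, archive_name, taken, existing="fail"):
    """Pick the name for `target`, given the set of names already `taken`."""
    if target not in taken or existing == "overwrite":
        return target
    if existing == "fail":
        raise FileExistsError(f"{target} already exists")
    if existing not in ("archive-suffix", "numeric-suffix"):
        raise ValueError(f"unknown mode: {existing}")
    candidate = target
    if existing == "archive-suffix":
        # first try the archive name, prefixed with '-'
        candidate = f"{target}-{archive_name}"
        if candidate not in taken:
            return candidate
    # fall back on an incremental numeric suffix, prefixed with '.'
    n = 1
    while f"{candidate}.{n}" in taken:
        n += 1
    return f"{candidate}.{n}"


taken = {"data.csv", "data.csv-big_tarball"}
first = resolve_name("data.csv", "big_tarball", {"data.csv"}, "archive-suffix")
second = resolve_name("data.csv", "big_tarball", taken, "archive-suffix")
numeric = resolve_name("data.csv", "big_tarball", taken, "numeric-suffix")
print(first, second, numeric)
```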
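
The three on_failure modes differ in when processing stops and whether an exception is raised. The following toy model makes that concrete. It is a sketch only: the result dictionaries are made up, process is a hypothetical helper, and IncompleteResults is a hypothetical stand-in for IncompleteResultsError that mirrors only the failed attribute mentioned above (the processed attribute exists purely for demonstration).

```python
class IncompleteResults(Exception):
    """Hypothetical stand-in for IncompleteResultsError."""

    def __init__(self, failed, processed):
        super().__init__(f"{len(failed)} failed result(s)")
        self.failed = failed        # result dicts of the failures
        self.processed = processed  # demo only: paths handled so far


def process(results, on_failure="continue"):
    """Walk over result dicts and apply the on_failure policy."""
    processed, failed = [], []
    for res in results:
        processed.append(res["path"])
        if res["status"] in ("impossible", "error"):
            failed.append(res)
            if on_failure == "stop":
                break  # do not touch any further results
    if failed and on_failure != "ignore":
        raise IncompleteResults(failed, processed)
    return processed


results = [
    {"path": "a", "status": "ok"},
    {"path": "b", "status": "error"},
    {"path": "c", "status": "ok"},
]

ignored = process(results, on_failure="ignore")
outcomes = {}
for mode in ("continue", "stop"):
    try:
        process(results, on_failure=mode)
    except IncompleteResults as exc:
        outcomes[mode] = (exc.processed, [r["path"] for r in exc.failed])
print(ignored, outcomes)
```

In this model ‘continue’ and ‘stop’ both raise, but only ‘continue’ gets as far as the third result.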
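
A ‘<template>’ value for result_renderer uses Python's format() syntax, so its effect can be previewed with str.format on a dictionary. The result record below is invented for illustration, and applying the template via str.format(**result) is an assumption about how such a template is evaluated, not a statement about DataLad's internals.

```python
# Made-up result record; real records carry more keys.
result = {
    "action": "add",
    "status": "ok",
    "path": "/tmp/ds/a.txt",
    "metadata": {"name": "demo"},
}

# Top-level keys are referenced by name, nested values by [key].
template = "{action} [{status}] {path} ({metadata[name]})"
line = template.format(**result)
print(line)
```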