datalad.api.drop

datalad.api.drop(path=None, *, what='filecontent', reckless=None, dataset=None, recursive=False, recursion_limit=None, jobs=None, check=None, if_dirty=None)

Drop content of individual files or entire (sub)datasets

This command is the antagonist of ‘get’. It can undo the retrieval of file content and the installation of subdatasets.

Dropping is a safe-by-default operation. Before dropping any information, the command confirms the continued availability of file content (see e.g., configuration ‘annex.numcopies’) and the state of all dataset branches from at least one known dataset sibling. Moreover, prior to the removal of an entire dataset annex, it is confirmed that the annex is no longer marked as existing in the network of dataset siblings.

Importantly, all checks regarding version history availability and local annex availability are performed using the current state of remote siblings as known to the local dataset. This is done for performance reasons and for resilience in case of absent network connectivity. To ensure decision making based on up-to-date information, it is advised to execute a dataset update before dropping dataset components.
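
For example, a minimal sketch of this advised update-then-drop sequence (both commands are available from datalad.api; the file path is hypothetical):

> from datalad.api import update, drop
> update(dataset='.')
> drop('path/to/file', dataset='.')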

Examples

Drop single file content:

> drop('path/to/file')

Drop all file content in the current dataset:

> drop('.')

Drop all file content in a dataset and all its subdatasets:

> drop(dataset='.', recursive=True)

Disable check to ensure the configured minimum number of remote sources for dropped data:

> drop(path='path/to/content', reckless='availability')

Drop (uninstall) an entire dataset (will fail with subdatasets present):

> drop(what='all')

Kill a dataset recklessly with any existing subdatasets too (this will be fast, but will disable any and all safety checks):

> drop(what='all', reckless='kill', recursive=True)
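
Uninstall a single subdataset entirely while leaving file content elsewhere in the superdataset untouched (a sketch; the subdataset path is hypothetical):

> drop('path/to/subdataset', what='datasets')
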
Parameters:
  • path (sequence of str or None, optional) – path of a dataset or dataset component to be dropped. [Default: None]

  • what ({'filecontent', 'allkeys', 'datasets', 'all'}, optional) – select what type of items shall be dropped. With ‘filecontent’, only the file content (git-annex keys) of files in a dataset’s worktree will be dropped. With ‘allkeys’, content of any version of any file in any branch (including, but not limited to the worktree) will be dropped. This effectively empties the annex of a local dataset. With ‘datasets’, only complete datasets will be dropped (implies ‘allkeys’ mode for each such dataset), but no file content will be dropped for any files in datasets that are not dropped entirely. With ‘all’, content for any matching file or dataset will be dropped entirely. [Default: ‘filecontent’]

  • reckless ({'modification', 'availability', 'undead', 'kill', None}, optional) – disable individual or all data safety measures that would normally prevent potentially irreversible data loss. With ‘modification’, unsaved modifications in a dataset will not be detected. This improves performance at the cost of permitting potential loss of unsaved or untracked dataset components. With ‘availability’, detection of dataset/branch-states that are only available in the local dataset, and detection of an insufficient number of file content copies will be disabled. Especially the latter is a potentially expensive check which might involve numerous network transactions. With ‘undead’, detection of whether a to-be-removed local annex is still known to exist in the network of dataset-clones is disabled. This could cause zombie-records of invalid file availability. With ‘kill’, all safety-checks are disabled. [Default: None]

  • dataset (Dataset or None, optional) – specify the dataset to perform drop from. If no dataset is given, the current working directory is used as operation context. [Default: None]

  • recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]

  • recursion_limit (int or None, optional) – limit recursion into subdatasets to the given number of levels. [Default: None]

  • jobs (int or None or {'auto'}, optional) – how many parallel jobs (where possible) to use. “auto” corresponds to the number defined by the ‘datalad.runtime.max-annex-jobs’ configuration item. NOTE: This option can only parallelize input retrieval (get) and output recording (save). DataLad does NOT parallelize your scripts for you. [Default: None]

  • check (bool, optional) – DEPRECATED: use ‘--reckless availability’. [Default: None]

  • if_dirty – DEPRECATED and IGNORED: use --reckless instead. [Default: None]

  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure: ‘ignore’ any failure is reported, but does not cause an exception; ‘continue’ if any failure occurs an exception will be raised at the end, but processing other actions will continue for as long as possible; ‘stop’: processing will stop on first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. Raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]

  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False or a ValueError exception is raised. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]

  • result_renderer – select rendering mode for command results. ‘tailored’ enables a command-specific rendering style that is typically tailored to human consumption, if there is one for a specific command, or otherwise falls back on the ‘generic’ result renderer; ‘generic’ renders each result in one line with key info like action, status, path, and an optional message; ‘json’ a complete JSON line serialization of the full result record; ‘json_pp’ like ‘json’, but pretty-printed spanning multiple lines; ‘disabled’ turns off result rendering entirely; ‘<template>’ reports any value(s) of any result properties in any format indicated by the template (e.g. ‘{path}’, compare with JSON output for all key-value choices). The template syntax follows the Python “format() language”. It is possible to report individual dictionary values, e.g. ‘{metadata[name]}’. If a 2nd-level key contains a colon, e.g. ‘music:Genre’, ‘:’ must be substituted by ‘#’ in the template, like so: ‘{metadata[music#Genre]}’. [Default: ‘tailored’]

  • result_xfm ({'datasets', 'successdatasets-or-none', 'paths', 'relpaths', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top- level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]

  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’, a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. See the sketch after this list. [Default: ‘list’]
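
As a minimal sketch of consuming per-item result records programmatically with these options (the content path is hypothetical; result records carry keys such as ‘action’, ‘status’, and ‘path’):

> for res in drop('path/to/content', return_type='generator',
>                 result_renderer='disabled', on_failure='ignore'):
>     print(res['action'], res['status'], res.get('path'))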