datalad.api.diff

datalad.api.diff(path=None, dataset=None, revision=None, staged=False, ignore_subdatasets='none', report_untracked='normal', recursive=False, recursion_limit=None)

Report changes of dataset components.

Reports can be generated for changes between recorded revisions, or between a revision and the state of a dataset’s work tree.

Unlike ‘git diff’, this command also reports untracked content when comparing a revision to the state of the work tree. Such content is marked with the property state=‘untracked’ in the command results.

The following types of changes are distinguished and reported via the state result property:

  • added
  • copied
  • deleted
  • modified
  • renamed
  • typechange
  • unmerged
  • untracked
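
As a minimal sketch of how the state property might be consumed, the helper below groups result dictionaries by their state. The function name group_by_state and the sample records are hypothetical; the sketch assumes each result is a dict with a path key alongside state, and it skips any record that has no state.

```python
from collections import defaultdict


def group_by_state(results):
    """Map each reported state to the list of paths carrying it."""
    grouped = defaultdict(list)
    for res in results:
        state = res.get("state")
        if state is None:
            # Skip records that carry no state property
            continue
        grouped[state].append(res.get("path"))
    return dict(grouped)


# Hypothetical records, shaped like the result dictionaries described above
sample = [
    {"path": "/ds/a.txt", "state": "modified"},
    {"path": "/ds/b.txt", "state": "untracked"},
    {"path": "/ds/c.txt", "state": "modified"},
]
print(group_by_state(sample))
```

In practice the list passed in would be the return value of a diff call rather than hand-written records.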

Whenever applicable, source and/or destination revisions are reported to indicate when exactly within the requested revision range a particular component changed its status.

Optionally, the reported changes can be limited to a subset of paths within a dataset.

Parameters:
  • path (sequence of str or None, optional) – path to be evaluated. [Default: None]
  • dataset (Dataset or None, optional) – specify the dataset to query. If no dataset is given, an attempt is made to identify the dataset based on the input and/or the current working directory. [Default: None]
  • revision – comparison reference specification. Three modes are supported: 1) <revision>: changes you have in your working tree relative to the named revision (this can also be a branch name, tag, commit or any label Git can understand). 2) <revision>..<revision>: changes between two arbitrary revisions. 3) <revision>...<revision>: changes on the branch containing and up to the second <revision>, starting at a common ancestor of both revisions. [Default: None]
  • staged (bool, optional) – get the changes already staged for a commit relative to an optionally given revision (by default the most recent one). [Default: False]
  • ignore_subdatasets ({'none', 'untracked', 'dirty', 'all'}, optional) – speed up execution by (partially) not evaluating the state of subdatasets in a parent dataset. With “none” a subdataset is considered modified when it either contains untracked or modified content or its last saved state differs from that recorded in the parent dataset. When “untracked” is used subdatasets are not considered modified when they only contain untracked content (but they are still scanned for modified content). Using “dirty” ignores all changes to the work tree of subdatasets, only changes to the revisions stored in the parent dataset are shown. Using “all” hides all changes to subdatasets. Note, even with “all” recursive execution will still report other changes in any existing subdataset, only the subdataset record in a parent dataset is not evaluated. [Default: ‘none’]
  • report_untracked ({'no', 'normal', 'all'}, optional) – If and how untracked content is reported when comparing a revision to the state of the work tree. ‘no’: no untracked files are reported; ‘normal’: untracked files and entire untracked directories are reported as such; ‘all’: report individual files even in fully untracked directories. [Default: ‘normal’]
  • recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]
  • recursion_limit (int or None, optional) – limit recursion into subdatasets to the given number of levels. [Default: None]
  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure. ‘ignore’: any failure is reported, but does not cause an exception; ‘continue’: if any failure occurs, an exception will be raised at the end, but processing of other actions will continue for as long as possible; ‘stop’: processing will stop on the first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is only returned if the callable’s return value does not evaluate to False and the callable does not raise a ValueError exception. If the given callable supports **kwargs it will additionally be passed the keyword arguments of the original API call. [Default: None]
  • result_renderer ({'default', 'json', 'json_pp', 'tailored'} or None, optional) – format of return value rendering on stdout. [Default: None]
  • result_xfm ({'paths', 'relpaths', 'datasets', 'successdatasets-or-none'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’, a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: ‘list’]
  • run_after – Like run_before, but plugins are executed after the main command has finished. [Default: None]
  • run_before – DataLad plugin to run before the command. PLUGINSPEC is a list comprised of a plugin name plus optional 2-tuples of key-value pairs with arguments for the plugin call (see plugin command documentation for details). PLUGINSPECs must be wrapped in a list where each item configures one plugin call. Plugins are called in the order defined by this list. For running plugins that require a dataset argument, it is important to provide the respective dataset as the dataset argument of the main command, if it is not in the list of plugin arguments. [Default: None]
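
To illustrate result_filter, the following sketch defines a callable that keeps only results whose state is ‘untracked’, and wraps the diff call in a function so that DataLad is imported only when that function is invoked. The names only_untracked and untracked_paths are hypothetical, and the sketch assumes result dictionaries carry a path key.

```python
def only_untracked(res, **kwargs):
    """result_filter sketch: keep a result only if it is untracked.

    Accepting **kwargs means the keyword arguments of the original
    API call are passed in as well; they are ignored here.
    """
    return res.get("state") == "untracked"


def untracked_paths(dataset_path, revision="HEAD"):
    """Return paths of untracked content relative to `revision`."""
    # Imported lazily so the filter above can be exercised without DataLad
    from datalad.api import Dataset, diff

    results = diff(
        dataset=Dataset(dataset_path),
        revision=revision,
        report_untracked="all",  # list individual files in untracked directories
        result_filter=only_untracked,
        return_type="list",
    )
    return [res.get("path") for res in results]
```

Filtering inside the call keeps the returned list small; the same predicate could equally be applied to an unfiltered result list afterwards.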