datalad.api.metadata

datalad.api.metadata(path=None, dataset=None, get_aggregates=False, reporton='all', recursive=False)

Metadata reporting for files and entire datasets

Two types of metadata are supported:

  1. metadata describing a dataset as a whole (dataset-global metadata), and
  2. metadata for files in a dataset (content metadata).

Both types can be accessed with this command.

Examples

Report the metadata of a single file, as aggregated into the closest locally available dataset that contains the query path:

% datalad metadata somedir/subdir/thisfile.dat

Sometimes it is helpful to get metadata records formatted in a more accessible form, here as pretty-printed JSON:

% datalad -f json_pp metadata somedir/subdir/thisfile.dat

Same query as above, but specify which dataset to query (it must contain the query path):

% datalad metadata -d . somedir/subdir/thisfile.dat

Report any metadata record of any dataset known to the queried dataset:

% datalad metadata --recursive --reporton datasets

Get a JSON-formatted report of aggregated metadata in a dataset, including information on enabled metadata extractors, dataset versions, dataset IDs, and dataset paths:

% datalad -f json metadata --get-aggregates
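
The command-line examples above can also be expressed through the Python API. The following is a minimal sketch, not an authoritative mapping: it assumes that each command-line option corresponds to the keyword argument of the same name in the signature at the top of this page, and that dataset accepts a path string in place of a Dataset instance. QUERIES and run_query are hypothetical names introduced for illustration only.

```python
# Keyword arguments for datalad.api.metadata() that mirror the command-line
# examples above. The file path is the placeholder used in those examples.
QUERIES = {
    # % datalad metadata somedir/subdir/thisfile.dat
    "single_file": dict(path=["somedir/subdir/thisfile.dat"]),
    # % datalad metadata -d . somedir/subdir/thisfile.dat
    # (assumption: a path string is accepted for the dataset argument)
    "explicit_dataset": dict(path=["somedir/subdir/thisfile.dat"], dataset="."),
    # % datalad metadata --recursive --reporton datasets
    "all_datasets": dict(recursive=True, reporton="datasets"),
    # % datalad metadata --get-aggregates
    "aggregates": dict(get_aggregates=True),
}


def run_query(name, **extra):
    """Run one of the queries above and return the reported results.

    Extra keyword arguments (e.g. result_renderer) are passed through.
    """
    # Imported lazily so the mapping above can be inspected without DataLad.
    from datalad.api import metadata

    kwargs = dict(QUERIES[name], **extra)
    return metadata(**kwargs)
```

Inside a DataLad dataset with aggregated metadata, run_query("single_file", result_renderer="json_pp") would then roughly correspond to the pretty-printed JSON example, assuming the -f option maps to the result_renderer parameter.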

Parameters:
  • path (sequence of str or None, optional) – path(s) to query metadata for. [Default: None]
  • dataset (Dataset or None, optional) – dataset to query. If given, metadata will be reported as stored in this dataset. Otherwise, the closest available dataset containing a query path will be consulted. [Default: None]
  • get_aggregates (bool, optional) – if set, yields all (sub)datasets for which aggregate metadata are available in the dataset. No other action is performed, even if other arguments are given. The reported results contain a dataset’s ID, the commit hash at which metadata aggregation was performed, and the location of the object file(s) containing the aggregated metadata. [Default: False]
  • reporton ({'all', 'datasets', 'files', 'none'}, optional) – choose which type of result to report on: ‘datasets’, ‘files’, ‘all’ (both datasets and files), or ‘none’ (no report). [Default: ‘all’]
  • recursive (bool, optional) – if set, recurse into potential subdatasets. [Default: False]
  • on_failure ({'ignore', 'continue', 'stop'}, optional) – behavior to perform on failure. ‘ignore’: any failure is reported, but does not cause an exception; ‘continue’: if any failure occurs, an exception will be raised at the end, but processing of other actions will continue for as long as possible; ‘stop’: processing will stop on the first failure and an exception is raised. A failure is any result with status ‘impossible’ or ‘error’. The raised exception is an IncompleteResultsError that carries the result dictionaries of the failures in its failed attribute. [Default: ‘continue’]
  • proc_post – Like proc_pre, but procedures are executed after the main command has finished. [Default: None]
  • proc_pre – DataLad procedure to run prior to the main command. The argument is a list of lists with procedure names and optional arguments. Procedures are called in the order they are given in this list. It is important to provide the respective target dataset to run a procedure on as the dataset argument of the main command. [Default: None]
  • result_filter (callable or None, optional) – if given, each to-be-returned status dictionary is passed to this callable, and is returned only if the callable’s return value does not evaluate to False and no ValueError exception is raised. If the given callable supports **kwargs, it will additionally be passed the keyword arguments of the original API call. [Default: None]
  • result_renderer ({'default', 'json', 'json_pp', 'tailored'} or None, optional) – format of return value rendering on stdout. [Default: None]
  • result_xfm ({'paths', 'relpaths', 'datasets', 'successdatasets-or-none', 'metadata'} or callable or None, optional) – if given, each to-be-returned result status dictionary is passed to this callable, and its return value becomes the result instead. This is different from result_filter, as it can perform arbitrary transformation of the result value. This is mostly useful for top-level command invocations that need to provide the results in a particular format. Instead of a callable, a label for a pre-crafted result transformation can be given. [Default: None]
  • return_type ({'generator', 'list', 'item-or-list'}, optional) – return value behavior switch. If ‘item-or-list’, a single value is returned instead of a one-item return value list, or a list in case of multiple return values. None is returned in case of an empty list. [Default: ‘list’]
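
The result-handling parameters above (result_filter, on_failure, return_type) can be illustrated without DataLad itself. The following is a minimal sketch in plain Python that imitates the documented behavior; it is not DataLad's implementation. The result records are hypothetical, and the keys "path", "type", and "status", as well as the status value "ok", are assumptions; only the statuses ‘impossible’ and ‘error’ are taken from the description of on_failure.

```python
def files_ok(res, **kwargs):
    """Candidate result_filter: keep successful file records only.

    Accepts **kwargs so that the keyword arguments of the original
    API call could be passed in as well.
    """
    return res.get("type") == "file" and res.get("status") == "ok"


def failures(results):
    """Select the results that count as failures for on_failure."""
    return [r for r in results if r.get("status") in ("impossible", "error")]


def item_or_list(results):
    """Imitate the documented 'item-or-list' return behavior."""
    if not results:
        return None
    return results[0] if len(results) == 1 else results


# Hypothetical result records, for illustration only.
sample = [
    {"path": "/ds", "type": "dataset", "status": "ok"},
    {"path": "/ds/a.dat", "type": "file", "status": "ok"},
    {"path": "/ds/b.dat", "type": "file", "status": "impossible"},
]

# Applying the filter by hand, as the command would do for each result.
kept = [r for r in sample if files_ok(r)]
```

With these records, kept holds only the record for /ds/a.dat, failures(sample) holds only the record for /ds/b.dat, and item_or_list(kept) returns the single remaining record rather than a one-item list. A callable such as files_ok could be passed to the command as result_filter=files_ok.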