datalad-metadata

Synopsis

datalad-metadata [-h] [-d DATASET] [-a KEY [VAL ...]] [-i KEY [VAL ...]] [--remove KEY [VAL ...]] [--reset KEY [VAL ...]] [--define-key KEY DEFINITION] [-g] [-r] [--recursion-limit LEVELS] [PATH [PATH ...]]

Description

Metadata manipulation for files and whole datasets

Two types of metadata are supported:

  1. metadata describing a dataset as a whole (dataset-global), and
  2. metadata for individual files in a dataset.

Both types can be accessed and modified with this command. Note, however, that this only refers to DataLad’s native metadata, and not to any other metadata that is possibly stored in files of a dataset.

DataLad’s native metadata capability primarily targets data description via arbitrary tags and other (brief) key-value attributes, where a single key may have multiple values.

Metadata key names are limited to alphanumerics (and [_-.]). Moreover, all key names are converted to lower case.
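
The key-name rule above can be illustrated with a small sketch. It is not DataLad's implementation: the helper name normalize_key is hypothetical, "alphanumerics" is assumed to mean ASCII letters and digits, and raising an error on a disallowed character is an assumption, since the text does not say how invalid keys are handled.

```python
import re

# Allowed characters per the rule above: alphanumerics plus "_", "-", and ".".
# Restricting "alphanumerics" to ASCII is an assumption of this sketch.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")


def normalize_key(key: str) -> str:
    """Return the lower-cased key name (hypothetical helper).

    Raising ValueError for disallowed characters is an assumption; the
    documentation only states which characters are permitted.
    """
    if not _KEY_PATTERN.fullmatch(key):
        raise ValueError(f"invalid metadata key: {key!r}")
    return key.lower()


print(normalize_key("Sample_ID"))   # sample_id
print(normalize_key("Imaging.TR"))  # imaging.tr
```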

Dataset (global) metadata

Metadata describing a dataset as a whole is stored in JSON format in the dataset at .datalad/metadata/dataset.json. The amount of metadata that can be stored is not limited by DataLad. However, it should be kept brief, as this information is stored in the Git history of the dataset, and any access or modification requires reading the entire file.

Arbitrary metadata keys can be used. However, DataLad reserves the keys ‘tag’ and ‘definition’ for its own use. They can still be manipulated without any restrictions, like any other metadata items, but doing so can impact DataLad’s metadata-related functionality, so handle them with care.

The ‘tag’ key is used to store a list of (unique) tags.

The ‘definition’ key is used to store key-value mappings that define metadata keys used elsewhere in the metadata. Using this feature is optional (see --define-key). It can be useful for data discovery, where metadata keys can be precisely defined by linking them to specific ontology terms.

File metadata

Metadata storage for individual files is provided by git-annex, and generally the same rules as for dataset-global metadata apply. However, there is just one reserved key name: ‘tag’.

Again, the amount of metadata is not limited, but metadata is stored in git-annex’s internal data structures in the Git repository of a dataset. Large amounts of metadata can degrade performance.

Output rendering

By default, a short summary of the metadata for each dataset (component) is rendered:

<path> (<type>): -|<keys> [<tags>]

where <path> is the path of the respective component and <type> is a label for the type of dataset component the metadata is presented for. Nonexistent metadata is indicated by a dash; otherwise, a comma-separated list of metadata keys (except for ‘tag’) is followed by a list of tags, if there are any.
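
The summary format can be sketched as a small formatting function. This is only an illustration of the template above, not DataLad's renderer: the function name render_summary, the type labels, the sorting of keys, and the exact separators inside the tag list are assumptions.

```python
def render_summary(path: str, kind: str, metadata: dict) -> str:
    """Format one line following `<path> (<type>): -|<keys> [<tags>]`.

    Key ordering and the ", " separator between tags are assumptions.
    """
    keys = sorted(k for k in metadata if k != "tag")  # 'tag' is not listed as a key
    tags = list(metadata.get("tag", []))
    if not keys and not tags:
        return f"{path} ({kind}): -"  # a dash marks nonexistent metadata
    parts = []
    if keys:
        parts.append(",".join(keys))
    if tags:
        parts.append("[" + ", ".join(tags) + "]")
    return f"{path} ({kind}): " + " ".join(parts)


print(render_summary("notes.txt", "file", {}))
# notes.txt (file): -
print(render_summary("data.csv", "file",
                     {"species": ["mouse"], "age": ["12"], "tag": ["raw"]}))
# data.csv (file): age,species [raw]
```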

Options

PATH

path(s) to set/get metadata for. Constraints: value must be a string [Default: None]

-h, --help, --help-np

show this help message. --help-np forcefully disables the use of a pager for displaying the help message

-d DATASET, --dataset DATASET

Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path) [Default: None]

-a KEY [VAL ...], --add KEY [VAL ...]

metadata items to add. If only a key is given, a corresponding tag is added. If a key-value mapping (multiple values at once are supported) is given, the values are added to the metadata item of that key. Constraints: value must be a string [Default: None]

-i KEY [VAL ...], --init KEY [VAL ...]

like --add, but tags are only added if no tag was present before. Likewise, values are only added to a metadata key if that key did not exist before. Constraints: value must be a string [Default: None]

--remove KEY [VAL ...]

metadata values to remove. If only a key is given, a corresponding tag is removed. If a key-value mapping (multiple values at once are supported) is given, only those values are removed from the metadata item of that key. If no values are left after the removal, the entire item of that key is removed. Constraints: value must be a string [Default: None]

--reset KEY [VAL ...]

metadata items to reset. If only a key is given, a corresponding metadata key with all its values is removed. If a key-value mapping (multiple values at once are supported) is given, any existing values for this key are replaced by the given ones. Constraints: value must be a string [Default: None]
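
The four modification options differ only in how they treat existing values, which a small in-memory model may make easier to compare. This sketch is not DataLad code: it assumes metadata can be modeled as a dict mapping each key to a set of values, with tags stored under the ‘tag’ key, and the function names simply mirror the option names. Dropping the ‘tag’ item once its last tag is removed is likewise an assumption.

```python
def add(meta, key, *values):
    """--add: a bare key becomes a tag; otherwise values are added to the key."""
    if not values:
        meta.setdefault("tag", set()).add(key)
    else:
        meta.setdefault(key, set()).update(values)


def init(meta, key, *values):
    """--init: like add, but only if no tag (or that key) was present before."""
    if not values:
        if not meta.get("tag"):
            meta["tag"] = {key}
    elif key not in meta:
        meta[key] = set(values)


def remove(meta, key, *values):
    """--remove: drop a tag, or only the given values of a key."""
    target, to_drop = ("tag", {key}) if not values else (key, set(values))
    if target in meta:
        meta[target] -= to_drop
        if not meta[target]:  # no values left: remove the entire item
            del meta[target]


def reset(meta, key, *values):
    """--reset: a bare key removes the key; otherwise its values are replaced."""
    if not values:
        meta.pop(key, None)
    else:
        meta[key] = set(values)


meta = {}
add(meta, "raw")                      # adds the tag "raw"
add(meta, "species", "mouse", "rat")  # adds two values to "species"
init(meta, "species", "human")        # no effect: "species" already exists
remove(meta, "species", "rat")
print(sorted(meta["species"]))        # ['mouse']
reset(meta, "species", "human")
print(sorted(meta["species"]))        # ['human']
reset(meta, "species")
print(meta)                           # {'tag': {'raw'}}
```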

--define-key KEY DEFINITION

convenience option to add an item in the dataset’s global metadata (‘definition’ key). This can be used to define (custom) keys used in the dataset’s metadata, for example by providing a URL to an ontology term for a given key label. This option does not need --dataset-global to be set to be in effect. Constraints: value must be a string [Default: None]

-g, --dataset-global

Whether to perform the metadata query or modification on the global dataset metadata, or on individual dataset components. For example, without this switch, setting metadata using the root path of a dataset will set the given metadata for all files in the dataset, whereas with this flag only the metadata record of the dataset itself will be altered. [Default: False]

-r, --recursive

if set, recurse into potential subdatasets. [Default: False]

--recursion-limit LEVELS

limit recursion into subdatasets to the given number of levels. Constraints: value must be convertible to type ‘int’ [Default: None]
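
How the two recursion options interact can be sketched with a depth-limited traversal. This is an illustrative model only, not DataLad's traversal code: the nested-dict layout, the function name walk, and the example dataset names are hypothetical, and the sketch assumes the limit only takes effect when recursion is enabled.

```python
def walk(dataset, recursive=False, recursion_limit=None, _level=0):
    """Yield dataset names, descending into subdatasets up to the given depth.

    `dataset` is a hypothetical nested dict: {"name": ..., "subdatasets": [...]}.
    """
    yield dataset["name"]
    if not recursive:
        return  # without recursion, only the top-level dataset is visited
    if recursion_limit is not None and _level >= recursion_limit:
        return  # the configured number of levels has been reached
    for sub in dataset.get("subdatasets", []):
        yield from walk(sub, recursive, recursion_limit, _level + 1)


tree = {
    "name": "root",
    "subdatasets": [
        {"name": "root/a", "subdatasets": [{"name": "root/a/deep"}]},
        {"name": "root/b"},
    ],
}

print(list(walk(tree)))
# ['root']
print(list(walk(tree, recursive=True, recursion_limit=1)))
# ['root', 'root/a', 'root/b']
print(list(walk(tree, recursive=True)))
# ['root', 'root/a', 'root/a/deep', 'root/b']
```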

Authors

datalad is developed by The DataLad Team and Contributors <team@datalad.org>.