DataLad purposefully uses terminology that differs from that of its technological foundations, Git and git-annex. This glossary defines terms used in the DataLad documentation and API, and relates them to the corresponding Git/git-annex concepts.


annex

Extension to a Git repository, provided and managed by git-annex as a means to track and distribute large (and small) files without having to inject them directly into a Git repository (which would slow Git operations significantly and impair handling of such repositories in general).


CLI

A command-line interface. It can be used interactively by executing commands in a shell, or as a programmable API for shell scripts.

DataLad extension

A Python package, developed outside of the core DataLad codebase, which (when installed) typically provides additional top-level datalad commands and/or additional metadata extractors. See the Handbook, Ch. 2, "DataLad’s extensions", for a representative list of extensions and instructions on how to install them.


dataset

A regular Git repository with an (optional) annex.


sibling

A dataset (location) that is related to a particular dataset by sharing content and history. In Git terminology, this is a clone of a dataset that is configured as a remote.


subdataset

A dataset that is part of another dataset, by means of being tracked as a Git submodule. As such, a subdataset is also a complete dataset and does not differ from a standalone dataset.


superdataset

A dataset that contains at least one subdataset.
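
The relationships between these definitions can be pictured with a small toy model. The sketch below is purely illustrative and is not the DataLad API: the `Dataset` class, its fields, the `is_superdataset` method, and the example paths and URL are all hypothetical names chosen for this example. It only encodes what the glossary states, namely that a dataset is a Git repository with an optional annex, that subdatasets correspond to Git submodules, that siblings correspond to Git remotes, and that any dataset with at least one subdataset counts as a superdataset.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy stand-in for a dataset (hypothetical, not a DataLad class)."""

    path: str
    has_annex: bool = False  # the annex is optional
    # In Git terminology: submodules tracked by this repository.
    subdatasets: list[Dataset] = field(default_factory=list)
    # In Git terminology: remotes, stored here as name -> location.
    siblings: dict[str, str] = field(default_factory=dict)

    def is_superdataset(self) -> bool:
        # A superdataset is a dataset that contains at least one subdataset.
        return len(self.subdatasets) > 0


# Hypothetical example data.
study = Dataset("study", has_annex=True)
raw = Dataset("study/raw", has_annex=True)

study.subdatasets.append(raw)  # "raw" becomes a subdataset of "study"
study.siblings["origin"] = "https://example.com/study.git"  # made-up URL

print(study.is_superdataset())  # True
print(raw.is_superdataset())    # False
```

Note that `raw` is an ordinary `Dataset` instance, which mirrors the point that a subdataset is a complete dataset in its own right; it is a subdataset only by virtue of being registered in `study`.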