class datalad.api.Dataset(path)

Representation of a DataLad dataset/repository

This is the core data type of DataLad: a representation of a dataset. At their core, datasets are (git-annex enabled) Git repositories. This class provides all operations that can be performed on a dataset.

Creating a dataset instance is cheap: all actual operations are delayed until they are needed. Creating multiple Dataset instances for the same dataset location automatically yields references to the same object.
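The "same location, same object" behaviour can be illustrated with a small per-path instance cache. This is only a sketch of the general technique, not DataLad's actual implementation: the class name PathCachedHandle, the use of a weak-value dictionary, and normalization via os.path.abspath are all assumptions made for illustration.

```python
import os
import weakref


class PathCachedHandle:
    """Hypothetical stand-in for a dataset handle; not DataLad's code."""

    # Maps a normalized path to the live instance created for it.
    # Weak references let unused instances be garbage collected.
    _instances = weakref.WeakValueDictionary()

    def __new__(cls, path):
        key = os.path.abspath(path)
        instance = cls._instances.get(key)
        if instance is None:
            # First request for this location: create and remember it.
            instance = super().__new__(cls)
            instance._path = key
            cls._instances[key] = instance
        return instance

    def __init__(self, path):
        # Deliberately does no work, so construction stays cheap.
        pass

    @property
    def path(self):
        return self._path


a = PathCachedHandle("some/dataset")
b = PathCachedHandle("./some/dataset")
print(a is b)  # True: both spellings normalize to the same location
```

Because the cache is keyed on the normalized path, two different spellings of one location resolve to a single object, while a different location gets its own instance.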

A dataset instance comprises two major components: a repo attribute and a config attribute. The former offers access to low-level functionality of the Git or git-annex repository. The latter gives access to a dataset’s configuration manager.
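One way to picture how such attributes can be evaluated only when needed is with properties. The sketch below is an assumption-laden illustration, not DataLad's code: LazyHandle is a hypothetical class, its repo property merely returns the path of a .git entry (or None) instead of a real repository object, and its config is a plain dictionary standing in for a configuration manager.

```python
import os
import tempfile


class LazyHandle:
    """Hypothetical sketch of delayed attribute evaluation; not DataLad's code."""

    def __init__(self, path):
        # Only remember the location; do not touch the file system yet.
        self.path = path
        self._config = None

    @property
    def repo(self):
        # Checked on each access, so a repository created later is picked up.
        git_dir = os.path.join(self.path, ".git")
        return git_dir if os.path.exists(git_dir) else None

    @property
    def config(self):
        # Built on first access and reused afterwards.
        if self._config is None:
            self._config = {"location": self.path}
        return self._config


with tempfile.TemporaryDirectory() as tmp:
    handle = LazyHandle(os.path.join(tmp, "not-created-yet"))
    before = handle.repo  # the location does not exist yet
    os.makedirs(os.path.join(handle.path, ".git"))
    after = handle.repo   # now a .git entry is present

print(before)             # None
print(after is not None)  # True
```

The handle can be constructed for a location that does not exist yet, and the repo lookup only reports something once a repository is actually there.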

Most functionality is available via methods of this class, but also as stand-alone functions with the same name in datalad.api.
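The pairing of a stand-alone function with a method of the same name can be sketched with a small decorator. Everything here is hypothetical and chosen for illustration: the Handle class, the as_method decorator, and the stand-in save function are not DataLad's API, and DataLad's own binding mechanism may differ.

```python
import functools


class Handle:
    """Hypothetical dataset-like class; not DataLad's code."""

    def __init__(self, path):
        self.path = path


def as_method(cls):
    """Attach a stand-alone function to cls as a method of the same name."""
    def decorator(func):
        @functools.wraps(func)
        def method(self, *args, **kwargs):
            # The instance fills the function's `dataset` argument.
            return func(*args, dataset=self, **kwargs)
        setattr(cls, func.__name__, method)
        return func  # the stand-alone function stays usable as-is
    return decorator


@as_method(Handle)
def save(message=None, dataset=None):
    # Stand-in body: report what would be saved and where.
    return {"action": "save", "path": dataset.path, "message": message}


ds = Handle("/tmp/demo")
print(ds.save(message="first") == save(message="first", dataset=ds))  # True
```

With this arrangement the method call and the function call are two spellings of the same operation; the method form simply supplies the instance as the dataset argument.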

Parameters: path (str) – Path to the dataset location. This location may or may not exist yet.


add([dataset, to_git, save, message, …]) Add files/directories to an existing dataset.
aggregate_metadata([guess_native_type, …]) Aggregate meta data of a dataset for later query.
annotate_paths([dataset, recursive, …]) Analyze and act upon input paths
clean([what, recursive, recursion_limit]) Clean up after DataLad (possible temporary files etc.)
clone([path, dataset, description, …]) Obtain a dataset copy from a URL or local source (path)
close() Perform operations which would close any possible process using this Dataset
create([force, description, dataset, …]) Create a new dataset from scratch.
create_sibling([name, target_dir, …]) Create a dataset sibling on a UNIX-like SSH-accessible machine
create_sibling_github([dataset, recursive, …]) Create dataset sibling on GitHub.
diff([dataset, revision, staged, …]) Report changes of dataset components.
drop([dataset, recursive, recursion_limit, …]) Drop file content from datasets
get([source, dataset, recursive, …]) Get any dataset content (files/directories/subdatasets).
get_subdatasets([pattern, fulfilled, …]) DEPRECATED: use subdatasets()
get_superdataset([datalad_only, topmost, …]) Get the dataset’s superdataset
install([source, dataset, get_data, …]) Install a dataset from a (remote) source.
is_installed() Returns whether a dataset is installed.
metadata([dataset, add, init, remove, …]) Metadata manipulation for files and whole datasets
plugin([dataset, showpluginhelp, showplugininfo]) Generic plugin interface
publish([dataset, to, since, missing, …]) Publish a dataset to a known sibling.
recall_state(whereto) Check out a particular state (tag, commit) to “undo” a change or switch to an otherwise desired previous state.
remove([dataset, recursive, check, save, …]) Remove components from datasets
rerun([since, dataset, branch, message, onto]) Re-execute previous datalad run commands.
run([dataset, message, rerun]) Run an arbitrary command and record its impact on a dataset.
save([path, dataset, all_updated, …]) Save the current state of a dataset
search([dataset, search, report, …]) Search within the meta data available in datasets
siblings([dataset, name, url, pushurl, …]) Manage sibling configuration
subdatasets([fulfilled, recursive, …]) Report subdatasets and their properties.
uninstall([dataset, recursive, check, if_dirty]) Uninstall subdatasets
unlock([dataset, recursive, recursion_limit]) Unlock file(s) of a dataset
update([sibling, merge, dataset, recursive, …]) Update a dataset from a sibling.


config Get an instance of the parser for the persistent dataset configuration.
id Identifier of the dataset.
path Path to the dataset.
repo Get an instance of the version control system/repo for this dataset, or None if there is none yet.