datalad.utils

Return True if a search with any of the regexes (a list or a single str) succeeds on value

datalad.utils.assure_bool(s)[source]

Convert a value into a boolean, following the convention for strings: on, True, yes are recognized as True, and off, False, no as False
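The string convention above can be illustrated with a minimal standalone sketch. It assumes case-insensitive matching and a ValueError for unrecognized strings (neither is stated in the docstring), and it is not datalad's actual implementation:

```python
def assure_bool(s):
    """Sketch: convert s to a bool, recognizing common on/off strings."""
    if isinstance(s, str):
        lowered = s.lower()  # assumption: matching is case-insensitive
        if lowered in ("on", "true", "yes"):
            return True
        if lowered in ("off", "false", "no"):
            return False
        # assumption: unrecognized strings are an error rather than truthy
        raise ValueError("Cannot interpret %r as a boolean" % s)
    # Non-string values fall back to Python's own truthiness
    return bool(s)
```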

datalad.utils.assure_dict_from_str(s, **kwargs)[source]

Given a multiline string with key=value items convert it to a dictionary

Parameters: s (str or dict) –
Returns: None if input s is empty
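A minimal sketch of the key=value parsing, assuming one item per line, that blank lines are skipped and that surrounding whitespace is stripped. The **kwargs of the real function are not modeled, and this is not datalad's code:

```python
def assure_dict_from_str(s):
    """Sketch: parse key=value lines into a dict; empty input gives None."""
    if not s:
        return None
    if isinstance(s, dict):
        return s  # already a dict, pass through unchanged
    result = {}
    for line in s.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("%r is not a key=value item" % line)
        # Split on the first '=' only, so values may themselves contain '='
        result[key.strip()] = value.strip()
    return result
```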
datalad.utils.assure_dir(*args)[source]

Make sure directory exists.

Joins the arguments into an OS-specific path to the desired directory and creates it if it does not exist yet.

datalad.utils.assure_list(s, copy=False, iterate=True)[source]

If s is not a list, place it into a list. If s is None, an empty list is returned.

Parameters:
  • s (list or anything) –
  • copy (bool, optional) – If a list is passed, return a shallow copy of it
  • iterate (bool, optional) – If s is not a list but some other iterable (other than a text_type), iterate over it.
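The rules above can be sketched as a standalone function. It assumes that both str and bytes count as "text" that must not be iterated; this is an illustration, not datalad's implementation:

```python
def assure_list(s, copy=False, iterate=True):
    """Sketch: always return a list built from s."""
    if isinstance(s, list):
        return list(s) if copy else s  # shallow copy only on request
    if s is None:
        return []
    if isinstance(s, (str, bytes)):
        return [s]  # text is kept whole, never split into characters
    if iterate and hasattr(s, "__iter__"):
        return list(s)  # other iterables (tuples, sets, generators) are expanded
    return [s]
```

Note that with iterate=False a tuple is wrapped as a single element instead of being expanded.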
datalad.utils.assure_list_from_str(s, sep='\n')[source]

Given a multiline string, convert it to a list, or return None if it is empty

Parameters: s (str or list) –
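A minimal sketch of this conversion, assuming lists pass through unchanged and that no blank entries are filtered out (not stated in the docstring):

```python
def assure_list_from_str(s, sep="\n"):
    """Sketch: split a (multiline) string into a list; empty input gives None."""
    if not s:
        return None
    if isinstance(s, list):
        return s  # already a list
    return s.split(sep)
```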
datalad.utils.assure_tuple_or_list(obj)[source]

Given an object, wrap it into a tuple if it is not already a list or tuple

datalad.utils.assure_unicode(s, encoding='utf-8')[source]

Convert/decode to unicode (PY2) or str (PY3) if of 'binary_type'

datalad.utils.auto_repr(cls)[source]

Decorator for a class to assign it an automagic quick and dirty __repr__

It uses the public class attributes to prepare the repr of a class

Original idea: http://stackoverflow.com/a/27799004/1265472
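A minimal sketch of such a decorator, assuming "public" means attribute names not starting with an underscore and that the attributes are read from the instance's __dict__. It is an illustration in the spirit of the linked answer, not datalad's code:

```python
def auto_repr(cls):
    """Sketch: attach a __repr__ that lists public instance attributes."""
    def __repr__(self):
        public = sorted(
            ((k, v) for k, v in vars(self).items() if not k.startswith("_")),
            key=lambda kv: kv[0],  # stable, alphabetical attribute order
        )
        args = ", ".join("%s=%r" % (k, v) for k, v in public)
        return "%s(%s)" % (type(self).__name__, args)

    cls.__repr__ = __repr__
    return cls
```

For example, an instance with x=1, y='a' and _hidden=3 would be shown as Point(x=1, y='a').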

datalad.utils.better_wraps(to_be_wrapped)[source]

Decorator to replace functools.wraps

It is based on wrapt instead of functools and, unlike functools.wraps, preserves the correct signature of the decorated function. It is intended as a drop-in replacement for wraps, without any need to rewrite the actual decorators.

class datalad.utils.chpwd(path, mkdir=False, logsuffix='')[source]

Bases: object

Wrapper around os.chdir which also adjusts environ['PWD']

Otherwise PWD is simply inherited from the shell, and we would have no way to determine the directory path without dereferencing symlinks.

If used as a context manager, it temporarily changes the directory to the given path
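A minimal standalone sketch of the idea, assuming relative paths are resolved against the current logical PWD. It omits the logsuffix argument and is not datalad's actual class:

```python
import os


class chpwd(object):
    """Sketch: os.chdir that keeps os.environ['PWD'] in sync."""

    def __init__(self, path, mkdir=False):
        if mkdir and not os.path.exists(path):
            os.makedirs(path)
        # Remember both the physical cwd and the logical PWD for restoring
        self._prev_cwd = os.getcwd()
        self._prev_pwd = os.environ.get("PWD")
        os.chdir(path)
        # Build the logical path without resolving symlinks
        # (assumes the inherited PWD, if any, is accurate)
        base = self._prev_pwd or self._prev_cwd
        os.environ["PWD"] = os.path.normpath(os.path.join(base, path))

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Leaving the block restores the previous directory and PWD
        os.chdir(self._prev_cwd)
        if self._prev_pwd is None:
            os.environ.pop("PWD", None)
        else:
            os.environ["PWD"] = self._prev_pwd
```

In this sketch, calling chpwd(path) without `with` simply changes the directory; only the context manager form restores the previous location.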

datalad.utils.decode_input(s)[source]

Given an input string/bytes, decode it according to the stdin codepage (or UTF-8 if not defined)

If decoding fails, issue a warning and decode again with undecodable characters replaced
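A minimal sketch of this decode-with-fallback logic. It uses the standard warnings module as a stand-in for datalad's own logger and is not the library's implementation:

```python
import sys
import warnings


def decode_input(s):
    """Sketch: decode bytes using stdin's encoding, falling back to UTF-8."""
    if isinstance(s, str):
        return s  # already text, nothing to decode
    encoding = getattr(sys.stdin, "encoding", None) or "utf-8"
    try:
        return s.decode(encoding)
    except UnicodeDecodeError:
        # warnings.warn stands in for datalad's logging here
        warnings.warn("Failed to decode input with %s; replacing bad bytes" % encoding)
        return s.decode(encoding, errors="replace")
```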

datalad.utils.disable_logger(*args, **kwds)[source]

Context manager to temporarily disable logging

This provides one of the purposes of swallow_logs without unnecessarily creating temp files (see gh-1865)

Parameters: logger (Logger) – Logger whose handlers will be instructed not to log anything. Default: datalad's topmost Logger ('datalad')
datalad.utils.encode_filename(filename)[source]

Encode unicode filename

datalad.utils.escape_filename(filename)[source]

Surround filename in double quotes ("") and escape any " within the filename
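The quoting rule above amounts to the following sketch. It assumes a backslash is the escape character and that nothing else (e.g. existing backslashes) is escaped:

```python
def escape_filename(filename):
    """Sketch: escape embedded double quotes, then wrap in double quotes."""
    return '"%s"' % filename.replace('"', '\\"')
```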

datalad.utils.expandpath(path, force_absolute=True)[source]

Expand all variables and user handles in a path.

By default return an absolute path

datalad.utils.file_basename(name, return_ext=False)[source]

Strip up to 2 extensions, each up to 4 characters long and starting with a letter rather than a digit, so that e.g. .tar.gz is removed
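The extension rule can be expressed as a regular expression. This sketch assumes that directory components are dropped first and that return_ext yields a (stem, extension) tuple; both are assumptions, not datalad's documented behavior:

```python
import os
import re

# One extension: a dot, a letter, then up to 3 more letters/digits
_EXT_RE = re.compile(r"^(.+?)((?:\.[a-zA-Z][a-zA-Z0-9]{0,3}){1,2})$")


def file_basename(name, return_ext=False):
    """Sketch: strip up to two short, letter-initial extensions."""
    base = os.path.basename(name.rstrip(os.sep))
    m = _EXT_RE.match(base)
    if m is None:
        # No qualifying extension (e.g. README, .bashrc, archive.backup)
        return (base, "") if return_ext else base
    stem, ext = m.group(1), m.group(2).lstrip(".")
    return (stem, ext) if return_ext else stem
```

Because the stem is matched non-greedily, "v1.2.txt" keeps "v1.2": ".2" starts with a digit and so does not count as an extension.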

datalad.utils.find_files(regex, topdir='.', exclude=None, exclude_vcs=True, exclude_datalad=False, dirs=False)[source]

Generator to find files matching regex

Parameters:
  • regex (basestring) –
  • exclude (basestring, optional) – Matches to exclude
  • exclude_vcs – If True, excludes commonly known VCS subdirectories. If a string, it is used as the regex to exclude those files (regex: '/\.(?:git|gitattributes|svn|bzr|hg)(?:/|$)')
  • exclude_datalad – If True, excludes files known to be datalad meta-data files (e.g. under the .datalad/ subdirectory) (regex: '/\.(?:datalad)(?:/|$)')
  • topdir (basestring, optional) – Directory in which to search
  • dirs (bool, optional) – Whether to match directories as well as files
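A minimal os.walk-based sketch of such a generator. It assumes all regexes are searched against the full joined path and, like the exclusion regexes above, that '/' is the path separator (POSIX). It is not datalad's implementation:

```python
import os
import re

VCS_RE = r"/\.(?:git|gitattributes|svn|bzr|hg)(?:/|$)"
DATALAD_RE = r"/\.(?:datalad)(?:/|$)"


def find_files(regex, topdir=".", exclude=None, exclude_vcs=True,
               exclude_datalad=False, dirs=False):
    """Sketch: yield paths under topdir that match regex."""
    for dirpath, dirnames, filenames in os.walk(topdir):
        candidates = filenames + (dirnames if dirs else [])
        for name in candidates:
            path = os.path.join(dirpath, name)
            if not re.search(regex, path):
                continue
            if exclude and re.search(exclude, path):
                continue
            if exclude_vcs:
                # A string replaces the default VCS regex
                vcs_re = exclude_vcs if isinstance(exclude_vcs, str) else VCS_RE
                if re.search(vcs_re, path):
                    continue
            if exclude_datalad and re.search(DATALAD_RE, path):
                continue
            yield path
```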
datalad.utils.generate_chunks(container, size)[source]

Given a container, generate chunks from it, each containing at most size elements
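For sliceable containers (lists, tuples, strings), the chunking can be sketched with slicing. This is an illustration, not datalad's code:

```python
def generate_chunks(container, size):
    """Sketch: yield consecutive slices of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(container), size):
        yield container[start:start + size]
```

The last chunk is shorter when len(container) is not a multiple of size.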

datalad.utils.get_dataset_root(path)[source]

Return the root of an existing dataset containing a given path

The root path is returned in the same absolute or relative form as the input argument. If no associated dataset exists, or the input path doesn’t exist, None is returned.

datalad.utils.get_func_kwargs_doc(func)[source]

Provide the args of a function

Parameters: func (str) – name of the function from which args are being requested
Returns: list of the args that the function takes
Return type: list
datalad.utils.get_logfilename(dspath, cmd='datalad')[source]

Return a filename to use for logging under a dataset/repository

The directory will be created if it doesn’t exist, but dspath must exist and be a directory

datalad.utils.get_path_prefix(path, pwd=None)[source]

Get path prefix (for current directory)

Return the relative path to topdir if we are under topdir, and the absolute path to topdir otherwise. If pwd is not specified, the current directory is assumed.

datalad.utils.get_tempfile_kwargs(tkwargs=None, prefix='', wrapped=None)[source]

Update the kwargs to be passed to tempfile calls, depending on environment variables

datalad.utils.get_timestamp_suffix(time_=None, prefix='-')[source]

Return a time stamp (full date and time up to second)

Primarily to be used for generating log file names

datalad.utils.get_trace(edges, start, end, trace=None)[source]

Return the trace/path to reach a node in a tree.

Parameters:
  • edges (sequence(2-tuple)) – The tree given by a sequence of edges (parent, child) tuples. The nodes can be identified by any value and data type that supports the '==' operation.
  • start – Identifier of the start node. Must be present as a value in the parent location of an edge tuple in order to be found.
  • end – Identifier of the target/end node. Must be present as a value in the child location of an edge tuple in order to be found.
  • trace (list) – Mostly useful for recursive calls, and used internally.
Returns:

Returns a list with the trace to the target (the start and the target are not included in the trace, hence if start and end are directly connected an empty list is returned), or None when no trace to the target can be found, or when start and end are identical.

Return type:

None or list
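The contract above can be met with a recursive depth-first search over the edge list. This sketch assumes the edges form a tree (no cycles) and is not necessarily how datalad implements it:

```python
def get_trace(edges, start, end, trace=None):
    """Sketch: DFS over (parent, child) edges; returns intermediate nodes."""
    if trace is None:
        trace = []
    if start == end:
        return None  # identical start and end: no trace
    for parent, child in edges:
        if parent != start:
            continue
        if child == end:
            return trace  # direct connection: only intermediates are listed
        # Descend into the child, extending the trace by it
        found = get_trace(edges, child, end, trace + [child])
        if found is not None:
            return found
    return None  # end is not reachable from start
```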

datalad.utils.getpwd()[source]

Try to return a CWD without dereferencing possible symlinks

If no PWD is found in the environment, the output of getcwd() is returned

datalad.utils.is_explicit_path(path)[source]

Return whether a path explicitly points to a location

Any absolute path, or relative path starting with either '../' or './', is assumed to indicate a location on the filesystem. Any other path format is not considered explicit.
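The rule above reduces to a few checks. This sketch uses the platform separator (os.sep), which is '/' on POSIX, and does not claim to match datalad's handling of edge cases such as a bare '.':

```python
import os


def is_explicit_path(path):
    """Sketch: absolute, or relative and starting with './' or '../'."""
    return (os.path.isabs(path)
            or path.startswith(os.curdir + os.sep)
            or path.startswith(os.pardir + os.sep))
```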

datalad.utils.is_interactive()[source]

Return True if all in/outs are tty

datalad.utils.knows_annex(path)[source]

Return whether there is information about an annex at the given path

It is a thin wrapper around the GitRepo.is_with_annex() classmethod which also checks first that the path exists.

This includes actually present annexes, but also uninitialized ones, or even the presence of a remote annex branch.

datalad.utils.line_profile(func)[source]
datalad.utils.lmtime(filepath, mtime)[source]

Set mtime for files, without dereferencing symlinks.

This is to overcome the absence of os.lutime.

Works only on Linux and OS X at the moment.
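On Python 3, the same effect can be sketched with the standard os.utime(..., follow_symlinks=False), where the platform supports it (checked via os.supports_follow_symlinks). This swaps in a standard-library call and is not necessarily how lmtime itself is implemented:

```python
import os


def lmtime(filepath, mtime):
    """Sketch: set mtime on filepath itself, not on a symlink's target."""
    if os.utime not in os.supports_follow_symlinks:
        raise NotImplementedError("cannot set times on a symlink on this platform")
    st = os.lstat(filepath)
    # Keep the current access time; only the modification time changes
    os.utime(filepath, (st.st_atime, mtime), follow_symlinks=False)
```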

datalad.utils.make_tempfile(*args, **kwds)[source]

Helper to provide a temporary file name and remove it at the end (context manager)

Parameters:
  • mkdir (bool, optional (default: False)) – If True, a temporary directory is created using tempfile.mkdtemp()
  • content (str or bytes, optional) – Content to be stored in the file created
  • wrapped (function, optional) – If set, the function name is used to prefix the temporary file name
  • **tkwargs – All other arguments are passed into the call to tempfile.mk{,d}temp(), and the resultant temporary filename is passed as the first argument into the function t. If no 'prefix' argument is provided, it will be constructed using module and function names ('.' replaced with '_').

To change the used directory without providing keyword argument 'dir', set DATALAD_TESTS_TEMP_DIR.

Examples

>>> from os.path import exists
>>> from datalad.utils import make_tempfile
>>> with make_tempfile() as fname:
...     with open(fname, 'w') as f:
...         k = f.write('silly test')
>>> assert not exists(fname)  # was removed
>>> with make_tempfile(content="blah") as fname:
...     with open(fname) as f:
...         assert f.read() == "blah"

datalad.utils.md5sum(filename)[source]
datalad.utils.not_supported_on_windows(msg=None)[source]

A little helper to be invoked to consistently fail whenever functionality is not supported (yet) on Windows

datalad.utils.nothing_cm(*args, **kwds)[source]

A dummy context manager, to programmatically switch context managers
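A minimal sketch of such a dummy context manager and of the switching pattern it enables. The greet function is a hypothetical caller for illustration only; on Python 3.7+, the standard contextlib.nullcontext serves the same purpose:

```python
import contextlib
import io


@contextlib.contextmanager
def nothing_cm():
    """Sketch: a context manager that does nothing on entry or exit."""
    yield


def greet(quiet):
    # Hypothetical caller: choose the real or the dummy context manager at runtime
    buf = io.StringIO()
    cm = contextlib.redirect_stdout(buf) if quiet else nothing_cm()
    with cm:
        print("hello")
    return buf.getvalue()
```

With quiet=True the output is captured in the buffer; with quiet=False it goes to stdout as usual and the buffer stays empty.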

datalad.utils.optional_args(decorator)[source]

Allow a decorator to take optional positional and keyword arguments. It assumes that taking a single, callable, positional argument means that it is decorating a function, i.e. something like this:

@my_decorator
def function(): pass

Calls decorator with decorator(f, *args, **kwargs)
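The dispatch rule described above can be sketched as follows. The tag decorator in the usage note is hypothetical, and the sketch is not datalad's code:

```python
import functools


def optional_args(decorator):
    """Sketch: let `decorator` be used bare (@dec) or with arguments (@dec(...))."""
    @functools.wraps(decorator)
    def wrapper(*args, **kwargs):
        if len(args) == 1 and not kwargs and callable(args[0]):
            # Used as @decorator: the single argument is the function itself
            return decorator(args[0])

        # Used as @decorator(...): return a decorator awaiting the function
        def real_decorator(f):
            return decorator(f, *args, **kwargs)
        return real_decorator
    return wrapper
```

As the docstring warns, passing a single callable as a genuine argument (e.g. @dec(some_function)) is indistinguishable from bare use.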

datalad.utils.path_is_subpath(path, prefix)[source]

Return True if path is a subpath of prefix

It will return False if path == prefix.

Parameters:
  • path (str) –
  • prefix (str) –
datalad.utils.path_startswith(path, prefix)[source]

Return True if path starts with prefix path

Parameters:
  • path (str) –
  • prefix (str) –
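The difference between plain string prefixes and path prefixes can be sketched by comparing normalized paths component-wise. This sketch assumes path_startswith is True when path == prefix, which its docstring does not state, and it is not datalad's code:

```python
import os


def path_startswith(path, prefix):
    """Sketch: True if path equals prefix or lies underneath it."""
    path = os.path.normpath(path)
    prefix = os.path.normpath(prefix)
    if path == prefix:
        return True
    # Append a separator so '/a/bc' does not count as being under '/a/b'
    return path.startswith(prefix.rstrip(os.sep) + os.sep)


def path_is_subpath(path, prefix):
    """Sketch: like path_startswith, but False when path == prefix."""
    return (os.path.normpath(path) != os.path.normpath(prefix)
            and path_startswith(path, prefix))
```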
datalad.utils.posix_relpath(path, start=None)[source]

Behave like os.path.relpath, but always return POSIX paths, on any platform.
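One way to get this behavior is to compute the path with the platform's own rules and then normalize the separators. This is a sketch, not necessarily datalad's approach:

```python
import os


def posix_relpath(path, start=None):
    """Sketch: os.path.relpath, with '/' as the separator in the result."""
    rel = os.path.relpath(path, start if start is not None else os.curdir)
    # On POSIX this is a no-op; on Windows it turns '\\' into '/'
    return rel.replace(os.sep, "/")
```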

datalad.utils.rmtemp(f, *args, **kwargs)[source]

Wrapper to centralize the removal of temp files, so we could keep them around

It will not remove the temporary file/directory if the DATALAD_TESTS_TEMP_KEEP environment variable is defined

datalad.utils.rmtree(path, chmod_files='auto', *args, **kwargs)[source]

To remove a git-annex .git directory, all files and directories must first be made writable again

Parameters:
  • chmod_files (string or bool, optional) – Whether to also make files writable before removal. Usually only the directories need write permissions. If 'auto', files are chmod-ed by default on Windows
  • *args
  • **kwargs – Passed into shutil.rmtree call
datalad.utils.rotree(path, ro=True, chmod_files=True)[source]

Make a tree read-only or writable

Parameters:
  • path (string) – Path to the tree/directory to chmod
  • ro (bool, optional) – Whether to make it R/O (default) or R/W
  • chmod_files (bool, optional) – Whether to also operate on files (not just directories)
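A minimal os.walk-based sketch of such a chmod pass. It assumes symlinks are skipped and that R/W mode restores only the owner's write bit; both choices are assumptions, and this is not datalad's implementation:

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def rotree(path, ro=True, chmod_files=True):
    """Sketch: drop (ro=True) or restore (ro=False) write permission in a tree."""
    def chmod(p):
        if os.path.islink(p):
            return  # assumption: leave symlinks and their targets alone
        mode = stat.S_IMODE(os.stat(p).st_mode)
        # assumption: R/W mode only restores the owner's write bit
        os.chmod(p, mode & ~WRITE_BITS if ro else mode | stat.S_IWUSR)

    for root, dirs, files in os.walk(path):
        chmod(root)  # directories are always processed
        if chmod_files:
            for name in files:
                chmod(os.path.join(root, name))
```

Walking still works on read-only directories, since listing and descending need read and execute permission, not write.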
datalad.utils.safe_print(s)[source]

Print with protection against UTF-8 encoding errors

datalad.utils.saved_generator(gen)[source]

Given a generator, return two generators, where the 2nd one just replays

The first one goes through the generated items, and the 2nd one yields the saved items
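A minimal sketch of this record-and-replay pair, assuming the second generator is consumed after the first one. The standard itertools.tee is a related alternative that buffers only what has not yet been consumed by both iterators:

```python
def saved_generator(gen):
    """Sketch: return (g1, g2); g2 replays what g1 has produced."""
    saved = []

    def gen1():
        for item in gen:
            saved.append(item)  # remember every item as it passes through
            yield item

    def gen2():
        for item in saved:
            yield item

    return gen1(), gen2()
```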

datalad.utils.setup_exceptionhook(ipython=False)[source]

Overloads default sys.excepthook with our exceptionhook handler.

If interactive, our exceptionhook handler will invoke pdb.post_mortem; if not interactive, it invokes the default handler.

datalad.utils.shortened_repr(value, l=30)[source]
datalad.utils.slash_join(base, extension)[source]

Join two strings with a '/', avoiding duplicate slashes

If any of the strings is None the other is returned as is.
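The joining rule above can be sketched as follows. Only slashes at the join point are collapsed; slashes elsewhere are left untouched:

```python
def slash_join(base, extension):
    """Sketch: join with exactly one '/' at the seam; None returns the other."""
    if base is None:
        return extension
    if extension is None:
        return base
    return base.rstrip("/") + "/" + extension.lstrip("/")
```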

datalad.utils.sorted_files(dout)[source]

Return a (sorted) list of files under dout

datalad.utils.swallow_logs(*args, **kwds)[source]

Context manager to consume all logs.

datalad.utils.swallow_outputs(*args, **kwds)[source]

Context manager to help consume both stdout and stderr, and print()

stdout is available as cm.out and stderr as cm.err whenever cm is the yielded context manager. Internally it uses temporary files to avoid the side effects of swallowing into a StringIO, which lacks .fileno.

Mocking print is necessary for some uses where sys.stdout was already bound to the original sys.stdout, so mocking it later had no effect. Overriding the print function had the desired effect.

datalad.utils.try_multiple(ntrials, exception, base, f, *args, **kwargs)[source]

Call f multiple times, with an exponentially growing delay between the calls
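A minimal retry sketch, assuming that only `exception` triggers a retry, that the delay before retry number n is base ** n seconds (so base > 1 gives exponential growth), and that the last error is re-raised once ntrials are used up. The exact role of base is an assumption:

```python
import time


def try_multiple(ntrials, exception, base, f, *args, **kwargs):
    """Sketch: call f, retrying on `exception` with growing sleeps."""
    for trial in range(1, ntrials + 1):
        try:
            return f(*args, **kwargs)
        except exception:
            if trial == ntrials:
                raise  # out of trials: propagate the last error
            time.sleep(base ** trial)  # assumption: delay is base**trial seconds
```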

datalad.utils.unique(seq, key=None)[source]

Given a sequence, return a list with only the unique elements, maintaining their order

This is the fastest solution. See https://www.peterbe.com/plog/uniqifiers-benchmark and http://stackoverflow.com/a/480227/1265472 for more information. Enhancement: added the ability to compare for uniqueness using a key function

Parameters:
  • seq – Sequence to analyze
  • key (callable, optional) – Function to call on each element, so uniqueness can be decided not on the full element but on, e.g., one of its members
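The seen-set approach from the referenced discussion, extended with a key function, can be sketched as follows. It assumes the elements (or their keys) are hashable, and it is an illustration rather than datalad's exact code:

```python
def unique(seq, key=None):
    """Sketch: order-preserving de-duplication, optionally by key(element)."""
    seen = set()
    result = []
    for item in seq:
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)  # the first occurrence wins
    return result
```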
datalad.utils.updated(d, update)[source]

Return a copy of the input with 'update' applied

Primarily for updating dictionaries
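For dictionaries, the behavior described above can be sketched with a shallow copy followed by update. This sketch assumes the input supports .update():

```python
import copy


def updated(d, update):
    """Sketch: return a shallow copy of d with `update` applied; d is untouched."""
    d = copy.copy(d)
    d.update(update)
    return d
```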

datalad.utils.with_pathsep(path)[source]

Little helper to guarantee that path ends with /