datalad.utils
- class datalad.utils.ArgSpecFake(args, varargs, keywords, defaults)[source]
Bases: NamedTuple
- args: list[str]
Alias for field number 0
- defaults: Optional[tuple[Any, ...]]
Alias for field number 3
- keywords: Optional[str]
Alias for field number 2
- varargs: Optional[str]
Alias for field number 1
- class datalad.utils.File(name, executable=False)[source]
Bases: object
Helper for a file entry in create_tree/@with_tree.
It allows defining additional settings for an entry.
- Parameters:
name (str)
executable (bool)
- class datalad.utils.SequenceFormatter(separator=' ', element_formatter=<string.Formatter object>, *args, **kwargs)[source]
Bases: Formatter
string.Formatter subclass with special behavior for sequences.
This class delegates formatting of individual elements to another formatter object. Non-list objects are formatted by calling the delegate formatter’s “format_field” method. List-like objects (list, tuple, set, frozenset) are formatted by formatting each element of the list according to the specified format spec using the delegate formatter and then joining the resulting strings with a separator (space by default).
- Parameters:
separator (str)
element_formatter (Formatter)
args (Any)
kwargs (Any)
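A minimal stand-alone sketch of the sequence-aware formatting described above. The class name `SequenceFormatterSketch` is hypothetical and this is not datalad's implementation; it only shows how overriding `string.Formatter.format_field` lets list-like values be formatted element-wise and joined.

```python
import string
from typing import Any, Optional


class SequenceFormatterSketch(string.Formatter):
    """Hypothetical stand-in illustrating the documented behavior."""

    def __init__(self, separator: str = " ",
                 element_formatter: Optional[string.Formatter] = None) -> None:
        super().__init__()
        self.separator = separator
        # Delegate that formats individual (non-sequence) values
        self.element_formatter = element_formatter or string.Formatter()

    def format_field(self, value: Any, format_spec: str) -> str:
        if isinstance(value, (list, tuple, set, frozenset)):
            # Apply the same format spec to every element, then join them
            return self.separator.join(
                self.element_formatter.format_field(v, format_spec) for v in value
            )
        return self.element_formatter.format_field(value, format_spec)


fmt = SequenceFormatterSketch()
print(fmt.format("{} files: {:>3}", 2, [1, 22]))
```

Here the `>3` spec is applied to each list element separately rather than to the list's `repr`.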
- class datalad.utils.SwallowLogsAdapter(file_)[source]
Bases: object
Little adapter to help retrieve the logged values,
and to stay consistent with how swallow_outputs behaves.
- Parameters:
file_ (str | Path | None)
- assert_logged(msg=None, level=None, regex=True, **kwargs)[source]
Provide an assertion on whether msg was logged at a given level.
If neither msg nor level is provided, check whether anything was logged at all.
- Parameters:
msg (str, optional) – Message (as a regular expression, if regex) to be searched. If no msg is provided, check whether anything was logged at the given level.
level (str, optional) – String representing the level at which msg is expected to have been logged
regex (bool, optional) – If False, a plain assert_in is used instead of a regular-expression match
**kwargs (str, optional) – Passed to assert_re_in or assert_in
- Return type:
None
- property handle: IO[str]
- property lines: list[str]
- property out: str
- class datalad.utils.SwallowOutputsAdapter[source]
Bases: object
Little adapter to help retrieve the captured stdout/stderr values.
- property err: str
- property handles: tuple[TextIO, TextIO]
- property out: str
- datalad.utils.any_re_search(regexes, value)[source]
Return True if any of the regexes (a single str or a list of str) matches somewhere in value (re.search semantics)
- Parameters:
regexes (str | list[str])
value (str)
- Return type:
bool
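A short sketch of the check described above, assuming `re.search` semantics (a match anywhere in `value`). The name `any_re_search_sketch` is hypothetical; it is an illustration, not datalad's source.

```python
import re
from typing import List, Union


def any_re_search_sketch(regexes: Union[str, List[str]], value: str) -> bool:
    """Hypothetical stand-in: True if any pattern matches anywhere in value."""
    # Accept a single pattern as well as a list of patterns
    if isinstance(regexes, str):
        regexes = [regexes]
    return any(re.search(r, value) for r in regexes)
```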
- datalad.utils.assure_bool(s)
Note: This function is deprecated. Use ensure_bool instead.
- Parameters:
s (Any)
- Return type:
bool
- datalad.utils.assure_bytes(s, encoding='utf-8')
Note: This function is deprecated. Use ensure_bytes instead.
- Parameters:
s (str | bytes)
encoding (str)
- Return type:
bytes
- datalad.utils.assure_dict_from_str(s, sep='\\n')
Note: This function is deprecated. Use ensure_dict_from_str instead.
- Parameters:
s (str | dict[K, V])
sep (str)
- Return type:
Optional[dict[str, str]] | Optional[dict[K, V]]
- datalad.utils.assure_dir(*args)
Note: This function is deprecated. Use ensure_dir instead.
- Parameters:
args (str)
- Return type:
str
- datalad.utils.assure_iter(s, cls, copy=False, iterate=True)
Note: This function is deprecated. Use ensure_iter instead.
- Parameters:
s (Any)
cls (type[TypeVar(ListOrSet, list, set)])
copy (bool)
iterate (bool)
- Return type:
TypeVar(ListOrSet, list, set)
- datalad.utils.assure_list(s, copy=False, iterate=True)
Note: This function is deprecated. Use ensure_list instead.
- Parameters:
s (Any)
copy (bool)
iterate (bool)
- Return type:
list
- datalad.utils.assure_list_from_str(s, sep='\\n')
Note: This function is deprecated. Use ensure_list_from_str instead.
- Parameters:
s (str | list[T])
sep (str)
- Return type:
Optional[list[str]] | Optional[list[T]]
- datalad.utils.assure_tuple_or_list(obj)
Note: This function is deprecated. Use ensure_tuple_or_list instead.
- Parameters:
obj (Any)
- Return type:
list | tuple
- datalad.utils.assure_unicode(s, encoding=None, confidence=None)
Note: This function is deprecated. Use ensure_unicode instead.
- Parameters:
s (str | bytes)
encoding (Optional[str])
confidence (Optional[float])
- Return type:
str
- datalad.utils.auto_repr(cls, short=True)[source]
Decorator for a class to assign it an automagic, quick-and-dirty __repr__.
It uses public class attributes to prepare the repr.
Original idea: http://stackoverflow.com/a/27799004/1265472
- Parameters:
cls (type[TypeVar(T)])
short (bool)
- Return type:
type[TypeVar(T)]
- datalad.utils.bytes2human(n, format='%(value).1f %(symbol)sB')[source]
Convert n bytes into a human-readable string based on format. Symbols can be either “customary”, “customary_ext”, “iec” or “iec_ext”, see: http://goo.gl/kTQMs
>>> from datalad.utils import bytes2human
>>> bytes2human(1)
'1.0 B'
>>> bytes2human(1024)
'1.0 KB'
>>> bytes2human(1048576)
'1.0 MB'
>>> bytes2human(1099511627776127398123789121)
'909.5 YB'
>>> bytes2human(10000, "%(value).1f %(symbol)s/sec")
'9.8 K/sec'
>>> # precision can be adjusted by playing with %f operator
>>> bytes2human(10000, format="%(value).5f %(symbol)s")
'9.76562 K'
Taken from: http://goo.gl/kTQMs and subsequently simplified.
Original Author: Giampaolo Rodola’ <g.rodola [AT] gmail [DOT] com>
License: MIT
- Parameters:
n (int | float)
format (str)
- Return type:
str
- datalad.utils.check_symlink_capability(path, target)[source]
Helper similar to datalad.tests.utils_pytest.has_symlink_capability.
However, for use in a datalad command context, we shouldn’t assume that we can write to a tmpfile, nor import a lot from datalad’s test machinery. Moreover, we want to know whether we can create a symlink at a specific location, not just somewhere. Therefore, an arbitrary path is used to test-build a symlink, which is deleted afterwards. A suitable location can thus be determined by higher-level code.
- Parameters:
path (Path)
target (Path)
- Return type:
bool
- class datalad.utils.chpwd(path, mkdir=False, logsuffix='')[source]
Bases: object
Wrapper around os.chdir which also adjusts environ[‘PWD’].
The reason is that otherwise PWD is simply inherited from the shell, and we have no way to obtain the directory path without dereferencing symlinks.
When used as a context manager, it temporarily changes the directory to the given path.
- Parameters:
path (str | Path | None)
mkdir (bool)
logsuffix (str)
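A simplified sketch of the idea behind chpwd: change directory and keep PWD in sync, restoring both on exit. The name `chpwd_sketch` is hypothetical. It omits the mkdir and logsuffix options, and assumes that PWD, if set, is accurate, because the new PWD is built lexically from it so that symlinked components are not dereferenced.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def chpwd_sketch(path: str) -> Iterator[None]:
    """Hypothetical, simplified stand-in for datalad.utils.chpwd."""
    old_cwd = os.getcwd()
    old_pwd = os.environ.get("PWD")
    # Build the new PWD lexically from the old one (assumed accurate),
    # so symlinked path components are kept rather than resolved
    new_pwd = os.path.normpath(os.path.join(old_pwd or old_cwd, path))
    os.chdir(path)
    # os.chdir() never updates PWD, so keep it in sync explicitly
    os.environ["PWD"] = new_pwd
    try:
        yield
    finally:
        os.chdir(old_cwd)
        if old_pwd is None:
            os.environ.pop("PWD", None)
        else:
            os.environ["PWD"] = old_pwd
```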
- datalad.utils.collect_method_callstats(func)[source]
Figure out which callers call the decorated method repeatedly on the same instance
- Use case(s):
.repo is expensive since it does all kinds of checks.
.config is transitively expensive since it calls .repo each time.
Todo
Fancier: one could look through the stack for the same id(self) to see whether that location is already in the memo. That would hint at cases where the object is not passed into underlying functions, causing them to redo the same work over and over again.
ATM the output might be flooded with “1 lines” calls, which are not that informative. The underlying, possibly suboptimal use might be coming from their callers. This may or may not relate to the previous TODO.
- Parameters:
func (Callable[[ParamSpec(P)], TypeVar(T)])
- Return type:
Callable[[ParamSpec(P)], TypeVar(T)]
- datalad.utils.create_tree(path, tree, archives_leading_dir=True, remove_existing=False)[source]
Given a list of tuples (name, load), create such a tree.
If load is itself a tuple, it creates either a subtree or, if name ends with .tar.gz, an archive with that content, and places it into the tree.
- datalad.utils.create_tree_archive(path, name, load, overwrite=False, archives_leading_dir=True)[source]
Given an archive name, create the archive under path with the specified load tree
- datalad.utils.decode_input(s)[source]
Given an input string/bytes, decode it according to the stdin codepage (or UTF-8 if that is not defined).
If decoding fails, issue a warning and decode again with undecodable characters replaced.
- Parameters:
s (str | bytes)
- Return type:
str
- datalad.utils.disable_logger(logger=None)[source]
Context manager to temporarily disable logging.
This serves one of swallow_logs’ purposes without unnecessarily creating temp files (see gh-1865).
- Parameters:
logger (Logger) – Logger whose handlers will be instructed not to log anything. Default: datalad’s topmost Logger (‘datalad’)
- Return type:
Iterator[Logger]
- datalad.utils.dlabspath(path, norm=False)[source]
Symlinks-in-the-cwd aware abspath.
os.path.abspath relies on os.getcwd(), which would not know about symlinks in the path.
TODO: we might want to default to norm=True to match the behavior of os.path.abspath?
- Parameters:
path (str | Path)
norm (bool)
- Return type:
str
- datalad.utils.encode_filename(filename)[source]
Encode unicode filename
- Parameters:
filename (str | bytes)
- Return type:
bytes
- datalad.utils.ensure_bool(s)[source]
Convert value into a boolean, following a convention for strings:
on, True, yes are recognized as True; off, False, no as False
- Parameters:
s (Any)
- Return type:
bool
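A sketch of the string convention described above. The name `ensure_bool_sketch` is hypothetical. Two details are assumptions rather than documented behavior: matching is case-insensitive, and an unrecognized string raises ValueError. Non-strings fall back to Python's `bool()`.

```python
from typing import Any


def ensure_bool_sketch(s: Any) -> bool:
    """Hypothetical stand-in for the documented ensure_bool convention."""
    if isinstance(s, str):
        low = s.lower()  # assumption: case-insensitive matching
        if low in ("on", "true", "yes"):
            return True
        if low in ("off", "false", "no"):
            return False
        # assumption: unknown spellings are an error rather than truthy
        raise ValueError(f"Do not know how to treat {s!r} as a boolean")
    return bool(s)
```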
- datalad.utils.ensure_bytes(s, encoding='utf-8')[source]
Convert/encode unicode string to bytes.
If s isn’t a string, return it as is.
- Parameters:
encoding (str, optional) – Encoding to use. “utf-8” is the default
s (str | bytes)
- Return type:
bytes
- datalad.utils.ensure_dict_from_str(s, sep='\\n')[source]
Given a multiline string with key=value items, convert it to a dictionary.
Returns None if the input s is empty.
- Parameters:
s (str or dict)
sep (str)
- Return type:
Optional[dict[str, str]] | Optional[dict[K, V]]
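A sketch of the key=value parsing described above, under these assumptions: items are split on the first `=` only, blank items are skipped, an item without `=` raises ValueError, and a dict is passed through unchanged. The name `ensure_dict_from_str_sketch` is hypothetical.

```python
from typing import Optional, Union


def ensure_dict_from_str_sketch(s: Union[str, dict, None],
                                sep: str = "\n") -> Optional[dict]:
    """Hypothetical stand-in: parse key=value items separated by sep."""
    if not s:
        return None  # empty input (or None) yields None, as documented
    if isinstance(s, dict):
        return s  # already a dict: pass it through unchanged
    out: dict = {}
    for item in s.split(sep):
        if not item:
            continue  # skip blank items
        if "=" not in item:
            raise ValueError(f"{item!r} is not in key=value format")
        # Split on the first '=' only, so values may themselves contain '='
        key, value = item.split("=", 1)
        out[key] = value
    return out
```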
- datalad.utils.ensure_dir(*args)[source]
Make sure a directory exists.
Joins the arguments into an OS-specific path to the desired directory and creates it if it does not exist yet.
- Parameters:
args (str)
- Return type:
str
- datalad.utils.ensure_iter(s, cls, copy=False, iterate=True)[source]
If s is not an instance of cls, place it into one. If s is None, an empty instance of cls is returned
- Parameters:
s (list or anything)
cls (class) – Which iterable class to ensure
copy (bool, optional) – If an instance of cls is passed, return a shallow copy of it
iterate (bool, optional) – If s is not an instance of cls but is iterable (and not a str), iterate over it.
- Return type:
TypeVar(ListOrSet, list, set)
- datalad.utils.ensure_list(s, copy=False, iterate=True)[source]
If s is not a list, place it into a list. If s is None, an empty list is returned
- Parameters:
s (list or anything)
copy (bool, optional) – If a list is passed, return a shallow copy of it
iterate (bool, optional) – If s is not a list but is iterable (and not a str), iterate over it.
- Return type:
list
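A sketch of the list-coercion rules described above. The name `ensure_list_sketch` is hypothetical. Consistent with the note under ensure_result_list that a plain ensure_list iterates over a dict, this sketch iterates any non-str iterable, so a dict yields its keys.

```python
from typing import Any


def ensure_list_sketch(s: Any, copy: bool = False, iterate: bool = True) -> list:
    """Hypothetical stand-in for the documented ensure_list behavior."""
    if isinstance(s, list):
        return list(s) if copy else s  # shallow copy only if requested
    if isinstance(s, str):
        return [s]  # never split a string into characters
    if iterate and hasattr(s, "__iter__"):
        return list(s)  # tuples, sets, generators, dicts (-> keys), ...
    if s is None:
        return []
    return [s]
```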
- datalad.utils.ensure_list_from_str(s, sep='\\n')[source]
Given a multiline string, convert it to a list, or return None if it is empty
- Parameters:
s (str or list)
sep (str)
- Return type:
Optional[list[str]] | Optional[list[T]]
- datalad.utils.ensure_result_list(r)[source]
Return a list of result records
Largely the same as ensure_list, but special-casing a single dict being passed in, which a plain ensure_list would iterate over. Hence, this deals with the three ways datalad commands return results: a single dict, a list of dicts, or a generator.
Used for result assertion helpers.
- Parameters:
r (
Any
)- Return type:
list
- datalad.utils.ensure_tuple_or_list(obj)[source]
Given an object, wrap it into a tuple if it is not already a list or tuple
- Parameters:
obj (Any)
- Return type:
list | tuple
- datalad.utils.ensure_unicode(s, encoding=None, confidence=None)[source]
Convert/decode bytestring to unicode.
If s isn’t a bytestring, return it as is.
- Parameters:
encoding (str, optional) – Encoding to use. If None, “utf-8” is tried, and then if not a valid UTF-8, encoding will be guessed
confidence (float, optional) – A value between 0 and 1; if the encoding is guessed with a confidence lower than the specified one, ValueError is raised
s (str | bytes)
- Return type:
str
- datalad.utils.ensure_write_permission(path)[source]
Context manager to get write permission on path and restore original mode afterwards.
- Parameters:
path (Path) – path to the target file
- Raises:
PermissionError – if write permission could not be obtained
- Return type:
Iterator[None]
- datalad.utils.escape_filename(filename)[source]
Surround filename in double quotes (“”) and escape any “ within the filename
- Parameters:
filename (str)
- Return type:
str
- datalad.utils.expandpath(path, force_absolute=True)[source]
Expand all variables and user handles in a path.
By default return an absolute path
- Parameters:
path (str | Path)
force_absolute (bool)
- Return type:
str
- datalad.utils.file_basename(name, return_ext=False)[source]
Strip up to 2 extensions, each up to 4 characters long and starting with a letter (not a digit), so we can get rid of .tar.gz etc.
- Parameters:
name (str | Path)
return_ext (bool)
- Return type:
str | tuple[str, str]
- datalad.utils.find_files(regex, topdir='.', exclude=None, exclude_vcs=True, exclude_datalad=False, dirs=False)[source]
Generator to find files matching regex
- Parameters:
regex (string)
exclude (string, optional) – Matches to exclude
exclude_vcs (bool) – If True, excludes commonly known VCS subdirectories. If string, used as regex to exclude those files (regex: ‘/\.(?:git|gitattributes|svn|bzr|hg)(?:/|$)’)
exclude_datalad (bool) – If True, excludes files known to be datalad meta-data files (e.g. under .datalad/ subdirectory) (regex: ‘/\.(?:datalad)(?:/|$)’)
topdir (string, optional) – Directory where to search
dirs (bool, optional) – Whether to match directories as well as files
- Return type:
Iterator[str]
- datalad.utils.generate_chunks(container, size)[source]
Given a container, generate chunks from it with size up to size
- Parameters:
container (list[TypeVar(T)])
size (int)
- Return type:
Iterator[list[TypeVar(T)]]
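A sketch of chunking a list into consecutive slices of at most `size` elements. The name `generate_chunks_sketch` is hypothetical, and rejecting a non-positive size is an assumption of this sketch.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def generate_chunks_sketch(container: List[T], size: int) -> Iterator[List[T]]:
    """Hypothetical stand-in: yield consecutive slices of up to size items."""
    if size < 1:
        raise ValueError("size must be positive")  # assumption of this sketch
    for i in range(0, len(container), size):
        yield container[i:i + size]  # the last chunk may be shorter
```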
- datalad.utils.generate_file_chunks(files, cmd=None)[source]
Given a list of files, generate chunks of them to avoid exceeding cmdline length
- Parameters:
files (list of str)
cmd (str or list of str, optional) – Command to account for as well
- Return type:
Iterator[list[str]]
- datalad.utils.get_dataset_root(path)[source]
Return the root of an existent dataset containing a given path
The root path is returned in the same absolute or relative form as the input argument. If no associated dataset exists, or the input path doesn’t exist, None is returned.
If path is a symlink or something other than a directory, the root of the dataset containing its parent directory is reported. If none can be found, and a symlink at path points to a dataset, path itself is reported as the root.
- Parameters:
path (Path-like)
- Return type:
str or None
- datalad.utils.get_encoding_info()[source]
Return a dictionary with various encoding/locale information
- Return type:
dict[str, str]
- datalad.utils.get_home_envvars(new_home)[source]
Return dict with env variables to be adjusted for a new HOME
Only variables found in current os.environ are adjusted.
- Parameters:
new_home (str or Path) – New home path, in the OS-native “schema”
- Return type:
dict[str, str]
- datalad.utils.get_ipython_shell()[source]
Detect whether we are running within IPython and, if so, return its ip (shell) object.
Returns None if not running under IPython (no get_ipython function).
- Return type:
Optional[Any]
- datalad.utils.get_linux_distribution()[source]
Compatibility wrapper for {platform,distro}.linux_distribution().
- Return type:
tuple[str, str, str]
- datalad.utils.get_logfilename(dspath, cmd='datalad')[source]
Return a filename to use for logging under a dataset/repository
The directory will be created if it doesn’t exist, but dspath must exist and be a directory
- Parameters:
dspath (str | Path)
cmd (str)
- Return type:
str
- datalad.utils.get_open_files(path, log_open=False)[source]
Get open files under a path
Note: This function is very slow on Windows.
- Parameters:
path (str) – File or directory to check for open files under
log_open (bool or int) – If set, the logger level to use
- Returns:
A mapping path : pid
- Return type:
dict
- datalad.utils.get_path_prefix(path, pwd=None)[source]
Get path prefix (for current directory)
Returns the relative path to topdir (the given path) if we are under topdir, and the absolute path to topdir otherwise. If pwd is not specified, the current directory is assumed.
- Parameters:
path (str | Path)
pwd (Optional[str])
- Return type:
str
- datalad.utils.get_sig_param_names(f, kinds)[source]
A helper to selectively return parameters from inspect.signature.
inspect.signature is the ultimate way for introspecting callables. But its interface is not so convenient for a quick selection of parameters (AKA arguments) of desired type or combinations of such. This helper should make it easier to retrieve desired collections of parameters.
Since often it is desired to get information about multiple specific types of parameters, kinds is a list, so in a single invocation of signature and looping through the results we can obtain all information.
- Parameters:
f (callable)
kinds (tuple with values from {'pos_any', 'pos_only', 'kw_any', 'kw_only', 'any'}) – Which kinds of args to return in the result (tuple). Each element should be one of: ‘pos_any’ – positional or keyword, i.e. any argument which could be used positionally; ‘pos_only’ – positional-only arguments; ‘kw_only’ – keyword-only arguments (cannot be used positionally); ‘kw_any’ – any keyword argument (including a positional one which could be used as a keyword); ‘any’ – any type from the above.
- Returns:
Each element is a list of parameters (names only) of that “kind”.
- Return type:
tuple
- datalad.utils.get_suggestions_msg(values, known, sep='\\n ')[source]
Return a formatted string with suggestions for values given the known ones
- Parameters:
values (Optional[str | Iterable[str]])
known (str)
sep (str)
- Return type:
str
- datalad.utils.get_tempfile_kwargs(tkwargs=None, prefix='', wrapped=None)[source]
Updates kwargs to be passed to tempfile.* calls depending on environment variables
- Parameters:
tkwargs (Optional[dict[str, Any]])
prefix (str)
wrapped (Optional[Callable])
- Return type:
dict[str, Any]
- datalad.utils.get_timestamp_suffix(time_=None, prefix='-')[source]
Return a time stamp (full date and time, up to seconds),
primarily to be used for generating log file names
- Parameters:
time_ (int | time.struct_time | None)
prefix (str)
- Return type:
str
- datalad.utils.get_trace(edges, start, end, trace=None)[source]
Return the trace/path to reach a node in a tree.
- Parameters:
edges (sequence(2-tuple)) – The tree given by a sequence of edges (parent, child) tuples. The nodes can be identified by any value and data type that supports the ‘==’ operation.
start (TypeVar(T)) – Identifier of the start node. Must be present as a value in the parent location of an edge tuple in order to be found.
end (TypeVar(T)) – Identifier of the target/end node. Must be present as a value in the child location of an edge tuple in order to be found.
trace (list) – Mostly useful for recursive calls, and used internally.
- Returns:
Returns a list with the trace to the target (the start and the target are not included in the trace; hence, if start and end are directly connected, an empty list is returned), or None when no trace to the target can be found, or start and end are identical.
- Return type:
None or list
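An iterative depth-first sketch of the documented contract: return the intermediate nodes between start and end (both excluded), an empty list for a direct edge, and None when there is no path or start == end. The name `get_trace_sketch` is hypothetical. It uses an explicit stack instead of the recursive `trace` argument and assumes the edges form a proper tree (no cycles).

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def get_trace_sketch(edges: Sequence[Tuple[T, T]], start: T, end: T) -> Optional[List[T]]:
    """Hypothetical stand-in; assumes edges describe a tree (no cycles)."""
    if start == end:
        return None
    # Each stack entry: (node to expand, intermediate nodes seen so far)
    stack: List[Tuple[T, List[T]]] = [(start, [])]
    while stack:
        node, trace = stack.pop()
        for parent, child in edges:
            if parent != node:
                continue
            if child == end:
                return trace  # start and end themselves are excluded
            stack.append((child, trace + [child]))
    return None
```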
- datalad.utils.get_wrapped_class(wrapped)[source]
Determine the command class a wrapped __call__ belongs to
- Parameters:
wrapped (Callable)
- Return type:
type
- datalad.utils.getargspec(func, *, include_kwonlyargs=False)[source]
Compat shim for getargspec, which is deprecated in Python 3.
The main difference from inspect.getargspec (and inspect.getfullargspec for that matter) is that by using inspect.signature we are providing correct args/defaults for functools.wraps’ed functions.
The include_kwonlyargs option was added to centralize getting all args, even the keyword-only ones (those following the *,).
For internal use and not advised for use in 3rd party code. Please use inspect.signature directly.
- Parameters:
func (Callable[..., Any])
include_kwonlyargs (bool)
- Return type:
ArgSpecFake
- datalad.utils.getpwd()[source]
Try to return a CWD without dereferencing possible symlinks
This function will try to use the PWD environment variable to provide the current working directory, possibly with some directories along the path being symlinks to other directories. Unfortunately, PWD is used/set only by the shell, and functions such as os.chdir and os.getcwd neither use nor modify it; thus os.getcwd() returns a path with links dereferenced.
While returning the current working directory based on the PWD env variable, we verify that the directory is the same as os.getcwd() after resolving all symlinks. If that verification fails, we fall back to always using os.getcwd().
The initial decision to use either the PWD env variable or os.getcwd() is made upon the first call of this function.
- Return type:
str
- datalad.utils.guard_for_format(arg)[source]
Replace { and } with {{ and }}
To be used in cases where arg is not expected to contain user-provided .format() placeholders, but ‘arg’ might become part of a composite string passed to .format(), e.g. via ‘Run’
- Parameters:
arg (str)
- Return type:
str
- datalad.utils.import_module_from_file(modpath, pkg=None, log=<bound method Logger.debug of <Logger datalad.utils (INFO)>>)[source]
Import provided module given a path
TODO: RF/make use of it in pipeline.py, which has similar logic; join with import_modules?
- Parameters:
pkg (module, optional) – If provided, and modpath is under pkg.__path__, a relative import will be used
modpath (str)
log (Callable[[str], Any])
- Return type:
ModuleType
- datalad.utils.import_modules(modnames, pkg, msg='Failed to import {module}', log=<bound method Logger.debug of <Logger datalad.utils (INFO)>>)[source]
Helper to import a list of modules without failing if N/A
- Parameters:
modnames (list of str) – List of module names to import
pkg (str) – Package under which to import
msg (str, optional) – Message template for .format() to log at DEBUG level if import fails. Keys {module} and {package} will be provided and ‘: {exception}’ appended
log (callable, optional) – Logger call to use for logging messages
- Return type:
list[ModuleType]
- datalad.utils.is_explicit_path(path)[source]
Return whether a path explicitly points to a location
Any absolute path, or relative path starting with either ‘../’ or ‘./’ is assumed to indicate a location on the filesystem. Any other path format is not considered explicit.
- Parameters:
path (str | Path)
- Return type:
bool
- datalad.utils.is_interactive()[source]
Return True if all of stdin, stdout and stderr are open and are TTYs.
Note that in a somewhat abnormal case where e.g. stdin is explicitly closed, and any operation on it would raise a ValueError(“I/O operation on closed file”) exception, this function would just return False, since the session cannot be used interactively.
- Return type:
bool
- datalad.utils.join_cmdline(args)[source]
Join command line args into a string using quote_cmdlinearg
- Parameters:
args (Iterable[str])
- Return type:
str
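On POSIX platforms, joining arguments with per-argument quoting can be sketched with the standard library's `shlex.quote` (Python 3.8+ also provides `shlex.join`, which does the same). The name `join_cmdline_posix_sketch` is hypothetical. Windows quoting follows different rules and is deliberately not covered here.

```python
import shlex
from typing import Iterable


def join_cmdline_posix_sketch(args: Iterable[str]) -> str:
    """Hypothetical POSIX-only stand-in: quote each arg, join with spaces."""
    return " ".join(shlex.quote(a) for a in args)


cmd = join_cmdline_posix_sketch(["echo", "a b", "it's"])
# Round trip: a POSIX-style split recovers the original arguments
assert shlex.split(cmd) == ["echo", "a b", "it's"]
```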
- datalad.utils.knows_annex(path)[source]
Returns whether at a given path there is information about an annex
It is just a thin wrapper around the GitRepo.is_with_annex() classmethod, which also checks first that path exists.
This includes actually present annexes, but also uninitialized ones, or even the presence of a remote annex branch.
- Parameters:
path (str | Path)
- Return type:
bool
- datalad.utils.line_profile(func)[source]
Quick-and-dirty helper to line-profile the function and print out the stats
- Parameters:
func (Callable[[ParamSpec(P)], TypeVar(T)])
- Return type:
Callable[[ParamSpec(P)], TypeVar(T)]
- datalad.utils.lmtime(filepath, mtime)[source]
Set mtime for files, while not dereferencing symlinks.
This works around the absence of os.lutime.
Currently works only on Linux and OSX.
- Parameters:
filepath (str | Path)
mtime (int | float)
- Return type:
None
- datalad.utils.lock_if_required(lock_required, lock)[source]
Acquire and release the provided lock if indicated by a flag
- Parameters:
lock_required (bool)
lock (allocate_lock)
- Return type:
Iterator[allocate_lock]
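A minimal sketch of a conditionally-locking context manager matching the description above. The name `lock_if_required_sketch` is hypothetical. The lock is acquired only when the flag is set, and it is always released on exit in that case, even if the body raises.

```python
import threading
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def lock_if_required_sketch(lock_required: bool, lock: Any) -> Iterator[Any]:
    """Hypothetical stand-in; lock is e.g. a threading.Lock() instance."""
    if lock_required:
        lock.acquire()
    try:
        yield lock
    finally:
        if lock_required:
            lock.release()  # released even if the body raised


guard = threading.Lock()
with lock_if_required_sketch(True, guard):
    pass  # critical section runs while guard is held
```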
- datalad.utils.make_tempfile(content=None, wrapped=None, **tkwargs)[source]
Helper to provide a temporary file name and remove it at the end (context manager)
- Parameters:
mkdir (bool, optional (default: False)) – If True, a temporary directory is created using tempfile.mkdtemp()
content (str or bytes, optional) – Content to be stored in the created file
wrapped (function, optional) – If set, the function name is used to prefix the temporary file name
**tkwargs – All other arguments are passed into the call to tempfile.mk{,d}temp(), and the resulting temporary filename is yielded by the context manager. If no ‘prefix’ argument is provided, it will be constructed using module and function names (‘.’ replaced with ‘_’).
To change the used directory without providing the keyword argument 'dir', set DATALAD_TESTS_TEMP_DIR.
- Return type:
Iterator[str]
Examples
>>> from os.path import exists
>>> from datalad.utils import make_tempfile
>>> with make_tempfile() as fname:
...     k = open(fname, 'w').write('silly test')
>>> assert not exists(fname)  # was removed
>>> with make_tempfile(content="blah") as fname:
...     assert open(fname).read() == "blah"
- Parameters:
tkwargs (Any)
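A simplified sketch of the temp-file context manager pattern described above. The name `make_tempfile_sketch` is hypothetical. It omits the `wrapped`/prefix logic and the DATALAD_TESTS_TEMP_KEEP handling, and it uses `content` only for files, not directories. It does honor DATALAD_TESTS_TEMP_DIR when no `dir` is given, because `dir=None` makes `tempfile` fall back to its default location.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Any, Iterator, Optional, Union


@contextmanager
def make_tempfile_sketch(content: Optional[Union[str, bytes]] = None,
                         mkdir: bool = False, **tkwargs: Any) -> Iterator[str]:
    """Hypothetical, simplified stand-in for datalad.utils.make_tempfile."""
    # Use DATALAD_TESTS_TEMP_DIR unless 'dir' was passed explicitly
    tkwargs.setdefault("dir", os.environ.get("DATALAD_TESTS_TEMP_DIR"))
    if mkdir:
        path = tempfile.mkdtemp(**tkwargs)
    else:
        fd, path = tempfile.mkstemp(**tkwargs)
        with os.fdopen(fd, "wb") as f:
            if content is not None:
                f.write(content.encode("utf-8") if isinstance(content, str) else content)
    try:
        yield path
    finally:
        # Remove whatever is at the path now: a directory tree or a file
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        elif os.path.lexists(path):
            os.unlink(path)
```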
- datalad.utils.map_items(func, v)[source]
A helper to apply func to all elements (keys and values) within a dict
No type checking of values passed to func is done, so func should be resilient to values which it should not handle
Initial use case: apply_recursive(url_fragment, ensure_unicode)
- datalad.utils.md5sum(filename)[source]
Compute an MD5 sum for the given file
- Parameters:
filename (str | Path)
- Return type:
str
- datalad.utils.never_fail(f)[source]
Ensure that a function never fails: all exceptions are caught.
Returns None if the function fails internally.
- Parameters:
f (Callable[[ParamSpec(P)], TypeVar(T)])
- Return type:
Callable[[ParamSpec(P)], Optional[TypeVar(T)]]
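A sketch of the "never fail" decorator contract described above: catch any `Exception` and return None instead. The name `never_fail_sketch` is hypothetical, and logging a warning on failure is this sketch's own choice, not documented behavior. `functools.wraps` keeps the wrapped function's name and docstring.

```python
import functools
import logging
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")
lgr = logging.getLogger("never_fail_sketch")


def never_fail_sketch(f: Callable[..., T]) -> Callable[..., Optional[T]]:
    """Hypothetical stand-in: swallow exceptions, returning None instead."""
    @functools.wraps(f)
    def wrapped(*args: Any, **kwargs: Any) -> Optional[T]:
        try:
            return f(*args, **kwargs)
        except Exception as exc:  # deliberately broad, per the contract
            lgr.warning("Call to %s failed: %s", f.__name__, exc)
            return None
    return wrapped


@never_fail_sketch
def div(a: float, b: float) -> float:
    return a / b
```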
- datalad.utils.not_supported_on_windows(msg=None)[source]
A little helper to be invoked to consistently fail whenever functionality is not supported (yet) on Windows
- Parameters:
msg (Optional[str])
- Return type:
None
- datalad.utils.nothing_cm()[source]
Just a dummy context manager to programmatically switch context managers
- Return type:
Iterator[None]
- datalad.utils.obtain_write_permission(path)[source]
Obtains write permission for path and returns previous mode if a change was actually made.
- Parameters:
path (Path) – path to try to obtain write permission for
- Returns:
previous mode of path as returned by stat().st_mode if a change in permission was actually necessary, None otherwise.
- Return type:
int or None
- datalad.utils.open_r_encdetect(fname, readahead=1000)[source]
Return a file object in read mode with auto-detected encoding
This is helpful when dealing with files of unknown encoding.
- Parameters:
readahead (int, optional) – How many bytes to read for guessing the encoding type. If negative - full file will be read
fname (str | Path)
- Return type:
IO[str]
- datalad.utils.optional_args(decorator)[source]
Allows a decorator to take optional positional and keyword arguments. Assumes that taking a single, callable, positional argument means that it is decorating a function, i.e. something like this:
@my_decorator
def function(): pass
Calls decorator with decorator(f, *args, **kwargs)
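A sketch of the optional-arguments decorator pattern described above. The names `optional_args_sketch` and `tag` are hypothetical. It uses the same heuristic the text documents, where a lone callable positional argument means bare `@decorator` use. This sketch additionally requires that no keyword arguments were given in that case.

```python
import functools
from typing import Any, Callable


def optional_args_sketch(decorator: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical stand-in illustrating the documented heuristic."""
    @functools.wraps(decorator)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if len(args) == 1 and not kwargs and callable(args[0]):
            # Bare use, @decorator: the single argument is the function
            return decorator(args[0])

        # Use with arguments, @decorator(...): return the real decorator
        def real_decorator(f: Callable[..., Any]) -> Any:
            return decorator(f, *args, **kwargs)
        return real_decorator
    return wrapper


@optional_args_sketch
def tag(f: Callable[..., Any], label: str = "default") -> Callable[..., Any]:
    f.label = label  # hypothetical example decorator: attach an attribute
    return f


@tag
def a() -> None:
    pass


@tag(label="custom")
def b() -> None:
    pass
```

The heuristic is ambiguous when the first intended argument is itself a callable. In that case, pass it by keyword.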
- datalad.utils.partition(items, predicate=<class 'bool'>)[source]
Partition items by predicate.
- Parameters:
items (iterable)
predicate (callable) – A function that will be mapped over each element in items. The elements will be partitioned based on whether the return value is false or true.
- Return type:
tuple[Iterator[TypeVar(T)], Iterator[TypeVar(T)]]
- Returns:
A tuple with two generators, the first for ‘false’ items and the second for ‘true’ ones.
Notes
Taken from Peter Otten’s snippet posted at https://nedbatchelder.com/blog/201306/filter_a_list_into_two_parts.html
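The widely circulated `itertools.tee` recipe for this kind of partition can be sketched as follows. The name `partition_sketch` is hypothetical, and datalad's code may differ in details. The predicate is evaluated once per item, and both result iterators stay lazy.

```python
from itertools import tee
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def partition_sketch(items: Iterable[T],
                     predicate: Callable[[T], object] = bool
                     ) -> Tuple[Iterator[T], Iterator[T]]:
    """Hypothetical stand-in: (false items, true items), both lazy."""
    # Pair each item with its predicate value once, then share the pairs
    # between two independent iterators via itertools.tee
    a, b = tee((predicate(item), item) for item in items)
    return ((item for pred, item in a if not pred),
            (item for pred, item in b if pred))


false_items, true_items = partition_sketch([0, 1, 2, 0, 3])
```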
- datalad.utils.path_is_subpath(path, prefix)[source]
Return True if path is a subpath of prefix
It will return False if path == prefix.
- Parameters:
path (str)
prefix (str)
- Return type:
bool
- datalad.utils.path_startswith(path, prefix)[source]
Return True if path starts with prefix path
- Parameters:
path (str)
prefix (str)
- Return type:
bool
- datalad.utils.posix_relpath(path, start=None)[source]
Behave like os.path.relpath, but always return POSIX paths, on any platform.
- Parameters:
path (str | Path)
start (Optional[str | Path])
- Return type:
str
- datalad.utils.quote_cmdlinearg(arg)[source]
Perform platform-appropriate argument quoting
- Parameters:
arg (str)
- Return type:
str
- datalad.utils.read_csv_lines(fname, dialect=None, readahead=16384, **kwargs)[source]
A generator of dict records from a CSV/TSV
Automatically guesses the encoding for each record to convert to UTF-8
- Parameters:
fname (str) – Filename
dialect (str, optional) – Dialect to specify to csv.reader. If not specified – guessed from the file, if fails to guess, “excel-tab” is assumed
readahead (int, optional) – How many bytes to read from the file to guess the type
**kwargs (Any) – Passed to csv.reader
- Return type:
Iterator[dict[str, str]]
- datalad.utils.read_file(fname, decode=True)[source]
A helper to read file passing content via ensure_unicode
- Parameters:
decode (bool, optional) – If False, ensure_unicode is not applied and the file content is returned as bytes
fname (str | Path)
- Return type:
str | bytes
- datalad.utils.rmdir(path, *args, **kwargs)[source]
os.rmdir with our optional checking for open files
- Parameters:
path (str | Path)
args (Any)
kwargs (Any)
- Return type:
None
- datalad.utils.rmtemp(f, *args, **kwargs)[source]
Wrapper to centralize removing of temp files so we could keep them around
It will not remove the temporary file/directory if DATALAD_TESTS_TEMP_KEEP environment variable is defined
- Parameters:
f (str | Path)
args (Any)
kwargs (Any)
- Return type:
None
- datalad.utils.rmtree(path, chmod_files='auto', children_only=False, *args, **kwargs)[source]
To remove a git-annex .git directory, all files and directories must first be made writable again
- Parameters:
path (Path or str) – Path to remove
chmod_files (string or bool, optional) – Whether to also make files writable before removal. Usually it is just a matter of directories having write permissions. If ‘auto’ (the default), files are chmod-ed only on Windows
children_only (bool, optional) – If set, all files and subdirectories are removed while the path itself (which must be a directory) is preserved
*args, **kwargs (Any) – Passed into the shutil.rmtree call
- Return type:
None
- datalad.utils.rotree(path, ro=True, chmod_files=True)[source]
Make a tree read-only or writable
- Parameters:
path (string) – Path to the tree/directory to chmod
ro (bool, optional) – Whether to make it R/O (default) or RW
chmod_files (bool, optional) – Whether to operate also on files (not just directories)
- Return type:
None
- datalad.utils.saved_generator(gen)[source]
Given a generator, return two generators, where the 2nd one just replays.
The first one goes through the generated items, and the 2nd one yields the saved items.
- Parameters:
gen (Iterable[TypeVar(T)])
- Return type:
tuple[Iterator[TypeVar(T)], Iterator[TypeVar(T)]]
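A sketch of the save-and-replay idea described above. The name `saved_generator_sketch` is hypothetical. The first generator records every item it yields, and the second replays what has been recorded. The replay is intended to be consumed after the first generator is exhausted, since it only sees items produced so far.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def saved_generator_sketch(gen: Iterable[T]) -> Tuple[Iterator[T], Iterator[T]]:
    """Hypothetical stand-in: (consuming generator, replaying generator)."""
    saved: List[T] = []

    def gen1() -> Iterator[T]:
        for x in gen:
            saved.append(x)  # remember each item for a later replay
            yield x

    def gen2() -> Iterator[T]:
        # Replays only what gen1 has produced so far
        yield from saved

    return gen1(), gen2()
```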
- datalad.utils.slash_join(base, extension)[source]
Join two strings with a ‘/’, avoiding duplicate slashes
If any of the strings is None the other is returned as is.
- Parameters:
base (Optional[str])
extension (Optional[str])
- Return type:
Optional[str]
- datalad.utils.split_cmdline(s)[source]
Perform platform-appropriate command line splitting.
Identical to shlex.split() on non-windows platforms.
Modified from https://stackoverflow.com/a/35900070
- Parameters:
s (str)
- Return type:
list[str]
- datalad.utils.swallow_logs(new_level=None, file_=None, name='datalad')[source]
Context manager to consume all logs.
- Parameters:
new_level (str | int | None)
file_ (str | Path | None)
name (str)
- Return type:
Iterator[SwallowLogsAdapter]
- datalad.utils.swallow_outputs()[source]
Context manager to help consume stdout and stderr, as well as print() output
stdout is available as cm.out and stderr as cm.err, where cm is the yielded context manager. Internally it uses temporary files to avoid the side effects of swallowing into StringIO, which lacks .fileno.
Mocking print is necessary for some uses where sys.stdout was already bound to the original sys.stdout, so mocking it later had no effect. Overriding the print function had the desired effect.
- Return type:
Iterator[SwallowOutputsAdapter]
- datalad.utils.todo_interface_for_extensions(f)[source]
- Parameters:
f (TypeVar(T))
- Return type:
TypeVar(T)
- datalad.utils.try_multiple(ntrials, exception, base, f, *args, **kwargs)[source]
Call f multiple times, with an exponentially growing delay between the calls
- datalad.utils.try_multiple_dec(f, ntrials=None, duration=0.1, exceptions=None, increment_type=None, exceptions_filter=None, logger=None)[source]
Decorator to try function multiple times.
Main purpose is to decorate functions dealing with removal of files/directories and which might need a few seconds to work correctly on Windows which takes its time to release files/directories.
- Parameters:
ntrials (int, optional)
duration (float, optional) – Seconds to sleep before retrying.
increment_type ({None, 'exponential'}) – Note that if it is exponential, duration should typically be > 1.0 so it grows with higher power
exceptions (Exception or tuple of Exceptions, optional) – Exception or a tuple of multiple exceptions, on which to retry
exceptions_filter (callable, optional) – If provided, this function will be called with a caught exception instance. If function returns True - we will re-try, if False - exception will be re-raised without retrying.
logger (callable, optional) – Logger to log upon failure. If not provided, will use stock logger at the level of 5 (heavy debug).
f (Callable[P, T])
- Return type:
Callable[P, T]
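A simplified sketch of the retry pattern described above, written as a decorator factory. Its signature differs from try_multiple_dec, which takes f as its first argument. The name `retry_sketch` is hypothetical. It keeps a constant delay (no `increment_type`) and no custom logger, but it does show the ntrials, exceptions and exceptions_filter semantics from the parameter list.

```python
import functools
import time
from typing import Any, Callable, Optional, Tuple, Type, Union

ExcTypes = Union[Type[BaseException], Tuple[Type[BaseException], ...]]


def retry_sketch(ntrials: int = 3, duration: float = 0.1,
                 exceptions: ExcTypes = Exception,
                 exceptions_filter: Optional[Callable[[BaseException], bool]] = None,
                 ) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical retry decorator factory, loosely modeled on try_multiple_dec."""
    def decorator(f: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(f)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            for trial in range(1, ntrials + 1):
                try:
                    return f(*args, **kwargs)
                except exceptions as exc:
                    # Re-raise on the last trial, or if the filter vetoes a retry
                    if trial == ntrials or (
                        exceptions_filter is not None and not exceptions_filter(exc)
                    ):
                        raise
                    time.sleep(duration)  # constant delay in this sketch
            raise ValueError("ntrials must be at least 1")
        return wrapped
    return decorator
```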
- datalad.utils.unique(seq, key=None, reverse=False)[source]
Given a sequence, return a list with only the unique elements, maintaining order
This is reportedly the fastest solution; see https://www.peterbe.com/plog/uniqifiers-benchmark and http://stackoverflow.com/a/480227/1265472 for more information. Enhancement: added the ability to compare for uniqueness using a key function
- Parameters:
seq (Sequence[TypeVar(T)]) – Sequence to analyze
key (callable, optional) – Function to call on each element, so that uniqueness can be decided on, e.g., a member of the element rather than the full element
reverse (bool, optional) – If True, uniqueness is checked in reverse order, so that later occurrences are the ones kept
- Return type:
list[TypeVar(T)]
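An order-preserving de-duplication sketch with the optional `key` and `reverse` behavior described above. The name `unique_sketch` is hypothetical. With `reverse=True`, the sequence is scanned from the end so that later occurrences win, and the result is then flipped back into forward order. Keys must be hashable.

```python
from typing import Callable, Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def unique_sketch(seq: Sequence[T],
                  key: Optional[Callable[[T], Hashable]] = None,
                  reverse: bool = False) -> List[T]:
    """Hypothetical stand-in: unique elements, first (or last) occurrence kept."""
    seen = set()
    out: List[T] = []
    for item in (reversed(seq) if reverse else seq):
        k = key(item) if key else item  # compare on key(item) if given
        if k not in seen:
            seen.add(k)
            out.append(item)
    return out[::-1] if reverse else out
```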
- datalad.utils.unlink(f)[source]
‘Robust’ unlink that tries multiple times
On Windows boxes there is evidence of a latency of more than a second until a file is no longer considered “in-use”. WindowsError is not defined on Linux, so if IOError or any other exception is raised, an except clause mentioning WindowsError would itself raise a NameError; see also gh-2533
- Parameters:
f (str | Path)
- Return type:
None