API
- pathtrees.tree(root: Union[str, _TREE_DEF_TYPE, None] = None, paths: Union[str, _TREE_DEF_TYPE, None] = None, data: dict | None = None) -> Paths
Build paths from a directory spec.
- Parameters
root (str) – the root directory.
paths (dict) – the directory structure.
- Returns
The initialized Paths object
    import pathtrees

    # define the file structure
    path = pathtrees.tree('{project}', {
        'data': {
            '{sensor_id}': {
                '': 'sensor',
                'audio': {'{file_id:04d}.flac': 'audio'},
                'spl': {'spl_{file_id:04d}.csv': 'spl'},
                'embeddings': {'emb_{file_id:04d}.csv': 'embeddings'},
            },
        },
    })
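Note
Use empty strings to reference the directory itself. This works because joining an empty string leaves a path unchanged:
pathlib.Path(path) / '' == pathlib.Path(path)
(os.path.join(path, '') differs only by a trailing separator.)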
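As a rough illustration of how a nested spec like the one above could map names to paths, the sketch below flattens a dict with pathlib. The helper `flatten_tree` is hypothetical and is not part of pathtrees; it only shows the join-the-keys idea and why an empty-string key ends up naming the enclosing directory.

```python
import os
from pathlib import PurePosixPath


def flatten_tree(root, spec):
    """Hypothetical helper: map each name in a nested spec to its joined path."""
    named = {}
    for key, value in spec.items():
        path = PurePosixPath(root) / key  # joining '' leaves the path unchanged
        if isinstance(value, dict):
            named.update(flatten_tree(path, value))  # descend into a sub-directory
        else:
            named[value] = path  # leaf: the value is the name of this path
    return named


spec = {'data': {'{sensor_id}': {'': 'sensor', 'audio': {'{file_id:04d}.flac': 'audio'}}}}
paths = flatten_tree('{project}', spec)
print(paths['sensor'])  # {project}/data/{sensor_id}
print(paths['audio'])   # {project}/data/{sensor_id}/audio/{file_id:04d}.flac

# os.path.join appends a trailing separator, while pathlib drops the empty part
assert os.path.join('a', '') == 'a' + os.sep
assert PurePosixPath('a') / '' == PurePosixPath('a')
```

The real library may build its tree differently; the point is only that each leaf value becomes a name and each chain of keys becomes a path.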
- class pathtrees.Path(*args, data: dict | None = None, tree: Paths | None = None)
Represents a pathlib.Path with placeholders for bits of data. It uses Python string formatting to let you fill in the missing bits at a later date.
    path = pathtrees.Path('projects/{name}/images/frame_{frame_id:04d}.jpg')
    path.update(name='my_project')

    # loop over all frames
    for f in path.glob():
        # print out some info about each frame
        data = path.parse(f)
        print("frame ID:", data['frame_id'])
        print("path:", f)
        ...  # do something, e.g. load an image
Quite a few methods of the original path object are wrapped so that any manipulation of the path copies over the extra attributes needed to manage the data.
- rjoinpath(root: PosixPath) -> Path
Return an absolute form of the path. TODO: is there a better way?
- property copy: P
Creates a copy of the path object so that data can be altered without affecting the original object.
- unspecify(*keys, inplace: bool = True, parent: bool = True) -> P
Remove keys from the path's data dictionary.
- property fully_specified: bool
Check if the path is fully specified (if True, it can be formatted without raising an Underspecified error).
- format(**kw) -> str
Insert data into the path string. (Works like string format.)
- Raises
KeyError – if the format string is underspecified.
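The underspecified case can be seen with plain `str.format`, which the description above compares this method to. This is a sketch using only the standard library, not pathtrees itself; the pattern is borrowed from the Path example.

```python
pattern = 'projects/{name}/images/frame_{frame_id:04d}.jpg'

# every field supplied: formatting succeeds
full = pattern.format(name='my_project', frame_id=7)
print(full)  # projects/my_project/images/frame_0007.jpg

# a field is missing: str.format raises KeyError naming the missing field
try:
    pattern.format(name='my_project')
except KeyError as err:
    missing = err.args[0]
    print('missing field:', missing)  # missing field: frame_id
```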
- partial_format(**kw) -> str
Format a field, leaving all unspecified fields to be filled later.
- glob_format(**kw) -> str
Format a field, setting all unspecified fields as a wildcard (asterisk).
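One way to picture partial and glob formatting is to walk the format string with `string.Formatter().parse` and decide per field what to emit. The functions below are hypothetical stand-ins written for illustration, not the library's implementation, and they assume simple named fields with no escaped braces or `!r`-style conversions on known fields.

```python
import string


def _render(pattern, data, on_missing):
    """Format known fields; delegate unknown ones to on_missing(field, conv, spec)."""
    out = []
    for literal, field, spec, conv in string.Formatter().parse(pattern):
        out.append(literal)
        if field is None:  # trailing literal text with no placeholder
            continue
        if field in data:
            out.append(format(data[field], spec))
        else:
            out.append(on_missing(field, conv, spec))
    return ''.join(out)


def partial_format_sketch(pattern, **data):
    """Leave unspecified placeholders in place so they can be filled later."""
    def keep(field, conv, spec):
        return '{' + field + (f'!{conv}' if conv else '') + (f':{spec}' if spec else '') + '}'
    return _render(pattern, data, keep)


def glob_format_sketch(pattern, **data):
    """Replace unspecified placeholders with a glob wildcard."""
    return _render(pattern, data, lambda field, conv, spec: '*')


pattern = 'projects/{name}/images/frame_{frame_id:04d}.jpg'
partial = partial_format_sketch(pattern, name='demo')
wild = glob_format_sketch(pattern, name='demo')
print(partial)  # projects/demo/images/frame_{frame_id:04d}.jpg
print(wild)     # projects/demo/images/frame_*.jpg
```

Because the partially formatted result is still a valid format string, the remaining field can be filled in later with an ordinary `.format(frame_id=3)` call.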
- format_path(**kw) -> PosixPath
Insert data into the path string. (Works like string format.)
- Raises
KeyError – if the format string is underspecified.
- partial_format_path(**kw) -> P
Format a field, leaving all unspecified fields to be filled later.
- glob_format_path(**kw) -> PosixPath
Format a field, setting all unspecified fields as a wildcard (asterisk).
- maybe_format(**kw) -> Union[str, P]
Try to format a field. If it fails, return as a Path object.
- glob(*fs) -> List[str]
Glob over all unspecified variables.
- Parameters
*fs (str) – additional paths to join. e.g. for a directory you can use "*.txt" to get all .txt files.
- Returns
The paths matching the glob pattern.
- Return type
list
- iglob(*fs) -> Iterable[str]
Iterable glob over all unspecified variables. See glob() for signature.
- rglob(*fs) -> List[str]
Recursive glob over all unspecified variables. See glob() for signature.
- irglob(*fs) -> Iterable[str]
Iterable, recursive glob over all unspecified variables. See glob() for signature.
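The list, iterator, and recursive variants above have direct analogues in the standard `glob` module. The sketch below uses only the standard library on a throwaway directory; the file layout is made up for illustration, and how pathtrees wires these calls together internally is not shown here.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # build projects/demo/images/frame_0000.jpg ... frame_0002.jpg
    image_dir = os.path.join(root, 'projects', 'demo', 'images')
    os.makedirs(image_dir)
    for i in range(3):
        open(os.path.join(image_dir, f'frame_{i:04d}.jpg'), 'w').close()

    # the unspecified frame_id field has been replaced by '*'
    pattern = os.path.join(image_dir, 'frame_*.jpg')

    matches = sorted(glob.glob(pattern))  # eager: returns a list
    lazy = glob.iglob(pattern)            # lazy: returns an iterator
    first = next(lazy)

    # '**' with recursive=True also descends into sub-directories
    recursive = sorted(glob.glob(os.path.join(root, '**', '*.jpg'), recursive=True))

    names = [os.path.basename(m) for m in matches]

print(names)  # ['frame_0000.jpg', 'frame_0001.jpg', 'frame_0002.jpg']
```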
- parse(path: str, use_data: bool = True) -> dict
Extract variables from a compiled path.
See parse to understand the amazing witchery that makes this possible! https://pypi.org/project/parse/
- Parameters
path (str) – The path containing data to parse.
use_data (bool) – Should we fill in the data we already have before parsing? This means fewer variables that need to be parsed. Set False if you do not wish to use the data.
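To give a feel for what reversing a format string involves, here is a small sketch that turns a pattern into a regular expression. It deliberately uses the standard `re` module rather than the `parse` package the library relies on, so it is a swapped-in technique for illustration only. The helpers `compile_pattern` and `parse_path` are hypothetical, and the sketch assumes each field appears once and that string fields contain no slash.

```python
import re
import string


def compile_pattern(pattern):
    """Build a regex with one named group per placeholder, plus per-field casts."""
    parts, casts = [], {}
    for literal, field, spec, _ in string.Formatter().parse(pattern):
        parts.append(re.escape(literal))
        if field is None:
            continue
        if spec and spec.endswith('d'):  # integer spec such as '04d'
            parts.append(f'(?P<{field}>-?\\d+)')
            casts[field] = int
        else:  # anything else is treated as a string without '/'
            parts.append(f'(?P<{field}>[^/]+)')
            casts[field] = str
    return re.compile(''.join(parts)), casts


def parse_path(pattern, path):
    """Extract the placeholder values from a formatted path."""
    regex, casts = compile_pattern(pattern)
    match = regex.fullmatch(path)
    if match is None:
        raise ValueError(f'{path!r} does not match {pattern!r}')
    return {name: casts[name](value) for name, value in match.groupdict().items()}


pattern = 'projects/{name}/images/frame_{frame_id:04d}.jpg'
data = parse_path(pattern, 'projects/my_project/images/frame_0042.jpg')
print(data)  # {'name': 'my_project', 'frame_id': 42}
```

Filling in already-known values before parsing, as use_data describes, would simply leave fewer groups for the matcher to recover.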
- property parents: _PathParents
A sequence of this path’s logical parents.
- absolute()
Return an absolute version of this path. This function works even if the path doesn’t point to anything.
No normalization is done, i.e. all ‘.’ and ‘..’ will be kept along. Use resolve() to get the canonical path to a file.
- expanduser()
Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser)
- property parent
The logical parent of the path.
- relative_to(*other)
Return the relative path to another path identified by the passed arguments. If the operation is not possible (because this is not a subpath of the other path), raise ValueError.
- resolve(strict=False)
Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
- with_name(name)
Return a new path with the file name changed.
- with_suffix(suffix)
Return a new path with the file suffix changed. If the path has no suffix, add given suffix. If the given suffix is an empty string, remove the suffix from the path.
- class pathtrees.Paths(paths: Dict[str, 'Path'], data: dict | None = None)
A hierarchy of paths in your project.
Specs can be nested arbitrarily: the keys leading down to a path are joined to form it, and the value is the name you refer to it by.
    import matplotlib.pyplot as plt
    from keras.callbacks import Callback
    from pathtrees import Paths

    # define your file structure.
    # a common ML experiment structure (for me anyways)
    paths = Paths.define('./logs', {
        '{log_id}': {
            'model.h5': 'model',
            'model_spec.pkl': 'model_spec',
            'plots': {
                'epoch_{step_name}': {
                    '{plot_name}.png': 'plot',
                    '': 'plot_dir'
                }
            },
            # a path join hack that gives you: log_dir > ./logs/{log_id}
            '': 'log_dir',
        }
    })
    paths.update(log_id='test1', step_name='epoch_100')

    # get paths by name
    paths.model  # logs/test1/model.h5
    paths.model_spec  # logs/test1/model_spec.pkl
    paths.plot  # logs/test1/plots/{step_name}/{plot_name}.png

    # for example, a keras callback that saves a matplotlib plot every epoch
    class MyCallback(Callback):
        def on_epoch_end(self, epoch, logs):
            # creates a copy of the path tree that has step_name=epoch
            epoch_paths = paths.specify(step_name=epoch)
            ...
            # save one plot
            plt.savefig(epoch_paths.plot.specify(plot_name='confusion_matrix'))
            ...
            # save another plot
            plt.savefig(epoch_paths.plot.specify(plot_name='auc'))

    # you can glob over any missing data (e.g. step_name => '*')
    # equivalent to: glob("logs/test1/plots/{step_name}/auc.png")
    for path in paths.plot.specify(plot_name='auc').glob():
        print(path)
- add(root=None, paths=None) -> Ps
Build paths from a directory spec.
- Parameters
root (str) – the root directory.
paths (dict) – the directory structure.
- Returns
The initialized Paths object
- rjoinpath(path) -> Paths
Give these paths a new root! Basically doing root / path for all paths in this tree. This is useful if you want to nest a folder inside another.
- relative_to(path) -> Paths
Make these paths relative to another path! Basically doing path.relative_to(root) for all paths in this tree. Use this with with_root to change the root directory of the paths.
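The two operations above amount to applying an ordinary pathlib operation to every entry in the tree. A minimal sketch with plain dicts of `PurePosixPath` objects follows; the directory names are invented for the example and the dict stands in for a Paths object.

```python
from pathlib import PurePosixPath

paths = {
    'model': PurePosixPath('logs/{log_id}/model.h5'),
    'plot': PurePosixPath('logs/{log_id}/plots/{plot_name}.png'),
}

# rjoinpath-style: prepend a new root, i.e. root / path for every entry
nested = {name: PurePosixPath('experiments') / p for name, p in paths.items()}

# relative_to-style: strip a common leading directory from every entry
relative = {name: p.relative_to('logs') for name, p in paths.items()}

# combining both swaps the root directory
moved = {name: PurePosixPath('archive') / p for name, p in relative.items()}

print(nested['model'])   # experiments/logs/{log_id}/model.h5
print(relative['plot'])  # {log_id}/plots/{plot_name}.png
print(moved['model'])    # archive/{log_id}/model.h5
```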
- parse(path, name: str) -> dict
Parse data from a formatted string (reverse of string format).
- Parameters
path (str) – the string to parse
name (str) – the name of the path pattern to use.
- update(**kw) -> Ps
Update specified data in place.
    paths = pathtrees.tree({'{a}': 'aaa'})
    assert not paths.fully_specified
    paths.update(a=5)
    assert paths.fully_specified
    assert paths.data['a'] == 5
- specify(**kw) -> Ps
Creates a copy of the path tree, then updates the copy's data.
    paths = pathtrees.tree({'{a}': 'aaa'})
    paths2 = paths.specify(a=5)
    assert not paths.fully_specified
    assert paths2.fully_specified
    assert 'a' not in paths.data
    assert paths2.data['a'] == 5
Equivalent to:
    paths.copy.update(**kw)
- unspecify(*keys, inplace=False, children=True) -> Paths
Remove keys from the paths' data dictionary.
    paths = pathtrees.tree({'{a}': 'aaa'})
    paths.update(a=5)
    assert paths.fully_specified
    assert paths.data['a'] == 5
    paths.unspecify('a', inplace=True)
    assert not paths.fully_specified
    assert 'a' not in paths.data
- property fully_specified: bool
Are all paths fully specified?
    paths = pathtrees.tree({'{a}': 'aaa'})
    assert not paths.fully_specified
    paths.update(a=5)
    assert paths.fully_specified
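The difference between updating in place, specifying on a copy, and unspecifying can be mimicked with a tiny class. `MiniPaths` below is a hypothetical stand-in written for this illustration; it is not how pathtrees is implemented and it ignores nesting, the `inplace` and `children` options, and Path objects.

```python
import copy
import string


class MiniPaths:
    """Hypothetical stand-in for a path tree: named patterns plus shared data."""

    def __init__(self, patterns, data=None):
        self.patterns = dict(patterns)
        self.data = dict(data or {})

    def update(self, **kw):
        self.data.update(kw)  # mutate this object in place
        return self

    def specify(self, **kw):
        return copy.deepcopy(self).update(**kw)  # copy first, then update the copy

    def unspecify(self, *keys):
        for key in keys:
            self.data.pop(key, None)  # forget values; ignore unknown keys
        return self

    @property
    def fully_specified(self):
        # collect every placeholder name used by any pattern
        fields = {
            field
            for pattern in self.patterns.values()
            for _, field, _, _ in string.Formatter().parse(pattern)
            if field
        }
        return fields <= set(self.data)


paths = MiniPaths({'model': 'logs/{log_id}/model.h5'})
paths2 = paths.specify(log_id='test1')  # the original is untouched
assert not paths.fully_specified and paths2.fully_specified

paths.update(log_id='test2')            # now the original changes
assert paths.fully_specified

paths.unspecify('log_id')               # and the value is forgotten again
assert not paths.fully_specified
```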
- format(**kw) -> Dict[str, str]
Try to format all paths as strings. Raises Underspecified if data is missing.
- Parameters
**kw – additional data specified for formatting.
- Returns
key is the name of the path, and the value is the formatted path string.
- Return type
dict