Manager

class caput.pipeline.Manager(psutil_profiling: bool = False)

Bases: caput.config.Reader

Pipeline manager for setting up and running pipeline tasks.

The manager is in charge of initializing all pipeline tasks, setting them up by providing the appropriate parameters, and then executing the methods of each task in the appropriate order. It also handles intermediate data products and ensures that the correct products are passed between tasks.

Parameters:
psutil_profiling : bool, optional

Use psutil to profile CPU and memory usage. Default is False.

Attributes:
logging : dict[str, str]

Log levels per module. The key “root” stores the root log level.

task_specs : list

Configuration of pipeline tasks.

key_pattern : str

Regex pattern to match on in order to pass a key to subsequent tasks. This is useful for controlling which keys are passed in tasks which produce multiple outputs. Default is [^\W_], which will cause any key that contains no alphanumeric characters to be ignored.
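The effect of the default pattern can be checked with the standard `re` module alone. The sketch below assumes the pattern is applied as a search anywhere in the key, which is what "contains no alphanumeric characters" suggests; the helper name `is_passed_on` and the sample keys are hypothetical and not part of caput.

```python
import re

# Default pattern quoted in the key_pattern description above.
KEY_PATTERN = re.compile(r"[^\W_]")


def is_passed_on(key: str) -> bool:
    """Hypothetical helper: True if the key contains an alphanumeric character.

    Assumes search semantics (a match anywhere in the key).
    """
    return KEY_PATTERN.search(key) is not None


# Sample keys chosen for illustration only.
keys = ["map", "out_1", "_", "__", "-"]
passed = [k for k in keys if is_passed_on(k)]
print(passed)  # ['map', 'out_1']
```

Keys made up only of underscores or punctuation, such as `_`, fail the pattern and would be ignored under this assumption.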

interactive : bool

If True, the `Manager.run` method becomes a generator and stops on each iteration after selecting the next task to run. This allows a user to interact with the pipeline at each step and to probe the internal state of each task. This feature should be used as follows:

  • p = Manager()

  • p.interactive = True

  • runner = p.runner()

  • next(runner)

Default is False.
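The stepping behaviour described for `interactive` can be illustrated without caput, using an ordinary Python generator. The sketch below is a toy stand-in, not the caput implementation: the class `ToyPipeline`, its `log` attribute, and the two sample tasks are all hypothetical. It only shows the general pattern of pausing after a task is selected and before it runs.

```python
class ToyPipeline:
    """Hypothetical stand-in for illustration; this is not caput's Manager."""

    def __init__(self, tasks):
        self.tasks = list(tasks)  # (name, callable) pairs
        self.interactive = False
        self.log = []  # results of tasks that have actually run

    def runner(self):
        for name, func in self.tasks:
            if self.interactive:
                # Pause after selecting the task and before running it,
                # so the caller can inspect state between steps.
                yield name
            self.log.append(func())

    def run(self):
        # Drive the generator through to completion.
        for _ in self.runner():
            pass


p = ToyPipeline([("load", lambda: "loaded"), ("process", lambda: "processed")])
p.interactive = True
stepper = p.runner()

first = next(stepper)       # "load" is selected but has not run yet
log_after_first = list(p.log)
second = next(stepper)      # "load" runs, then "process" is selected
remaining = list(stepper)   # "process" runs and the generator finishes
```

With `interactive` left at False, the same generator never pauses, so `run()` executes every task in one pass.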

enable_breakpoints : bool

If True, task breakpoints are enabled. If a task requests a breakpoint, a call to `Task.breakpoint` is made every time the task is selected to be run. If interactive is True, this does nothing. Default is False.

versions : list[str]

Module names. This list, together with the version strings from these modules, is attached to output metadata. Default is [].

save_config : bool

If True, the global pipeline configuration is attached to output metadata. Default is True.

Methods

add_task(task[, task_spec])

Add a task instance to the pipeline.

from_yaml_file(file_name[, lint, psutil_profiling])

Initialize the pipeline from a YAML configuration file.

from_yaml_str(yaml_doc[, lint, psutil_profiling])

Initialize the pipeline from a YAML configuration string.

run()

Run the pipeline through to completion.

runner()

Main driver for the pipeline.