Task#

class caput.pipeline.Task[source]#

Bases: caput.config.Reader

Base class for all pipeline tasks.

All pipeline tasks should inherit from this class, with functionality and analysis added by over-riding __init__, setup(), validate(), next() and/or finish().

In addition, input parameters may be specified by adding class attributes which are instances of Property. These will then be read from the pipeline yaml file when the pipeline is initialized. The class attributes will be overridden with instance attributes with the same name but with the values specified in the pipeline file.

Attributes:
broadcast_inputsbool

If true, input queues will be broadcast to process all combinations of entries. Otherwise, items in input queues are removed at equal rate. NOT CURRENTLY IMPLEMENTED

limit_outputsint

Limits the number of next outputs from this task before finishing. Default is None, allowing an unlimited number of next products.

base_priorityint

Base integer priority. Priority only matters relative to other tasks in a pipeline, with run order given by sorted(priorities, reverse=True). Task priority is also adjusted based on net difference in input and output, which will typically adjust priority by +/- (0 to 2). base_priority should be set accordingly - factors of 10 (i.e. -10, 10, 20, …) are effective at forcing a task to have highest/lowest priority relative to other tasks. base_priority should be used sparingly when a user wants to enforce a specific non-standard pipeline behaviour. See method priority for details about dynamic priority. Default is 0.

breakpointbool

If true, signals to the pipeline runner to make a call to breakpoint each time this task is run. This will drop the interpreter into pdb, allowing for interactive debugging of the current pipeline and task state. Default is False.

property cacheable[source]#

Override to return True if caching results is implemented.

No caching infrastructure has yet been implemented.

property embarrassingly_parallelizable[source]#

Override to return True if next() is trivially parallelizeable.

This property tells the pipeline that the problem can be parallelized trivially. This only applies to the next() method, which should not change the state of the task.

If this returns True, then the Pipeline will execute next() many times in parallel and handle all the intermediate data efficiently. Otherwise next() must be parallelized internally if at all. setup() and finish() must always be parallelized internally.

Usage of this has not implemented.

property mem_used[source]#

Return the approximate total memory referenced by this task.

Methods#

__str__()

String representation of the task and its state.

finish()

Final analysis stage of pipeline task.

next([input])

Iterative analysis stage of pipeline task.

setup([requires])

First analysis stage of pipeline task.

validate()

Validate the task after instantiation.