ContainerTask

class caput.pipeline.tasklib.base.ContainerTask

Bases: MPILoggedTask, caput.pipeline.extensions.ContainerIOMixin

Implements a task whose inputs and outputs are Container objects.

This task writes the output when requested and handles various types of metadata associated with the container objects.

Tasks inheriting from this class should override process() and optionally setup() or process_finish(). They should not override next() or finish().
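The division of labour described above can be sketched without caput installed. The block below uses a hypothetical stand-in base class (`SketchBase`, which is not caput's implementation) whose `next()` and `finish()` delegate to `process()` and `process_finish()`. The subclass overrides only those two hooks. All class names and the running-total logic are assumptions made for illustration.

```python
class SketchBase:
    """Hypothetical stand-in for the base class; NOT caput's implementation."""

    def next(self, *inputs):
        # The base class owns iteration and delegates the real work.
        return self.process(*inputs)

    def finish(self):
        # The base class owns shutdown and delegates to the hook.
        return self.process_finish()

    def process(self, data):
        raise NotImplementedError

    def process_finish(self):
        return None


class RunningTotal(SketchBase):
    """Hypothetical task: overrides the hooks, leaves next()/finish() alone."""

    def __init__(self):
        self.total = 0

    def process(self, data):
        # Fixed-length signature: exactly one input per iteration.
        self.total += sum(data)
        return [2 * x for x in data]

    def process_finish(self):
        return self.total


task = RunningTotal()
doubled = task.next([1, 2, 3])
final_total = task.finish()
```

A real task would inherit from ContainerTask and exchange Container objects rather than plain lists.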

Output will be written (using write_output()) to the file self.output_name.

Attributes:

save : bool | list[bool], optional

Whether to save the output to disk or not. Can be provided as a list if multiple outputs are being handled. Default is False.

attrs : dict | None, optional

A mapping of attribute names and values to set in the .attrs at the root of the output container. String values will be formatted according to the standard Python .format(…) rules, and can interpolate several other values into the string. These are:

count: an integer giving the current iteration of the task.
tag: a string identifier for the output, derived from the container's tag attribute. If that attribute is not present, count is used instead.
key: the name of the output key.
task: the (unqualified) name of the task.
input_tags: a list of the tags for each input argument for the task.

Any existing attribute in the container can be interpolated by the name of its key. The specific values above will override any attribute with the same name.

Incorrectly formatted values will cause an error to be raised. Default is None.
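The interpolation rules for attrs can be illustrated with plain `str.format`. This is a minimal sketch of the described behaviour, not caput's own code. The attribute names, the values, and the `container_attrs` dictionary are all hypothetical.

```python
# Hypothetical attributes already present on the output container.
container_attrs = {"instrument": "demo", "tag": "stale_tag"}

# Hypothetical values the task would supply for this iteration.
task_values = {
    "count": 3,
    "tag": "night_20240101",
    "key": "out",
    "task": "MyTask",
    "input_tags": ["a", "b"],
}

# Task-supplied values take precedence over same-named container attributes.
context = {**container_attrs, **task_values}

# Hypothetical `attrs` configuration: one string value, one non-string value.
attrs = {
    "description": "Output {count} of {task} for {tag} ({instrument})",
    "n_retries": 2,
}

# Only string values are formatted; everything else is passed through.
formatted = {
    name: value.format(**context) if isinstance(value, str) else value
    for name, value in attrs.items()
}

print(formatted["description"])
```

With plain `str.format`, a placeholder naming an unknown key raises `KeyError`. Which exception the task itself raises for a badly formatted value is not specified here.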

tag : str, optional

Set a format for the tag attached to the output. This is a Python format string which can interpolate the variables listed under attrs above. For example a tag of “cat{count}” will generate catalogs with the tags “cat1”, “cat2”, etc. Default is {tag}.

output_name : str | list[str], optional

A Python format string used to construct the filename. All variables given under attrs above can be interpolated into the filename. Can be provided as a list if multiple outputs are being handled. Valid identifiers are:

count: an integer giving the current iteration of the task.

tag: a string identifier for the output, derived from the container's tag attribute. If that attribute is not present, count is used instead.

key: the name of the output key.

task: the (unqualified) name of the task.

output_root: the value of the output root argument. This is deprecated and is kept only for legacy support. The default value of output_name means the previous behaviour still works.

Default is {output_root}{tag}.h5.
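As a rough illustration of how an output_name format string expands, the sketch below formats the default and one custom pattern. `build_filename` is a hypothetical helper written for this example, not a caput function, and the paths and tags are made up.

```python
def build_filename(output_name, *, count, key, task, output_root="", tag=None):
    """Expand an output_name format string (hypothetical helper, not caput's)."""
    # Per the description above, count stands in for tag when no tag is set.
    tag = count if tag is None else tag
    return output_name.format(
        count=count, tag=tag, key=key, task=task, output_root=output_root
    )


default_name = "{output_root}{tag}.h5"

with_tag = build_filename(
    default_name, count=0, key="out", task="MyTask",
    output_root="/data/run1_", tag="cat1",
)
without_tag = build_filename(default_name, count=7, key="out", task="MyTask")
custom = build_filename("{task}_{count:04d}.h5", count=7, key="out", task="MyTask")

print(with_tag, without_tag, custom)
```

Because the default pattern starts with `{output_root}`, configurations that set only output_root keep producing the same paths.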

compression : bool | dict, optional

Set compression options for each dataset. Provided as a dict with the dataset names as keys and values for chunks, compression, and compression_opts. Any datasets not included in the dict (including when the dict is empty) will use the default parameters set in the dataset spec. If set to False (or anything that evaluates to False, other than an empty dict), chunks and compression will be disabled for all datasets. If no argument is provided, the default parameters set in the dataset spec are used. Note that this modifies these parameters on the container itself, so if the container is written out again downstream in the pipeline, the same settings will be used. Default is True.
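The selection rules above can be sketched as a small function. This is an illustration of the described behaviour under stated assumptions, not caput's implementation. The dataset names (`vis`, `weight`), the default settings, and the use of `None` to represent "disabled" are all hypothetical.

```python
# Hypothetical per-dataset defaults, standing in for the dataset spec.
spec_defaults = {
    "vis": {"chunks": (64, 64), "compression": "gzip", "compression_opts": 4},
    "weight": {"chunks": (64, 64), "compression": "gzip", "compression_opts": 4},
}

# Assumed representation of "chunks and compression disabled".
DISABLED = {"chunks": None, "compression": None, "compression_opts": None}


def resolve_compression(option, defaults):
    """Return the settings each dataset would use for a given option value."""
    if isinstance(option, dict):
        # Listed datasets use the given settings; the rest keep spec defaults.
        # An empty dict therefore means "defaults for everything".
        return {name: option.get(name, default) for name, default in defaults.items()}
    if not option:
        # False (or another falsy non-dict value) disables everything.
        return {name: dict(DISABLED) for name in defaults}
    # True, or no argument: use the spec defaults unchanged.
    return {name: dict(default) for name, default in defaults.items()}


override = {"vis": {"chunks": (32, 32), "compression": "gzip", "compression_opts": 9}}
resolved = resolve_compression(override, spec_defaults)
all_off = resolve_compression(False, spec_defaults)
all_default = resolve_compression({}, spec_defaults)
```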

output_root : str, optional

Pipeline settable parameter giving the first part of the output path. Deprecated in favour of specifying the output path directly in output_name.

nan_check : bool, optional

Check the output for NaNs (and infs), logging a message if any are present. Default is True.

nan_dump : bool, optional

If NaNs are found, dump the container to disk. Default is True.

nan_skip : bool, optional

If NaNs are found, don't pass on the output. Default is True.
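One way the three NaN options might interact is sketched below using only the standard library. This is a guess at the control flow, not caput's code. It assumes that nan_dump and nan_skip take effect only when nan_check is enabled, and it represents the container as a flat list of floats. `handle_output` is a hypothetical helper.

```python
import math


def handle_output(values, nan_check=True, nan_dump=True, nan_skip=True):
    """Return (passed_on, dumped) for a flat list of floats (hypothetical)."""
    if not nan_check:
        # Assumption: with checking off, the output is passed on untouched.
        return values, False

    found = any(math.isnan(v) or math.isinf(v) for v in values)
    if not found:
        return values, False

    # Stand-in for writing the container to disk.
    dumped = nan_dump
    # Stand-in for withholding the output from downstream tasks.
    passed_on = None if nan_skip else values
    return passed_on, dumped


clean = handle_output([1.0, 2.0])
bad_default = handle_output([1.0, float("nan")])
bad_kept = handle_output([1.0, float("inf")], nan_skip=False)
```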

versions : dict[str, str], optional

Keys are module names (str) and values are their version strings. This is attached to output metadata. Default is {}.

pipeline_config : dict, optional

Global pipeline configuration. This is attached to output metadata. Default is {}.

Raises:

PipelineRuntimeError: If this is used as a base class for a task that overrides self.process with variable-length or optional arguments.
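To see which process signatures fall under this restriction, the standard-library `inspect` module can classify parameters. The check below is a sketch of how such a test could be written. It is not how caput performs it, and the helper and example classes are hypothetical.

```python
import inspect


def has_variadic_or_optional_args(func):
    """True if func takes *args, **kwargs, or any parameter with a default."""
    for param in inspect.signature(func).parameters.values():
        if param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            return True
        if param.default is not inspect.Parameter.empty:
            return True
    return False


class FixedArgs:
    def process(self, data):  # fixed-length signature
        return data


class VariadicArgs:
    def process(self, *data):  # variable-length arguments
        return data


class OptionalArgs:
    def process(self, data, mask=None):  # optional argument
        return data


results = {
    cls.__name__: has_variadic_or_optional_args(cls.process)
    for cls in (FixedArgs, VariadicArgs, OptionalArgs)
}
print(results)
```

Only the first signature avoids the condition described above.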

Methods

`finish`()	Should not need to override. Implement `process_finish()` instead.
`next`(*input)	Iterates through inputs.