caput.memdata#

A backend format for in-memory or on-disk HDF5-like datasets.

It is useful to have a consistent API for data that is independent of whether that data lives on disk or in memory. h5py provides this to a certain extent, since h5py.Dataset objects act very much like NumPy arrays. memdata extends this by providing in-memory containers analogous to h5py.Group, h5py.AttributeManager and h5py.Dataset objects.
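To make the idea of a storage-independent API concrete, the following is a minimal pure-Python sketch. TinyGroup, TinyDataset and describe are hypothetical names invented for this illustration and are not part of caput or h5py; the sketch only mimics the general shape of the group/dataset/attribute interface.

```python
class TinyDataset:
    """Hypothetical stand-in for a dataset: values plus an attribute dict."""

    def __init__(self, data):
        self.data = list(data)
        self.attrs = {}

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


class TinyGroup:
    """Hypothetical stand-in for a group: named children plus attributes."""

    def __init__(self):
        self._children = {}
        self.attrs = {}

    def create_group(self, name):
        group = TinyGroup()
        self._children[name] = group
        return group

    def create_dataset(self, name, data):
        dataset = TinyDataset(data)
        self._children[name] = dataset
        return dataset

    def __getitem__(self, path):
        # Resolve an h5py-style "a/b/c" path one component at a time.
        node = self
        for part in path.strip("/").split("/"):
            node = node._children[part]
        return node


def describe(group, path):
    """Use only the group/dataset interface, never the storage details."""
    dset = group[path]
    return f"{path}: {len(dset)} values, units={dset.attrs.get('units')}"


root = TinyGroup()
amplitude = root.create_group("vis").create_dataset("amplitude", data=[1.0, 2.5, 4.0])
amplitude.attrs["units"] = "Jy"
summary = describe(root, "vis/amplitude")
print(summary)
```

Because describe touches only item access, len and attrs, the same function could in principle be handed any object exposing that interface, whether it is backed by memory or by a file.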

In addition to these basic classes that copy the h5py API, a higher-level data container is provided that uses these classes along with h5py to provide data that is transparently stored either in memory or on disk.

This also allows the creation and use of memdata objects that hold data distributed over a number of MPI processes. These MemDatasetDistributed datasets hold MPIArray objects and can be written to, and loaded from, disk like normal memdata objects. Support for this must be explicitly enabled in the root group at creation by passing the distributed=True flag.

Warning

It has been observed that the parallel write of distributed datasets can lock up. This was seen on macOS when using the ompio MPI-IO backend of OpenMPI 3.0. Switching to romio as the MPI-IO backend helped in that case, but please report any further issues.
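One possible way to select the MPI-IO backend from Python is through OpenMPI's OMPI_MCA_ environment variables, as sketched below. The component name romio314 is an assumption based on what OpenMPI 3.x typically ships; other releases use a different suffix, so check the output of ompi_info for your installation before relying on it.

```python
import os

# Assumption: "romio314" is the ROMIO component name in this OpenMPI build.
# Other OpenMPI releases name it differently; `ompi_info` lists the
# available io components.
ROMIO_COMPONENT = "romio314"

# OpenMPI reads MCA parameters from OMPI_MCA_<name> environment variables.
# The variable has to be set before MPI is initialised, i.e. before
# importing mpi4py or anything else that starts MPI.
# setdefault keeps any value the user has already exported.
os.environ.setdefault("OMPI_MCA_io", ROMIO_COMPONENT)

print("MPI-IO component requested:", os.environ["OMPI_MCA_io"])
```

The same setting can be passed on the command line of the MPI launcher instead; the environment variable is used here only because it can be set from within the script.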

Submodules#

Classes#

MemAttrs

In-memory implementation of the h5py.AttributeManager.

MemDataset

Base class for an in-memory implementation of h5py.Dataset.

MemDatasetCommon

In-memory implementation of h5py.Dataset.

MemDatasetDistributed

Parallel, in-memory implementation of h5py.Dataset.

MemDiskGroup

Group whose data may either be stored on disk or in memory.

MemGroup

In-memory implementation of the h5py.Group.

_BaseGroup

Implement the majority of the Group interface.

_MemObjMixin

Mixin representing the identity of an in-memory h5py-like object.

_Storage

Underlying container that provides storage backing for in-memory groups.

_StorageRoot

Root level of the storage tree.

lock_file

Manage a lock file around a file creation operation.

ro_dict

A dict that is read-only to the user.
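As an illustration of what "read-only to the user" can mean for the last entry above, here is a minimal sketch built on collections.abc.Mapping. ReadOnlyDict is a hypothetical name for this example and is not claimed to match how ro_dict is implemented.

```python
from collections.abc import Mapping


class ReadOnlyDict(Mapping):
    """Hypothetical read-only view: lookups work, item assignment does not."""

    def __init__(self, data=None):
        # Copy the input so later changes to the caller's dict are not seen.
        self._data = dict(data or {})

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


index_map = ReadOnlyDict({"freq": [400.0, 800.0]})

try:
    index_map["time"] = [0, 1]
    write_rejected = False
except TypeError:
    # Mapping defines no __setitem__, so item assignment raises TypeError.
    write_rejected = True

print(sorted(index_map), write_rejected)
```

Note that this only protects the mapping itself: a mutable value stored inside it, such as the list above, can still be modified in place.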

Functions#

copyattrs(a1, a2[, convert_strings])

Copy attributes from one attribute-like object to another.

deep_group_copy(g1, g2[, selections, ...])

Copy full data tree from one group to another.

get_file(f[, file_format])

Check whether the input is a zarr/h5py.File object or a filename, and return the file object.

is_group(obj)

Check if the object is a Group, which includes File objects.
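The recursive idea behind a group check combined with a full tree copy can be sketched with plain Python containers. In this toy model nested dicts stand in for groups and lists stand in for datasets; is_group_like and copy_tree are hypothetical helpers, and the sketch ignores attributes and the selections argument listed for deep_group_copy.

```python
from collections.abc import MutableMapping


def is_group_like(obj):
    """Toy check: treat any mutable mapping as a group."""
    return isinstance(obj, MutableMapping)


def copy_tree(src, dst):
    """Recursively copy every subgroup and dataset from src into dst."""
    for name, item in src.items():
        if is_group_like(item):
            # Descend into the subgroup, creating it in dst if needed.
            copy_tree(item, dst.setdefault(name, {}))
        else:
            # Copy the dataset values so src and dst do not share storage.
            dst[name] = list(item)
    return dst


source = {"vis": {"amplitude": [1.0, 2.0]}, "flags": [0, 1]}
target = copy_tree(source, {})

# Modifying the source afterwards leaves the copy untouched.
source["vis"]["amplitude"].append(3.0)
print(target)
```

Copying the leaf values rather than sharing references is what makes the result a deep copy in this sketch.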