caput.memh5

Module for making in-memory mock-ups of h5py objects.

It is sometimes useful to have a consistent API for data that is independent of whether that data lives on disk or in memory. h5py provides this to a certain extent, having h5py.Dataset objects that act very much like numpy arrays. memh5 extends this, providing in-memory containers analogous to h5py.Group, h5py.AttributeManager and h5py.Dataset objects.

In addition to these basic classes that copy the h5py API, a higher level data container is provided that uses these classes along with h5py to provide data that is transparently stored either in memory or on disk.

This also allows the creation and use of memh5 objects which can hold data distributed over a number of MPI processes. These MemDatasetDistributed datasets hold caput.mpiarray.MPIArray objects and can be written to, and loaded from disk like normal memh5 objects. Support for this must be explicitly enabled in the root group at creation with the distributed=True flag.

Warning

It has been observed that the parallel write of distributed datasets can lock up. This occurred on macOS when using the ompio backend of OpenMPI 3.0. Switching to romio as the MPI-IO backend helped here, but please report any further issues.

Basic Classes

High Level Container

Utility Functions

class caput.memh5.BasicCont(*args, **kwargs)[source]

Bases: caput.memh5.MemDiskGroup

Basic high level data container.

Inherits from MemDiskGroup.

Basic one-level data container that allows any number of datasets in the root group but no nesting. Data history tracking (in BasicCont.history) and array axis interpretation (in BasicCont.index_map) are also provided.

This container is intended to be an example of how a high level container with a strictly controlled data layout can be implemented by subclassing MemDiskGroup.

Parameters

Parameters are passed through to the base class constructor.

add_history(name, history=None)[source]

Create a new history entry.

Parameters
  • name (str) – Name for history entry.

  • history – History entry (optional). Needs to be json serializable.

Notes

Previously only dictionaries with depth=1 were supported here. The key/value pairs of these were added as attributes to the history group when written to disk. Reading the old history format is still supported; however, the history is now an attribute itself and dictionaries of any depth are allowed as history entries.

create_index_map(axis_name, index_map)[source]

Create a new index map.

create_reverse_map(axis_name, index_map)[source]

Create a new reverse map.

dataset_name_allowed(name)[source]

Datasets may only be created and accessed in the root level group.

Returns True if name is a path in the root group, i.e. ‘/dataset’.
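
The root-level rule described above can be illustrated with a small stand-alone check. This is only a sketch: `is_root_level_dataset` is a hypothetical helper written for this example, not caput's implementation, and it assumes names are absolute POSIX-style paths.

```python
import posixpath


def is_root_level_dataset(name):
    """Return True if `name` is an absolute path directly under the root group."""
    parent, leaf = posixpath.split(name)
    # The parent of a root-level dataset is exactly "/", and the leaf is non-empty.
    return parent == "/" and leaf != ""


print(is_root_level_dataset("/dataset"))        # True
print(is_root_level_dataset("/group/dataset"))  # False
```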

del_index_map(axis_name)[source]

Delete an index map.

del_reverse_map(axis_name)[source]

Delete a reverse map.

group_name_allowed(name)[source]

No groups are exposed to the user. Returns False.

property history

Stores the analysis history for this data.

Do not try to add a new entry by assigning to an element of this property. Use add_history() instead.

Returns

history – Each entry is a dictionary containing metadata about that stage in history. There is also an ‘order’ entry which specifies how the other entries are ordered in time.

Return type

read only dictionary
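
As a rough illustration of the structure described here (named entries plus an ‘order’ entry, with entries required to be JSON serializable), the following stand-alone sketch mimics the bookkeeping. `HistorySketch` is a hypothetical class written for this example; how caput stores history internally is not shown on this page.

```python
import copy
import json


class HistorySketch:
    """Toy stand-in for the history bookkeeping (hypothetical, not caput code)."""

    def __init__(self):
        # "order" records the sequence in which entries were added.
        self._history = {"order": []}

    def add_history(self, name, history=None):
        if name == "order" or name in self._history:
            raise ValueError(f"History entry name {name!r} is reserved or already used.")
        entry = {} if history is None else history
        json.dumps(entry)  # raises TypeError if the entry is not JSON serializable
        self._history[name] = copy.deepcopy(entry)
        self._history["order"].append(name)

    @property
    def history(self):
        # Hand out a copy so callers cannot modify the stored history by accident.
        return copy.deepcopy(self._history)


cont = HistorySketch()
cont.add_history("calibration", {"gain": {"source": "CasA", "version": 2}})
cont.add_history("flagging")
print(cont.history["order"])  # ['calibration', 'flagging']
```

Assigning into the returned dictionary only changes the copy, which mirrors the advice above to go through add_history() rather than the property.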

property index_map

Stores representations of the axes of datasets.

The index map contains arrays used to interpret the axes of the various datasets. For instance, the ‘time’, ‘prod’ and ‘freq’ axes of the visibilities are described in the index map.

Do not try to add a new index_map by assigning to an item of this property. Use create_index_map() instead.

Returns

index_map – Entries are 1D arrays used to interpret the axes of datasets.

Return type

read only dictionary
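
The idea of interpreting dataset axes through an index map can be sketched with plain Python containers. Everything here is illustrative: the axis names, the values, and the `describe` helper are assumptions made for the example, and real index maps hold 1D arrays rather than lists.

```python
# One entry per axis label; each entry describes the positions along that axis.
index_map = {"freq": [400.0, 600.0, 800.0], "time": [0.0, 10.0]}

# Stand-in for the axis labels attached to a dataset of shape (3, 2).
axis_names = ("freq", "time")
vis = [[1, 2], [3, 4], [5, 6]]


def describe(element_index):
    """Translate an integer index tuple into the physical labels of each axis."""
    return {name: index_map[name][i] for name, i in zip(axis_names, element_index)}


label = describe((2, 1))
print(vis[2][1], label)  # 6 {'freq': 800.0, 'time': 10.0}
```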

redistribute(dist_axis)[source]

Redistribute parallel datasets along a specified axis.

Parameters

dist_axis (int, string, or list of) – The axis can be specified by an integer index (positive or negative), or by a string label which must correspond to an entry in the axis attribute on the dataset. If a list is supplied, each entry is tried in turn, which allows different datasets to be redistributed along differently labelled axes.
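
The three accepted forms of dist_axis can be illustrated with a stand-alone resolver. This is a sketch under assumptions: `resolve_dist_axis` is a hypothetical helper, and `axis_names` stands in for the axis attribute of a single dataset. It is not how caput necessarily performs the lookup.

```python
def resolve_dist_axis(dist_axis, axis_names):
    """Return the integer axis index for one dataset, or None if nothing matches."""
    ndim = len(axis_names)
    # Treat a single value as a one-element list so all forms share one code path.
    candidates = dist_axis if isinstance(dist_axis, (list, tuple)) else [dist_axis]
    for cand in candidates:
        if isinstance(cand, int):
            if -ndim <= cand < ndim:
                return cand % ndim  # normalise negative indices
        elif cand in axis_names:
            return list(axis_names).index(cand)
    return None


axes = ("freq", "time")
print(resolve_dist_axis("freq", axes))          # 0
print(resolve_dist_axis(-1, axes))              # 1
print(resolve_dist_axis(["ra", "time"], axes))  # 1
```

Trying each list entry in turn is what lets one call cover datasets whose axes carry different labels.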

property reverse_map

Stores the reverse map from product index to stack index.

Do not try to add a new reverse_map by assigning to an item of this property. Use create_reverse_map() instead.

Returns

reverse_map – Each entry is a 1D array used to map from product index to stack index.

Return type

read only dictionary

class caput.memh5.MemAttrs[source]

Bases: dict

In memory implementation of the h5py.AttributeManager.

Currently just a normal dictionary.

class caput.memh5.MemDataset(**kwargs)[source]

Bases: caput.memh5._MemObjMixin

Base class for an in memory implementation of h5py.Dataset.

This is only an abstract base class. Use MemDatasetCommon or MemDatasetDistributed.

property attrs

Attributes attached to this object.

Returns

attrs

Return type

MemAttrs

property chunks

Chunk shape of the dataset.

Not implemented in base class.

property compression

Name or identifier of HDF5 compression filter for the dataset.

Not implemented in base class.

property compression_opts

Compression options for the dataset.

See HDF5 documentation for compression filters. Not implemented in base class.

property dtype

numpy data type of the dataset.

Not implemented in base class.

property shape

Shape of the dataset.

Not implemented in base class.

class caput.memh5.MemDatasetCommon(shape, dtype, chunks=None, compression=None, compression_opts=None, **kwargs)[source]

Bases: caput.memh5.MemDataset

In memory implementation of h5py.Dataset.

Inherits from MemDataset. Encapsulates a numpy array mocked up to look like an hdf5 dataset. Similar to h5py datasets, this implements slicing like a numpy array, but as it is not actually a numpy array many operations won’t work (e.g. ufuncs).

Parameters
  • shape (tuple) – Shape of array to initialise.

  • dtype (numpy dtype) – Type of array to create.

property chunks

Chunk shape of the dataset.

Not implemented in base class.

property comm

Reference to the MPI communicator.

property compression

Name or identifier of HDF5 compression filter for the dataset.

Not implemented in base class.

property compression_opts

Compression options for the dataset.

See HDF5 documentation for compression filters. Not implemented in base class.

property dtype

numpy data type of the dataset.

Not implemented in base class.

classmethod from_numpy_array(data, chunks=None, compression=None, compression_opts=None, **kwargs)[source]

Initialise from a numpy array.

Parameters
  • data (np.ndarray) – Array to initialise from.

  • compression (str or int) – Name or identifier of HDF5 or Zarr compression filter.

  • compression_opts – See HDF5 and Zarr documentation for compression filters. Compression options for the dataset.

Returns

dset – Dataset encapsulating the numpy array.

Return type

MemDatasetCommon

property shape

Shape of the dataset.

Not implemented in base class.

class caput.memh5.MemDatasetDistributed(shape, dtype, axis=0, comm=None, chunks=None, compression=None, compression_opts=None, **kwargs)[source]

Bases: caput.memh5.MemDataset

Parallel, in-memory implementation of h5py.Dataset.

Inherits from MemDataset. Encapsulates an MPIArray mocked up to look like an h5py dataset. Similar to h5py datasets, this implements slicing like a numpy array, but as it is not actually a numpy array many operations won’t work (e.g. ufuncs).

Parameters
  • shape (tuple) – Shape of array to initialise. This is the global shape.

  • dtype (numpy dtype) – Type of array to create.

  • axis (int, optional) – Index of axis to distribute the array over.

  • comm (MPI.Comm, optional) – MPI communicator to distribute over. If None use MPI.COMM_WORLD.

property chunks

The chunk shape of the dataset.

property comm

Reference to the MPI communicator.

property compression

Name or identifier of HDF5 compression filter for the dataset.

Not implemented in base class.

property compression_opts

Compression options for the dataset.

See HDF5 documentation for compression filters. Not implemented in base class.

property distributed_axis

The index of the axis over which this dataset is distributed.

property dtype

The numpy data type of the dataset.

property global_shape

Global shape of the distributed dataset.

The shape of the whole array that is distributed between multiple nodes.

property local_shape

Local shape of the distributed dataset.

The shape of the part of the distributed array that is allocated to this node.
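
The relationship between the global and local shapes can be sketched as follows. The splitting rule used here (an even split, with any remainder going to the lowest ranks) is an assumption made for illustration; this page does not state how MPIArray divides an axis, and `local_shape` is a hypothetical helper.

```python
def local_shape(global_shape, axis, size, rank):
    """Shape of the piece held by `rank` when `axis` is split over `size` processes."""
    base, rem = divmod(global_shape[axis], size)
    # Assumed rule: the first `rem` ranks each hold one extra element.
    local_n = base + (1 if rank < rem else 0)
    shape = list(global_shape)
    shape[axis] = local_n
    return tuple(shape)


shapes = [local_shape((10, 4), axis=0, size=3, rank=r) for r in range(3)]
print(shapes)  # [(4, 4), (3, 4), (3, 4)]
```

Whatever the exact rule, the local lengths along the distributed axis sum to the global length, and all other axes keep their global size.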

redistribute(axis)[source]

Change the axis that the dataset is distributed over.

Parameters

axis (integer) – Axis to distribute over.

property shape

Shape of the dataset.

Not implemented in base class.

class caput.memh5.MemDiskGroup(data_group=None, distributed=False, comm=None, file_format=None)[source]

Bases: caput.memh5._BaseGroup

Container whose data may either be stored on disk or in memory.

This container is intended to have the same basic API as h5py.Group and MemGroup, but its underlying data can live either on disk or in memory.

Aside from providing a few convenience methods, this class isn’t that useful by itself. It is almost as easy to use h5py.Group or MemGroup directly. Where it becomes more useful is for creating more specialized data containers which can subclass this class. A basic but useful example is provided in BasicCont.

This class also supports the same distributed features as MemGroup, but only when wrapping that class. Attempting to create a distributed object wrapping a h5py.File object will raise an exception. For similar reasons, MemDiskGroup.to_disk() will not work; MemDiskGroup.save(), however, will work fine.

Parameters
  • data_group (h5py.Group, MemGroup or string, optional) – Underlying h5py like data container where data will be stored. If a string, open a h5py file with that name. If not provided a new MemGroup instance will be created.

  • distributed (boolean, optional) – Allow the container to hold distributed datasets.

  • comm (MPI.Comm, optional) – MPI Communicator to distribute over. If not set, use MPI.COMM_WORLD.

  • detect_subclass (boolean, optional) – If data_group is specified, whether to inspect for a ‘__memh5_subclass’ attribute which specifies a subclass to return.

  • file_format (fileformats.FileFormat) – File format to use. File format will be guessed if not supplied. Default None.

close()[source]

Closes the file if the data is on disk and the file was opened on initialization.

create_dataset(name, *args, **kwargs)[source]

Create and return a new dataset.

All parameters are passed through to the create_dataset() method of the underlying storage, whether it be an h5py.Group or a MemGroup.

create_group(name)[source]

Create and return a new group.

dataset_common_to_distributed(name, distributed_axis=0)[source]

Convert a common dataset to a distributed one.

Parameters
  • name (string) – Dataset name.

  • distributed_axis (int, optional) – Axis to distribute the data over.

Returns

dset

Return type

memh5.MemDatasetDistributed

dataset_distributed_to_common(name)[source]

Convert a distributed dataset to a common one.

Parameters

name (string) – Dataset name.

Returns

dset

Return type

memh5.MemDatasetCommon

static dataset_name_allowed(name)[source]

Used by subclasses to restrict creation of and access to datasets.

This method is called by create_dataset(), require_dataset(), and __getitem__() to check that the supplied dataset name is allowed.

The idea is that subclasses that want to specialize and restrict the layout of the data container can implement this method instead of re-implementing the above mentioned methods.

Parameters

name (string) – Absolute path to proposed dataset.

Returns

allowed – True

Return type

bool
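
The hook pattern described above, where a subclass overrides one check instead of re-implementing every access method, can be sketched with two small stand-alone classes. `ContainerSketch` and `RestrictedSketch` are hypothetical names for this example and deliberately do not subclass MemDiskGroup; the allowed names are made up.

```python
class ContainerSketch:
    """Minimal container whose access methods all consult one overridable hook."""

    def __init__(self):
        self._datasets = {}

    @staticmethod
    def dataset_name_allowed(name):
        return True  # the base class places no restriction

    def create_dataset(self, name, data):
        if not self.dataset_name_allowed(name):
            raise ValueError(f"Dataset name {name!r} not allowed.")
        self._datasets[name] = data
        return data

    def __getitem__(self, name):
        if not self.dataset_name_allowed(name):
            raise KeyError(name)
        return self._datasets[name]


class RestrictedSketch(ContainerSketch):
    """Subclass that fixes the layout by overriding only the hook."""

    _allowed = {"/vis", "/weight"}

    @staticmethod
    def dataset_name_allowed(name):
        return name in RestrictedSketch._allowed


cont = RestrictedSketch()
cont.create_dataset("/vis", [1, 2, 3])
print(cont["/vis"])  # [1, 2, 3]
```

Calling `cont.create_dataset("/other", [])` raises ValueError, while the same call on `ContainerSketch` succeeds.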

flush()[source]

Flush the buffers of the underlying hdf5 file if on disk.

classmethod from_file(file_, ondisk=False, distributed=False, comm=None, detect_subclass=True, convert_attribute_strings=None, convert_dataset_strings=None, file_format=None, **kwargs)[source]

Create data object from analysis hdf5 file, store in memory or on disk.

If ondisk is True, do not load into memory but store data in h5py objects that remain associated with the file on disk. This is almost identical to the default constructor when providing a file as the data_group object, but provides more flexibility when opening the file through the additional keyword arguments.

This does not call __init__ on the subclass when restoring.

Parameters
  • file_ (string or h5py.Group object) – File with the hdf5 data. File must be compatible with memh5 objects.

  • ondisk (bool) – Whether the data should be stored in-place in file_ or should be copied into memory.

  • distributed (boolean, optional) – Allow the container to hold distributed datasets.

  • comm (MPI.Comm, optional) – MPI Communicator to distribute over. If not set, use MPI.COMM_WORLD.

  • detect_subclass (boolean, optional) – If data_group is specified, whether to inspect for a ‘__memh5_subclass’ attribute which specifies a subclass to return.

  • convert_attribute_strings (bool, optional) – Try and convert attribute string types to unicode. If not specified, look up the name as a class attribute to find a default, and otherwise use True.

  • convert_dataset_strings (bool, optional) – Try and convert dataset string types to unicode. If not specified, look up the name as a class attribute to find a default, and otherwise use False.

  • <axis_name>_sel (list or slice) – Axis selections can be given to only read a subset of the containers. A slice can be given, or a list of specific array indices for that axis.

  • file_format (fileformats.FileFormat) – File format to use. Default is None, i.e. guess from file name.

  • **kwargs (any other arguments) – Any additional keyword arguments are passed to h5py.File’s constructor if file_ is a filename and silently ignored otherwise.

classmethod from_group(data_group=None, detect_subclass=True)[source]

Create data object from a given group.

This wraps the given group object, optionally returning the correct subclass. This does not call __init__ on the subclass when this happens.

Parameters
  • data_group (h5py.Group, MemGroup or string, optional) – h5py like data container to wrap.

  • detect_subclass (boolean, optional) – If data_group is specified, whether to inspect for a ‘__memh5_subclass’ attribute which specifies a subclass to return.

Returns

grp

Return type

MemDiskGroup

static group_name_allowed(name)[source]

Used by subclasses to restrict creation of and access to groups.

This method is called by create_group(), require_group(), and __getitem__() to check that the supplied group name is allowed.

The idea is that subclasses that want to specialize and restrict the layout of the data container can implement this method instead of re-implementing the above mentioned methods.

Parameters

name (string) – Absolute path to proposed group.

Returns

allowed – True

Return type

bool

property ondisk

Whether the data is stored on disk as opposed to in memory.

save(filename, convert_attribute_strings=None, convert_dataset_strings=None, file_format=<class 'caput.fileformats.HDF5'>, **kwargs)[source]

Save data to hdf5/zarr file.

Parameters
  • filename (str) – Name of the file to save into.

  • convert_attribute_strings (bool, optional) – Try and convert attribute string types to a format HDF5 understands. If not specified, look up the name as a class attribute to find a default, and otherwise use True.

  • convert_dataset_strings (bool, optional) – Try and convert dataset string types to bytestrings before saving to HDF5. If not specified, look up the name as a class attribute to find a default, and otherwise use False.

  • file_format (fileformats.FileFormat) – File format to use. Default fileformats.HDF5.

  • **kwargs – Keyword arguments passed through to the file creation, e.g. mode.

to_disk(filename, file_format=<class 'caput.fileformats.HDF5'>, **kwargs)[source]

Return a version of this data that lives on disk.

Parameters
  • filename (str) – File name.

  • file_format (fileformats.FileFormat) – File format to use. Default fileformats.HDF5.

  • **kwargs – Keyword arguments passed through to the file creation, e.g. mode.

Return type

Instance of this data object that is written to disk.

to_memory()[source]

Return a version of this data that lives in memory.

class caput.memh5.MemGroup(distributed=False, comm=None)[source]

Bases: caput.memh5._BaseGroup

In memory implementation of the h5py.Group.

This class doubles as the memory implementation of the h5py.File object, since the distinction between a file and a group for in-memory data is moot.

Parameters
  • distributed (boolean, optional) – Allow memh5 object to hold distributed datasets.

  • comm (MPI.Comm, optional) – MPI Communicator to distribute over. If not set, use MPI.COMM_WORLD.

create_dataset(name, shape=None, dtype=None, data=None, distributed=False, distributed_axis=None, chunks=None, compression=None, compression_opts=None, **kwargs)[source]

Create a new dataset.

Parameters
  • name (string) – Dataset name.

  • shape (tuple, optional) – Shape tuple. This gives the global shape for a distributed dataset.

  • dtype (np.dtype, optional) – Numpy datatype of the dataset.

  • data (np.ndarray or MPIArray, optional) – Data array to initialise from. Uses a view of the original where possible.

  • distributed (boolean, optional) – Create a distributed dataset or not.

  • distributed_axis (int, optional) – Axis to distribute the data over. If specified with initialisation data this will create a copy with the correct distribution.

  • compression (str or int) – Name or identifier of HDF5 or Zarr compression filter.

  • compression_opts – See HDF5 and Zarr documentation for compression filters. Compression options for the dataset.

Returns

dset

Return type

memh5.MemDataset

create_group(name)[source]

Create a group within the storage tree.

dataset_common_to_distributed(name, distributed_axis=0)[source]

Convert a common dataset to a distributed one.

Parameters
  • name (string) – Dataset name.

  • distributed_axis (int, optional) – Axis to distribute the data over.

Returns

dset

Return type

memh5.MemDatasetDistributed

dataset_distributed_to_common(name)[source]

Convert a distributed dataset to a common one.

Parameters

name (string) – Dataset name.

Returns

dset

Return type

memh5.MemDatasetCommon

classmethod from_file(filename, distributed=False, hints=True, comm=None, selections=None, convert_dataset_strings=False, convert_attribute_strings=True, file_format=None, **kwargs)[source]

Create a new instance by copying from a file group.

Any keyword arguments are passed on to the constructor for h5py.File or zarr.File.

Parameters
  • filename (string) – Name of file to load.

  • distributed (boolean, optional) – Whether to load file in distributed mode.

  • hints (boolean, optional) – If in distributed mode use hints to determine whether datasets are distributed or not.

  • comm (MPI.Comm, optional) – MPI communicator to distribute over. If None use MPI.COMM_WORLD.

  • selections (dict) – If this is not None, it should map dataset names to axis selections as valid numpy indexes.

  • convert_attribute_strings (bool, optional) – Try and convert attribute string types to unicode. Default is True.

  • convert_dataset_strings (bool, optional) – Try and convert dataset string types to unicode. Default is False.

  • file_format (fileformats.FileFormat, optional) – File format to use. Default is None, i.e. guess from the name.

Returns

group – Root group of loaded file.

Return type

memh5.Group

classmethod from_group(group)[source]

Create a new instance by deep copying an existing group.

Agnostic as to whether the group to be copied is a MemGroup or an h5py.Group (which includes h5py.File and zarr.File objects).

classmethod from_hdf5(filename, distributed=False, hints=True, comm=None, selections=None, convert_dataset_strings=False, convert_attribute_strings=True, **kwargs)[source]

Create a new instance by copying from an hdf5 group.

Any keyword arguments are passed on to the constructor for h5py.File.

Parameters
  • filename (string) – Name of file to load.

  • distributed (boolean, optional) – Whether to load file in distributed mode.

  • hints (boolean, optional) – If in distributed mode use hints to determine whether datasets are distributed or not.

  • comm (MPI.Comm, optional) – MPI communicator to distribute over. If None use MPI.COMM_WORLD.

  • selections (dict) – If this is not None, it should map dataset names to axis selections as valid numpy indexes.

  • convert_attribute_strings (bool, optional) – Try and convert attribute string types to unicode. Default is True.

  • convert_dataset_strings (bool, optional) – Try and convert dataset string types to unicode. Default is False.

Returns

group – Root group of loaded file.

Return type

memh5.Group

property mode

String indicating if group is readonly (“r”) or read-write (“r+”).

MemGroup is always read-write.

to_file(filename, mode='w', hints=True, convert_attribute_strings=True, convert_dataset_strings=False, file_format=None, **kwargs)[source]

Replicate object on disk in an hdf5 or zarr file.

Any keyword arguments are passed on to the constructor for h5py.File or zarr.File.

Parameters
  • filename (str) – File to save into.

  • hints (boolean, optional) – Whether to write hints into the file that describe whether datasets are distributed or not.

  • convert_attribute_strings (bool, optional) – Try and convert attribute string types to a unicode type that HDF5 understands. Default is True.

  • convert_dataset_strings (bool, optional) – Try and convert dataset string types to bytestrings. Default is False.

  • file_format (fileformats.FileFormat, optional) – File format to use. Default is None, i.e. guess from the name.

to_hdf5(filename, mode='w', hints=True, convert_attribute_strings=True, convert_dataset_strings=False, **kwargs)[source]

Replicate object on disk in an hdf5 file.

Any keyword arguments are passed on to the constructor for h5py.File.

Parameters
  • filename (str) – File to save into.

  • hints (boolean, optional) – Whether to write hints into the file that describe whether datasets are distributed or not.

  • convert_attribute_strings (bool, optional) – Try and convert attribute string types to a unicode type that HDF5 understands. Default is True.

  • convert_dataset_strings (bool, optional) – Try and convert dataset string types to bytestrings. Default is False.

caput.memh5.attrs2dict(attrs)[source]

Safely copy an h5py attributes object to a dictionary.

caput.memh5.bytes_to_unicode(s)[source]

Ensure that a string (or collection of strings) is unicode.

Any byte strings found will be transformed into unicode. Standard collections are processed recursively. Numpy arrays of byte strings are converted. Any other types are returned as is.

Note that as HDF5 files will often contain ASCII strings which h5py converts to byte strings this will be needed even when fully transitioned to Python 3.

Parameters

s (object) – Object to convert.

Returns

u – Converted object.

Return type

object
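
The recursive behaviour described above can be sketched for the standard collections only. `to_unicode` is a hypothetical stand-in written for this example: it assumes UTF-8 encoded byte strings and omits the numpy array handling that the real function is documented to provide.

```python
def to_unicode(obj):
    """Recursively decode byte strings inside standard Python collections."""
    if isinstance(obj, bytes):
        return obj.decode("utf-8")  # assumption: bytes are UTF-8 (ASCII is a subset)
    if isinstance(obj, (list, tuple, set)):
        # Rebuild the same collection type from the converted items.
        return type(obj)(to_unicode(item) for item in obj)
    if isinstance(obj, dict):
        return {to_unicode(k): to_unicode(v) for k, v in obj.items()}
    return obj  # any other type is returned as is


converted = to_unicode([b"a", ("b", b"c"), {b"k": b"v"}, 1])
print(converted)  # ['a', ('b', 'c'), {'k': 'v'}, 1]
```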

caput.memh5.check_unicode(dset)[source]

Test if dataset contains unicode so we can raise an appropriate error.

If there is no unicode, return the data from the array.

Parameters

dset (MemDataset) – Dataset to check.

Returns

The converted array. If no conversion was required, just returns arr.

Return type

dset

caput.memh5.copyattrs(a1, a2, convert_strings=False)[source]

Copy attributes from one h5py/zarr/memh5 attribute object to another.

Parameters
  • a1 (h5py/zarr/memh5 object) – Attributes to copy from.

  • a2 (h5py/zarr/memh5 object) – Attributes to copy into.

  • convert_strings (bool, optional) – Convert string attributes (or lists/arrays of them) to ensure that they are unicode.

caput.memh5.deep_group_copy(g1, g2, selections=None, convert_dataset_strings=False, convert_attribute_strings=True, file_format=<class 'caput.fileformats.HDF5'>, skip_distributed=False, postprocess=None)[source]

Copy full data tree from one group to another.

Copies from g1 to g2. An axis downselection can be specified by supplying the parameter ‘selections’. For example to select the first two indexes in g1[“foo”][“bar”], do

>>> g1 = MemGroup()
>>> foo = g1.create_group("foo")
>>> ds = foo.create_dataset(name="bar", data=np.arange(3))
>>> g2 = MemGroup()
>>> deep_group_copy(g1, g2, selections={"foo/bar": slice(2)})
>>> list(g2["foo"]["bar"])
[0, 1]

Parameters
  • g1 (h5py.Group or zarr.Group) – Deep copy from this group.

  • g2 (h5py.Group or zarr.Group) – Deep copy to this group.

  • selections (dict) – If this is not None, it should have a subset of the same hierarchical structure as g1, but ultimately describe axis selections for group entries as valid numpy indexes.

  • convert_attribute_strings (bool, optional) – Convert string attributes (or lists/arrays of them) to ensure that they are unicode.

  • convert_dataset_strings (bool, optional) – Convert strings within datasets to ensure that they are unicode.

  • file_format (fileformats.FileFormat) – File format to use. Default fileformats.HDF5.

  • skip_distributed (bool, optional) – If True skip the write for any distributed dataset, and return a list of the names of all datasets that were skipped. If False (default) throw a ValueError if any distributed datasets are encountered.

  • postprocess (function, optional) – A function that is called on each node, with the source and destination entries, and can modify either.

Returns

distributed_dataset_names – Names of the distributed datasets if skip_distributed is True. Otherwise None is returned.

Return type

list
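
The mechanics of a tree copy with per-dataset axis selections can be sketched using nested dictionaries in place of groups and lists in place of datasets. `copy_tree` is a hypothetical helper for illustration; keying selections by a "group/dataset" path follows the example above, and none of this reflects the internals of deep_group_copy.

```python
def copy_tree(src, dst, selections=None, prefix=""):
    """Copy nested dicts (groups) and lists (datasets), applying any selections."""
    selections = selections or {}
    for key, value in src.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # A nested dict plays the role of a sub-group: recurse into it.
            copy_tree(value, dst.setdefault(key, {}), selections, path)
        else:
            # Datasets without a selection are copied whole.
            sel = selections.get(path, slice(None))
            dst[key] = list(value[sel])
    return dst


src = {"foo": {"bar": [0, 1, 2], "baz": [5, 6]}}
out = copy_tree(src, {}, selections={"foo/bar": slice(2)})
print(out)  # {'foo': {'bar': [0, 1], 'baz': [5, 6]}}
```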

caput.memh5.dtype_to_bytestring(dt)[source]

Convert unicode strings in a dtype to byte strings.

This will attempt to parse a numpy dtype and convert strings to bytes.

Warning

Custom alignment will not be preserved in these type conversions as the byte and unicode string types are of different sizes.

Parameters

dt (np.dtype) – Data type to convert.

Returns

new_dt – A new datatype with the converted string type.

Return type

np.dtype

caput.memh5.dtype_to_unicode(dt)[source]

Convert byte strings in a dtype to unicode.

This will attempt to parse a numpy dtype and convert strings to unicode.

Warning

Custom alignment will not be preserved in these type conversions as the byte and unicode string types are of different sizes.

Parameters

dt (np.dtype) – Data type to convert.

Returns

new_dt – A new datatype with the converted string type.

Return type

np.dtype

caput.memh5.ensure_bytestring(arr)[source]

If needed convert the array to contain bytestrings not unicode.

Parameters

arr (np.ndarray) – Input array.

Returns

arr_conv – The converted array. If no conversion was required, just returns arr.

Return type

np.ndarray

caput.memh5.ensure_unicode(arr)[source]

If needed convert the array to contain unicode strings not bytestrings.

Parameters

arr (np.ndarray) – Input array.

Returns

arr_conv – The converted array. If no conversion was required, just returns arr.

Return type

np.ndarray

caput.memh5.format_abs_path(path)[source]

Return absolute path string, formatted without any extra ‘/’s.
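
A minimal sketch of this kind of normalisation is shown below. `normalize_abs_path` is a hypothetical function for illustration, and raising ValueError for a relative path is an assumption rather than documented behaviour.

```python
def normalize_abs_path(path):
    """Collapse repeated and trailing slashes in an absolute path."""
    if not path.startswith("/"):
        raise ValueError("An absolute path must be provided.")
    # Empty parts come from leading, trailing, or doubled slashes; drop them.
    parts = [part for part in path.split("/") if part]
    return "/" + "/".join(parts)


print(normalize_abs_path("//foo///bar/"))  # /foo/bar
print(normalize_abs_path("/"))             # /
```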

caput.memh5.get_file(f, file_format=None, **kwargs)[source]

Checks if input is a zarr/h5py.File or filename and returns the former.

Parameters
  • f (h5py/zarr Group or filename string) –

  • file_format (fileformats.FileFormat) – File format to use. File format will be guessed if not supplied. Default None.

  • **kwargs (all keyword arguments) – Passed to h5py.File constructor or zarr.open_group. If f is already an open file, silently ignores all keywords.

Returns

  • f (hdf5 or zarr group)

  • opened (bool) – Whether a file was opened by this call (False if it was already open).

caput.memh5.get_h5py_File(f, **kwargs)[source]

Convenience function in order to not break old functionality.

caput.memh5.has_bytestring(dt)[source]

Test if data type contains any bytestring fields.

See has_kind.

caput.memh5.has_kind(dt, kind)[source]

Test if a numpy datatype has any fields of a specified type.

Parameters
  • dt (np.dtype) – Data type to test.

  • kind (str) – Numpy type code character. e.g. “S” for bytestring and “U” for unicode.

Returns

has_kind – True if it contains the requested kind.

Return type

bool

caput.memh5.has_unicode(dt)[source]

Test if data type contains any unicode fields.

See has_kind.

caput.memh5.is_group(obj)[source]

Check if the object is a Group, which includes File objects.

In most cases, if it isn’t a Group it’s a Dataset, so this can be used to check for Datasets as well.

class caput.memh5.ro_dict(d=None)[source]

Bases: collections.abc.Mapping

A dict that is read-only to the user.

This class isn’t strictly read-only but it cannot be modified through the traditional dict interface. This prevents the user from mistaking this for a normal dictionary.

Provides the same interface for reading as the builtin python dict but no methods for writing.

Parameters

d (dict) – Initial data for the new dictionary.
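
The approach described here, a Mapping subclass that exposes only the read interface, can be sketched as follows. `ReadOnlyDict` is a hypothetical class written for this example and is not a copy of ro_dict.

```python
from collections.abc import Mapping


class ReadOnlyDict(Mapping):
    """Read interface of a dict, with no methods for writing."""

    def __init__(self, d=None):
        # The underlying dict is still reachable as an attribute, so this is
        # not strictly read-only; it only removes the usual writing interface.
        self._data = {} if d is None else d

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


ro = ReadOnlyDict({"a": 1})
print(ro["a"], list(ro), len(ro))  # 1 ['a'] 1
try:
    ro["b"] = 2
except TypeError as err:
    print("assignment rejected:", type(err).__name__)  # assignment rejected: TypeError
```

Mapping supplies `keys()`, `items()`, `get()` and `in` automatically from the three methods defined above.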