pyiron_base.interfaces.has_hdf module

Interface for classes to serialize to HDF5.

class pyiron_base.interfaces.has_hdf.HasHDF

Bases: ABC

Mixin class for objects that can write themselves to HDF.

Subclasses must implement _from_hdf(), _to_hdf() and _get_hdf_group_name(). They may implement from_hdf_args().

from_hdf() and to_hdf() shall respect the given group_name in the following way: if either the group_name argument or the return value of _get_hdf_group_name() is not None, they shall create a new subgroup of that name in the given HDF object, call _from_hdf() or _to_hdf() with this subgroup, and afterwards call ProjectHDFio.close() on it. If both are None, they shall pass the given HDF object unchanged.
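As a rough sketch (not the actual pyiron implementation), the contract above amounts to logic like this, where FakeGroup and its create_group()/close() methods are simplified stand-ins for the real ProjectHDFio API:

```python
class FakeGroup(dict):
    """Minimal stand-in for a ProjectHDFio group, for illustration only."""

    def create_group(self, name):
        # Return the existing subgroup or create a fresh one.
        return self.setdefault(name, FakeGroup())

    def close(self):
        pass  # the real ProjectHDFio.close() returns to the parent group


def to_hdf_contract(obj, hdf, group_name=None):
    # The explicit argument wins; otherwise fall back to the
    # object's _get_hdf_group_name().
    name = group_name if group_name is not None else obj._get_hdf_group_name()
    if name is not None:
        sub = hdf.create_group(name)
        obj._to_hdf(sub)
        sub.close()
    else:
        # Both are None: write directly into the given HDF object.
        obj._to_hdf(hdf)
```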

Subclasses that need to read special arguments from HDF before an instance can be created can override from_hdf_args() and return the arguments in a dict that can be **kwargs-passed to the __init__ of the subclass. When loading an object with ProjectHDFio.to_object() this method is called internally to create an instance, on which from_hdf() is then called.
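For example, a subclass whose constructor requires an argument could recover it like this (a sketch with a plain dict standing in for the real ProjectHDFio group; NamedThing is a hypothetical class):

```python
class NamedThing:
    """Sketch of a HasHDF subclass whose __init__ needs an argument."""

    def __init__(self, name):
        self.name = name

    @classmethod
    def from_hdf_args(cls, hdf):
        # Read constructor arguments back from HDF; the returned dict is
        # **kwargs-passed to __init__ by ProjectHDFio.to_object().
        return {"name": hdf["name"]}


# Simulate what to_object() does internally:
hdf = {"name": "my_thing"}  # stands in for the HDF group
obj = NamedThing(**NamedThing.from_hdf_args(hdf))
```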

Subclasses may specify an __hdf_version__ to signal changes in the layout of their data in HDF. from_hdf() will read this value and pass it verbatim to the subclass's _from_hdf(). No semantics are imposed on this value, but it is usually a three-part version number such as "0.1.0".
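A subclass can then dispatch on this version inside its own _from_hdf(). A sketch, again with a plain dict standing in for the HDF group and hypothetical node names:

```python
class Versioned:
    """Sketch of version-aware reading; layout and node names are made up."""

    __hdf_version__ = "0.2.0"

    def _from_hdf(self, hdf, version=None):
        # from_hdf() passes the stored __hdf_version__ verbatim; files
        # written before versioning was introduced may yield None.
        if version is None or version == "0.1.0":
            self.value = hdf["old_value_node"]  # legacy layout
        else:
            self.value = hdf["value"]           # current layout
```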

Here’s a toy class that enables writing `list`s to HDF.

>>> class HDFList(list, HasHDF):
...     def _from_hdf(self, hdf, version=None):
...         values = []
...         for n in hdf.list_nodes():
...             if not n.startswith("__index_"): continue
...             values.append((int(n.split("__index_")[1]), hdf[n]))
...         values = sorted(values, key=lambda e: e[0])
...         self.clear()
...         self.extend(list(zip(*values))[1])
...     def _to_hdf(self, hdf):
...         for i, v in enumerate(self):
...             hdf[f"__index_{i}"] = v
...     def _get_hdf_group_name(self):
...         return "list"

We can use this just like any other list, and also call the new HDF methods on it once we have an HDF object.

>>> l = HDFList([1,2,3,4])
>>> from pyiron_base import Project
>>> pr = Project('test_foo')
>>> hdf = pr.create_hdf(pr.path, 'list')

Since _get_hdf_group_name() returns “list”, our list gets written into a group of that name by default.

>>> l.to_hdf(hdf)
>>> hdf
{'groups': ['list'], 'nodes': []}
>>> hdf['list']
{'groups': [], 'nodes': ['HDF_VERSION', 'NAME', 'OBJECT', 'TYPE', '__index_0', '__index_1', '__index_2', '__index_3']}

(Since this is a docstring, actually calling ProjectHDFio.to_object() won’t work, so we’ll simulate it.)

>>> lcopy = HDFList()
>>> lcopy.from_hdf(hdf)
>>> lcopy
[1, 2, 3, 4]

We can also override the target group name by passing it explicitly.

>>> l.to_hdf(hdf, "my_group")
>>> hdf
{'groups': ['list', 'my_group'], 'nodes': []}

>>> hdf.remove_file()
>>> pr.remove(enable=True)

When using this class as a mixin alongside classes that have a legacy to_hdf()/from_hdf() implementation, here’s a simple recipe:

>>> class MyOldClass:
...     def to_hdf(self, hdf, group_name):
...         ... # whatever you need to save
...     def from_hdf(self, hdf, group_name):
...         ... # whatever you need to restore
>>> class MyDerivedClass(MyOldClass, HasHDF):
...     def to_hdf(self, hdf, group_name):
...         MyOldClass.to_hdf(self, hdf=hdf, group_name=group_name)
...         HasHDF.to_hdf(self, hdf=hdf, group_name=group_name)
...     def from_hdf(self, hdf, group_name):
...         MyOldClass.from_hdf(self, hdf=hdf, group_name=group_name)
...         HasHDF.from_hdf(self, hdf=hdf, group_name=group_name)

i.e. explicitly call both methods with the same group_name. The call to HasHDF.to_hdf() has to be last so that the type information is consistently written to HDF.

If you’re deriving from GenericJob, it will already take care of descending into group_name, so you can pass “” as the group_name, like so:

>>> from pyiron_base import GenericJob
>>> class MyHybridJob(GenericJob, HasHDF):
...     def to_hdf(self, hdf, group_name):
...         GenericJob.to_hdf(self, hdf=hdf, group_name=group_name)
...         HasHDF.to_hdf(self, hdf=self.project_hdf5, group_name="")
...     def from_hdf(self, hdf, group_name):
...         GenericJob.from_hdf(self, hdf=hdf, group_name=group_name)
...         HasHDF.from_hdf(self, hdf=self.project_hdf5, group_name="")
from_hdf(hdf: ProjectHDFio, group_name: str = None)

Read object from HDF.

If group_name is given, descend into that subgroup of hdf first.

Parameters:
  • hdf (ProjectHDFio) – HDF group to read from

  • group_name (str, optional) – name of subgroup

classmethod from_hdf_args(hdf: ProjectHDFio) → dict

Read arguments for instance creation from HDF5 file.

Parameters:

hdf (ProjectHDFio) – HDF5 group object

Returns:

arguments that can be **kwarg-passed to cls().

Return type:

dict

rewrite_hdf(hdf: ProjectHDFio, group_name: str = None)

Update the HDF representation.

If an object is read from an older layout, this will remove the old data and rewrite it in the newest layout.

Parameters:
  • hdf (ProjectHDFio) – HDF group to read/write

  • group_name (str, optional) – name of subgroup

to_hdf(hdf: ProjectHDFio, group_name: str = None)

Write object to HDF.

If group_name is given, create a subgroup of that name in hdf first.

Parameters:
  • hdf (ProjectHDFio) – HDF group to write to

  • group_name (str, optional) – name of subgroup