HDF5 Serialization Architecture

Structure

Each hierarchical object lives under its own group in the HDF5 file, i.e. objects that are attributes of another object must have their own sub-group in that larger object's group. In its group, each object must store:

  • ‘TYPE’, equal to str(type(self)): this provides the module path and class name from which pyiron will load the class

  • ‘NAME’, equal to type(self).__name__: the unqualified class name, informational only

They may also store:

  • ‘HDF_VERSION’, equal to a version string with format MAJOR.MINOR.PATCH: the version of the structure of the type in HDF5; every class must be able to read HDF5 data written by at least the same MAJOR release, and explicit breaking changes should be very rare

  • ‘VERSION’, equal to a version string with format MAJOR.MINOR.PATCH: the version of the functionality of the class; a higher version must not change the HDF5 structure unless it also changes HDF_VERSION
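
The MAJOR rule for ‘HDF_VERSION’ can be sketched as a small compatibility check. This is only an illustration: the helper names parse_version and can_read are hypothetical and not part of pyiron, and the rule is read here as "data written with the same MAJOR release is always readable".

```python
def parse_version(version):
    """Split a MAJOR.MINOR.PATCH string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def can_read(stored_hdf_version, current_hdf_version):
    """Assumed rule: the stored layout is readable if the MAJOR parts match."""
    stored_major = parse_version(stored_hdf_version)[0]
    current_major = parse_version(current_hdf_version)[0]
    return stored_major == current_major


same_major = can_read("1.0.3", "1.4.0")   # MINOR and PATCH may differ
other_major = can_read("0.9.0", "1.4.0")  # a MAJOR bump signals a layout break
```

A class that bumps only ‘VERSION’ never needs such a check, since by the rule above its HDF5 structure is unchanged.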

As an example of the group layout, a class defined like this:

class Foo:
    def __init__(self, parameter):
        self.bar = Bar()
        self.baz = Baz()
        self.parameter = parameter

should be serialized as:

foo/
foo/TYPE
foo/NAME
foo/VERSION
foo/HDF_VERSION
foo/parameter
foo/bar/
foo/bar/TYPE
foo/bar/NAME
foo/bar/VERSION
foo/bar/HDF_VERSION
foo/baz/
foo/baz/TYPE
foo/baz/NAME
foo/baz/VERSION
foo/baz/HDF_VERSION
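
One way to check that a stored tree follows this layout is to walk it and report every group that lacks the mandatory entries. In the sketch below a plain nested dict stands in for the HDF5 groups, and missing_metadata is a hypothetical helper, not a pyiron function.

```python
REQUIRED_KEYS = ("TYPE", "NAME")


def missing_metadata(group, path):
    """Return the paths of mandatory entries missing from group or its sub-groups."""
    missing = [f"{path}/{key}" for key in REQUIRED_KEYS if key not in group]
    for name, value in group.items():
        if isinstance(value, dict):  # a nested dict models a sub-group
            missing.extend(missing_metadata(value, f"{path}/{name}"))
    return missing


stored = {
    "TYPE": "<class '__main__.Foo'>",
    "NAME": "Foo",
    "parameter": 42,
    "bar": {"TYPE": "<class '__main__.Bar'>", "NAME": "Bar"},
    "baz": {"NAME": "Baz"},  # TYPE deliberately left out
}

problems = missing_metadata(stored, "foo")
print(problems)  # ['foo/baz/TYPE']
```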

Writing to HDF5

Each type must define a to_hdf(self, hdf, group_name=None) method that takes the given hdf object, creates a subgroup called group_name in it (if given), and then serializes itself to this group. Some objects (e.g. jobs) may keep a default ProjectHDFio object during their lifetime; in this case hdf may be an optional parameter.
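
A minimal sketch of this contract follows. It assumes a plain dict in place of the real hdf object (ProjectHDFio is not used here), and the HDFMixin class with its _open_group helper is hypothetical, introduced only to avoid repeating the ‘TYPE’ and ‘NAME’ bookkeeping.

```python
class HDFMixin:
    """Hypothetical helper that writes the mandatory TYPE and NAME entries."""

    def _open_group(self, hdf, group_name=None):
        # Create the subgroup only if a name is given, otherwise write in place.
        group = hdf if group_name is None else hdf.setdefault(group_name, {})
        group["TYPE"] = str(type(self))
        group["NAME"] = type(self).__name__
        return group


class Bar(HDFMixin):
    def to_hdf(self, hdf, group_name=None):
        self._open_group(hdf, group_name)


class Foo(HDFMixin):
    def __init__(self, parameter):
        self.bar = Bar()
        self.parameter = parameter

    def to_hdf(self, hdf, group_name=None):
        group = self._open_group(hdf, group_name)
        group["parameter"] = self.parameter
        # Attribute objects serialize themselves into their own sub-group.
        self.bar.to_hdf(group, "bar")


store = {}  # stand-in for the HDF5 file
Foo(42).to_hdf(store, "foo")
```

After the call, store holds foo/TYPE, foo/NAME, foo/parameter and a nested foo/bar group with its own ‘TYPE’ and ‘NAME’.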

Reading from HDF5

Each type must define a from_hdf(self, hdf, group_name=None) method and may define a from_hdf_args(cls, hdf) method. from_hdf() restores the state of the already initialized object from the information stored in the HDF5 file. from_hdf_args() reads the parameters required to instantiate the object from HDF5 and returns them in a dict.
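
The division of labour between the two methods can be sketched as follows. A plain dict again stands in for the hdf object, the result attribute is made up for the example, and it is an assumption of this sketch that from_hdf_args() receives the object's own group.

```python
class Foo:
    def __init__(self, parameter):
        self.parameter = parameter
        self.result = None  # hypothetical state that is not an init parameter

    @classmethod
    def from_hdf_args(cls, hdf):
        """Read only what __init__ needs and return it as a dict."""
        return {"parameter": hdf["parameter"]}

    def from_hdf(self, hdf, group_name=None):
        """Restore the remaining state of an already initialized object."""
        group = hdf if group_name is None else hdf[group_name]
        self.parameter = group["parameter"]
        self.result = group.get("result")


store = {"foo": {"parameter": 42, "result": 3.14}}

# Step 1: build the object from the stored init parameters.
foo = Foo(**Foo.from_hdf_args(store["foo"]))
# Step 2: let the object restore the rest of its state.
foo.from_hdf(store, "foo")
```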

To read an object from a given ProjectHDFio path, call the to_object() method. This first calls import_class to read the class object, then make_from_hdf() to instantiate it; if the class defines from_hdf_args(), it is called to supply the correct init parameters. to_object() can also be given additional parameters to override the ones written to HDF5; in particular, it always provides job_name and project. However, only those parameters that are needed (i.e. declared by that class's __init__()) are passed.
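
Two steps of this procedure, resolving the class from its ‘TYPE’ string and passing only the parameters that __init__() declares, can be illustrated with the standard library alone. The helpers load_class and filter_init_args are hypothetical stand-ins and do not reproduce pyiron's import_class or make_from_hdf(); the sketch also assumes a top-level class and an __init__() without **kwargs.

```python
import importlib
import inspect


def load_class(type_string):
    """Resolve a TYPE string such as "<class 'fractions.Fraction'>" to a class."""
    qualified = type_string[len("<class '"):-len("'>")]
    module_name, _, class_name = qualified.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def filter_init_args(cls, available):
    """Keep only the entries of available that cls.__init__ declares."""
    declared = inspect.signature(cls.__init__).parameters
    return {name: value for name, value in available.items() if name in declared}


class Foo:
    def __init__(self, parameter, job_name):
        self.parameter = parameter
        self.job_name = job_name


fraction_cls = load_class("<class 'fractions.Fraction'>")

stored_args = {"parameter": 42}  # as returned by from_hdf_args()
overrides = {"parameter": 7, "job_name": "demo", "project": "my_project"}
# Later entries win, so caller-supplied values override the stored ones.
init_args = filter_init_args(Foo, {**stored_args, **overrides})
foo = Foo(**init_args)  # project is dropped because Foo.__init__ does not declare it
```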