pyiron_base.storage.hdfio module

Classes to map Python objects to HDF5 data structures

class pyiron_base.storage.hdfio.DummyHDFio(project, h5_path: str, cont: dict | None = None, root=None)

Bases: HasGroups

A dummy ProjectHDFio implementation to serialize objects into a dict instead of an HDF5 file.

It is modeled after ProjectHDFio, but supports just enough methods to successfully write objects.

After all desired objects have been written to it, you may extract a plain dict from it with to_dict().

A simple example for storing data containers:

>>> from pyiron_base import DataContainer, Project
>>> pr = Project(...)
>>> hdf = DummyHDFio(pr, '/', {})
>>> d = DataContainer({'a': 42, 'b':{'c':4, 'g':33}})
>>> d.to_hdf(hdf)
>>> hdf.to_dict()
{'READ_ONLY': False,
 'a__index_0': 42,
 'b__index_1': {
     'READ_ONLY': False,
     'c__index_0': 4,
     'g__index_1': 33,
     'NAME': 'DataContainer',
     'TYPE': "<class 'pyiron_base.storage.datacontainer.DataContainer'>",
     'OBJECT': 'DataContainer',
     'VERSION': '0.1.0',
     'HDF_VERSION': '0.2.0'
 },
 'NAME': 'DataContainer',
 'TYPE': "<class 'pyiron_base.storage.datacontainer.DataContainer'>",
 'OBJECT': 'DataContainer',
 'VERSION': '0.1.0',
 'HDF_VERSION': '0.2.0'}
close() → DummyHDFio

Surface from a sub group.

If this object was not returned from a previous call to open(), it returns itself silently.

create_group(name: str)

Create a new sub group.

Parameters:

name (str) – name of the new group

get(key, default=None)

Internal wrapper function for __getitem__() - self[name]

Parameters:
  • key (str, slice) – path to the data or key of the data object

  • default (object) – default value to return if key doesn’t exist

Returns:

data or data object

Return type:

dict, list, float, int

property h5_path
open(name: str) → DummyHDFio

Descend into a sub group.

If name does not exist yet, create a new group. Calling close() on the returned object returns this object.

Parameters:

name (str) – name of sub group

Returns:

sub group

Return type:

GenericStorage

property project
to_dict() → dict
to_object(class_name=None, **kwargs)

Load the full pyiron object from an HDF5 file

Parameters:
  • class_name (str, optional) – if the ‘TYPE’ node is not available in the HDF5 file a manual object type can be set, must be as reported by str(type(obj))

  • **kwargs – optional parameters to override init parameters

Returns:

pyiron object of the given class_name

class pyiron_base.storage.hdfio.FileHDFio(file_name, h5_path='/', mode='a')

Bases: HasGroups, Pointer

Class that provides all the information needed to access an HDF5 file. This class is based on h5io.py, which allows getting and putting a large variety of jobs to/from HDF5.

Implements HasGroups. Groups are HDF groups in the file, nodes are HDF datasets.

Parameters:
  • file_name (str) – absolute path of the HDF5 file

  • h5_path (str) – absolute path inside the h5 path - starting from the root group

  • mode (str) – {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’. See the HDFStore docstring or tables.open_file for info about modes

file_name
absolute path to the HDF5 file
h5_path
path inside the HDF5 file - also stored as absolute path
history
previously opened groups / folders
file_exists
boolean if the HDF5 file was already written
base_name
name of the HDF5 file but without any file extension
file_path
directory where the HDF5 file is located
is_root
boolean if the HDF5 object is located at the root level of the HDF5 file
is_open
boolean if the HDF5 file is currently opened - if an active file handler exists
is_empty
boolean if the HDF5 file is empty
property base_name

Name of the HDF5 file - but without the file extension .h5

Returns:

file name without the file extension

Return type:

str

close()

Close the current HDF5 path and return to the path before the last open

copy()

Copy the Python object which links to the HDF5 file - in contrast to copy_to() which copies the content of the HDF5 file to a new location.

Returns:

New FileHDFio object pointing to the same HDF5 file

Return type:

FileHDFio

create_group(name, track_order=False)

Create an HDF5 group - similar to a folder in the filesystem - HDF5 groups allow users to structure their data.

Parameters:
  • name (str) – name of the HDF5 group

  • track_order (bool) – if False the group tracks its elements in alphanumeric order, if True in insertion order

Returns:

FileHDFio object pointing to the new group

Return type:

FileHDFio

create_project_from_hdf5()

Internal function to create a pyiron project pointing to the directory where the HDF5 file is located.

Returns:

pyiron project object

Return type:

Project

property file_path

Path where the HDF5 file is located - posixpath.dirname()

Returns:

HDF5 file location

Return type:

str
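Both base_name and file_path reduce to standard posixpath operations on file_name. A minimal stdlib sketch of the equivalent logic (plain Python with a hypothetical path, not the pyiron API itself):

```python
import posixpath

# hypothetical absolute path of an HDF5 file
file_name = "/home/user/project/job.h5"

# file_path: the directory where the HDF5 file is located
file_path = posixpath.dirname(file_name)

# base_name: the file name without the .h5 extension
base_name = posixpath.splitext(posixpath.basename(file_name))[0]

print(file_path)  # /home/user/project
print(base_name)  # job
```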

get(key, default=None)

Internal wrapper function for __getitem__() - self[name]

Parameters:
  • key (str, slice) – path to the data or key of the data object

  • default (object) – default value to return if key doesn’t exist

Returns:

data or data object

Return type:

dict, list, float, int

get_from_table(path, name)

Get a specific value from a pandas.Dataframe

Parameters:
  • path (str) – relative path to the data object

  • name (str) – parameter key

Returns:

the value associated to the specific parameter key

Return type:

dict, list, float, int

get_pandas(name)

Load a dictionary from the HDF5 file and display it as a pandas DataFrame

Parameters:

name (str) – HDF5 node name

Returns:

The dictionary is returned as a pandas.DataFrame object

Return type:

pandas.DataFrame

get_size(hdf)

Get size of the groups inside the HDF5 file

Parameters:

hdf (FileHDFio) – hdf file

Returns:

file size in Bytes

Return type:

float

groups()

Filter HDF5 file by groups

Returns:

an HDF5 file which is filtered by groups

Return type:

FileHDFio

hd_copy(hdf_old, hdf_new, exclude_groups=None, exclude_nodes=None)

Copy the content of one HDF5 object to another, optionally excluding selected groups and nodes.

Parameters:
  • hdf_old (ProjectHDFio) – HDF5 object to copy from

  • hdf_new (ProjectHDFio) – HDF5 object to copy to

  • exclude_groups (list/None) – list of groups to exclude from the copy

  • exclude_nodes (list/None) – list of nodes to exclude from the copy
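The exclusion logic can be pictured on a plain nested dict, where sub-dicts play the role of HDF5 groups and other values the role of nodes (a conceptual sketch, not the pyiron implementation):

```python
def copy_excluding(src, exclude_groups=(), exclude_nodes=()):
    """Recursively copy a nested dict, skipping excluded sub-dicts
    ("groups") and excluded leaf entries ("nodes")."""
    out = {}
    for key, value in src.items():
        if isinstance(value, dict):
            if key not in exclude_groups:
                out[key] = copy_excluding(value, exclude_groups, exclude_nodes)
        elif key not in exclude_nodes:
            out[key] = value
    return out

tree = {"input": {"n": 1}, "output": {"energy": -3.2, "scratch": {"tmp": 0}}}
print(copy_excluding(tree, exclude_groups=("scratch",), exclude_nodes=("n",)))
# {'input': {}, 'output': {'energy': -3.2}}
```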

items()

List all keys and values as items of all groups and nodes of the HDF5 file

Returns:

list of sets (key, value)

Return type:

list

keys()

List all groups and nodes of the HDF5 file - where groups are equivalent to directories and nodes to files.

Returns:

all groups and nodes

Return type:

list

list_dirs()

equivalent to os.listdir (consider groups as equivalent to dirs)

Returns:

list of groups in pytables for the path self.h5_path

Return type:

list

listdirs()

equivalent to os.listdir (consider groups as equivalent to dirs)

Returns:

list of groups in pytables for the path self.h5_path

Return type:

list

nodes()

Filter HDF5 file by nodes

Returns:

an HDF5 file which is filtered by nodes

Return type:

FileHDFio

open(h5_rel_path)

Create an HDF5 group and enter this specific group. If the group already exists in the HDF5 file, only the h5_path is updated accordingly; otherwise the group is created first.

Parameters:

h5_rel_path (str) – relative path from the current HDF5 path - h5_path - to the new group

Returns:

FileHDFio object pointing to the new group

Return type:

FileHDFio
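The interplay of open() and close() amounts to a stack of previously visited paths (the history attribute described above). A toy stdlib emulation of that bookkeeping, not the real class:

```python
import posixpath

class PathCursor:
    """Toy model of FileHDFio path navigation: open() descends into a
    group relative to the current h5_path and remembers the previous
    location; close() returns to the path before the last open()."""

    def __init__(self, h5_path="/"):
        self.h5_path = h5_path
        self.history = []

    def open(self, h5_rel_path):
        self.history.append(self.h5_path)
        self.h5_path = posixpath.join(self.h5_path, h5_rel_path)
        return self

    def close(self):
        if self.history:
            self.h5_path = self.history.pop()

cur = PathCursor()
cur.open("job").open("input")
print(cur.h5_path)  # /job/input
cur.close()
print(cur.h5_path)  # /job
```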

put(key, value)

Store data inside the HDF5 file

Parameters:
  • key (str) – key to store the data

  • value (pandas.DataFrame, pandas.Series, dict, list, float, int) – basically any kind of data is supported

read_dict_from_hdf(group_paths=[], recursive=False)

Read data from HDF5 file into a dictionary - by default only the nodes are converted to dictionaries, additional sub groups can be specified using the group_paths parameter.

Parameters:
  • group_paths (list) – list of additional groups to be included in the dictionary, for example: [“input”, “output”, “output/generic”] These groups are defined relative to the h5_path.

  • recursive (bool) – Load all subgroups recursively

Returns:

The loaded data. Can be of any type supported by write_hdf5.

Return type:

dict
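One simplified reading of these selection rules, modeled on a nested dict (top-level group names only; the real method also accepts nested paths such as "output/generic"):

```python
def read_nodes(tree, group_paths=(), recursive=False):
    """Return top-level nodes (non-dict values); include a sub-group
    only when it is listed in group_paths or recursive=True."""
    out = {}
    for key, value in tree.items():
        if not isinstance(value, dict):
            out[key] = value
        elif recursive or key in group_paths:
            out[key] = read_nodes(value, recursive=True)
    return out

tree = {"status": "finished", "input": {"n": 10}, "output": {"energy": -1.5}}
print(read_nodes(tree))                          # {'status': 'finished'}
print(read_nodes(tree, group_paths=("input",)))  # 'input' included too
print(read_nodes(tree, recursive=True))          # whole tree
```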

remove_file()

Remove the HDF5 file with all the related content

remove_group()

Remove an HDF5 group if it exists. If the group does not exist, no error is raised.

rewrite_hdf5(job_name=None, info=False, exclude_groups=None, exclude_nodes=None)

Rewrite the entire HDF5 file.

Parameters:

info (bool) – whether to print the information on how much space has been saved

show_hdf()

Iterate over the HDF5 data structure and generate a human readable graph.

values()

List all values for all groups and nodes of the HDF5 file

Returns:

list of all values

Return type:

list

write_dict_to_hdf(data_dict)

Write a dictionary to HDF5

Parameters:

data_dict (dict) – dictionary with objects which should be written to HDF5

class pyiron_base.storage.hdfio.ProjectHDFio(project, file_name, h5_path=None, mode=None)

Bases: FileHDFio

The ProjectHDFio class connects the FileHDFio and the Project class. It is derived from FileHDFio, but in addition a Project object instance is available at self.project, enabling direct access to the database and other project related functionality, some of which is mapped to the ProjectHDFio class as well.

Parameters:
  • project (Project) – pyiron Project the current HDF5 project is located in

  • file_name (str) – name of the HDF5 file - in contrast to the FileHDFio object where file_name represents the absolute path of the HDF5 file.

  • h5_path (str) – absolute path inside the h5 path - starting from the root group

  • mode (str) – {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’. See the HDFStore docstring or tables.open_file for info about modes

.. attribute:: project

Project instance the ProjectHDFio object is located in

.. attribute:: root_path

the pyiron user directory, defined in the .pyiron configuration

.. attribute:: project_path

the relative path of the current project / folder starting from the root path of the pyiron user directory

.. attribute:: path

the absolute path of the current project / folder plus the absolute path in the HDF5 file as one path

.. attribute:: file_name

absolute path to the HDF5 file

.. attribute:: h5_path

path inside the HDF5 file - also stored as absolute path

.. attribute:: history

previously opened groups / folders

.. attribute:: file_exists

boolean if the HDF5 file was already written

.. attribute:: base_name

name of the HDF5 file but without any file extension

.. attribute:: file_path

directory where the HDF5 file is located

.. attribute:: is_root

boolean if the HDF5 object is located at the root level of the HDF5 file

.. attribute:: is_open

boolean if the HDF5 file is currently opened - if an active file handler exists

.. attribute:: is_empty

boolean if the HDF5 file is empty

.. attribute:: user

current unix/linux/windows user who is running pyiron

.. attribute:: sql_query

an SQL query to limit the jobs within the project to a subset which matches the SQL query.

.. attribute:: db

connection to the SQL database

.. attribute:: working_directory

working directory the job is executed in - outside the HDF5 file
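The relationship between root_path, project_path, file_name, and the combined path attribute can be pictured with plain string handling (all values below are hypothetical, for illustration only):

```python
import posixpath

root_path = "/home/user/pyiron/projects"   # from the .pyiron configuration
project_path = "demo/first_steps"          # relative to root_path
file_name = "job.h5"                       # HDF5 file inside the project
h5_path = "/job/input"                     # absolute path inside the file

# 'path' combines the filesystem location with the in-file HDF5 path
path = posixpath.join(root_path, project_path, file_name) + h5_path
print(path)  # /home/user/pyiron/projects/demo/first_steps/job.h5/job/input
```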

property base_name

The absolute path of the current pyiron project - absolute path on the file system, not including the HDF5 path.

Returns:

current project path

Return type:

str

copy()

Copy the ProjectHDFio object - copying just the Python object but maintaining the same pyiron path

Returns:

copy of the ProjectHDFio object

Return type:

ProjectHDFio

create_hdf(path, job_name)

Create a ProjectHDFio object to store project related information - for testing aggregated data

Parameters:
  • path (str) – absolute path

  • job_name (str) – name of the HDF5 container

Returns:

HDF5 object

Return type:

ProjectHDFio

create_project_from_hdf5()

Internal function to create a pyiron project pointing to the directory where the HDF5 file is located.

Returns:

pyiron project object

Return type:

Project

create_working_directory()

Create the working directory on the file system if it does not exist already.

property db

Get connection to the SQL database

Returns:

database connection

Return type:

DatabaseAccess

get_job_id(job_specifier)

Get the job ID for the job matching job_specifier in the local project path from the database

Parameters:

job_specifier (str, int) – name of the job or job ID

Returns:

job ID of the job

Return type:

int

inspect(job_specifier)

Inspect an existing pyiron object - most commonly a job - from the database

Parameters:

job_specifier (str, int) – name of the job or job ID

Returns:

Access to the HDF5 object - not a GenericJob object - use load() instead.

Return type:

JobCore

load(job_specifier, convert_to_object=True)

Load an existing pyiron object - most commonly a job - from the database

Parameters:
  • job_specifier (str, int) – name of the job or job ID

  • convert_to_object (bool) – convert the object to a pyiron object or only access the HDF5 file - default=True. Accessing only the HDF5 file is about an order of magnitude faster, but only provides limited functionality. Compare the GenericJob object to the JobCore object.

Returns:

Either the full GenericJob object or just a reduced JobCore object

Return type:

GenericJob, JobCore

load_from_jobpath(job_id=None, db_entry=None, convert_to_object=True)

Internal function to load an existing job either based on the job ID or based on the database entry dictionary.

Parameters:
  • job_id (int) – Job ID - optional, but either the job_id or the db_entry is required.

  • db_entry (dict) – database entry dictionary - optional, but either the job_id or the db_entry is required.

  • convert_to_object (bool) – convert the object to a pyiron object or only access the HDF5 file - default=True. Accessing only the HDF5 file is about an order of magnitude faster, but only provides limited functionality. Compare the GenericJob object to the JobCore object.

Returns:

Either the full GenericJob object or just a reduced JobCore object

Return type:

GenericJob, JobCore

property name
property path

Absolute path of the HDF5 group starting from the system root - combination of the absolute system path plus the absolute path inside the HDF5 file starting from the root group.

Returns:

absolute path

Return type:

str

property project

Get the project instance the ProjectHDFio object is located in

Returns:

pyiron project

Return type:

Project

property project_path

the relative path of the current project / folder starting from the root path of the pyiron user directory

Returns:

relative path of the current project / folder

Return type:

str

remove_job(job_specifier, _unprotect=False)

Remove a single job from the project based on its job_specifier - see also remove_jobs()

Parameters:
  • job_specifier (str, int) – name of the job or job ID

  • _unprotect (bool) – [True/False] delete the job without validating the dependencies to other jobs - default=False

property root_path

the pyiron user directory, defined in the .pyiron configuration

Returns:

pyiron user directory of the current project

Return type:

str

property sql_query

Get the SQL query for the project

Returns:

SQL query

Return type:

str

to_object(class_name=None, **kwargs)

Load the full pyiron object from an HDF5 file

Parameters:
  • class_name (str, optional) – if the ‘TYPE’ node is not available in the HDF5 file a manual object type can be set, must be as reported by str(type(obj))

  • **kwargs – optional parameters to override init parameters

Returns:

pyiron object of the given class_name

property user

Get current unix/linux/windows user who is running pyiron

Returns:

username

Return type:

str

property working_directory

Get the working directory of the current ProjectHDFio object. The working directory equals the path, but represented on the filesystem:

/absolute/path/to/the/file.h5/path/inside/the/hdf5/file

becomes:

/absolute/path/to/the/file_hdf5/path/inside/the/hdf5/file

Returns:

absolute path to the working directory

Return type:

str
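The path rewrite described above can be sketched as a one-line string substitution (a conceptual sketch, not the actual implementation):

```python
def working_directory_from_path(path):
    """Replace the '.h5' file suffix inside a combined path by '_hdf5',
    so the HDF5-internal part maps onto a real directory tree."""
    head, _, tail = path.partition(".h5")
    return head + "_hdf5" + tail

print(working_directory_from_path(
    "/absolute/path/to/the/file.h5/path/inside/the/hdf5/file"
))
# /absolute/path/to/the/file_hdf5/path/inside/the/hdf5/file
```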