pyiron_base.storage.hdfio module
Classes to map Python objects to HDF5 data structures
- class pyiron_base.storage.hdfio.DummyHDFio(project, h5_path: str, cont: dict | None = None, root=None)
Bases:
HasGroups
A dummy ProjectHDFio implementation to serialize objects into a dict instead of an HDF5 file.
It is modeled after ProjectHDFio, but supports just enough methods to successfully write objects.
After all desired objects have been written to it, you can extract a plain dict with to_dict().
A simple example for storing data containers:
>>> from pyiron_base import DataContainer, Project
>>> pr = Project(...)
>>> hdf = DummyHDFio(pr, '/', {})
>>> d = DataContainer({'a': 42, 'b': {'c': 4, 'g': 33}})
>>> d.to_hdf(hdf)
>>> hdf.to_dict()
{'READ_ONLY': False,
 'a__index_0': 42,
 'b__index_1': {
     'READ_ONLY': False,
     'c__index_0': 4,
     'g__index_1': 33,
     'NAME': 'DataContainer',
     'TYPE': "<class 'pyiron_base.storage.datacontainer.DataContainer'>",
     'OBJECT': 'DataContainer',
     'VERSION': '0.1.0',
     'HDF_VERSION': '0.2.0'},
 'NAME': 'DataContainer',
 'TYPE': "<class 'pyiron_base.storage.datacontainer.DataContainer'>",
 'OBJECT': 'DataContainer',
 'VERSION': '0.1.0',
 'HDF_VERSION': '0.2.0'}
- close() DummyHDFio
Surface from a sub group.
If this object was not returned from a previous call to open(), it returns itself silently.
- create_group(name: str)
Create a new sub group.
- Parameters:
name (str) – name of the new group
- get(key, default=None)
Internal wrapper function for __getitem__() - self[name]
- Parameters:
key (str, slice) – path to the data or key of the data object
default (object) – default value to return if key doesn’t exist
- Returns:
data or data object
- Return type:
dict, list, float, int
- property h5_path
- open(name: str) DummyHDFio
Descend into a sub group.
If name does not exist yet, a new group is created. Calling close() on the returned object returns this object.
- Parameters:
name (str) – name of the sub group
- Returns:
sub group
- Return type:
GenericStorage
- property project
- to_dict() dict
- to_object(class_name=None, **kwargs)
Load the full pyiron object from an HDF5 file
- Parameters:
class_name (str, optional) – if the ‘TYPE’ node is not available in the HDF5 file a manual object type can be set, must be as reported by str(type(obj))
**kwargs – optional parameters to override init parameters
- Returns:
pyiron object of the given class_name
- class pyiron_base.storage.hdfio.FileHDFio(file_name, h5_path='/', mode='a')
Bases:
HasGroups, Pointer
Class that provides all info to access an HDF5 file. This class is based on h5io.py, which allows getting and putting a large variety of objects to/from HDF5.
Implements HasGroups. Groups are HDF groups in the file, nodes are HDF datasets.
- Parameters:
file_name (str) – absolute path of the HDF5 file
h5_path (str) – absolute path inside the h5 path - starting from the root group
mode (str) – mode : {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’ See HDFStore docstring or tables.open_file for info about modes
- file_name
- absolute path to the HDF5 file
- h5_path
- path inside the HDF5 file - also stored as absolute path
- history
- previously opened groups / folders
- file_exists
boolean if the HDF5 file was already written
- base_name
- name of the HDF5 file but without any file extension
- file_path
- directory where the HDF5 file is located
- is_root
- boolean if the HDF5 object is located at the root level of the HDF5 file
- is_open
- boolean if the HDF5 file is currently opened - if an active file handler exists
- is_empty
- boolean if the HDF5 file is empty
- property base_name
Name of the HDF5 file - but without the file extension .h5
- Returns:
file name without the file extension
- Return type:
str
- close()
Close the current HDF5 path and return to the path before the last open
- copy()
Copy the Python object which links to the HDF5 file - in contrast to copy_to() which copies the content of the HDF5 file to a new location.
- Returns:
New FileHDFio object pointing to the same HDF5 file
- Return type:
FileHDFio
- create_group(name, track_order=False)
Create an HDF5 group - similar to a folder in the filesystem - the HDF5 groups allow the users to structure their data.
- Parameters:
name (str) – name of the HDF5 group
track_order (bool) – if False the group tracks its elements in alphanumeric order, if True in insertion order
- Returns:
FileHDFio object pointing to the new group
- Return type:
FileHDFio
- create_project_from_hdf5()
Internal function to create a pyiron project pointing to the directory where the HDF5 file is located.
- Returns:
pyiron project object
- Return type:
Project
- property file_path
Path where the HDF5 file is located - posixpath.dirname()
- Returns:
HDF5 file location
- Return type:
str
- get(key, default=None)
Internal wrapper function for __getitem__() - self[name]
- Parameters:
key (str, slice) – path to the data or key of the data object
default (object) – default value to return if key doesn’t exist
- Returns:
data or data object
- Return type:
dict, list, float, int
- get_from_table(path, name)
Get a specific value from a pandas.DataFrame
- Parameters:
path (str) – relative path to the data object
name (str) – parameter key
- Returns:
the value associated to the specific parameter key
- Return type:
dict, list, float, int
- get_pandas(name)
Load a dictionary from the HDF5 file and display it as a pandas DataFrame
- Parameters:
name (str) – HDF5 node name
- Returns:
the dictionary as a pandas.DataFrame object
- Return type:
pandas.DataFrame
- get_size(hdf)
Get size of the groups inside the HDF5 file
- Parameters:
hdf (FileHDFio) – hdf file
- Returns:
file size in bytes
- Return type:
float
- groups()
Filter HDF5 file by groups
- Returns:
an HDF5 file which is filtered by groups
- Return type:
FileHDFio
- hd_copy(hdf_old, hdf_new, exclude_groups=None, exclude_nodes=None)
Copy the content of one HDF5 object to another, optionally excluding selected groups and nodes.
- Parameters:
hdf_old (ProjectHDFio) – HDF5 object to copy from
hdf_new (ProjectHDFio) – HDF5 object to copy to
exclude_groups (list/None) – list of groups to exclude from the copy
exclude_nodes (list/None) – list of nodes to exclude from the copy
- items()
List all keys and values as items of all groups and nodes of the HDF5 file
- Returns:
list of tuples (key, value)
- Return type:
list
- keys()
List all groups and nodes of the HDF5 file - where groups are equivalent to directories and nodes to files.
- Returns:
all groups and nodes
- Return type:
list
- list_dirs()
Equivalent to os.listdir (consider groups as equivalent to directories)
- Returns:
list of groups in pytables for the path self.h5_path
- Return type:
list
- listdirs()
Equivalent to os.listdir (consider groups as equivalent to directories)
- Returns:
list of groups in pytables for the path self.h5_path
- Return type:
list
- nodes()
Filter HDF5 file by nodes
- Returns:
an HDF5 file which is filtered by nodes
- Return type:
FileHDFio
- open(h5_rel_path)
Create an HDF5 group and enter this specific group. If the group exists in the HDF5 path only the h5_path is set correspondingly otherwise the group is created first.
- Parameters:
h5_rel_path (str) – relative path from the current HDF5 path - h5_path - to the new group
- Returns:
FileHDFio object pointing to the new group
- Return type:
FileHDFio
- put(key, value)
Store data inside the HDF5 file
- Parameters:
key (str) – key to store the data
value (pandas.DataFrame, pandas.Series, dict, list, float, int) – basically any kind of data is supported
- read_dict_from_hdf(group_paths=[], recursive=False)
Read data from HDF5 file into a dictionary - by default only the nodes are converted to dictionaries, additional sub groups can be specified using the group_paths parameter.
- Parameters:
group_paths (list) – list of additional groups to be included in the dictionary, for example: [“input”, “output”, “output/generic”] These groups are defined relative to the h5_path.
recursive (bool) – Load all subgroups recursively
- Returns:
The loaded data. Can be of any type supported by write_hdf5.
- Return type:
dict
- remove_file()
Remove the HDF5 file with all the related content
- remove_group()
Remove an HDF5 group - if it exists. If the group does not exist no error message is raised.
- rewrite_hdf5(job_name=None, info=False, exclude_groups=None, exclude_nodes=None)
Rewrite the entire HDF5 file.
- Parameters:
info (bool) – whether to print information on how much space has been saved
- show_hdf()
Iterate over the HDF5 data structure and generate a human readable graph.
- values()
List all values for all groups and nodes of the HDF5 file
- Returns:
list of all values
- Return type:
list
- write_dict_to_hdf(data_dict)
Write a dictionary to HDF5
- Parameters:
data_dict (dict) – dictionary with objects which should be written to HDF5
- class pyiron_base.storage.hdfio.ProjectHDFio(project, file_name, h5_path=None, mode=None)
Bases:
FileHDFio
The ProjectHDFio class connects the FileHDFio and the Project class. It is derived from FileHDFio, but in addition a project object instance is available at self.project, enabling direct access to the database and other project-related functionality, some of which is mapped to the ProjectHDFio class as well.
- Parameters:
project (Project) – pyiron Project the current HDF5 project is located in
file_name (str) – name of the HDF5 file - in contrast to the FileHDFio object where file_name represents the absolute path of the HDF5 file.
h5_path (str) – absolute path inside the h5 path - starting from the root group
mode (str) – mode : {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’ See HDFStore docstring or tables.open_file for info about modes
- project
- Project instance the ProjectHDFio object is located in
- root_path
- the pyiron user directory, defined in the .pyiron configuration
- project_path
- the relative path of the current project / folder starting from the root path of the pyiron user directory
- path
- the absolute path of the current project / folder plus the absolute path in the HDF5 file as one path
- file_name
- absolute path to the HDF5 file
- h5_path
- path inside the HDF5 file - also stored as absolute path
- history
- previously opened groups / folders
- file_exists
- boolean if the HDF5 file was already written
- base_name
- name of the HDF5 file but without any file extension
- file_path
- directory where the HDF5 file is located
- is_root
- boolean if the HDF5 object is located at the root level of the HDF5 file
- is_open
- boolean if the HDF5 file is currently opened - if an active file handler exists
- is_empty
- boolean if the HDF5 file is empty
- user
- current unix/linux/windows user who is running pyiron
- sql_query
- an SQL query to limit the jobs within the project to a subset which matches the SQL query
- db
- connection to the SQL database
- working_directory
- working directory the job is executed in - outside the HDF5 file
- property base_name
The absolute path of the current pyiron project - absolute path on the file system, not including the HDF5 path.
- Returns:
current project path
- Return type:
str
- copy()
Copy the ProjectHDFio object - copying just the Python object but maintaining the same pyiron path
- Returns:
copy of the ProjectHDFio object
- Return type:
ProjectHDFio
- create_hdf(path, job_name)
Create a ProjectHDFio object to store project related information - for testing aggregated data
- Parameters:
path (str) – absolute path
job_name (str) – name of the HDF5 container
- Returns:
HDF5 object
- Return type:
ProjectHDFio
- create_project_from_hdf5()
Internal function to create a pyiron project pointing to the directory where the HDF5 file is located.
- Returns:
pyiron project object
- Return type:
Project
- create_working_directory()
Create the working directory on the file system if it does not exist already.
- property db
Get connection to the SQL database
- Returns:
database connection
- Return type:
- get_job_id(job_specifier)
Get the job ID for the job specified by job_specifier in the local project path from the database.
- Parameters:
job_specifier (str, int) – name of the job or job ID
- Returns:
job ID of the job
- Return type:
int
- inspect(job_specifier)
Inspect an existing pyiron object - most commonly a job - from the database
- Parameters:
job_specifier (str, int) – name of the job or job ID
- Returns:
Access to the HDF5 object - not a GenericJob object - use load() instead.
- Return type:
- load(job_specifier, convert_to_object=True)
Load an existing pyiron object - most commonly a job - from the database
- Parameters:
job_specifier (str, int) – name of the job or job ID
convert_to_object (bool) – convert the object to a pyiron object or only access the HDF5 file - default=True; accessing only the HDF5 file is about an order of magnitude faster, but only provides limited functionality. Compare the GenericJob object to the JobCore object.
- Returns:
Either the full GenericJob object or just a reduced JobCore object
- Return type:
GenericJob, JobCore
- load_from_jobpath(job_id=None, db_entry=None, convert_to_object=True)
Internal function to load an existing job either based on the job ID or based on the database entry dictionary.
- Parameters:
job_id (int) – Job ID - optional, but either the job_id or the db_entry is required.
db_entry (dict) – database entry dictionary - optional, but either the job_id or the db_entry is required.
convert_to_object (bool) – convert the object to a pyiron object or only access the HDF5 file - default=True; accessing only the HDF5 file is about an order of magnitude faster, but only provides limited functionality. Compare the GenericJob object to the JobCore object.
- Returns:
Either the full GenericJob object or just a reduced JobCore object
- Return type:
GenericJob, JobCore
- property name
- property path
Absolute path of the HDF5 group starting from the system root - combination of the absolute system path plus the absolute path inside the HDF5 file starting from the root group.
- Returns:
absolute path
- Return type:
str
- property project
Get the project instance the ProjectHDFio object is located in
- Returns:
pyiron project
- Return type:
Project
- property project_path
The relative path of the current project / folder starting from the root path of the pyiron user directory
- Returns:
relative path of the current project / folder
- Return type:
str
- remove_job(job_specifier, _unprotect=False)
Remove a single job from the project based on its job_specifier - see also remove_jobs()
- Parameters:
job_specifier (str, int) – name of the job or job ID
_unprotect (bool) – [True/False] delete the job without validating the dependencies to other jobs - default=False
- property root_path
The pyiron user directory, defined in the .pyiron configuration
- Returns:
pyiron user directory of the current project
- Return type:
str
- property sql_query
Get the SQL query for the project
- Returns:
SQL query
- Return type:
str
- to_object(class_name=None, **kwargs)
Load the full pyiron object from an HDF5 file
- Parameters:
class_name (str, optional) – if the ‘TYPE’ node is not available in the HDF5 file a manual object type can be set, must be as reported by str(type(obj))
**kwargs – optional parameters to override init parameters
- Returns:
pyiron object of the given class_name
- property user
Get current unix/linux/windows user who is running pyiron
- Returns:
username
- Return type:
str
- property working_directory
Get the working directory of the current ProjectHDFio object. The working directory equals the path but it is represented by the filesystem:
/absolute/path/to/the/file.h5/path/inside/the/hdf5/file
- becomes:
/absolute/path/to/the/file_hdf5/path/inside/the/hdf5/file
- Returns:
absolute path to the working directory
- Return type:
str
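The path mapping described above can be illustrated with a small, hypothetical helper (this is not pyiron's implementation, just a sketch of the documented transformation):

```python
def to_working_directory(hdf5_path: str) -> str:
    # Hypothetical helper, not pyiron's implementation: the ".h5" suffix of
    # the file part becomes "_hdf5", while the path inside the file is kept.
    return hdf5_path.replace(".h5/", "_hdf5/", 1)


result = to_working_directory(
    "/absolute/path/to/the/file.h5/path/inside/the/hdf5/file"
)
assert result == "/absolute/path/to/the/file_hdf5/path/inside/the/hdf5/file"
```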