pyiron_base.storage.hdfio.ProjectHDFio#

class pyiron_base.storage.hdfio.ProjectHDFio(project: pyiron_base.project.generic.Project, file_name: str, h5_path: str | None = None, mode: str | None = None)[source]#

Bases: FileHDFio, BaseHDFio

The ProjectHDFio class connects the FileHDFio and the Project class. It is derived from the FileHDFio class, but in addition a Project object instance is available at self.project, enabling direct access to the database and other project-related functionality, some of which is mapped onto the ProjectHDFio class as well.

Parameters:
  • project (Project) – pyiron Project the current HDF5 project is located in

  • file_name (str) – name of the HDF5 file - in contrast to the FileHDFio object where file_name represents the absolute path of the HDF5 file.

  • h5_path (str) – absolute path inside the HDF5 file - starting from the root group

  • mode (str) – file mode: {'a', 'w', 'r', 'r+'}, default 'a'. See the HDFStore docstring or tables.open_file for information about the modes.
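
A minimal construction sketch (the project name "hdf_demo" and the file name "example.h5" are illustrative placeholders, not part of the API):

    from pyiron_base import Project
    from pyiron_base.storage.hdfio import ProjectHDFio

    pr = Project("hdf_demo")  # pyiron project the HDF5 file lives in

    # file_name is only the file name; the project supplies the directory
    hdf = ProjectHDFio(project=pr, file_name="example.h5")
    print(hdf.file_name)  # absolute path to example.h5 on the file system
    print(hdf.h5_path)    # path inside the HDF5 file (root group by default)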

.. attribute:: project

Project instance the ProjectHDFio object is located in

.. attribute:: root_path

the pyiron user directory, defined in the .pyiron configuration

.. attribute:: project_path

the relative path of the current project / folder starting from the root path of the pyiron user directory

.. attribute:: path

the absolute path of the current project / folder plus the absolute path in the HDF5 file as one path

.. attribute:: file_name

absolute path to the HDF5 file

.. attribute:: h5_path

path inside the HDF5 file - also stored as absolute path

.. attribute:: history

previously opened groups / folders

.. attribute:: file_exists

boolean if the HDF5 file was already written

.. attribute:: base_name

name of the HDF5 file but without any file extension

.. attribute:: file_path

directory where the HDF5 file is located

.. attribute:: is_root

boolean if the HDF5 object is located at the root level of the HDF5 file

.. attribute:: is_open

boolean if the HDF5 file is currently opened - if an active file handler exists

.. attribute:: is_empty

boolean if the HDF5 file is empty

.. attribute:: user

current unix/linux/windows user who is running pyiron

.. attribute:: sql_query

an SQL query to limit the jobs within the project to a subset which matches the SQL query.

.. attribute:: db

connection to the SQL database

.. attribute:: working_directory

working directory the job is executed in - outside the HDF5 file

__init__(project: pyiron_base.project.generic.Project, file_name: str, h5_path: str | None = None, mode: str | None = None) None[source]#

Methods

__init__(project, file_name[, h5_path, mode])

clear()

close()

Close the current HDF5 path and return to the path before the last open.

copy()

Copy the ProjectHDFio object - copying just the Python object but maintaining the same pyiron path

copy_to(destination[, file_name, maintain_name])

Copy the content of the HDF5 file to a new location.

create_group(name[, track_order])

Create an HDF5 group - similar to a folder in the filesystem - the HDF5 groups allow the users to structure their data.

create_hdf(path, job_name)

Create a ProjectHDFio object to store project-related information - for testing aggregated data

create_project_from_hdf5()

Internal function to create a pyiron project pointing to the directory where the HDF5 file is located.

create_working_directory()

Create the working directory on the file system if it does not exist already.

file_size()

Get the size of the HDF5 file.

get(key[, default])

Get data from the HDF5 file.

get_from_table(path, name)

Get a specific value from a pandas.DataFrame.

get_job_id(job_specifier)

Get the job_id for the job matching job_specifier in the local project path from the database.

get_pandas(name)

Load a dictionary from the HDF5 file and display the dictionary as a pandas DataFrame.

get_size(hdf)

Get the size of the groups inside the HDF5 file.

groups()

Filter HDF5 file by groups.

hd_copy(hdf_old, hdf_new[, exclude_groups, ...])

Copy data from one HDF5 file to another.

inspect(job_specifier)

Inspect an existing pyiron object - most commonly a job - from the database

items()

List all keys and values as items of all groups and nodes of the HDF5 file.

keys()

List all groups and nodes of the HDF5 file - where groups are equivalent to directories and nodes to files.

list_all()

Returns a dictionary of :meth:`.list_groups()` and :meth:`.list_nodes()`.

list_dirs()

Equivalent to os.listdir (consider groups as equivalent to dirs).

list_groups()

Return a list of names of all nested groups.

list_h5_path([h5_path])

List all groups and nodes of the HDF5 file.

list_nodes()

Return a list of names of all nested nodes.

listdirs()

Equivalent to os.listdir (consider groups as equivalent to dirs).

load(job_specifier[, convert_to_object])

Load an existing pyiron object - most commonly a job - from the database

load_from_jobpath([job_id, db_entry, ...])

Internal function to load an existing job either based on the job ID or based on the database entry dictionary.

nodes()

Filter HDF5 file by nodes.

open(h5_rel_path)

Create an HDF5 group and enter this specific group.

pop(k[,d])

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem()

Remove and return some (key, value) pair as a 2-tuple; raise KeyError if D is empty.

put(key, value)

Store data inside the HDF5 file.

read_dict_from_hdf([group_paths, recursive])

Read data from HDF5 file into a dictionary - by default only the nodes are converted to dictionaries, additional sub groups can be specified using the group_paths parameter.

remove_file()

Remove the HDF5 file with all the related content.

remove_group()

Remove an HDF5 group if it exists.

remove_job(job_specifier[, _unprotect])

Remove a single job from the project based on its job_specifier.

rewrite_hdf5([job_name, info, ...])

Rewrite the entire HDF5 file.

setdefault(k[,d])

show_hdf()

Iterate over the HDF5 data structure and generate a human-readable graph.

to_dict([hierarchical])

Get the content of the HDF5 file at the current h5_path returned as a dictionary.

to_object([class_name])

Load the full pyiron object from an HDF5 file

update([E, ]**F)

If E is present and has a .keys() method, then for k in E.keys(): D[k] = E[k]. If E is present and lacks a .keys() method, then for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

values()

List all values for all groups and nodes of the HDF5 file.

write_dict(data_dict[, compression])

Write a dictionary to the HDF5 file.

write_dict_to_hdf(data_dict)

Write a dictionary to HDF5

Attributes

base_name

The absolute path of the current pyiron project - the absolute path on the file system, not including the HDF5 path.

db

Get connection to the SQL database

file_exists

Check if the HDF5 file exists already.

file_name

Get the file name of the HDF5 file.

file_path

Get the directory where the HDF5 file is located.

h5_path

Get the path in the HDF5 file starting from the root group.

is_empty

Check if the HDF5 file is empty.

is_root

Check if the current h5_path is pointing to the HDF5 root group.

name

Get the name of the HDF5 group.

path

Absolute path of the HDF5 group starting from the system root - combination of the absolute system path plus the absolute path inside the HDF5 file starting from the root group.

project

Get the project instance the ProjectHDFio object is located in

project_path

the relative path of the current project / folder starting from the root path of the pyiron user directory

root_path

the pyiron user directory, defined in the .pyiron configuration

sql_query

Get the SQL query for the project

user

Get the current unix/linux/windows user who is running pyiron

working_directory

Get the working directory of the current ProjectHDFio object. The working directory equals the path but it is represented by the filesystem: /absolute/path/to/the/file.h5/path/inside/the/hdf5/file becomes: /absolute/path/to/the/file_hdf5/path/inside/the/hdf5/file.

property base_name: str#

The absolute path of the current pyiron project - the absolute path on the file system, not including the HDF5 path.

Returns:

current project path

Return type:

str

clear() None.  Remove all items from D.#
close() None#

Close the current HDF5 path and return to the path before the last open.

copy() ProjectHDFio[source]#

Copy the ProjectHDFio object - copying just the Python object but maintaining the same pyiron path

Returns:

copy of the ProjectHDFio object

Return type:

ProjectHDFio

copy_to(destination: Pointer, file_name: str = None, maintain_name: bool = True) Pointer#

Copy the content of the HDF5 file to a new location.

Parameters:
  • destination (Pointer) – The Pointer object pointing to the new location.

  • file_name (str, optional) – The name of the new HDF5 file. Defaults to None.

  • maintain_name (bool, optional) – Whether to maintain the names of the HDF5 groups. Defaults to True.

Returns:

The Pointer object pointing to a file which now contains the same content as the current file.

Return type:

Pointer

create_group(name: str, track_order: bool = False) FileHDFio#

Create an HDF5 group - similar to a folder in the filesystem - the HDF5 groups allow the users to structure their data.

Parameters:
  • name (str) – Name of the HDF5 group

  • track_order (bool) – If False, this group tracks its elements in alphanumeric order; if True, in insertion order

Returns:

FileHDFio object pointing to the new group

Return type:

FileHDFio
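
A short sketch of group creation combined with dict-style writes; the group and node names are illustrative, and hdf is a ProjectHDFio instance as constructed above:

    grp = hdf.create_group("input")  # comparable to mkdir inside the HDF5 file
    grp["n_steps"] = 100             # dict-style assignment stores a node
    print(hdf.list_groups())         # ['input']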

create_hdf(path: str, job_name: str) ProjectHDFio[source]#

Create a ProjectHDFio object to store project-related information - for testing aggregated data

Parameters:
  • path (str) – absolute path

  • job_name (str) – name of the HDF5 container

Returns:

HDF5 object

Return type:

ProjectHDFio

create_project_from_hdf5() Project[source]#

Internal function to create a pyiron project pointing to the directory where the HDF5 file is located.

Returns:

pyiron project object

Return type:

Project

create_working_directory() None[source]#

Create the working directory on the file system if it does not exist already.

property db: DatabaseAccess#

Get connection to the SQL database

Returns:

database connection

Return type:

DatabaseAccess

property file_exists: bool#

Check if the HDF5 file exists already.

Returns:

True if the file exists, False otherwise.

Return type:

bool

property file_name: str#

Get the file name of the HDF5 file.

Returns:

The absolute path to the HDF5 file.

Return type:

str

property file_path: str#

Get the directory where the HDF5 file is located.

Returns:

Directory where the HDF5 file is located

Return type:

str

file_size() float#

Get the size of the HDF5 file.

Returns:

The file size in bytes.

Return type:

float

get(key: str, default: object | None = None) Dict | List | float | int#

Get data from the HDF5 file.

Parameters:
  • key (str) – Path to the data or key of the data object

  • default (object) – Default value to return if key doesn’t exist

Returns:

Data or data object

Return type:

Union[Dict, List, float, int]
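
A sketch of reading values back, assuming the node written in the create_group() example above:

    n_steps = hdf.get("input/n_steps")                   # stored value
    missing = hdf.get("input/no_such_node", default=-1)  # falls back to default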

get_from_table(path: str, name: str) Dict | List | float | int#

Get a specific value from a pandas.DataFrame.

Parameters:
  • path (str) – Relative path to the data object

  • name (str) – Parameter key

Returns:

The value associated with the specific parameter key

Return type:

Union[Dict, List, float, int]

get_job_id(job_specifier: str | int) int[source]#

Get the job_id for the job matching job_specifier in the local project path from the database.

Parameters:

job_specifier (str, int) – name of the job or job ID

Returns:

job ID of the job

Return type:

int

get_pandas(name: str) DataFrame#

Load a dictionary from the HDF5 file and display the dictionary as a pandas DataFrame.

Parameters:

name (str) – HDF5 node name

Returns:

The dictionary as a pandas DataFrame object

Return type:

pd.DataFrame

get_size(hdf: FileHDFio) float#

Get the size of the groups inside the HDF5 file.

Parameters:

hdf (FileHDFio) – HDF5 file

Returns:

File size in bytes

Return type:

float

groups() FileHDFio#

Filter HDF5 file by groups.

Returns:

An HDF5 file which is filtered by groups

Return type:

FileHDFio

property h5_path: str#

Get the path in the HDF5 file starting from the root group.

Returns:

The HDF5 path.

Return type:

str

hd_copy(hdf_old: FileHDFio, hdf_new: FileHDFio, exclude_groups: List[str] | None = None, exclude_nodes: List[str] | None = None) None#

Copy data from one HDF5 file to another.

Parameters:
  • hdf_old (FileHDFio) – Source HDF5 file

  • hdf_new (FileHDFio) – Destination HDF5 file

  • exclude_groups (List[str]) – List of groups to exclude from the copy

  • exclude_nodes (List[str]) – List of nodes to exclude from the copy

inspect(job_specifier: str | int) JobCore[source]#

Inspect an existing pyiron object - most commonly a job - from the database

Parameters:

job_specifier (str, int) – name of the job or job ID

Returns:

Access to the HDF5 object - not a GenericJob object - use load() instead.

Return type:

JobCore
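
A sketch contrasting inspect() with load(); the job name "my_job" and the output path are illustrative placeholders for an existing job:

    jc = hdf.inspect("my_job")              # JobCore: fast, HDF5 access only
    print(jc["output/generic/energy_tot"])  # read stored output directly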

property is_empty: bool#

Check if the HDF5 file is empty.

Returns:

True if the file is empty, False otherwise.

Return type:

bool

property is_root: bool#

Check if the current h5_path is pointing to the HDF5 root group.

Returns:

True if the current h5_path is the root group, False otherwise.

Return type:

bool

items() List[Tuple[str, Dict | List | float | int]]#

List all keys and values as items of all groups and nodes of the HDF5 file.

Returns:

List of tuples (key, value)

Return type:

List[Tuple[str, Union[Dict, List, float, int]]]

keys() List[str]#

List all groups and nodes of the HDF5 file - where groups are equivalent to directories and nodes to files.

Returns:

All groups and nodes

Return type:

List[str]

list_all()#

Returns a dictionary of :meth:`.list_groups()` and :meth:`.list_nodes()`.

Returns:

results of :meth:`.list_groups()` under the key "groups"; results of :meth:`.list_nodes()` under the key "nodes"

Return type:

dict
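
A sketch; the output shown is illustrative:

    print(hdf.list_all())
    # {'groups': ['input'], 'nodes': []}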

list_dirs() List[str]#

Equivalent to os.listdir (consider groups as equivalent to dirs).

Returns:

List of groups in pytables for the path self.h5_path

Return type:

List[str]

list_groups()#

Return a list of names of all nested groups.

Returns:

group names

Return type:

list of str

list_h5_path(h5_path: str = '') Dict[str, List[str]]#

List all groups and nodes of the HDF5 file.

Parameters:

h5_path (str, optional) – The path to a group in the HDF5 file from where the data is read. Defaults to “”.

Returns:

A dictionary with keys “groups” and “nodes” containing lists of groups and nodes.

Return type:

Dict[str, List[str]]

list_nodes()#

Return a list of names of all nested nodes.

Returns:

node names

Return type:

list of str

listdirs() List[str]#

Equivalent to os.listdir (consider groups as equivalent to dirs).

Returns:

List of groups in pytables for the path self.h5_path

Return type:

List[str]

load(job_specifier: str | int, convert_to_object: bool = True) GenericJob | JobCore[source]#

Load an existing pyiron object - most commonly a job - from the database

Parameters:
  • job_specifier (str, int) – name of the job or job ID

  • convert_to_object (bool) – convert the object to a pyiron object or only access the HDF5 file. Default is True. Accessing only the HDF5 file is about an order of magnitude faster but provides limited functionality. Compare the GenericJob object to the JobCore object.

Returns:

Either the full GenericJob object or just a reduced JobCore object

Return type:

GenericJob, JobCore
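
A sketch, assuming a job named "my_job" exists in the project:

    job = hdf.load("my_job")                                # full GenericJob
    job_fast = hdf.load("my_job", convert_to_object=False)  # reduced JobCore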

load_from_jobpath(job_id: int | None = None, db_entry: dict | None = None, convert_to_object: bool = True) GenericJob | JobCore[source]#

Internal function to load an existing job either based on the job ID or based on the database entry dictionary.

Parameters:
  • job_id (int, optional) – Job ID - optional, but either the job_id or the db_entry is required.

  • db_entry (dict, optional) – database entry dictionary - optional, but either the job_id or the db_entry is required.

  • convert_to_object (bool) – convert the object to a pyiron object or only access the HDF5 file. Default is True. Accessing only the HDF5 file is about an order of magnitude faster but provides limited functionality. Compare the GenericJob object to the JobCore object.

Returns:

Either the full GenericJob object or just a reduced JobCore object

Return type:

GenericJob, JobCore

property name: str#

Get the name of the HDF5 group.

Returns:

The name of the HDF5 group.

Return type:

str

nodes() FileHDFio#

Filter HDF5 file by nodes.

Returns:

An HDF5 file which is filtered by nodes

Return type:

FileHDFio

open(h5_rel_path: str) FileHDFio#

Create an HDF5 group and enter this specific group. If the group exists in the HDF5 path, only the h5_path is set correspondingly, otherwise the group is created first.

Parameters:

h5_rel_path (str) – Relative path from the current HDF5 path - h5_path - to the new group

Returns:

FileHDFio object pointing to the new group

Return type:

FileHDFio
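
A sketch of entering and leaving a group; the group and node names are illustrative:

    h5 = hdf.open("output/generic")        # enter the (newly created) group
    h5["energy_tot"] = [-1.0, -1.5, -1.7]  # written relative to the new h5_path
    h5.close()                             # return to the path before the open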

property path: str#

Absolute path of the HDF5 group starting from the system root - combination of the absolute system path plus the absolute path inside the HDF5 file starting from the root group.

Returns:

absolute path

Return type:

str

pop(k[, d]) v, remove specified key and return the corresponding value.#

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() (k, v), remove and return some (key, value) pair#

as a 2-tuple; but raise KeyError if D is empty.

property project: pyiron_base.project.generic.Project#

Get the project instance the ProjectHDFio object is located in

Returns:

pyiron project

Return type:

Project

property project_path: str#

the relative path of the current project / folder starting from the root path of the pyiron user directory

Returns:

relative path of the current project / folder

Return type:

str

put(key: str, value: DataFrame | Series | Dict | List | float | int) None#

Store data inside the HDF5 file.

Parameters:
  • key (str) – Key to store the data

  • value (Union[pandas.DataFrame, pandas.Series, Dict, List, float, int]) – Data to store
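
A sketch with illustrative keys, using the value types listed in the signature:

    import pandas as pd

    hdf.put("n_steps", 100)  # scalars, lists and dicts are stored directly
    hdf.put("results", pd.DataFrame({"step": [0, 1], "energy": [-1.0, -1.5]}))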

read_dict_from_hdf(group_paths: List[str] = [], recursive: bool = False) dict#

Read data from HDF5 file into a dictionary - by default only the nodes are converted to dictionaries, additional sub groups can be specified using the group_paths parameter.

Parameters:
  • group_paths (List[str]) – list of additional groups to be included in the dictionary, for example: ["input", "output", "output/generic"]. These groups are defined relative to the h5_path.

  • recursive (bool) – Load all subgroups recursively

Returns:

The loaded data. Can be of any type supported by write_hdf5.

Return type:

Dict
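
A sketch; the group names are illustrative:

    data = hdf.read_dict_from_hdf(group_paths=["input", "output/generic"])
    everything = hdf.read_dict_from_hdf(recursive=True)  # include all subgroups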

remove_file() None#

Remove the HDF5 file with all the related content.

remove_group() None#

Remove an HDF5 group if it exists. If the group does not exist, no error message is raised.

remove_job(job_specifier: str | int, _unprotect: bool = False) None[source]#

Remove a single job from the project based on its job_specifier.

Parameters:
  • job_specifier (Union[str, int]) – Name of the job or job ID.

  • _unprotect (bool) – [True/False] Delete the job without validating the dependencies to other jobs. Default is False.

rewrite_hdf5(job_name: str | None = None, info: bool = False, exclude_groups: List[str] | None = None, exclude_nodes: List[str] | None = None) None#

Rewrite the entire hdf file.

Parameters:
  • job_name (Optional[str]) – Deprecated argument, ignored.

  • info (bool) – Whether to give the information on how much space has been saved.

  • exclude_groups (Optional[List[str]]) – List of groups to exclude from the copy.

  • exclude_nodes (Optional[List[str]]) – List of nodes to exclude from the copy.

property root_path: str#

the pyiron user directory, defined in the .pyiron configuration

Returns:

pyiron user directory of the current project

Return type:

str

setdefault(k[, d]) D.get(k,d), also set D[k]=d if k not in D#
show_hdf() None#

Iterate over the HDF5 data structure and generate a human-readable graph.

property sql_query: str#

Get the SQL query for the project

Returns:

SQL query

Return type:

str

to_dict(hierarchical: bool = False) Dict[str, Any]#

Get the content of the HDF5 file at the current h5_path returned as a dictionary.

Parameters:

hierarchical (bool, optional) – Whether to convert the internal hierarchy of the HDF5 file to a hierarchical dictionary. Defaults to False.

Returns:

A dictionary with the content of the HDF5 file.

Return type:

Dict[str, Any]
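
A sketch of both flavors:

    flat = hdf.to_dict()                     # keys are slash-separated node paths
    nested = hdf.to_dict(hierarchical=True)  # nested dicts mirror the group tree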

to_object(class_name: str | None = None, **kwargs) object[source]#

Load the full pyiron object from an HDF5 file

Parameters:
  • class_name (str, optional) – if the 'TYPE' node is not available in the HDF5 file, a manual object type can be set; it must be given as reported by str(type(obj))

  • **kwargs – optional parameters to override init parameters

Returns:

pyiron object of the given class_name
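
A sketch, assuming the current h5_path points to a group holding a serialized pyiron object:

    obj = hdf.to_object()  # class resolved from the stored 'TYPE' node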

update([E, ]**F) None.  Update D from mapping/iterable E and F.#

If E is present and has a .keys() method, then for k in E.keys(): D[k] = E[k]. If E is present and lacks a .keys() method, then for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.

property user: str#

Get the current unix/linux/windows user who is running pyiron

Returns:

username

Return type:

str

values() List[Dict | List | float | int]#

List all values for all groups and nodes of the HDF5 file.

Returns:

List of all values

Return type:

List[Union[Dict, List, float, int]]

property working_directory: str#

Get the working directory of the current ProjectHDFio object. The working directory equals the path but it is represented by the filesystem:

/absolute/path/to/the/file.h5/path/inside/the/hdf5/file

becomes:

/absolute/path/to/the/file_hdf5/path/inside/the/hdf5/file

Returns:

absolute path to the working directory

Return type:

str

write_dict(data_dict: Dict[str, Any], compression: int = 4) None#

Write a dictionary to the HDF5 file.

Parameters:
  • data_dict (Dict[str, Any]) –

    Dictionary of data objects to be stored in the HDF5 file; the keys provide the path inside the HDF5 file and the values the data to be stored in those nodes. The corresponding HDF5 groups are created automatically:

        {
            "/hdf5root/group/node_name": {},
            "/hdf5root/group/subgroup/node_name": [...],
        }

  • compression (int, optional) – The compression level to use (0-9) to compress data using gzip. Defaults to 4.
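
A sketch mirroring the layout described above; the paths and values are illustrative:

    hdf.write_dict(
        data_dict={
            "input/n_steps": 100,
            "output/generic/energy_tot": [-1.0, -1.5, -1.7],
        },
        compression=4,  # gzip level 0-9
    )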

write_dict_to_hdf(data_dict: dict) None#

Write a dictionary to HDF5

Parameters:

data_dict (dict) – dictionary with objects which should be written to HDF5