pyiron_base.jobs.datamining module

class pyiron_base.jobs.datamining.FunctionContainer(system_function_lst=None)

Bases: object

Class which is able to append, store and retrieve a set of functions.
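The append/store/retrieve pattern can be sketched as a small dict-like container. This is an illustrative sketch only, not the pyiron implementation; the class and attribute names here are hypothetical:

```python
# Illustrative sketch of a function container (hypothetical names,
# not the actual pyiron_base implementation).
class SimpleFunctionContainer:
    def __init__(self, system_function_lst=None):
        # Built-in functions supplied by the framework
        self._system_function_lst = system_function_lst or []
        # User-defined functions, stored by name
        self._user_function_dict = {}

    def __setitem__(self, name, func):
        # Store a function under a name (later a dataframe column)
        self._user_function_dict[name] = func

    def __getitem__(self, name):
        return self._user_function_dict[name]

    def apply_all(self, job):
        # Evaluate every stored function on a single job
        return {name: func(job) for name, func in self._user_function_dict.items()}
```

Under such a scheme, an assignment like `table.add["energy"] = get_energy` stores a function under a name that later becomes a column in the resulting dataframe.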

class pyiron_base.jobs.datamining.JobFilters

Bases: object

Certain predefined job filters

static job_name_contains(job_name_segment)
static job_type(job_type)
class pyiron_base.jobs.datamining.PyironTable(project, name=None, system_function_lst=None, csv_file_name=None)

Bases: object

Class for easy, efficient, and pythonic analysis of data from pyiron projects

Parameters:
  • project (pyiron.project.Project/None) – The project to analyze

  • name (str) – Name of the pyiron table

  • system_function_lst (list/None) – List of built-in functions

create_table(file, job_status_list, executor=None, enforce_update=False)

Create or update the table.

If this method has been called before and new functions have since been added via add, they are applied to the previously analyzed jobs. If new jobs have been added to analysis_project, all functions are applied to them.

The result is available via get_dataframe().

Warning

If given, the executor must not naively pickle the mapped functions or arguments, as PyironTable relies on lambda functions internally. Use executors based on dill or cloudpickle instead; pyiron provides such executors in the pympipool subpackages.

Parameters:
  • file (FileHDFio) – HDF file where the previous state of the table is stored

  • job_status_list (list of str) – only consider jobs with these statuses

  • executor (concurrent.futures.Executor) – executor for parallel execution

  • enforce_update (bool) – if True, always regenerate the table completely.

property db_filter_function

Function to filter the project database table before job-specific functions are applied.

The function must take a pyiron project table in the pandas.DataFrame format (project.job_table()) and return a boolean pandas.Series with the same number of rows as the project table.

Example:

>>> def job_filter_function(df):
>>>     return (df["chemicalformula"] == "H2") & (df["hamilton"] == "Vasp")
>>> table.db_filter_function = job_filter_function
property filter

Object containing pre-defined filter functions

Returns:

The object containing the filters

Return type:

pyiron.table.datamining.JobFilters

property filter_function

Function to filter each job before more expensive functions are applied

Example:

>>> def job_filter_function(job):
>>>     return (job.status == "finished") & ("murn" in job.job_name)
>>> table.filter_function = job_filter_function
get_dataframe()
property name

Name of the table. Takes the project name if not specified

Returns:

Name of the table

Return type:

str

refill_dict(diff_dict_lst)

Ensure that all dictionaries in the list have the same keys.

Keys that are not in a dict are set to None.

static total_lst_of_keys(diff_dict_lst)

Get unique list of all keys occurring in the list.
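The behavior of these two helpers can be sketched with plain dictionaries. This is an illustrative sketch of the described behavior, not necessarily the actual pyiron implementation:

```python
# Illustrative sketch of the two helpers above.
def total_lst_of_keys(diff_dict_lst):
    # Unique set of all keys occurring in any of the dictionaries
    return {key for d in diff_dict_lst for key in d}

def refill_dict(diff_dict_lst):
    # Give every dictionary the same keys, filling gaps with None
    all_keys = total_lst_of_keys(diff_dict_lst)
    for d in diff_dict_lst:
        for key in all_keys:
            d.setdefault(key, None)
```

Normalizing the keys this way is what allows rows produced by different jobs, each with different available outputs, to be combined into one rectangular dataframe.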

class pyiron_base.jobs.datamining.TableJob(project, job_name)

Bases: GenericJob

Since a project can have a large number of jobs, it is often necessary to “filter” the data to extract useful information. PyironTable is a tool that allows the user to do this efficiently.

Example:

>>> # Prepare random data
>>> for T in T_range:
>>>     lmp = pr.create.job.Lammps(('lmp', T))
>>>     lmp.structure = pr.create.structure.bulk('Ni', cubic=True).repeat(5)
>>>     lmp.calc_md(temperature=T)
>>>     lmp.run()
>>> def db_filter_function(job_table):
>>>     return (job_table.status == "finished") & (job_table.hamilton == "Lammps")
>>> def get_energy(job):
>>>     return job["output/generic/energy_pot"][-1]
>>> def get_temperature(job):
>>>     return job['output/generic/temperature'][-1]
>>> table.db_filter_function = db_filter_function
>>> table.add["energy"] = get_energy
>>> table.add["temperature"] = get_temperature
>>> table.run()
>>> table.get_dataframe()

This returns a dataframe containing job-id, energy and temperature.

Alternatively, the filter function can be applied on the job

>>> def job_filter_function(job):
>>>     return (job.status == "finished") & ("lmp" in job.job_name)
>>> table.filter_function = job_filter_function
property add

Add a function to analyse job data

Example:

>>> def get_energy(job):
>>>     return job["output/generic/energy_pot"][-1]
>>> table.add["energy"] = get_energy
property analysis_project

which pyiron project should be searched for jobs

WARNING: setting this resets any previously added analysis and filter functions

Type:

Project

property convert_to_object

if True, fully load jobs before passing them to the functions; if False, use inspect mode.

Type:

bool

property db_filter_function

database level filter function

The function should accept a dataframe (the job table of analysis_project) and return a boolean index into it. Jobs where the index is False are excluded from the analysis.

Type:

function

property enforce_update

if True, re-evaluate all functions on all jobs when update_table() is called.

Type:

bool

property filter
property filter_function

job level filter function

The function should accept a GenericJob or JobCore object and return a bool; if it returns False, the job is excluded from the analysis.

Type:

function

from_dict(job_dict)
from_hdf(hdf=None, group_name=None)

Restore pyiron table job from HDF5

Parameters:
  • hdf

  • group_name

get_dataframe()

Returns aggregated results over all jobs.

Returns:

pandas.DataFrame

property job_status

only jobs with status in this list are included in the table.

Type:

list of str

property pyiron_table
property ref_project
run_static()

The run static function is called by run to execute the simulation.

to_dict()
to_hdf(hdf=None, group_name=None)

Store pyiron table job in HDF5

Parameters:
  • hdf

  • group_name

update_table(job_status_list=None)

Update the pyiron table object, add new columns if a new function was added or add new rows for new jobs.

By default this function does not recompute already evaluated functions on already existing jobs. To force a complete re-evaluation set enforce_update to True.

Parameters:

job_status_list (list/None) – List of job statuses to include in the table; defaults to ["finished"]. Deprecated, use job_status instead!

validate_ready_to_run()

Validate that the calculation is ready to be executed. By default no generic checks are performed, but one could check that the input information is complete or validate the consistency of the input at this point.

Raises:

ValueError – if ready check is unsuccessful

pyiron_base.jobs.datamining.always_true(_)

A function that always returns True no matter what!

Returns:

True

Return type:

bool

pyiron_base.jobs.datamining.always_true_pandas(job_table)

A function which returns a pandas Series of all True values, sized to match the input pandas dataframe.

Parameters:
  • job_table (pandas.DataFrame) – Input dataframe

Returns:

A series of True values

Return type:

pandas.Series
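always_true and always_true_pandas appear to serve as permissive defaults for the job-level and database-level filters, respectively: they accept everything. A sketch of the pandas variant (illustrative; the pyiron source may differ):

```python
import pandas as pd

def always_true_pandas(job_table):
    # One True per row, indexed like the input, so the result can be
    # used directly as a boolean row mask on the job table.
    return pd.Series([True] * len(job_table), index=job_table.index)
```

Matching the index of the input dataframe matters: a Series with a mismatched index would not align correctly when used to filter the job table.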

pyiron_base.jobs.datamining.get_job_id(job)