pyiron_base.jobs.datamining module
- class pyiron_base.jobs.datamining.FunctionContainer(system_function_lst=None)
Bases:
object
Class which is able to append, store and retrieve a set of functions.
- class pyiron_base.jobs.datamining.JobFilters
Bases:
object
Certain predefined job filters
- static job_name_contains(job_name_segment)
- static job_type(job_type)
- class pyiron_base.jobs.datamining.PyironTable(project, name=None, system_function_lst=None, csv_file_name=None)
Bases:
object
Class for easy, efficient, and pythonic analysis of data from pyiron projects
- Parameters:
project (pyiron.project.Project/None) – The project to analyze
name (str) – Name of the pyiron table
system_function_lst (list/ None) – List of built-in functions
- create_table(file, job_status_list, executor=None, enforce_update=False)
Create or update the table.
If this method has been called before and new functions were added to
add
, apply them to the previously analyzed jobs. If this method has been called before and new jobs were added to
analysis_project
, apply all functions to them. The result is available via
get_dataframe()
.
Warning
The executor, if given, must not naively pickle the mapped functions or arguments, as PyironTable relies on lambda functions internally. Use executors that rely on dill or cloudpickle instead; pyiron provides such executors in the pympipool sub-packages.
- Parameters:
file (FileHDFio) – HDF file where the previous state of the table is stored
job_status_list (list of str) – only consider jobs with these statuses
executor (concurrent.futures.Executor) – executor for parallel execution
enforce_update (bool) – if True always regenerate the table completely.
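The warning about pickling can be demonstrated with the standard library alone. The sketch below is generic Python, not pyiron code; it shows why an executor that serializes work items with the stdlib pickle module fails on lambda functions:

```python
import pickle

# PyironTable builds lambda functions internally; the stdlib pickle
# module cannot serialize a lambda, so an executor that naively
# pickles the mapped functions would fail here.
func = lambda job_id: job_id + 1

try:
    pickle.dumps(func)
    lambda_picklable = True
except Exception:  # pickle.PicklingError or AttributeError, depending on context
    lambda_picklable = False
```

Libraries such as dill or cloudpickle serialize lambdas by value instead of by name, which is why executors built on them (for example those provided in the pympipool sub-packages) work with PyironTable.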
- property db_filter_function
Function to filter the project database table before job-specific functions are applied.
The function must take a pyiron project table in the pandas.DataFrame format (project.job_table()) and return a boolean pandas.Series with the same number of rows as the project table.
Example:
>>> def job_filter_function(df):
...     return (df["chemicalformula"] == "H2") & (df["hamilton"] == "Vasp")
>>> table.db_filter_function = job_filter_function
- property filter
Object containing pre-defined filter functions
- Returns:
The object containing the filters
- Return type:
pyiron.table.datamining.JobFilters
- property filter_function
Function to filter each job before more expensive functions are applied
Example:
>>> def job_filter_function(job):
...     return (job.status == "finished") & ("murn" in job.job_name)
>>> table.filter_function = job_filter_function
- get_dataframe()
- property name
Name of the table. Takes the project name if not specified
- Returns:
Name of the table
- Return type:
str
- refill_dict(diff_dict_lst)
Ensure that all dictionaries in the list have the same keys.
Keys that are not in a dict are set to None.
- static total_lst_of_keys(diff_dict_lst)
Get the unique list of all keys occurring in the list.
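The contract of these two helpers can be sketched in plain Python. The functions below are illustrative re-implementations of the documented behaviour, not pyiron's own code:

```python
def total_lst_of_keys(diff_dict_lst):
    # Collect the unique set of keys occurring in any dictionary.
    keys = set()
    for d in diff_dict_lst:
        keys.update(d.keys())
    return keys

def refill_dict(diff_dict_lst):
    # Give every dictionary the same keys, filling gaps with None.
    all_keys = total_lst_of_keys(diff_dict_lst)
    return [{k: d.get(k) for k in all_keys} for d in diff_dict_lst]

rows = [{"energy": -1.0}, {"energy": -2.0, "temperature": 300}]
filled = refill_dict(rows)
# Both rows now share the keys {"energy", "temperature"};
# the missing temperature in the first row is None.
```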
- class pyiron_base.jobs.datamining.TableJob(project, job_name)
Bases:
GenericJob
Since a project can have a large number of jobs, it is often necessary to “filter” the data to extract useful information. PyironTable is a tool that allows the user to do this efficiently.
Example:
>>> # Prepare random data
>>> for T in T_range:
...     lmp = pr.create.job.Lammps(('lmp', T))
...     lmp.structure = pr.create.structure.bulk('Ni', cubic=True).repeat(5)
...     lmp.calc_md(temperature=T)
...     lmp.run()
>>> def db_filter_function(job_table):
...     return (job_table.status == "finished") & (job_table.hamilton == "Lammps")
>>> def get_energy(job):
...     return job["output/generic/energy_pot"][-1]
>>> def get_temperature(job):
...     return job['output/generic/temperature'][-1]
>>> table.db_filter_function = db_filter_function
>>> table.add["energy"] = get_energy
>>> table.add["temperature"] = get_temperature
>>> table.run()
>>> table.get_dataframe()
This returns a dataframe containing the job id, energy, and temperature.
Alternatively, the filter function can be applied to each job:
>>> def job_filter_function(job):
...     return (job.status == "finished") & ("lmp" in job.job_name)
>>> table.filter_function = job_filter_function
- property add
Add a function to analyse job data
Example:
>>> def get_energy(job):
...     return job["output/generic/energy_pot"][-1]
>>> table.add["energy"] = get_energy
- property analysis_project
which pyiron project should be searched for jobs
WARNING: setting this resets any previously added analysis and filter functions
- Type:
Project
- property convert_to_object
if True, fully load jobs before passing them to functions; if False, use inspect mode.
- Type:
bool
- property db_filter_function
database level filter function
The function should accept a dataframe, the job table of
analysis_project
and return a boolean index into it. Jobs where the index is False are excluded from the analysis.
- Type:
function
- property enforce_update
if True, re-evaluate all functions on all jobs when
update_table()
is called.
- Type:
bool
- property filter
- property filter_function
job level filter function
The function should accept a GenericJob or JobCore object and return a bool; if it returns False, the job is excluded from the analysis.
- Type:
function
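Because the filter is a plain predicate, it can be exercised without pyiron at all. In the sketch below, the SimpleNamespace objects are hypothetical stand-ins for real job objects, used only to show the predicate's behaviour:

```python
from types import SimpleNamespace

def job_filter_function(job):
    # Keep only finished jobs whose name marks them as murnaghan runs.
    return (job.status == "finished") and ("murn" in job.job_name)

# Hypothetical stand-ins for GenericJob/JobCore objects:
finished = SimpleNamespace(status="finished", job_name="murn_Al")
aborted = SimpleNamespace(status="aborted", job_name="murn_Al")
unrelated = SimpleNamespace(status="finished", job_name="md_Ni")

kept = [j for j in (finished, aborted, unrelated) if job_filter_function(j)]
# Only the first job passes the filter.
```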
- from_dict(job_dict)
- from_hdf(hdf=None, group_name=None)
Restore pyiron table job from HDF5
- Parameters:
hdf
group_name
- get_dataframe()
Returns aggregated results over all jobs.
- Returns:
pandas.DataFrame
- property job_status
only jobs with status in this list are included in the table.
- Type:
list of str
- property pyiron_table
- property ref_project
- run_static()
The run static function is called by run to execute the simulation.
- to_dict()
- to_hdf(hdf=None, group_name=None)
Store pyiron table job in HDF5
- Parameters:
hdf
group_name
- update_table(job_status_list=None)
Update the pyiron table object, add new columns if a new function was added or add new rows for new jobs.
By default this function does not recompute already evaluated functions on already existing jobs. To force a complete re-evaluation set
enforce_update
to True.
- Parameters:
job_status_list (list/None) – List of job statuses to include in the table, by default ["finished"]. Deprecated, use
job_status
instead!
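The incremental behaviour described here, where a new function only fills the missing column of existing rows and a new job receives a row with every column, can be illustrated with plain dictionaries. This is a sketch of the contract, not pyiron's implementation:

```python
def update_table(table, jobs, functions):
    # table: {job_id: {column: value}} accumulated so far.
    for job_id, job in jobs.items():
        row = table.setdefault(job_id, {})
        for name, func in functions.items():
            if name not in row:  # only evaluate cells that are missing
                row[name] = func(job)
    return table

jobs = {1: {"energy": -1.0}, 2: {"energy": -2.0}}
functions = {"energy": lambda job: job["energy"]}
table = update_table({}, jobs, functions)

# A new function adds a column to existing rows ...
functions["double"] = lambda job: 2 * job["energy"]
# ... and a new job adds a row with every column.
jobs[3] = {"energy": -3.0}
table = update_table(table, jobs, functions)
```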
- validate_ready_to_run()
Validate that the calculation is ready to be executed. By default no generic checks are performed, but one could check that the input information is complete or validate the consistency of the input at this point.
- Raises:
ValueError – if ready check is unsuccessful
- pyiron_base.jobs.datamining.always_true(_)
A function that always returns True no matter what!
- Returns:
True
- Return type:
bool
- pyiron_base.jobs.datamining.always_true_pandas(job_table)
A function which returns a pandas Series of all True values with the same length as the input dataframe.
- Parameters:
job_table (pandas.DataFrame) – Input dataframe
- Returns:
A series of True values
- Return type:
pandas.Series
- pyiron_base.jobs.datamining.get_job_id(job)