Developers#
This page explains the technology behind the pyiron_base workflow manager.
Install from Source#
For development, it is recommended to first create a conda environment containing all of pyiron's dependencies. The dependencies are listed in the environment.yml file of the repository.
git clone https://github.com/pyiron/pyiron_base.git
conda env create -f pyiron_base/environment.yml
If conda is not available on your machine, the next best thing would be to install pyiron and its dependencies via pip.
To include this version in your PYTHONPATH, add the following line to your ~/.profile or ~/.bashrc configuration:
export PYTHONPATH=${HOME}/path/to/pyiron_base/:${PYTHONPATH}
When you import pyiron in any Python shell or Jupyter notebook, it should now load the version from ~/path/to/pyiron_base/. Finally, you can switch to other branches using git:
git checkout main
In this case we switch to the main branch.
HDF5 Serialization#
Structure#
Each hierarchical object lives under its own group in the HDF5 file, i.e. objects that are attributes of another object must have their own sub-group inside that larger object's group. In its group each object must store
TYPE
    equal to str(type(self)); this provides the module path and class name from which pyiron will load the class
NAME
    equal to type(self).__name__; the unqualified class name, informational only

They may also store

HDF_VERSION
    equal to a version string with format MAJOR.MINOR.PATCH; the version of the structure of the type in HDF5. All classes must be able to read from HDF5 with at least the same MAJOR release, but explicit breaking behaviour should be very rare.
VERSION
    equal to a version string with format MAJOR.MINOR.PATCH; the version of the functionality of the class. A higher version must not change the HDF5 structure unless it also changes HDF_VERSION.
For example, a class defined like this

class Foo:
    def __init__(self, parameter):
        self.bar = Bar()
        self.baz = Baz()
        self.parameter = parameter
should be serialized as
foo/
foo/TYPE
foo/NAME
foo/VERSION
foo/HDF_VERSION
foo/parameter
foo/bar/
foo/bar/TYPE
foo/bar/NAME
foo/bar/VERSION
foo/bar/HDF_VERSION
foo/baz/
foo/baz/TYPE
foo/baz/NAME
foo/baz/VERSION
foo/baz/HDF_VERSION
Writing to HDF5#
Each type must define a to_hdf(self, hdf, group_name=None) method that takes the given hdf object, creates a sub-group called group_name in it (if given) and then serializes itself into this group. Some objects keep a default ProjectHDFio object during their lifetime (e.g. jobs); in this case hdf may be an optional parameter.
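As an illustration, a to_hdf() for the Foo class from the example above could look like the sketch below. The sub-group handling via hdf.open() and the item assignments follow the general pattern used in pyiron_base, but the class-level version attributes and the exact calls are illustrative assumptions, not the canonical implementation.

class Foo:
    __version__ = "0.1.0"      # assumed class-level version strings
    __hdf_version__ = "0.1.0"

    def __init__(self, parameter):
        self.bar = Bar()
        self.baz = Baz()
        self.parameter = parameter

    def to_hdf(self, hdf, group_name=None):
        # write into a sub-group if a name is given, otherwise directly into hdf
        hdf_group = hdf.open(group_name) if group_name is not None else hdf
        hdf_group["TYPE"] = str(type(self))
        hdf_group["NAME"] = type(self).__name__
        hdf_group["VERSION"] = self.__version__
        hdf_group["HDF_VERSION"] = self.__hdf_version__
        hdf_group["parameter"] = self.parameter
        # nested objects serialize themselves into their own sub-groups
        self.bar.to_hdf(hdf_group, group_name="bar")
        self.baz.to_hdf(hdf_group, group_name="baz")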
Reading from HDF5#
Each type must define a from_hdf(self, hdf, group_name=None) method and may define a from_hdf_args(cls, hdf). from_hdf() restores the state of the already initialized object from the information stored in the HDF5 file. from_hdf_args() reads the parameters required to instantiate the object from HDF5 and returns them in a dict.
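Continuing the hypothetical Foo example, the reading side could look like this; again a sketch that mirrors the description above rather than actual pyiron_base code.

class Foo:
    # __init__() and to_hdf() as in the sketch above

    @classmethod
    def from_hdf_args(cls, hdf):
        # return the arguments needed to call __init__()
        return {"parameter": hdf["parameter"]}

    def from_hdf(self, hdf, group_name=None):
        # restore the state of an already initialized instance
        hdf_group = hdf.open(group_name) if group_name is not None else hdf
        self.parameter = hdf_group["parameter"]
        # nested objects restore themselves from their own sub-groups
        self.bar.from_hdf(hdf_group, group_name="bar")
        self.baz.from_hdf(hdf_group, group_name="baz")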
To read an object from a given ProjectHDFio path, call its to_object() method. This will first call import_class to read the class object and then make_from_hdf() to instantiate it; if the class defines from_hdf_args(), it will be called to supply the correct init parameters. to_object() can also be supplied with additional parameters to override the ones written to HDF5; in particular, it will always provide job_name and project. However, only those parameters that are needed (i.e. declared by that class's __init__()) will be passed.
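For example, assuming a Foo instance was previously written to the group "foo" of a ProjectHDFio object named hdf, reading it back could look like this (an illustrative usage, reusing the names from the sketches above):

# TYPE stored in the group tells pyiron which class to import and instantiate
foo = hdf.open("foo").to_object()
# init parameters written to HDF5 can be overridden explicitly
foo_new = hdf.open("foo").to_object(parameter=42)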
Run function#
Explain how a job is executed.
Queuing System#
If you just want to configure the queue setup, have a look at the documentation. The following describes the code flow for submitting a job to the queue.
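For orientation, submitting a job to a queue from the user side typically looks like the sketch below; the job type and queue name are placeholders chosen for this example, while the job.server attributes are the regular pyiron interface.

# `pr` is an existing pyiron Project; "my_queue" stands for a queue defined in
# the pysqa configuration and ScriptJob is only used as a concrete example
job = pr.create.job.ScriptJob("my_job")
job.server.queue = "my_queue"   # select the queue
job.server.cores = 4            # resources that end up in the run template below
job.run()                       # triggers the submission flow described next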
Every time pyiron submits a job to the queue (reachable from the current location; for a remote setup this is run on the remote machine), the submission is handled by the queue adapter, which populates the SLURM run template:
#!/bin/bash
#SBATCH --output=time.out
#SBATCH --job-name={{job_name}}
#SBATCH --workdir={{working_directory}}
#SBATCH --get-user-env=L
#SBATCH --partition=slurm
{%- if run_time_max %}
#SBATCH --time={{ [1, run_time_max // 60]|max }}
{%- endif %}
{%- if memory_max %}
#SBATCH --mem={{memory_max}}G
{%- endif %}
#SBATCH --cpus-per-task={{cores}}
and submits it to the queue, i.e. the command that ends up running will be:
command = (
"python -m pyiron_base.cli wrapper -p "
+ job.working_directory
+ " -j "
+ str(job.job_id)
)
which essentially does a job.load() and a job.run() on the compute node. job.run() finally calls the executable, where str(executable) or executable.executable_path points to the shell script for the chosen version as defined in the resources, e.g. to run multi-core LAMMPS 2020.03.03 (run_lammps_2020.03.03_mpi.sh):
#!/bin/bash
mpiexec -n $1 --oversubscribe lmp_mpi -in control.inp;
Command Line#
Adding a new sub command is done by adding a new module to pyiron_base.cli. This module needs to define a register and a main function. The former is called with an argparse.ArgumentParser instance as its sole argument and should define the command line interface in the usual way. The latter will be called with the parsed arguments and should simply execute whatever the utility is supposed to do. Additionally, if you need to control the formatter_class and epilog keyword arguments when creating the argparse.ArgumentParser instance, you can set the formatter and epilog toplevel variables (see the ls sub command for an example). Finally, you must add the module to the pyiron_base.cli.cli_modules dict. A sketch of such a module is shown below.
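The following is a hypothetical sub command module, e.g. a file hello.py inside the cli package; the name and its argument are invented, only the register()/main() contract and the optional formatter/epilog variables follow the description above.

import argparse


def register(parser):
    # define the command line interface of the sub command
    parser.add_argument("name", help="who to greet")


def main(args):
    # execute the sub command with the parsed arguments
    print("Hello, {}!".format(args.name))


# optional: control how the ArgumentParser for this sub command is created
formatter = argparse.ArgumentDefaultsHelpFormatter
epilog = "Example: python -m pyiron_base.cli hello world"

The new module would then be registered with an entry such as "hello": hello in the cli_modules dict, assuming the dict maps sub command names to their modules.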