pyiron_base.database.filetable module
File-based database interface
- class pyiron_base.database.filetable.FileTable(index_from)
Bases: IsDatabase
FileTable is meant to behave like a database from the user's point of view, but it infers the project hierarchy directly from the file system hierarchy.
Because indexing the file system can be expensive, and projects are sometimes re-initialized, the (re)instantiation cost of this class must be kept as low as possible.
- Parameters:
index_from (str) – The file path to start indexing at, i.e. the project path.
fileindex (PyFileIndex) – If the path in index_from has already been indexed, the existing index can be passed as an additional input parameter.
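The idea of inferring the project hierarchy from the file system can be illustrated with a minimal standard-library sketch. It assumes, purely for illustration, that each job is stored as a single `<job_name>.h5` file and that the directory containing that file acts as the job's project. The helper `index_jobs` and the row keys it emits are hypothetical and not part of pyiron_base, whose FileTable works from a PyFileIndex index instead.

```python
import os
from pathlib import Path


def index_jobs(root):
    """Hypothetical sketch: turn every ``*.h5`` file below ``root`` into a job row.

    Assumption: one HDF5 file per job, named after the job, so the folder
    holding the file plays the role of the job's project.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".h5"):
                continue  # Only HDF5 files are treated as jobs in this sketch
            rows.append(
                {
                    "job": name[: -len(".h5")],
                    # Project relative to the indexing root ("." for the root itself)
                    "project": Path(dirpath).relative_to(root).as_posix(),
                    "projectpath": str(root),
                }
            )
    return rows
```

Because the hierarchy is read from directory names, moving a folder on disk implicitly "moves" its jobs to another project, which is exactly why re-indexing has to be cheap.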
- add_item_dict(par_dict)
Create a new database item
- Parameters:
par_dict (dict) –
- Dictionary with the column names as keys and the item values as values, like:
{'chemicalformula': 'BO', 'computer': 'localhost', 'hamilton': 'VAMPS', 'hamversion': '1.1', 'job': 'testing', 'subjob': 'SubJob', 'parentid': 0, 'myCol': 'Blubbablub', 'project': 'database.testing', 'projectpath': '/root/directory/tmp', 'status': 'KAAAA', 'timestart': datetime(2016, 5, 2, 11, 31, 4, 253377), 'timestop': datetime(2016, 5, 2, 11, 31, 4, 371165), 'totalcputime': 0.117788, 'username': 'Test'}
- Returns:
Database ID of the newly created item as an int, like: 3
- Return type:
int
- delete_item(item_id)
Delete an item from the database
- Parameters:
item_id (int) – Database item ID (integer), like: 38
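The add_item_dict / delete_item contract (pass a dict of column values, get back an integer ID; later delete by that ID) can be sketched with an in-memory table. `InMemoryTable` is a hypothetical stand-in, not pyiron's implementation; the choice to hand out increasing IDs that are never reused is an assumption of the sketch. A get_item_by_id method is included only so the results can be checked.

```python
import itertools


class InMemoryTable:
    """Hypothetical in-memory stand-in for the add/get/delete item contract."""

    def __init__(self):
        self._rows = {}
        # IDs are handed out in increasing order and never reused (assumption)
        self._ids = itertools.count(1)

    def add_item_dict(self, par_dict):
        item_id = next(self._ids)
        # Copy the input so later changes by the caller do not alter the stored row
        self._rows[item_id] = {**par_dict, "id": item_id}
        return item_id

    def get_item_by_id(self, item_id):
        # Return a copy so callers cannot mutate the stored row in place
        return dict(self._rows[item_id])

    def delete_item(self, item_id):
        del self._rows[item_id]  # Raises KeyError for unknown IDs
```

Never reusing IDs means a stale ID held elsewhere can only fail to resolve; it can never silently point at a different job.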
- force_reset(fileindex=None)
Reset cache of the FileTable object
- Parameters:
fileindex (PyFileIndex) – File index for the current directory
- get_child_ids(job_specifier, project=None, status=None)
Get the child jobs of a specific master job
- Parameters:
job_specifier (str) – name or job ID of the master job
project (str) – project_path; note that this differs from the project_path in GenericPath
status (str) – only return children with this status (default: None)
- Returns:
list of child IDs
- Return type:
list
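A minimal sketch of the child lookup, assuming rows shaped like the dicts returned by get_item_by_id (which carry a 'masterid' column) and, for simplicity, a numeric master job ID rather than a job name. `child_ids` is a hypothetical helper, not the library's method.

```python
def child_ids(rows, master_id, status=None):
    """Hypothetical sketch: IDs of all rows whose 'masterid' equals master_id.

    If status is given, only children with exactly that status are kept.
    """
    return [
        row["id"]
        for row in rows
        if row.get("masterid") == master_id
        and (status is None or row.get("status") == status)
    ]
```

For example, with one master (ID 1) and two children in different states, `child_ids(rows, 1)` returns both child IDs, while passing `status="finished"` narrows the result to the finished one.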
- static get_extract(path, mtime)
- get_item_by_id(item_id)
Get an item from the database by its item ID.
- Parameters:
item_id (int) – Database item ID (integer), like: 38
- Returns:
- Dictionary mapping column names to values, like:
{'chemicalformula': 'BO', 'computer': 'localhost', 'hamilton': 'VAMPS', 'hamversion': '1.1', 'id': 1, 'job': 'testing', 'masterid': None, 'parentid': 0, 'project': 'database.testing', 'projectpath': '/root/directory/tmp', 'status': 'KAAAA', 'subjob': 'SubJob', 'timestart': datetime.datetime(2016, 5, 2, 11, 31, 4, 253377), 'timestop': datetime.datetime(2016, 5, 2, 11, 31, 4, 371165), 'totalcputime': 0.117788, 'username': 'Test'}
- Return type:
dict
- get_items_dict(item_dict, return_all_columns=True)
Get the list of jobs that match the query given as a dictionary
- Parameters:
item_dict (dict) –
a dict with a specific query syntax. A plain dict such as {'hamilton': 'VAMPE', 'hamversion': '1.1'} corresponds to the simple query
select * from table_name where hamilton = 'VAMPE' AND hamversion = '1.1'
i.e. every key-value pair becomes a condition, and all conditions are combined with AND.
A list value expresses OR, e.g. {'hamilton': ['VAMPE', 'LAMMPS']} corresponds to the query
select * from table_name where hamilton = 'VAMPE' OR hamilton = 'LAMMPS'
- Finally, a string value containing the LIKE wildcard '%' expresses a LIKE condition, e.g. {'project': 'database.%'} corresponds to the query
select * from table_name where project LIKE 'database.%'
i.e. adding '%' to a value automatically turns the condition into a LIKE search.
- All three forms can be combined in a single query:
- {'hamilton': ['VAMPE', 'LAMMPS'], 'project': 'database%', 'hamversion': '1.1'}
- select * from table_name where (hamilton = 'VAMPE' OR hamilton = 'LAMMPS') AND (project LIKE 'database%') AND hamversion = '1.1'
return_all_columns (bool) – if True, return all columns, otherwise only the 'id' column; the output format is the same in both cases.
- Returns:
- A list of dicts in the same form as get_items_sql, except that datetime values are not formatted:
- [{'chemicalformula': 'Ni108', 'computer': 'mapc157', 'hamilton': 'LAMMPS', 'hamversion': '1.1', 'id': 24, 'job': 'DOF_1_0', 'parentid': 21, 'project': 'lammps.phonons.Ni_fcc', 'projectpath': 'D:/PyIron/PyIron_data/projects', 'status': 'finished', 'timestart': datetime.datetime(2016, 6, 24, 10, 17, 3, 140000), 'timestop': datetime.datetime(2016, 6, 24, 10, 17, 3, 173000), 'totalcputime': 0.033, 'username': 'test'},
{'chemicalformula': 'Ni108', 'computer': 'mapc157', 'hamilton': 'LAMMPS', 'hamversion': '1.1', 'id': 21, 'job': 'ref', 'parentid': 20, 'project': 'lammps.phonons.Ni_fcc', 'projectpath': 'D:/PyIron/PyIron_data/projects', 'status': 'finished', 'timestart': datetime.datetime(2016, 6, 24, 10, 17, 2, 429000), 'timestop': datetime.datetime(2016, 6, 24, 10, 17, 2, 463000), 'totalcputime': 0.034, 'username': 'test'}, ...]
- Return type:
list
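The query syntax of get_items_dict (AND across keys, OR for list values, LIKE for strings containing '%') can be reproduced in plain Python over a list of row dicts. The helpers `match_items`, `_value_matches` and `_like_to_regex` are hypothetical; only the '%' wildcard is handled, because that is the only one the description mentions (SQL's single-character '_' wildcard is deliberately ignored).

```python
import re


def _like_to_regex(pattern):
    # Treat '%' as "any run of characters" and escape everything else literally
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(parts) + r"\Z", re.DOTALL)


def _value_matches(value, condition):
    if isinstance(condition, list):
        # A list value means OR across its alternatives
        return any(_value_matches(value, c) for c in condition)
    if isinstance(condition, str) and "%" in condition:
        # A '%' in a string value switches to a LIKE-style match
        return isinstance(value, str) and _like_to_regex(condition).match(value) is not None
    return value == condition


def match_items(rows, item_dict):
    """Hypothetical sketch: keep rows where every key in item_dict matches (AND)."""
    return [
        row
        for row in rows
        if all(_value_matches(row.get(key), cond) for key, cond in item_dict.items())
    ]
```

With the combined query from the description, `{'hamilton': ['VAMPE', 'LAMMPS'], 'project': 'database%', 'hamversion': '1.1'}`, only rows whose Hamiltonian is VAMPE or LAMMPS *and* whose project starts with "database" *and* whose version is 1.1 survive.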
- get_job_id(job_specifier, project=None)
Get job ID from filetable
- Parameters:
job_specifier (str) – Job ID or job name
project (str/ None) – project_path as string
- Returns:
job ID
- Return type:
int/ None
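Since get_job_id accepts either a job ID or a job name, its core is a small dispatch. Below is a minimal sketch over row dicts with the hypothetical name `resolve_job_id`; treating digit-only strings as IDs and returning None when a name is ambiguous across projects are assumptions of the sketch, not documented behavior.

```python
def resolve_job_id(rows, job_specifier, project=None):
    """Hypothetical sketch: map a job ID or a job name to a job ID, or None."""
    # Integers and digit-only strings are treated as job IDs (assumption)
    if isinstance(job_specifier, int) or str(job_specifier).isdigit():
        job_id = int(job_specifier)
        return job_id if any(row["id"] == job_id for row in rows) else None
    # Otherwise treat the specifier as a job name, optionally limited to one project
    matches = [
        row["id"]
        for row in rows
        if row["job"] == job_specifier
        and (project is None or row["project"] == project)
    ]
    # Only a unique match is returned; ambiguous or unknown names give None
    return matches[0] if len(matches) == 1 else None
```

Passing `project` is what disambiguates two jobs that share a name in different projects.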
- get_job_status(job_id)
Get status of a given job selected by its job ID
- Parameters:
job_id (int) – job ID as integer
- Returns:
status of the job
- Return type:
str
- get_job_working_directory(job_id)
Get the working directory of a particular job
- Parameters:
job_id (int) – job ID as integer
- Returns:
working directory as absolute path
- Return type:
str
- init_table(fileindex, working_dir_lst=None)
Initialize the filetable class
- Parameters:
fileindex (pandas.DataFrame) – file system index for the current project path
working_dir_lst (list/ None) – list of working directories
- Returns:
list of dictionaries
- Return type:
list
- set_job_status(job_id, status)
Set job status
- Parameters:
job_id (int) – job ID as integer
status (str) – job status
- update()
Update the filetable cache
- class pyiron_base.database.filetable.FileTableSingleton(name, bases, namespace, /, **kwargs)
Bases: ABCMeta
Indexing the file system for every FileTable instance can be expensive, so this metaclass ensures that the indexing is done only once per path.
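A per-path singleton can be built as a metaclass that intercepts instance creation. The sketch below shows the general pattern with the hypothetical classes `PathSingleton` and `DemoTable`; it is not pyiron's code. Deriving from ABCMeta, as FileTableSingleton does, keeps the metaclass compatible with abstract base classes such as IsDatabase, because a class's metaclass must derive from the metaclasses of all its bases.

```python
import os
from abc import ABCMeta


class PathSingleton(ABCMeta):
    """Hypothetical sketch: at most one instance per (class, absolute path)."""

    _instances = {}

    def __call__(cls, index_from, *args, **kwargs):
        key = (cls, os.path.abspath(index_from))
        if key not in PathSingleton._instances:
            # Only the first call for a given path runs __init__ (and its indexing)
            PathSingleton._instances[key] = super().__call__(index_from, *args, **kwargs)
        return PathSingleton._instances[key]


class DemoTable(metaclass=PathSingleton):
    indexed = 0  # Counts how often the stand-in for expensive indexing ran

    def __init__(self, index_from):
        type(self).indexed += 1
        self.index_from = index_from
```

Normalizing with `os.path.abspath` makes `"project"` and `"./project"` share one instance. One consequence of this sketch is that extra arguments on later calls (for example an already computed file index) are ignored once an instance exists for the path.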
- pyiron_base.database.filetable.filter_function(file_name)
- pyiron_base.database.filetable.get_hamilton_from_file(hdf5_file, job_name)
- pyiron_base.database.filetable.get_hamilton_version_from_file(hdf5_file, job_name)
- pyiron_base.database.filetable.get_job_status_from_file(hdf5_file, job_name)