pyiron_base.database.filetable.FileTable

class pyiron_base.database.filetable.FileTable(index_from: str)

Bases: IsDatabase

The file table behaves like a database from the user's perspective, but it infers the project hierarchy directly from the file system hierarchy.

Because indexing the file system can be expensive, and projects are sometimes re-initialized, the (re)instantiation cost of this class must be kept as low as possible.

Parameters:
  • index_from (str) – The file path to start indexing at, i.e. the project path.

  • fileindex (PyFileIndex) – If the path given in index_from has already been indexed, the existing index can be passed in as an additional input parameter.
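One way to meet the requirement that (re)instantiation stay cheap is to cache one instance per indexed path, so that constructing the table again for the same path skips the file system scan. The following is a hypothetical sketch of that pattern. `CachedTable` and its `scans` counter are illustrative names, not part of pyiron_base, and the real FileTable may achieve this differently.

```python
import os


class CachedTable:
    """Hypothetical file table that is indexed at most once per path."""

    _instances = {}  # maps normalized absolute path -> instance
    scans = 0  # counts how often the (expensive) indexing step ran

    def __new__(cls, index_from: str):
        key = os.path.abspath(index_from)  # abspath also strips trailing slashes
        if key not in cls._instances:
            instance = super().__new__(cls)
            instance._initialized = False
            cls._instances[key] = instance
        return cls._instances[key]

    def __init__(self, index_from: str):
        if self._initialized:
            return  # re-instantiation: reuse the existing index
        self.index_from = os.path.abspath(index_from)
        CachedTable.scans += 1  # stand-in for walking the file system
        self._initialized = True


first = CachedTable("projects/demo")
again = CachedTable("projects/demo/")  # same path, so the cached instance is returned
other = CachedTable("projects/other")
```

Python calls `__init__` on every construction, so the sketch checks `_initialized` and returns early. Only the first construction for a given path pays the indexing cost.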

__init__(index_from: str, fileindex: PyFileIndex = None)

Methods

  • __init__(index_from[, fileindex])

  • add_item_dict(par_dict) – Create a new database item.

  • delete_item(item_id) – Delete an item from the database.

  • force_reset([fileindex]) – Reset the cache of the FileTable object.

  • get_child_ids(job_specifier[, project, status]) – Get the child jobs of a specific job.

  • get_db_columns() – Get the column names.

  • get_extract(path, mtime) – Extract job information from a given file path and modification time.

  • get_item_by_id(item_id) – Get an item from the database by its item ID.

  • get_items_dict(item_dict[, return_all_columns]) – Get a list of jobs that fulfill the query in the dictionary.

  • get_job_id(job_specifier[, project]) – Get a job ID from the file table.

  • get_job_ids(sql_query, user, project_path[, ...]) – Return the job IDs matching a specific query.

  • get_job_status(job_id) – Get the status of a job, selected by its job ID.

  • get_job_working_directory(job_id) – Get the working directory of a particular job.

  • get_jobs(sql_query, user, project_path[, ...]) – Internal function that returns the jobs as a dictionary rather than a pandas.DataFrame.

  • get_table_headings([table_name]) – Get the column names; if given, table_name can select one of several tables defined in the database, but subclasses may ignore it.

  • init_table(fileindex[, working_dir_lst]) – Initialize the file table.

  • item_update(par_dict, item_id)

  • job_table(sql_query, user, project_path[, ...]) – Access the job table.

  • set_job_status(job_id, status) – Set the job status.

  • update() – Update the file table cache.

Attributes

  • view_mode – Get view_mode; if view_mode is enabled, pyiron has read-only access to the database.

add_item_dict(par_dict: dict) → int

Create a new database item

Parameters:

par_dict (dict) –

Dictionary with the column names as keys and the item values as values, e.g.:

{'chemicalformula': 'BO', 'computer': 'localhost', 'hamilton': 'VAMPS', 'hamversion': '1.1', 'job': 'testing', 'subjob': 'SubJob', 'parentid': 0, 'myCol': 'Blubbablub', 'project': 'database.testing', 'projectpath': '/root/directory/tmp', 'status': 'KAAAA', 'timestart': datetime(2016, 5, 2, 11, 31, 4, 253377), 'timestop': datetime(2016, 5, 2, 11, 31, 4, 371165), 'totalcputime': 0.117788, 'username': 'Test'}

Returns:

Database ID of the created item as an int, e.g. 3

Return type:

int
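To illustrate the contract of add_item_dict (a dict of column values goes in, a new integer ID comes out), here is an in-memory sketch. The list-of-dicts table, the helper name `add_row`, and the "largest existing ID plus one" numbering are all assumptions made for this example. FileTable's actual ID assignment is not described here.

```python
from typing import Dict, List


def add_row(table: List[Dict], par_dict: Dict) -> int:
    """Append a copy of par_dict to table and return the new item's ID.

    Hypothetical scheme: IDs are the largest existing ID plus one, starting at 1.
    """
    new_id = max((row["id"] for row in table), default=0) + 1
    row = dict(par_dict)  # copy so the caller's dict is not modified
    row["id"] = new_id
    table.append(row)
    return new_id


table: List[Dict] = []
first_id = add_row(table, {"job": "testing", "project": "database.testing", "status": "KAAAA"})
second_id = add_row(table, {"job": "ref", "project": "database.testing", "status": "finished"})
```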

delete_item(item_id: int) → None

Delete an item from the database.

Parameters:

item_id (int) – Database item ID, e.g. 38

force_reset(fileindex: PyFileIndex | None = None) → None

Reset the cache of the FileTable object.

Parameters:

fileindex (PyFileIndex) – File index for the current directory (optional)

get_child_ids(job_specifier: str | int, project: str | None = None, status: str | None = None) → List[int]

Get the child jobs of a specific job.

Parameters:
  • job_specifier (str/int) – name or job ID of the master job

  • project (str) – project path; note that this differs from the project_path in GenericPath

  • status (str) – only return children with this status; None (default) applies no status filter

Returns:

list of child IDs

Return type:

list
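Below is a minimal sketch of the child lookup. It assumes that a child job records its master's ID in the 'masterid' column and that the master is already given as an ID (the real method also accepts a job name). The helper name `child_ids` and the row layout are hypothetical.

```python
from typing import Dict, List, Optional


def child_ids(rows: List[Dict], master_id: int, status: Optional[str] = None) -> List[int]:
    """Return sorted IDs of rows whose 'masterid' equals master_id.

    If status is given, only children with that status are returned.
    """
    return sorted(
        row["id"]
        for row in rows
        if row.get("masterid") == master_id
        and (status is None or row.get("status") == status)
    )


jobs = [
    {"id": 1, "masterid": None, "status": "running"},
    {"id": 2, "masterid": 1, "status": "finished"},
    {"id": 3, "masterid": 1, "status": "running"},
    {"id": 4, "masterid": 2, "status": "finished"},  # grandchild of job 1, not a direct child
]
```

For example, `child_ids(jobs, 1)` collects only the direct children 2 and 3, and adding `status="finished"` narrows the result to job 2.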

get_db_columns() → List[str]

Get the column names.

Returns:

list of column names, e.g.:

['id', 'parentid', 'masterid', 'projectpath', 'project', 'job', 'subjob', 'chemicalformula', 'status', 'hamilton', 'hamversion', 'username', 'computer', 'timestart', 'timestop', 'totalcputime']

Return type:

list

static get_extract(path: str, mtime: datetime) → dict

Extract job information from a given file path and modification time.

Parameters:
  • path (str) – The file path.

  • mtime (datetime.datetime) – The modification time.

Returns:

A dictionary containing the extracted job information.

Return type:

dict
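get_extract maps a file path and a modification time to a dict of job information. The sketch below shows one plausible mapping under explicit assumptions: the job is stored as `<job_name>.h5`, the containing directory is its project, and the modification time is reported as 'timestop'. The helper name `extract_job_info` and these rules are illustrative. They are not the method's documented output.

```python
import posixpath
from datetime import datetime


def extract_job_info(path: str, mtime: datetime) -> dict:
    """Hypothetical: derive the job name and project directory from a job file path."""
    directory, filename = posixpath.split(path)  # POSIX-style paths assumed
    job_name, _extension = posixpath.splitext(filename)  # "relax.h5" -> "relax"
    return {
        "job": job_name,
        "project": directory + "/",  # project path with a trailing slash
        "timestop": mtime,  # modification time used as the stop time (assumption)
    }


info = extract_job_info("/home/user/projects/demo/relax.h5", datetime(2016, 5, 2, 11, 31, 4))
```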

get_item_by_id(item_id: int) → dict

Get an item from the database by its item ID.

Parameters:

item_id (int) – Database item ID, e.g. 38

Returns:

Dictionary with the column names as keys, e.g.:
{'chemicalformula': 'BO', 'computer': 'localhost', 'hamilton': 'VAMPS', 'hamversion': '1.1', 'id': 1, 'job': 'testing', 'masterid': None, 'parentid': 0, 'project': 'database.testing', 'projectpath': '/root/directory/tmp', 'status': 'KAAAA', 'subjob': 'SubJob', 'timestart': datetime.datetime(2016, 5, 2, 11, 31, 4, 253377), 'timestop': datetime.datetime(2016, 5, 2, 11, 31, 4, 371165), 'totalcputime': 0.117788, 'username': 'Test'}

Return type:

dict

get_items_dict(item_dict: dict, return_all_columns: bool = True) → List[dict]

Get a list of jobs that fulfill the query in the dictionary.

Parameters:
  • item_dict (dict) –

    a dict with a specific query syntax. A plain dict like {'hamilton': 'VAMPE', 'hamversion': '1.1'} corresponds to a simple query like

    select * from table_name where hamilton = 'VAMPE' AND hamversion = '1.1'

    i.e. every key/value pair is combined with AND.

    A list of values expresses an OR, e.g. {'hamilton': ['VAMPE', 'LAMMPS']} corresponds to

    select * from table_name where hamilton = 'VAMPE' OR hamilton = 'LAMMPS'

    A value containing the LIKE wildcard '%' triggers a LIKE search, e.g. {'project': 'database.%'} corresponds to

    select * from table_name where project LIKE 'database.%'

    so adding '%' to a value is enough to switch to a LIKE search.

    All of these can be combined in one query:
    {'hamilton': ['VAMPE', 'LAMMPS'], 'project': 'database%', 'hamversion': '1.1'}

    select * from table_name where (hamilton = 'VAMPE' OR hamilton = 'LAMMPS') AND (project LIKE 'database%') AND hamversion = '1.1'

  • return_all_columns (bool) – if True, return all columns; otherwise return only 'id'. The output format is the same in both cases.

Returns:

a list of dicts; datetime values are not formatted, e.g.:
[{'chemicalformula': 'Ni108', 'computer': 'mapc157', 'hamilton': 'LAMMPS', 'hamversion': '1.1', 'id': 24, 'job': 'DOF_1_0', 'parentid': 21, 'project': 'lammps.phonons.Ni_fcc', 'projectpath': 'D:/PyIron/PyIron_data/projects', 'status': 'finished', 'timestart': datetime.datetime(2016, 6, 24, 10, 17, 3, 140000), 'timestop': datetime.datetime(2016, 6, 24, 10, 17, 3, 173000), 'totalcputime': 0.033, 'username': 'test'},
 {'chemicalformula': 'Ni108', 'computer': 'mapc157', 'hamilton': 'LAMMPS', 'hamversion': '1.1', 'id': 21, 'job': 'ref', 'parentid': 20, 'project': 'lammps.phonons.Ni_fcc', 'projectpath': 'D:/PyIron/PyIron_data/projects', 'status': 'finished', 'timestart': datetime.datetime(2016, 6, 24, 10, 17, 2, 429000), 'timestop': datetime.datetime(2016, 6, 24, 10, 17, 2, 463000), 'totalcputime': 0.034, 'username': 'test'},
 ...]

Return type:

list
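The item_dict query syntax can be mimicked over in-memory rows: AND across keys, OR for list values, and LIKE for values that contain '%'. This is a hypothetical sketch of the semantics only. The helper names are made up, '%' is translated into a regular expression, and the SQL '_' wildcard is ignored.

```python
import re
from typing import Any, Dict, List


def _like(pattern: str, value: Any) -> bool:
    """SQL-LIKE-style match where '%' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, str(value)) is not None


def _matches(condition: Any, value: Any) -> bool:
    if isinstance(condition, list):  # OR over the listed alternatives
        return any(_matches(option, value) for option in condition)
    if isinstance(condition, str) and "%" in condition:  # LIKE search
        return _like(condition, value)
    return value == condition  # plain equality


def query_rows(rows: List[Dict], item_dict: Dict) -> List[Dict]:
    """Keep rows that satisfy every key/condition pair (AND across keys)."""
    return [
        row
        for row in rows
        if all(_matches(cond, row.get(key)) for key, cond in item_dict.items())
    ]


rows = [
    {"id": 1, "hamilton": "VAMPE", "hamversion": "1.1", "project": "database.testing"},
    {"id": 2, "hamilton": "LAMMPS", "hamversion": "1.1", "project": "database.other"},
    {"id": 3, "hamilton": "LAMMPS", "hamversion": "1.0", "project": "lammps.phonons"},
]
query = {"hamilton": ["VAMPE", "LAMMPS"], "project": "database%", "hamversion": "1.1"}
```

With the combined query from the parameter description, only rows 1 and 2 qualify. Row 3 fails both the LIKE condition on 'project' and the equality on 'hamversion'.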

get_job_id(job_specifier: str | int, project: str | None = None) → int

Get a job ID from the file table.

Parameters:
  • job_specifier (str/int) – job ID or job name

  • project (str/None) – project path as a string

Returns:

job ID

Return type:

int/None
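Below is a sketch of resolving a job specifier. It assumes rows with 'id', 'job' and 'project' keys. Integers are matched against IDs and strings against job names. An optional project restricts the search, and None is returned when nothing matches, consistent with the int/None return type. The helper name `resolve_job_id` and the "first match wins" rule are assumptions.

```python
from typing import Dict, List, Optional, Union


def resolve_job_id(
    rows: List[Dict], job_specifier: Union[str, int], project: Optional[str] = None
) -> Optional[int]:
    """Resolve a job name or ID to a job ID; return None if nothing matches."""
    for row in rows:
        if project is not None and row["project"] != project:
            continue  # restrict the search to the given project
        if isinstance(job_specifier, int) and row["id"] == job_specifier:
            return row["id"]
        if isinstance(job_specifier, str) and row["job"] == job_specifier:
            return row["id"]  # first matching name wins
    return None


rows = [
    {"id": 1, "job": "relax", "project": "/p/a/"},
    {"id": 2, "job": "relax", "project": "/p/b/"},
]
```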

get_job_ids(sql_query: str, user: str, project_path: str, recursive: bool = True) → List[int]

Return the job IDs matching a specific query.

Parameters:
  • sql_query (str) – SQL query to enter a more specific request

  • user (str) – username of the user whose user space should be searched

  • project_path (str) – root path; note that this differs from the project_path in GenericPath

  • recursive (bool) – search subprojects [True/False]

Returns:

a list of job IDs

Return type:

list
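The recursive flag decides whether jobs in subprojects are included. A hypothetical sketch, assuming slash-separated project path strings, is a prefix match on the project path. The helper name `select_job_ids` is illustrative, and sql_query and user are omitted.

```python
from typing import Dict, List


def select_job_ids(rows: List[Dict], project_path: str, recursive: bool = True) -> List[int]:
    """Select job IDs in project_path, optionally including subprojects."""
    root = project_path.rstrip("/") + "/"  # trailing slash avoids matching "/p/demo2/"
    selected = []
    for row in rows:
        project = row["project"].rstrip("/") + "/"
        if project == root or (recursive and project.startswith(root)):
            selected.append(row["id"])
    return selected


rows = [
    {"id": 1, "project": "/p/demo/"},
    {"id": 2, "project": "/p/demo/sub/"},
    {"id": 3, "project": "/p/demo2/"},
]
```

Normalizing both paths to end in "/" means a sibling such as "/p/demo2/" is not mistaken for a subproject of "/p/demo/".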

get_job_status(job_id: int) → str

Get the status of a job, selected by its job ID.

Parameters:

job_id (int) – job ID as integer

Returns:

status of the job

Return type:

str

get_job_working_directory(job_id: int) → str | None

Get the working directory of a particular job

Parameters:

job_id (int) – job ID as integer

Returns:

working directory as an absolute path

Return type:

str/None
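Below is a sketch of building a job's working directory path. Purely for illustration, it assumes a layout where the working directory is `<project_path>/<job_name>_hdf5/<job_name>`. This layout and the helper name `job_working_directory` are assumptions, so check them against your own project tree before relying on them.

```python
import posixpath


def job_working_directory(project_path: str, job_name: str) -> str:
    """Hypothetical layout: <project_path>/<job_name>_hdf5/<job_name>."""
    return posixpath.join(project_path, job_name + "_hdf5", job_name)
```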

get_jobs(sql_query: str, user: str, project_path: str, recursive: bool = True, columns: List[str] | None = None) → List[dict]

Internal function that returns the jobs as a dictionary rather than a pandas.DataFrame.

Parameters:
  • sql_query (str) – SQL query to enter a more specific request

  • user (str) – username of the user whose user space should be searched

  • project_path (str) – root path; note that this differs from the project_path in GenericPath

  • recursive (bool) – search subprojects [True/False]

  • columns (list) – by default only the columns ['id', 'project'] are selected, but the user can select a subset of ['id', 'status', 'chemicalformula', 'job', 'subjob', 'project', 'projectpath', 'timestart', 'timestop', 'totalcputime', 'computer', 'hamilton', 'hamversion', 'parentid', 'masterid']

Returns:

columns are used as keys and point to a list of the corresponding values

Return type:

dict
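get_jobs returns a column-oriented dict: each column name is a key pointing to a list of values. Converting row records into that shape can be sketched as follows. `rows_to_columns` is a hypothetical helper, and its ['id', 'project'] default mirrors the documented default columns.

```python
from typing import Dict, List, Optional


def rows_to_columns(rows: List[Dict], columns: Optional[List[str]] = None) -> Dict[str, List]:
    """Turn a list of row dicts into {column: [value of that column in each row]}."""
    if columns is None:
        columns = ["id", "project"]  # documented default selection
    return {col: [row.get(col) for row in rows] for col in columns}


rows = [
    {"id": 1, "project": "a", "status": "finished"},
    {"id": 2, "project": "b", "status": "running"},
]
```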

get_table_headings(table_name: str | None = None) → List[str]

Get the column names; if given, table_name can select one of several tables defined in the database, but subclasses may ignore it.

Parameters:

table_name (str) – simple string with a table name, e.g. 'jobs_username'

Returns:

list of column names, e.g.:

['id', 'parentid', 'masterid', 'projectpath', 'project', 'job', 'subjob', 'chemicalformula', 'status', 'hamilton', 'hamversion', 'username', 'computer', 'timestart', 'timestop', 'totalcputime']

Return type:

list

init_table(fileindex: PyFileIndex, working_dir_lst: List[str] | None = None) → List[dict]

Initialize the file table.

Parameters:
  • fileindex (pandas.DataFrame) – file system index for the current project path

  • working_dir_lst (list/None) – list of working directories

Returns:

list of dictionaries

Return type:

list

job_table(sql_query: str, user: str, project_path: str, recursive: bool = True, columns: List[str] | None = None, all_columns: bool = False, sort_by: str = 'id', max_colwidth: int = 200, full_table: bool = False, element_lst: List[str] | None = None, job_name_contains: str = '', mode: Literal['regex', 'glob'] = 'glob', **kwargs)

Access the job table.

Parameters:
  • sql_query (str) – SQL query to enter a more specific request

  • user (str) – username of the user whose user space should be searched

  • project_path (str) – root path; note that this differs from the project_path in GenericPath

  • recursive (bool) – search subprojects [True/False]

  • columns (list) – by default only the columns ['job', 'project', 'chemicalformula'] are selected, but the user can select a subset of ['id', 'status', 'chemicalformula', 'job', 'subjob', 'project', 'projectpath', 'timestart', 'timestop', 'totalcputime', 'computer', 'hamilton', 'hamversion', 'parentid', 'masterid']

  • all_columns (bool) – select all columns; this overrides the columns option.

  • sort_by (str) – sort by a specific column

  • max_colwidth (int) – maximum column width

  • full_table (bool) – whether to show the entire pandas table

  • element_lst (list) – list of elements required in the chemical formula; None by default

  • job_name_contains (str) – (deprecated) a string that every job name must contain

  • mode (str) – search mode used when kwargs are given: 'glob' (default) or 'regex'

  • **kwargs (dict) – optional filters whose keys match the project database column names (e.g. status="finished"). An asterisk can be used as a wildcard matching zero or more characters.

Returns:

The result as a pandas.DataFrame object

Return type:

pandas.DataFrame
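The mode parameter chooses how keyword filters are matched. Below is a hypothetical standard-library sketch. In 'glob' mode it uses fnmatch.fnmatchcase, so '*' matches zero or more characters, and '?' and '[...]' also act as wildcards. In 'regex' mode it uses re.search. Whether FileTable uses full or partial matching in regex mode is not stated here, so treat that choice and the helper name `filter_rows` as assumptions.

```python
import fnmatch
import re
from typing import Dict, List


def filter_rows(rows: List[Dict], mode: str = "glob", **kwargs) -> List[Dict]:
    """Keep rows where every keyword pattern matches the column of the same name."""
    if mode not in ("glob", "regex"):
        raise ValueError("mode must be 'glob' or 'regex'")

    def matches(pattern: str, value) -> bool:
        text = str(value)
        if mode == "glob":
            return fnmatch.fnmatchcase(text, pattern)  # whole-string glob match
        return re.search(pattern, text) is not None  # regex may match anywhere

    return [row for row in rows if all(matches(p, row.get(k)) for k, p in kwargs.items())]


rows = [
    {"job": "relax_fcc", "status": "finished"},
    {"job": "relax_bcc", "status": "aborted"},
    {"job": "md_fcc", "status": "finished"},
]
```

For example, `filter_rows(rows, status="finished", job="relax*")` keeps only "relax_fcc", because every keyword must match.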

set_job_status(job_id: int, status: str) → None

Set the status of a job.

Parameters:
  • job_id (int) – job ID as integer

  • status (str) – job status

update() → None

Update the file table cache.

property view_mode: bool

Get view_mode; if view_mode is enabled, pyiron has read-only access to the database.

Some implementations do not allow setting this value.

Returns:

True when view_mode is enabled

Return type:

bool
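One way to enforce a read-only view mode is to check the flag before every write operation. The class below is a hypothetical sketch of that pattern. `ReadOnlyGuard` and its `statuses` dict are illustrative, not pyiron's implementation.

```python
from typing import Dict


class ReadOnlyGuard:
    """Hypothetical store that blocks write operations while view_mode is enabled."""

    def __init__(self, view_mode: bool = False):
        self._view_mode = view_mode
        self.statuses: Dict[int, str] = {}

    @property
    def view_mode(self) -> bool:
        # Read-only property: no setter is defined, mirroring implementations
        # that do not allow changing this value.
        return self._view_mode

    def set_job_status(self, job_id: int, status: str) -> None:
        if self._view_mode:
            raise PermissionError("database is in view mode (read-only)")
        self.statuses[job_id] = status


writable = ReadOnlyGuard()
writable.set_job_status(1, "finished")
read_only = ReadOnlyGuard(view_mode=True)
```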