Process Mining Package

This package applies the open-source PM4Py library to the data objects extracted from process event logs through the process data layer. It provides methods for applying process mining techniques to that data to gain insight into the process model.

Submodules

sax.core.process_mining.process_mining module

sax.core.process_mining.process_mining.create_from_dataframe(dataframe, kloop_unroling: bool = False, case_id: str = 'Id', activity_key: str = 'Source', timestamp_key: str = 'Timestamp', lifecycle_type: str = 'Type', timestamp_format: str = '%Y-%m-%d %H:%M:%S%z', starttime_column: str = '') RawEventData

Creates event log from dataframe

Parameters

param dataframe:

event log data as a dataframe

type dataframe:

pd.DataFrame

param kloop_unroling:

whether to perform kloop_unrolling (renaming repetitive activities for further causal discovery)

type kloop_unroling:

boolean

param case_id:

name of the case id column, defaults to XESFormatter.Parameters.CASE_ID

type case_id:

str, optional

param activity_key:

name of the activity column, defaults to XESFormatter.Parameters.ACTIVITY

type activity_key:

str, optional

param timestamp_key:

name of the timestamp column, defaults to XESFormatter.Parameters.TIMESTAMP

type timestamp_key:

str, optional

param lifecycle_type:

name of the event lifecycle column, defaults to XESFormatter.Parameters.TYPE

type lifecycle_type:

str, optional

param timestamp_format:

timestamp format, defaults to XESFormatter.Parameters.TIMESTAMP_FORMAT

type timestamp_format:

str, optional

Returns

return:

Raw event data object

rtype:

RawEventData

Raises

ValueError: If the specified event log data is not in dataframe format
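
The kloop_unroling flag is described above only as renaming repetitive activities. As a rough illustration of that idea, and not the package's actual implementation, the sketch below renames the second and later occurrences of an activity within a case. The function name unroll_loops and the "_<n>" suffix scheme are assumptions made for this example.

```python
from collections import defaultdict


def unroll_loops(events):
    """Rename repeated activities within each case.

    `events` is a list of (case_id, activity) pairs, already ordered by time.
    The "_<n>" suffix scheme is an assumption made for this sketch.
    """
    seen = defaultdict(int)  # (case_id, activity) -> occurrences so far
    unrolled = []
    for case_id, activity in events:
        seen[(case_id, activity)] += 1
        count = seen[(case_id, activity)]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        name = activity if count == 1 else f"{activity}_{count}"
        unrolled.append((case_id, name))
    return unrolled


events = [("c1", "A"), ("c1", "B"), ("c1", "A"), ("c2", "A"), ("c1", "A")]
unrolled = unroll_loops(events)
print(unrolled)
```

Counting per case rather than across the whole log keeps a repetition in one case from renaming the first occurrence in another.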

sax.core.process_mining.process_mining.discover_bpmn_model(dataframe: RawEventData, variants: List[str] | None = None) BPMN

Performs process mining on the event log data to discover a BPMN model

Parameters:
  • dataframe (RawEventData) -- event log

  • variants (List[str]) -- a list of variant names to perform discovery on

Returns:

BPMN

Return type:

BPMN

sax.core.process_mining.process_mining.discover_dfg(dataframe: RawEventData, variants: List[str] | None = None, lifecycleTypes=None)

Apply the DFG mining algorithm on the RawEventData event log object to discover a directly-follows graph (DFG)

Parameters:
  • dataframe (RawEventData) -- event log

  • variants (List[str]) -- a list of variant names to perform discovery on

  • lifecycleTypes (List, optional) -- lifecycle event types to filter, defaults to None

Returns:

dfg

Return type:

sax.core.process_mining.process_mining.discover_heuristics_net(dataframe: RawEventData, variants: List[str] | None = None, lifecycleTypes=None) HeuristicsNet

Apply heuristic mining algorithm on the RawEventData event log object to discover heuristic net

Parameters:
  • dataframe (RawEventData) -- event log

  • variants (List[str]) -- a list of variant names to perform discovery on

  • lifecycleTypes (List, optional) -- lifecycle event types to filter, defaults to None

Returns:

heuristic net

Return type:

HeuristicsNet

sax.core.process_mining.process_mining.discover_process_map(dataframe: RawEventData, variants: List[str] | None = None, lifecycleTypes=None) Tuple[dict, dict, dict]

Discover process map

Parameters:
  • dataframe (RawEventData) -- event log

  • variants (List[str]) -- a list of variant names to perform discovery on

  • lifecycleTypes (List, optional) -- event lifecycle types to filter, defaults to None

Returns:

process map

Return type:

Tuple[dict,dict,dict]

sax.core.process_mining.process_mining.discover_process_tree(dataframe: RawEventData, variants: List[str] | None = None, lifecycleTypes=None) ProcessTree

Perform process mining on the event log to discover process tree

Parameters:
  • dataframe (RawEventData) -- event log

  • variants (List[str]) -- a list of variant names to perform discovery on

  • lifecycleTypes (List, optional) -- lifecycle event types to filter, defaults to None

Returns:

process tree

Return type:

ProcessTree

sax.core.process_mining.process_mining.filter_end_activities(dataframe: RawEventData, activities, variants: List[str] | None = None, retain=True)

Filter cases having an end activity in the provided list

Parameters:
  • dataframe (RawEventData) -- event log

  • activities (List) -- collection of end activities

  • variants (List[str]) -- a list of variant names which represent the variants to explore from the event log

  • retain (bool, optional) -- if True, retain the traces whose end activity is in the given list; if False, drop them

Returns:

filtered dataframe

Return type:

Union[EventLog, pd.DataFrame]

sax.core.process_mining.process_mining.filter_start_activities(dataframe: RawEventData, activities, variants: List[str] | None = None, retain=True)

Filter cases having a start activity in the provided list

Parameters:
  • dataframe (RawEventData) -- event log

  • activities (List) -- collection of start activities

  • variants (List[str]) -- a list of variant names which represent the variants to explore from the event log

  • retain (bool, optional) -- if True, retain the traces whose start activity is in the given list; if False, drop them

Returns:

filtered dataframe

Return type:

Union[EventLog, pd.DataFrame]
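
The effect of the retain flag can be illustrated on plain Python data. This is a minimal sketch under assumed names (filter_by_start_activity and the dictionary-of-traces input are hypothetical), not the package's code, which operates on RawEventData.

```python
def filter_by_start_activity(traces, activities, retain=True):
    """Keep or drop traces depending on their first activity.

    `traces` maps a case id to its ordered list of activity names.
    With retain=True, only traces starting with one of `activities` are kept;
    with retain=False, exactly those traces are dropped.
    """
    allowed = set(activities)
    return {
        case_id: trace
        for case_id, trace in traces.items()
        if trace and (trace[0] in allowed) == retain
    }


traces = {"c1": ["A", "B"], "c2": ["B", "C"], "c3": ["A", "C"]}
kept = filter_by_start_activity(traces, ["A"])
dropped = filter_by_start_activity(traces, ["A"], retain=False)
print(kept)
print(dropped)
```

Testing trace[-1] instead of trace[0] gives the corresponding end-activity filter.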

sax.core.process_mining.process_mining.get_data_process_representation(dataframe: RawEventData, variants: List[str] | None = None)

Takes a raw event log as input and returns a dictionary representation of the process model discovered by mining that event log.

Parameters:
  • dataframe (RawEventData) -- a pandas dataframe containing the raw event log data

  • variants (List[str]) -- a list of variant names which represent the variants to explore from the event log

Returns:

A dictionary representing the process model, where each key is a tuple representing a transition between two activities, and the value is the strength of that transition as determined by the frequency with which it occurs in the event log

Return type:

dict
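
To make the shape of such a dictionary concrete, the sketch below counts how often one activity directly follows another within a case. It assumes that transition strength is a plain frequency count and uses a hypothetical helper, directly_follows_counts, on a list of (case id, activity) pairs; the package's own function works on RawEventData and may weight transitions differently.

```python
from collections import Counter, defaultdict


def directly_follows_counts(events):
    """Count (activity, next_activity) pairs within each case.

    `events` is a list of (case_id, activity) pairs, already ordered by time.
    """
    traces = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)

    counts = Counter()
    for activities in traces.values():
        # Pair every activity with its immediate successor in the same case.
        counts.update(zip(activities, activities[1:]))
    return dict(counts)


events = [
    ("c1", "A"), ("c1", "B"), ("c1", "C"),
    ("c2", "A"), ("c2", "B"), ("c2", "B"),
]
transitions = directly_follows_counts(events)
print(transitions)
```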

sax.core.process_mining.process_mining.get_end_activities(dataframe: RawEventData, variants: List[str] | None = None)

Returns the end activities from a log object

Parameters:
  • dataframe (RawEventData) -- event log

  • variants (List[str]) -- a list of variant names which represent the variants to explore from the event log

Returns:

Dictionary of end activities along with their count

Return type:

dict

sax.core.process_mining.process_mining.get_model_process_representation(model)

sax.core.process_mining.process_mining.get_start_activities(dataframe: RawEventData, variants: List[str] | None = None)

Returns the start activities from a log object

Parameters:
  • dataframe (RawEventData) -- event log

  • variants (List[str]) -- a list of variant names which represent the variants to explore from the event log

Returns:

Dictionary of start activities along with their count

Return type:

dict
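
The "activities along with their count" dictionaries returned by get_start_activities and get_end_activities can be pictured with a small standalone sketch. The helper start_and_end_activities and its list-of-pairs input are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def start_and_end_activities(events):
    """Return (start_counts, end_counts) over all cases.

    `events` is a list of (case_id, activity) pairs, already ordered by time.
    """
    traces = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)

    # Every trace has at least one event, so indexing [0] and [-1] is safe.
    starts = Counter(trace[0] for trace in traces.values())
    ends = Counter(trace[-1] for trace in traces.values())
    return dict(starts), dict(ends)


events = [
    ("c1", "A"), ("c1", "B"), ("c1", "C"),
    ("c2", "A"), ("c2", "B"),
    ("c3", "B"), ("c3", "C"),
]
starts, ends = start_and_end_activities(events)
print(starts)
print(ends)
```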

sax.core.process_mining.process_mining.import_csv(eventlog, kloop_unroling: bool = False, case_id: str = 'Id', activity_key: str = 'Source', timestamp_key: str = 'Timestamp', lifecycle_type: str = 'Type', timestamp_format: str = '%Y-%m-%d %H:%M:%S%z', starttime_column: str = '') RawEventData

Parse CSV file into event log

Parameters

param eventlog:

CSV event log file

type eventlog:

Path to the file

param kloop_unroling:

whether to perform kloop_unrolling (renaming repetitive activities for further causal discovery)

type kloop_unroling:

boolean

param case_id:

name of the case id column, defaults to XESFormatter.Parameters.CASE_ID

type case_id:

str, optional

param activity_key:

name of the activity column, defaults to XESFormatter.Parameters.ACTIVITY

type activity_key:

str, optional

param timestamp_key:

name of the timestamp column, defaults to XESFormatter.Parameters.TIMESTAMP

type timestamp_key:

str, optional

param lifecycle_type:

name of the event lifecycle column, defaults to XESFormatter.Parameters.TYPE

type lifecycle_type:

str, optional

param timestamp_format:

timestamp format, defaults to XESFormatter.Parameters.TIMESTAMP_FORMAT

type timestamp_format:

str, optional

Returns

return:

Raw event data object

rtype:

RawEventData

Raises

FileNotFoundError: If the specified event log file does not exist
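
As an illustration of the default column names and timestamp format in the import_csv signature, the sketch below reads a small in-memory CSV with the standard library. The sample rows, the lifecycle value "complete", and the helper read_events are hypothetical; the package itself returns a RawEventData object rather than a list of dictionaries.

```python
import csv
import io
from datetime import datetime

# Column names match the import_csv defaults: Id, Source, Timestamp, Type.
CSV_TEXT = """Id,Source,Timestamp,Type
c1,Register,2023-01-05 10:30:00+0000,complete
c1,Approve,2023-01-05 11:00:00+0000,complete
"""

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S%z"  # default timestamp_format for CSV


def read_events(text, timestamp_key="Timestamp", timestamp_format=TIMESTAMP_FORMAT):
    """Parse CSV text into rows whose timestamp column is a datetime."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row[timestamp_key] = datetime.strptime(row[timestamp_key], timestamp_format)
        rows.append(row)
    return rows


rows = read_events(CSV_TEXT)
for row in rows:
    print(row["Id"], row["Source"], row["Timestamp"].isoformat())
```

The %z directive expects a UTC offset such as +0000, so timestamps without an offset would need a different timestamp_format.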

sax.core.process_mining.process_mining.import_mxml(eventlog, kloop_unroling: bool = False, case_id: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', lifecycle_type: str = 'lifecycle:transition', timestamp_format: str = '%Y-%m-%d %H:%M:%S.%f') RawEventData

Parse MXML file into event log

Parameters

param eventlog:

MXML event log file

type eventlog:

Path to the file

param kloop_unroling:

whether to perform kloop_unrolling (renaming repetitive activities for further causal discovery)

type kloop_unroling:

boolean

param case_id:

name of the case id column, defaults to XESFormatter.Parameters.CASE_ID

type case_id:

str, optional

param activity_key:

name of the activity column, defaults to XESFormatter.Parameters.ACTIVITY

type activity_key:

str, optional

param timestamp_key:

name of the timestamp column, defaults to XESFormatter.Parameters.TIMESTAMP

type timestamp_key:

str, optional

param lifecycle_type:

name of the event lifecycle column, defaults to XESFormatter.Parameters.TYPE

type lifecycle_type:

str, optional

param timestamp_format:

timestamp format, defaults to XESFormatter.Parameters.TIMESTAMP_FORMAT

type timestamp_format:

str, optional

Returns

return:

Raw event data object

rtype:

RawEventData

Raises

ValueError: If the specified event log data is not in MXML format

sax.core.process_mining.process_mining.import_xes(eventlog, kloop_unroling: bool = False, case_id: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', lifecycle_type: str = 'lifecycle:transition', timestamp_format: str = '%Y-%m-%d %H:%M:%S.%f') RawEventData

Parse XES file into event log

Parameters

param eventlog:

XES event log file

type eventlog:

Path to the file

param kloop_unroling:

whether to perform kloop_unrolling (renaming repetitive activities for further causal discovery)

type kloop_unroling:

boolean

param case_id:

name of the case id column, defaults to XESFormatter.Parameters.CASE_ID

type case_id:

str, optional

param activity_key:

name of the activity column, defaults to XESFormatter.Parameters.ACTIVITY

type activity_key:

str, optional

param timestamp_key:

name of the timestamp column, defaults to XESFormatter.Parameters.TIMESTAMP

type timestamp_key:

str, optional

param lifecycle_type:

name of the event lifecycle column, defaults to XESFormatter.Parameters.TYPE

type lifecycle_type:

str, optional

param timestamp_format:

timestamp format, defaults to XESFormatter.Parameters.TIMESTAMP_FORMAT

type timestamp_format:

str, optional

Returns

return:

Raw event data object

rtype:

RawEventData

Raises

FileNotFoundError: If the specified event log file does not exist
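
The default key names in the import_xes signature (case:concept:name, concept:name, time:timestamp, lifecycle:transition) correspond to attribute keys in an XES document. The sketch below flattens a tiny XES-like snippet into one row per event to show where those names come from. It is an assumption-laden illustration: the helper flatten_xes is hypothetical, the "case:" prefix for trace-level attributes is chosen to match the default case_id, and the sample omits the XML namespace that real XES files usually declare.

```python
import xml.etree.ElementTree as ET

XES_TEXT = """<log>
  <trace>
    <string key="concept:name" value="c1"/>
    <event>
      <string key="concept:name" value="Register"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2023-01-05T10:30:00.000+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="Approve"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2023-01-05T11:00:00.000+00:00"/>
    </event>
  </trace>
</log>"""


def flatten_xes(text):
    """Return one dict per event, with trace attributes prefixed by 'case:'."""
    rows = []
    for trace in ET.fromstring(text).iter("trace"):
        # Trace-level attributes are copied onto every event of the trace.
        case_attrs = {
            "case:" + child.get("key"): child.get("value")
            for child in trace
            if child.tag != "event"
        }
        for event in trace.iter("event"):
            row = dict(case_attrs)
            row.update({child.get("key"): child.get("value") for child in event})
            rows.append(row)
    return rows


rows = flatten_xes(XES_TEXT)
for row in rows:
    print(row["case:concept:name"], row["concept:name"], row["time:timestamp"])
```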

sax.core.process_mining.process_mining.view_bpmn_model(bpmn_model: BPMN)

Create a view of the BPMN model

Parameters:

bpmn_model (BPMN) -- BPMN model

sax.core.process_mining.process_mining.view_dfg(dfg: dict, formatted_log)

Create view of the dfg

sax.core.process_mining.process_mining.view_heuristics_net(map: HeuristicsNet)

Create view of the heuristic net

Parameters:

map (HeuristicsNet) -- Heuristic net

sax.core.process_mining.process_mining.view_process_map(dfg, start_activities, end_activities)

Create a view of process map

Parameters:
  • dfg (DFG) -- dfg

  • start_activities (List) -- list of start activities

  • end_activities (List) -- list of end activities

sax.core.process_mining.process_mining.view_process_tree(process_tree: ProcessTree)

Create a view of the process tree

Parameters:

process_tree (ProcessTree) -- process tree

Module contents