Process Mining Package
This package applies the open-source PM4PY library to the data objects extracted from process event logs through the process data layer, and provides methods for applying process mining techniques to that data to gain insight into the process model.
Submodules
sax.core.process_mining.process_mining module
- sax.core.process_mining.process_mining.create_from_dataframe(dataframe, kloop_unroling: bool = False, case_id: str = 'Id', activity_key: str = 'Source', timestamp_key: str = 'Timestamp', lifecycle_type: str = 'Type', timestamp_format: str = '%Y-%m-%d %H:%M:%S%z', starttime_column: str = '') -> RawEventData
Creates an event log from a dataframe
Parameters
- param dataframe:
event log data
- type dataframe:
pandas dataframe
- param kloop_unroling:
whether to perform kloop_unrolling (renaming repetitive activities for further causal discovery)
- type kloop_unroling:
boolean
- param case_id:
name of the case id column, defaults to XESFormatter.Parameters.CASE_ID
- type case_id:
str, optional
- param activity_key:
name of the activity column, defaults to XESFormatter.Parameters.ACTIVITY
- type activity_key:
str, optional
- param timestamp_key:
name of the timestamp column, defaults to XESFormatter.Parameters.TIMESTAMP
- type timestamp_key:
str, optional
- param lifecycle_type:
name of the event lifecycle column, defaults to XESFormatter.Parameters.TYPE
- type lifecycle_type:
str, optional
- param timestamp_format:
timestamp format, defaults to XESFormatter.Parameters.TIMESTAMP_FORMAT
- type timestamp_format:
str, optional
Returns
- return:
Raw event data object
- rtype:
RawEventData
Raises
ValueError: If the specified event log data is not in dataframe format
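The `kloop_unroling` flag is described above only as "renaming repetitive activities". The sketch below illustrates that idea in plain Python; it is not the package's implementation. The function name `unroll_repeated_activities` and the `_2`, `_3` suffix scheme are assumptions made for illustration.

```python
from collections import defaultdict


def unroll_repeated_activities(events):
    """Rename repeated activities within each case.

    `events` is a list of (case_id, activity) pairs, already ordered by time.
    The first occurrence of an activity in a case keeps its name; every later
    occurrence gets a numeric suffix (hypothetical naming scheme).
    """
    seen = defaultdict(int)  # (case_id, activity) -> occurrences so far
    renamed = []
    for case_id, activity in events:
        seen[(case_id, activity)] += 1
        count = seen[(case_id, activity)]
        label = activity if count == 1 else f"{activity}_{count}"
        renamed.append((case_id, label))
    return renamed


events = [("c1", "A"), ("c1", "B"), ("c1", "A"), ("c2", "A"), ("c1", "A")]
unrolled = unroll_repeated_activities(events)
print(unrolled)
```

After renaming, each activity label occurs at most once per case, which is what makes the log easier to use for later causal discovery.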
- sax.core.process_mining.process_mining.discover_bpmn_model(dataframe: RawEventData, variants: List[str] | None = None) -> BPMN
Performs process mining on the event log data to discover a BPMN model
- Parameters:
dataframe (RawEventData) -- event log
variants (List[str]) -- a list of variant names to perform discovery on
- Returns:
BPMN
- Return type:
BPMN
- sax.core.process_mining.process_mining.discover_dfg(dataframe: RawEventData, variants: List[str] | None = None, lifecycleTypes=None)
Apply the DFG mining algorithm on the RawEventData event log object to discover a directly-follows graph (DFG)
- Parameters:
dataframe (RawEventData) -- event log
variants (List[str]) -- a list of variant names to perform discovery on
lifecycleTypes (List, optional) -- lifecycle event types to filter, defaults to None
- Returns:
dfg
- Return type:
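For readers unfamiliar with directly-follows graphs, the following sketch shows how one can be counted from an ordered event list. It is a plain-Python illustration under stated assumptions, not the code behind `discover_dfg`, whose return type is not documented here. The helper name `directly_follows_counts` and the sample activities are hypothetical.

```python
from collections import Counter, defaultdict


def directly_follows_counts(events):
    """Count how often activity `a` is directly followed by `b` in a case.

    `events` is a list of (case_id, activity) pairs, ordered by time.
    """
    traces = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)

    dfg = Counter()
    for trace in traces.values():
        # Pair each activity with its immediate successor in the same case.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dict(dfg)


events = [
    ("c1", "register"), ("c1", "check"), ("c1", "approve"),
    ("c2", "register"), ("c2", "check"), ("c2", "reject"),
]
dfg = directly_follows_counts(events)
print(dfg)
```

Keying the dictionary by `(source, target)` tuples keeps the edge weights easy to look up and to filter.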
- sax.core.process_mining.process_mining.discover_heuristics_net(dataframe: RawEventData, variants: List[str] | None = None, lifecycleTypes=None) -> HeuristicsNet
Apply the heuristics mining algorithm on the RawEventData event log object to discover a heuristics net
- Parameters:
dataframe (RawEventData) -- event log
variants (List[str]) -- a list of variant names to perform discovery on
lifecycleTypes (List, optional) -- lifecycle event types to filter, defaults to None
- Returns:
heuristic net
- Return type:
HeuristicsNet
- sax.core.process_mining.process_mining.discover_process_map(dataframe: RawEventData, variants: List[str] | None = None, lifecycleTypes=None) -> Tuple[dict, dict, dict]
Discover process map
- Parameters:
dataframe (RawEventData) -- event log
variants (List[str]) -- a list of variant names to perform discovery on
lifecycleTypes (List, optional) -- event lifecycle types to filter, defaults to None
- Returns:
process map
- Return type:
Tuple[dict,dict,dict]
- sax.core.process_mining.process_mining.discover_process_tree(dataframe: RawEventData, variants: List[str] | None = None, lifecycleTypes=None) -> ProcessTree
Perform process mining on the event log to discover a process tree
- Parameters:
dataframe (RawEventData) -- event log
variants (List[str]) -- a list of variant names to perform discovery on
lifecycleTypes (List, optional) -- lifecycle event types to filter, defaults to None
- Returns:
process tree
- Return type:
ProcessTree
- sax.core.process_mining.process_mining.filter_end_activities(dataframe: RawEventData, activities, variants: List[str] | None = None, retain=True)
Filter cases having an end activity in the provided list
- Parameters:
dataframe (RawEventData) -- event log
activities (List) -- collection of end activities
variants (List[str]) -- a list of variant names which represent the variants to explore from the event log
retain (bool, optional) -- if True, retain the traces that end with one of the given activities; if False, drop them. Defaults to True
- Returns:
filtered dataframe
- Return type:
Union[EventLog, pd.DataFrame]
- sax.core.process_mining.process_mining.filter_start_activities(dataframe: RawEventData, activities, variants: List[str] | None = None, retain=True)
Filter cases having a start activity in the provided list
- Parameters:
dataframe (RawEventData) -- event log
activities (List) -- collection of start activities
variants (List[str]) -- a list of variant names which represent the variants to explore from the event log
retain (bool, optional) -- if True, retain the traces that start with one of the given activities; if False, drop them. Defaults to True
- Returns:
filtered dataframe
- Return type:
Union[EventLog, pd.DataFrame]
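The effect of the `retain` flag can be illustrated with a small stand-alone sketch. It works on a plain dictionary of traces rather than on `RawEventData`, and the helper name `filter_by_start_activity` is hypothetical; it only mirrors the behaviour described above.

```python
def filter_by_start_activity(traces, activities, retain=True):
    """Keep (retain=True) or drop (retain=False) the cases whose first
    activity is in `activities`.

    `traces` maps a case id to its ordered list of activities.
    Empty traces are always dropped.
    """
    allowed = set(activities)
    return {
        case_id: trace
        for case_id, trace in traces.items()
        if trace and ((trace[0] in allowed) == retain)
    }


traces = {
    "c1": ["register", "check"],
    "c2": ["check", "approve"],
    "c3": ["register", "reject"],
}
kept = filter_by_start_activity(traces, ["register"])
dropped = filter_by_start_activity(traces, ["register"], retain=False)
print(sorted(kept), sorted(dropped))
```

The same pattern applies to end activities by testing `trace[-1]` instead of `trace[0]`.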
- sax.core.process_mining.process_mining.get_data_process_representation(dataframe: RawEventData, variants: List[str] | None = None)
Takes a raw event log as input and outputs a dictionary representation of the process model discovered when mining this event log.
- Parameters:
dataframe (RawEventData) -- a pandas dataframe containing the raw event log data
variants (List[str]) -- a list of variant names which represent the variants to explore from the event log
- Returns:
A dictionary representing the process model, where each key is a tuple representing a transition between two activities, and the value is the strength of that transition as determined by the frequency with which it occurs in the event log
- Return type:
dict
- sax.core.process_mining.process_mining.get_end_activities(dataframe: RawEventData, variants: List[str] | None = None)
Returns the end activities from a log object
- Parameters:
dataframe (RawEventData) -- event log
variants (List[str]) -- a list of variant names which represent the variants to explore from the event log
- Returns:
Dictionary of end activities along with their count
- Return type:
dict
- sax.core.process_mining.process_mining.get_model_process_representation(model)
- sax.core.process_mining.process_mining.get_start_activities(dataframe: RawEventData, variants: List[str] | None = None)
Returns the start activities from a log object
- Parameters:
dataframe (RawEventData) -- event log
variants (List[str]) -- a list of variant names which represent the variants to explore from the event log
- Returns:
Dictionary of start activities along with their count
- Return type:
dict
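Both `get_start_activities` and `get_end_activities` are documented as returning a dictionary of activities with their counts. The sketch below shows what such dictionaries look like for a tiny hand-written log. It is an illustration only; the helper `start_and_end_activity_counts` and the sample traces are assumptions, not part of the package.

```python
from collections import Counter


def start_and_end_activity_counts(traces):
    """Return two dicts: first-activity counts and last-activity counts.

    `traces` is a list of traces, each an ordered list of activity names.
    Empty traces are ignored.
    """
    starts = Counter(trace[0] for trace in traces if trace)
    ends = Counter(trace[-1] for trace in traces if trace)
    return dict(starts), dict(ends)


traces = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["check", "approve"],
]
starts, ends = start_and_end_activity_counts(traces)
print(starts, ends)
```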
- sax.core.process_mining.process_mining.import_csv(eventlog, kloop_unroling: bool = False, case_id: str = 'Id', activity_key: str = 'Source', timestamp_key: str = 'Timestamp', lifecycle_type: str = 'Type', timestamp_format: str = '%Y-%m-%d %H:%M:%S%z', starttime_column: str = '') -> RawEventData
Parse CSV file into event log
Parameters
- param eventlog:
CSV event log file
- type eventlog:
Path to the file
- param kloop_unroling:
whether to perform kloop_unrolling (renaming repetitive activities for further causal discovery)
- type kloop_unroling:
boolean
- param case_id:
name of the case id column, defaults to XESFormatter.Parameters.CASE_ID
- type case_id:
str, optional
- param activity_key:
name of the activity column, defaults to XESFormatter.Parameters.ACTIVITY
- type activity_key:
str, optional
- param timestamp_key:
name of the timestamp column, defaults to XESFormatter.Parameters.TIMESTAMP
- type timestamp_key:
str, optional
- param lifecycle_type:
name of the event lifecycle column, defaults to XESFormatter.Parameters.TYPE
- type lifecycle_type:
str, optional
- param timestamp_format:
timestamp format, defaults to XESFormatter.Parameters.TIMESTAMP_FORMAT
- type timestamp_format:
str, optional
Returns
- return:
Raw event data object
- rtype:
RawEventData
Raises
FileNotFoundError: If the specified event log file does not exist
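To make the default column names and timestamp format from the `import_csv` signature concrete, the sketch below parses a two-row CSV with the standard library only. It does not call `import_csv`; the sample rows and the lifecycle value `complete` are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Column names follow the signature defaults: 'Id', 'Source', 'Timestamp', 'Type'.
CSV_TEXT = """Id,Source,Timestamp,Type
c1,register,2024-01-05 09:00:00+0000,complete
c1,check,2024-01-05 09:30:00+0000,complete
"""

# Default timestamp_format from the signature; %z expects a UTC offset.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S%z"

rows = []
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    row["Timestamp"] = datetime.strptime(row["Timestamp"], TIMESTAMP_FORMAT)
    rows.append(row)

# Timezone-aware timestamps can be subtracted directly.
elapsed = rows[1]["Timestamp"] - rows[0]["Timestamp"]
print(elapsed.total_seconds())
```

If the timestamps in a file carry no UTC offset, a format ending in `%z` will fail to parse them, so `timestamp_format` would need to be adjusted to match the data.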
- sax.core.process_mining.process_mining.import_mxml(eventlog, kloop_unroling: bool = False, case_id: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', lifecycle_type: str = 'lifecycle:transition', timestamp_format: str = '%Y-%m-%d %H:%M:%S.%f') -> RawEventData
Parse MXML file into event log
Parameters
- param eventlog:
MXML event log file
- type eventlog:
Path to the file
- param kloop_unroling:
whether to perform kloop_unrolling (renaming repetitive activities for further causal discovery)
- type kloop_unroling:
boolean
- param case_id:
name of the case id column, defaults to XESFormatter.Parameters.CASE_ID
- type case_id:
str, optional
- param activity_key:
name of the activity column, defaults to XESFormatter.Parameters.ACTIVITY
- type activity_key:
str, optional
- param timestamp_key:
name of the timestamp column, defaults to XESFormatter.Parameters.TIMESTAMP
- type timestamp_key:
str, optional
- param lifecycle_type:
name of the event lifecycle column, defaults to XESFormatter.Parameters.TYPE
- type lifecycle_type:
str, optional
- param timestamp_format:
timestamp format, defaults to XESFormatter.Parameters.TIMESTAMP_FORMAT
- type timestamp_format:
str, optional
Returns
- return:
Raw event data object
- rtype:
RawEventData
Raises
ValueError: If the specified event log data is not in MXML format
- sax.core.process_mining.process_mining.import_xes(eventlog, kloop_unroling: bool = False, case_id: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', lifecycle_type: str = 'lifecycle:transition', timestamp_format: str = '%Y-%m-%d %H:%M:%S.%f') -> RawEventData
Parse XES file into event log
Parameters
- param eventlog:
XES event log file
- type eventlog:
Path to the file
- param kloop_unroling:
whether to perform kloop_unrolling (renaming repetitive activities for further causal discovery)
- type kloop_unroling:
boolean
- param case_id:
name of the case id column, defaults to XESFormatter.Parameters.CASE_ID
- type case_id:
str, optional
- param activity_key:
name of the activity column, defaults to XESFormatter.Parameters.ACTIVITY
- type activity_key:
str, optional
- param timestamp_key:
name of the timestamp column, defaults to XESFormatter.Parameters.TIMESTAMP
- type timestamp_key:
str, optional
- param lifecycle_type:
name of the event lifecycle column, defaults to XESFormatter.Parameters.TYPE
- type lifecycle_type:
str, optional
- param timestamp_format:
timestamp format, defaults to XESFormatter.Parameters.TIMESTAMP_FORMAT
- type timestamp_format:
str, optional
Returns
- return:
Raw event data object
- rtype:
RawEventData
Raises
FileNotFoundError: If the specified event log file does not exist
- sax.core.process_mining.process_mining.view_bpmn_model(bpmn_model: BPMN)
Create a view of the BPMN model
- Parameters:
bpmn_model (BPMN) -- BPMN model
- sax.core.process_mining.process_mining.view_dfg(dfg: dict, formatted_log)
Create a view of the DFG
- sax.core.process_mining.process_mining.view_heuristics_net(map: HeuristicsNet)
Create a view of the heuristics net
- Parameters:
map (HeuristicsNet) -- Heuristic net
- sax.core.process_mining.process_mining.view_process_map(dfg, start_activities, end_activities)
Create a view of process map
- Parameters:
dfg (DFG) -- dfg
start_activities (List) -- list of start activities
end_activities (List) -- list of end activities
- sax.core.process_mining.process_mining.view_process_tree(process_tree: ProcessTree)
Create a view of the process tree
- Parameters:
process_tree (ProcessTree) -- process tree
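The functions in this module can be chained into an import, discover, view pipeline. The sketch below uses only the signatures listed in this reference and assumes the package is installed; the wrapper name `discover_and_view_heuristics_net` and the file path are hypothetical, and how the view is rendered depends on the environment.

```python
def discover_and_view_heuristics_net(xes_path):
    """Import an XES log, mine a heuristics net from it, and display it."""
    # Imported lazily so that defining this helper does not require the package.
    from sax.core.process_mining import process_mining as pm

    event_log = pm.import_xes(xes_path)          # documented to return RawEventData
    net = pm.discover_heuristics_net(event_log)  # documented to return HeuristicsNet
    pm.view_heuristics_net(net)
    return net


# Example call (the path is a placeholder):
# discover_and_view_heuristics_net("path/to/log.xes")
```

The other `discover_*` and `view_*` pairs follow the same shape, with the discovered object passed to its matching view function.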