Process Data package
The package contains data container classes for the different modules and their corresponding perspectives:
- sax.process_data.data.BaseProcessDataObject: the base class for all data containers.
- sax.process_data.raw_event_data.RawEventData: a raw process event log, where each row in the dataframe represents a single activity in the process together with its attributes. This representation is useful for process mining.
- sax.process_data.tabular_data.TabularEventData: a tabular view of the event log, where each row represents the single trace of one process case. This representation is useful for causal execution dependency discovery.
Additionally, the package contains a base class for representing discovery results, sax.process_data.discovery_result.ResultInfo, which is extended by the process and causal discovery modules.
The package also contains a sub-package with formatter implementations that convert standard event log file formats (XES, MXML, and CSV) into data objects.
Subpackages
Submodules
sax.core.process_data.data module
- class sax.core.process_data.data.BaseProcessDataObject(data: DataFrame, mandatory_properties: dict, optional_properties: dict)
Bases:
object
Data object for process event log data. The data object contains a dataframe created from one of the supported event log formats, holding basic process event log data such as trace id, activity name, and activity timestamp.
- copy()
Make a deep copy of the data object.
- Returns:
A copy of the BaseProcessDataObject instance.
- Return type:
BaseProcessDataObject
- getActivitiesForTrace(pid: str) → list[str]
Return a list of all activities for the provided case id
- Parameters:
pid (str) -- case id of the trace
- Returns:
List of all activity names
- Return type:
list[str]
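As a rough illustration of what this lookup amounts to, the sketch below reproduces it with plain pandas rather than the sax API. The column names `case_id`, `activity`, and `timestamp`, and the helper `activities_for_trace`, are assumptions made for the example. Ordering by timestamp is also an assumption, since the reference above does not state the order of the returned list.

```python
import pandas as pd

# Hypothetical event log: one row per activity occurrence.
events = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c2", "c1"],
        "activity": ["register", "review", "register", "approve"],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 09:00",
                "2024-01-01 10:00",
                "2024-01-02 09:00",
                "2024-01-01 11:00",
            ]
        ),
    }
)


def activities_for_trace(df: pd.DataFrame, pid: str) -> list[str]:
    """Return the activity names of one case, ordered by timestamp (assumed order)."""
    trace = df[df["case_id"] == pid].sort_values("timestamp")
    return trace["activity"].tolist()


print(activities_for_trace(events, "c1"))
```

An unknown case id yields an empty list in this sketch; how the real method handles that case is not documented here.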
- getCaseIdColumnName() → str
Get the name of the column representing case id
- Returns:
CaseId column name
- Return type:
str
- getData() → DataFrame
Return the dataframe representing the event log part of the data object
- Returns:
A dataframe representing the event log
- Return type:
DataFrame
- getLength() → int
Return the number of traces in the event log
- Returns:
Length of the event log
- Return type:
int
- getMandatoryProperties() → dict
Get mandatory properties of the data object
- Returns:
Mandatory properties mapping
- Return type:
dict
- getOptionalProperties() → dict
Get optional properties of the data object
- Returns:
Optional properties mapping
- Return type:
dict
- getTrace(pid: str) → Dict[str, Any]
Return all the information regarding the given trace in a dictionary, where the keys are the activity names and the values are the activity payloads
- Returns:
A dictionary representing a single trace
- Return type:
Dict[str, List[str]]
- getVariants() → dict
Return a dictionary mapping each process variant in the event log to the number of traces that follow that variant
- Returns:
Mapping of each process variant in the event log to the number of traces for that variant
- Return type:
dict
sax.core.process_data.discovery_result module
- class sax.core.process_data.discovery_result.ResultInfo
Bases:
object
Object holding the result of the discovery on a particular dimension. The purpose of the object is to summarize and return the discovery results.
- abstract getDiscoveryResult() → str
Return the discovery results summarized as a string
- Returns:
Discovery result
- Return type:
str
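The pattern described above, an abstract base whose subclasses summarize their results as a string, can be sketched with the standard `abc` module. This is a stand-in and not the sax implementation: `ResultInfoSketch` and `EdgeListResult` are hypothetical names, and the edge-list payload is invented for illustration.

```python
from abc import ABC, abstractmethod


class ResultInfoSketch(ABC):
    """Stand-in for the documented base class (not the sax implementation)."""

    @abstractmethod
    def getDiscoveryResult(self) -> str:
        """Return the discovery results summarized as a string."""


class EdgeListResult(ResultInfoSketch):
    """Hypothetical result type holding discovered activity-to-activity edges."""

    def __init__(self, edges: list[tuple[str, str]]) -> None:
        self.edges = edges

    def getDiscoveryResult(self) -> str:
        # Summarize each edge as "source -> target", separated by semicolons.
        return "; ".join(f"{src} -> {dst}" for src, dst in self.edges)


result = EdgeListResult([("register", "review"), ("review", "approve")])
print(result.getDiscoveryResult())
```

Because `getDiscoveryResult` is abstract, the base class in this sketch cannot be instantiated directly; only concrete subclasses can.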
sax.core.process_data.raw_event_data module
- class sax.core.process_data.raw_event_data.RawEventData(data: DataFrame, mandatory_properties: dict, optional_properties: dict)
Bases:
BaseProcessDataObject
Raw event representation of event log, where each row represents a single event in a trace (one activity), and the columns are the activity name, activity timestamp, and additional activity payload attributes
- copy()
Creates a copy of the current RawEventData object.
Returns
- copied_object : RawEventData
A copy of the current RawEventData object.
- filterLifecycleEvents(lifecycleTypes) → RawEventData
Filter the data according to the provided list of desired event lifecycle types, and return a new data object containing only the chosen lifecycle event types
Parameters
- lifecycleTypes : List
List of chosen lifecycle event types (such as 'complete').
Returns
- RawEventData
A new data object containing only the chosen event types.
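The filtering step can be pictured with a small pandas sketch. It does not call the sax API: the `lifecycle` column name and the helper `filter_lifecycle_events` are assumptions, and the real method returns a RawEventData object rather than a bare dataframe.

```python
import pandas as pd

# Hypothetical event log with a lifecycle column (the column name is an assumption).
events = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c1", "c2"],
        "activity": ["register", "register", "review", "register"],
        "lifecycle": ["start", "complete", "complete", "start"],
    }
)


def filter_lifecycle_events(df: pd.DataFrame, lifecycle_types: list[str]) -> pd.DataFrame:
    """Return a new frame holding only rows whose lifecycle type was requested."""
    mask = df["lifecycle"].isin(lifecycle_types)
    return df[mask].reset_index(drop=True)


completed = filter_lifecycle_events(events, ["complete"])
print(completed)
```

The input frame is left unchanged, which mirrors the documented behaviour of returning a new data object.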
- filterVariants(variant_keys: List[str]) → RawEventData
Get specific variants data for specified variants and return a new RawEventData object only with the filtered variants data.
Parameters
- variant_keys : List[str]
The list of variant names to retrieve.
Returns
- variant : RawEventData
The RawEventData object representing the specified variant subset of traces.
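A minimal pandas sketch of variant filtering follows, assuming variant names are comma-separated activity sequences. The column names and the helper `filter_variants` are hypothetical, and the sketch returns a dataframe where the real method returns a RawEventData object.

```python
import pandas as pd

# Hypothetical event log: cases c1 and c3 follow the same variant.
events = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c2", "c2", "c3", "c3"],
        "activity": ["register", "approve", "register", "reject", "register", "approve"],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 09:00",
                "2024-01-01 10:00",
                "2024-01-02 09:00",
                "2024-01-02 10:00",
                "2024-01-03 09:00",
                "2024-01-03 10:00",
            ]
        ),
    }
)


def filter_variants(df: pd.DataFrame, variant_keys: list[str]) -> pd.DataFrame:
    """Keep only the rows of cases whose variant name is in variant_keys."""
    ordered = df.sort_values(["case_id", "timestamp"])
    # One variant name per case: its activities joined in order of occurrence.
    case_variant = ordered.groupby("case_id")["activity"].agg(",".join)
    keep_cases = case_variant[case_variant.isin(variant_keys)].index
    return df[df["case_id"].isin(keep_cases)].reset_index(drop=True)


subset = filter_variants(events, ["register,approve"])
print(subset)
```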
- getLog()
Return the pm4py event log representing this dataframe object
Parameters
None
Returns
- log : pm4py event log
The pm4py event log representing this dataframe object
- getMandatoryPropertiesData()
Return the mandatory properties columns: caseID, activity and event lifecycle columns and timestamp columns
Parameters
None
Returns
- return:
mandatory properties data
- getOptionalPropertiesData()
Return the optional properties columns
Parameters
None
Returns
- return:
optional properties data
- getVariants() → dict
Get all variants in the event log, along with the number of traces for each variant
Returns
A dictionary where each key is a variant name (a comma-separated list of all activities in the variant, in order of occurrence) and the corresponding value is the number of traces for that variant.
- getVariantsKeys() → dict
Get the names of all variants in the event log
Returns
A dictionary where each key is a variant name (a comma-separated list of all activities in the variant, in order of occurrence) and the corresponding value is a list of the case ids of all traces in that variant.
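The two dictionary shapes described for getVariants and getVariantsKeys can be reproduced with plain pandas, as sketched below. This is not the sax implementation: the column names are assumptions, and ordering activities by timestamp is assumed to match "order of occurrence".

```python
import pandas as pd

# Hypothetical event log: cases c1 and c3 share a variant, c2 differs.
events = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c2", "c2", "c3", "c3"],
        "activity": ["register", "approve", "register", "reject", "register", "approve"],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 09:00",
                "2024-01-01 10:00",
                "2024-01-02 09:00",
                "2024-01-02 10:00",
                "2024-01-03 09:00",
                "2024-01-03 10:00",
            ]
        ),
    }
)

ordered = events.sort_values(["case_id", "timestamp"])
# One variant name per case: its activities joined in order of occurrence.
case_variant = ordered.groupby("case_id")["activity"].agg(",".join)

# Variant name -> number of traces (the shape described for getVariants).
variant_counts = case_variant.value_counts().to_dict()

# Variant name -> list of case ids (the shape described for getVariantsKeys).
variant_cases: dict[str, list[str]] = {}
for case_id, variant in case_variant.items():
    variant_cases.setdefault(variant, []).append(case_id)

print(variant_counts)
print(variant_cases)
```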
- transposeFullDataframe() → DataFrame
- transposeToTabular() → TabularEventData
Transposes the event log data object into a new data object where each trace is represented by a single row (instead of one row per activity). The columns are activity names, and the column values are the end-time timestamps of those activities.
Returns
- data object in tabular format : TabularEventData
A new data object in tabular format.
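The row-per-activity to row-per-trace reshaping can be sketched in pandas as follows. The column names are assumptions, and taking the latest timestamp when an activity repeats within a case is a choice made for this example, not documented behaviour of the real method.

```python
import pandas as pd

# Hypothetical raw event log: one row per activity occurrence.
events = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c2", "c2"],
        "activity": ["register", "approve", "register", "reject"],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 09:00",
                "2024-01-01 10:00",
                "2024-01-02 09:00",
                "2024-01-02 10:00",
            ]
        ),
    }
)

# One row per case, one column per activity, each cell holding that activity's
# timestamp. Repeated activities are collapsed with max() (an assumption).
tabular = (
    events.groupby(["case_id", "activity"])["timestamp"]
    .max()
    .unstack("activity")
    .reset_index()
)
tabular.columns.name = None  # drop the leftover "activity" axis label

print(tabular)
```

Activities that never occur in a case end up as missing timestamps (NaT) in that case's row.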
sax.core.process_data.tabular_data module
- class sax.core.process_data.tabular_data.TabularEventData(data: DataFrame, mandatory_properties: dict, optional_properties: dict)
Bases:
BaseProcessDataObject
Tabular representation of an event log, where each row represents a full trace and each column represents a particular activity within the trace. The column names are the activity names, and the column values are the activity completion timestamps. Additional columns represent activity attributes; each attribute column name has the form
<activityName>__<attributeName>
- getActivitiesAttributesData() → DataFrame
Return all optional attributes columns content
Parameters
None
Returns
- return:
optional attribute columns content
- rtype:
DataFrame
- getActivitiesData() → DataFrame
Return the content of activities columns
Parameters
None
Returns
- return:
activities columns content
- rtype:
DataFrame
- getActivityAttributesData(activityName) → DataFrame
Return attribute columns for the specified activity
Parameters
- param activityName:
name of the activity whose attribute columns are returned
- type activityName:
str
Returns
- return:
content of attribute columns for the specified activity
- rtype:
DataFrame
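Given the `<activityName>__<attributeName>` naming convention, selecting the attribute columns of one activity can be sketched as a prefix match. The dataframe contents and the helper `activity_attribute_columns` are hypothetical; the real method may select columns differently.

```python
import pandas as pd

# Hypothetical tabular event data: activity columns hold completion timestamps,
# attribute columns are named <activityName>__<attributeName>.
tabular = pd.DataFrame(
    {
        "case_id": ["c1", "c2"],
        "register": pd.to_datetime(["2024-01-01 09:00", "2024-01-02 09:00"]),
        "approve": pd.to_datetime(["2024-01-01 11:00", "2024-01-02 12:00"]),
        "register__channel": ["web", "phone"],
        "approve__amount": [100, 250],
        "approve__resource": ["alice", "bob"],
    }
)


def activity_attribute_columns(df: pd.DataFrame, activity_name: str) -> pd.DataFrame:
    """Return only the columns named '<activity_name>__<attribute>'."""
    prefix = f"{activity_name}__"
    cols = [col for col in df.columns if col.startswith(prefix)]
    return df[cols]


approve_attrs = activity_attribute_columns(tabular, "approve")
print(approve_attrs)
```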
- getCaseAndActivitiesData() → DataFrame
Return the mandatory properties columns content
Parameters
None
Returns
- return:
dataframe comprised only of the content of mandatory columns (caseID and activities columns)
- rtype:
DataFrame