Process Data package

The package contains data container classes for the different modules and their corresponding perspectives:

  • sax.core.process_data.data.BaseProcessDataObject, the common base class of the data containers;

  • sax.core.process_data.raw_event_data.RawEventData, representing a raw process event log in which each row of the dataframe represents a single activity in the process together with its attributes; useful for process mining;

  • sax.core.process_data.tabular_data.TabularEventData, representing a tabular view of the event log in which each row represents the single trace of one process case. This representation is useful for causal execution dependency discovery.

Additionally, the package contains sax.core.process_data.discovery_result.ResultInfo, a base class for representing discovery results, which is extended by the process and causal discovery modules. The package also contains a sub-package with formatter implementations that convert standard event log file formats (XES, MXML and CSV) into data objects.
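
As a rough illustration of what a CSV formatter has to do, the sketch below parses a small CSV event log with Python's standard csv module and checks that the mandatory columns are present. The column names (case_id, activity, timestamp), the function name read_csv_events and the list-of-dicts result are assumptions made for this example only; the formatter classes of the sub-package are not described here and may work differently.

```python
import csv
import io

# Hypothetical mandatory column names; the real property names may differ.
MANDATORY_COLUMNS = ("case_id", "activity", "timestamp")


def read_csv_events(text):
    """Parse CSV text into a list of event rows (one dict per activity)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in MANDATORY_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    return list(reader)


SAMPLE_CSV = """case_id,activity,timestamp,cost
c1,register,2024-01-01T09:00:00,10
c1,approve,2024-01-01T10:00:00,25
c2,register,2024-01-02T09:00:00,12
"""

# Every value is read as a string; "cost" is an optional payload attribute.
events = read_csv_events(SAMPLE_CSV)
```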

Subpackages

Submodules

sax.core.process_data.data module

class sax.core.process_data.data.BaseProcessDataObject(data: DataFrame, mandatory_properties: dict, optional_properties: dict)

Bases: object

Data object for process event log data. The data object contains a dataframe, created from one of the supported event log formats, that holds the basic process event log data such as trace id, activity name and activity timestamp.

copy()

Make a deep copy of the data object.

Returns:

A copy of the BaseProcessDataObject instance.

Return type:

BaseProcessDataObject

getActivitiesForTrace(pid: str) list[str]

Return a list of all activities for the provided case id

Parameters:

pid (str) -- process id (case id) of the trace

Returns:

List of all activity names

Return type:

list[str]
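
The lookup described above can be pictured with plain Python data. In this sketch a list of dicts stands in for the dataframe, the column names are hypothetical, and ordering the activities by timestamp is an assumption; the documentation does not state the order in which getActivitiesForTrace returns them.

```python
def activities_for_trace(events, pid):
    """Return the activity names of one case, ordered by timestamp."""
    rows = [e for e in events if e["case_id"] == pid]
    # ISO-8601 timestamps of equal precision sort correctly as strings.
    rows.sort(key=lambda e: e["timestamp"])
    return [e["activity"] for e in rows]


events = [
    {"case_id": "c1", "activity": "approve", "timestamp": "2024-01-01T10:00:00"},
    {"case_id": "c2", "activity": "register", "timestamp": "2024-01-02T09:00:00"},
    {"case_id": "c1", "activity": "register", "timestamp": "2024-01-01T09:00:00"},
]

c1_activities = activities_for_trace(events, "c1")
```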

getCaseIdColumnName() str

Get the name of the column representing case id

Returns:

CaseId column name

Return type:

str

getData() DataFrame

Return the dataframe representing the event log part of the data object

Returns:

A dataframe representing the event log

Return type:

DataFrame

getLength() int

Return the number of traces in the event log

Returns:

Length of the event log

Return type:

int

getMandatoryProperties() dict

Get mandatory properties of the data object

Returns:

Mandatory properties mapping

Return type:

dict

getOptionalProperties() dict

Get optional properties of the data object

Returns:

Optional properties mapping

Return type:

dict

getTrace(pid: str) Dict[str, Any]

Return all the information about the given trace as a dictionary, where the keys are the activity names and the values are the activity payloads

Parameters:

pid (str) -- process id (case id) of the trace

Returns:

A dictionary representing a single trace

Return type:

Dict[str, List[str]]

getVariants() dict

Return a dictionary that maps each process variant in the event log to the number of traces of that variant

Returns:

Mapping of each process variant in the event log to the number of traces of that variant

Return type:

dict

sax.core.process_data.discovery_result module

class sax.core.process_data.discovery_result.ResultInfo

Bases: object

Object holding the result of the discovery on a particular dimension. The purpose of the object is to summarize and return the discovery results.

abstract getDiscoveryResult() str

Return the discovery results summarized as a string

Returns:

Discovery result

Return type:

str
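
A minimal sketch of how a discovery module might extend such a base class is shown below. ResultInfoSketch and EdgeListResult are hypothetical names introduced for illustration, and using abc.ABC to enforce the abstract method is an assumption; the listing above only states that getDiscoveryResult is abstract.

```python
from abc import ABC, abstractmethod


class ResultInfoSketch(ABC):
    """Stand-in for a result holder with one abstract summary method."""

    @abstractmethod
    def getDiscoveryResult(self) -> str:
        """Return the discovery results summarized as a string."""


class EdgeListResult(ResultInfoSketch):
    """Hypothetical subclass that summarizes a list of discovered edges."""

    def __init__(self, edges):
        self.edges = list(edges)

    def getDiscoveryResult(self) -> str:
        return "; ".join(f"{src} -> {dst}" for src, dst in self.edges)


result = EdgeListResult([("register", "approve"), ("approve", "pay")])
summary = result.getDiscoveryResult()
```

With this layout the base class cannot be instantiated directly, so every concrete result type has to provide its own string summary.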

sax.core.process_data.raw_event_data module

class sax.core.process_data.raw_event_data.RawEventData(data: DataFrame, mandatory_properties: dict, optional_properties: dict)

Bases: BaseProcessDataObject

Raw event representation of event log, where each row represents a single event in a trace (one activity), and the columns are the activity name, activity timestamp, and additional activity payload attributes

List

alias of List

copy()

Creates a copy of the current RawEventData object.

Returns

copied_object : RawEventData

A copy of the current RawEventData object.

filterLifecycleEvents(lifecycleTypes) RawEventData

Filter the data according to the provided list of desired event lifecycle types, and return a new data object containing only the chosen lifecycle event types

Parameters

lifecycleTypes : List

List of the lifecycle event types to keep (such as 'complete').

Returns

RawEventData

A new data object containing only the chosen lifecycle event types.
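
The filtering step can be sketched as follows. A list of dicts stands in for the dataframe, and the "lifecycle" column name is an assumption; the actual column is determined by the mandatory properties of the data object.

```python
def filter_lifecycle_events(events, lifecycle_types):
    """Return a new list holding only events of the requested lifecycle types."""
    wanted = set(lifecycle_types)
    # Copy each kept row so that the input log is left untouched.
    return [dict(e) for e in events if e["lifecycle"] in wanted]


events = [
    {"case_id": "c1", "activity": "register", "lifecycle": "start"},
    {"case_id": "c1", "activity": "register", "lifecycle": "complete"},
    {"case_id": "c1", "activity": "approve", "lifecycle": "complete"},
]

completed = filter_lifecycle_events(events, ["complete"])
```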

filterVariants(variant_keys: List[str]) RawEventData

Select the traces that belong to the specified variants and return a new RawEventData object containing only those traces.

Parameters

variant_keys : List[str]

The list of variant names to retrieve.

Returns

variant : RawEventData

The RawEventData object representing the specified variant subset of traces.

getLog()

Return the pm4py event log representing this dataframe object

Parameters

None

Returns

log : pm4py event log

The pm4py event log representing this dataframe object

getMandatoryPropertiesData()

Return the mandatory properties columns: the caseID, activity, event lifecycle and timestamp columns

Parameters

None

Returns

Mandatory properties data.

getOptionalPropertiesData()

Return the optional properties columns

Parameters

None

Returns

Optional properties data.

getVariants() dict

Get all variants in the event log, along with number of traces for each variant

Returns

A dictionary where each key represents a variant name, which is a comma-separated list of all activities in the variant in order of occurrence, and the corresponding value is the number of traces for that variant.

getVariantsKeys() dict

Get the names of all variants in the event log, along with the case ids of the traces in each variant

Returns

A dictionary where each key represents a variant name, which is a comma-separated list of all activities in the variant in order of occurrence, and the corresponding value is a list of the case ids of all traces in this variant.
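
The two variant mappings described above can be sketched with the standard library. A list of dicts stands in for the dataframe, and the column names and helper names are hypothetical; only the comma-separated variant naming is taken from the descriptions.

```python
from collections import defaultdict


def variant_case_ids(events):
    """Map each variant name to the list of case ids that follow it."""
    traces = defaultdict(list)
    # Order events by time so activities appear in order of occurrence.
    for e in sorted(events, key=lambda e: e["timestamp"]):
        traces[e["case_id"]].append(e["activity"])
    variants = defaultdict(list)
    for case_id, activities in traces.items():
        variants[",".join(activities)].append(case_id)
    return dict(variants)


def variant_counts(events):
    """Map each variant name to the number of traces that follow it."""
    return {name: len(ids) for name, ids in variant_case_ids(events).items()}


events = [
    {"case_id": "c1", "activity": "register", "timestamp": "2024-01-01T09:00:00"},
    {"case_id": "c1", "activity": "approve", "timestamp": "2024-01-01T10:00:00"},
    {"case_id": "c2", "activity": "register", "timestamp": "2024-01-02T09:00:00"},
    {"case_id": "c2", "activity": "approve", "timestamp": "2024-01-02T10:00:00"},
    {"case_id": "c3", "activity": "register", "timestamp": "2024-01-03T09:00:00"},
    {"case_id": "c3", "activity": "reject", "timestamp": "2024-01-03T10:00:00"},
]

keys = variant_case_ids(events)
counts = variant_counts(events)
```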

transposeFullDataframe() DataFrame

transposeToTabular() TabularEventData

Transposes the event log data object into a new data object in which each trace is represented by a single row (instead of one row per activity). The columns are the activity names, and the column values are the end timestamps of those activities.

Returns

TabularEventData

A new data object in tabular format.
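
The reshaping can be pictured with the sketch below, which uses a list of dicts in place of the dataframe. The column names are hypothetical, and keeping the latest timestamp when an activity repeats within a case is an assumption of this example; how the actual method handles repeated activities is not documented here.

```python
def transpose_to_tabular(events):
    """Turn per-activity rows into one row per case: activity -> timestamp."""
    rows = {}
    for e in events:
        row = rows.setdefault(e["case_id"], {"case_id": e["case_id"]})
        previous = row.get(e["activity"])
        # Assumption: a repeated activity keeps its latest timestamp.
        if previous is None or e["timestamp"] > previous:
            row[e["activity"]] = e["timestamp"]
    return list(rows.values())


events = [
    {"case_id": "c1", "activity": "register", "timestamp": "2024-01-01T09:00:00"},
    {"case_id": "c1", "activity": "approve", "timestamp": "2024-01-01T10:00:00"},
    {"case_id": "c2", "activity": "register", "timestamp": "2024-01-02T09:00:00"},
]

tabular = transpose_to_tabular(events)
```

In this sketch an activity that never occurs in a case is simply absent from that case's row; in a dataframe the corresponding cell would be empty instead.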

sax.core.process_data.tabular_data module

class sax.core.process_data.tabular_data.TabularEventData(data: DataFrame, mandatory_properties: dict, optional_properties: dict)

Bases: BaseProcessDataObject

Tabular representation of an event log, where each row represents a full trace and each column represents a particular activity within the trace. The column names are the names of the activities, and the column values are the activity completion timestamps. Additional columns represent activity attributes; each attribute column name has the form <activityName>__<attributeName>.

getActivitiesAttributesData() DataFrame

Return all optional attributes columns content

Parameters

None

Returns

DataFrame

Optional attribute columns content.

getActivitiesData() DataFrame

Return the content of activities columns

Parameters

None

Returns

DataFrame

Activities columns content.

getActivityAttributesData(activityName) DataFrame

Return attribute columns for the specified activity

Parameters

activityName : str

Name of the activity.

Returns

DataFrame

Content of the attribute columns for the specified activity.
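
Given the <activityName>__<attributeName> naming of attribute columns, selecting the attributes of one activity amounts to a prefix match on the column names. The sketch below shows this on a single row represented as a dict; the helper name and the sample columns are hypothetical.

```python
def activity_attribute_columns(row, activity_name):
    """Return only the attribute columns that belong to one activity."""
    prefix = activity_name + "__"
    return {name: value for name, value in row.items() if name.startswith(prefix)}


row = {
    "case_id": "c1",
    "register": "2024-01-01T09:00:00",
    "register__cost": 10,
    "register__resource": "alice",
    "approve": "2024-01-01T10:00:00",
    "approve__cost": 25,
}

register_attrs = activity_attribute_columns(row, "register")
```

Matching on the full prefix, including the double underscore, keeps the activity's own timestamp column out of the result.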

getCaseAndActivitiesData() DataFrame

Return the mandatory properties columns content

Parameters

None

Returns

DataFrame

Dataframe comprised only of the content of the mandatory columns (caseID and activities columns).

getCaseIDData() DataFrame

Return the values of the case id column

Parameters

None

Returns

DataFrame

Case id column content.

getTrace(pid: str) Dict[str, Any]

Get a trace by its process id (case id).

Parameters

pid : str

Case id of the trace to retrieve.

Returns

Dict[str, Any]

The trace data as a dictionary (activity names and timestamps)

Raises

KeyError

If a trace with such case id does not exist in the log.
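
The lookup and its error behaviour can be sketched as follows, with a list of dicts standing in for the tabular dataframe. The case_id column name and the helper name are assumptions of this example.

```python
def get_trace(tabular_rows, pid):
    """Return the activity -> timestamp mapping of one case."""
    for row in tabular_rows:
        if row["case_id"] == pid:
            # Drop the case id column; what remains describes the trace.
            return {name: value for name, value in row.items() if name != "case_id"}
    # Mirror the documented behaviour for an unknown case id.
    raise KeyError(pid)


tabular_rows = [
    {"case_id": "c1", "register": "2024-01-01T09:00:00", "approve": "2024-01-01T10:00:00"},
    {"case_id": "c2", "register": "2024-01-02T09:00:00"},
]

first_trace = get_trace(tabular_rows, "c1")
```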

Module contents