Process out-of-sequence events

Scenario

The logistics team want to track the hourly number of events captured by door sensors in each building.

Events from door sensors can be slow to reach the Kafka topic, so the team need to handle a Kafka topic with events that are out of sequence.

Before you begin

The instructions in this tutorial use the Tutorial environment, which includes a selection of topics each with a live stream of events, created to allow you to explore features in IBM Event Automation. Following the setup instructions to deploy the demo environment gives you a complete instance of IBM Event Automation that you can use to follow this tutorial for yourself.

Versions

This tutorial uses the following versions of Event Automation capabilities. Screenshots may differ from the current interface if you are using a newer version.

  • Event Streams 11.5.0
  • Event Endpoint Management 11.3.0
  • Event Processing 1.2.0

Instructions

Step 1 : Discover the topic to use

For this scenario, you need to find information about the source of door badge events.

  1. Go to the Event Endpoint Management catalog.

    screenshot

    If you need a reminder of how to access the Event Endpoint Management catalog you can review Accessing the tutorial environment.

    If there are no topics in the catalog, you may need to complete the tutorial setup step to populate the catalog.

  2. Find the Door badge events topic, which contains events captured by door sensors.

    screenshot

    Note the warning in the catalog:

    Note that door events can take up to 3 minutes to reach the Kafka topic, so the badge time value in the message payload should be treated as the canonical timestamp for the event.

    This delay can be inconsistent, so messages on the topic are often out of sequence as a result.

  3. If the topic owner hadn’t provided this warning, you would have needed to observe messages on the Kafka topic itself to identify this. Confirm this by observing messages on the DOOR.BADGEIN topic in the Event Streams topic viewer.

    If you need a reminder about how to access the Event Streams topic viewer, you can review Accessing the tutorial environment.

    Look for examples where, even on the same partition, an older badge event (according to the badgetime property) appears on the topic after a more recent badge event.

    screenshot

    screenshot

    Verify that the timestamp on the Kafka message is not a reliable indicator of when the event occurred, and is frequently up to a few minutes after the actual event.
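
The check described in this step can also be sketched in code. The following Python example is only an illustration under stated assumptions: the sample messages are invented, and kafka_ts is a hypothetical name standing in for the Kafka message timestamp. Only the door and badgetime properties come from the topic described above.

```python
from datetime import datetime

# Invented sample messages, listed in the order they appear on one partition.
# "kafka_ts" is a hypothetical stand-in for the Kafka message timestamp.
messages = [
    {"door": "H-0-36", "badgetime": "2023-06-01T10:00:05", "kafka_ts": "2023-06-01T10:02:40"},
    {"door": "H-1-12", "badgetime": "2023-06-01T09:59:50", "kafka_ts": "2023-06-01T10:02:45"},
    {"door": "DE-2-7", "badgetime": "2023-06-01T10:01:10", "kafka_ts": "2023-06-01T10:03:00"},
]


def find_out_of_sequence(msgs):
    """Return the indexes of messages whose badgetime is older than one already seen."""
    latest = None
    out_of_order = []
    for index, msg in enumerate(msgs):
        badge_time = datetime.fromisoformat(msg["badgetime"])
        if latest is not None and badge_time < latest:
            out_of_order.append(index)
        else:
            latest = badge_time
    return out_of_order


def delays_in_seconds(msgs):
    """Return the gap between the badge time and the Kafka timestamp for each message."""
    return [
        (datetime.fromisoformat(m["kafka_ts"]) - datetime.fromisoformat(m["badgetime"])).total_seconds()
        for m in msgs
    ]


out_of_sequence = find_out_of_sequence(messages)
delays = delays_in_seconds(messages)
print(out_of_sequence, delays)
```

In this invented sample, the second message is out of sequence, and every delay is under the 3 minutes mentioned in the catalog warning.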

Step 2 : Create the source of events

  1. Go to the Event Processing home page.

    screenshot

    If you need a reminder about how to access the Event Processing home page, you can review Accessing the tutorial environment.

  2. Create a flow, and give it a name and description to explain that you will use it to track hourly badge events.

  3. Update the Event source node.

    screenshot

  4. Add an event source.

    screenshot

  5. In the Cluster connection pane, fill in the server address by using the value copied from the Event Endpoint Management catalog.

    screenshot

    Click Next.

  6. In the Access credentials pane, enter the username and password that you created in Event Endpoint Management by using the Generate access credentials button.

    screenshot

  7. Select the DOOR.BADGEIN topic to process events from, and then click Next.

    screenshot

  8. The format JSON is auto-selected in the Message format drop-down and the sample message is auto-populated in the JSON sample message field.

    screenshot

    Click Next.

  9. In the Key and headers pane, click Next.

    screenshot

    Note: The key and headers are displayed automatically if they are available in the selected topic message.

  10. In the Event details pane, enter door events in the Node name field.

    screenshot

  11. Verify that the type of the badgetime property has been automatically detected as Timestamp.

    screenshot

  12. Configure the event source to use the badgetime property as the source of the event time, and to tolerate lateness of up to 3 minutes.

    screenshot

  13. Click Configure to finalize the event source.
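
To illustrate what tolerating lateness of up to 3 minutes means, the following Python sketch models the general idea of a watermark that trails the most recent event time by the allowed lateness. This is a simplified model for explanation only, and it is an assumption that Event Processing behaves along these lines internally. The function name classify and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

# Matches the lateness configured in the event source (3 minutes).
ALLOWED_LATENESS = timedelta(minutes=3)


def classify(badgetimes, lateness=ALLOWED_LATENESS):
    """Label each event, in arrival order, as "accepted" or "too late".

    Simplified model: the watermark trails the highest event time seen
    so far by `lateness`. An event older than the watermark is too late.
    """
    max_seen = None
    labels = []
    for raw in badgetimes:
        event_time = datetime.fromisoformat(raw)
        if max_seen is None or event_time > max_seen:
            max_seen = event_time
        watermark = max_seen - lateness
        labels.append("accepted" if event_time >= watermark else "too late")
    return labels


# Invented badge times, listed in the order the events arrive.
arrivals = [
    "2023-06-01T10:00:00",
    "2023-06-01T10:04:00",
    "2023-06-01T10:02:00",  # 2 minutes behind the newest event: tolerated
    "2023-06-01T10:00:30",  # 3.5 minutes behind the newest event: too late
]
labels = classify(arrivals)
print(labels)
```

A larger lateness value tolerates more disorder, but results are delayed for longer while the flow waits for late events.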

Step 3 : Extract information to aggregate on

  1. Add a Transform node to the flow.

    screenshot

    Create a transform node by dragging one onto the canvas. You can find this in the Processors section of the left panel.

  2. Configure the Transform node to extract the building name from the door ID.

    screenshot

    Suggested function expression:

    REGEXP_EXTRACT(`door`, '([A-Z]+)\-([0-9]+)\-([0-9]+)', 1)
    

    The door ID is made up of:

    <building id> - <floor number> - <door number>
    

    For example:

    H-0-36
    

    The regular expression captures the set of letters before the first hyphen character, which is the building ID.

  3. Click Configure to finalize the transform.
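
To try the suggested pattern outside the flow, the following Python sketch applies an equivalent regular expression to the example door ID. The helper name building_of is hypothetical, and the hyphens are left unescaped because Python's re module does not require escaping them outside a character class.

```python
import re

# Equivalent of the suggested expression: <building id>-<floor number>-<door number>
DOOR_ID = re.compile(r"([A-Z]+)-([0-9]+)-([0-9]+)")


def building_of(door_id):
    """Return the first capture group (the building ID), or None if there is no match."""
    match = DOOR_ID.search(door_id)
    return match.group(1) if match else None


building = building_of("H-0-36")
print(building)
```

The index 1 passed to REGEXP_EXTRACT selects the same first capture group that group(1) returns here.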

Step 4 : Count the occurrences

  1. Add an Aggregate node to the flow.

    screenshot

    Create an aggregate node by dragging one onto the canvas. You can find this in the Processors section of the left panel.

  2. Define a time window of 1 hour to group the badge events by.

    screenshot

  3. Count the number of door badge events (by counting the unique record ID), grouped by the building.

    screenshot

  4. Rename the output properties to make the results easier to understand.

    screenshot
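
The aggregation configured in this step can be pictured as counting events per building within fixed one-hour windows based on the badgetime value. The following Python sketch shows that idea under stated assumptions: the sample events are invented, the function name hourly_counts is hypothetical, and each event is assumed to have its own record ID, so counting events gives the same result as counting record IDs.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(events):
    """Count events per (window start, building) using one-hour windows on badgetime."""
    counts = Counter()
    for event in events:
        badge_time = datetime.fromisoformat(event["badgetime"])
        # Truncate to the start of the hour to find the window the event belongs to.
        window_start = badge_time.replace(minute=0, second=0, microsecond=0)
        # The building ID is the part of the door ID before the first hyphen.
        building = event["door"].split("-")[0]
        counts[(window_start.isoformat(), building)] += 1
    return dict(counts)


# Invented sample events, deliberately out of sequence.
events = [
    {"door": "H-0-36", "badgetime": "2023-06-01T10:00:05"},
    {"door": "H-1-12", "badgetime": "2023-06-01T09:59:50"},
    {"door": "DE-2-7", "badgetime": "2023-06-01T10:01:10"},
    {"door": "H-0-36", "badgetime": "2023-06-01T10:45:00"},
]
counts = hourly_counts(events)
print(counts)
```

Because each event is assigned to a window by its badgetime value rather than by its arrival order, the out-of-sequence event is still counted in the 09:00 window.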

Step 5 : Test the flow

The final step is to run your completed flow.

  1. Use the Run menu, and select Include historical to run your flow on the history of door badge events available on this topic.

    screenshot

  2. When you have finished reviewing the results, you can stop this flow.

Recap

It is not unusual to need to process events on a Kafka topic that are out of sequence and with unreliable timestamps in the message header.

Event Processing makes it easy to perform time-based analysis of such events, by allowing you to specify what value to use as a reliable source of time and describe how long it should wait for out-of-sequence events.