3 - Aggregate events to detect trends over time

Aggregate

Aggregates enable you to process events over a time window, producing a summary view of a situation that is useful for identifying overall trends.

Transform

Transforms work on individual events in the stream. When processing events, we can modify each event to add new properties that are derived from the event's existing data.

Scenario : Track how many products of each type are sold per hour

In this scenario, we count how many units of each product type are sold in each hourly window, which lets you identify the product that has sold the most units in that hour. This could be used to drive a constantly updating “Trending Products” view.

Before you begin

The instructions in this tutorial use the Tutorial environment, which includes a selection of topics, each with a live stream of events, created to allow you to explore features in IBM Event Automation. Following the setup instructions to deploy the demo environment gives you a complete instance of IBM Event Automation that you can use to follow this tutorial for yourself.

Versions

This tutorial uses the following versions of Event Automation capabilities. Screenshots may differ from the current interface if you are using a newer version.

  • Event Streams 11.5.0
  • Event Endpoint Management 11.3.0
  • Event Processing 1.2.0

Instructions

Step 1 : Create a flow

  1. Go to the Event Processing home page.

    screenshot

    If you need a reminder of how to access the Event Processing home page, you can review Accessing the tutorial environment.

  2. Create a flow, and give it a name and description that explain that you will use it to track how many products of each type are sold.

    screenshot

Step 2 : Provide a source of events

The next step is to bring the stream of events that you want to process into the flow. We will reuse the topic connection information from an earlier tutorial. (To see what this event source corresponds to in SQL terms, see the sketch after these steps.)

  1. Update the Event source node.

    screenshot

    Hover over the node and click Edit to configure the node.

  2. Choose the ORDERS topic that you used in the Identify orders from a specific region tutorial.

    screenshot

    Tip: If you haven’t followed that tutorial, you can click Add new event source instead, and follow the Provide a source of events steps in the previous tutorial to define a new Event source from scratch.

    Click Next.

  3. In the Event details pane, the schema that was previously defined for events on this topic is displayed.

    screenshot

    Click Configure.
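
Under the covers, IBM Event Processing runs flows as Apache Flink jobs, so the event source you have just configured behaves like a Flink SQL table backed by the Kafka topic. The following sketch is illustrative only - the column list, broker address, and message format are placeholder assumptions, because the tutorial environment supplies the real connection details through the UI:

    CREATE TABLE ORDERS (
        id          STRING,
        description STRING,        -- for example, 'M Stonewashed Capri Jeans'
        quantity    INT,
        ordertime   TIMESTAMP(3),
        WATERMARK FOR ordertime AS ordertime - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'ORDERS',
        'properties.bootstrap.servers' = 'your-kafka-bootstrap:9092',  -- placeholder
        'format' = 'json',                                             -- placeholder
        'scan.startup.mode' = 'earliest-offset'  -- include existing events on the topic
    );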

Step 3 : Extract product type from events

The product description value in the events includes several attributes of the jeans that are sold - the size, material, and style. We would like to aggregate the data based on this information. Because these attributes are combined into a single string in a consistent way, we can extract them using regular expressions.

In the next step, we extract the product type into a separate property so that we can use it to filter and aggregate events later in the flow.

  1. Add a Transform node and link it to your event source.

    screenshot

    Create a transform node by dragging one onto the canvas. You can find this in the Processors section of the left panel.

    Click and drag from the small gray dot on the event source to the matching dot on the transform node.

  2. Give the transform node a name that describes what it will do: get product type.

    screenshot

    Hover over the transform node and click Edit to configure the node.

  3. Add a new property for the product type that you will generate with a regular expression.

    screenshot

    Click Create new property.

    Name the property product type.

    Use the assistant to choose the REGEXP_EXTRACT function from the list.

    Did you know? The REGEXP_EXTRACT function allows you to extract data from a text property using regular expressions.

  4. Define the regular expression that extracts the product type from the description.

    Product descriptions are all made up of four words.

    Some examples:

    • XXS Navy Cargo Jeans
    • M Stonewashed Capri Jeans
    • XL White Bootcut Jeans
    • S Acid-washed Jogger Jeans

    Each word contains the same kind of information in every description:

    word 1 : Size. This is made up of one or more uppercase letters.

    word 2 : Material or color, made up of a mixed-case word, optionally with a hyphen.

    word 3 : The type of jeans, made up of a mixed-case word, optionally with a hyphen.

    word 4 : The text “Jeans”.

    screenshot

    Create a regular expression that extracts the third word from the description text, by filling in the assistant form with the following values:

    text :

    description
    

    This identifies the property in the order events that contains the text that you want to apply the regular expression to.

    regex :

    ([A-Z]+) ([A-Za-z\-]+) ([A-Za-z\-]+) Jeans
    

    This matches descriptions like those shown above, capturing each of the four words that every description contains. (A sketch of the complete expression is shown after these steps.)

    index :

    3
    

    This specifies that you want the new product type property to contain the third capture group: the third word in the description.

  5. Click Insert into expression to complete the assistant.

    screenshot

  6. As you aren’t modifying existing properties, click Next.

  7. Click Configure to finalize the transform.
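
For reference, the completed transform is roughly equivalent to the following Flink SQL expression. This is a sketch, not something you need to type in - the ORDERS table name is a placeholder for the event source you configured earlier:

    SELECT
        *,
        REGEXP_EXTRACT(
            description,
            '([A-Z]+) ([A-Za-z\-]+) ([A-Za-z\-]+) Jeans',
            3
        ) AS `product type`    -- index 3 returns the third capture group
    FROM ORDERS;

    -- 'M Stonewashed Capri Jeans'  ->  product type = 'Capri'
    -- 'S Acid-washed Jogger Jeans' ->  product type = 'Jogger'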

Step 4 : Test the flow

The next step is to test your event processing flow and view the results.

  1. Use the Run menu, and select Include historical to run your flow on the history of order events available on this Kafka topic.

    screenshot

    Tip: It is good to regularly test as you develop your event processing flow to confirm that the last node you have added is doing what you expected.

    Note that the new product type property is populated with the data extracted from the description property.

  2. When you have finished reviewing the results, you can stop this flow.

Step 5 : Count the number of events of each type

Now that you have transformed the stream of events to include the product type attribute, the next step is to total the number of items sold, grouped by this new property.

  1. Add an Aggregate node and link it to your transform node.

    screenshot

    Create an aggregate node by dragging one onto the canvas. You can find this in the Windowed section of the left panel.

    Click and drag from the small gray dot on the transform to the matching dot on the aggregate node.

  2. Name the aggregate node to show that it will count the number of units sold of each type: hourly sales by type.

    screenshot

    Hover over the aggregate node and click Edit to configure the node.

  3. Specify a 1-hour window.

    screenshot

  4. Sum the number of items sold in each hour, grouped by product type.

    screenshot

    Select SUM as the aggregate function.

    The property to sum is quantity: the number of items sold in each order. This adds up the items from every order that occurs within the time window.

    Finally, select the new property product type as the property to group by. This produces a separate total of items sold for each product type. (A sketch of the resulting query is shown after these steps.)

  5. Rename the new aggregate properties.

    screenshot

    Tip: It can be helpful to rename properties to something that will make sense to you, such as describing the SUM property as total sales.

  6. Click Configure to finalize the aggregate.
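
For reference, the aggregate node behaves roughly like the following Flink SQL tumbling-window query. This is a sketch under assumed names - ORDERS_WITH_TYPE stands for the output of the transform node, and ordertime stands for the event-time attribute of the events:

    SELECT
        window_start,                   -- start of each one-hour window
        window_end,
        `product type`,
        SUM(quantity) AS `total sales`  -- items sold for this type in this window
    FROM TABLE(
        TUMBLE(TABLE ORDERS_WITH_TYPE, DESCRIPTOR(ordertime), INTERVAL '1' HOUR))
    GROUP BY window_start, window_end, `product type`;

Each result row is the total quantity sold for one product type in one one-hour window.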

Step 6 : Test the flow

The final step is to run your event processing flow and view the results.

  1. Use the Run menu, and select Include historical to run your flow on the history of order events available on this Kafka topic.

    screenshot

    The output window shows that your aggregate is returning the total number of items for each type of product within each window of time.

  2. When you have finished reviewing the results, you can stop this flow.

Recap

You used a transform node to dynamically extract a property from the description within the order events.

You used an aggregate node to count the events based on this extracted property, grouping into one-hour time windows.

Next step

In the next tutorial, you will try an interval join to correlate related events from multiple different event streams.