Considerations for GDPR

Notice:

Clients are responsible for ensuring their own compliance with various laws and regulations, including the European Union General Data Protection Regulation. Clients are solely responsible for obtaining advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulations that may affect the clients’ business and any actions the clients may need to take to comply with such laws and regulations.

The products, services, and other capabilities described herein are not suitable for all client situations and may have restricted availability. IBM does not provide legal, accounting, or auditing advice or represent or warrant that its services or products will ensure that clients are in compliance with any law or regulation.

GDPR stands for General Data Protection Regulation.

GDPR was adopted by the European Union and has applied since May 25, 2018.

GDPR establishes a stronger data protection regulatory framework for processing of personal data of individuals. GDPR brings:

New and enhanced rights for individuals
Widened definition of personal data
New obligations for companies and organisations handling personal data
Potential for significant financial penalties for non-compliance
Compulsory data breach notification

This document is intended to help you in your preparations for GDPR readiness.

Configuration to support data handling requirements

The GDPR legislation requires that personal data is strictly controlled and that the integrity of the data is maintained. This requires the data to be secured against loss through system failure and also through unauthorized access or via theft of computer equipment or storage media. The exact requirements will depend on the nature of the information that will be stored or transmitted by Event Streams. Areas for consideration to address these aspects of the GDPR legislation include:

Physical access to the assets where the product is installed
Encryption of data both at rest and in flight
Managing access to topics that hold sensitive material

Data Life Cycle

Event Streams is a general-purpose pub-sub technology built on Apache Kafka® that can be used to connect applications. Some of these applications may be IBM-owned, but others may be third-party products provided by other technology suppliers. As a result, Event Streams can be used to exchange many forms of data, some of which could potentially be subject to GDPR.

What types of data flow through Event Streams?

There is no single definitive answer to this question because use cases vary from one application deployment to another.

Where is data stored?

As messages flow through the system, message data is stored on physical storage media configured by the deployment. It may also reside in logs collected by pods within the deployment. This information may include data governed by GDPR.

Personal data used for online contact with IBM

Event Streams clients can contact IBM about Event Streams with online comments and feedback requests in a variety of ways, primarily:

Public issue reporting and feature suggestions via the Event Streams GitHub portal
Private issue reporting via IBM Support

Typically, only the client name and email address are used, to enable personal replies on the subject of the contact. The use of personal data conforms to the IBM Online Privacy Statement.

Data Collection

Event Streams can be used to collect personal data. When assessing your use of Event Streams and the demands of GDPR, you should consider the types of personal data which in your circumstances are passing through the system. You may wish to consider aspects such as:

How is data being passed to an Event Streams topic? Has it been encrypted or digitally signed beforehand?
What type of storage has been configured within the Event Streams deployment? Has encryption been enabled?
How does data flow between nodes in the Event Streams deployment? Has internal network traffic been encrypted?
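
As one illustration of the first question, the sketch below appends an HMAC-SHA256 tag to a payload before it would be handed to a producer, so that a consumer holding the same key can detect tampering. It uses only the Python standard library. The function names `sign_payload` and `verify_payload`, the key, and the sample record are hypothetical. Note that an HMAC is a shared-key integrity check rather than a true digital signature, which would require asymmetric keys.

```python
import hashlib
import hmac

TAG_LEN = 32  # SHA-256 digest size in bytes


def sign_payload(key: bytes, payload: bytes) -> bytes:
    """Return the payload with an HMAC-SHA256 tag appended."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify_payload(key: bytes, signed: bytes) -> bytes:
    """Return the original payload, or raise ValueError if the tag does not match."""
    if len(signed) < TAG_LEN:
        raise ValueError("message too short to contain a tag")
    payload, tag = signed[:-TAG_LEN], signed[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return payload


if __name__ == "__main__":
    key = b"example-shared-secret"  # hypothetical key, for illustration only
    signed = sign_payload(key, b'{"order": 42}')
    print(verify_payload(key, signed))
```

The signed bytes would be used as the message value on the producing side, and `verify_payload` would be called on the consuming side before the record is processed. This protects integrity only; confidentiality would additionally require encrypting the payload.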

Data Storage

When messages are published to topics, Event Streams stores the message data on stateful media within the cluster, on one or more nodes of the deployment. Consideration should be given to securing this data at rest.

The following items highlight areas where Event Streams may indirectly persist application-provided data, which users may also wish to consider when ensuring compliance with GDPR.

Kubernetes activity logs for containers running within the Pods that make up the Event Streams deployment
Logs captured on the local file system for the Kafka container running in the Kafka pod for each node

By default, messages published to topics are retained for a week after their initial receipt, but this can be configured by modifying Kafka broker settings. These settings are configured using the EventStreams custom resource.
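
To make the retention arithmetic concrete, the following sketch converts a retention period into the units used by two standard Kafka settings: the broker-level `log.retention.hours` and the topic-level `retention.ms`. The helper name `retention_settings` is hypothetical, and where these values are placed within the EventStreams custom resource is not shown here.

```python
from datetime import timedelta


def retention_settings(period: timedelta) -> dict:
    """Express a retention period in the units Kafka expects."""
    total_seconds = int(period.total_seconds())
    if total_seconds <= 0:
        raise ValueError("retention period must be positive")
    return {
        # Broker-wide default, in whole hours (rounded down).
        "log.retention.hours": total_seconds // 3600,
        # Per-topic override, in milliseconds.
        "retention.ms": total_seconds * 1000,
    }


if __name__ == "__main__":
    # One week corresponds to the default retention described above.
    print(retention_settings(timedelta(weeks=1)))
```

For a one-week period this yields 168 hours and 604800000 milliseconds. A shorter retention period reduces how long personal data remains on disk, which may be relevant when assessing storage limitation requirements.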

Data Access

The Kafka core APIs can be used to access message data within the Event Streams system:

Producer API to allow data to be sent to a topic
Consumer API to allow data to be read from a topic
Streams API to allow transformation of data from an input topic to an output topic
Connect API to allow connectors to continually move data in or out of a topic from an external system

For more information about controlling access to data stored in Event Streams, see managing access.

Cluster-level configuration and resources, including logs that might contain message data, are accessible through your Kubernetes platform.

Access and authorization controls can be used to control which users are able to access this cluster-level information.

Data Processing

Encryption of connections to Event Streams

Connections to Event Streams are secured using TLS. If you want to use your own CA certificates instead of those generated by the operator, you can provide them in the EventStreams custom resource settings.
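
On the client side, a TLS connection typically means pointing the Kafka client at the CA certificate that signed the cluster's listener certificate. The sketch below builds a configuration dictionary using librdkafka-style property names (`security.protocol`, `ssl.ca.location`, `sasl.mechanism`), as accepted by clients such as confluent-kafka. The function name, bootstrap address, file path, and the use of SCRAM-SHA-512 credentials are assumptions for illustration; the authentication mechanism a listener requires depends on how it has been configured.

```python
from typing import Optional


def tls_client_config(
    bootstrap: str,
    ca_path: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> dict:
    """Build librdkafka-style settings for a TLS-secured connection."""
    config = {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SSL",   # TLS without SASL authentication
        "ssl.ca.location": ca_path,   # CA certificate used to verify the brokers
    }
    if username is not None and password is not None:
        # Assumption: the listener expects SCRAM-SHA-512 credentials over TLS.
        config.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.username": username,
            "sasl.password": password,
        })
    return config


if __name__ == "__main__":
    # Hypothetical address and certificate path.
    print(tls_client_config("bootstrap.example.com:443", "/path/to/ca.crt"))
```

The resulting dictionary could be passed to a client constructor that accepts librdkafka properties. Keeping the CA path explicit makes it straightforward to switch between operator-generated certificates and your own.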

Encryption of connections within Event Streams

Internal communication between Event Streams pods is encrypted by default using TLS.

Data Monitoring

Event Streams provides a range of monitoring features that users can use to gain a better understanding of how applications are performing.