Set up a Kubernetes cluster for KFP execution
📝 Table of Contents
- A Kind deployment supported platforms
- Preinstalled software components
- A Kind deployment
- An existing cluster
- Installation steps
- Installation on an existing Kubernetes cluster
- Clean up the cluster
The project provides instructions and deployment automation to run all components in an all-inclusive fashion on a single machine using a Kind cluster and local data storage (MinIO). However, this topology is not suitable for processing medium and large datasets, which should be deployed on a real Kubernetes or OpenShift cluster. Therefore, we recommend using a Kind cluster only for local testing and debugging, not for production loads.
Running a Kind Kubernetes cluster with Kubeflow Pipelines (KFP) and MinIO requires significant memory. We recommend deploying it on machines with at least 32 GB of RAM and 8-9 CPU cores. RHEL requires more resources, e.g. 64 GB of RAM and 32 CPU cores.
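As a quick sanity check before creating the cluster, the recommendation above can be compared with what the local machine reports. The following is a minimal sketch, not part of the project's scripts; the function names and the `MIN_RAM_GB` and `MIN_CPUS` variables are illustrative, and the thresholds are the non-RHEL figures quoted above.

```shell
#!/usr/bin/env bash
# Minimum resources recommended in this document for a Kind deployment
# (variable names are illustrative, not used by the project's scripts).
MIN_RAM_GB=32
MIN_CPUS=8

# Print total RAM in whole GB (Linux via /proc/meminfo, macOS via sysctl).
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ {printf "%d\n", $2 / 1024 / 1024}' /proc/meminfo
  else
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
  fi
}

# Print the number of online CPU cores.
cpu_count() {
  getconf _NPROCESSORS_ONLN
}

# meets_minimum ACTUAL REQUIRED: succeed when ACTUAL >= REQUIRED.
meets_minimum() {
  [ "$1" -ge "$2" ]
}

ram=$(total_ram_gb)
cpus=$(cpu_count)
meets_minimum "$ram" "$MIN_RAM_GB" || echo "WARNING: ${ram} GB RAM is below the recommended ${MIN_RAM_GB} GB"
meets_minimum "$cpus" "$MIN_CPUS" || echo "WARNING: ${cpus} CPU cores is below the recommended ${MIN_CPUS}"
```

The kernel reserves some memory, so a machine sold as 32 GB usually reports slightly less; treat a warning near the threshold as indicative rather than blocking.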
A Kind deployment supported platforms
Executing KFP, MinIO, and Ray on a single Kind cluster pushes the system to its load limits. Therefore, although we are working on extending support for additional platforms, not all platforms/configurations are currently supported.
Operating System | Container Agent | Support | Comments |
---|---|---|---|
RHEL 7 | any | - | Kind doesn't support RHEL 7 |
RHEL 8 | | | |
RHEL 9.4 | Docker | Yes | |
RHEL 9.4 | Podman | No | Issues with Ray job executions |
Ubuntu 24.04 | Docker | Yes | |
Ubuntu 24.04 | Podman | | |
Windows WSL2 | Docker | Yes | |
Windows WSL2 | Podman | | |
macOS amd64 | Docker | Yes | |
macOS amd64 | Podman | | |
macOS arm64 | Docker | | |
macOS arm64 | Podman | No | Issues with Ray job executions |
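To find the matching row in the table, it can help to print what the local machine reports. The sketch below is only an illustration; the function names are hypothetical, and the WSL check assumes the kernel string in /proc/version contains "microsoft", which is how WSL kernels identify themselves.

```shell
#!/usr/bin/env bash
# describe_os: print the OS name and version in a form comparable to the table.
describe_os() {
  if grep -qi microsoft /proc/version 2>/dev/null; then
    echo "Windows WSL"
  elif [ -r /etc/os-release ]; then
    # /etc/os-release defines NAME and VERSION_ID on most Linux distributions.
    ( . /etc/os-release; echo "${NAME} ${VERSION_ID}" )
  else
    # On macOS this prints e.g. "Darwin arm64" or "Darwin x86_64" (amd64).
    echo "$(uname -s) $(uname -m)"
  fi
}

# container_agents: print the container agents found on PATH, one per line.
container_agents() {
  for agent in docker podman; do
    command -v "$agent" >/dev/null 2>&1 && echo "$agent"
  done
  return 0
}

echo "OS: $(describe_os)"
echo "Container agents: $(container_agents | tr '\n' ' ')"
```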
Preinstalled software components
Depending on whether a Kind cluster or an existing Kubernetes cluster is used, different software packages need to be preinstalled.
Kind deployment
The following programs should be manually installed:
- Helm 3.10.0 or greater must be installed and configured on your machine.
- Kind tool for running local Kubernetes clusters 0.14.0 or newer must be installed on your machine.
- Kubectl 1.26 or newer must be installed on your machine.
- MinIO Client (mc) must be installed on your machine. Choose your operating system and follow the instructions in "(Optional) Install the MinIO Client". Only the mc client needs to be installed.
- git client, used to clone the installation repository.
- lsof, which is usually part of a Linux or macOS distribution.
- Container agent such as Docker or Podman
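The list above can be checked in one pass by looking for each program on the PATH. This is a minimal sketch under the assumption that the tools are installed under their usual binary names; the function and variable names are illustrative, and the check does not verify the minimum versions listed above.

```shell
#!/usr/bin/env bash
# Tools this section lists as prerequisites for a Kind deployment.
REQUIRED_TOOLS="helm kind kubectl mc git lsof"

# missing_tools NAME...: print each NAME that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# has_container_agent: succeed if docker or podman is on PATH.
has_container_agent() {
  [ -z "$(missing_tools docker)" ] || [ -z "$(missing_tools podman)" ]
}

# Word splitting of the list is intended here.
missing=$(missing_tools $REQUIRED_TOOLS)
[ -z "$missing" ] || echo "Missing tools:" $missing
has_container_agent || echo "Missing container agent: install Docker or Podman"
```

Note that `mc` is also the binary name of Midnight Commander, so a successful lookup does not by itself prove that the MinIO Client is the program found.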
Existing Kubernetes cluster
Deployment on an existing cluster requires less preinstalled software. Only the following programs should be manually installed:
- Helm 3.10.0 or greater must be installed and configured on your machine.
- Kubectl 1.26 or newer must be installed on your machine, and be able to connect to the external cluster.
- MinIO Client (mc) is required to deploy the test data. Choose your operating system and follow the instructions in "(Optional) Install the MinIO Client". Only the mc client should be installed.
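Since kubectl must be able to connect to the external cluster, it is worth confirming that before running any installation step. The sketch below is one way to do so; `check_cluster` is a hypothetical helper name, and the 10-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# check_cluster: succeed only if kubectl exists and can reach the cluster
# selected by the current kubeconfig context.
check_cluster() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  echo "Current context: $(kubectl config current-context 2>/dev/null)"
  # cluster-info contacts the API server, so it fails when the cluster is unreachable.
  kubectl cluster-info --request-timeout=10s >/dev/null 2>&1
}

if check_cluster; then
  echo "Cluster is reachable"
else
  echo "Cluster is NOT reachable; check your kubeconfig"
fi
```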
Installation steps
Before installation, you have to decide which KFP version you want to use. In order to use KFP v2, please set the following environment variable:
Now, you can create a Kind cluster with all required software installed using the following command:
from this main package directory. If you do not want to upload the testing data into the locally deployed MinIO, and want to reduce the memory footprint, please set:
You can access the KFP dashboard at http://localhost:8080/ and the MinIO dashboard at http://localhost:8090/
Installation on an existing Kubernetes cluster
Alternatively, you can deploy pipelines to an existing Kubernetes cluster.
In order to execute data transformers on the remote Kubernetes cluster, the following packages should be installed on the cluster:
- Kubeflow Pipelines (KFP). Currently, we use upstream Argo-based KFP v1.
- KubeRay controller and KubeRay API Server
You can install the software from their repositories, or you can use our installation scripts.
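Whichever installation route is used, one way to confirm that the packages are present is to look for the Services this section refers to. The sketch below assumes the namespaces and service names mentioned in this document (kubeflow and kuberay); the helper names are hypothetical, and a different installation may use other names.

```shell
#!/usr/bin/env bash
# Namespace/service pairs named in this document.
SERVICES="kubeflow/ml-pipeline-ui kuberay/kuberay-apiserver-service kubeflow/minio-service"

# service_exists NAMESPACE SERVICE: succeed if the Service object exists.
service_exists() {
  kubectl -n "$1" get svc "$2" >/dev/null 2>&1
}

# report_services PAIR...: print one OK/MISSING line per namespace/service pair.
report_services() {
  for pair in "$@"; do
    ns=${pair%%/*}    # text before the slash
    svc=${pair#*/}    # text after the slash
    if service_exists "$ns" "$svc"; then
      echo "OK: $ns/$svc"
    else
      echo "MISSING: $ns/$svc"
    fi
  done
}

# Word splitting of the list is intended here.
report_services $SERVICES
```

A MISSING line is also printed when kubectl itself is absent or the cluster is unreachable, so check connectivity first if every entry is reported as missing.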
Once your local kubectl is configured to connect to the external cluster, do the following:
- In addition, you should configure external access to the KFP UI (svc/ml-pipeline-ui in the kubeflow ns) and the Ray Server API (svc/kuberay-apiserver-service in the kuberay ns). Depending on your cluster and its deployment, these can be LoadBalancer services, Ingresses or Routes.
- Optionally, you can upload the test data into the MinIO Object Store, deployed as part of KFP. To do this, provide external access to MinIO (svc/minio-service in the kubeflow ns) and execute the following commands from the root directory:
Clean up the cluster
If you use an external Kubernetes cluster, set the EXTERNAL_CLUSTER environment variable.