
Installing backend services

Using the distributed MySQL backend for ado

This guide is intended for administrators who are responsible for deploying the distributed MySQL backend for ADO or provisioning new projects on it.

Overview

We recommend using the Percona Operator for MySQL, which is built on Percona XtraDB Cluster, to provide a resilient and production-ready MySQL backend. This guide assumes that this setup is being used.

Deployment Instructions

Kubernetes

You can deploy the Percona Operator and create a Percona XtraDB Cluster using any of the installation methods described in the official Percona documentation.
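For reference, here is a minimal sketch of a Helm-based installation. The chart repository URL, chart names, release names, and namespace below are assumptions based on the Percona Helm charts project; treat the official Percona documentation as authoritative.

# Add the Percona Helm charts repository (assumed URL)
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

# Install the operator, then a Percona XtraDB Cluster (assumed chart and release names)
helm install pxc-operator percona/pxc-operator --namespace pxc --create-namespace
helm install pxc-db percona/pxc-db --namespace pxc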

OpenShift

In OpenShift environments, the operator can be installed via OperatorHub using the Operator Lifecycle Manager (OLM).

Refer to the official OpenShift-specific guide here:

👉 OpenShift Deployment Guide
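After installing the operator through OperatorHub, you can sanity-check the installation. A quick sketch, assuming the operator was installed cluster-wide or into your current project:

# The ClusterServiceVersion should reach the Succeeded phase
oc get csv | grep -i percona
# The PerconaXtraDBCluster CRD should be registered
oc get crd perconaxtradbclusters.pxc.percona.com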

Onboarding projects

Warning

Before proceeding, make sure you have followed the steps in Deployment Instructions.

Pre-requisites

Software

To run the scripts in this guide, you will need to have the following tools installed:

Note

We assume that your active namespace is the one in which you installed your Percona XtraDB Cluster.
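If your current context points elsewhere, you can switch the active namespace first (the namespace name below is a placeholder):

kubectl config set-context --current --namespace=<your-pxc-namespace>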

PXC Cluster name

You will need to know the name of your PXC cluster:

kubectl get pxc -o jsonpath='{.items[].metadata.name}'

We will refer to its name as $PXC_NAME.

PXC Cluster root credentials

You will need a highly privileged account to onboard new projects, as you will need to create databases, create users, and grant permissions. For this reason, we will use the default root account.

You can retrieve its password with:

kubectl get secret $PXC_NAME-secrets --template='{{.data.root}}' | base64 -d

We will refer to this password as $MYSQL_ADMIN_PASSWORD.
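For convenience, you can capture both values as environment variables before running the onboarding scripts. This is simply a combination of the two commands above:

# Name of the Percona XtraDB Cluster in the active namespace
export PXC_NAME=$(kubectl get pxc -o jsonpath='{.items[].metadata.name}')
# Root password stored in the cluster's secrets object
export MYSQL_ADMIN_PASSWORD=$(kubectl get secret $PXC_NAME-secrets --template='{{.data.root}}' | base64 -d)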

Onboarding new projects

The simplest way to onboard a new project called $PROJECT_NAME is to use the forward_mysql_and_onboard_new_project.sh script. This script creates a new project in the MySQL database and outputs an ado context YAML that can be used to connect to it.

For example:

./forward_mysql_and_onboard_new_project.sh --admin-user root \
                      --admin-pass $MYSQL_ADMIN_PASSWORD \
                      --pxc-name $PXC_NAME \
                      --project-name $PROJECT_NAME

Alternatively, if you are using a MySQL instance hosted elsewhere (for example, a managed cloud database), you can use the onboard_new_project.sh script instead:

./onboard_new_project.sh --admin-user root \
                      --admin-pass $MYSQL_ADMIN_PASSWORD \
                      --mysql-endpoint $MYSQL_ENDPOINT \
                      --project-name $PROJECT_NAME

Once the project has been created, the context YAML can be shared with anyone who needs access to the project.

Deploying KubeRay and creating a RayCluster

This guide is intended for users who want to run operations on an autoscaling Ray cluster deployed on Kubernetes or OpenShift. Depending on cluster permissions, users may need someone with administrator privileges to install KubeRay and/or create RayCluster objects.

Installing KubeRay

Warning

KubeRay is included in OpenShift AI and OpenDataHub. Skip this step if either of them is already installed in your cluster.

You can install the KubeRay Operator either via Helm or Kustomize by following the official documentation.
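If you are unsure whether KubeRay is already present (for example, via OpenShift AI or OpenDataHub), a quick check is to look for its CRDs and the operator deployment. The namespace of the operator varies by installation method:

# KubeRay registers CRDs such as rayclusters.ray.io and rayservices.ray.io
kubectl get crd rayclusters.ray.io rayservices.ray.io
# Look for the operator deployment in any namespace
kubectl get deployments --all-namespaces | grep -i kuberay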

Deploying a RayCluster

Warning

The Ray versions must be compatible (for example, the Ray version in the RayCluster image and the Ray version used by your ado installation). For a more in-depth guide, refer to the RayCluster configuration page.

Note

When running multi-node measurements, make sure that all nodes in your multi-node setup have read and write access to your HuggingFace home directory. On Kubernetes with a RayCluster, avoid S3-like filesystems, as they are known to cause failures in transformers. Use an NFS- or GPFS-backed PersistentVolumeClaim instead.
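For example, assuming the shared NFS- or GPFS-backed volume is mounted at /shared on every Ray node (a hypothetical mount path), you could point the HuggingFace home directory there by setting HF_HOME in the environment of every worker, for instance via the containerEnv fields shown in the worker group examples below:

# Hypothetical mount path for the shared volume backing the PersistentVolumeClaim
export HF_HOME=/shared/huggingface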

Best Practices for Efficient GPU Resource Utilization

To maximize the efficiency of your RayCluster and minimize GPU resource fragmentation, we recommend the following:

  • Enable Ray Autoscaler
    This allows Ray to dynamically adjust the number of worker replicas based on task demand.

  • Use Multiple GPU Worker Variants
    Define several GPU worker types with varying GPU counts. This flexibility helps match task requirements more precisely and reduces idle GPU time.

Create GPU worker variants with increasing GPU counts, where each variant has double the GPUs of the previous one. Limit each variant to a maximum of 2 replicas, ensuring that their combined GPU usage does not exceed the capacity of a single replica of the next larger variant.

Example: Kubernetes Cluster with 4 Nodes (8 GPUs Each)

Recommended worker setup:

  • 2 replicas of a worker with 1 GPU
  • 2 replicas of a worker with 2 GPUs
  • 2 replicas of a worker with 4 GPUs
  • 4 replicas of a worker with 8 GPUs

Example: the contents of the additionalWorkerGroups field of a RayCluster with 4 nodes, each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1 TB of memory:
one-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  rayStartParams:
    block: 'true'
    num-gpus: '1'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 1}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 8
      nvidia.com/gpu: 1
      memory: 100Gi
    requests:
      cpu: 8
      nvidia.com/gpu: 1
      memory: 100Gi
  # volumes: ...
  # volumeMounts: ....

two-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  rayStartParams:
    block: 'true'
    num-gpus: '2'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 2}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 15
      nvidia.com/gpu: 2
      memory: 200Gi
    requests:
      cpu: 15
      nvidia.com/gpu: 2
      memory: 200Gi
  # volumes: ...
  # volumeMounts: ....

four-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  rayStartParams:
    block: 'true'
    num-gpus: '4'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 4}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 30
      nvidia.com/gpu: 4
      memory: 400Gi
    requests:
      cpu: 30
      nvidia.com/gpu: 4
      memory: 400Gi
  # volumes: ...
  # volumeMounts: ....

eight-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 4
  rayStartParams:
    block: 'true'
    num-gpus: '8'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 8, \"full-worker\": 1}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB

  resources:
    limits:
      cpu: 60
      nvidia.com/gpu: 8
      memory: 800Gi
    requests:
      cpu: 60
      nvidia.com/gpu: 8
      memory: 800Gi
  # volumes: ...
  # volumeMounts: ....

Note

Notice that the only variant with a full-worker custom resource is the one with 8 GPUs. Some actuators, like SFTTrainer, use this custom resource for measurements that involve reserving an entire GPU node.

We provide an example set of values for deploying a RayCluster via KubeRay. To deploy it, simply run:

helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 --values backend/kuberay/vanilla-ray.yaml

Feel free to customize it to suit your cluster, for example by uncommenting the GPU-enabled workers.
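Once the Helm release is installed, you can verify that the RayCluster and its pods are up. A quick sketch; the exact cluster name depends on the release name and chart values:

# List RayCluster objects created by the chart
kubectl get raycluster
# KubeRay labels the cluster's pods with its name
kubectl get pods -l ray.io/cluster=<your-raycluster-name>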

Deploying ADO's API via Ray Serve

Overview

This guide explains how to spin up ADO's API using Ray Serve, both on your local machine and on a KubeRay cluster.

Prerequisites

Local deployment

Ensure you have created a virtual environment for ADO by following the instructions in our development documentation.

KubeRay deployment

Make sure you have completed the setup outlined in our instructions for deploying KubeRay on your Kubernetes cluster.

Instructions

Deploying locally

Serving the API locally is straightforward. Run the following command in your terminal:

serve run orchestrator.api.rest:ado_rest_api

You should see output similar to this:

2025-09-19 11:50:39,727 INFO scripts.py:507 -- Running import path: 'orchestrator.api.rest:ado_rest_api'.
2025-09-19 11:50:45,496 INFO worker.py:1942 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(ProxyActor pid=98735) INFO 2025-09-19 11:50:48,686 proxy 127.0.0.1 -- Proxy starting on node 05f52ce870e3943ff5ec646472a93f5b552fd48ad45cdd8286569db1 (HTTP port: 8000).
(ProxyActor pid=98735) INFO 2025-09-19 11:50:48,801 proxy 127.0.0.1 -- Got updated endpoints: {}.
INFO 2025-09-19 11:50:48,825 serve 98612 -- Started Serve in namespace "serve".
INFO 2025-09-19 11:50:48,852 serve 98612 -- Connecting to existing Serve app in namespace "serve". New http options will not be applied.
(ServeController pid=98727) INFO 2025-09-19 11:50:49,037 controller 98727 -- Deploying new version of Deployment(name='AdoRESTApi', app='default') (initial target replicas: 1).
(ProxyActor pid=98735) INFO 2025-09-19 11:50:49,078 proxy 127.0.0.1 -- Got updated endpoints: {Deployment(name='AdoRESTApi', app='default'): EndpointInfo(route='/', app_is_cross_language=False)}.
(ProxyActor pid=98735) INFO 2025-09-19 11:50:49,098 proxy 127.0.0.1 -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x11531f730>.
(ServeController pid=98727) INFO 2025-09-19 11:50:49,177 controller 98727 -- Adding 1 replica to Deployment(name='AdoRESTApi', app='default').
INFO 2025-09-19 11:50:50,096 serve 98612 -- Application 'default' is ready at http://127.0.0.1:8000/.

Once you see the final line, the API is running. Open the interactive OpenAPI documentation at http://127.0.0.1:8000/docs or the ReDoc version at http://127.0.0.1:8000/redoc.
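As a quick smoke test, you can also query the endpoint from another terminal:

# Expect an HTTP 200 from the interactive documentation endpoint
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/docs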

Deploying on KubeRay

Ray Serve applications are deployed on KubeRay via RayService objects. An example RayService is provided:

# Copyright (c) IBM Corporation
# SPDX-License-Identifier: MIT

apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: ado-api
spec:
  serveConfigV2: |
    applications: 
      - import_path: orchestrator.api.rest:ado_rest_api
        name: orchestrator_api
        deployments:
        - name: AdoRESTApi
  rayClusterConfig:
    rayVersion: "2.49.1" # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      rayStartParams:
        #Following https://ray-project.github.io/kuberay/best-practice/worker-head-reconnection/
        num-cpus: "0"
        dashboard-host: "0.0.0.0"
        block: "true"
      #pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: quay.io/ado/ado:6a6f6c389e95b0450e4574336999d3187684bc28-py310-cu121-ofed2410v1140
              imagePullPolicy: Always
              env:
                - name: OMP_NUM_THREADS
                  value: "1"
                - name: OPENBLAS_NUM_THREADS
                  value: "1"
              resources:
                limits:
                  cpu: 4
                  memory: 16Gi
                requests:
                  cpu: 4
                  memory: 16Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name; here it is called small-group
        groupName: small-group
        rayStartParams:
          block: "true"
        #pod template
        template:
          spec:
            containers:
              - name: ray-worker
                image: quay.io/ado/ado:6a6f6c389e95b0450e4574336999d3187684bc28-py310-cu121-ofed2410v1140
                imagePullPolicy: Always
                env:
                  - name: OMP_NUM_THREADS
                    value: "1"
                  - name: OPENBLAS_NUM_THREADS
                    value: "1"
                lifecycle:
                  preStop:
                    exec:
                      command: [ "/bin/sh", "-c", "ray stop" ]
                resources:
                  limits:
                    cpu: 4
                    memory: 16Gi
                  requests:
                    cpu: 4
                    memory: 16Gi

From the root of the ado project directory, you can deploy it with:

kubectl apply -f backend/api/ado-api-rayserve.yaml

KubeRay will automatically create a Service for the Serve endpoint named ${RAY_SERVICE_NAME}-serve-svc. For our example, this is ado-api-serve-svc.
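You can confirm that the RayService is healthy and that the Service exists with:

# Check the RayService status reported by KubeRay
kubectl get rayservice ado-api
# Confirm the Serve Service was created
kubectl get svc ado-api-serve-svc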

Tip

For ease of use, we suggest exposing the service using either a Route (on OpenShift), a LoadBalancer service, or an Ingress. Make sure you take appropriate security measures to protect the endpoint.

You can access it via port-forward using:

kubectl port-forward svc/ado-api-serve-svc 8000:8000

You can then navigate to the interactive OpenAPI documentation at http://127.0.0.1:8000/docs or the ReDoc version at http://127.0.0.1:8000/redoc.