
Installing backend services

Using the distributed MySQL backend for ado

This guide is intended for administrators who are responsible for deploying the distributed MySQL backend for ADO or provisioning new projects on it.

Overview

We recommend using the Percona Operator for MySQL, which is built on Percona XtraDB Cluster, to provide a resilient and production-ready MySQL backend. This guide assumes that this setup is being used.

Deployment Instructions

Kubernetes

You can deploy the Percona Operator and create a Percona XtraDB Cluster using any of the installation methods described in the official Percona documentation.
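For reference, here is a minimal sketch of a Helm-based installation. The chart repository URL, chart names, release names, and namespace below are assumptions based on the Percona Helm charts project; treat the official Percona documentation as authoritative.

# Add the Percona Helm charts repository (assumed URL)
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

# Install the operator, then a Percona XtraDB Cluster (assumed chart and release names)
helm install pxc-operator percona/pxc-operator --namespace pxc --create-namespace
helm install pxc-db percona/pxc-db --namespace pxc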

OpenShift

In OpenShift environments, the operator can be installed via OperatorHub using the Operator Lifecycle Manager (OLM).

Refer to the official OpenShift-specific guide here:

👉 OpenShift Deployment Guide
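After installing the operator through OperatorHub, you can sanity-check the installation. A quick sketch, assuming the operator was installed cluster-wide or into your current project:

# The ClusterServiceVersion should reach the Succeeded phase
oc get csv | grep -i percona
# The PerconaXtraDBCluster CRD should be registered
oc get crd perconaxtradbclusters.pxc.percona.com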

Onboarding projects

Warning

Before proceeding, make sure you have followed the steps in Deployment Instructions.

Pre-requisites

Software

To run the scripts in this guide, you will need to have the following tools installed:

Note

We assume that your active namespace is the one in which you installed your Percona XtraDB Cluster.
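If your current context points elsewhere, you can switch the active namespace first (the namespace name below is a placeholder):

kubectl config set-context --current --namespace=<your-pxc-namespace>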

PXC Cluster name

You will need to know the name of your PXC cluster:

kubectl get pxc -o jsonpath='{.items[].metadata.name}'

We will refer to its name as $PXC_NAME.

PXC Cluster root credentials

You will need a highly privileged account to onboard new projects, as you will need to create databases, create users, and grant permissions. For this reason, we will use the default root account.

You can retrieve its password with:

kubectl get secret $PXC_NAME-secrets --template='{{.data.root}}' | base64 -d

We will refer to this password as $MYSQL_ADMIN_PASSWORD.
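For convenience, you can capture both values as environment variables before running the onboarding scripts. This is simply a combination of the two commands above:

# Name of the Percona XtraDB Cluster in the active namespace
export PXC_NAME=$(kubectl get pxc -o jsonpath='{.items[].metadata.name}')
# Root password stored in the cluster's secrets object
export MYSQL_ADMIN_PASSWORD=$(kubectl get secret $PXC_NAME-secrets --template='{{.data.root}}' | base64 -d)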

Onboarding new projects

The simplest way to onboard a new project called $PROJECT_NAME is to use the forward_mysql_and_onboard_new_project.sh script. This script creates a new project in the MySQL database and outputs an ado context YAML that can be used to connect to it.

For example:

./forward_mysql_and_onboard_new_project.sh --admin-user root \
                      --admin-pass $MYSQL_ADMIN_PASSWORD \
                      --pxc-name $PXC_NAME \
                      --project-name $PROJECT_NAME

Alternatively, if you are using a MySQL instance hosted elsewhere (for example, a managed cloud database), you can use the onboard_new_project.sh script instead:

./onboard_new_project.sh --admin-user root \
                      --admin-pass $MYSQL_ADMIN_PASSWORD \
                      --mysql-endpoint $MYSQL_ENDPOINT \
                      --project-name $PROJECT_NAME

Once the project has been created, the context YAML can be shared with anyone who needs access to the project.

Deploying KubeRay and creating a RayCluster

This guide is intended for users who want to run operations on an autoscaling Ray cluster deployed on Kubernetes or OpenShift. Depending on cluster permissions, users may need someone with administrator privileges to install KubeRay and/or create RayCluster objects.

Installing KubeRay

Warning

KubeRay is included in OpenShift AI and OpenDataHub. Skip this step if either of them is already installed in your cluster.

You can install the KubeRay Operator either via Helm or Kustomize by following the official documentation.
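If you are unsure whether KubeRay is already present (for example, via OpenShift AI or OpenDataHub), a quick check is to look for its CRDs and the operator deployment. The namespace of the operator varies by installation method:

# KubeRay registers CRDs such as rayclusters.ray.io and rayservices.ray.io
kubectl get crd rayclusters.ray.io rayservices.ray.io
# Look for the operator deployment in any namespace
kubectl get deployments --all-namespaces | grep -i kuberay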

Deploying a RayCluster

Warning

The Ray versions must be compatible (for example, the Ray version in the RayCluster image and the Ray version used by your ado installation). For a more in-depth guide, refer to the RayCluster configuration page.

Note

When running multi-node measurements, make sure that all nodes in your multi-node setup have read and write access to your HuggingFace home directory. On Kubernetes with a RayCluster, avoid S3-like filesystems, as they are known to cause failures in transformers. Use an NFS- or GPFS-backed PersistentVolumeClaim instead.
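For example, assuming the shared NFS- or GPFS-backed volume is mounted at /shared on every Ray node (a hypothetical mount path), you could point the HuggingFace home directory there by setting HF_HOME in the environment of every worker, for instance via the containerEnv fields shown in the worker group examples below:

# Hypothetical mount path for the shared volume backing the PersistentVolumeClaim
export HF_HOME=/shared/huggingface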

Best Practices for Efficient GPU Resource Utilization

To maximize the efficiency of your RayCluster and minimize GPU resource fragmentation, we recommend the following:

  • Enable Ray Autoscaler
    This allows Ray to dynamically adjust the number of worker replicas based on task demand.

  • Use Multiple GPU Worker Variants
    Define several GPU worker types with varying GPU counts. This flexibility helps match task requirements more precisely and reduces idle GPU time.

Create GPU worker variants with increasing GPU counts, where each variant has double the GPUs of the previous one. Limit each variant to a maximum of 2 replicas, ensuring that their combined GPU usage does not exceed the capacity of a single replica of the next larger variant.

Example: Kubernetes Cluster with 4 Nodes (8 GPUs Each)

Recommended worker setup:

  • 2 replicas of a worker with 1 GPU
  • 2 replicas of a worker with 2 GPUs
  • 2 replicas of a worker with 4 GPUs
  • 4 replicas of a worker with 8 GPUs

Example: the contents of the additionalWorkerGroups field of a RayCluster with 4 nodes, each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1 TB of memory:
one-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  rayStartParams:
    block: 'true'
    num-gpus: '1'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 1}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 8
      nvidia.com/gpu: 1
      memory: 100Gi
    requests:
      cpu: 8
      nvidia.com/gpu: 1
      memory: 100Gi
  # volumes: ...
  # volumeMounts: ....

two-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  rayStartParams:
    block: 'true'
    num-gpus: '2'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 2}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 15
      nvidia.com/gpu: 2
      memory: 200Gi
    requests:
      cpu: 15
      nvidia.com/gpu: 2
      memory: 200Gi
  # volumes: ...
  # volumeMounts: ....

four-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  rayStartParams:
    block: 'true'
    num-gpus: '4'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 4}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 30
      nvidia.com/gpu: 4
      memory: 400Gi
    requests:
      cpu: 30
      nvidia.com/gpu: 4
      memory: 400Gi
  # volumes: ...
  # volumeMounts: ....

eight-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 4
  rayStartParams:
    block: 'true'
    num-gpus: '8'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 8, \"full-worker\": 1}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB

  resources:
    limits:
      cpu: 60
      nvidia.com/gpu: 8
      memory: 800Gi
    requests:
      cpu: 60
      nvidia.com/gpu: 8
      memory: 800Gi
  # volumes: ...
  # volumeMounts: ....

Note

Notice that the only variant with a full-worker custom resource is the one with 8 GPUs. Some actuators, like SFTTrainer, use this custom resource for measurements that involve reserving an entire GPU node.

We provide an example set of values for deploying a RayCluster via KubeRay. To deploy it, simply run:

helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 --values backend/kuberay/vanilla-ray.yaml

Feel free to customize it to suit your cluster, for example by uncommenting the GPU-enabled workers.
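Once the Helm release is installed, you can verify that the RayCluster and its pods are up. A quick sketch; the exact cluster name depends on the release name and chart values:

# List RayCluster objects created by the chart
kubectl get raycluster
# KubeRay labels the cluster's pods with its name
kubectl get pods -l ray.io/cluster=<your-raycluster-name>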

Deploying ADO's API via Ray Serve

Overview

This guide explains how to spin up ADO's API using Ray Serve, both on your local machine and on a KubeRay cluster.

Prerequisites

Local deployment

Ensure you have created a virtual environment for ADO by following the instructions in our development documentation.

KubeRay deployment

Make sure you have completed the setup outlined in our instructions for deploying KubeRay on your Kubernetes cluster.

Instructions

Deploying locally

Serving the API locally is straightforward. Run the following command in your terminal:

serve run orchestrator.api.rest:ado_rest_api

You should see output similar to this:

2025-09-19 11:50:39,727 INFO scripts.py:507 -- Running import path: 'orchestrator.api.rest:ado_rest_api'.
2025-09-19 11:50:45,496 INFO worker.py:1942 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(ProxyActor pid=98735) INFO 2025-09-19 11:50:48,686 proxy 127.0.0.1 -- Proxy starting on node 05f52ce870e3943ff5ec646472a93f5b552fd48ad45cdd8286569db1 (HTTP port: 8000).
(ProxyActor pid=98735) INFO 2025-09-19 11:50:48,801 proxy 127.0.0.1 -- Got updated endpoints: {}.
INFO 2025-09-19 11:50:48,825 serve 98612 -- Started Serve in namespace "serve".
INFO 2025-09-19 11:50:48,852 serve 98612 -- Connecting to existing Serve app in namespace "serve". New http options will not be applied.
(ServeController pid=98727) INFO 2025-09-19 11:50:49,037 controller 98727 -- Deploying new version of Deployment(name='AdoRESTApi', app='default') (initial target replicas: 1).
(ProxyActor pid=98735) INFO 2025-09-19 11:50:49,078 proxy 127.0.0.1 -- Got updated endpoints: {Deployment(name='AdoRESTApi', app='default'): EndpointInfo(route='/', app_is_cross_language=False)}.
(ProxyActor pid=98735) INFO 2025-09-19 11:50:49,098 proxy 127.0.0.1 -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x11531f730>.
(ServeController pid=98727) INFO 2025-09-19 11:50:49,177 controller 98727 -- Adding 1 replica to Deployment(name='AdoRESTApi', app='default').
INFO 2025-09-19 11:50:50,096 serve 98612 -- Application 'default' is ready at http://127.0.0.1:8000/.

Once you see the final line, the API is running. Open the interactive OpenAPI documentation at http://127.0.0.1:8000/docs or the ReDoc version at http://127.0.0.1:8000/redoc.
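As a quick smoke test, you can also query the endpoint from another terminal:

# Expect an HTTP 200 from the interactive documentation endpoint
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/docs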

Deploying on KubeRay

Ray Serve applications are deployed on KubeRay via RayService objects. An example RayService is provided:

# Copyright (c) IBM Corporation
# SPDX-License-Identifier: MIT

apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: ado-api
spec:
  serveConfigV2: |
    applications: 
      - import_path: orchestrator.api.rest:ado_rest_api
        name: orchestrator_api
        deployments:
        - name: AdoRESTApi
  rayClusterConfig:
    rayVersion: "2.49.1" # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      rayStartParams:
        #Following https://ray-project.github.io/kuberay/best-practice/worker-head-reconnection/
        num-cpus: "0"
        dashboard-host: "0.0.0.0"
        block: "true"
      #pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: quay.io/ado/ado:6a6f6c389e95b0450e4574336999d3187684bc28-py310-cu121-ofed2410v1140
              imagePullPolicy: Always
              env:
                - name: OMP_NUM_THREADS
                  value: "1"
                - name: OPENBLAS_NUM_THREADS
                  value: "1"
              resources:
                limits:
                  cpu: 4
                  memory: 16Gi
                requests:
                  cpu: 4
                  memory: 16Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name; here it is called small-group
        groupName: small-group
        rayStartParams:
          block: "true"
        #pod template
        template:
          spec:
            containers:
              - name: ray-worker
                image: quay.io/ado/ado:6a6f6c389e95b0450e4574336999d3187684bc28-py310-cu121-ofed2410v1140
                imagePullPolicy: Always
                env:
                  - name: OMP_NUM_THREADS
                    value: "1"
                  - name: OPENBLAS_NUM_THREADS
                    value: "1"
                lifecycle:
                  preStop:
                    exec:
                      command: [ "/bin/sh", "-c", "ray stop" ]
                resources:
                  limits:
                    cpu: 4
                    memory: 16Gi
                  requests:
                    cpu: 4
                    memory: 16Gi

From the root of the ado project directory, you can deploy it with:

kubectl apply -f backend/api/ado-api-rayserve.yaml

KubeRay will automatically create a Service for the Serve endpoint named ${RAY_SERVICE_NAME}-serve-svc. For our example, this is ado-api-serve-svc.
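You can confirm that the RayService is healthy and that the Service exists with:

# Check the RayService status reported by KubeRay
kubectl get rayservice ado-api
# Confirm the Serve Service was created
kubectl get svc ado-api-serve-svc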

Tip

For ease of use, we suggest exposing the service using either a Route (on OpenShift), a LoadBalancer service, or an Ingress. Make sure you take appropriate security measures to protect the endpoint.

You can access it via port-forward using:

kubectl port-forward svc/ado-api-serve-svc 8000:8000

You can then navigate to the interactive OpenAPI documentation at http://127.0.0.1:8000/docs or the ReDoc version at http://127.0.0.1:8000/redoc.