Installing backend services
Using the distributed MySQL backend for ado¶
This guide is intended for administrators who are responsible for deploying the distributed MySQL backend for ADO or provisioning new projects on it.
Overview¶
We recommend using the Percona Operator for MySQL, which is built on Percona XtraDB Cluster, to provide a resilient and production-ready MySQL backend. This guide assumes that this setup is being used.
Deployment Instructions¶
Kubernetes¶
You can deploy the Percona Operator and create a Percona XtraDB Cluster using either of the installation methods covered in the official Percona documentation; follow the links to the relevant guides for step-by-step instructions.
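For example, a minimal Helm-based setup might look like the sketch below; the chart repository URL, chart names (pxc-operator, pxc-db), release names, and namespace are assumptions drawn from Percona's public Helm charts, so verify them against the official documentation:
# Add the Percona Helm chart repository (assumed URL)
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
# Install the operator, then create a Percona XtraDB Cluster for it to manage
helm install pxc-operator percona/pxc-operator --namespace pxc --create-namespace
helm install pxc-db percona/pxc-db --namespace pxc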
OpenShift¶
In OpenShift environments, the operator can be installed via OperatorHub using the Operator Lifecycle Manager (OLM).
Refer to the official OpenShift-specific guide here:
👉 OpenShift Deployment Guide
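Once the operator has been installed through OperatorHub, you can confirm that OLM reports it as succeeded; the namespace below is a placeholder for wherever you installed the operator:
# List the ClusterServiceVersions OLM manages and check the Percona operator's phase
oc get csv -n <operator-namespace>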
Onboarding projects¶
Warning
Before proceeding, make sure you have followed the steps in Deployment Instructions.
Pre-requisites¶
Software¶
To run the scripts in this guide you will need to have the following tools installed:
- kubectl: https://kubernetes.io/docs/tasks/tools/#kubectl
- mysql client version 8: https://formulae.brew.sh/formula/mysql-client@8.4
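You can quickly check that both tools are available on your PATH:
kubectl version --client
mysql --version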
PXC-related variables¶
Note
We assume that your active namespace is the one in which you installed your Percona XtraDB Cluster.
PXC Cluster name¶
You will need to know the name of your pxc cluster:
kubectl get pxc -o jsonpath='{.items[].metadata.name}'
We will refer to its name as $PXC_NAME.
PXC Cluster root credentials¶
You will need a highly privileged account to onboard new projects, as you will need to create databases and users and grant permissions. For this reason, we will use the default root account.
You can retrieve its password with:
kubectl get secret $PXC_NAME-secrets --template='{{.data.root}}' | base64 -d
We will refer to this password as $MYSQL_ADMIN_PASSWORD.
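If you prefer to set both values as environment variables in your current shell before running the onboarding scripts, the commands above can be combined as follows:
# Capture the cluster name and root password used in the rest of this guide
export PXC_NAME=$(kubectl get pxc -o jsonpath='{.items[].metadata.name}')
export MYSQL_ADMIN_PASSWORD=$(kubectl get secret $PXC_NAME-secrets --template='{{.data.root}}' | base64 -d)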
Onboarding new projects¶
The simplest way to onboard a new project called $PROJECT_NAME is to use the forward_mysql_and_onboard_new_project.sh script. This script creates a new project in the MySQL database and outputs an ado context YAML that can be used to connect to it.
For example:
./forward_mysql_and_onboard_new_project.sh --admin-user root \
--admin-pass $MYSQL_ADMIN_PASSWORD \
--pxc-name $PXC_NAME \
--project-name $PROJECT_NAME
Alternatively, if you are using a hosted MySQL instance (e.g., on the Cloud), you can use the onboard_new_project.sh script instead:
./onboard_new_project.sh --admin-user root \
--admin-pass $MYSQL_ADMIN_PASSWORD \
--mysql-endpoint $MYSQL_ENDPOINT \
--project-name $PROJECT_NAME
Once the project is created, the context YAML can be shared with whoever needs access to the project.
Deploying KubeRay and creating a RayCluster¶
This guide is intended for users who want to run operations on an autoscaling Ray cluster deployed on Kubernetes or OpenShift. Depending on cluster permissions, users may need someone with administrator privileges to install KubeRay and/or create RayCluster objects.
Installing KubeRay¶
Warning
KubeRay is included in OpenShift AI and OpenDataHub. Skip this step if either is already installed in your cluster.
You can install the KubeRay Operator either via Helm or Kustomize by following the official documentation.
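As an illustration, installing the operator with Helm might look like the sketch below; the repository URL and chart version are assumptions, so check the KubeRay documentation for the release that matches your Ray version:
# Add the KubeRay Helm repository and install the operator (version shown is an example)
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.0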
Deploying a RayCluster¶
Warning
The ray versions in your local environment and in the RayCluster image must be compatible. For a more in-depth guide, refer to the RayCluster configuration page.
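A quick way to see which Ray version your local ado environment uses, so you can match it against the image configured for your RayCluster:
# Print the locally installed Ray version
ray --version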
Note
When running multi-node measurements, make sure that all nodes in your multi-node setup have read and write access to your HuggingFace home directory. On Kubernetes with a RayCluster, avoid S3-like filesystems, as they are known to cause failures in transformers; use an NFS- or GPFS-backed PersistentVolumeClaim instead.
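As an illustration only, a shared claim for the HuggingFace home directory could be created as sketched below; the claim name, size, and storage class are placeholders, and the storage class must actually be backed by NFS or GPFS in your cluster:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hf-home-shared            # placeholder name
spec:
  accessModes:
    - ReadWriteMany               # every Ray worker pod needs read/write access
  resources:
    requests:
      storage: 500Gi              # placeholder size
  storageClassName: nfs-client    # placeholder: an NFS- or GPFS-backed class
EOF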
Best Practices for Efficient GPU Resource Utilization¶
To maximize the efficiency of your RayCluster and minimize GPU resource fragmentation, we recommend the following:
- Enable Ray Autoscaler: this allows Ray to dynamically adjust the number of worker replicas based on task demand.
- Use Multiple GPU Worker Variants: define several GPU worker types with varying GPU counts. This flexibility helps match task requirements more precisely and reduces idle GPU time.
Recommended Worker Configuration Strategy¶
Create GPU worker variants with increasing GPU counts, where each variant has double the GPUs of the previous one. Limit each variant to a maximum of 2 replicas, ensuring that their combined GPU usage does not exceed the capacity of a single replica of the next larger variant.
Example: Kubernetes Cluster with 4 Nodes (8 GPUs Each)¶
Recommended worker setup:
- 2 replicas of a worker with 1 GPU
- 2 replicas of a worker with 2 GPUs
- 2 replicas of a worker with 4 GPUs
- 4 replicas of a worker with 8 GPUs
Example: the contents of the additionalWorkerGroups field of a RayCluster with 4 nodes, each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB of memory:
one-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  rayStartParams:
    block: 'true'
    num-gpus: '1'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 1}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 8
      nvidia.com/gpu: 1
      memory: 100Gi
    requests:
      cpu: 8
      nvidia.com/gpu: 1
      memory: 100Gi
  # volumes: ...
  # volumeMounts: ....
two-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  rayStartParams:
    block: 'true'
    num-gpus: '2'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 2}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 15
      nvidia.com/gpu: 2
      memory: 200Gi
    requests:
      cpu: 15
      nvidia.com/gpu: 2
      memory: 200Gi
  # volumes: ...
  # volumeMounts: ....
four-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 2
  rayStartParams:
    block: 'true'
    num-gpus: '4'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 4}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 30
      nvidia.com/gpu: 4
      memory: 400Gi
    requests:
      cpu: 30
      nvidia.com/gpu: 4
      memory: 400Gi
  # volumes: ...
  # volumeMounts: ....
eight-A100-80G-gpu-WG:
  replicas: 0
  minReplicas: 0
  maxReplicas: 4
  rayStartParams:
    block: 'true'
    num-gpus: '8'
    resources: '"{\"NVIDIA-A100-SXM4-80GB\": 8, \"full-worker\": 1}"'
  containerEnv:
    - name: OMP_NUM_THREADS
      value: "1"
    - name: OPENBLAS_NUM_THREADS
      value: "1"
  lifecycle:
    preStop:
      exec:
        command: [ "/bin/sh","-c","ray stop" ]
  # securityContext: ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100-SXM4-80GB
  resources:
    limits:
      cpu: 60
      nvidia.com/gpu: 8
      memory: 800Gi
    requests:
      cpu: 60
      nvidia.com/gpu: 8
      memory: 800Gi
  # volumes: ...
  # volumeMounts: ....
Note
Notice that the only variant with a full-worker custom resource is the one with 8 GPUs. Some actuators, like SFTTrainer, use this custom resource for measurements that involve reserving an entire GPU node.
We provide an example set of values for deploying a RayCluster via KubeRay. To deploy it, simply run:
helm upgrade --install ado-ray kuberay/ray-cluster --version 1.1.0 --values backend/kuberay/vanilla-ray.yaml
Feel free to customize it to suit your cluster, such as uncommenting GPU-enabled workers.
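After the Helm release is installed, you can check that the RayCluster resource exists and that its head and worker pods start up:
# Inspect the RayCluster created by the chart and the pods it spawns
kubectl get rayclusters
kubectl get pods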
Deploying ADO's API via Ray Serve¶
Overview¶
This guide explains how to spin up ADO's API using Ray Serve on your local machine and on a KubeRay cluster.
Prerequisites¶
Local deployment¶
Ensure you have created a virtual environment for ADO by following the instructions in our development documentation.
KubeRay deployment¶
Make sure you have completed the setup outlined in our instructions for deploying KubeRay on your Kubernetes cluster.
Instructions¶
Deploying locally¶
Serving the API locally is very easy. Run the following command in your terminal:
serve run orchestrator.api.rest:ado_rest_api
You should see output similar to this:
2025-09-19 11:50:39,727 INFO scripts.py:507 -- Running import path: 'orchestrator.api.rest:ado_rest_api'.
2025-09-19 11:50:45,496 INFO worker.py:1942 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(ProxyActor pid=98735) INFO 2025-09-19 11:50:48,686 proxy 127.0.0.1 -- Proxy starting on node 05f52ce870e3943ff5ec646472a93f5b552fd48ad45cdd8286569db1 (HTTP port: 8000).
(ProxyActor pid=98735) INFO 2025-09-19 11:50:48,801 proxy 127.0.0.1 -- Got updated endpoints: {}.
INFO 2025-09-19 11:50:48,825 serve 98612 -- Started Serve in namespace "serve".
INFO 2025-09-19 11:50:48,852 serve 98612 -- Connecting to existing Serve app in namespace "serve". New http options will not be applied.
(ServeController pid=98727) INFO 2025-09-19 11:50:49,037 controller 98727 -- Deploying new version of Deployment(name='AdoRESTApi', app='default') (initial target replicas: 1).
(ProxyActor pid=98735) INFO 2025-09-19 11:50:49,078 proxy 127.0.0.1 -- Got updated endpoints: {Deployment(name='AdoRESTApi', app='default'): EndpointInfo(route='/', app_is_cross_language=False)}.
(ProxyActor pid=98735) INFO 2025-09-19 11:50:49,098 proxy 127.0.0.1 -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x11531f730>.
(ServeController pid=98727) INFO 2025-09-19 11:50:49,177 controller 98727 -- Adding 1 replica to Deployment(name='AdoRESTApi', app='default').
INFO 2025-09-19 11:50:50,096 serve 98612 -- Application 'default' is ready at http://127.0.0.1:8000/.
Once you see the final line, the API is running. Open the interactive OpenAPI documentation at http://127.0.0.1:8000/docs or the ReDoc version at http://127.0.0.1:8000/redoc.
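To verify from the command line, you can also fetch the generated OpenAPI schema (assuming the default FastAPI schema route):
# Should return the API's OpenAPI schema as JSON
curl -s http://127.0.0.1:8000/openapi.json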
Deploying on KubeRay¶
Ray Serve applications are deployed on KubeRay via RayService resources. An example RayService is provided:
# Copyright (c) IBM Corporation
# SPDX-License-Identifier: MIT
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: ado-api
spec:
  serveConfigV2: |
    applications:
      - import_path: orchestrator.api.rest:ado_rest_api
        name: orchestrator_api
        deployments:
          - name: AdoRESTApi
  rayClusterConfig:
    rayVersion: "2.49.1" # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      rayStartParams:
        #Following https://ray-project.github.io/kuberay/best-practice/worker-head-reconnection/
        num-cpus: "0"
        dashboard-host: "0.0.0.0"
        block: "true"
      #pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: quay.io/ado/ado:6a6f6c389e95b0450e4574336999d3187684bc28-py310-cu121-ofed2410v1140
              imagePullPolicy: Always
              env:
                - name: OMP_NUM_THREADS
                  value: "1"
                - name: OPENBLAS_NUM_THREADS
                  value: "1"
              resources:
                limits:
                  cpu: 4
                  memory: 16Gi
                requests:
                  cpu: 4
                  memory: 16Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name, for this called small-group, also can be functional
        groupName: small-group
        rayStartParams:
          block: "true"
        #pod template
        template:
          spec:
            containers:
              - name: ray-worker
                image: quay.io/ado/ado:6a6f6c389e95b0450e4574336999d3187684bc28-py310-cu121-ofed2410v1140
                imagePullPolicy: Always
                env:
                  - name: OMP_NUM_THREADS
                    value: "1"
                  - name: OPENBLAS_NUM_THREADS
                    value: "1"
                lifecycle:
                  preStop:
                    exec:
                      command: [ "/bin/sh", "-c", "ray stop" ]
                resources:
                  limits:
                    cpu: 4
                    memory: 16Gi
                  requests:
                    cpu: 4
                    memory: 16Gi
From the root of the ado project directory, you can deploy it with:
kubectl apply -f backend/api/ado-api-rayserve.yaml
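After applying the manifest, you can watch the RayService until KubeRay reports it as healthy:
# Check the RayService status and the pods created for it
kubectl get rayservice ado-api
kubectl get pods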
KubeRay will automatically create a service for the Serve endpoint called ${RAY_SERVICE_NAME}-serve-svc. In the case of our example, this will be ado-api-serve-svc.
Tip
For ease of use, we suggest exposing the service using either a Route (on OpenShift), a LoadBalancer service or an Ingress. Make sure you take appropriate security measures to protect the endpoint.
You can access it via port-forward using:
kubectl port-forward svc/ado-api-serve-svc 8000:8000
You can then navigate to the interactive OpenAPI documentation at http://127.0.0.1:8000/docs or the ReDoc version at http://127.0.0.1:8000/redoc.