Adding GPU nodes to self-managed OpenShift on AWS🔗
When deploying self-managed OpenShift on AWS, the compute nodes are represented as one or more OpenShift MachineSet
s. If your cluster is deployed in a single zone, there will be 1 MachineSet
which defines the number and type of ec2 instances created in the AWS account. For multi-zone clusters, there will be 3 MachineSet
s.
Find the compute node MachineSet
🔗
Below is an example of a compute node (worker) MachineSet
created by the OpenShift installer. The instance type defines the node that is spun up, with 32 vCPUs and 128 GB of memory.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
annotations:
capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64
machine.openshift.io/GPU: '0'
machine.openshift.io/memoryMb: '131072'
machine.openshift.io/vCPU: '32'
resourceVersion: '22985'
name: fk-aws-sts-2th7t-worker-us-east-1a
uid: be5a9880-eaa0-4054-a77f-e5c4432eb51f
creationTimestamp: '2024-10-16T20:30:43Z'
generation: 1
managedFields:
- apiVersion: machine.openshift.io/v1beta1
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:labels':
.: {}
'f:machine.openshift.io/cluster-api-cluster': {}
'f:spec':
.: {}
'f:replicas': {}
'f:selector': {}
'f:template':
.: {}
'f:metadata':
.: {}
'f:labels':
.: {}
'f:machine.openshift.io/cluster-api-cluster': {}
'f:machine.openshift.io/cluster-api-machine-role': {}
'f:machine.openshift.io/cluster-api-machine-type': {}
'f:machine.openshift.io/cluster-api-machineset': {}
'f:spec':
.: {}
'f:lifecycleHooks': {}
'f:metadata': {}
'f:providerSpec':
.: {}
'f:value':
'f:instanceType': {}
'f:metadata':
.: {}
'f:creationTimestamp': {}
'f:blockDevices': {}
'f:kind': {}
'f:securityGroups': {}
'f:deviceIndex': {}
'f:ami':
.: {}
'f:id': {}
'f:metadataServiceOptions': {}
'f:tags': {}
.: {}
'f:placement':
.: {}
'f:availabilityZone': {}
'f:region': {}
'f:subnet':
.: {}
'f:filters': {}
'f:apiVersion': {}
'f:iamInstanceProfile':
.: {}
'f:id': {}
'f:credentialsSecret':
.: {}
'f:name': {}
'f:userDataSecret':
.: {}
'f:name': {}
manager: cluster-bootstrap
operation: Update
time: '2024-10-16T20:30:43Z'
- apiVersion: machine.openshift.io/v1beta1
fieldsType: FieldsV1
fieldsV1:
'f:status': {}
manager: cluster-bootstrap
operation: Update
subresource: status
time: '2024-10-16T20:30:43Z'
- apiVersion: machine.openshift.io/v1beta1
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
.: {}
'f:capacity.cluster-autoscaler.kubernetes.io/labels': {}
'f:machine.openshift.io/GPU': {}
'f:machine.openshift.io/memoryMb': {}
'f:machine.openshift.io/vCPU': {}
manager: machine-controller-manager
operation: Update
time: '2024-10-16T20:35:26Z'
- apiVersion: machine.openshift.io/v1beta1
fieldsType: FieldsV1
fieldsV1:
'f:status':
'f:availableReplicas': {}
'f:fullyLabeledReplicas': {}
'f:observedGeneration': {}
'f:readyReplicas': {}
'f:replicas': {}
manager: machineset-controller
operation: Update
subresource: status
time: '2024-10-16T20:41:50Z'
namespace: openshift-machine-api
labels:
machine.openshift.io/cluster-api-cluster: fk-aws-sts-2th7t
spec:
replicas: 3
selector:
matchLabels:
machine.openshift.io/cluster-api-cluster: fk-aws-sts-2th7t
machine.openshift.io/cluster-api-machineset: fk-aws-sts-2th7t-worker-us-east-1a
template:
metadata:
labels:
machine.openshift.io/cluster-api-cluster: fk-aws-sts-2th7t
machine.openshift.io/cluster-api-machine-role: worker
machine.openshift.io/cluster-api-machine-type: worker
machine.openshift.io/cluster-api-machineset: fk-aws-sts-2th7t-worker-us-east-1a
spec:
lifecycleHooks: {}
metadata: {}
providerSpec:
value:
userDataSecret:
name: worker-user-data
placement:
availabilityZone: us-east-1a
region: us-east-1
credentialsSecret:
name: aws-cloud-credentials
instanceType: m5.8xlarge
metadata:
creationTimestamp: null
blockDevices:
- ebs:
encrypted: true
iops: 0
kmsKey:
arn: ''
volumeSize: 120
volumeType: gp3
securityGroups:
- filters:
- name: 'tag:Name'
values:
- fk-aws-sts-2th7t-worker-sg
kind: AWSMachineProviderConfig
metadataServiceOptions: {}
tags:
- name: kubernetes.io/cluster/fk-aws-sts-2th7t
value: owned
deviceIndex: 0
ami:
id: ami-0d653d86d4113326a
subnet:
filters:
- name: 'tag:Name'
values:
- fk-aws-sts-2th7t-private-us-east-1a
apiVersion: machine.openshift.io/v1beta1
iamInstanceProfile:
id: fk-aws-sts-2th7t-worker-profile
status:
availableReplicas: 3
fullyLabeledReplicas: 3
observedGeneration: 1
readyReplicas: 3
replicas: 3
Transform to GPU MachineSet
🔗
To create a GPU MachineSet
in the same region, copy the yaml into your favourite text editor and remove all the unnecessary properties. Below is an example of the resulting yaml, highlighting the items that have changed to define the GPU node(s). In this example there is only 1 GPU node of type g6e.8xlarge
. Only the highlighted properties should be changed.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
|
Create the GPU MachineSet
🔗
Once ready, do the following to create the MachineSet
:
- Go to the OpenShift console
- Click the "+" sign at the top of the page
- Paste the updated yaml into the the window
The MachineSet
will create Machine
CRs in the openshift-machines
project. If the instance type is available in the selected AWS region and there is enough capacity, the AWS instance(s) will be created and after a 5-10 minutes, it/they will appear as nodes in the OpenShift cluster.