Benchmarking Geospatial Models with vLLM¶
The scenario
In this example, the vllm_performance actuator is used to benchmark geospatial models (IBM-NASA Prithvi) for Earth observation tasks.
Geospatial models process satellite imagery for tasks like flood detection, land use classification, and environmental monitoring. Unlike text-based LLMs, these models:
- Accept base64-encoded satellite images as input
- Output classification results rather than text tokens
- Have different performance characteristics and optimization requirements
In this example:
- We will define a space of geospatial model deployment configurations to test
- Use the test-geospatial-deployment-v1 experiment to create and benchmark vLLM deployments serving Prithvi models
- Explore how deployment parameters affect inference latency for flood detection tasks
Prerequisites
- Be logged-in to your Kubernetes/OpenShift cluster
- Have access to a namespace where you can create vLLM deployments
- Install the following Python packages locally:
pip install ado-vllm-performance
TL;DR
Create the following files and execute:
# Create resources and run operation
ado create op -f geospatial_operation.yaml \
--with space=geospatial_space.yaml --with ac=vllm_actuator_configuration.yaml
See configuring the vllm_performance actuator for configuration options.
Verify the installation¶
Verify the installation with:
ado get actuators --details
The actuator vllm_performance should appear in the list. To see the geospatial experiments:
ado get experiments --details
You should see experiments including test-geospatial-deployment-v1, test-geospatial-endpoint-v1, test-geospatial-deployment-custom-dataset-v1, and test-geospatial-endpoint-custom-dataset-v1.
Create an actuator configuration¶
The vllm_performance actuator needs information about the target cluster. This is provided via an actuatorconfiguration.
First execute:
ado template actuatorconfiguration --actuator-identifier vllm_performance -o vllm_actuator_configuration.yaml
Edit the file and set correct values for at least the namespace field. In this example we assume the user has access to a namespace named vllm-testing.
# you MUST set this to a namespace where you can create vLLM deployments
namespace: vllm-testing
# Required to access Prithvi models
hf_token: <your HuggingFace access token>
Then save this configuration:
ado create actuatorconfiguration -f vllm_actuator_configuration.yaml
Define the geospatial configurations to test¶
For geospatial models, we focus on deployment parameters that affect inference latency since these models output classification results rather than generating tokens. Key parameters include:
- GPU configuration: Type and number of GPUs
- Memory allocation: CPU and GPU memory
- Batch processing: max_num_seq for concurrent requests
- Workload pattern: Request rate and concurrency
Save the following as geospatial_space.yaml:
# Copyright IBM Corporation 2025, 2026
# SPDX-License-Identifier: MIT
metadata:
name: geospatial-flood-detection-space
description: "Explore Prithvi geospatial model deployment configurations for flood detection"
entitySpace:
- identifier: model
propertyDomain:
values:
- "ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11"
- identifier: n_gpus
propertyDomain:
values: [1]
- identifier: gpu_type
propertyDomain:
values:
- "NVIDIA-A100-80GB-PCIe"
- identifier: memory
propertyDomain:
values:
- "64Gi"
- "128Gi"
- identifier: max_num_seq
propertyDomain:
values: [32, 64, 128]
- identifier: request_rate
propertyDomain:
values: [10, 50, 100]
- identifier: dataset
propertyDomain:
values:
- "india_url_in_b64_out"
- "valencia_url_in_b64_out"
measurementSpace:
- actuatorIdentifier: vllm_performance
experimentIdentifier: test-geospatial-deployment-v1
Then run:
ado create space -f geospatial_space.yaml
This space explores:
- Two pre-packaged flood detection datasets (India and Valencia regions)
- Different memory allocations (64Gi vs 128Gi)
- Various batch sizes (32, 64, 128 concurrent requests)
- Multiple request rates (10, 50, 100 requests/second)
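Multiplying out the property values above gives the total number of entities the space contains; a quick sketch, with the counts taken directly from geospatial_space.yaml:

```python
from itertools import product

# Property values copied from geospatial_space.yaml
entity_space = {
    "model": ["ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11"],
    "n_gpus": [1],
    "gpu_type": ["NVIDIA-A100-80GB-PCIe"],
    "memory": ["64Gi", "128Gi"],
    "max_num_seq": [32, 64, 128],
    "request_rate": [10, 50, 100],
    "dataset": ["india_url_in_b64_out", "valencia_url_in_b64_out"],
}

# Each entity is one combination of property values
entities = list(product(*entity_space.values()))
print(len(entities))  # 1 * 1 * 1 * 2 * 3 * 3 * 2 = 36 configurations
```

So the full space contains 36 configurations, which random_walk with numberOfSamples: all will visit exhaustively.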
Explore the space with random_walk¶
We'll use the random_walk operator with grouped sampling to efficiently explore the space. Grouped sampling ensures we test all workload patterns for a given deployment before creating a new one.
Save the following as geospatial_operation.yaml:
# Copyright IBM Corporation 2025, 2026
# SPDX-License-Identifier: MIT
metadata:
name: geospatial-exploration
description: "Explore geospatial model deployment configurations"
operatorIdentifier: random_walk
operatorConfiguration:
sampler:
samplerIdentifier: grouped_explicit_grid_sampler
samplerConfiguration:
grouping:
- model
- n_gpus
- gpu_type
- memory
- max_num_seq
numberOfSamples: all
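Conceptually, the grouping above partitions the 36 entities by their deployment parameters, so every workload pattern (request_rate, dataset) is measured against one deployment before the next deployment is created. A minimal sketch of that iteration order, using only the properties that vary in this space:

```python
from itertools import groupby, product

# Deployment parameters, matching the grouping list in geospatial_operation.yaml
deployment_keys = ("memory", "max_num_seq")

# The varying properties of the 36-entity space defined earlier
entities = [
    {"memory": m, "max_num_seq": s, "request_rate": r, "dataset": d}
    for m, s, r, d in product(
        ["64Gi", "128Gi"],
        [32, 64, 128],
        [10, 50, 100],
        ["india_url_in_b64_out", "valencia_url_in_b64_out"],
    )
]

def deployment_id(entity):
    # One vLLM deployment per unique combination of deployment parameters
    return tuple(entity[k] for k in deployment_keys)

entities.sort(key=deployment_id)
groups = {dep: list(g) for dep, g in groupby(entities, key=deployment_id)}
print(len(groups))  # 6 deployments, each benchmarked with 6 workload patterns
```

With 2 memory values and 3 max_num_seq values, only 6 deployments are created instead of one per entity, which avoids repeatedly tearing down and recreating identical deployments.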
Then, start the operation with:
ado create operation -f geospatial_operation.yaml \
--use-latest space --use-latest actuatorconfiguration
As it runs, a table of results is updated live in the terminal.
Understanding the Results¶
Geospatial experiments measure end-to-end latency metrics:
- duration: Total benchmark duration
- completed: Number of successful requests
- request_throughput: Requests processed per second
- mean_e2el_ms: Mean end-to-end latency in milliseconds
- p50_e2el_ms, p99_e2el_ms: Latency percentiles
Unlike text LLMs, geospatial models don't generate tokens, so metrics like TTFT (Time To First Token) and TPOT (Time Per Output Token) are not applicable.
Monitor the deployment¶
While the operation is running you can monitor the deployment:
# In a separate terminal
oc get deployments --watch -n vllm-testing
You can also get the results table by executing (in another terminal):
ado show entities operation --use-latest
Check final results¶
When the experiment finishes, inspect all results with:
ado show entities space --output csv --use-latest > entities.csv
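Once you have entities.csv you can rank configurations by latency with the standard library. A sketch, where the column names are assumptions based on the metrics listed above, so adjust them to match your actual CSV header; the sample rows below stand in for the real file:

```python
import csv
import io

# Hypothetical excerpt of entities.csv; with the real file, use
# csv.DictReader(open("entities.csv")) instead
sample = io.StringIO(
    "memory,max_num_seq,request_rate,mean_e2el_ms\n"
    "64Gi,32,10,240.0\n"
    "128Gi,64,10,210.5\n"
    "64Gi,128,50,\n"
    "128Gi,128,100,305.2\n"
)
rows = list(csv.DictReader(sample))

# Keep only measured rows (unmeasured entities have an empty metric field)
# and rank ascending by mean end-to-end latency
measured = sorted(
    (r for r in rows if r["mean_e2el_ms"]),
    key=lambda r: float(r["mean_e2el_ms"]),
)
best = measured[0]
print(best["memory"], best["max_num_seq"], best["mean_e2el_ms"])
```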
Pre-packaged Datasets¶
The actuator includes two pre-packaged datasets for flood detection:
- india_url_in_b64_out: Satellite imagery from India region with flood detection labels
- valencia_url_in_b64_out: Satellite imagery from Valencia region with flood detection labels
These datasets contain base64-encoded satellite images suitable for the Prithvi-EO-2.0 flood detection models.
Using Custom Datasets¶
To use your own geospatial datasets, use the test-geospatial-deployment-custom-dataset-v1 experiment. Your dataset should be a JSONL (JSON Lines) file where each line is a JSON object with this structure:
{"prompt": {"data": {"data": "https://example.com/path/to/image.tif",
"data_format": "url", "out_data_format": "b64_json",
"indices": [1, 2, 3, 8, 11, 12]}}}
{"prompt": {"data": {"data": "https://example.com/path/to/image2.tif",
"data_format": "url", "out_data_format": "b64_json",
"indices": [1, 2, 3, 8, 11, 12]}}}
Model-Specific Payload Format
The payload structure shown above is specific to the IBM-NASA Prithvi geospatial models (Prithvi-EO-2.0-300M and 600M). If you are using a different geospatial model, you must adapt the payload format to match your model's expected input structure. Consult your model's documentation for the correct payload format, including:
- Required fields and their structure
- Supported data formats (URL, base64, etc.)
- Expected spectral band indices
- Any model-specific parameters
Each line contains a prompt object with a data object containing:
- data: URL or base64-encoded string of the satellite image
- data_format: Format of the input data ("url" or "b64")
- out_data_format: Format for output data ("b64_json")
- indices: List of spectral band indices to use (e.g., [1, 2, 3, 8, 11, 12] for the Sentinel-2 bands used by Prithvi models)
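A dataset in this format can be generated by serializing one record per line with json.dumps. A minimal sketch, where the image URLs are placeholders to replace with your own, and the band indices match the Prithvi Sentinel-2 example above:

```python
import json

# Placeholder URLs: point these at your own satellite images
image_urls = [
    "https://example.com/path/to/image.tif",
    "https://example.com/path/to/image2.tif",
]

with open("dataset.jsonl", "w") as f:
    for url in image_urls:
        record = {
            "prompt": {
                "data": {
                    "data": url,
                    "data_format": "url",
                    "out_data_format": "b64_json",
                    # Sentinel-2 band indices used by the Prithvi flood models
                    "indices": [1, 2, 3, 8, 11, 12],
                }
            }
        }
        f.write(json.dumps(record) + "\n")
```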
Update your space definition to use the custom dataset experiment:
measurementSpace:
- actuatorIdentifier: vllm_performance
experimentIdentifier: test-geospatial-deployment-custom-dataset-v1
And add the dataset path to your entity space:
entitySpace:
- identifier: dataset
propertyDomain:
values:
- "/path/to/your/dataset.jsonl"
Next steps¶
- Try the 600M parameter Prithvi model by changing the model identifier to ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL-Sen1Floods11
- Explore different GPU types if your cluster has multiple options
- Test endpoint benchmarking with test-geospatial-endpoint-v1 if you have an existing deployment
- Use the RayTune operator to find optimal configurations for your latency requirements
- Run the exploration on the OpenShift/Kubernetes cluster to avoid keeping your laptop open
- Check the vllm_performance actuator documentation for more configuration options