vLLM IOProcessor Plugins

vLLM's IOProcessor plugins are a mechanism for processing input/output inference data from/to any modality. For example, these plugins allow the output of a model to be transformed into an image.

TerraTorch provides plugins for handling input/output GeoTiff images when serving models via vLLM.

More information can be found in the official vLLM documentation.

Using IOProcessor Plugins

IOProcessor plugins are instantiated at vLLM startup time via the dedicated --io-processor-plugin flag. The snippet below shows an example of a vLLM server started to serve a TerraTorch model using the terratorch_segmentation plugin.

vllm serve \
    --model=ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11 \
    --model-impl terratorch \
    --task embed --trust-remote-code \
    --skip-tokenizer-init --enforce-eager \
    --io-processor-plugin terratorch_segmentation

Inference requests are then sent to the vLLM server URL under the /pooling endpoint.

The request format is shown below for the terratorch_segmentation plugin. The model and softmax fields are predefined and processed only by vLLM, while the contents of the data field are plugin-dependent. Refer to each plugin's documentation for details on its request data format.

request_payload = {
    "data": {
        "data": "image_url",
        "data_format": "url",
        "out_data_format": "b64_json",
    },
    "model": "ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11",
    "softmax": False,
}
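
As a minimal sketch of how such a payload might be sent to the /pooling endpoint, the snippet below uses only the Python standard library. The server address (http://localhost:8000), the helper names (build_pooling_request, send_pooling_request, decode_b64_output), and the location of the encoded image in the response (response["data"]["data"]) are assumptions made for illustration; check the plugin documentation for the actual response layout.

```python
import base64
import json
import urllib.request

# Assumption: the vLLM server listens on localhost:8000; adjust as needed.
VLLM_BASE_URL = "http://localhost:8000"


def build_pooling_request(payload, base_url=VLLM_BASE_URL):
    """Build (but do not send) a JSON POST request for the /pooling endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/pooling",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_pooling_request(payload, base_url=VLLM_BASE_URL, timeout=120.0):
    """Send the payload to the server and return the decoded JSON response."""
    request = build_pooling_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def decode_b64_output(response_body):
    """Decode a base64-encoded output image into raw bytes.

    Assumption: the encoded image sits at response_body["data"]["data"].
    """
    return base64.b64decode(response_body["data"]["data"])


request_payload = {
    "data": {
        "data": "image_url",  # placeholder: replace with a real GeoTiff URL
        "data_format": "url",
        "out_data_format": "b64_json",
    },
    "model": "ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11",
    "softmax": False,
}

# Building the request does not contact the server.
request = build_pooling_request(request_payload)
print(request.get_method(), request.full_url)
```

With a server running, calling send_pooling_request(request_payload) would submit the request, and the bytes returned by decode_b64_output could then be written to a .tiff file opened in binary mode. Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.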