vLLM IOProcessor Plugins
vLLM's IOProcessor plugins are a mechanism for processing inference input and output data from and to any modality. For example, a plugin can transform the output of a model into an image.
TerraTorch provides plugins that handle input and output GeoTIFF images when serving models via vLLM.
More information can be found in the official vLLM documentation.
Using IOProcessor Plugins
IOProcessor plugins are instantiated at vLLM startup time via the dedicated flag --io-processor-plugin. The snippet below shows an example of a vLLM server started to serve a TerraTorch model using the terratorch_segmentation plugin.
vllm serve \
--model=ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11 \
--model-impl terratorch \
--task embed --trust-remote-code \
--skip-tokenizer-init --enforce-eager \
--io-processor-plugin terratorch_segmentation
Inference requests are then sent to the vLLM server URL under the /pooling endpoint.
The format of the request is described below for the terratorch_segmentation plugin. The model and softmax fields are pre-defined and are processed only by vLLM, while the data field is plugin-dependent. Refer to the documentation of each individual plugin for more information on the request data format.
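As a rough illustration of this split between vLLM-handled and plugin-handled fields, the following Python sketch assembles a request body and shows how it might be posted to the /pooling endpoint. It is a sketch under stated assumptions, not the plugin's actual schema: the server address (localhost:8000), the helper names build_request and send_request, and the keys inside plugin_data are all hypothetical placeholders. The real content of the data field is defined by the plugin documentation.

```python
import json
import urllib.request

# Assumption: the server runs locally on port 8000; adjust to your deployment.
SERVER_URL = "http://localhost:8000/pooling"


def build_request(model: str, data: dict, softmax: bool = False) -> dict:
    """Assemble the request body.

    `model` and `softmax` are the pre-defined fields processed by vLLM;
    `data` is the plugin-dependent part of the request.
    """
    return {"model": model, "softmax": softmax, "data": data}


def send_request(payload: dict, url: str = SERVER_URL, timeout: float = 60.0) -> dict:
    """POST the JSON payload to the endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical plugin-specific content: these keys are placeholders only,
# not the documented terratorch_segmentation schema.
plugin_data = {"data": "https://example.com/scene.tif", "data_format": "url"}

payload = build_request(
    "ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11",
    plugin_data,
)

if __name__ == "__main__":
    # Print the body that would be sent; no server is contacted here.
    print(json.dumps(payload, indent=2))
    # response = send_request(payload)  # requires a running vLLM server
```

Keeping payload construction separate from the HTTP call makes it possible to inspect or validate the body without a running server; send_request only needs to be invoked once the vLLM instance is up.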