Embedding Service

The EmbeddingService provides functionality to generate text embeddings using IBM watsonx.ai encoder models. It converts text inputs into dense vector representations that can be used for semantic search, similarity comparison, clustering, and retrieval-augmented generation (RAG).

Quick Start

EmbeddingService embeddingService = EmbeddingService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-embedding-278m-multilingual")
    .build();

EmbeddingResponse response = embeddingService.embedding("Hello, world!");
System.out.println(response.results().get(0).embedding());
// → [-0.029937625, 0.05433679, 0.013135133, 0.018311847, ...]

Note: To see the list of available embedding models, refer to supported encoder models.


Overview

The EmbeddingService enables you to:

  • Embed single or multiple text inputs in a single request.
  • Process large batches automatically.
  • Configure token truncation to handle long inputs gracefully.
  • Optionally return the original input text alongside each embedding vector.
  • Build semantic search, similarity, and RAG applications.

Service Configuration

Basic Setup

To start using the Embedding Service, create an EmbeddingService instance with the minimum required configuration.

EmbeddingService embeddingService = EmbeddingService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl("https://us-south.ml.cloud.ibm.com")
    .modelId("ibm/granite-embedding-278m-multilingual")
    .build();

Using CloudRegion

Instead of manually specifying the baseUrl, you can use the CloudRegion to automatically configure the correct endpoint.

EmbeddingService embeddingService = EmbeddingService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-embedding-278m-multilingual")
    .build();

Builder Parameters

Parameter Type Required Description
apiKey String Conditional API key for IBM Cloud authentication
authenticator Authenticator Conditional Custom authentication (alternative to apiKey)
projectId String Conditional Project ID where the model is deployed
spaceId String Conditional Space ID (alternative to projectId)
baseUrl String/CloudRegion Yes watsonx.ai service base URL
modelId String Yes Embedding model ID
timeout Duration No Request timeout (default: 60 seconds)
logRequests Boolean No Enable request logging (default: false)
logResponses Boolean No Enable response logging (default: false)
httpClient HttpClient No Custom HTTP client
verifySsl Boolean No SSL certificate verification (default: true)
version String No API version override

Either apiKey or authenticator must be provided. Either projectId or spaceId must be specified.


Examples

Embedding a Single Input

The simplest use case — pass a single string and retrieve its vector representation.

EmbeddingResponse response = embeddingService.embedding("Embedding this!");
List<Float> vector = response.results().get(0).embedding();
System.out.println("Vector size: " + vector.size());
// → Vector size: 768

Embedding Multiple Inputs

Pass multiple strings in a single call. Results are returned in the same order as the inputs.

EmbeddingResponse response = embeddingService.embedding(
    "First input",
    "Second input",
    "Third input"
);

var firstEmbedding = response.results().get(0);
var secondEmbedding = response.results().get(1);
var thirdEmbedding = response.results().get(2);

System.out.println(firstEmbedding);
// → [0.01608275, 0.033017233, 0.01521849, 0.022984304, ...]

System.out.println(secondEmbedding);
// → [-0.0025639886, 0.018150007, -8.951856E-4, 0.030161599, ...]

System.out.println(thirdEmbedding);
// → [0.024885714, -0.005718433, 0.0036718687, 0.03666839, ...]

You can also pass a List<String>:

List<String> inputs = List.of("apple", "banana", "cherry");
EmbeddingResponse response = embeddingService.embedding(inputs);

Customizing Generation Parameters

Use EmbeddingParameters to control token truncation and whether to include the original input text in the response.

EmbeddingParameters parameters = EmbeddingParameters.builder()
    .truncateInputTokens(512)
    .inputText(true)
    .build();

EmbeddingResponse response = embeddingService.embedding(
    List.of("A very long document that might exceed the model's token limit..."),
    parameters
);

EmbeddingResponse.Result result = response.results().get(0);
System.out.println("Input text: " + result.input());
System.out.println("Vector: " + result.embedding());

Embedding Parameters

The EmbeddingParameters class allows you to fine-tune how inputs are processed.

Builder Reference

Parameter Type Description
truncateInputTokens Integer Maximum number of tokens per input. Inputs exceeding this limit are truncated from the right (the start is preserved).
inputText Boolean When true, each result includes the original input text in the input() field.
modelId String Override the default model for this request.
projectId String Override the default project ID for this request.
spaceId String Override the default space ID for this request.
transactionId String Request tracking ID.
crypto Crypto Encryption configuration.

EmbeddingResponse

The EmbeddingResponse contains the generated vectors and usage metadata.

Field Type Description
modelId() String The model used to generate the embeddings
createdAt() String Timestamp of when the embeddings were generated
results() List<Result> One result per input, in the same order as the request
inputTokenCount() Integer Total number of input tokens processed across all inputs

Each Result in the list exposes:

Field Type Description
embedding() List<Float> The vector representation of the input text
input() String The original input text (only populated when inputText(true) is set)


Back to top

Copyright 2025 IBM Corporation. Licensed under the Apache License 2.0.

This site uses Just the Docs, a documentation theme for Jekyll.