Embedding Service
The EmbeddingService provides functionality to generate text embeddings using IBM watsonx.ai encoder models. It converts text inputs into dense vector representations that can be used for semantic search, similarity comparison, clustering, and retrieval-augmented generation (RAG).
Quick Start
EmbeddingService embeddingService = EmbeddingService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-embedding-278m-multilingual")
    .build();
EmbeddingResponse response = embeddingService.embedding("Hello, world!");
System.out.println(response.results().get(0).embedding());
// → [-0.029937625, 0.05433679, 0.013135133, 0.018311847, ...]
Note: To see the list of available embedding models, refer to supported encoder models.
Overview
The EmbeddingService enables you to:
- Embed single or multiple text inputs in a single request.
- Process large batches automatically.
- Configure token truncation to handle long inputs gracefully.
- Optionally return the original input text alongside each embedding vector.
- Build semantic search, similarity, and RAG applications.
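The list above does not say how large batches are split. As a rough illustration only, the sketch below shows one way a long input list could be divided into fixed-size chunks while preserving order. The class name `BatchPartitioner`, the method `partition`, and the chunk size are all hypothetical and are not part of the SDK; the service's actual batching behavior may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not SDK code. Splits a list into consecutive,
// order-preserving chunks of at most maxBatchSize elements.
class BatchPartitioner {

    static <T> List<List<T>> partition(List<T> inputs, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < inputs.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, inputs.size());
            // Copy the sublist view so each batch is independent of the source list.
            batches.add(List.copyOf(inputs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("a", "b", "c", "d", "e");
        System.out.println(partition(inputs, 2));
        // → [[a, b], [c, d], [e]]
    }
}
```

Concatenating the per-chunk results in chunk order keeps the overall output aligned with the original input order.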
Service Configuration
Basic Setup
To start using the Embedding Service, create an EmbeddingService instance with the minimum required configuration.
EmbeddingService embeddingService = EmbeddingService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl("https://us-south.ml.cloud.ibm.com")
    .modelId("ibm/granite-embedding-278m-multilingual")
    .build();
Using CloudRegion
Instead of specifying the baseUrl manually, you can pass a CloudRegion value to configure the correct endpoint for that region automatically.
EmbeddingService embeddingService = EmbeddingService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-embedding-278m-multilingual")
    .build();
Builder Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| apiKey | String | Conditional | API key for IBM Cloud authentication |
| authenticator | Authenticator | Conditional | Custom authentication (alternative to apiKey) |
| projectId | String | Conditional | Project ID where the model is deployed |
| spaceId | String | Conditional | Space ID (alternative to projectId) |
| baseUrl | String/CloudRegion | Yes | watsonx.ai service base URL |
| modelId | String | Yes | Embedding model ID |
| timeout | Duration | No | Request timeout (default: 60 seconds) |
| logRequests | Boolean | No | Enable request logging (default: false) |
| logResponses | Boolean | No | Enable response logging (default: false) |
| httpClient | HttpClient | No | Custom HTTP client |
| verifySsl | Boolean | No | SSL certificate verification (default: true) |
| version | String | No | API version override |
Either apiKey or authenticator must be provided. Either projectId or spaceId must be specified.
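The two "either/or" rules for the conditional parameters can be expressed as a small check. The sketch below is hypothetical: `BuilderRules` and `validate` are illustrative names, not SDK classes, and the real builder may report these errors differently (for example, by throwing an exception from `build()`).

```java
// Hypothetical sketch: mirrors the two rules stated above.
// It is not the SDK's own validation code.
class BuilderRules {

    // Returns an error message, or null when the combination is valid.
    static String validate(String apiKey, Object authenticator,
                           String projectId, String spaceId) {
        if (apiKey == null && authenticator == null) {
            return "Either apiKey or authenticator must be provided";
        }
        if (projectId == null && spaceId == null) {
            return "Either projectId or spaceId must be specified";
        }
        return null;
    }

    public static void main(String[] args) {
        // Valid: API key plus project ID.
        System.out.println(validate("key", null, "project", null));
        // Invalid: no credentials at all.
        System.out.println(validate(null, null, "project", null));
        // Invalid: neither project nor space.
        System.out.println(validate("key", null, null, null));
    }
}
```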
Examples
Embedding a Single Input
The simplest use case: pass a single string and retrieve its vector representation.
EmbeddingResponse response = embeddingService.embedding("Embedding this!");
List<Float> vector = response.results().get(0).embedding();
System.out.println("Vector size: " + vector.size());
// → Vector size: 768
Embedding Multiple Inputs
Pass multiple strings in a single call. Results are returned in the same order as the inputs.
EmbeddingResponse response = embeddingService.embedding(
    "First input",
    "Second input",
    "Third input"
);
// Extract the vector from each result so that printing shows the embedding itself.
var firstEmbedding = response.results().get(0).embedding();
var secondEmbedding = response.results().get(1).embedding();
var thirdEmbedding = response.results().get(2).embedding();
System.out.println(firstEmbedding);
// → [0.01608275, 0.033017233, 0.01521849, 0.022984304, ...]
System.out.println(secondEmbedding);
// → [-0.0025639886, 0.018150007, -8.951856E-4, 0.030161599, ...]
System.out.println(thirdEmbedding);
// → [0.024885714, -0.005718433, 0.0036718687, 0.03666839, ...]
You can also pass a List<String>:
List<String> inputs = List.of("apple", "banana", "cherry");
EmbeddingResponse response = embeddingService.embedding(inputs);
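Because results come back in input order, each input can be matched to its vector by index. The following is a minimal sketch of that pairing; `InputVectorPairing` and `pair` are hypothetical names, and the placeholder vectors stand in for the values you would read from `response.results().get(i).embedding()`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper, not part of the SDK.
class InputVectorPairing {

    // Pairs inputs.get(i) with vectors.get(i), relying on order preservation.
    // Duplicate input strings keep only the last vector seen.
    static Map<String, List<Float>> pair(List<String> inputs, List<List<Float>> vectors) {
        if (inputs.size() != vectors.size()) {
            throw new IllegalArgumentException("inputs and vectors must have the same size");
        }
        Map<String, List<Float>> byInput = new LinkedHashMap<>();
        for (int i = 0; i < inputs.size(); i++) {
            byInput.put(inputs.get(i), vectors.get(i));
        }
        return byInput;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("apple", "banana", "cherry");
        // Placeholder two-dimensional vectors; real embeddings are much longer.
        List<List<Float>> vectors = List.of(
            List.of(0.1f, 0.2f),
            List.of(0.3f, 0.4f),
            List.of(0.5f, 0.6f));
        System.out.println(pair(inputs, vectors).get("banana"));
        // → [0.3, 0.4]
    }
}
```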
Customizing Embedding Parameters
Use EmbeddingParameters to control token truncation and whether to include the original input text in the response.
EmbeddingParameters parameters = EmbeddingParameters.builder()
    .truncateInputTokens(512)
    .inputText(true)
    .build();
EmbeddingResponse response = embeddingService.embedding(
    List.of("A very long document that might exceed the model's token limit..."),
    parameters
);
EmbeddingResponse.Result result = response.results().get(0);
System.out.println("Input text: " + result.input());
System.out.println("Vector: " + result.embedding());
Embedding Parameters
The EmbeddingParameters class allows you to fine-tune how inputs are processed.
Builder Reference
| Parameter | Type | Description |
|---|---|---|
| truncateInputTokens | Integer | Maximum number of tokens per input. Inputs exceeding this limit are truncated from the right (the start is preserved). |
| inputText | Boolean | When true, each result includes the original input text in the input() field. |
| modelId | String | Override the default model for this request. |
| projectId | String | Override the default project ID for this request. |
| spaceId | String | Override the default space ID for this request. |
| transactionId | String | Request tracking ID. |
| crypto | Crypto | Encryption configuration. |
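To make "truncated from the right" concrete, the sketch below keeps the first N tokens of an input and drops the rest. It is an illustration under a simplifying assumption: whitespace-separated words stand in for tokens, whereas the service tokenizes with the model's own tokenizer, so real token counts will differ. `RightTruncation` and `truncateRight` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: words stand in for model tokens.
class RightTruncation {

    // Keeps at most maxTokens tokens from the start of the list.
    static List<String> truncateRight(List<String> tokens, int maxTokens) {
        if (maxTokens < 0) {
            throw new IllegalArgumentException("maxTokens must not be negative");
        }
        return tokens.size() <= maxTokens ? tokens : tokens.subList(0, maxTokens);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the quick brown fox jumps".split(" "));
        System.out.println(truncateRight(tokens, 3));
        // → [the, quick, brown]
    }
}
```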
EmbeddingResponse
The EmbeddingResponse contains the generated vectors and usage metadata.
| Field | Type | Description |
|---|---|---|
| modelId() | String | The model used to generate the embeddings |
| createdAt() | String | Timestamp of when the embeddings were generated |
| results() | List<Result> | One result per input, in the same order as the request |
| inputTokenCount() | Integer | Total number of input tokens processed across all inputs |
Each Result in the list exposes:
| Field | Type | Description |
|---|---|---|
| embedding() | List<Float> | The vector representation of the input text |
| input() | String | The original input text (only populated when inputText(true) is set) |
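Once you have `List<Float>` vectors from `embedding()`, a common next step for semantic search is ranking candidate texts by cosine similarity to a query vector. The sketch below is one simple way to do that, not an SDK feature: `SemanticRanker`, `cosine`, and `rank` are hypothetical names, and the two-dimensional vectors are made-up placeholders rather than real model output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative helper, not part of the SDK.
class SemanticRanker {

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(List<Float> a, List<Float> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i), y = b.get(i);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("cosine similarity is undefined for zero vectors");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns texts ordered from most to least similar to the query.
    // texts.get(i) must correspond to vectors.get(i).
    static List<String> rank(List<Float> query, List<String> texts, List<List<Float>> vectors) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> cosine(query, vectors.get(i))).reversed());
        List<String> ranked = new ArrayList<>();
        for (int i : order) {
            ranked.add(texts.get(i));
        }
        return ranked;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("apple", "banana", "cherry");
        List<List<Float>> vectors = List.of(
            List.of(1.0f, 0.0f),
            List.of(0.0f, 1.0f),
            List.of(0.7f, 0.7f));
        List<Float> query = List.of(0.9f, 0.1f);
        System.out.println(rank(query, texts, vectors));
        // → [apple, cherry, banana]
    }
}
```

For large collections, a vector database or approximate nearest-neighbor index is usually a better fit than this linear scan.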