Chat Service

The ChatService provides functionality to interact with IBM watsonx.ai foundation models for conversational AI applications. It supports synchronous and streaming chat completions, tool calling, reasoning, and structured outputs.

Quick Start

ChatService chatService = ChatService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-4-h-small")
    .build();

ChatResponse response = chatService.chat("Hello! How are you?");
System.out.println(response.toAssistantMessage().content());
// → Hello! How can I help you today?

Note: To see the list of available models, refer to Supported Foundation Models.

Overview

The ChatService enables you to:

Build conversational AI applications with multi-turn dialogue.
Stream responses in real-time for interactive experiences.
Enable models to call external functions and tools.
Maintain conversation history and context.
Configure generation parameters for customized outputs.
Handle structured JSON responses.
Support reasoning capabilities for complex problem-solving.

Service Configuration

Basic Setup

To start using the Chat Service, create a ChatService instance with the minimum required configuration. The following example shows the essential parameters needed to authenticate and select a model. Additional configuration options are described below.

ChatService chatService = ChatService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl("https://us-south.ml.cloud.ibm.com")
    .modelId("ibm/granite-4-h-small")
    .build();

Using CloudRegion

Instead of manually specifying the baseUrl, you can use the CloudRegion to automatically configure the correct endpoint for your IBM Cloud region. This is more convenient and less error-prone.

ChatService chatService = ChatService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-4-h-small")
    .build();

Builder Parameters

Parameter	Type	Required	Description
`apiKey`	String	Conditional	API key for IBM Cloud authentication
`authenticator`	Authenticator	Conditional	Custom authentication (alternative to `apiKey`)
`projectId`	String	Conditional	Project ID where the model is deployed
`spaceId`	String	Conditional	Space ID where the model is deployed (alternative to `projectId`)
`baseUrl`	String/CloudRegion	Yes	watsonx.ai service base URL
`modelId`	String	Yes	Foundation model ID
`timeout`	Duration	No	Request timeout (default: 60 seconds)
`parameters`	ChatParameters	No	Default parameters applied to all requests
`tools`	List<Tool>	No	Default tools available to the model
`messageInterceptor`	MessageInterceptor	No	Modify assistant messages before returning
`toolInterceptor`	ToolInterceptor	No	Normalize/modify tool call arguments
`logRequests`	Boolean	No	Enable request logging (default: false)
`logResponses`	Boolean	No	Enable response logging (default: false)
`httpClient`	HttpClient	No	Custom HTTP client
`verifySsl`	Boolean	No	SSL certificate verification (default: true)
`version`	String	No	API version override

Either apiKey or authenticator must be provided. Either projectId or spaceId must be specified.

Advanced Configuration

You can configure default parameters and tools that will automatically apply to every chat request created by this ChatService instance. These defaults simplify reuse and ensure consistent behavior across multiple calls.

ChatParameters defaultParameters = ChatParameters.builder()
    .maxCompletionTokens(1000)
    .temperature(0.7)
    .build();

Tool emailTool = Tool.of(
    "send_email",
    "Send an email",
    JsonSchema.object()
        .property("to", JsonSchema.string())
        .property("subject", JsonSchema.string())
        .property("body", JsonSchema.string())
        .required("to", "subject", "body")
);

var chatService = ChatService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-4-h-small")
    .parameters(defaultParameters)
    .tools(emailTool)
    .build();

Message Types

The ChatService uses structured message objects to represent all interactions in a conversation. Each message type serves a specific role, ensuring that conversation flows are consistent and easy to manage.

SystemMessage – defines the assistant’s behavior and personality before the conversation begins. Use this to prime the model with instructions or context.
UserMessage – represents input from a user, which can include text, images, video, or audio. A single UserMessage can contain multiple content elements.
AssistantMessage – represents a response from the assistant, which can include text, reasoning information, and any tool calls executed during the conversation.
ToolMessage – represents a response from a tool invoked by the assistant.

Tip: Always start your conversation with a SystemMessage to set clear instructions for the assistant. Default behavior, content, and context can then be extended with UserMessage inputs, and responses are represented by AssistantMessage and ToolMessage.

SystemMessage

Sets the assistant’s behavior and personality.

SystemMessage.of("You are a helpful assistant specialized in programming.");

UserMessage

Sends text or multimodal content.

// Plain text
UserMessage.text("Hello!");

// With image from file
UserMessage.of(
    TextContent.of("Describe this image"),
    ImageContent.from(new File("image.jpg"))
);

// Shorthand image with Path
UserMessage.image("Analyze this image", Paths.get("image.png"));

AssistantMessage

Represents the model’s response.

AssistantMessage assistantMessage = response.toAssistantMessage();

String content = assistantMessage.content();
String thinking = assistantMessage.thinking(); // Available when reasoning is enabled
boolean hasTools = assistantMessage.hasToolCalls();
List<ToolCall> tools = assistantMessage.toolCalls();

Examples

Simple Chat

The simplest possible interaction — send a single message and get a response. This is perfect for one-off questions or when you don’t need to maintain conversation history.

ChatResponse response = chatService.chat("What is the capital of France?");
System.out.println(response.toAssistantMessage().content());
// → Paris is the capital of France.

Multi-Turn Conversation

For more natural interactions, maintain a conversation history so the model can remember previous messages and provide context-aware responses.

var conversation = new ArrayList<ChatMessage>();
conversation.add(SystemMessage.of("You are a helpful assistant"));
conversation.add(UserMessage.text("What is the capital of France?"));

var response = chatService.chat(conversation);
conversation.add(response.toAssistantMessage());

System.out.println(response.toAssistantMessage().content());
// → The capital of France is Paris.

conversation.add(UserMessage.text("What is its population?"));
response = chatService.chat(conversation);

System.out.println(response.toAssistantMessage().content());
// → Paris has a population of approximately 2.2 million people...

Customizing Generation Parameters

Parameters let you fine-tune the generation behavior — shorter answers, more creative output, or deterministic results.

var parameters = ChatParameters.builder()
    .maxCompletionTokens(100)
    .temperature(0.3)
    .topP(0.9)
    .build();

List<ChatMessage> messages = List.of(
    SystemMessage.of("You are a concise assistant"),
    UserMessage.text("Explain quantum computing")
);

var response = chatService.chat(messages, parameters);

Streaming

Streaming lets you display text as it’s generated instead of waiting for the complete response, creating a more responsive user experience.

Simple Streaming

Pass a Consumer<String> to receive each text chunk as it arrives:

CompletableFuture<ChatResponse> future = chatService.chatStreaming(
    List.of(UserMessage.text("Tell me a story about a robot")),
    System.out::print
);

ChatResponse finalResponse = future.get();

Streaming with ChatHandler

For more control over the streaming process — metadata, finish reasons, tool call fragments, error handling — implement ChatHandler:

chatService.chatStreaming(
    messages,
    new ChatHandler() {
        @Override
        public void onPartialResponse(String text, PartialChatResponse partial) {
            System.out.print(text);
        }

        @Override
        public void onCompleteResponse(ChatResponse response) {
            System.out.println("Total tokens: " + response.usage().totalTokens());
        }

        @Override
        public void onError(Throwable error) {
            System.err.println("Error: " + error.getMessage());
        }
    }
);

Callback	Required	Description
`onPartialResponse`	Yes	Called for each text chunk as it arrives
`onCompleteResponse`	No	Called once when streaming completes successfully
`onError`	No	Called when an error occurs
`onPartialToolCall`	No	Called for each fragment of a streaming tool call
`onCompleteToolCall`	No	Called once per tool when arguments are fully assembled
`onPartialThinking`	No	Called for each chunk of reasoning content
`failOnFirstError`	No	Return `true` to stop streaming on first error (default: `false`)

Threading note: All callbacks execute sequentially. On Java 21+, virtual threads are used by default. Custom executors can be configured via the CallbackExecutorProvider SPI.

Tool Calling

Tool calling enables the model to invoke external functions instead of just generating text. The model decides when an action is needed — querying a database, calling an API, sending a message — and returns a structured tool call that your code executes.

Basic Tool Calling

Define a tool, pass it to the request, and handle the tool call in a loop:

Tool emailTool = Tool.of(
    "send_email",
    "Send an email to a recipient",
    JsonSchema.object()
        .property("to", JsonSchema.string("Email address"))
        .property("subject", JsonSchema.string("Email subject"))
        .property("body", JsonSchema.string("Email body"))
        .required("to", "subject", "body")
);

List<ChatMessage> messages = new ArrayList<>(List.of(
    SystemMessage.of("You are a helpful assistant"),
    UserMessage.text("Send an email to john@example.com with body \"Hello from watsonx.ai\"")
));

ChatResponse response = chatService.chat(messages, List.of(emailTool));
AssistantMessage assistantMsg = response.toAssistantMessage();

if (assistantMsg.hasToolCalls()) {

    List<ToolMessage> toolMessages = assistantMsg.processTools((toolName, args) -> {
        sendEmail(args.get("to"), args.get("subject"), args.get("body"));
        return "Email sent successfully to " + args.get("to");
    });

    messages.add(assistantMsg);
    messages.addAll(toolMessages);
    response = chatService.chat(messages, List.of(emailTool));
}

System.out.println(response.toAssistantMessage().content());
// → The email has been sent successfully to john@example.com.

Guided Choice (Constrained Output)

When you need the model to choose from a specific set of options, use guided choice. This is ideal for classification tasks, yes/no questions, or any scenario where you want to constrain the possible outputs.

ChatParameters parameters = ChatParameters.builder()
    .guidedChoice("Yes", "No")
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.text("Is 2 + 2 equal to 5?"))
    .parameters(parameters)
    .build();

String answer = chatService.chat(request).toAssistantMessage().content();
System.out.println(answer);
// → "No"

Interceptors

Interceptors run automatically after every non-streaming response, before the result is returned to your application. They are configured once on the service builder and apply transparently to all subsequent calls. Both are @FunctionalInterface — pass a lambda directly.

Message Interceptor

MessageInterceptor lets you modify or sanitize the assistant’s text content. Common uses: stripping whitespace, filtering unwanted patterns, normalizing formatting.

Note: MessageInterceptor applies to non-streaming requests only. For streaming, process the content directly inside ChatHandler callbacks.

ChatService chatService = ChatService.builder()
    // ...
    .messageInterceptor((ctx, message) -> message == null ? "" : message.strip())
    .build();

Tool Interceptor

ToolInterceptor lets you modify tool call arguments before they are executed or returned. Common uses: input validation, unwrapping double-encoded JSON, normalizing values.

ChatService chatService = ChatService.builder()
    // ...
    .toolInterceptor((ctx, functionCall) -> {
        var args = functionCall.arguments();
        // Unwrap double-encoded JSON strings if present
        return args != null && args.startsWith("\"")
            ? functionCall.withArguments(Json.fromJson(args, String.class))
            : functionCall;
    })
    .build();

InterceptorContext

Both interceptors receive an InterceptorContext as their first argument, which provides access to the current request, the current response, and a way to invoke the model again.

Method	Description
`ctx.request()`	The original `ChatRequest` that triggered this response
`ctx.response()`	An `Optional<ChatResponse>` with the current response
`ctx.invoke(ChatRequest)`	Sends a new request to the model and returns its response

ctx.invoke() reuses the same ChatService instance — same model, project, base URL, and default parameters — so you can add a second reasoning step without instantiating anything new. Per-request overrides are still possible via ChatParameters:

ChatService chatService = ChatService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-4-h-small")
    .messageInterceptor((ctx, message) -> {

        // Override the model just for this verification call
        var verificationParams = ChatParameters.builder()
            .modelId("mistralai/mistral-small-3-1-24b-instruct-2503")
            .guidedChoice("PASS", "FAIL")
            .build();

        var verificationRequest = ChatRequest.builder()
            .parameters(verificationParams)
            .messages(
                SystemMessage.of("You are a fact-checker. Reply with PASS or FAIL."),
                UserMessage.text("Is this response factually correct?\n\n" + message))
            .build();

        var verdict = ctx.invoke(verificationRequest).toAssistantMessage().content();
        return verdict.equals("FAIL")
            ? "I'm not confident in my answer. Please consult an expert."
            : message;
    })
    .build();

chatService.chat("Does water boil on the Moon?");

ctx.invoke() counts as a separate API call and consumes additional tokens. Use it when the benefit — validation, rewriting, classification — justifies the cost.

Structured Output

When you need the model to return data in a specific format, use structured output. The model is constrained to produce valid JSON, making it straightforward to deserialize the response directly into your domain objects.

JSON Mode

Enable JSON mode to instruct the model to always produce a valid JSON object. Define the expected structure in your system prompt:

record Response(String name, List<String> useCases) {}

ChatParameters parameters = ChatParameters.builder()
    .responseAsJson()
    .build();

List<ChatMessage> messages = List.of(
    SystemMessage.of("You are a helpful assistant that outputs JSON"),
    UserMessage.text("""
        Give me a programming language with their use cases.
        Use the following JSON format:
        {
            "name": ...
            "use_cases": [...]
        }""")
);

ChatResponse response = chatService.chat(messages, parameters);
System.out.println(response.toAssistantMessage().toObject(Response.class));
// → Response[name=Python, useCases=[Web development, Data analysis, ...]]

JSON Schema Mode

For stricter control, provide a schema that defines exactly what structure you expect. The model will generate output that conforms to the schema:

JsonSchema schema = JsonSchema.array().items(JsonSchema.string()).build();

ChatParameters parameters = ChatParameters.builder()
    .responseAsJsonSchema(schema)
    .build();

List<ChatMessage> messages = List.of(
    SystemMessage.of("You are a helpful assistant"),
    UserMessage.text("Give me three programming languages")
);

ChatResponse response = chatService.chat(messages, parameters);
var languages = response.toAssistantMessage().toObject(TypeToken.listOf(String.class));
System.out.println(languages);
// → ["Python", "JavaScript", "Java"]

Note: By default, Jackson uses snake_case for JSON property names. Make sure the field names in your prompt and schema follow the same convention (e.g., use_cases instead of useCases) to ensure correct deserialization.

Vision

Vision-enabled models can analyze images alongside text — useful for image description, visual question answering, OCR, and more. Include an image directly in the UserMessage:

ChatService chatService = ChatService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("mistralai/mistral-medium-2505")  // Vision-capable model
    .build();

var message = UserMessage.image(
    "Give a short description of the image",
    Paths.get("/path/to/image.jpg")
);

var response = chatService.chat(message);
System.out.println(response.toAssistantMessage().content());

Model compatibility: Not all models support vision. Check the Supported Foundation Models page before using this feature.

Reasoning / Thinking Mode

Some foundation models can include internal reasoning (also called “thinking”) steps as part of their response. Depending on the model, this reasoning may be embedded in the same text as the final response, or returned separately in a dedicated field.

There are two configuration modes:

ExtractionTags — for models that return reasoning and response in the same text block.
ThinkingEffort / Boolean — for models that already separate reasoning and response automatically.

Models that mix reasoning and response in the same text

Use ExtractionTags when the model outputs reasoning and response together as a single string. The tags define XML-like markers used to separate the two parts:

ChatService chatService = ChatService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-3-3-8b-instruct")
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.text("Why is the sky blue?"))
    .thinking(ExtractionTags.of("think", "response"))
    .build();

ChatResponse response = chatService.chat(request);
AssistantMessage message = response.toAssistantMessage();

System.out.println("Reasoning: " + message.thinking());
System.out.println("Answer:    " + message.content());

Tag behavior:

Both tags specified: extract reasoning from the first tag, response from the second.
Only the reasoning tag specified: everything outside that tag is treated as the response.

Streaming with ExtractionTags:

chatService.chatStreaming(request, new ChatHandler() {
    @Override
    public void onPartialThinking(String chunk, PartialChatResponse partial) {
        System.out.print(chunk);  // Streams the reasoning in real-time
    }

    @Override
    public void onPartialResponse(String chunk, PartialChatResponse partial) {
        System.out.print(chunk);  // Streams the answer in real-time
    }
});

Models that return reasoning and response as separate fields

For models that already separate reasoning from response, use ThinkingEffort to control how much reasoning the model applies, or enable it with a boolean flag:

ChatService chatService = ChatService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("openai/gpt-oss-120b")
    .build();

ChatRequest request = ChatRequest.builder()
    .messages(UserMessage.text("Why is the sky blue?"))
    .thinking(ThinkingEffort.HIGH)
    .build();

AssistantMessage message = chatService.chat(request).toAssistantMessage();
System.out.println("Reasoning: " + message.thinking());
System.out.println("Answer:    " + message.content());

ToolRegistry

When working with multiple tools, ToolRegistry centralizes tool definitions and execution logic, making the agentic loop cleaner and easier to maintain.

Basic Usage

ToolService toolService = ToolService.builder()
    .apiKey(WATSONX_API_KEY)
    .baseUrl(CloudRegion.DALLAS)
    .build();

ToolRegistry toolRegistry = ToolRegistry.builder()
    .register(new GoogleSearchTool(toolService), new WebCrawlerTool(toolService))
    .build();

ChatService chatService = ChatService.builder()
    .apiKey(WATSONX_API_KEY)
    .projectId(WATSONX_PROJECT_ID)
    .baseUrl(CloudRegion.DALLAS)
    .modelId("ibm/granite-4-h-small")
    .tools(toolRegistry.tools())
    .build();

List<ChatMessage> messages = new ArrayList<>();
messages.add(SystemMessage.of("You are a helpful assistant"));
messages.add(UserMessage.text("Is there a watsonx.ai Java SDK?"));

AssistantMessage assistant = chatService.chat(messages).toAssistantMessage();
messages.add(assistant);

while (assistant.hasToolCalls()) {
    messages.addAll(assistant.processTools(toolRegistry::execute));
    assistant = chatService.chat(messages).toAssistantMessage();
    messages.add(assistant);
}

System.out.println(assistant.content());
// → Yes – IBM publishes a **Java SDK for watsonx.ai** ...

Creating Custom Tools

Implement ExecutableTool to define your own tools for use with ToolRegistry:

public class WeatherTool implements ExecutableTool {

    @Override
    public String name() {
        return "get_weather";
    }

    @Override
    public Tool schema() {
        return Tool.of(
            "get_weather",
            "Get current weather for a location",
            JsonSchema.object()
                .property("location", JsonSchema.string("City name"))
                .property("unit", JsonSchema.string("celsius or fahrenheit"))
                .required("location")
                .build()
        );
    }

    @Override
    public String execute(ToolArguments args) {
        String location = args.get("location");
        // ... call weather API
        return "The weather in " + location + " is ...";
    }
}

Lifecycle Callbacks

ToolRegistry supports three lifecycle callbacks for monitoring and controlling tool execution:

ToolRegistry registry = ToolRegistry.builder()
    .register(new WeatherTool())
    .beforeExecution((toolName, toolArgs) -> System.out.println("Calling: " + toolName))
    .afterExecution((toolName, toolArgs, result) -> System.out.println("Result: " + result))
    .onError((toolName, toolArgs, error) -> System.err.println(toolName + " failed: " + error.getMessage()))
    .build();

Selective Tool Registration

ToolRegistry registry = ToolRegistry.builder()
    .register(new WeatherTool(), new SearchTool(), new CalculatorTool())
    .build();

// Use all tools
ChatService chatService = ChatService.builder()
    .tools(registry.tools())
    .build();

// Use only specific tools
ChatService limitedService = ChatService.builder()
    .tools(registry.tools("get_weather", "search"))
    .build();

Chat Parameters

ChatParameters allows you to fine-tune the behavior of chat requests — response length, creativity, repetition handling, output format, and more.

Builder Reference

Parameter	Type	Range	Description
`maxCompletionTokens`	Integer	≥ 0	Maximum tokens in the response (0 = model max)
`temperature`	Double	0.0 – 2.0	Randomness (0.0 = deterministic)
`topP`	Double	0.0 – 1.0	Nucleus sampling threshold
`frequencyPenalty`	Double	-2.0 – 2.0	Discourage frequent tokens
`presencePenalty`	Double	-2.0 – 2.0	Encourage new topics
`repetitionPenalty`	Double	> 1.0	Discourage repeated words/phrases
`lengthPenalty`	Double	Any	> 1.0 shorter, < 1.0 longer, 1.0 neutral
`stop`	List<String>	Max 4	Stop sequences to end generation
`seed`	Integer	Any	Random seed for reproducibility
`n`	Integer	≥ 1	Number of completions to generate
`logprobs`	Boolean	—	Return log probabilities
`topLogprobs`	Integer	≥ 1	Top token log probs (requires `logprobs=true`)
`logitBias`	Map<String, Integer>	—	Adjust token probabilities
`timeLimit`	Duration	Any	Maximum generation time
`toolChoiceOption`	ToolChoiceOption	AUTO, REQUIRED, NONE	Tool selection strategy
`toolChoice`	String	Tool name	Force a specific tool call
`guidedChoice`	Set<String>	Any	Constrain output to one of the given options
`guidedRegex`	String	Valid regex	Constrain output to a regex pattern
`guidedGrammar`	String	CFG grammar	Constrain output to a context-free grammar
`responseFormat`	—	—	Use `responseAsText()`, `responseAsJson()`, `responseAsJsonSchema()`
`modelId`	String	—	Override default model for this request
`projectId`	String	—	Override default project for this request
`spaceId`	String	—	Override default space for this request
`transactionId`	String	—	Request tracking ID
`crypto`	Crypto	—	Encryption configuration