Lab guide for document processing workflow¶

Overview¶

This lab focuses on a practical use case involving document classification, followed by key data extraction from both unstructured documents (primarily contracts) and structured ones (such as invoices). At the outset, the guide outlines the sequence of activities required to support the user journey of extracting essential information from uploaded documents.

In this lab scenario, a financial services firm processes numerous contracts and invoices weekly, extracting critical information like payment terms, amounts, and due dates. Currently, this is done manually, consuming significant employee time and increasing the risk of errors. With growing document volumes, the firm faces scalability challenges and inefficiencies in its processes.

Pre-requisites¶

Make sure you've already set up the environment:
Lab 0 - Environment setup
ADK Installation
Download files
Download the document-extraction-lab.zip file from the Lab1 folder.

Reference Architecture¶

Key Components¶

Agents

Document Processing Agent (Main Agent): Manages end-to-end document ingestion, classification, extraction, and display.

Workflow

Document Classifier: Distinguishes between document types (Invoice vs. Contract).
Document Extractor (Contract Extractor): Extracts fields such as Buyer, Supplier, and Effective Date from contract documents.
Document Extractor (Invoice Extractor): Extracts Invoice Number, Terms, Description, and Bill To address from invoice documents.
Branch Node: Routes the workflow to the appropriate extractor based on document type.
Display to User Activity: Displays extracted fields in a readable message format for end users.
End Node: Returns the final classification result (Contract or Invoice) as output.
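
The components above can be pictured as a small pipeline. The following is a minimal Python sketch of that routing logic only, not the product's implementation: `classify`, `extract_fields`, and `run_flow` are hypothetical stand-ins, the keyword check replaces the real Document Classifier, and the `Key: value` parsing replaces the LLM-based extractors.

```python
from typing import Dict, List

# Field lists taken from the lab; everything else here is a hypothetical stand-in.
CONTRACT_FIELDS = ["Buyer", "Supplier", "Effective date"]
INVOICE_FIELDS = ["Invoice #", "Terms", "Description", "Bill to"]


def classify(text: str) -> str:
    """Toy classifier: a keyword check standing in for the Document Classifier."""
    return "Invoice" if "invoice" in text.lower() else "Contract"


def extract_fields(text: str, fields: List[str]) -> Dict[str, str]:
    """Toy extractor: reads 'Key: value' lines (the real node prompts an LLM)."""
    found: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in fields:
            found[key.strip()] = value.strip()
    return found


def run_flow(text: str) -> Dict[str, str]:
    """Classify, branch to the matching extractor, build a message, return the class."""
    class_name = classify(text)
    # Branch node: pick the field list for the detected document type.
    fields = INVOICE_FIELDS if class_name == "Invoice" else CONTRACT_FIELDS
    extracted = extract_fields(text, fields)
    # Display to user: one readable line per extracted field.
    message = "\n".join(f"{name}: {value}" for name, value in extracted.items())
    # End node: the flow output is the classification result.
    return {"class_name": class_name, "message": message}


# Made-up sample text, used only to exercise the sketch.
sample = "Invoice\nInvoice #: 123\nTerms: Net 30"
print(run_flow(sample))
```

The sketch keeps the same order as the workflow: classification first, a single branch decision, one extractor per branch, a display step, and the class name as the only flow output.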

Steps¶

Create an Agent¶

Open the watsonx Orchestrate UI. Click on Create new agent on the bottom left.
Enter Document processing agent into the Name field (A), then enter This agent is able to classify documents and extract core fields to retrieve into the Description field (B) and click Create (C).

The Agent Builder opens; the screen is divided into three key areas:

Navigation Panel (A): move between different sections of Agent Builder.
Configuration (B): set up and customize the functionality of your agent.
Preview (C): preview and test your agent.

Below is a description of each section:

Profile: Define your agent's purpose, usage scenarios, and interaction style. Describe what the agent does, when it should be used (especially in multi-agent configurations), and choose its style, Default or ReAct, which guides how it interprets requests, plans, and uses tools.
Knowledge: Equip your agent with knowledge by uploading files or connecting to conversational search platforms such as Milvus, Elasticsearch, or custom-built sources. This ensures the agent can generate accurate, contextual responses by drawing on relevant content.
Toolset: Provide your agent with tools to perform tasks. Tools can be added from the Catalog, imported from OpenAPI specification files or MCP servers, or built with custom flows. Tools extend the agent's capabilities, enabling it to automate actions such as retrieving data or sending emails.
Behavior: Define how the agent interacts with users, formats data, and handles requests. Add rules and instructions to shape its tone, response style, and overall behavior during interactions.
Channels: Connect your agent to communication platforms like Slack or embed it in a website.

Create an agentic workflow¶

Click Toolset (A). Then click Add tool (B).
Click on Create an agentic workflow (A).
Click the pencil icon (A) to rename the workflow.
Type Document processing in the Name field (A). In the Description field (B), type This flow classifies and extracts values from documents. Then scroll down to the Output section (C).

Define the process output¶

Scrolling down, you’ll find the Input and Output sections, which apply to the entire flow. In this case, no inputs are required since they are already handled within the flow itself through the user activity file upload. For the output, instead of using a text output node, the flow can return the classification result, defined as a string.

Click Add output (A) and select String (B).
Type “Class_name” in the Name field (A). Click Add (B). Then click Save (C).

Design the workflow diagram¶

In this section, you will add the different workflow activities including the document classifier and the 2 extractors.

Move the cursor over the link until the + icon appears. Click the + to add an activity (A). The first activity of the process consists of uploading the document to process. It will be implemented as a user activity.
Select Create new (A) and then select User activity (B).

Note

A green box appears, indicating the user activity. This will contain the series of activities that the user must perform whilst interacting with the agent.
Move the cursor on the user activity link and click + (A) to add the activities the user will have to perform.
Select the Interactions tab (A) then click File upload (B).

The first activity of the process is now defined. The next step is to classify the document the user has provided. To do this, you will add a watsonx Orchestrate Document Classifier.

Add a document classifier¶

In this section, you will add a document classifier to distinguish Invoice documents from Contract documents.

Move the cursor over the last link of the process and click + (A). Select Create new (B) then click Document classifier (C).
Click the Document classifier node (A).
Click the edit icon (A).
Click Add class (A).
Type Contract (A).
Click Add class again and then type ‘Invoice’ (A).

You are now ready to test your document classifier.
Click Test classifier (A).
Drag and drop the two sample files, Invoice.pdf and Contract.pdf, from your local machine (A).
Wait for the documents to be analyzed and classified (A).

Note

The two documents have been successfully classified: Contract.pdf as the Contract class and Invoice.pdf as the Invoice class.
Click Done (A).

The next step is to route the process to the right extractor path depending on the document type. To do this, you will add a branch node responsible for the triage.

Add a Branch node¶

Move the cursor over the last link of the diagram then click + (A).
Click Branch (A).
Click the Branch 1 node (A).

Note

When creating a branch from scratch, two paths are generated by default, but additional paths can be added as needed.
Move the cursor over the Branch 1 name then click the edit icon (A).
Type Document type in the node name field (A) then hit Enter.
Click the Path 1 row (A).
Type Invoice (A).
Click Edit condition (A) to edit the condition and select invoice documents.
Click the + icon (A) to add conditions.

You will use the result of the Document classifier step as the variable to evaluate for the routing.
Click Document classifier (A) and then click class_name (B).
Select the ‘==’ operator (A).
Click + to specify the value to check (A).
Type ‘Invoice’ (A) then hit Enter.

The process will be routed in this branch if the document type is Invoice.

Note

Conditions can also be edited with the expression editor. The equivalent expression for this condition is: flow["Document classifier"].output.class_name == "Invoice"
Click the Back icon (A).

Next, rename the second path to Contract. Because there are only 2 types of documents, any document that is not an invoice (that is, a contract) will be routed to this branch.
Click the Path 2 row (A).
Type Contract (A) then hit Return.
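
The branch configured above behaves like an if/else on the classifier output. The following is a minimal Python sketch of that decision, assuming the classifier result is available as a nested dictionary; the `route` function and the dictionary layout are illustrative assumptions, not the product's runtime.

```python
def route(flow: dict) -> str:
    """Return the name of the path the Document type branch would take."""
    # Path 1 (Invoice): the comparison built in the condition editor.
    if flow["Document classifier"]["output"]["class_name"] == "Invoice":
        return "Invoice"
    # Path 2 (Contract): has no condition of its own and catches everything else.
    return "Contract"


# Hypothetical classifier results, used only to exercise the sketch.
print(route({"Document classifier": {"output": {"class_name": "Invoice"}}}))
print(route({"Document classifier": {"output": {"class_name": "Contract"}}}))
```

Because the comparison is an exact string match, the value typed in the condition must match the class name defined in the classifier, including capitalization.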

You are now ready to add the 2 document extractors, one for each document type.

Add a document extractor¶

In this section, you will create two document extractors. Each document extractor will be responsible for extracting specific data from each document class:

Invoice:

Invoice #
Terms
Description
Bill to

Contract:

Buyer
Supplier
Effective date

Click Add + (A).
Select Create new (A) then click Document extractor (B).
Click the Document extractor node (A) then click the edit icon (B).
Click the Document extractor name (A) to edit it.
Type ‘Contract extractor’ (A) as a new name then click the save icon (B).
Drag and drop the Contract.pdf file (A) from your computer to the dropping area.
Wait for the document to be processed.
Once uploaded, the sample document is displayed, and the page is divided into the following key segments:
1. Activity Name – Displays the name of the document extractor activity. Rename it to reflect the sample document you've uploaded.
2. LLM Model Selection – Choose your preferred LLM model from the drop-down menu. You can switch models at any time while prompting them for document extraction.
3. Field Definitions – Define the list of fields you want to extract from the sample document. The selected LLM will retrieve the corresponding values.
4. Document Navigation & Upload – Use the drop-down to navigate between uploaded documents. You can upload up to five additional documents for prompting and testing.
5. Document Viewer: Shows the uploaded document with the specified fields for extraction highlighted.
Click the model expand icon (A).

Tip

You can change the LLM model by clicking on the ‘Model’ drop-down, which lists all the available models. The model currently in use is marked with a tick icon. You can change the model at any time while you test to see which one is most accurate at extracting fields. For now, keep the meta-llama model.

Let’s now define the fields to extract.
Click Add field (A).
Type Buyer (A) and hit Return.

Note

The LLM will retrieve a value associated with the key you define here, which will auto-highlight in the document preview.
In the same way, add the 2 following fields:
- Supplier
- Effective date
Your screen should look like:
Click x (A) to close the extractor.
Click in the background to close the Contract extractor property view (A).

Next, you will create the Invoice extractor in the corresponding branch of the process.
Move the mouse over the invoice branch and click + (A).
Select Document extractor (A).
Click the Document extractor node and move it for a nicer layout (A).
Repeat the renaming and upload steps used for the Contract extractor to:
- Rename the extractor ‘Invoice Extractor’.
- Add the Invoice.pdf document to the extractor.
You should have the following screen:

Let’s now add the fields to extract.
Click Add field (A).
Type Invoice # (A) and hit Return.
The invoice number should have been recognized (A):
Repeat the Add field step to add the following fields:
- Terms
- Description
You will now add the Bill to field. Click Add + (A).
Type ‘Bill to’ (A) and press Enter.
Observe the result (A).

The entire billing address was not correctly highlighted. Let’s train the model to recognize the address field.
Hover your mouse over the ‘Bill to’ field and click on the edit icon (A).
The left pane appears (A) so you can re-prompt your model to more accurately extract the value for the selected field. Let’s add an example showing where and how to extract the Bill to information.
Click Add example (A).

The ‘Input’ and ‘Output’ sections will appear, allowing you to specify the exact key-value pair you want to extract from the sample document. This helps to re-prompt the model to accurately capture the full value for the selected field.
Click inside the ‘Input’ field (A) to enable ‘Select to Copy’ mode, then draw a box around the entire key-value pair (B) on the document to define what should be extracted.

As soon as you release the mouse button, the entire text is captured in the Input field. The model now knows where to find the information, but the ‘Bill to’ label itself is not part of the value. You will now teach the model what the output should be for this precise example.
Click inside the ‘Output’ field to enable ‘Select to Copy’ mode (A) then draw a box around the address section (B) on the document to define what should be extracted (excluding ‘Bill to’).
Once highlighted, release the mouse button and this will auto-populate the Output section (A).
Click the Show on Document button (A) to let the LLM retry the extraction.

Note

The LLM will use the key-value pair from the example as a re-prompt and highlight the updated value in the document preview.
The ‘Bill To’ address is re-highlighted (A) after prompting the model, indicating that the LLM has been successfully trained to extract this information. Click on the ‘back’ icon (B) to return to the original page.

Note

The model can only be prompted using unstructured data. Re-prompting on structured formats, like tables, is not currently supported. Please check with your instructors, who can refer you to the product roadmap for more details.
All the fields that you’ve defined are highlighted on the sample document. Once complete, click X to close the window (A).
Click on the diagram background to close the Invoice extractor property view (A).
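
The Input/Output example added for the Bill to field follows the general idea of few-shot prompting: the model is shown a worked pair before it sees the real text. How the product builds its prompt internally is not described in this lab, so the Python sketch below is only an illustration of the idea. The `build_field_prompt` function, the prompt wording, and the sample addresses are all assumptions.

```python
from typing import List, Tuple


def build_field_prompt(field: str, examples: List[Tuple[str, str]], document_text: str) -> str:
    """Assemble a few-shot prompt (hypothetical format, not the product's)."""
    lines = [f"Extract the value of the field '{field}'. Return only the value."]
    # Each example pairs the selected key-value text with the expected value alone.
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
    # The text to extract from goes last, with the answer left open.
    lines.append(f"Input: {document_text}")
    lines.append("Output:")
    return "\n".join(lines)


# Made-up sample text; in the lab the selections come from Invoice.pdf.
example = ("Bill to: Example Corp, 1 Sample Street", "Example Corp, 1 Sample Street")
prompt = build_field_prompt("Bill to", [example], "Bill to: Another Co, 2 Test Road")
print(prompt)
```

The example output deliberately omits the ‘Bill to’ label, which mirrors the step where only the address is selected for the Output field.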

You will now add the user display activities for each branch.

Add a Display to user activity¶

To complete the flow, we need to add a final activity that presents the extracted values to the user. This step allows us to format and display the results clearly, based on the core fields that we defined when we prompted the model for each document type.

Hover over the last link in the Contract extractor branch and click + (A).
Select Create new (A) then click User activity (B).
Click + (A) to add a new action in the user activity node.
Click Display to user (A).
Select Message (A).
Click the Message box to edit its content (A).
Hover over the Message 1 name (A) then click the edit button (B) to rename it.
Type Extracted contract fields (A) and hit Return.
Type the following to create a table output:
```
Buyer|Supplier|Effective date
--|--|--
{flow["Contract extractor"].output.buyer}|{flow["Contract extractor"].output.supplier}|{flow["Contract extractor"].output.effective_date}
```
To display the results for the field, you must assign a variable to it. Click on the Select variable ‘[x]’ button and assign the variables.
Your screen should look like (A):
Click the background to remove the property view (A).
Repeating the steps above, create a user activity in the Invoice branch (A) to display the following fields:
- Invoice #
- Terms
- Description
- Bill to
Note

The invoice variables will be under the Invoice extractor folder.
You should get the following result:
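
The message bodies in this section use a pipe-delimited table: a header row, a separator row, and one row of values. To preview that layout outside the tool, the Python sketch below builds the same three lines from a dictionary. The `render_table` helper and the sample values are illustrative assumptions; in the workflow the values come from the extractor variables.

```python
from typing import Dict


def render_table(fields: Dict[str, str]) -> str:
    """Build a one-row pipe-delimited table: header, separator, values."""
    header = "|".join(fields.keys())
    separator = "|".join("--" for _ in fields)
    # Escape pipes inside values so they do not split a cell.
    row = "|".join(str(value).replace("|", "\\|") for value in fields.values())
    return "\n".join([header, separator, row])


# Made-up values standing in for the Contract extractor outputs.
print(render_table({"Buyer": "A", "Supplier": "B", "Effective date": "C"}))
```

The same helper works for the Invoice branch by passing the four invoice fields instead.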

Define the end flow¶

Now that the output message has been defined for both file types, we can conclude the flow by updating the end node. The end node will just return the Document type (i.e. Contract/Invoice).

Click the End node (A).
Click Edit data mapping (A).

Note that the output ‘class_name’ is auto-mapped by the LLM (A). This means that the LLM will attempt to automatically link the inputs and outputs.

Note

The auto-map feature works well for simpler flows. For more complex flows (for instance, if you had two ‘class_name’ outputs), explicit mapping performs best.
Hover over the variable row and click on the Variable icon (A).
Click the Document classifier (A).
Select class_name (A).
The output ‘class_name’ is now assigned to the ‘class_name’ variable (A). Click x (B) to close the window.
Click Done (A).
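
To see why explicit mapping matters when several nodes expose an output with the same name, consider the toy Python sketch below. It matches purely by name, which is an assumption made for illustration: the lab states that the product's auto-map is performed by an LLM, and `auto_map_by_name` is a hypothetical helper, not a product function.

```python
from typing import Dict, Set


def auto_map_by_name(output_name: str, node_outputs: Dict[str, Set[str]]) -> str:
    """Return the single node exposing output_name, or fail if the match is ambiguous."""
    candidates = [node for node, outputs in node_outputs.items() if output_name in outputs]
    if len(candidates) == 1:
        return candidates[0]
    raise ValueError(f"{output_name!r} matches {len(candidates)} nodes; map it explicitly")


# In this lab only the classifier exposes class_name, so the match is unambiguous.
nodes = {
    "Document classifier": {"class_name"},
    "Contract extractor": {"buyer", "supplier", "effective_date"},
}
print(auto_map_by_name("class_name", nodes))
```

With a second node that also exposes `class_name`, the helper raises an error instead of guessing, which is the situation where selecting the source variable by hand is the safer choice.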

Add a behavior¶

To trigger the tool you just created, you must update the agent behavior.

Click Behavior (A) and type Invoke the Document extractor tool and output the result (B) in the Instructions.

Your agent is ready to be tested.

Test your agent¶

In this section, you will use the Agent preview to test your agent behavior.

Type Invoke document extractor (A) in the Preview instructions area.
Click Add file (A).
Upload the Contract.pdf file and hit send.

After uploading your document, the agent will automatically execute the predefined workflow steps in the background. It will extract relevant data, focusing on the fields it has been prompted on, and generate a response with the extracted information. Additionally, it will provide insights into the document classification.
After a few seconds the agent will reply with the extracted data and the document type (A).

Conclusion

👏 Congratulations on completing the lab! 🎉