Build a LangChain agentic RAG system using the Granite-3.0-8B-Instruct model in watsonx.ai¶

Author: Anna Gutowska

In this tutorial, you will create a LangChain agentic RAG system using the IBM Granite-3.0-8B-Instruct model now available on watsonx.ai that can answer complex queries about the 2024 US Open using external information.

Overview of agentic RAG¶

What is RAG?¶

RAG is a technique in natural language processing (NLP) that combines information retrieval and generative models to produce more accurate, relevant and contextually aware responses. In traditional language generation tasks, large language models (LLMs) such as Meta's Llama Models or IBM’s Granite Models are used to construct responses based on an input prompt. Common real-world use cases of these large language models are chatbots. When models are missing relevant information that is up to date in their knowledge base, RAG is a powerful tool.

What are AI agents?¶

At the core of agentic RAG systems are artificial intelligence (AI) agents. An AI agent refers to a system or program that is capable of autonomously performing tasks on behalf of a user or another system by designing its workflow and using available tools. Agentic technology implements tool use on the backend to obtain up-to-date information from various data sources, optimize workflow and create subtasks autonomously to solve complex tasks. These external tools can include external data sets, search engines, APIs and even other agents. Step-by-step, the agent reassesses its plan of action in real time and self-corrects.

Agentic RAG versus traditional RAG¶

Agentic RAG frameworks are powerful as they can encompass more than just one tool. In traditional RAG applications, the LLM is provided with a vector database to reference when forming its responses. In contrast, agentic RAG implementations are not restricted to document agents that only perform data retrieval. RAG agents can also have tools for tasks such as solving mathematical calculations, writing emails, performing data analysis and more. These tools can be supplemental to the agent's decision-making process. AI agents are context-aware in their multistep reasoning and can determine when to use appropriate tools.

AI agents, or intelligent agents, can also work collaboratively in multiagent systems, which tend to outperform singular agents. This scalability and adaptability is what sets apart agentic RAG agents from traditional RAG pipelines.

Prerequisites¶

You need an IBM Cloud® account to create a watsonx.ai™ project.

Steps¶

Step 1. Set up your environment¶

While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook.

Log in to watsonx.ai using your IBM Cloud account.
Create a watsonx.ai project.

You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Create a Jupyter Notebook.

This step will open a Notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite tutorials, check out the IBM Granite Community. This tutorial is also available on Github.

Step 2. Set up watsonx.ai Runtime service and API key¶

Create a watsonx.ai Runtime service instance (choose the Lite plan, which is a free instance).
Generate an API Key.
Associate the watsonx.ai Runtime service to the project you created in watsonx.ai.

Step 3. Install and import relevant libraries and set up your credentials¶

We'll need a few libraries and modules for this tutorial. Make sure to import the following ones; if they're not installed, you can resolve this with a quick pip installation.

Common Python frameworks for building agentic RAG systems include LangChain and LlamaIndex. In this tutorial, we will be using LangChain.

In [ ]:

Copied!





# installations
%pip install langchain | tail -n 1
%pip install langchain-ibm | tail -n 1
%pip install langchain-community | tail -n 1
%pip install ibm-watsonx-ai | tail -n 1
%pip install chromadb | tail -n 1
%pip install tiktoken | tail -n 1
%pip install bs4 | tail -n 1
# installations
%pip install langchain | tail -n 1
%pip install langchain-ibm | tail -n 1
%pip install langchain-community | tail -n 1
%pip install ibm-watsonx-ai | tail -n 1
%pip install chromadb | tail -n 1
%pip install tiktoken | tail -n 1
%pip install bs4 | tail -n 1

In [ ]:

Copied!





# imports
import getpass

from langchain_ibm import WatsonxEmbeddings, WatsonxLLM
from langchain.vectorstores import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.prompts import PromptTemplate
from langchain.tools import tool
from langchain.tools.render import render_text_description_and_args
from langchain.agents.output_parsers import JSONAgentOutputParser
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents import AgentExecutor
from langchain.memory import ConversationBufferMemory
from langchain_core.runnables import RunnablePassthrough
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
# imports
import getpass

from langchain_ibm import WatsonxEmbeddings, WatsonxLLM
from langchain.vectorstores import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.prompts import PromptTemplate
from langchain.tools import tool
from langchain.tools.render import render_text_description_and_args
from langchain.agents.output_parsers import JSONAgentOutputParser
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents import AgentExecutor
from langchain.memory import ConversationBufferMemory
from langchain_core.runnables import RunnablePassthrough
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

Input your WATSONX_APIKEY and WATSONX_PROJECT_ID that you created in steps 1 and 2 upon running the following cell.

In [ ]:

Copied!





credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your watsonx.ai Runtime API key (hit enter): ")
}

project_id = getpass.getpass("Please enter your project ID (hit enter): ")
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your watsonx.ai Runtime API key (hit enter): ")
}

project_id = getpass.getpass("Please enter your project ID (hit enter): ")

Step 4. Initialization a basic agent with no tools¶

This step is important as it will produce a clear example of an agent's behavior with and without external data sources. Let's start by setting our parameters.

The model parameters available can be found here. We experimented with various model parameters, including temperature, minimum and maximum new tokens and stop sequences. Learn more about model parameters and what they mean in the watsonx docs. It is important to set our stop_sequences here in order to limit agent hallucinations. This tells the agent to stop producing further output upon encountering particular substrings. In our case, we want the agent to end its response upon reaching an observation and to not hallucinate a human response. Hence, one of our stop_sequences is 'Human:' and another is Observation to halt once a final response is produced.

For this tutorial, we suggest using IBM's Granite-3.0-8B-Instruct model as the LLM to achieve similar results. You are free to use any AI model of your choice. The foundation models available through watsonx can be found here. The purpose of these models in LLM applications is to serve as the reasoning engine that decides which actions to take.

In [ ]:

Copied!





llm = WatsonxLLM(
    model_id= "ibm/granite-3-8b-instruct", 
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params={
        GenParams.DECODING_METHOD: "greedy",
        GenParams.TEMPERATURE: 0,
        GenParams.MIN_NEW_TOKENS: 5,
        GenParams.MAX_NEW_TOKENS: 250,
        GenParams.STOP_SEQUENCES: ["Human:", "Observation"],
    },
)
llm = WatsonxLLM(
    model_id= "ibm/granite-3-8b-instruct", 
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params={
        GenParams.DECODING_METHOD: "greedy",
        GenParams.TEMPERATURE: 0,
        GenParams.MIN_NEW_TOKENS: 5,
        GenParams.MAX_NEW_TOKENS: 250,
        GenParams.STOP_SEQUENCES: ["Human:", "Observation"],
    },
)

We'll set up a prompt template in case you want to ask multiple questions.

In [ ]:

Copied!

template = "Answer the {query} accurately. If you do not know the answer, simply say you do not know."
prompt = PromptTemplate.from_template(template)
template = "Answer the {query} accurately. If you do not know the answer, simply say you do not know."
prompt = PromptTemplate.from_template(template)

And now we can set up a chain with our prompt and our LLM. This allows the generative model to produce a response.

In [ ]:

Copied!

agent = prompt | llm
agent = prompt | llm

Let's test to see how our agent responds to a basic query.

In [ ]:

Copied!

agent.invoke({"query": "What sport is played at the US Open?"})
agent.invoke({"query": "What sport is played at the US Open?"})

Out[ ]:

'\n\nThe sport played at the US Open is tennis.'

The agent successfully responded to the basic query with the correct answer. In the next step of this tutorial, we will be creating a RAG tool for the agent to access relevant information about IBM's involvement in the 2024 US Open. As we have covered, traditional LLMs cannot obtain current information on their own. Let's verify this.

In [ ]:

Copied!

agent.invoke({"query": "Where was the 2024 US Open Tennis Championship?"})
agent.invoke({"query": "Where was the 2024 US Open Tennis Championship?"})

Out[ ]:

' Do not make up an answer.\n\nThe 2024 US Open Tennis Championship has not been officially announced yet, so the location is not confirmed. Therefore, I do not know the answer to this question.'

Evidently, the LLM is unable to provide us with the relevant information. The training data used for this model contained information prior to the 2024 US Open and without the appropriate tools, the agent does not have access to this information.

Step 5. Establish the knowledge base and retriever¶

The first step in creating the knowledge base is listing the URLs we will be extracting content from. In this case, our data source will be collected from our online content summarizing IBM’s involvement in the 2024 US Open. The relevant URLs are established in the urls list.

In [ ]:

Copied!





urls = [
    "https://www.ibm.com/case-studies/us-open",
    "https://www.ibm.com/sports/usopen",
    "https://newsroom.ibm.com/US-Open-AI-Tennis-Fan-Engagement",
    "https://newsroom.ibm.com/2024-08-15-ibm-and-the-usta-serve-up-new-and-enhanced-generative-ai-features-for-2024-us-open-digital-platforms",
]
urls = [
    "https://www.ibm.com/case-studies/us-open",
    "https://www.ibm.com/sports/usopen",
    "https://newsroom.ibm.com/US-Open-AI-Tennis-Fan-Engagement",
    "https://newsroom.ibm.com/2024-08-15-ibm-and-the-usta-serve-up-new-and-enhanced-generative-ai-features-for-2024-us-open-digital-platforms",
]

Next, load the documents using LangChain WebBaseLoader for the URLs we listed. We'll also print a sample document to see how it loaded.

In [ ]:

Copied!

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
docs_list[0]
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
docs_list[0]

Out[ ]:

Document(metadata={'source': 'https://www.ibm.com/case-studies/us-open', 'title': 'U.S. Open | IBM', 'description': 'To help the US Open stay on the cutting edge of customer experience, IBM Consulting built powerful generative AI models with watsonx.', 'language': 'en'}, page_content='\n\n\n\n\n\n\n\n\n\nU.S. Open | IBM\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHome\n\n\n\n\nCase Studies\n\n\n\nUS Open \n\n\n\n                    \n\n\n\n  \n    Acing the US Open digital experience\n\n\n\n\n\n\n    \n\n\n                \n\n                        \n\n\n  \n  \n      AI models built with watsonx transform data into insight\n  \n\n\n\n\n    \n\n\n                    \n\n\nGet the latest AI and tech insights\n\n\nLearn More\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFor two weeks at the end of summer, nearly one million people make the journey to Flushing, New York, to watch the best tennis players in the world compete in the US Open Tennis Championships.\nYear after year, it is one of the most highly attended sporting events in the world.\n\nBut more than 15 million global tennis fans follow the tournament through the US Open app and website. And to keep them coming back for more, the United States Tennis Association (USTA) has worked side-by-side with IBM Consulting® for more than three decades, developing and delivering a world-class digital experience that constantly advances its features and functionality.\n\n“The digital experience of the US Open is of enormous importance to our global fans, and therefore to us,” says Kirsten Corio, Chief Commercial Officer at the USTA. “That means we need to constantly innovate to meet the modern demands of tennis fans, anticipating their needs, but also surprising them with new and unexpected experiences.”\nTo help the US Open stay on the cutting edge of customer experience, IBM Consulting worked closely with the USTA to develop generative AI models that transform tennis data into insights and original content on the US Open app and website. To do this, the USTA used IBM® watsonx™, an AI and data platform built for business, and powerful AI models, including IBM Granite™ foundation models, to help develop key app features, such as Match Reports and AI Commentary for US Open highlight reels.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n  \n  \n      15M\n  \n\n\n\n\n    \n\n\n\n\n\n\n\nWorld-class digital experiences for more than 15 million fans around the globe\n\n\n\n\n\n\n\n\n  \n  \n      7M\n  \n\n\n\n\n    \n\n\n\n\n\n\n\nIBM captures and analyzes more than 7 million data points throughout the tournament\n\n\n\n\n\n\n\n\n\n        \n            \n                \n                     \n                        \n\n\n  \n  \n      The AI models built with watsonx do more than enhance the digital experience of the US Open. They also scale the productivity of our editorial team by automating key workflows.\n  \n\n\n\n\n    \n\n\n                        \n                \n            \n        \n        \n            Kirsten Corio\n        \n\n            Chief Commercial Officer\n        \n\n            United States Tennis Association\n        \n\n\n\n\n\n\n\n\n\n\n  \n\n\n\n    Generative AI experiences, built with watsonx\n\n\n\n\n\n    \n\n\n            \n        \n\n\n\n\nThe US Open is a sprawling, two-week tournament, with hundreds of matches played on 22 different courts. Keeping up with all the action is a challenge, both for tennis fans and the USTA editorial team covering the event. So, the USTA asked IBM to design, develop, and deliver solutions that enhance the digital experience and help its team serve up more content, covering more matches throughout the tournament.\nTo do it, the IBM Consulting team built generative AI-powered features using the watsonx AI and data platform. For example, Match Reports are AI-generated post-match summaries that are designed to get fans quickly up to speed on the action from around the tournament. AI Commentary adds AI-generated, spoken commentary to match highlights. And SlamTracker–the premier scoring application for the US Open–features AI-generated match previews and recaps.\n“The AI models built with watsonx do more than enhance the digital experience of the US Open,” says Kirsten Corio, Chief Commercial Officer, USTA. “They also scale the productivity of our editorial team by automating key workflows.”\nThe IBM team worked with multiple AI models to develop the new features, including the family of Granit AI models. These large language models already understand language, but they needed to be trained, or “tuned,” on tennis data in order to translate US Open action into sentences and summaries.\n“Foundation models are incredibly powerful and are ushering in a new age of generative AI,” says Shannon Miller, a Partner at IBM Consulting. “But to generate meaningful business outcomes, they need to be trained on high-quality data and develop domain expertise. And that’s why an organization’s proprietary data is the key differentiator when it comes to AI.”\nThe team used watsonx.data to connect and curate the USTA’s trusted data sources. The curation process includes de-duping and filtering the foundational data that informs the large language model, as well as the USTA’s proprietary data. The process filters for things like profanity or abusive language and objectionable content.\nThe models were then trained to translate tennis data into cogent descriptions, summarizing entire matches in the case of Match Reports, or generating sentences that describe the action in highlight reels for AI Commentary. Over the course of the 2024 US Open, Match Reports and AI Commentary will be generated for all men’s and women’s singles matches; something the USTA editorial team has never done before. And the ongoing operation of the models is monitored and managed using elements of watsonx.governance, which ensures the AI is performant, compliant and operating as expected.\nDuring the software development phase of the project, the team took advantage of a powerful generative AI assistant to increase the efficiency and accuracy of its code. IBM watsonx Code Assistant™ uses generative AI from a purpose-built Granite model to accelerate software development, helping developers generate code based on natural language prompts. The team used this tool to analyze and explain snippets of code, annotate code to facilitate better collaboration between developers, and auto-complete snippets of analyzed code.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n            \n                \n\n\n  \n\n\n\n    Platform of innovation\n\n\n\n\n\n    \n\n\n            \n        \n\n\n\n\nTo develop new capabilities every year, the USTA needs to move with speed and purpose. The process starts the week after the US Open concludes, when IBM Consulting kicks off work using the IBM Garage™ Methodology, a highly collaborative approach to co-creation.\n“When we engage with a client, it’s critical that we work closely together every step of the way, ideating, iterating and adapting as we drive toward the client’s desired end state,” says Miller.\nIn order to transform new ideas into digital reality, IBM Consulting designs, develops, and manages a powerful digital infrastructure capable of processing structured and unstructured data, and integrating technology from a variety of sources. This foundational infrastructure is advanced and improved upon every year.\n“It used to be that innovation cycles were measured in years,” says the USTA’s Corio. “But now, innovation is measured in weeks and days, and it can come from anywhere. So, we needed a flexible platform that could handle all kinds of data, automate the process of turning data into insight, and do it all while protecting the entire digital environment.”\n\n\n\n\n\n\n\n\n\n\n\n\n            \n                \n\n\n  \n\n\n\n    From data to insight\n\n\n\n\n\n    \n\n\n            \n        \n\n\n\n\nThe raw material of any digital experience is data, and the US Open tournament produces a lot of it. For starters, each US Open consists of 128 men and 128 women singles players, and a total of seven rounds for each tournament. Each tennis player comes with his or her own data set, including world ranking and recent performance. But that’s just the beginning.\nOver the course of the tournament, more than 125,000 points will be played. And each one of those points generates its own data set: serve direction, speed, return shot type, winner shot type, rally count and even ball position. All told, more than seven million data points are generated during the tournament.\nTo add more texture and context to the US Open digital experience, the team wanted to go beyond the numbers. So, they used AI to analyze the language and sentiment of millions of articles from hundreds of thousands of different sources to develop insights that are unique and informative, for instance, the likelihood to Win predictions for all singles matches. To help manage the collection, integration, and analysis of the data sets, IBM used IBM watsonx.data™, a purpose-built data store specifically designed to handle AI workloads.\n“It’s a massive data management operation, incorporating multiple sources of data and a variety of partners,” says Miller. “But the magic happens when you combine hard data like stats and scores with unstructured data like media commentary. That is what gives tennis fans a more complete picture of each match.”\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n  \n\n\n\n    Automation, containerization and other efficiencies\n\n\n\n\n\n    \n\n\n            \n        \n\n\n\n\nTo streamline this process, during the years working with the UTSA, IBM Consulting built automated workflows that integrate and orchestrate the flow of data through the various applications and AI models needed to produce the digital experience. These workflows are made possible by a hybrid cloud architecture and the containerized apps running on Red Hat®\xa0OpenShift®\xa0on IBM Cloud.\xa0The US Open hybrid multicloud architecture is made up of four public clouds, drawing on data from a variety of sources and integrating features and capability from a variety of partners. By containerizing the applications, the team can write them once and run them anywhere, ensuring that the right data gets to the right application on the right cloud.\n\n“With this platform, we’re capable of doing things that were not possible just a few years ago,” says Corio. “Managing all that data, producing AI-generated insights, securing the environment … IBM just makes it all come together for us. And I can’t wait to see what the future of the partnership holds.”\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n        \n            \n                \n                     \n                        \n\n\n  \n  \n      Managing all that data, producing AI-generated insights, securing the environment…IBM just makes it all come together for us. And I can’t wait to see what the future of the partnership holds.\n  \n\n\n\n\n    \n\n\n                        \n                \n            \n        \n        \n            Kirsten Corio\n        \n\n            Chief Commercial Officer\n        \n\n            United States Tennis Association\n        \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n  \n\n\n\n    About United States Tennis Association (USTA)\n\n\n\n\n\n    \n\n\n            \n        \n\n\n\n\nFounded in 1881, the USTA (link resides outside of\xa0ibm.com) is the national governing body for the sport of tennis in the US. The US Open (link resides outside of\xa0ibm.com) is the association’s Grand Slam tournament, first held in 1968—the year that Arthur Ashe won the men’s singles title. The US Open is played each September at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.\n\n\n\n\n\n\n            \n\n\n  \n  \n      Solution components\n  \n\n\n\n\n    \n\n\n        \n\n                                IBM Consulting™\n                                \n                                \n        \n\n                                IBM Garage™ Methodology\n                                \n                                \n        \n\n                                IBM® Granite™\n                                \n                                \n        \n\n                                IBM watsonx™\n                                \n                                \n        \n\n                                IBM watsonx.data™\n                                \n                                \n        \n\n                                IBM watsonx Code Assistant™\n                                \n                                \n        \n\n                                Red Hat® OpenShift® on IBM Cloud\n                                \n                                \n        \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nIBM watsonx\n\n\n\n\nWant AI to make smart use of all your data? Use IBM watsonx to accelerate the fine tuning and deployment of your models.\n\n\n\n\n\nLearn more about watsonx\n\n\nView more case stories\n\n\n\n\n\n        \n\n\n\n  \n    The Masters\n\n\n\n\n\n\n    \n\n\n    \n\n\n\nWhat if the Masters could turn data into insight?\n\n\n\nRead the case study\n            \n        \n\n\n\n        \n\n\n\n  \n    Wimbledon\n\n\n\n\n\n\n    \n\n\n    \n\n\n\nIBM and Wimbledon, a partnership of innovation\n\n\n\nRead the case study\n            \n        \n\n\n\n        \n\n\n\n  \n    Wintershall Dea AG\n\n\n\n\n\n\n    \n\n\n    \n\n\n\nDrilling down into data to transform the oil and gas industry\n\n\n\nRead the case study\n            \n        \n\n\n\n\n\n\n\n\n\n\n\n\n\n                            \n\n\n\n  \n    Legal\n\n\n\n\n\n\n    \n\n\n                        \n\n\n\n\n\n© Copyright IBM Corporation 2023. IBM Corporation, IBM Consulting, New Orchard Road, Armonk, NY 10504\nProduced in the United States of America, August 2024.\nIBM, the IBM logo, ibm.com, IBM Consulting, IBM Garage, Granite, watsonx, watsonx.data, and Code Assistant are trademarks or registered trademarks of International Business Machines Corporation, in the United States and/or other countries. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on ibm.com/trademark.\n\nRed Hat® and OpenShift® are registered trademarks of Red Hat, Inc. or its subsidiaries in the United States and other countries.\xa0\n\nThis document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.\nAll client examples cited or described are presented as illustrations of the manner in which some clients have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions. Generally expected results cannot be provided as each client\'s results will depend entirely on the client\'s systems and services ordered. THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.\nStatement of Good Security Practices: No IT system or product should be considered completely secure, and no single product, service or security measure can be completely effective in preventing improper use or access.\xa0 IBM does not warrant that any systems, products or services are immune from, or will make your enterprise immune from, the malicious or illegal conduct of any party.\nThe client is responsible for ensuring compliance with all applicable laws and regulations. IBM does not provide legal advice nor represent or warrant that its services or products will ensure that the client is compliant with any law or regulation.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n')

In order to split the data in these documents to chunks that can be processed by the LLM, we can use a text splitter such as RecursiveCharacterTextSplitter. This text splitter splits the content on the following characters: ["\n\n", "\n", " ", ""]. This is done with the intention of keeping text in the same chunks, such as paragraphs, sentences and words together.

Once the text splitter is initiated, we can apply it to our docs_list.

In [ ]:

Copied!





text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

The embedding model that we are using is an IBM Slate™ model through the watsonx.ai embeddings service. Let's initialize it.

In [ ]:

Copied!





embeddings = WatsonxEmbeddings(
    model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=project_id,
)
embeddings = WatsonxEmbeddings(
    model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=project_id,
)

In order to store our embedded documents, we will use Chroma DB, an open source vector store.

In [ ]:

Copied!





vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="agentic-rag-chroma",
    embedding=embeddings,
)
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="agentic-rag-chroma",
    embedding=embeddings,
)

To access information in the vector store, we must set up a retriever.

In [ ]:

Copied!

retriever = vectorstore.as_retriever()
retriever = vectorstore.as_retriever()

Step 6. Define the agent's RAG tool¶

Let's define the get_IBM_US_Open_context() tool our agent will be using. This tool's only parameter is the user query. The tool description is also noted to inform the agent of the use of the tool. This way, the agent knows when to call this tool. This tool can be used by the agentic RAG system for routing the user query to the vector store if it pertains to IBM’s involvement in the 2024 US Open.

In [ ]:

Copied!





@tool
def get_IBM_US_Open_context(question: str):
    """Get context about IBM's involvement in the 2024 US Open Tennis Championship."""
    context = retriever.invoke(question)
    return context


tools = [get_IBM_US_Open_context]
@tool
def get_IBM_US_Open_context(question: str):
    """Get context about IBM's involvement in the 2024 US Open Tennis Championship."""
    context = retriever.invoke(question)
    return context


tools = [get_IBM_US_Open_context]

Step 7. Establish the prompt template¶

Next, we will set up a new prompt template to ask multiple questions. This template is more complex. It is referred to as a structured chat prompt and can be used for creating agents that have multiple tools available. In our case, the tool we are using was defined in Step 6. The structured chat prompt will be made up of a system_prompt, a human_prompt and our RAG tool.

First, we will set up the system_prompt. This prompt instructs the agent to print its "thought process," which involves the agent's subtasks, the tools that were used and the final output. This gives us insight into the agent's function calling. The prompt also instructs the agent to return its responses in JSON Blob format.

In [ ]:

Copied!





system_prompt = """Respond to the human as helpfully and accurately as possible. You have access to the following tools: {tools}
Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).
Valid "action" values: "Final Answer" or {tool_names}
Provide only ONE action per $JSON_BLOB, as shown:"
```
{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}
```
Follow this format:
Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{{
  "action": "Final Answer",
  "action_input": "Final response to human"
}}
Begin! Reminder to ALWAYS respond with a valid json blob of a single action.
Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation"""
system_prompt = """Respond to the human as helpfully and accurately as possible. You have access to the following tools: {tools}
Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).
Valid "action" values: "Final Answer" or {tool_names}
Provide only ONE action per $JSON_BLOB, as shown:"
```
{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}
```
Follow this format:
Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{{
  "action": "Final Answer",
  "action_input": "Final response to human"
}}
Begin! Reminder to ALWAYS respond with a valid json blob of a single action.
Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation"""

In the following code, we are establishing the human_prompt. This prompt tells the agent to display the user input followed by the intermediate steps taken by the agent as part of the agent_scratchpad.

In [ ]:

Copied!

human_prompt = """{input}
{agent_scratchpad}
(reminder to always respond in a JSON blob)"""
human_prompt = """{input}
{agent_scratchpad}
(reminder to always respond in a JSON blob)"""

Next, we establish the order of our newly defined prompts in the prompt template. We create this new template to feature the system_prompt followed by an optional list of messages collected in the agent's memory, if any, and finally, the human_prompt which includes both the human input and agent_scratchpad.

In [ ]:

Copied!





prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", human_prompt),
    ]
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", human_prompt),
    ]
)

Now, let's finalize our prompt template by adding the tool names, descriptions and arguments using a partial prompt template. This allows the agent to access the information pertaining to each tool including the intended use cases and also means we can add and remove tools without altering our entire prompt template.

In [ ]:

Copied!





prompt = prompt.partial(
    tools=render_text_description_and_args(list(tools)),
    tool_names=", ".join([t.name for t in tools]),
)
prompt = prompt.partial(
    tools=render_text_description_and_args(list(tools)),
    tool_names=", ".join([t.name for t in tools]),
)

Step 8. Set up the agent's memory and chain¶

An important feature of AI agents is their memory. Agents are able to store past conversations and past findings in their memory to improve the accuracy and relevance of their responses going forward. In our case, we will use LangChain's ConversationBufferMemory() as a means of memory storage.

In [ ]:

Copied!

memory = ConversationBufferMemory()
memory = ConversationBufferMemory()

And now we can set up a chain with our agent's scratchpad, memory, prompt and the LLM. The AgentExecutor class is used to execute the agent. It takes the agent, its tools, error handling approach, verbose parameter and memory as parameters.

In [ ]:

Copied!





chain = (
    RunnablePassthrough.assign(
        agent_scratchpad=lambda x: format_log_to_str(x["intermediate_steps"]),
        chat_history=lambda x: memory.chat_memory.messages,
    )
    | prompt
    | llm
    | JSONAgentOutputParser()
)

agent_executor = AgentExecutor(
    agent=chain, tools=tools, handle_parsing_errors=True, verbose=True, memory=memory
)
chain = (
    RunnablePassthrough.assign(
        agent_scratchpad=lambda x: format_log_to_str(x["intermediate_steps"]),
        chat_history=lambda x: memory.chat_memory.messages,
    )
    | prompt
    | llm
    | JSONAgentOutputParser()
)

agent_executor = AgentExecutor(
    agent=chain, tools=tools, handle_parsing_errors=True, verbose=True, memory=memory
)

Step 9. Generate responses with the agentic RAG system¶

We are now able to ask the agent questions. Recall the agent's previous inability to provide us with information pertaining to the 2024 US Open. Now that the agent has its RAG tool available to use, let's try asking the same questions again.

In [ ]:

Copied!

agent_executor.invoke({"input": "Where was the 2024 US Open Tennis Championship?"})
agent_executor.invoke({"input": "Where was the 2024 US Open Tennis Championship?"})


> Entering new AgentExecutor chain...


Thought: The human is asking about the location of the 2024 US Open Tennis Championship. I need to find out where it was held.
Action:
```
{
  "action": "get_IBM_US_Open_context",
  "action_input": "Where was the 2024 US Open Tennis Championship held?"
}
```
Observation[Document(metadata={'description': "IBM and the United States Tennis Association (USTA) announced several watsonx-powered fan features coming to the US Open digital platforms ahead of this year's tournament. These new and enhanced capabilities – a product of collaboration between IBM and the USTA digital team – aim to deliver a more informative and engaging experience for millions of tennis fans around the world.", 'language': 'en-us', 'source': 'https://newsroom.ibm.com/2024-08-15-ibm-and-the-usta-serve-up-new-and-enhanced-generative-ai-features-for-2024-us-open-digital-platforms', 'title': 'IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms'}, page_content="IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms\n-New Match Report summaries offer fans detailed analysis for all 254 US Open main draw singles matches-Enhanced version of AI Commentary returns to US Open digital platforms for on-demand highlights, along with fully redesigned IBM SlamTracker experience-IBM and USTA Foundation announce new collaboration to provide AI professional development resources to USTA Foundation students, teaching professionals and the public\nAug 15, 2024\n\n\n\n\n\n\nARMONK, N.Y., Aug. 15, 2024 /PRNewswire/ -- IBM (NYSE:\xa0IBM) and the United States Tennis Association (USTA) today announced several watsonx-powered fan features coming to the US Open digital platforms ahead of this year's tournament. These new and enhanced capabilities – a product of collaboration between IBM and the USTA digital team – aim to deliver a more informative and engaging experience for millions of tennis fans around the world."), Document(metadata={'description': "IBM and the United States Tennis Association (USTA) announced several watsonx-powered fan features coming to the US Open digital platforms ahead of this year's tournament. These new and enhanced capabilities – a product of collaboration between IBM and the USTA digital team – aim to deliver a more informative and engaging experience for millions of tennis fans around the world.", 'language': 'en-us', 'source': 'https://newsroom.ibm.com/2024-08-15-ibm-and-the-usta-serve-up-new-and-enhanced-generative-ai-features-for-2024-us-open-digital-platforms', 'title': 'IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms'}, page_content="These new features and efforts showcase how IBM and the USTA continue to co-create world-class digital experiences that bring the drama and excitement of the US Open to more than 15 million people around the world each year. The US Open's digital experiences are run on the USTA's flexible, open hybrid cloud platform, which integrates technology from dozens of partners, automates key business processes, and secures the entire world-class digital experience of the US Open.\nThe 2024 US Open runs from August 19 - September 8. To see IBM technology in action, visit USOpen.org and/or the US Open app available in the Apple and Android app stores on mobile devices.\nAbout IBM"), Document(metadata={'description': 'To help the US Open stay on the cutting edge of customer experience, IBM Consulting built powerful generative AI models with watsonx.', 'language': 'en', 'source': 'https://www.ibm.com/case-studies/us-open', 'title': 'U.S. Open | IBM'}, page_content='Founded in 1881, the USTA (link resides outside of\xa0ibm.com) is the national governing body for the sport of tennis in the US. The US Open (link resides outside of\xa0ibm.com) is the association’s Grand Slam tournament, first held in 1968—the year that Arthur Ashe won the men’s singles title. The US Open is played each September at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.\n\n\n\n\n\n\n            \n\n\n  \n  \n      Solution components'), Document(metadata={'description': "IBM and the United States Tennis Association (USTA) announced several watsonx-powered fan features coming to the US Open digital platforms ahead of this year's tournament. These new and enhanced capabilities – a product of collaboration between IBM and the USTA digital team – aim to deliver a more informative and engaging experience for millions of tennis fans around the world.", 'language': 'en-us', 'source': 'https://newsroom.ibm.com/2024-08-15-ibm-and-the-usta-serve-up-new-and-enhanced-generative-ai-features-for-2024-us-open-digital-platforms', 'title': 'IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms'}, page_content="Preview of new 2024 IBM Match Report summaries\nMatch Reports will expand editorial capabilities of the USTA, enabling them to provide fans with timely coverage at unprecedented scale –\xa0for all 254 main draw singles matches across seven rounds and all seventeen courts.\nAn enhanced version of IBM's AI commentary is also returning to the US Open digital platforms. First introduced in 2023, AI commentary provides automated English-language audio and subtitles for men's and women's singles match summary highlight videos. This year, IBM will utilize watsonx including the Granite 13B LLM to generate more frequent, expressive and contextual commentary, with highlight packages published just minutes after a match concludes.\nFans will also have access to a fully redesigned IBM SlamTracker experience offering detailed pre-, live and post-match insights. These include likelihood to win predictions, point-by-point analysis, and bulleted match previews and recaps, built on IBM watsonx, for all men's and women's singles matches. This includes for live matches a new, near real-time 3D graphic of current play.")]
Action:
```
{
  "action": "Final Answer",
  "action_input": "The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York."
}
```
Observation

> Finished chain.

Out[ ]:

{'input': 'Where was the 2024 US Open Tennis Championship?',
 'history': '',
 'output': 'The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.'}

Great! The agent used its available RAG tool to return the location of the 2024 US Open, per the user's query. We even get to see the exact document that the agent is retrieving its information from. Now, let's try a slightly more complex question query. This time, the query will be about IBM's involvement in the 2024 US Open.

In [ ]:

Copied!

agent_executor.invoke(
    {"input": "How did IBM use watsonx at the 2024 US Open Tennis Championship?"}
)
agent_executor.invoke(
    {"input": "How did IBM use watsonx at the 2024 US Open Tennis Championship?"}
)


> Entering new AgentExecutor chain...


{
  "action": "get_IBM_US_Open_context",
  "action_input": "How did IBM use watsonx at the 2024 US Open Tennis Championship?"
}

Observation[Document(metadata={'description': 'To help the US Open stay on the cutting edge of customer experience, IBM Consulting built powerful generative AI models with watsonx.', 'language': 'en', 'source': 'https://www.ibm.com/case-studies/us-open', 'title': 'U.S. Open | IBM'}, page_content='The US Open is a sprawling, two-week tournament, with hundreds of matches played on 22 different courts. Keeping up with all the action is a challenge, both for tennis fans and the USTA editorial team covering the event. So, the USTA asked IBM to design, develop, and deliver solutions that enhance the digital experience and help its team serve up more content, covering more matches throughout the tournament.\nTo do it, the IBM Consulting team built generative AI-powered features using the watsonx AI and data platform. For example, Match Reports are AI-generated post-match summaries that are designed to get fans quickly up to speed on the action from around the tournament. AI Commentary adds AI-generated, spoken commentary to match highlights. And SlamTracker–the premier scoring application for the US Open–features AI-generated match previews and recaps.\n“The AI models built with watsonx do more than enhance the digital experience of the US Open,” says Kirsten Corio, Chief Commercial Officer, USTA. “They also scale the productivity of our editorial team by automating key workflows.”'), Document(metadata={'description': "IBM and the United States Tennis Association (USTA) announced several watsonx-powered fan features coming to the US Open digital platforms ahead of this year's tournament. These new and enhanced capabilities – a product of collaboration between IBM and the USTA digital team – aim to deliver a more informative and engaging experience for millions of tennis fans around the world.", 'language': 'en-us', 'source': 'https://newsroom.ibm.com/2024-08-15-ibm-and-the-usta-serve-up-new-and-enhanced-generative-ai-features-for-2024-us-open-digital-platforms', 'title': 'IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms'}, page_content="IBM and the USTA Serve Up New and Enhanced Generative AI Features for 2024 US Open Digital Platforms\n-New Match Report summaries offer fans detailed analysis for all 254 US Open main draw singles matches-Enhanced version of AI Commentary returns to US Open digital platforms for on-demand highlights, along with fully redesigned IBM SlamTracker experience-IBM and USTA Foundation announce new collaboration to provide AI professional development resources to USTA Foundation students, teaching professionals and the public\nAug 15, 2024\n\n\n\n\n\n\nARMONK, N.Y., Aug. 15, 2024 /PRNewswire/ -- IBM (NYSE:\xa0IBM) and the United States Tennis Association (USTA) today announced several watsonx-powered fan features coming to the US Open digital platforms ahead of this year's tournament. These new and enhanced capabilities – a product of collaboration between IBM and the USTA digital team – aim to deliver a more informative and engaging experience for millions of tennis fans around the world."), Document(metadata={'description': '', 'language': 'en-us', 'source': 'https://newsroom.ibm.com/US-Open-AI-Tennis-Fan-Engagement', 'title': 'US Open: IBM and AI Can Help Engage Tennis Fans '}, page_content='Feature Story: In the Era of Empty US Open Seats, IBM and AI Can Help  Fans Stay Engaged\nBy Trish Hall\nAs the US Open gets underway in New York this week, the world’s top tennis players are performing in front of empty seats—not the 800,000 or so fans who would normally fill the Billie Jean King National Tennis Center during the two-week tournament.\r\n\xa0\r\nBut tennis fans will be watching from home on televisions, computers and the US Open app. To connect them even more deeply to the matches, IBM, in collaboration with the tournament’s host, the United States Tennis Association (USTA), has developed new, immersive interactive features using IBM’s AI and Watson capabilities.'), Document(metadata={'description': '', 'language': 'en-us', 'source': 'https://newsroom.ibm.com/US-Open-AI-Tennis-Fan-Engagement', 'title': 'US Open: IBM and AI Can Help Engage Tennis Fans '}, page_content='Against that worrisome backdrop for the sports industry, the USTA-IBM team knew that to keep fans involved, it would have to come up with something completely different this year. It would also need to engage people who might not know a lot about tennis, but are sports enthusiasts hungry for the resumption of any new athletic competition.\n“We were seeing a lot more casual viewers of the classic match replays,” said Kirsten Corio, who directs digital strategy for the USTA. “Maybe they were at home and bored, starved for new sports content. We hope that this will make those more casual fans more educated about the sport and the tournament.”\nThe two new interactive AI-enabled innovations are Open Questions With Watson, and Match Insights With Watson Discovery. Both will be available on the USTA’s USOpen.org website.\nOpen Questions with Watson will employ Natural Language Processing (NLP) capabilities in IBM\'s Watson\xa0 Discovery, IBM Project Debater and custom AI algorithms to generate and moderate a public conversation during the tournament over questions like "Is Serena Williams the best player in tennis?\'\' and "Does Pete Sampras have the best all-around tennis game?"')]
Action:
```
{
  "action": "Final Answer",
  "action_input": "IBM used watsonx at the 2024 US Open Tennis Championship to create generative AI-powered features such as Match Reports, AI Commentary, and SlamTracker. These features enhance the digital experience for fans and scale the productivity of the USTA editorial team."
}
```
Observation

> Finished chain.

Out[ ]:

{'input': 'How did IBM use watsonx at the 2024 US Open Tennis Championship?',
 'history': 'Human: Where was the 2024 US Open Tennis Championship?\nAI: The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.',
 'output': 'IBM used watsonx at the 2024 US Open Tennis Championship to create generative AI-powered features such as Match Reports, AI Commentary, and SlamTracker. These features enhance the digital experience for fans and scale the productivity of the USTA editorial team.'}

Again, the agent was able to successfully retrieve the relevant information pertaining to the user query. Additionally, the agent is successfully updating its knowledge base as it learns new information and experiences new interactions as seen by the history output.

Now, let's test if the agent can decipher when tool calling is not necessary to answer the user query. We can test this by asking the RAG agent a question that is not about the US Open.

In [ ]:

Copied!

agent_executor.invoke({"input": "What is the capital of France?"})
agent_executor.invoke({"input": "What is the capital of France?"})


> Entering new AgentExecutor chain...


{
  "action": "Final Answer",
  "action_input": "The capital of France is Paris."
}

Observation

> Finished chain.

Out[ ]:

{'input': 'What is the capital of France?',
 'history': 'Human: Where was the 2024 US Open Tennis Championship?\nAI: The 2024 US Open Tennis Championship was held at the USTA Billie Jean King National Tennis Center in Flushing, Queens, New York.\nHuman: How did IBM use watsonx at the 2024 US Open Tennis Championship?\nAI: IBM used watsonx at the 2024 US Open Tennis Championship to create generative AI-powered features such as Match Reports, AI Commentary, and SlamTracker. These features enhance the digital experience for fans and scale the productivity of the USTA editorial team.',
 'output': 'The capital of France is Paris.'}

As seen in the AgentExecutor chain, the agent recognized that it had the information in its knowledge base to answer this question without using its tools.

Summary¶

In this tutorial, you created a RAG agent using LangChain in python with watsonx. The LLM you worked with was the IBM Granite-3.0-8B-Instruct model. The sample output is important as it shows the significance of this generative AI advancement. The AI agent was successfully able to retrieve relevant information via the get_IBM_US_Open_context tool, update its memory with each interaction and output appropriate responses. It is also important to note the agent's ability to determine whether tool calling is appropriate for each specific task. When the agent had the information necessary to answer the input query, it did not use any tools for question answering.

For more AI agent content, we encourage you to check out our AI agent tutorial that returns today's Astronomy Picture of the Day using NASA's open source API and a date tool.