Create a LangChain RAG system in Python with watsonx¶
Contributors: Nicholas Moss, Erika Russi
In this tutorial, we’ll use LangChain to walk through a step-by-step simple Retrieval Augmented Generation (RAG) example in Python. RAG is a technique in natural language processing (NLP) that combines information retrieval and generative models to produce more accurate, relevant and contextually aware responses.
For our use case, we’ll set up a RAG system for various IBM webpages related to the company's technology, products and offerings. The fetched content from these ibm.com websites will make up our knowledge base. From this knowledge base, we will then provide context to an LLM so it can answer some questions about IBM products.
More about RAG and LangChain¶
In traditional language generation tasks, large language models (LLMs) like OpenAI’s GPT-3.5 (Generative Pre-trained Transformer) or IBM’s Granite Models are used to construct responses based on an input prompt. However, these models may struggle to produce responses that are contextually relevant, factually accurate or up to date. The models may not know the latest information on IBM products. To tackle this knowledge gap, we can use methods such as fine-tuning or continued pre-training, but both can be expensive. Instead, we can use RAG to leverage a knowledge base of existing content.
RAG applications address the knowledge gap limitation by incorporating a retrieval step before response generation. During retrieval, vector search can be used to identify contextually pertinent information, such as relevant information or documents from a large corpus of text, typically stored in a vector database. Finally, an LLM is used to generate a response based on the retrieved context.
LangChain is a powerful, open-source framework that facilitates the development of applications using LLMs for various NLP tasks. In the context of RAG, LangChain plays a critical role by combining the strengths of retrieval-based methods and generative models to enhance the capabilities of NLP systems.
Prerequisites¶
You need an IBM Cloud account to create a watsonx.ai project.
Steps¶
Step 1. Set up your environment¶
While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook.
Log in to watsonx.ai using your IBM Cloud account.
Create a watsonx.ai project.
You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Create a Jupyter Notebook.
This step will open a Notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite tutorials, check out the IBM Granite Community. This tutorial is also available on GitHub.
Step 2. Set up a watsonx.ai Runtime instance and API key¶
Create a watsonx.ai Runtime service instance (select your appropriate region and choose the Lite plan, which is a free instance).
Generate an API Key.
Associate the watsonx.ai Runtime service to the project that you created in watsonx.ai.
Step 3. Install and import relevant libraries and set up credentials¶
We have a few dependencies for this tutorial. Make sure the libraries below are installed; if any are missing, a quick pip install resolves that.
#installations
%pip install -q python-dotenv
%pip install -q langchain
%pip install -q langchain_chroma
%pip install -q langchain-community
%pip install -qU langchain_ibm
%pip install -qU langchain_community beautifulsoup4
%pip install -q ibm-watson-machine-learning
Import the relevant libraries:
#imports
import os
from dotenv import load_dotenv
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes
from langchain_ibm import WatsonxEmbeddings, WatsonxLLM
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
Set up your credentials. Store your WATSONX_PROJECT_ID and WATSONX_APIKEY in a separate .env file at the same directory level as this notebook.
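For reference, a minimal .env file could look like the following (placeholder values shown; substitute your own API key and project ID):
WATSONX_APIKEY=your-ibm-cloud-api-key
WATSONX_PROJECT_ID=your-watsonx-project-id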
load_dotenv(os.getcwd()+"/.env", override=True)
credentials = {
"url": "https://us-south.ml.cloud.ibm.com",
"apikey": os.getenv("WATSONX_APIKEY", ""),
}
project_id = os.getenv("WATSONX_PROJECT_ID", "")
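Optionally, you can add a quick check that the credentials were actually loaded; if either value is empty, the later watsonx calls will fail with a less obvious error:
# optional: fail fast if the .env values were not loaded
assert credentials["apikey"] and project_id, "Set WATSONX_APIKEY and WATSONX_PROJECT_ID in your .env file"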
Step 4. Index the URLs to create the knowledge base¶
We’ll index our ibm.com articles from URLs to create a knowledge base as a vectorstore. The content from these URLs will be our data sources and context for this exercise. The context will then be provided to an LLM to answer any questions we have about IBM products or technologies.
The first step to building vector embeddings is to clean and process the raw dataset. This may involve the removal of noise and standardization of the text. For our example, we won’t do any cleaning since the text is already cleaned and standardized.
First, let's establish URLS_DICTIONARY, a dict that maps descriptive names to the URLs from which we'll extract the content. Let's also set up a name for our collection: askibm_2024.
URLS_DICTIONARY = {
"ufc_ibm_partnership": "https://newsroom.ibm.com/2024-11-14-ufc-names-ibm-as-first-ever-official-ai-partner",
"granite.html": "https://www.ibm.com/granite",
"products_watsonx_ai.html": "https://www.ibm.com/products/watsonx-ai",
"products_watsonx_ai_foundation_models.html": "https://www.ibm.com/products/watsonx-ai/foundation-models",
"watsonx_pricing.html": "https://www.ibm.com/watsonx/pricing",
"watsonx.html": "https://www.ibm.com/watsonx",
"products_watsonx_data.html": "https://www.ibm.com/products/watsonx-data",
"products_watsonx_assistant.html": "https://www.ibm.com/products/watsonx-assistant",
"products_watsonx_code_assistant.html": "https://www.ibm.com/products/watsonx-code-assistant",
"products_watsonx_orchestrate.html": "https://www.ibm.com/products/watsonx-orchestrate",
"products_watsonx_governance.html": "https://www.ibm.com/products/watsonx-governance",
"granite_code_models_open_source.html": "https://research.ibm.com/blog/granite-code-models-open-source",
"red_hat_enterprise_linux_ai.html": "https://www.redhat.com/en/about/press-releases/red-hat-delivers-accessible-open-source-generative-ai-innovation-red-hat-enterprise-linux-ai",
"model_choice.html": "https://www.ibm.com/blog/announcement/enterprise-grade-model-choices/",
"democratizing.html": "https://www.ibm.com/blog/announcement/democratizing-large-language-model-development-with-instructlab-support-in-watsonx-ai/",
"ibm_consulting_expands_ai.html": "https://newsroom.ibm.com/Blog-IBM-Consulting-Expands-Capabilities-to-Help-Enterprises-Scale-AI",
"ibm_data_product_hub.html": "https://www.ibm.com/products/data-product-hub",
"ibm_price_performance_data.html": "https://www.ibm.com/blog/announcement/delivering-superior-price-performance-and-enhanced-data-management-for-ai-with-ibm-watsonx-data/",
"ibm_bi_adoption.html": "https://www.ibm.com/blog/a-new-era-in-bi-overcoming-low-adoption-to-make-smart-decisions-accessible-for-all/",
"code_assistant_for_java.html": "https://www.ibm.com/blog/announcement/watsonx-code-assistant-java/",
"accelerating_gen_ai.html": "https://newsroom.ibm.com/Blog-How-IBM-Cloud-is-Accelerating-Business-Outcomes-with-Gen-AI",
"watsonx_open_source.html": "https://newsroom.ibm.com/2024-05-21-IBM-Unveils-Next-Chapter-of-watsonx-with-Open-Source,-Product-Ecosystem-Innovations-to-Drive-Enterprise-AI-at-Scale",
"ibm_concert.html": "https://www.ibm.com/products/concert",
"ibm_consulting_advantage_news.html": "https://newsroom.ibm.com/2024-01-17-IBM-Introduces-IBM-Consulting-Advantage,-an-AI-Services-Platform-and-Library-of-Assistants-to-Empower-Consultants",
"ibm_consulting_advantage_info.html": "https://www.ibm.com/consulting/info/ibm-consulting-advantage"
}
COLLECTION_NAME = "askibm_2024"
Next, let's load our documents using the LangChain WebBaseLoader for our list of URLs. Loaders load data from a source and return a list of Documents. A Document is an object with some page_content (str) and metadata (dict). We'll print the page_content of a sample document at the end to see how it was loaded.
documents = []
for url in list(URLS_DICTIONARY.values()):
    loader = WebBaseLoader(url)
    data = loader.load()
    documents += data
# show sample document
documents[0].page_content
Based on the sample document, it looks like there's a lot of white space and newline characters that we can get rid of. Let's clean that up. Note that the loader has already stored the source URL of each page in the document metadata.
for doc in documents:
    doc.page_content = " ".join(doc.page_content.split())  # collapse runs of whitespace and newlines into single spaces
Let's see how our sample document looks now after we cleaned it up:
documents[0].page_content
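Each Document also carries a metadata dict populated by the WebBaseLoader; it typically includes the source URL and, when available, the page title and description. You can inspect it for the sample document:
documents[0].metadata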
We need to split up our text into smaller, more manageable pieces known as "chunks". LangChain's RecursiveCharacterTextSplitter takes a large text and splits it based on a specified chunk size, using a predefined set of characters. In order, the default characters are:
- "\n\n" - two new line characters
- "\n" - one new line character
- " " - a space
- "" - an empty character
The process starts by attempting to split the text on the first character, "\n\n". If the resulting chunks are still too large, it moves to the next character, "\n", and tries splitting again. This continues with each character in the set until the chunks are smaller than the specified chunk size. Since we already removed all the "\n\n" and "\n" characters when we cleaned up the text, the RecursiveCharacterTextSplitter will begin with the " " (space) character.
We settled on a chunk size of 512 after experimenting with a chunk size of 1000. With the larger chunks, the model received too much context per question, which led to confused responses, so we switched to smaller chunks. Feel free to experiment with the chunk size further!
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
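As an optional sanity check, you can see how many chunks were produced and preview one of them:
# optional: number of chunks and a preview of the first chunk
print(len(docs))
print(docs[0].page_content[:300])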
Next, we choose an embedding model to generate a vector embedding for each chunk of our ibm.com dataset. For text data, popular open source embedding models include Word2Vec, GloVe, FastText or pretrained transformer-based models like BERT or RoBERTa. OpenAI embeddings can also be used through the OpenAI embeddings API endpoint, the langchain_openai package and an openai_api_key; however, there is a cost associated with this usage.
Because embedding models are large, generating vector embeddings often demands significant computational resources. By using WatsonxEmbeddings, we can greatly lower the cost of embedding our documents while preserving performance and accuracy. We'll use the IBM embeddings model, Slate, an encoder-only (RoBERTa-based) model, which, while not generative, is fast and effective for many NLP tasks.
embeddings = WatsonxEmbeddings(
    model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,
    url=credentials["url"],  # type: ignore
    apikey=credentials["apikey"],  # type: ignore
    project_id=project_id,
)
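As an optional check that the embedding model is reachable, you can embed a short test string and inspect the length of the returned vector:
# optional: embed a test string and check the vector dimensionality
sample_vector = embeddings.embed_query("What is watsonx.ai?")
print(len(sample_vector))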
Let's load our content into a local instance of a vector database, using Chroma and the collection name we set up earlier.
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings, collection_name=COLLECTION_NAME)
Step 5. Set up a retriever¶
We'll set up our vector store as a retriever. The retrieved information from the vector store serves as additional context or knowledge that can be used by a generative model.
retriever = vectorstore.as_retriever()
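Optionally, you can invoke the retriever directly to preview which chunks it returns for a sample query:
# optional: preview the chunks retrieved for a sample query
retrieved_docs = retriever.invoke("What is watsonx.ai?")
print(len(retrieved_docs))
print(retrieved_docs[0].page_content[:300])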
Step 6. Generate a response with a Generative Model¶
Finally, we’ll generate a response. The generative model (like GPT-4 or IBM Granite) uses the retrieved information to produce a more accurate and contextually relevant response to our questions.
First, we'll establish which LLM we're going to use to generate the response. For this tutorial, we'll use IBM's Granite-3.0-8B-Instruct model.
model_id = "ibm/granite-3-8b-instruct"
parameters = {
    GenParams.DECODING_METHOD: 'greedy',  # greedy decoding; the sampling settings below only take effect with 'sample'
    GenParams.TEMPERATURE: 2,
    GenParams.TOP_P: 0,
    GenParams.TOP_K: 100,
    GenParams.MIN_NEW_TOKENS: 10,  # generate at least 10 new tokens
    GenParams.MAX_NEW_TOKENS: 512,  # cap the length of the generated response
    GenParams.REPETITION_PENALTY: 1.2,  # discourage repetitive output
    GenParams.RETURN_OPTIONS: {'input_tokens': True, 'generated_tokens': True, 'token_logprobs': True, 'token_ranks': True}
}
Next, we instantiate the LLM.
llm = WatsonxLLM(
model_id=model_id,
url=credentials.get("url"), # type: ignore
apikey=credentials.get("apikey"), # type: ignore
project_id=project_id,
params=parameters
)
We'll set up a prompt template to ask multiple questions. The "context" will be derived from our retriever (our vector database) as the relevant documents, and the "question" will be derived from the user query.
template = """Generate a summary of the context that answers the question. Explain the answer in multiple steps if possible.
Answer style should match the context. Ideal Answer Length 2-3 sentences.\n\n{context}\nQuestion: {question}\nAnswer:
"""
prompt = ChatPromptTemplate.from_template(template)
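Optionally, you can preview what the assembled prompt looks like by formatting it with placeholder values:
# optional: preview the assembled prompt with placeholder values
print(prompt.format(context="<retrieved context goes here>", question="<user question goes here>"))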
Let's set up a helper function to format the docs accordingly:
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])
And now we can set up a chain with our context, our prompt and our LLM. The generative model processes the augmented context along with the user's question to produce an LLM-powered response.
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
And now we can ask multiple questions:
rag_chain.invoke("Tell me about the UFC announcement from November 14, 2024")
Let's ask about watsonx.data next.
rag_chain.invoke("What is watsonx.data?")
And finally, let's ask about watsonx.ai.
rag_chain.invoke("What does watsonx.ai do?")
And that's it! Feel free to ask even more questions!
Summary and next steps¶
In this tutorial, you created a simple LangChain RAG workflow in Python with watsonx. You fetched content from 25 web pages, mostly on ibm.com, to create a vector store as context for an LLM to answer questions about IBM offerings and technologies.
You can imagine extending this workflow into a chatbot that fields these kinds of questions.
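As a rough sketch of that idea (not part of the tutorial code), the rag_chain built above could be wrapped in a simple loop to make a minimal command-line chatbot:
# minimal sketch: a command-line chatbot built on the rag_chain defined above
while True:
    question = input("Ask a question about IBM products (or type 'quit'): ")
    if question.strip().lower() == "quit":
        break
    print(rag_chain.invoke(question))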
We encourage you to check out the LangChain documentation page for more information and tutorials on RAG.
Try watsonx for free¶
Build an AI strategy for your business on one collaborative AI and data platform called IBM watsonx, which brings together new generative AI capabilities, powered by foundation models, and traditional machine learning into a powerful platform spanning the AI lifecycle. With watsonx.ai, you can train, validate, tune, and deploy models with ease and build AI applications in a fraction of the time with a fraction of the data.
Try watsonx.ai, the next-generation studio for AI builders.
Next steps¶
Explore more articles and tutorials about watsonx on IBM Developer.