Build a self-RAG agent with IBM granite LLMs: A practical guide¶

Large language models (LLMs) have remarkable text generation and reasoning abilities but often produce factual inaccuracies or hallucinations due to their reliance on internal knowledge. Retrieval augmented generation (RAG) based solutions aim to resolve this by injecting external documents into the model’s context. However, traditional RAG approaches retrieve a fixed number of passages regardless of their necessity or quality, leading to redundancy, inefficiency, and inconsistent factual grounding.

The self-RAG framework provides a practical solution to this problem. It retrieves information on-demand by using special control tokens that dynamically decide when and how to perform retrieval during generation. Unlike agentic or multi-agent approaches that coordinate multiple models or components, self-RAG is a model-centric framework where a single model manages retrieval, generation, and critique internally. Its self-critique process is a structured step where the model evaluates both its own output and the quality of the retrieved information, allowing it to adapt its retrieval behavior through self-reflection tokens. It combines retrieval, generation and self-critique of its own generations with a single model trained end-to-end that allows more efficient, factual and controllable text generation. This method was originally introduced in the paper on Self-RAG: Learning to Retrieve, Generate, and Critique Through Self-Reflection (2024), which explores how fine-tuning LLMs for self-evaluation can improve factual consistency in natural language processing (NLP) tasks.

How self-RAG works¶

The workflow of self-RAG is orchestrated by special reflection tokens that the model generates alongside its text output, making the entire inference process dynamic and controllable. When additional information is needed, a single LLM takes on both the retriever and critic roles. A retriever component fetches relevant external passages, and the same LLM then uses reflection tokens to evaluate and refine its own generation during inference. This architecture represents a broader trend in artificial intelligence (AI) toward models capable of introspection and dynamic reasoning, bridging advances in prompt engineering and long-form generations.

1. On-demand retrieval¶

The LLM first generates a retrieval token to determine whether external factual information is necessary for the query. The model skips the remaining retrieval-based steps and continues with standard generation if it concludes that retrieval is not necessary. If the retrieval token is decoded as “yes”, a retriever is called to fetch a set of relevant passages from an external knowledge base. This step makes sure that retrieval occurs when its expected utility is high.

2. Passage retrieval and generation¶

If retrieval is required, the retriever fetches relevant passages from an external knowledge base. The LLM simultaneously processes the input and retrieved passages and generates text continuation for each passage.

3. Generate and reflect on retrieved passages¶

For each segment generated, the model concurrently generates special critique tokens that are embedded directly within the output sequence. These tokens are not separate evaluations, rather they appear as part of the generated sequence and help the model check its own work as it goes:

ISREL (Relevance): Assesses the usefulness of the retrieved passage.

ISSUP (Support/Factuality): Evaluates if the generated text segment has whole, partial, or no factual support from the source material.

ISUSE (Utility): Evaluates the created segment's overall quality, usefulness, and structure.

4. Inference¶

During inference, reflection tokens ae used to decide when to retrieve information or not. It enables the model to adjust to different tasks, such as retrieving less for creative activities and more for factual ones. When generating text, reflection tokens help the model in adhering to particular guidelines. They either provide clear boundaries or guide word choice, which makes the model's responses more flexible and appropriate for various contexts.

5. Training the self-RAG¶

During training, reflection tokens are inserted into the training data based on evaluations made by the critic model. This approach keeps self-rag training efficient by allowing the model to learn how to judge its own outputs and decide when it actually needs to look up information. Hence, the model becomes better at producing accurate, controlled, and high-quality responses.

In the experiment conducted in the research mentioned previously, self-RAG outperforms many standard retrieval-augmented and instruction-tuned baselines across various tasks, including open-domain question answering, reasoning, and fact verification. It improves factuality and citation accuracy by using self-reflection tokens and on-demand retrieval, matching or outperforming OpenAI's models.

In this tutorial, you'll learn how to build a robust self-reflective RAG agent by using IBM Granite® model on Watsonx and LangGraph. Similar frameworks and tools, such as ChatGPT, llama2, LlamaIndex or LangChain, also enable complex RAG flows. However, this tutorial focuses on using the powerful multi-modal models available through IBM. These models understand both text and images as well as its enterprise-grade design supports secure deployment, governance, and scalability. These features make Granite particularly well-suited for building reliable, production-ready RAG systems that can handle complex data and maintain high standards of trust and performance.

This tutorial demonstrates how to build a self-RAG agent designed to answer complex, multi-faceted queries over internal knowledge bases that include both text and visual data. This agent will analyze PDF documents including technical guidelines and survey data. It will guide you to implement the self-RAG algorithm, which:

Creates a multi-modal knowledge base: Uses a language model (granite-3-3-8b-instruct) and vision LLM (granite-vision-3.3-2B) to extract text and images from PDFs, generate descriptive captions, and create embeddings for both text and image data to enable semantic retrieval.

Generates and reflects: It creates an answer segment, adds reflection tokens (such ISREL, ISSUP, and ISUSE) and evaluates its own output quality and factual accuracy.

Executes self-correction: The LangGraph workflow extends the standard self-RAG approach by using a critique score derived from reflection tokens to guide its next steps. When the score is low, the agent requests stronger context and retrieves more relevant information before generating the next segment, helping produce a higher-quality final output.

Provides segmented answers: Provides thorough and traceable responses by generating complex answers in a sequence of factually validated chunks.

Prerequisites¶

You need an IBM Cloud® account to create a watsonx.ai® project. Ensure that you have access to both your watsonx API Key and Project ID.

Steps¶

Step 1. Set up your environment¶

While you can choose from several tools, this tutorial walks you through how to set up an IBM account by using a Jupyter Notebook.

Log in to watsonx.ai by using your IBM Cloud account.
Create a watsonx.ai project. You can get your project ID from within your project. Click the Manage tab.Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Create a Jupyter Notebook.

This step opens a notebook environment where you can copy the code from this tutorial. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite tutorials, check out the IBM Granite Community. This tutorial is also available on GitHub.

Note: You can run the multi-modal self-RAG tutorial entirely on a local CPU system. This is achievable by adapting the setup to use local resources instead of remote cloud services. You can initialize the Granite instruct model (the 3.2 2B version) directly from Hugging Face using the appropriate transformers steps. For data handling, save your PDF files directly on your local system and easily read them into your Jupyter Notebook environment using their local file path, completely bypassing the need for IBM Cloud Object Storage. To handle the complex reasoning and self-critique, the larger remote Granite 3.3-8B model can be replaced by a powerful open-source LLM hosted locally using a dedicated server setup. This setup requires installing specific local Python dependencies, such as langgraph, faiss-cpu, sentence-transformers, and pymupdf for the vector store, RAG logic, embeddings, and PDF parsing, respectively. Models can be configured for efficient CPU operation by explicitly setting the device to "cpu" and adjusting the floating-point data type to manage memory usage and prevent crashes common with large models on typical desktop hardware.

Step 2. Set up watsonx.ai runtime service and API key¶

Create a watsonx.ai Runtime service instance (choose the Lite plan, which is a free instance).
Generate an application programming interface (API) Key.
Associate the watsonx.ai Runtime service to the project that you created in watsonx.ai.

Step 3. Installation of the packages¶

To build and orchestrate this multi-modal self-reflective RAG agent, we require a comprehensive set of libraries. Install langgraph to define the core state machine that orchestrates the self-correction loop based on critique ratings. For integrating IBM Granite LLMs and embeddings from the Watsonx platform, install langchain-ibm and ibm-watsonx-ai. For quick retrieval, install faiss-cpu that offers indexing for the vector store. We use deep learning libraries like torch and the hugging face transformers library to load and run the granite-vision-3.3-2B model. To extract and process the text and images from our PDF documents, pillow and pymupdf are essential. Lastly, to access raw data from cloud object storage, ibm-cos-sdk is included.

In [ ]:

Copied!





# Install packages

!pip install -U "transformers>=4.50.0" "huggingface_hub>=0.26.2" \
  torch torchvision torchaudio \
  langgraph faiss-cpu Pillow requests tqdm pymupdf pydantic \
  langchain-ibm ibm-watsonx-ai ibm-cos-sdk sentence-transformers

print("Required packages installed.")
# Install packages

!pip install -U "transformers>=4.50.0" "huggingface_hub>=0.26.2" \
  torch torchvision torchaudio \
  langgraph faiss-cpu Pillow requests tqdm pymupdf pydantic \
  langchain-ibm ibm-watsonx-ai ibm-cos-sdk sentence-transformers

print("Required packages installed.")

Collecting transformers>=4.50.0
  Downloading transformers-4.57.1-py3-none-any.whl.metadata (43 kB)
Requirement already satisfied: huggingface_hub>=0.26.2 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (0.29.2)
Collecting huggingface_hub>=0.26.2
  Downloading huggingface_hub-1.1.2-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: torch in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (2.1.2)
Collecting torch
  Downloading torch-2.9.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (30 kB)
Requirement already satisfied: torchvision in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (0.16.2)
Collecting torchvision
  Downloading torchvision-0.24.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (5.9 kB)
Collecting torchaudio
  Downloading torchaudio-2.9.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (6.9 kB)
Requirement already satisfied: langgraph in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (0.2.40)
Collecting langgraph
  Downloading langgraph-1.0.3-py3-none-any.whl.metadata (7.8 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.12.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.1 kB)
Requirement already satisfied: Pillow in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (10.3.0)
Collecting Pillow
  Downloading pillow-12.0.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (8.8 kB)
Requirement already satisfied: requests in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (2.32.5)
Requirement already satisfied: tqdm in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (4.66.4)
Collecting tqdm
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting pymupdf
  Downloading pymupdf-1.26.6-cp310-abi3-manylinux_2_28_x86_64.whl.metadata (3.4 kB)
Requirement already satisfied: pydantic in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (2.11.9)
Collecting pydantic
  Downloading pydantic-2.12.4-py3-none-any.whl.metadata (89 kB)
Requirement already satisfied: langchain-ibm in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (0.3.18)
Collecting langchain-ibm
  Downloading langchain_ibm-1.0.0-py3-none-any.whl.metadata (5.0 kB)
Requirement already satisfied: ibm-watsonx-ai in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (1.4.2)
Collecting ibm-watsonx-ai
  Downloading ibm_watsonx_ai-1.4.5-py3-none-any.whl.metadata (3.3 kB)
Requirement already satisfied: ibm-cos-sdk in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (2.14.2)
Collecting ibm-cos-sdk
  Downloading ibm_cos_sdk-2.14.3.tar.gz (58 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: sentence-transformers in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (2.3.1)
Collecting sentence-transformers
  Downloading sentence_transformers-5.1.2-py3-none-any.whl.metadata (16 kB)
Requirement already satisfied: filelock in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from transformers>=4.50.0) (3.13.1)
Collecting huggingface_hub>=0.26.2
  Downloading huggingface_hub-0.36.0-py3-none-any.whl.metadata (14 kB)
Requirement already satisfied: numpy>=1.17 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from transformers>=4.50.0) (1.26.4)
Requirement already satisfied: packaging>=20.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from transformers>=4.50.0) (23.2)
Requirement already satisfied: pyyaml>=5.1 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from transformers>=4.50.0) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from transformers>=4.50.0) (2023.10.3)
Collecting tokenizers<=0.23.0,>=0.22.0 (from transformers>=4.50.0)
  Downloading tokenizers-0.22.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Collecting safetensors>=0.4.3 (from transformers>=4.50.0)
  Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Requirement already satisfied: fsspec>=2023.5.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from huggingface_hub>=0.26.2) (2023.10.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from huggingface_hub>=0.26.2) (4.15.0)
Collecting hf-xet<2.0.0,>=1.1.3 (from huggingface_hub>=0.26.2)
  Downloading hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting sympy>=1.13.3 (from torch)
  Downloading sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from torch) (3.2.1)
Requirement already satisfied: jinja2 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from torch) (3.1.6)
Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch)
  Downloading nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-cublas-cu12==12.8.4.1 (from torch)
  Downloading nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cufft-cu12==11.3.3.83 (from torch)
  Downloading nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-curand-cu12==10.3.9.90 (from torch)
  Downloading nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cusolver-cu12==11.7.3.90 (from torch)
  Downloading nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-cusparse-cu12==12.5.8.93 (from torch)
  Downloading nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-cusparselt-cu12==0.7.1 (from torch)
  Downloading nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (7.0 kB)
Collecting nvidia-nccl-cu12==2.27.5 (from torch)
  Downloading nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB)
Collecting nvidia-nvshmem-cu12==3.3.20 (from torch)
  Downloading nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.1 kB)
Collecting nvidia-nvtx-cu12==12.8.90 (from torch)
  Downloading nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvjitlink-cu12==12.8.93 (from torch)
  Downloading nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cufile-cu12==1.13.1.3 (from torch)
  Downloading nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting triton==3.5.0 (from torch)
  Downloading triton-3.5.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.7 kB)
Requirement already satisfied: langchain-core>=0.1 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from langgraph) (0.3.76)
Collecting langgraph-checkpoint<4.0.0,>=2.1.0 (from langgraph)
  Downloading langgraph_checkpoint-3.0.1-py3-none-any.whl.metadata (4.7 kB)
Collecting langgraph-prebuilt<1.1.0,>=1.0.2 (from langgraph)
  Downloading langgraph_prebuilt-1.0.2-py3-none-any.whl.metadata (5.0 kB)
Collecting langgraph-sdk<0.3.0,>=0.2.2 (from langgraph)
  Downloading langgraph_sdk-0.2.9-py3-none-any.whl.metadata (1.5 kB)
Requirement already satisfied: xxhash>=3.5.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from langgraph) (3.5.0)
Collecting ormsgpack>=1.12.0 (from langgraph-checkpoint<4.0.0,>=2.1.0->langgraph)
  Downloading ormsgpack-1.12.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.2 kB)
Collecting langchain-core>=0.1 (from langgraph)
  Downloading langchain_core-1.0.4-py3-none-any.whl.metadata (3.5 kB)
Requirement already satisfied: httpx>=0.25.2 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from langgraph-sdk<0.3.0,>=0.2.2->langgraph) (0.28.1)
Requirement already satisfied: orjson>=3.10.1 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from langgraph-sdk<0.3.0,>=0.2.2->langgraph) (3.10.2)
Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from requests) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from requests) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from requests) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from requests) (2025.10.5)
Requirement already satisfied: annotated-types>=0.6.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from pydantic) (0.6.0)
Collecting pydantic-core==2.41.5 (from pydantic)
  Downloading pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.3 kB)
Collecting typing-inspection>=0.4.2 (from pydantic)
  Downloading typing_inspection-0.4.2-py3-none-any.whl.metadata (2.6 kB)
Requirement already satisfied: pandas<2.3.0,>=0.24.2 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from ibm-watsonx-ai) (2.1.4)
Requirement already satisfied: lomond in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from ibm-watsonx-ai) (0.3.3)
Requirement already satisfied: tabulate in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from ibm-watsonx-ai) (0.8.10)
Requirement already satisfied: cachetools in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from ibm-watsonx-ai) (5.3.3)
Collecting ibm-cos-sdk-core==2.14.3 (from ibm-cos-sdk)
  Downloading ibm_cos_sdk_core-2.14.3.tar.gz (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 78.0 MB/s  0:00:00
  Preparing metadata (setup.py) ... done
Collecting ibm-cos-sdk-s3transfer==2.14.3 (from ibm-cos-sdk)
  Downloading ibm_cos_sdk_s3transfer-2.14.3.tar.gz (139 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: jmespath<=1.0.1,>=0.10.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from ibm-cos-sdk) (1.0.1)
Requirement already satisfied: python-dateutil<3.0.0,>=2.9.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from ibm-cos-sdk-core==2.14.3->ibm-cos-sdk) (2.9.0.post0)
Requirement already satisfied: anyio in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from httpx>=0.25.2->langgraph-sdk<0.3.0,>=0.2.2->langgraph) (4.7.0)
Requirement already satisfied: httpcore==1.* in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from httpx>=0.25.2->langgraph-sdk<0.3.0,>=0.2.2->langgraph) (1.0.9)
Requirement already satisfied: h11>=0.16 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from httpcore==1.*->httpx>=0.25.2->langgraph-sdk<0.3.0,>=0.2.2->langgraph) (0.16.0)
Requirement already satisfied: jsonpatch<2.0.0,>=1.33.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from langchain-core>=0.1->langgraph) (1.33)
Requirement already satisfied: langsmith<1.0.0,>=0.3.45 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from langchain-core>=0.1->langgraph) (0.4.31)
Requirement already satisfied: tenacity!=8.4.0,<10.0.0,>=8.1.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from langchain-core>=0.1->langgraph) (8.2.2)
Requirement already satisfied: jsonpointer>=1.9 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from jsonpatch<2.0.0,>=1.33.0->langchain-core>=0.1->langgraph) (3.0.0)
Requirement already satisfied: requests-toolbelt>=1.0.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from langsmith<1.0.0,>=0.3.45->langchain-core>=0.1->langgraph) (1.0.0)
Requirement already satisfied: zstandard>=0.23.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from langsmith<1.0.0,>=0.3.45->langchain-core>=0.1->langgraph) (0.25.0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from pandas<2.3.0,>=0.24.2->ibm-watsonx-ai) (2024.1)
Requirement already satisfied: tzdata>=2022.1 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from pandas<2.3.0,>=0.24.2->ibm-watsonx-ai) (2023.3)
Requirement already satisfied: six>=1.5 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from python-dateutil<3.0.0,>=2.9.0->ibm-cos-sdk-core==2.14.3->ibm-cos-sdk) (1.16.0)
Requirement already satisfied: scikit-learn in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from sentence-transformers) (1.3.0)
Requirement already satisfied: scipy in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from sentence-transformers) (1.11.4)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from sympy>=1.13.3->torch) (1.3.0)
Requirement already satisfied: sniffio>=1.1 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from anyio->httpx>=0.25.2->langgraph-sdk<0.3.0,>=0.2.2->langgraph) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from jinja2->torch) (2.1.3)
Requirement already satisfied: joblib>=1.1.1 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from scikit-learn->sentence-transformers) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/envs/Python-RT24.1-CUDA/lib/python3.11/site-packages (from scikit-learn->sentence-transformers) (2.2.0)
Downloading transformers-4.57.1-py3-none-any.whl (12.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.0/12.0 MB 123.1 MB/s  0:00:00
Downloading huggingface_hub-0.36.0-py3-none-any.whl (566 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 566.1/566.1 kB 78.7 MB/s  0:00:00
Downloading hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 133.2 MB/s  0:00:00
Downloading tokenizers-0.22.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 189.5 MB/s  0:00:00
Downloading torch-2.9.0-cp311-cp311-manylinux_2_28_x86_64.whl (899.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 899.8/899.8 MB 117.3 MB/s  0:00:040:00:0100:01
Downloading nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl (594.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 594.3/594.3 MB 120.2 MB/s  0:00:040:00:0100:01
Downloading nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 152.0 MB/s  0:00:00
Downloading nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.0/88.0 MB 166.3 MB/s  0:00:00 eta 0:00:01
Downloading nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 954.8/954.8 kB 86.8 MB/s  0:00:00
Downloading nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl (706.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 706.8/706.8 MB 124.3 MB/s  0:00:040:00:0100:01
Downloading nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.1/193.1 MB 170.3 MB/s  0:00:010:00:0100:01
Downloading nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 107.6 MB/s  0:00:00
Downloading nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl (63.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.6/63.6 MB 169.4 MB/s  0:00:00 eta 0:00:01
Downloading nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl (267.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 267.5/267.5 MB 147.8 MB/s  0:00:010:00:0100:01
Downloading nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (288.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.2/288.2 MB 139.1 MB/s  0:00:020:00:0100:01
Downloading nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl (287.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 287.2/287.2 MB 156.0 MB/s  0:00:010:00:0100:01
Downloading nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 322.3/322.3 MB 128.7 MB/s  0:00:020:00:0100:01
Downloading nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.3/39.3 MB 168.5 MB/s  0:00:00m0:00:01
Downloading nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (124.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.7/124.7 MB 148.7 MB/s  0:00:000:00:0100:01
Downloading nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB)
Downloading triton-3.5.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (170.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 170.4/170.4 MB 144.8 MB/s  0:00:010:00:0100:01
Downloading torchvision-0.24.0-cp311-cp311-manylinux_2_28_x86_64.whl (8.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.0/8.0 MB 178.2 MB/s  0:00:00
Downloading torchaudio-2.9.0-cp311-cp311-manylinux_2_28_x86_64.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 153.1 MB/s  0:00:00
Downloading langgraph-1.0.3-py3-none-any.whl (156 kB)
Downloading langgraph_checkpoint-3.0.1-py3-none-any.whl (46 kB)
Downloading langgraph_prebuilt-1.0.2-py3-none-any.whl (34 kB)
Downloading langgraph_sdk-0.2.9-py3-none-any.whl (56 kB)
Downloading faiss_cpu-1.12.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (31.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 31.4/31.4 MB 211.1 MB/s  0:00:00
Downloading pillow-12.0.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (7.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 145.8 MB/s  0:00:00
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Downloading pymupdf-1.26.6-cp310-abi3-manylinux_2_28_x86_64.whl (24.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.1/24.1 MB 137.5 MB/s  0:00:00
Downloading pydantic-2.12.4-py3-none-any.whl (463 kB)
Downloading pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 104.4 MB/s  0:00:00
Downloading langchain_ibm-1.0.0-py3-none-any.whl (49 kB)
Downloading ibm_watsonx_ai-1.4.5-py3-none-any.whl (1.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 118.6 MB/s  0:00:00
Downloading langchain_core-1.0.4-py3-none-any.whl (471 kB)
Downloading sentence_transformers-5.1.2-py3-none-any.whl (488 kB)
Downloading ormsgpack-1.12.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (207 kB)
Downloading safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB)
Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 156.7 MB/s  0:00:00
Downloading typing_inspection-0.4.2-py3-none-any.whl (14 kB)
Building wheels for collected packages: ibm-cos-sdk, ibm-cos-sdk-core, ibm-cos-sdk-s3transfer
  DEPRECATION: Building 'ibm-cos-sdk' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'ibm-cos-sdk'. Discussion can be found at https://github.com/pypa/pip/issues/6334
  Building wheel for ibm-cos-sdk (setup.py) ... done
  Created wheel for ibm-cos-sdk: filename=ibm_cos_sdk-2.14.3-py3-none-any.whl size=77324 sha256=c516e4776629432b68631e150d6c206052fccfe15c944526609c05298400c6e1
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/56/fa/85/9a1004ed234750540a7a90f34000bc1208e723f3613eaafc2b
  DEPRECATION: Building 'ibm-cos-sdk-core' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'ibm-cos-sdk-core'. Discussion can be found at https://github.com/pypa/pip/issues/6334
  Building wheel for ibm-cos-sdk-core (setup.py) ... done
  Created wheel for ibm-cos-sdk-core: filename=ibm_cos_sdk_core-2.14.3-py3-none-any.whl size=662207 sha256=473d3a41154f32c13b49e4b7496f0da69f9f48cdd0b48674fb9bf6c12e453f32
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/96/25/ac/fa87dd4aeda7eba0f3e7e891c52f9c92c8b2ea49963119d9df
  DEPRECATION: Building 'ibm-cos-sdk-s3transfer' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'ibm-cos-sdk-s3transfer'. Discussion can be found at https://github.com/pypa/pip/issues/6334
  Building wheel for ibm-cos-sdk-s3transfer (setup.py) ... done
  Created wheel for ibm-cos-sdk-s3transfer: filename=ibm_cos_sdk_s3transfer-2.14.3-py3-none-any.whl size=90309 sha256=0f74440d13ca8609cb935d81a70073db01b25617caa67df208aa9c34574ade9f
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/c4/03/6f/e85bbf1471a809e7892239f69458cef2cd69ee38fb543339c8
Successfully built ibm-cos-sdk ibm-cos-sdk-core ibm-cos-sdk-s3transfer
Installing collected packages: nvidia-cusparselt-cu12, typing-inspection, triton, tqdm, sympy, safetensors, pymupdf, pydantic-core, Pillow, ormsgpack, nvidia-nvtx-cu12, nvidia-nvshmem-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, hf-xet, faiss-cpu, pydantic, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, ibm-cos-sdk-core, huggingface_hub, tokenizers, nvidia-cusolver-cu12, langgraph-sdk, ibm-cos-sdk-s3transfer, transformers, torch, langchain-core, ibm-cos-sdk, torchvision, torchaudio, sentence-transformers, langgraph-checkpoint, ibm-watsonx-ai, langgraph-prebuilt, langchain-ibm, langgraph
  Attempting uninstall: typing-inspection━━━━━━━  0/44 [nvidia-cusparselt-cu12]
    Found existing installation: typing-inspection 0.4.0/44 [nvidia-cusparselt-cu12]
    Uninstalling typing-inspection-0.4.0:━━━  0/44 [nvidia-cusparselt-cu12]
      Successfully uninstalled typing-inspection-0.4.0 0/44 [nvidia-cusparselt-cu12]
  Attempting uninstall: tqdm━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  2/44 [triton]
    Found existing installation: tqdm 4.66.4━━━━━━━━━━━━━━━━━━  2/44 [triton]
    Uninstalling tqdm-4.66.4:━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/44 [tqdm]
      Successfully uninstalled tqdm-4.66.4━━━━━━━━━━━━━━━━━━━━  3/44 [tqdm]
  Attempting uninstall: sympy━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/44 [tqdm]
    Found existing installation: sympy 1.12━━━━━━━━━━━━━━━━━━━  3/44 [tqdm]
    Uninstalling sympy-1.12:0m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/44 [sympy]
      Successfully uninstalled sympy-1.12━━━━━━━━━━━━━━━━━━━━━━━━━  4/44 [sympy]
  Attempting uninstall: safetensors━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/44 [sympy]
    Found existing installation: safetensors 0.4.2━━━━━━━━━━━━  4/44 [sympy]
    Uninstalling safetensors-0.4.2:━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/44 [sympy]
      Successfully uninstalled safetensors-0.4.2━━━━━━━━━━━━━━  4/44 [sympy]
  Attempting uninstall: pydantic-core━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  6/44 [pymupdf]
    Found existing installation: pydantic_core 2.33.2━━━━━━━━━  6/44 [pymupdf]
    Uninstalling pydantic_core-2.33.2:━━━━━━━━━━━━━━━━━━━━━━━━  6/44 [pymupdf]
      Successfully uninstalled pydantic_core-2.33.2━━━━━━━━━━━  6/44 [pymupdf]
  Attempting uninstall: Pillow━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  6/44 [pymupdf]
    Found existing installation: pillow 10.3.0━━━━━━━━━━━━━━━━  6/44 [pymupdf]
    Uninstalling pillow-10.3.0:━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  6/44 [pymupdf]
      Successfully uninstalled pillow-10.3.0━━━━━━━━━━━━━━━━━━  6/44 [pymupdf]
  Attempting uninstall: pydantic[90m╺━━━━━━━━━━━━━━━━━━━━ 21/44 [faiss-cpu]las-cu12]u12]2]
    Found existing installation: pydantic 2.11.9━━━━━━━━━━━━━━ 21/44 [faiss-cpu]
    Uninstalling pydantic-2.11.9:━━━━━━━━━━━━━━━━━━━━ 21/44 [faiss-cpu]
      Successfully uninstalled pydantic-2.11.9m━━━━━━━━━━━━━━━━━━━ 22/44 [pydantic]
  Attempting uninstall: ibm-cos-sdk-core━━━━━━━━━━━━━━━━━ 25/44 [nvidia-cudnn-cu12]12]
    Found existing installation: ibm-cos-sdk-core 2.14.2━━━━━━ 25/44 [nvidia-cudnn-cu12]
    Uninstalling ibm-cos-sdk-core-2.14.2:[90m━━━━━━━━━━━━━━━━━ 25/44 [nvidia-cudnn-cu12]
      Successfully uninstalled ibm-cos-sdk-core-2.14.2━━━━━━━━━━━━ 26/44 [ibm-cos-sdk-core]
  Attempting uninstall: huggingface_hubm╸━━━━━━━━━━━━━━━━ 26/44 [ibm-cos-sdk-core]
    Found existing installation: huggingface_hub 0.29.2━━━━━━━ 26/44 [ibm-cos-sdk-core]
    Uninstalling huggingface_hub-0.29.2:m━━━━━━━━━━━━━━━━ 26/44 [ibm-cos-sdk-core]
      Successfully uninstalled huggingface_hub-0.29.2━━━━━━━━━ 26/44 [ibm-cos-sdk-core]
  Attempting uninstall: tokenizers0m╸━━━━━━━━━━━━━━━ 27/44 [huggingface_hub]
    Found existing installation: tokenizers 0.15.1━━━━━━━━━━━━ 27/44 [huggingface_hub]
    Uninstalling tokenizers-0.15.1:1m╸━━━━━━━━━━━━━━━ 27/44 [huggingface_hub]
      Successfully uninstalled tokenizers-0.15.1[90m━━━━━━━━━━━━━━ 28/44 [tokenizers]]
  Attempting uninstall: langgraph-sdkm╺━━━━━━━━━━━━━ 29/44 [nvidia-cusolver-cu12]
    Found existing installation: langgraph-sdk 0.1.51━━━━━━━━━ 29/44 [nvidia-cusolver-cu12]
    Uninstalling langgraph-sdk-0.1.51:m╺━━━━━━━━━━━━━ 29/44 [nvidia-cusolver-cu12]
      Successfully uninstalled langgraph-sdk-0.1.51━━━━━━━━━━━ 29/44 [nvidia-cusolver-cu12]
  Attempting uninstall: ibm-cos-sdk-s3transfer90m━━━━━━━━━━━━━ 29/44 [nvidia-cusolver-cu12]
    Found existing installation: ibm-cos-sdk-s3transfer 2.14.2 29/44 [nvidia-cusolver-cu12]
    Uninstalling ibm-cos-sdk-s3transfer-2.14.2:0m━━━━━━━━━━━━━ 29/44 [nvidia-cusolver-cu12]
      Successfully uninstalled ibm-cos-sdk-s3transfer-2.14.2━━ 29/44 [nvidia-cusolver-cu12]
  Attempting uninstall: transformers╺━━━━━━━━━━━ 31/44 [ibm-cos-sdk-s3transfer]
    Found existing installation: transformers 4.37.2━━━━━━━━━━ 31/44 [ibm-cos-sdk-s3transfer]
    Uninstalling transformers-4.37.2:[90m╺━━━━━━━━━━━ 31/44 [ibm-cos-sdk-s3transfer]
      Successfully uninstalled transformers-4.37.2m━━━━━━━━━━ 32/44 [transformers]nsfer]
  Attempting uninstall: torch━━━━━━━━╺━━━━━━━━━━ 32/44 [transformers]
    Found existing installation: torch 2.1.2[0m━━━━━━━━━━ 32/44 [transformers]
    Uninstalling torch-2.1.2:━━━━╺━━━━━━━━━━ 32/44 [transformers]
      Successfully uninstalled torch-2.1.2╺━━━━━━━━━ 33/44 [torch]rs]
  Attempting uninstall: langchain-core╺━━━━━━━━━ 33/44 [torch]
    Found existing installation: langchain-core 0.3.76━━━━━━━━ 33/44 [torch]
    Uninstalling langchain-core-0.3.76:[90m╺━━━━━━━━━ 33/44 [torch]
      Successfully uninstalled langchain-core-0.3.76m━━━━━━━━━ 33/44 [torch]
  Attempting uninstall: ibm-cos-sdk━━━╸━━━━━━━━━ 34/44 [langchain-core]
    Found existing installation: ibm-cos-sdk 2.14.20m━━━━━━━━━ 34/44 [langchain-core]
    Uninstalling ibm-cos-sdk-2.14.2:0m╸━━━━━━━━━ 34/44 [langchain-core]
      Successfully uninstalled ibm-cos-sdk-2.14.2[90m━━━━━━━━━ 34/44 [langchain-core]
  Attempting uninstall: torchvision[0m╸━━━━━━━━━ 34/44 [langchain-core]
    Found existing installation: torchvision 0.16.20m━━━━━━━━━ 34/44 [langchain-core]
    Uninstalling torchvision-0.16.2:0m╸━━━━━━━━━ 34/44 [langchain-core]
      Successfully uninstalled torchvision-0.16.2╸━━━━━━━ 36/44 [torchvision]
  Attempting uninstall: sentence-transformers╸━━━━━━ 37/44 [torchaudio]]
    Found existing installation: sentence-transformers 2.3.1━━ 37/44 [torchaudio]
    Uninstalling sentence-transformers-2.3.1:m╸━━━━━━ 37/44 [torchaudio]
      Successfully uninstalled sentence-transformers-2.3.1━━━━ 37/44 [torchaudio]
  Attempting uninstall: langgraph-checkpoint0m╸━━━━━ 38/44 [sentence-transformers]
    Found existing installation: langgraph-checkpoint 2.0.990m━━━━ 39/44 [langgraph-checkpoint]
    Uninstalling langgraph-checkpoint-2.0.9:[90m╺━━━━ 39/44 [langgraph-checkpoint]
      Successfully uninstalled langgraph-checkpoint-2.0.9m━━━━ 39/44 [langgraph-checkpoint]
  Attempting uninstall: ibm-watsonx-ai━╺━━━━ 39/44 [langgraph-checkpoint]
    Found existing installation: ibm_watsonx_ai 1.4.2━━━━ 39/44 [langgraph-checkpoint]
    Uninstalling ibm_watsonx_ai-1.4.2:━╺━━━━ 39/44 [langgraph-checkpoint]
      Successfully uninstalled ibm_watsonx_ai-1.4.20m╺━━━ 40/44 [ibm-watsonx-ai]t]
  Attempting uninstall: langchain-ibm━━━━━━━╺━━━ 40/44 [ibm-watsonx-ai]
    Found existing installation: langchain-ibm 0.3.18m━━━ 40/44 [ibm-watsonx-ai]
    Uninstalling langchain-ibm-0.3.18:━━╺━━━ 40/44 [ibm-watsonx-ai]
      Successfully uninstalled langchain-ibm-0.3.18[90m╺━ 42/44 [langchain-ibm]
  Attempting uninstall: langgraph━━━━━━━━━╺━ 42/44 [langchain-ibm]
    Found existing installation: langgraph 0.2.400m╺━ 42/44 [langchain-ibm]
    Uninstalling langgraph-0.2.40:━━━━━━━━╺━ 42/44 [langchain-ibm]
      Successfully uninstalled langgraph-0.2.40[90m╺━ 42/44 [langchain-ibm]
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44/44 [langgraph]anggraph]hain-ibm]
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
caikit-nlp 0.4.9 requires sentence-transformers<2.4.0,>=2.3.1, but you have sentence-transformers 5.1.2 which is incompatible.
caikit-nlp 0.4.9 requires torch<2.3.0,>=2.0.1, but you have torch 2.9.0 which is incompatible.
langchain-chroma 0.1.4 requires langchain-core<0.4,>=0.1.40; python_version >= "3.9", but you have langchain-core 1.0.4 which is incompatible.
langchain-elasticsearch 0.3.0 requires langchain-core<0.4.0,>=0.3.0, but you have langchain-core 1.0.4 which is incompatible.
langchain-milvus 0.1.8 requires langchain-core<0.4,>=0.2.38, but you have langchain-core 1.0.4 which is incompatible.
langchain-mcp-adapters 0.1.7 requires langchain-core<0.4,>=0.3.36, but you have langchain-core 1.0.4 which is incompatible.
langchain 0.3.27 requires langchain-core<1.0.0,>=0.3.72, but you have langchain-core 1.0.4 which is incompatible.
Successfully installed Pillow-12.0.0 faiss-cpu-1.12.0 hf-xet-1.2.0 huggingface_hub-0.36.0 ibm-cos-sdk-2.14.3 ibm-cos-sdk-core-2.14.3 ibm-cos-sdk-s3transfer-2.14.3 ibm-watsonx-ai-1.4.5 langchain-core-1.0.4 langchain-ibm-1.0.0 langgraph-1.0.3 langgraph-checkpoint-3.0.1 langgraph-prebuilt-1.0.2 langgraph-sdk-0.2.9 nvidia-cublas-cu12-12.8.4.1 nvidia-cuda-cupti-cu12-12.8.90 nvidia-cuda-nvrtc-cu12-12.8.93 nvidia-cuda-runtime-cu12-12.8.90 nvidia-cudnn-cu12-9.10.2.21 nvidia-cufft-cu12-11.3.3.83 nvidia-cufile-cu12-1.13.1.3 nvidia-curand-cu12-10.3.9.90 nvidia-cusolver-cu12-11.7.3.90 nvidia-cusparse-cu12-12.5.8.93 nvidia-cusparselt-cu12-0.7.1 nvidia-nccl-cu12-2.27.5 nvidia-nvjitlink-cu12-12.8.93 nvidia-nvshmem-cu12-3.3.20 nvidia-nvtx-cu12-12.8.90 ormsgpack-1.12.0 pydantic-2.12.4 pydantic-core-2.41.5 pymupdf-1.26.6 safetensors-0.6.2 sentence-transformers-5.1.2 sympy-1.14.0 tokenizers-0.22.1 torch-2.9.0 torchaudio-2.9.0 torchvision-0.24.0 tqdm-4.67.1 transformers-4.57.1 triton-3.5.0 typing-inspection-0.4.2
Required packages installed.

Note: No GPU is required, but execution can be slower on CPU-based systems.

Step 4. Import required libraries¶

Next, import all the necessary modules to set up the fundamental tools for managing the multi-modal components, processing documents, coordinating the RAG workflow, and connecting to IBM watsonx.

In [ ]:

Copied!





# Core Libraries Import

import os
import getpass
import torch
import re
from pathlib import Path
from typing import List, Dict, Any, TypedDict
import gc

# LangGraph / LangChain Core
from langgraph.graph import StateGraph, END, START
from langchain_core.documents import Document
from langchain_ibm import WatsonxLLM, WatsonxEmbeddings
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames

# Vector store + text utilities
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# General utils
from tqdm import tqdm
from PIL import Image
import fitz # PyMuPDF for PDFs
import numpy as np
import io # Ensure io is imported

print("Core libraries imported successfully.")
# Set up device for Vision Model
HF_DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"PyTorch device: {HF_DEVICE}")
# Core Libraries Import

import os
import getpass
import torch
import re
from pathlib import Path
from typing import List, Dict, Any, TypedDict
import gc

# LangGraph / LangChain Core
from langgraph.graph import StateGraph, END, START
from langchain_core.documents import Document
from langchain_ibm import WatsonxLLM, WatsonxEmbeddings
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames

# Vector store + text utilities
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# General utils
from tqdm import tqdm
from PIL import Image
import fitz # PyMuPDF for PDFs
import numpy as np
import io # Ensure io is imported

print("Core libraries imported successfully.")
# Set up device for Vision Model
HF_DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"PyTorch device: {HF_DEVICE}")

Core libraries imported successfully.
PyTorch device: cuda

Multi-modal context: This tutorial uses a vision model and libraries like fitz to process both text and visual data into a unified context. This surpasses simple text-based RAG by enabling the agent to retrieve richer information and provide highly accurate answers derived from complex documents.

Self-correction loop: The system utilizes LangGraph (StateGraph) to build a self-reflective RAG agent. This allows the LLM to critique its own output for relevance and accuracy, and then automatically initiate a correction cycle by querying the vector store or refining the prompt, minimizing hallucinations.

Production-ready integration: The tutorial demonstrates a high-performance stack by integrating enterprise LLMs (such as Granite) accessed via an external API (or Hugging Face, depending on the setup) with efficient vector storage (FAISS) and streamlined RAG logic, proving its viability for real-world deployment.

Step 5. Load watsonx credentials¶

This step prepares your environment to securely connect to the IBM watsonx platform, allowing you to utilize the hosted granite LLMs and embeddings.

In [ ]:

Copied!





# Load Watsonx Credentials

WML_URL = "https://us-south.ml.cloud.ibm.com"

# Securely input Watsonx credentials
WML_API_KEY = getpass.getpass("Enter Watsonx API Key: ")
PROJECT_ID = input("Enter Watsonx Project ID: ")

# Set environment variables for langchain-ibm
os.environ["WATSONX_APIKEY"] = WML_API_KEY
os.environ["WATSONX_PROJECT_ID"] = PROJECT_ID

print(" Watsonx credentials loaded.")
# Load Watsonx Credentials

WML_URL = "https://us-south.ml.cloud.ibm.com"

# Securely input Watsonx credentials
WML_API_KEY = getpass.getpass("Enter Watsonx API Key: ")
PROJECT_ID = input("Enter Watsonx Project ID: ")

# Set environment variables for langchain-ibm
os.environ["WATSONX_APIKEY"] = WML_API_KEY
os.environ["WATSONX_PROJECT_ID"] = PROJECT_ID

print(" Watsonx credentials loaded.")

Enter Watsonx API Key:  ········
Enter Watsonx Project ID:  4d09cb34-ffa1-4097-be01-79e1ac1f5173

 Watsonx credentials loaded.

Step 6. Initialize models¶

This critical step configures the three distinct models required for our multi-modal self-RAG agent.

In [ ]:

Copied!





# Initialize Models

from transformers import AutoProcessor, AutoModelForVision2Seq
from huggingface_hub import hf_hub_download

# LLM: Granite-3-3-8B-Instruct (Generator & Critic)
qa_llm = WatsonxLLM(
    model_id="ibm/granite-3-3-8b-instruct",
    url=WML_URL,
    apikey=WML_API_KEY,
    project_id=PROJECT_ID,
    params={
        GenTextParamsMetaNames.MAX_NEW_TOKENS: 512, 
        GenTextParamsMetaNames.TEMPERATURE: 0.1,   
        GenTextParamsMetaNames.TOP_P: 0.9,
        GenTextParamsMetaNames.REPETITION_PENALTY: 1.05,
    },
)
print("Granite-3-3-8B-Instruct initialized for reasoning, QA, and self-critique.")

# Embedding Model: Granite-embedding-278m-multilingual
embeddings_model = WatsonxEmbeddings(
    model_id="ibm/granite-embedding-278m-multilingual",
    url=WML_URL,
    apikey=WML_API_KEY,
    project_id=PROJECT_ID,
)
print("Granite-embedding-278m-multilingual initialized for retrieval.")

# Vision Model: Granite-Vision-3.3-2B
try:
    print("Loading Granite Vision model in Bfloat16 for memory efficiency...")
    vision_model_id = "ibm-granite/granite-vision-3.3-2b"

    hf_processor = AutoProcessor.from_pretrained(vision_model_id)
    
    
    hf_vision_model = AutoModelForVision2Seq.from_pretrained(
        vision_model_id, 
        torch_dtype=torch.bfloat16 # <--- Saves ~50% VRAM on model weights
    ).to(HF_DEVICE)
    hf_vision_model.eval()

    print("Granite-Vision-3.3-2B initialized successfully with Bfloat16.")
except Exception as e:
    print(f"Vision model load failed: {e}")

print(f"Device available: {HF_DEVICE}")
print("All Watsonx + Vision models ready.")
# Initialize Models

from transformers import AutoProcessor, AutoModelForVision2Seq
from huggingface_hub import hf_hub_download

# LLM: Granite-3-3-8B-Instruct (Generator & Critic)
qa_llm = WatsonxLLM(
    model_id="ibm/granite-3-3-8b-instruct",
    url=WML_URL,
    apikey=WML_API_KEY,
    project_id=PROJECT_ID,
    params={
        GenTextParamsMetaNames.MAX_NEW_TOKENS: 512, 
        GenTextParamsMetaNames.TEMPERATURE: 0.1,   
        GenTextParamsMetaNames.TOP_P: 0.9,
        GenTextParamsMetaNames.REPETITION_PENALTY: 1.05,
    },
)
print("Granite-3-3-8B-Instruct initialized for reasoning, QA, and self-critique.")

# Embedding Model: Granite-embedding-278m-multilingual
embeddings_model = WatsonxEmbeddings(
    model_id="ibm/granite-embedding-278m-multilingual",
    url=WML_URL,
    apikey=WML_API_KEY,
    project_id=PROJECT_ID,
)
print("Granite-embedding-278m-multilingual initialized for retrieval.")

# Vision Model: Granite-Vision-3.3-2B
try:
    print("Loading Granite Vision model in Bfloat16 for memory efficiency...")
    vision_model_id = "ibm-granite/granite-vision-3.3-2b"

    hf_processor = AutoProcessor.from_pretrained(vision_model_id)
    
    
    hf_vision_model = AutoModelForVision2Seq.from_pretrained(
        vision_model_id, 
        torch_dtype=torch.bfloat16 # <--- Saves ~50% VRAM on model weights
    ).to(HF_DEVICE)
    hf_vision_model.eval()

    print("Granite-Vision-3.3-2B initialized successfully with Bfloat16.")
except Exception as e:
    print(f"Vision model load failed: {e}")

print(f"Device available: {HF_DEVICE}")
print("All Watsonx + Vision models ready.")

2025-11-11 06:03:48.844314: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-11-11 06:03:48.844348: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-11-11 06:03:48.844355: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

Granite-3-3-8B-Instruct initialized for reasoning, QA, and self-critique.
Granite-embedding-278m-multilingual initialized for retrieval.
Loading Granite Vision model in Bfloat16 for memory efficiency...

processor_config.json:   0%|          | 0.00/174 [00:00<?, ?B/s]

chat_template.json: 0.00B [00:00, ?B/s]

preprocessor_config.json: 0.00B [00:00, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/107 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/701 [00:00<?, ?B/s]

config.json: 0.00B [00:00, ?B/s]

`torch_dtype` is deprecated! Use `dtype` instead!

model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/952M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/132 [00:00<?, ?B/s]

Granite-Vision-3.3-2B initialized successfully with Bfloat16.
Device available: cuda
All Watsonx + Vision models ready.

This configuration will:

Initialize the granite-3-3-8B-instruct model to function as both the primary generator and the self-critic by producing the reflection tokens (ISREL, ISSUP, and ISUSE). For the self-critique loop, the parameters are optimized for factual, deterministic, and stable answers.

Initialize the granite-embedding-278m-multilingual model. This model generates the textual embeddings essential for efficient semantic search and retrieval in the FAISS vector store.

Load the granite-vision-3.3-2B model locally using the transformers library. This model creates text captions for images extracted from PDF documents.

Step 7. PDF data retrieval from cloud object storage¶

This step focuses on securely retrieving the source dataset from IBM cloud object storage into the memory of your execution environment. This is necessary before any text splitting or multi-modal analysis can begin. We have uploaded two PDF files to the database for this tutorial.

In [ ]:

Copied!





# PDF Text Extraction
import io
import os, types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.

cos_client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='your_api_key_id',
    ibm_auth_endpoint="https://iam.cloud.ibm.com/identity/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.direct.us-south.cloud-object-storage.appdomain.cloud')

bucket = 'bucket_key'
pdf_keys = [
    'ICH_E6(R3)_Guideline.pdf',
    'inspection_survey.pdf'
]

def read_cos_pdf(bucket, key):
    """Read a PDF from IBM COS into bytes (streamed in chunks)."""
    print(f" Downloading {key} ...")
    response = cos_client.get_object(Bucket=bucket, Key=key)
    body = response['Body']
    data = io.BytesIO()
    while True:
        chunk = body.read(10 * 1024 * 1024)  # 10 MB chunks
        if not chunk:
            break
        data.write(chunk)
    data.seek(0)
    print(f" Finished downloading {key} ({data.getbuffer().nbytes / (1024*1024):.2f} MB)")
    return data.read()

# Loop through all PDFs and download
pdf_files = {}
for key in pdf_keys:
    pdf_files[key] = read_cos_pdf(bucket, key)

print(f" All {len(pdf_files)} PDFs downloaded successfully.")
# PDF Text Extraction
import io
import os, types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.

cos_client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='your_api_key_id',
    ibm_auth_endpoint="https://iam.cloud.ibm.com/identity/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.direct.us-south.cloud-object-storage.appdomain.cloud')

bucket = 'bucket_key'
pdf_keys = [
    'ICH_E6(R3)_Guideline.pdf',
    'inspection_survey.pdf'
]

def read_cos_pdf(bucket, key):
    """Read a PDF from IBM COS into bytes (streamed in chunks)."""
    print(f" Downloading {key} ...")
    response = cos_client.get_object(Bucket=bucket, Key=key)
    body = response['Body']
    data = io.BytesIO()
    while True:
        chunk = body.read(10 * 1024 * 1024)  # 10 MB chunks
        if not chunk:
            break
        data.write(chunk)
    data.seek(0)
    print(f" Finished downloading {key} ({data.getbuffer().nbytes / (1024*1024):.2f} MB)")
    return data.read()

# Loop through all PDFs and download
pdf_files = {}
for key in pdf_keys:
    pdf_files[key] = read_cos_pdf(bucket, key)

print(f" All {len(pdf_files)} PDFs downloaded successfully.")

 Downloading ICH_E6(R3)_Guideline.pdf ...
 Finished downloading ICH_E6(R3)_Guideline.pdf (0.79 MB)
 Downloading inspection_survey.pdf ...
 Finished downloading inspection_survey.pdf (2.22 MB)
 All 2 PDFs downloaded successfully.

This step is crucial for transforming our raw PDF documents into a multi-modal, searchable knowledge base for the self-RAG agent.

In [ ]:

Copied!





# Multi-Modal PDF Parsing and Captioning
import os
import pickle
import io
from typing import List
from PIL import Image
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
import fitz # PyMuPDF
import torch
import gc

def extract_and_caption_pdf(filename: str, pdf_content: bytes) -> List[Document]:
    """Extracts text and images from in-memory PDF content, captions images, and returns LangChain Documents."""
    print(f"\nProcessing {filename}...", flush=True)
    
    # Open the PDF from the in-memory byte stream
    doc = fitz.open(stream=pdf_content, filetype="pdf")
    all_content = []
    
    # 1. Extract Text Chunks (Unchanged)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    for i, page in enumerate(doc):
        text = page.get_text()
        chunks = text_splitter.split_text(text)
        for j, chunk in enumerate(chunks):
            doc_metadata = {"source": filename, "page": i + 1, "chunk_id": f"P{i+1}-T{j}"}
            all_content.append(Document(page_content=chunk, metadata=doc_metadata))

    # 2. Extract and Caption Images
    if 'inspection_survey' in filename.lower():
        print(f"  -> {filename} identified as image-containing. Beginning image extraction...", flush=True)
        
        for i, page in enumerate(doc):
            image_list = page.get_images(full=True)
            for j, img_info in enumerate(image_list):
                try:
                    xref = img_info[0]
                    base_image = doc.extract_image(xref)
                    image_bytes = base_image["image"]
                    
                    # Defensive Image Loading and Normalization
                    img_stream = io.BytesIO(image_bytes)
                    image = Image.open(img_stream)
                    
                    # Convert to RGB to fix 'Unable to infer channel dimension' errors
                    if image.mode != 'RGB':
                        image = image.convert('RGB')
                    
                    # Memory Optimization (Resizing)
                    MAX_DIM = 1024
                    if max(image.size) > MAX_DIM:
                        image.thumbnail((MAX_DIM, MAX_DIM), Image.Resampling.LANCZOS)
                        
                    # --- Captioning ---
                    print(f"    -> Captioning image {j+1} on page {i+1}...", flush=True)
                    
                    conversation = [
                        {
                            "role": "user",
                            "content": [
                                {"type": "image", "image": image},
                                {"type": "text", "text": "Describe this image, chart, or diagram in detail. Summarize its key findings or data points."},
                            ],
                        },
                    ]
                    
                    # Apply chat template and generate
                    inputs = hf_processor.apply_chat_template(
                        conversation,
                        add_generation_prompt=True,
                        tokenize=True,
                        return_dict=True,
                        return_tensors="pt"
                    ).to(HF_DEVICE)
                    
                    # Use Bfloat16 for input tensors to match the model's dtype
                    if hf_vision_model.dtype == torch.bfloat16:
                        inputs = {k: v.to(torch.bfloat16) if v.is_floating_point() else v for k, v in inputs.items()}
                        
                    output = hf_vision_model.generate(**inputs, max_new_tokens=256)
                    caption = hf_processor.decode(output[0], skip_special_tokens=True).strip()

                    # Create document from caption
                    caption_doc = f"IMAGE CAPTION (Source: {filename}, Page {i+1}, Image {j+1}): {caption}"
                    img_metadata = {"source": filename, "page": i + 1, "chunk_id": f"P{i+1}-I{j}", "type": "image_caption"}
                    all_content.append(Document(page_content=caption_doc, metadata=img_metadata))
                    
                    # Aggressive Memory Clearing
                    del inputs
                    del output
                    torch.cuda.empty_cache() 
                    gc.collect() 

                except Exception as e:
                    print(f"    Error processing image on page {i+1}, image {j+1}: {e}", flush=True)
                    # Clear memory even on error
                    torch.cuda.empty_cache()
                    gc.collect()
                    continue
                    
    return all_content

# Execution of the Multi-modal Parsing (Caching Logic Added)

CACHE_FILE = 'multimodal_documents_cache.pkl'
all_documents = []

if os.path.exists(CACHE_FILE):
    # Load from Cache
    print(f"\nCache file found: {CACHE_FILE}. Loading documents from cache...", flush=True)
    try:
        with open(CACHE_FILE, 'rb') as f:
            all_documents = pickle.load(f)
        print("Documents successfully loaded from cache. Skipping multi-modal parsing.", flush=True)
    except Exception as e:
        # Fallback if the cache file is corrupted
        print(f"Error loading cache file: {e}. Attempting to run full parsing.", flush=True)
        os.remove(CACHE_FILE) # Delete bad cache
        
else:
    # Run Expensive Parsing and Save to Cache
    print(f"\nCache file not found. Running Multi-Modal PDF Parsing and Captioning...", flush=True)
    
    # Assuming 'pdf_files' dictionary is populated from your COS retrieval step
    for filename, content in pdf_files.items():
        all_documents.extend(extract_and_caption_pdf(filename, content))
        
    print(f"\nFinished parsing. Total documents created: {len(all_documents)}", flush=True)
    
    # Save the results
    try:
        with open(CACHE_FILE, 'wb') as f:
            pickle.dump(all_documents, f)
        print(f"Successfully saved all {len(all_documents)} documents to {CACHE_FILE}.", flush=True)
    except Exception as e:
        print(f"WARNING: Could not save cache file {CACHE_FILE}: {e}", flush=True)


print(f"\nTotal documents (text chunks + image captions) available: {len(all_documents)}", flush=True)
# Multi-Modal PDF Parsing and Captioning
import os
import pickle
import io
from typing import List
from PIL import Image
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
import fitz # PyMuPDF
import torch
import gc

def extract_and_caption_pdf(filename: str, pdf_content: bytes) -> List[Document]:
    """Extracts text and images from in-memory PDF content, captions images, and returns LangChain Documents."""
    print(f"\nProcessing {filename}...", flush=True)
    
    # Open the PDF from the in-memory byte stream
    doc = fitz.open(stream=pdf_content, filetype="pdf")
    all_content = []
    
    # 1. Extract Text Chunks (Unchanged)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    for i, page in enumerate(doc):
        text = page.get_text()
        chunks = text_splitter.split_text(text)
        for j, chunk in enumerate(chunks):
            doc_metadata = {"source": filename, "page": i + 1, "chunk_id": f"P{i+1}-T{j}"}
            all_content.append(Document(page_content=chunk, metadata=doc_metadata))

    # 2. Extract and Caption Images
    if 'inspection_survey' in filename.lower():
        print(f"  -> {filename} identified as image-containing. Beginning image extraction...", flush=True)
        
        for i, page in enumerate(doc):
            image_list = page.get_images(full=True)
            for j, img_info in enumerate(image_list):
                try:
                    xref = img_info[0]
                    base_image = doc.extract_image(xref)
                    image_bytes = base_image["image"]
                    
                    # Defensive Image Loading and Normalization
                    img_stream = io.BytesIO(image_bytes)
                    image = Image.open(img_stream)
                    
                    # Convert to RGB to fix 'Unable to infer channel dimension' errors
                    if image.mode != 'RGB':
                        image = image.convert('RGB')
                    
                    # Memory Optimization (Resizing)
                    MAX_DIM = 1024
                    if max(image.size) > MAX_DIM:
                        image.thumbnail((MAX_DIM, MAX_DIM), Image.Resampling.LANCZOS)
                        
                    # --- Captioning ---
                    print(f"    -> Captioning image {j+1} on page {i+1}...", flush=True)
                    
                    conversation = [
                        {
                            "role": "user",
                            "content": [
                                {"type": "image", "image": image},
                                {"type": "text", "text": "Describe this image, chart, or diagram in detail. Summarize its key findings or data points."},
                            ],
                        },
                    ]
                    
                    # Apply chat template and generate
                    inputs = hf_processor.apply_chat_template(
                        conversation,
                        add_generation_prompt=True,
                        tokenize=True,
                        return_dict=True,
                        return_tensors="pt"
                    ).to(HF_DEVICE)
                    
                    # Use Bfloat16 for input tensors to match the model's dtype
                    if hf_vision_model.dtype == torch.bfloat16:
                        inputs = {k: v.to(torch.bfloat16) if v.is_floating_point() else v for k, v in inputs.items()}
                        
                    output = hf_vision_model.generate(**inputs, max_new_tokens=256)
                    caption = hf_processor.decode(output[0], skip_special_tokens=True).strip()

                    # Create document from caption
                    caption_doc = f"IMAGE CAPTION (Source: {filename}, Page {i+1}, Image {j+1}): {caption}"
                    img_metadata = {"source": filename, "page": i + 1, "chunk_id": f"P{i+1}-I{j}", "type": "image_caption"}
                    all_content.append(Document(page_content=caption_doc, metadata=img_metadata))
                    
                    # Aggressive Memory Clearing
                    del inputs
                    del output
                    torch.cuda.empty_cache() 
                    gc.collect() 

                except Exception as e:
                    print(f"    Error processing image on page {i+1}, image {j+1}: {e}", flush=True)
                    # Clear memory even on error
                    torch.cuda.empty_cache()
                    gc.collect()
                    continue
                    
    return all_content

# Execution of the Multi-modal Parsing (Caching Logic Added)

CACHE_FILE = 'multimodal_documents_cache.pkl'
all_documents = []

if os.path.exists(CACHE_FILE):
    # Load from Cache
    print(f"\nCache file found: {CACHE_FILE}. Loading documents from cache...", flush=True)
    try:
        with open(CACHE_FILE, 'rb') as f:
            all_documents = pickle.load(f)
        print("Documents successfully loaded from cache. Skipping multi-modal parsing.", flush=True)
    except Exception as e:
        # Fallback if the cache file is corrupted
        print(f"Error loading cache file: {e}. Attempting to run full parsing.", flush=True)
        os.remove(CACHE_FILE) # Delete bad cache
        
else:
    # Run Expensive Parsing and Save to Cache
    print(f"\nCache file not found. Running Multi-Modal PDF Parsing and Captioning...", flush=True)
    
    # Assuming 'pdf_files' dictionary is populated from your COS retrieval step
    for filename, content in pdf_files.items():
        all_documents.extend(extract_and_caption_pdf(filename, content))
        
    print(f"\nFinished parsing. Total documents created: {len(all_documents)}", flush=True)
    
    # Save the results
    try:
        with open(CACHE_FILE, 'wb') as f:
            pickle.dump(all_documents, f)
        print(f"Successfully saved all {len(all_documents)} documents to {CACHE_FILE}.", flush=True)
    except Exception as e:
        print(f"WARNING: Could not save cache file {CACHE_FILE}: {e}", flush=True)


print(f"\nTotal documents (text chunks + image captions) available: {len(all_documents)}", flush=True)

Cache file not found. Running Multi-Modal PDF Parsing and Captioning...

Processing ICH_E6(R3)_Guideline.pdf...

Processing inspection_survey.pdf...
  -> inspection_survey.pdf identified as image-containing. Beginning image extraction...
    -> Captioning image 1 on page 1...
    -> Captioning image 2 on page 1...
    -> Captioning image 3 on page 1...
    -> Captioning image 4 on page 1...
    -> Captioning image 5 on page 1...
    -> Captioning image 6 on page 1...
    -> Captioning image 7 on page 1...
    -> Captioning image 8 on page 1...
    -> Captioning image 9 on page 1...
    -> Captioning image 1 on page 2...
    -> Captioning image 2 on page 2...
    -> Captioning image 3 on page 2...
    -> Captioning image 4 on page 2...
    -> Captioning image 1 on page 3...
    -> Captioning image 2 on page 3...
    -> Captioning image 3 on page 3...
    -> Captioning image 1 on page 4...
    -> Captioning image 2 on page 4...
    -> Captioning image 3 on page 4...
    -> Captioning image 4 on page 4...
    -> Captioning image 1 on page 5...
    -> Captioning image 2 on page 5...
    -> Captioning image 3 on page 5...
    -> Captioning image 1 on page 6...
    -> Captioning image 2 on page 6...
    -> Captioning image 3 on page 6...
    -> Captioning image 4 on page 6...
    -> Captioning image 1 on page 7...
    -> Captioning image 2 on page 7...
    -> Captioning image 3 on page 7...
    -> Captioning image 4 on page 7...
    -> Captioning image 5 on page 7...
    -> Captioning image 1 on page 8...
    -> Captioning image 2 on page 8...
    -> Captioning image 3 on page 8...
    -> Captioning image 4 on page 8...
    -> Captioning image 5 on page 8...
    -> Captioning image 1 on page 9...
    -> Captioning image 2 on page 9...
    -> Captioning image 3 on page 9...
    -> Captioning image 4 on page 9...
    -> Captioning image 1 on page 10...
    -> Captioning image 2 on page 10...
    -> Captioning image 1 on page 11...
    -> Captioning image 2 on page 11...
    -> Captioning image 1 on page 12...
    -> Captioning image 2 on page 12...
    -> Captioning image 3 on page 12...
    -> Captioning image 4 on page 12...
    -> Captioning image 1 on page 13...
    -> Captioning image 2 on page 13...
    -> Captioning image 3 on page 13...
    -> Captioning image 1 on page 14...
    -> Captioning image 2 on page 14...
    -> Captioning image 3 on page 14...
    -> Captioning image 4 on page 14...
    -> Captioning image 1 on page 15...
    -> Captioning image 2 on page 15...
    -> Captioning image 3 on page 15...
    -> Captioning image 4 on page 15...
    -> Captioning image 5 on page 15...
    -> Captioning image 1 on page 16...
    -> Captioning image 2 on page 16...
    -> Captioning image 3 on page 16...
    -> Captioning image 1 on page 17...
    -> Captioning image 2 on page 17...
    -> Captioning image 1 on page 18...
    -> Captioning image 2 on page 18...
    -> Captioning image 3 on page 18...
    -> Captioning image 4 on page 18...
    -> Captioning image 1 on page 19...
    -> Captioning image 2 on page 19...
    -> Captioning image 1 on page 20...
    -> Captioning image 2 on page 20...
    -> Captioning image 3 on page 20...
    -> Captioning image 1 on page 21...
    -> Captioning image 2 on page 21...
    -> Captioning image 3 on page 21...
    -> Captioning image 4 on page 21...
    -> Captioning image 5 on page 21...
    -> Captioning image 6 on page 21...
    -> Captioning image 7 on page 21...
    -> Captioning image 8 on page 21...
    -> Captioning image 1 on page 22...
    -> Captioning image 2 on page 22...
    -> Captioning image 3 on page 22...
    -> Captioning image 1 on page 23...
    -> Captioning image 2 on page 23...
    -> Captioning image 3 on page 23...
    -> Captioning image 1 on page 24...
    -> Captioning image 2 on page 24...
    -> Captioning image 1 on page 25...
    -> Captioning image 2 on page 25...
    -> Captioning image 3 on page 25...
    -> Captioning image 1 on page 26...
    -> Captioning image 2 on page 26...
    -> Captioning image 3 on page 26...
    -> Captioning image 4 on page 26...
    -> Captioning image 5 on page 26...
    -> Captioning image 6 on page 26...
    -> Captioning image 1 on page 27...
    -> Captioning image 2 on page 27...
    -> Captioning image 3 on page 27...
    -> Captioning image 4 on page 27...
    -> Captioning image 5 on page 27...
    -> Captioning image 1 on page 28...
    -> Captioning image 2 on page 28...
    -> Captioning image 1 on page 29...
    -> Captioning image 2 on page 29...
    -> Captioning image 3 on page 29...
    -> Captioning image 4 on page 29...
    -> Captioning image 5 on page 29...
    -> Captioning image 6 on page 29...
    -> Captioning image 1 on page 30...
    -> Captioning image 2 on page 30...
    -> Captioning image 1 on page 31...
    -> Captioning image 2 on page 31...
    -> Captioning image 3 on page 31...
    -> Captioning image 4 on page 31...
    -> Captioning image 5 on page 31...
    -> Captioning image 6 on page 31...
    -> Captioning image 1 on page 32...
    -> Captioning image 2 on page 32...
    -> Captioning image 3 on page 32...
    -> Captioning image 4 on page 32...
    -> Captioning image 5 on page 32...
    -> Captioning image 6 on page 32...
    -> Captioning image 1 on page 33...
    -> Captioning image 2 on page 33...
    -> Captioning image 1 on page 34...
    -> Captioning image 2 on page 34...
    -> Captioning image 3 on page 34...
    -> Captioning image 4 on page 34...
    -> Captioning image 5 on page 34...
    -> Captioning image 1 on page 35...
    -> Captioning image 2 on page 35...
    -> Captioning image 3 on page 35...
    -> Captioning image 4 on page 35...
    -> Captioning image 1 on page 36...
    -> Captioning image 2 on page 36...
    -> Captioning image 3 on page 36...
    -> Captioning image 1 on page 37...
    -> Captioning image 2 on page 37...
    -> Captioning image 1 on page 38...
    -> Captioning image 2 on page 38...
    -> Captioning image 3 on page 38...
    -> Captioning image 1 on page 39...
    -> Captioning image 2 on page 39...
    -> Captioning image 1 on page 40...
    -> Captioning image 2 on page 40...
    -> Captioning image 3 on page 40...
    -> Captioning image 1 on page 41...
    -> Captioning image 2 on page 41...
    -> Captioning image 3 on page 41...
    -> Captioning image 4 on page 41...
    -> Captioning image 1 on page 42...
    -> Captioning image 2 on page 42...
    -> Captioning image 1 on page 43...
    -> Captioning image 2 on page 43...
    -> Captioning image 1 on page 44...
    -> Captioning image 2 on page 44...
    -> Captioning image 3 on page 44...
    -> Captioning image 1 on page 45...
    -> Captioning image 2 on page 45...
    -> Captioning image 3 on page 45...
    -> Captioning image 1 on page 46...
    -> Captioning image 2 on page 46...
    -> Captioning image 3 on page 46...
    -> Captioning image 4 on page 46...
    -> Captioning image 1 on page 47...
    -> Captioning image 2 on page 47...
    -> Captioning image 3 on page 47...
    -> Captioning image 4 on page 47...
    -> Captioning image 1 on page 48...
    -> Captioning image 2 on page 48...
    -> Captioning image 3 on page 48...
    -> Captioning image 4 on page 48...
    -> Captioning image 1 on page 49...
    -> Captioning image 2 on page 49...
    -> Captioning image 3 on page 49...
    -> Captioning image 4 on page 49...
    -> Captioning image 1 on page 50...
    -> Captioning image 2 on page 50...
    -> Captioning image 3 on page 50...
    -> Captioning image 4 on page 50...
    -> Captioning image 1 on page 51...
    -> Captioning image 2 on page 51...
    -> Captioning image 3 on page 51...
    -> Captioning image 4 on page 51...
    -> Captioning image 1 on page 52...
    -> Captioning image 2 on page 52...
    -> Captioning image 3 on page 52...
    -> Captioning image 4 on page 52...
    -> Captioning image 1 on page 53...
    -> Captioning image 2 on page 53...
    -> Captioning image 3 on page 53...
    -> Captioning image 4 on page 53...
    -> Captioning image 5 on page 53...
    -> Captioning image 1 on page 54...
    -> Captioning image 2 on page 54...
    -> Captioning image 3 on page 54...
    -> Captioning image 4 on page 54...
    -> Captioning image 1 on page 55...
    -> Captioning image 2 on page 55...
    -> Captioning image 3 on page 55...
    -> Captioning image 4 on page 55...
    -> Captioning image 5 on page 55...
    -> Captioning image 6 on page 55...
    -> Captioning image 1 on page 56...
    -> Captioning image 2 on page 56...
    -> Captioning image 3 on page 56...
    -> Captioning image 1 on page 57...
    -> Captioning image 2 on page 57...
    -> Captioning image 3 on page 57...
    -> Captioning image 4 on page 57...
    -> Captioning image 5 on page 57...
    -> Captioning image 1 on page 58...
    -> Captioning image 2 on page 58...
    -> Captioning image 3 on page 58...
    -> Captioning image 4 on page 58...
    -> Captioning image 5 on page 58...
    -> Captioning image 1 on page 59...
    -> Captioning image 2 on page 59...
    -> Captioning image 3 on page 59...
    -> Captioning image 4 on page 59...
    -> Captioning image 1 on page 60...
    -> Captioning image 2 on page 60...
    -> Captioning image 1 on page 61...
    -> Captioning image 2 on page 61...
    -> Captioning image 3 on page 61...
    -> Captioning image 1 on page 62...
    -> Captioning image 2 on page 62...
    -> Captioning image 3 on page 62...
    -> Captioning image 1 on page 63...
    -> Captioning image 2 on page 63...
    -> Captioning image 3 on page 63...
    -> Captioning image 1 on page 64...
    -> Captioning image 2 on page 64...
    -> Captioning image 3 on page 64...
    -> Captioning image 4 on page 64...
    -> Captioning image 5 on page 64...
    -> Captioning image 6 on page 64...
    -> Captioning image 7 on page 64...
    -> Captioning image 1 on page 65...
    -> Captioning image 2 on page 65...
    -> Captioning image 1 on page 66...
    -> Captioning image 2 on page 66...
    -> Captioning image 3 on page 66...
    -> Captioning image 4 on page 66...
    -> Captioning image 1 on page 67...
    -> Captioning image 2 on page 67...
    -> Captioning image 3 on page 67...
    -> Captioning image 4 on page 67...
    -> Captioning image 1 on page 68...
    -> Captioning image 2 on page 68...
    -> Captioning image 3 on page 68...
    -> Captioning image 1 on page 69...
    -> Captioning image 2 on page 69...
    -> Captioning image 3 on page 69...
    -> Captioning image 1 on page 70...
    -> Captioning image 2 on page 70...
    -> Captioning image 3 on page 70...
    -> Captioning image 1 on page 71...
    -> Captioning image 2 on page 71...
    -> Captioning image 3 on page 71...
    -> Captioning image 1 on page 72...
    -> Captioning image 2 on page 72...
    -> Captioning image 3 on page 72...
    -> Captioning image 1 on page 73...
    -> Captioning image 2 on page 73...
    -> Captioning image 3 on page 73...
    -> Captioning image 1 on page 74...
    -> Captioning image 2 on page 74...
    -> Captioning image 3 on page 74...
    -> Captioning image 4 on page 74...
    -> Captioning image 5 on page 74...
    -> Captioning image 1 on page 75...
    -> Captioning image 2 on page 75...
    -> Captioning image 3 on page 75...
    -> Captioning image 4 on page 75...
    -> Captioning image 1 on page 76...
    -> Captioning image 2 on page 76...
    -> Captioning image 3 on page 76...
    -> Captioning image 4 on page 76...
    -> Captioning image 1 on page 77...
    -> Captioning image 2 on page 77...
    -> Captioning image 3 on page 77...
    -> Captioning image 1 on page 78...
    -> Captioning image 2 on page 78...
    -> Captioning image 3 on page 78...
    -> Captioning image 1 on page 79...
    -> Captioning image 2 on page 79...
    -> Captioning image 3 on page 79...
    -> Captioning image 4 on page 79...
    -> Captioning image 5 on page 79...
    -> Captioning image 1 on page 80...
    -> Captioning image 2 on page 80...
    -> Captioning image 3 on page 80...
    -> Captioning image 4 on page 80...
    -> Captioning image 5 on page 80...
    -> Captioning image 6 on page 80...
    -> Captioning image 1 on page 81...
    -> Captioning image 2 on page 81...
    -> Captioning image 3 on page 81...
    -> Captioning image 4 on page 81...
    -> Captioning image 1 on page 82...
    -> Captioning image 2 on page 82...
    -> Captioning image 1 on page 83...
    -> Captioning image 2 on page 83...
    -> Captioning image 1 on page 84...
    -> Captioning image 2 on page 84...
    -> Captioning image 3 on page 84...
    -> Captioning image 4 on page 84...
    -> Captioning image 5 on page 84...
    -> Captioning image 6 on page 84...
    -> Captioning image 7 on page 84...
    -> Captioning image 1 on page 85...
    -> Captioning image 2 on page 85...
    -> Captioning image 3 on page 85...
    -> Captioning image 4 on page 85...
    -> Captioning image 5 on page 85...
    -> Captioning image 1 on page 86...
    -> Captioning image 2 on page 86...
    -> Captioning image 3 on page 86...
    -> Captioning image 4 on page 86...
    -> Captioning image 1 on page 87...
    -> Captioning image 2 on page 87...
    -> Captioning image 3 on page 87...
    -> Captioning image 1 on page 88...
    -> Captioning image 2 on page 88...
    -> Captioning image 1 on page 89...
    -> Captioning image 2 on page 89...
    -> Captioning image 3 on page 89...
    -> Captioning image 4 on page 89...
    -> Captioning image 5 on page 89...
    -> Captioning image 6 on page 89...
    -> Captioning image 1 on page 90...
    -> Captioning image 2 on page 90...
    -> Captioning image 3 on page 90...
    -> Captioning image 1 on page 91...
    -> Captioning image 2 on page 91...
    -> Captioning image 3 on page 91...
    -> Captioning image 4 on page 91...
    -> Captioning image 5 on page 91...
    -> Captioning image 1 on page 92...
    -> Captioning image 2 on page 92...
    -> Captioning image 3 on page 92...
    -> Captioning image 4 on page 92...
    -> Captioning image 5 on page 92...
    -> Captioning image 1 on page 93...
    -> Captioning image 2 on page 93...
    -> Captioning image 3 on page 93...

Finished parsing. Total documents created: 753
Successfully saved all 753 documents to multimodal_documents_cache.pkl.

Total documents (text chunks + image captions) available: 753

This parsing will:

• Define the function and uses fitz to accurately pull both text and embedded image bytes from structured documents, a task simple text readers often fail at.

• Pass the extracted images and a descriptive prompt to the locally loaded granite vision model as it is crucial for multi-modality. By converting images into descriptive text captions, we make visual information searchable via the standard text embedding model.This mechanism ensures the agent is not "blind" to non-textual context, thus improves the completeness of the knowledge base.

• Implement caching logic to store the results, preventing the time-consuming and computationally demanding multi-modal captioning process from having to be repeated. Storing the processed knowledge base speeds up development and repeated execution.

• Ensure the final knowledge base gives the self-reflective agent full context that includes both textual and visual data. This is the main objective of the entire process, giving the later self-reflective retrieval the foundation it needs to be precise and well-founded.

Step 9. Indexing and retriever setup¶

This step completes the preparation of the multi-modal knowledge base by indexing all processed document chunks into an efficient, searchable vector store, which forms the basis for the agent's initial retrieval capability.

In [ ]:

Copied!





# Indexing and Retriever Setup 
from langchain_ibm import WatsonxEmbeddings 
from langchain_community.vectorstores import FAISS

print("\n Starting Vector Store Creation ", flush=True)

try:
    # Create the FAISS Vector Store
    vectorstore = FAISS.from_documents(
        documents=all_documents,
        embedding=embeddings_model
    )
    print(f"Vector Store created successfully with {len(all_documents)} documents.", flush=True)

    # Create the Retriever
    # We set 'k=5' to retrieve the top 5 most similar documents for any given query.
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
    print("Retriever configured (k=5). Ready for RAG.", flush=True)

except Exception as e:
    # This captures errors like embedding failures.
    print(f"Vector Store creation failed: {e}", flush=True)
# Indexing and Retriever Setup 
from langchain_ibm import WatsonxEmbeddings 
from langchain_community.vectorstores import FAISS

print("\n Starting Vector Store Creation ", flush=True)

try:
    # Create the FAISS Vector Store
    vectorstore = FAISS.from_documents(
        documents=all_documents,
        embedding=embeddings_model
    )
    print(f"Vector Store created successfully with {len(all_documents)} documents.", flush=True)

    # Create the Retriever
    # We set 'k=5' to retrieve the top 5 most similar documents for any given query.
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
    print("Retriever configured (k=5). Ready for RAG.", flush=True)

except Exception as e:
    # This captures errors like embedding failures.
    print(f"Vector Store creation failed: {e}", flush=True)

--- Starting Vector Store Creation ---
Vector Store created successfully with 753 documents.
Retriever configured (k=5). Ready for RAG.

This configuration plays a key role in preparing the retrieval layer for the self-RAG workflow:

• It builds a high efficiency vector store using FAISS, which is well known for its speed and scalability when handling dense vector indexes. This ensures that similarity searches run quickly which is critical for maintaining a responsive RAG pipeline.

• It transforms the multi modal knowledge base into vector representations, allowing the retriever to match user queries by meaning rather than relying on exact keyword overlap.

• It fine tunes context delivery by typically retrieving the top five most relevant documents (k=5), balancing precision and relevance within the model’s context window.

• It establishes a single, consistent knowledge source that the self RAG agent can depend on for factual grounding that is an essential element of any trustworthy retrieval augmented system.

Step 10. LangGraph state and core self-RAG logic¶

This step sets up the main sections of the self-RAG workflow. The agent state tracks the entire process. The LangGraph node functions manage the flexible, self-correcting logic.

In [ ]:

Copied!





# LangGraph state and core self-RAG logic

from typing import TypedDict, List
from langchain_core.documents import Document
from langgraph.graph import StateGraph, END
import re
# Assumed objects: qa_llm, retriever, calculate_score (defined below)


# Define the Agent State (Schema)
class AgentState(TypedDict):
    """Represents the state of the Self-RAG agent."""
    query: str                       # The original user query
    retrieved_docs: List[Document]   # Documents retrieved from the vector store
    generation_history: List[str]    # History of generated segments
    critique_score: float            # The critique score of the last generated segment
    segment_count: int               # Counter for generated segments
    finish_generation: bool          # Flag to stop the generation loop


# LangGraph Node Functions (SELF-RAG Logic)
MAX_SEGMENTS = 10 
SCORE_THRESHOLD = 2.5 # Defined here for convenience, but also used in evaluate_critique

def calculate_score(isrel_val: str, issup_val: str, isuse_val: int) -> float:
    """Calculates the combined weighted score for a segment (Soft Constraint)."""
    W_ISSUP = 3.0  
    W_ISREL = 1.5  
    W_ISUSE = 0.5  

    score_rel = 1.0 if isrel_val == "Relevant" else 0.0
    
    if "Fully Supported" in issup_val:
        score_sup = 1.0
    elif "Partially" in issup_val:
        score_sup = 0.5
    else:
        score_sup = 0.0

    score_use = (isuse_val - 1) / 4.0  
    
    total_score = (W_ISREL * score_rel) + (W_ISSUP * score_sup) + (W_ISUSE * score_use)
    return total_score


def initial_decision(state: AgentState) -> AgentState:
    """Initial decision on whether to retrieve based on the query type."""
    query = state["query"]
    
    prompt = f"""
    You are an expert self-reflecting LLM. Your task is to determine if external knowledge is required to answer the following query accurately.
    - If knowledge is required, output the token: <|Retrieve=Yes|>
    - If the query is open-ended or based on common knowledge, output the token: <|Retrieve=No|>

    Query: "{query}"

    Decision Token:
    """
    
    response = qa_llm.invoke(prompt)
    
    if "<|Retrieve=Yes|>" in response:
        print("Decision: Retrieval required.", flush=True) 
        return {"query": query, "retrieved_docs": [], "critique_score": 0.0, "segment_count": 0, "finish_generation": False}
    else:
        print("Decision: No retrieval required for initial generation.", flush=True) 
        return {"query": query, "retrieved_docs": [Document(page_content="No documents retrieved.")], "critique_score": 0.0, "segment_count": 0, "finish_generation": False}


def retrieve_docs(state: AgentState) -> AgentState:
    """Retrieves documents based on the current query or the last generated segment."""
    query = state["query"]
    
    if state.get("generation_history"):
        search_query = state["generation_history"][-1]
    else:
        search_query = query
        
    print(f"Retrieving documents for: '{search_query[:50]}...'", flush=True) 
    docs = retriever.invoke(search_query)
    
    return {"retrieved_docs": docs}


def generate_segment(state: AgentState) -> AgentState:
    """Generates the next answer segment and self-reflects using critique tokens."""
    query = state["query"]
    history = state.get("generation_history", [])
    
    docs_context = "\n---\n".join([f"Source ({d.metadata.get('chunk_id')}): {d.page_content}" for d in state["retrieved_docs"]])
    history_context = "\n".join(history)
    
    prompt = f"""
    You are a SELF-RAG agent using the IBM Granite model. Your goal is to generate one accurate, concise segment of an answer.
    
    INSTRUCTION: Generate a comprehensive, multi-segment answer to the user's query.
    1. CONTEXT: Use the provided document segments (which include text and image captions) to answer the question accurately.
    2. SEGMENTATION: Only use the <|END|> token when the answer is fully comprehensive and detailed, and you have no more relevant information to add.
    3. REFLECTION: After generating the segment, immediately append these key-value reflection tokens:
        - ISREL: <|ISREL=Relevant|> or <|ISREL=Irrelevant|>
        - ISSUP: <|ISSUP=Fully Supported|> or <|ISSUP=Partially Supported|> or <|ISSUP=No Support|>
        - ISUSE: <|ISUSE=N|> (where N is the overall quality/utility score from 1 to 5, 5 is best).
        
    CURRENT QUERY: "{query}"
    
    HISTORY SO FAR: "{history_context}"
    
    RETRIEVED CONTEXT (Multi-Modal: text chunks and image captions):
    {docs_context}
    
    ---
    
    Generate the NEXT SEGMENT and REFLECTION TOKENS. End the entire generation with <|END|> if the answer is complete.
    """
    
    print(f"Generating Segment {state['segment_count'] + 1}...", flush=True) 
    full_response = qa_llm.invoke(prompt)

    CRITIQUE_TOKENS = ["<|ISREL=", "<|ISSUP=", "<|ISUSE=", "|>"] 

    isrel = re.search(r"<\|ISREL=(.+?)\|>", full_response)
    issup = re.search(r"<\|ISSUP=(.+?)\|>", full_response)
    isuse = re.search(r"<\|ISUSE=(\d+)\|>", full_response)
    
    isrel_val = isrel.group(1).strip() if isrel else "Irrelevant"
    issup_val = issup.group(1).strip() if issup else "No Support"
    isuse_val = int(isuse.group(1).strip()) if isuse and isuse.group(1).isdigit() else 1

    segment = full_response
    for token in CRITIQUE_TOKENS + ["<|Retrieve=Yes|>", "<|Retrieve=No|>", "<|END|>"]:
        segment = segment.replace(token, "").strip()
    
    new_history = history + [segment]
    
    print(f"  -> ISREL: {isrel_val}, ISSUP: {issup_val}, ISUSE: {isuse_val}", flush=True) 
    
    return {
        "generation_history": new_history,
        "segment_count": state["segment_count"] + 1,
        "finish_generation": "<|END|>" in full_response,
        "critique_score": calculate_score(isrel_val, issup_val, isuse_val), 
        "retrieved_docs": state["retrieved_docs"] 
    }


def evaluate_critique(state: AgentState) -> str:
    """Conditional edge function to determine the next step based on critique score."""
    score = state["critique_score"]
    segment_count = state["segment_count"]
    is_finished = state["finish_generation"]
    
    if is_finished or segment_count >= MAX_SEGMENTS:
        return "end"
    
    if score < SCORE_THRESHOLD:
        print(f"Critique: Low score ({score:.2f}) observed. FORCING RE-RETRIEVAL for next segment.", flush=True) 
        return "retrieve"  
    
    print(f"Critique: High score ({score:.2f}) observed. Continuing generation.", flush=True)
    return "continue"


def finalize_answer(state: AgentState) -> AgentState:
    """Compiles the final answer."""
    final_answer = "\n".join(state["generation_history"])
    print("\n--- FINAL ANSWER ---", flush=True) 
    print(final_answer, flush=True)
    return state


# Build and Compile the LangGraph Workflow

print("\n Building and Compiling LangGraph Workflow ", flush=True)

workflow = StateGraph(AgentState)

# Add Nodes (Function Calls)
workflow.add_node("initial_decision", initial_decision)
workflow.add_node("retrieve_docs", retrieve_docs)
workflow.add_node("generate_segment", generate_segment)
workflow.add_node("finalize_answer", finalize_answer)


# Define Edges (Flow Control)
workflow.set_entry_point("initial_decision")

# Edge 1: Decide between retrieval or initial generation
workflow.add_conditional_edges(
    "initial_decision", 
    lambda state: "retrieve" if not state["retrieved_docs"] else "generate",
    {
        "retrieve": "retrieve_docs",
        "generate": "generate_segment",
    }
)

# Edge 2: After retrieval, always generate a segment
workflow.add_edge("retrieve_docs", "generate_segment")


# Edge 3: The core loop - Evaluate the critique score to determine the next action
workflow.add_conditional_edges(
    "generate_segment",
    evaluate_critique, 
    {
        "retrieve": "retrieve_docs",
        "continue": "generate_segment",
        "end": "finalize_answer",
    }
)

# Edge 4: End the workflow
workflow.add_edge("finalize_answer", END)


# Compile the Graph
app = workflow.compile()
print("LangGraph workflow compiled successfully (object named 'app').", flush=True)
# LangGraph state and core self-RAG logic

from typing import TypedDict, List
from langchain_core.documents import Document
from langgraph.graph import StateGraph, END
import re
# Assumed objects: qa_llm, retriever, calculate_score (defined below)


# Define the Agent State (Schema)
class AgentState(TypedDict):
    """Represents the state of the Self-RAG agent."""
    query: str                       # The original user query
    retrieved_docs: List[Document]   # Documents retrieved from the vector store
    generation_history: List[str]    # History of generated segments
    critique_score: float            # The critique score of the last generated segment
    segment_count: int               # Counter for generated segments
    finish_generation: bool          # Flag to stop the generation loop


# LangGraph Node Functions (SELF-RAG Logic)
MAX_SEGMENTS = 10 
SCORE_THRESHOLD = 2.5 # Defined here for convenience, but also used in evaluate_critique

def calculate_score(isrel_val: str, issup_val: str, isuse_val: int) -> float:
    """Calculates the combined weighted score for a segment (Soft Constraint)."""
    W_ISSUP = 3.0  
    W_ISREL = 1.5  
    W_ISUSE = 0.5  

    score_rel = 1.0 if isrel_val == "Relevant" else 0.0
    
    if "Fully Supported" in issup_val:
        score_sup = 1.0
    elif "Partially" in issup_val:
        score_sup = 0.5
    else:
        score_sup = 0.0

    score_use = (isuse_val - 1) / 4.0  
    
    total_score = (W_ISREL * score_rel) + (W_ISSUP * score_sup) + (W_ISUSE * score_use)
    return total_score


def initial_decision(state: AgentState) -> AgentState:
    """Initial decision on whether to retrieve based on the query type."""
    query = state["query"]
    
    prompt = f"""
    You are an expert self-reflecting LLM. Your task is to determine if external knowledge is required to answer the following query accurately.
    - If knowledge is required, output the token: <|Retrieve=Yes|>
    - If the query is open-ended or based on common knowledge, output the token: <|Retrieve=No|>

    Query: "{query}"

    Decision Token:
    """
    
    response = qa_llm.invoke(prompt)
    
    if "<|Retrieve=Yes|>" in response:
        print("Decision: Retrieval required.", flush=True) 
        return {"query": query, "retrieved_docs": [], "critique_score": 0.0, "segment_count": 0, "finish_generation": False}
    else:
        print("Decision: No retrieval required for initial generation.", flush=True) 
        return {"query": query, "retrieved_docs": [Document(page_content="No documents retrieved.")], "critique_score": 0.0, "segment_count": 0, "finish_generation": False}


def retrieve_docs(state: AgentState) -> AgentState:
    """Retrieves documents based on the current query or the last generated segment."""
    query = state["query"]
    
    if state.get("generation_history"):
        search_query = state["generation_history"][-1]
    else:
        search_query = query
        
    print(f"Retrieving documents for: '{search_query[:50]}...'", flush=True) 
    docs = retriever.invoke(search_query)
    
    return {"retrieved_docs": docs}


def generate_segment(state: AgentState) -> AgentState:
    """Generates the next answer segment and self-reflects using critique tokens."""
    query = state["query"]
    history = state.get("generation_history", [])
    
    docs_context = "\n---\n".join([f"Source ({d.metadata.get('chunk_id')}): {d.page_content}" for d in state["retrieved_docs"]])
    history_context = "\n".join(history)
    
    prompt = f"""
    You are a SELF-RAG agent using the IBM Granite model. Your goal is to generate one accurate, concise segment of an answer.
    
    INSTRUCTION: Generate a comprehensive, multi-segment answer to the user's query.
    1. CONTEXT: Use the provided document segments (which include text and image captions) to answer the question accurately.
    2. SEGMENTATION: Only use the <|END|> token when the answer is fully comprehensive and detailed, and you have no more relevant information to add.
    3. REFLECTION: After generating the segment, immediately append these key-value reflection tokens:
        - ISREL: <|ISREL=Relevant|> or <|ISREL=Irrelevant|>
        - ISSUP: <|ISSUP=Fully Supported|> or <|ISSUP=Partially Supported|> or <|ISSUP=No Support|>
        - ISUSE: <|ISUSE=N|> (where N is the overall quality/utility score from 1 to 5, 5 is best).
        
    CURRENT QUERY: "{query}"
    
    HISTORY SO FAR: "{history_context}"
    
    RETRIEVED CONTEXT (Multi-Modal: text chunks and image captions):
    {docs_context}
    
    ---
    
    Generate the NEXT SEGMENT and REFLECTION TOKENS. End the entire generation with <|END|> if the answer is complete.
    """
    
    print(f"Generating Segment {state['segment_count'] + 1}...", flush=True) 
    full_response = qa_llm.invoke(prompt)

    CRITIQUE_TOKENS = ["<|ISREL=", "<|ISSUP=", "<|ISUSE=", "|>"] 

    isrel = re.search(r"<\|ISREL=(.+?)\|>", full_response)
    issup = re.search(r"<\|ISSUP=(.+?)\|>", full_response)
    isuse = re.search(r"<\|ISUSE=(\d+)\|>", full_response)
    
    isrel_val = isrel.group(1).strip() if isrel else "Irrelevant"
    issup_val = issup.group(1).strip() if issup else "No Support"
    isuse_val = int(isuse.group(1).strip()) if isuse and isuse.group(1).isdigit() else 1

    segment = full_response
    for token in CRITIQUE_TOKENS + ["<|Retrieve=Yes|>", "<|Retrieve=No|>", "<|END|>"]:
        segment = segment.replace(token, "").strip()
    
    new_history = history + [segment]
    
    print(f"  -> ISREL: {isrel_val}, ISSUP: {issup_val}, ISUSE: {isuse_val}", flush=True) 
    
    return {
        "generation_history": new_history,
        "segment_count": state["segment_count"] + 1,
        "finish_generation": "<|END|>" in full_response,
        "critique_score": calculate_score(isrel_val, issup_val, isuse_val), 
        "retrieved_docs": state["retrieved_docs"] 
    }


def evaluate_critique(state: AgentState) -> str:
    """Conditional edge function to determine the next step based on critique score."""
    score = state["critique_score"]
    segment_count = state["segment_count"]
    is_finished = state["finish_generation"]
    
    if is_finished or segment_count >= MAX_SEGMENTS:
        return "end"
    
    if score < SCORE_THRESHOLD:
        print(f"Critique: Low score ({score:.2f}) observed. FORCING RE-RETRIEVAL for next segment.", flush=True) 
        return "retrieve"  
    
    print(f"Critique: High score ({score:.2f}) observed. Continuing generation.", flush=True)
    return "continue"


def finalize_answer(state: AgentState) -> AgentState:
    """Compiles the final answer."""
    final_answer = "\n".join(state["generation_history"])
    print("\n--- FINAL ANSWER ---", flush=True) 
    print(final_answer, flush=True)
    return state


# Build and Compile the LangGraph Workflow

print("\n Building and Compiling LangGraph Workflow ", flush=True)

workflow = StateGraph(AgentState)

# Add Nodes (Function Calls)
workflow.add_node("initial_decision", initial_decision)
workflow.add_node("retrieve_docs", retrieve_docs)
workflow.add_node("generate_segment", generate_segment)
workflow.add_node("finalize_answer", finalize_answer)


# Define Edges (Flow Control)
workflow.set_entry_point("initial_decision")

# Edge 1: Decide between retrieval or initial generation
workflow.add_conditional_edges(
    "initial_decision", 
    lambda state: "retrieve" if not state["retrieved_docs"] else "generate",
    {
        "retrieve": "retrieve_docs",
        "generate": "generate_segment",
    }
)

# Edge 2: After retrieval, always generate a segment
workflow.add_edge("retrieve_docs", "generate_segment")


# Edge 3: The core loop - Evaluate the critique score to determine the next action
workflow.add_conditional_edges(
    "generate_segment",
    evaluate_critique, 
    {
        "retrieve": "retrieve_docs",
        "continue": "generate_segment",
        "end": "finalize_answer",
    }
)

# Edge 4: End the workflow
workflow.add_edge("finalize_answer", END)


# Compile the Graph
app = workflow.compile()
print("LangGraph workflow compiled successfully (object named 'app').", flush=True)

--- Building and Compiling LangGraph Workflow ---
LangGraph workflow compiled successfully (object named 'app').

This code serves several purposes:

• The agent keeps a core memory that stores its evolving response, the evidence it has retrieved, and internal feedback. This memory helps the agent's logic to dynamically improve its reasoning by storing context across various steps.

• The agent first determines whether adequate factual grounding is present before producing any segments. To ensure that the generated response is accurate and pertinent, the agent intelligently seeks for stronger, more supportive information if the existing context is deemed incomplete.

• Alongside each generated segment, the model issues internal reflection tokens that immediately quantify the output's relevance, factual support, and overall quality. These critical signals are then combined into a single critique score, giving the agent an objective, measurable way to judge its own performance.

• Determined by the critique score, the agent then decides whether to rework, expand upon, or finalize its answer. This iterative process makes the system inherently resilient, forcing it to improve incorrect generations and maintain factual precision over multiple reasoning rounds.

Step 11. LangGraph state and core self-RAG logic¶

The entire self-RAG workflow begins with this last step.

In [ ]:

Copied!





# Execute the LangGraph Workflow

# 1. Define the Query
# This query is designed to require information from both documents.
user_query = "What is the primary purpose of the ICH E6(R3) Guideline and what are the key findings from the EFPIA 2024 inspection survey regarding remote inspections?"

# 2. Define the Initial Input State
# The generation_history must start empty.
inputs = {
    "query": user_query, 
    "generation_history": []
}

print(f"\n--- STARTING LANGGRAPH EXECUTION ---", flush=True)
print(f"Query: {user_query}\n", flush=True)

# 3. Stream the Execution
# This loop runs the graph and prints the state update after each node completes.
for step in app.stream(inputs):
    # Print the name of the node that just executed and its resulting state
    print(step, flush=True)
    print("\n--- NODE TRANSITION ---", flush=True) 

print(f"--- LANGGRAPH EXECUTION COMPLETE ---", flush=True)
# Execute the LangGraph Workflow

# 1. Define the Query
# This query is designed to require information from both documents.
user_query = "What is the primary purpose of the ICH E6(R3) Guideline and what are the key findings from the EFPIA 2024 inspection survey regarding remote inspections?"

# 2. Define the Initial Input State
# The generation_history must start empty.
inputs = {
    "query": user_query, 
    "generation_history": []
}

print(f"\n--- STARTING LANGGRAPH EXECUTION ---", flush=True)
print(f"Query: {user_query}\n", flush=True)

# 3. Stream the Execution
# This loop runs the graph and prints the state update after each node completes.
for step in app.stream(inputs):
    # Print the name of the node that just executed and its resulting state
    print(step, flush=True)
    print("\n--- NODE TRANSITION ---", flush=True) 

print(f"--- LANGGRAPH EXECUTION COMPLETE ---", flush=True)

--- STARTING LANGGRAPH EXECUTION ---
Query: What is the primary purpose of the ICH E6(R3) Guideline and what are the key findings from the EFPIA 2024 inspection survey regarding remote inspections?

Decision: Retrieval required.
{'initial_decision': {'query': 'What is the primary purpose of the ICH E6(R3) Guideline and what are the key findings from the EFPIA 2024 inspection survey regarding remote inspections?', 'retrieved_docs': [], 'critique_score': 0.0, 'segment_count': 0, 'finish_generation': False}}

--- NODE TRANSITION ---
Retrieving documents for: 'What is the primary purpose of the ICH E6(R3) Guid...'
{'retrieve_docs': {'retrieved_docs': [Document(id='399b5b94-41dd-460a-84ad-69d9da64aedb', metadata={'source': 'inspection_survey.pdf', 'page': 30, 'chunk_id': 'P30-T0'}, page_content='30\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nA strong basis of an inspection using physical \npresence by a strengthened domestic inspectorate\nCollaboration, Reliance, Recognition\nSCIENCE AND RISK-BASED INSPECTION APPROACH\nReal time \nremote presence \nDocument\nreview\nMutual Recognition \nAgreement (MRA)\nUnilateral \nreliance*\n*based on information from other stringent regulatory authorities'), Document(id='f487f42e-80e6-4514-b511-5b5aeeb2efce', metadata={'source': 'inspection_survey.pdf', 'page': 21, 'chunk_id': 'P21-T0'}, page_content='21\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\n• Guidance on good practices for desk assessment… for medical products regulatory decisions, WHO, \nTRS 1010 (2018), Annex 9.\n• Good reliance practices in the regulation of medical products: high level principles and \nconsiderations, WHO, TRS 1033, Annex 10, 2022, 237-267.\n• International regulators recommend use of remote inspections as complementary tool beyond \npandemic, EMA-News, 13. Dec 2022.\n• Guidance related to GMP/GDP and PMF: distant assessments. EMA/335293/2020, 15. Oct. 2020\n• Remote Interactive Evaluations of Drug…, FDA , Guidance for Industry, FDA-2020-D-1136, April 22\n• Conducting Remote Regulatory Assessments, Q&A, FDA draft guidance for industry, July 22\n• Joint Audit Programme for EEA GMP inspectorates - JAP Procedure (Rev.3)\n• Report on the review of regulatory flexibilities/agilities as implemented by National Regulatory'), Document(id='f0861537-f5b9-4d90-b13e-bf4041c79bc6', metadata={'source': 'inspection_survey.pdf', 'page': 18, 'chunk_id': 'P18-T0'}, page_content='18\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nEFPIA member companies welcome the expansion of the pilot and \nflexibility adopting the inspection approach\nThere is opportunity to put all in common practice\nMinimise increased efforts by using existing procedures e.g.,\nLocal inspectorate as lead and coordinating inspector\nUse other time zones for document inspection; use core inspection time for \nclarifications and interactions\nOne inspection report with one set of agreed observations; co-\ninspectors/observers to reference as needed\nNote: Why companies did not apply in 2024 pilot - responses received\nUncertainty about the return on investment\nBusiness priorities / resource allocation / management decisions\nLimited scope in the first round\nA reflections on the positive outcome of the ICMRA \nCollaborative Hybrid Inspection Pilot (CHIP)\nRELIANCE APPROACH - ANSWERS TO QUESTIONS'), Document(id='321a9682-f1ac-4db4-97a0-2ad2b87891e4', metadata={'source': 'inspection_survey.pdf', 'page': 91, 'chunk_id': 'P91-T0'}, page_content='91\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nEU/EEA: Trend of less remote inspections year by year after the pandemic\nUS: No trend but slightly decreasing\nRemote (evaluations) and document inspections\nINSPECTION ACTIVITY – EU/EEA AND US'), Document(id='12290a0d-111f-4373-bec0-5eac7eef3c06', metadata={'source': 'inspection_survey.pdf', 'page': 58, 'chunk_id': 'P58-T0'}, page_content='58\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nFor most inspectorates domestic inspections have more follow up action \nthan foreign inspections – but not e.g., US-FDA\nThe order of countries is different on follow up actions reported to be \naddressed in domestic versus foreign sites – different behaviours?\nRate of inspections with follow-up actions\nINSPECTIONS AT MANUFACTURING SITES\n\uf0a8Green: EU Member State')]}}

--- NODE TRANSITION ---
Generating Segment 1...
  -> ISREL: Relevant, ISSUP: Fully Supported, ISUSE: 5
{'generate_segment': {'generation_history': ['SEGMENT: The ICH E6(R3) Guideline primarily focuses on good clinical practice for design and conduct of clinical trials on medicinal products. It aims to harmonize ethical and scientific quality standards for all aspects of clinical trial conduct. Regarding remote inspections, the EFPIA 2024 inspection survey reveals that while there is a trend of fewer remote inspections in the EU/EEA post-pandemic, the US shows no clear trend, with a slight decrease. The survey also highlights the potential for minimizing increased efforts through existing procedures such as utilizing local inspectorates as leads, leveraging different time zones for document inspection, and ensuring one inspection report with agreed observations. However, uncertainty about return on investment and business priorities were cited as reasons for not applying in the 2024 pilot.\n\nISREL: Relevant\nISSUP: Fully Supported\nISUSE: 5\n\n<|END'], 'segment_count': 1, 'finish_generation': True, 'critique_score': 5.0, 'retrieved_docs': [Document(id='399b5b94-41dd-460a-84ad-69d9da64aedb', metadata={'source': 'inspection_survey.pdf', 'page': 30, 'chunk_id': 'P30-T0'}, page_content='30\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nA strong basis of an inspection using physical \npresence by a strengthened domestic inspectorate\nCollaboration, Reliance, Recognition\nSCIENCE AND RISK-BASED INSPECTION APPROACH\nReal time \nremote presence \nDocument\nreview\nMutual Recognition \nAgreement (MRA)\nUnilateral \nreliance*\n*based on information from other stringent regulatory authorities'), Document(id='f487f42e-80e6-4514-b511-5b5aeeb2efce', metadata={'source': 'inspection_survey.pdf', 'page': 21, 'chunk_id': 'P21-T0'}, page_content='21\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\n• Guidance on good practices for desk assessment… for medical products regulatory decisions, WHO, \nTRS 1010 (2018), Annex 9.\n• Good reliance practices in the regulation of medical products: high level principles and \nconsiderations, WHO, TRS 1033, Annex 10, 2022, 237-267.\n• International regulators recommend use of remote inspections as complementary tool beyond \npandemic, EMA-News, 13. Dec 2022.\n• Guidance related to GMP/GDP and PMF: distant assessments. EMA/335293/2020, 15. Oct. 2020\n• Remote Interactive Evaluations of Drug…, FDA , Guidance for Industry, FDA-2020-D-1136, April 22\n• Conducting Remote Regulatory Assessments, Q&A, FDA draft guidance for industry, July 22\n• Joint Audit Programme for EEA GMP inspectorates - JAP Procedure (Rev.3)\n• Report on the review of regulatory flexibilities/agilities as implemented by National Regulatory'), Document(id='f0861537-f5b9-4d90-b13e-bf4041c79bc6', metadata={'source': 'inspection_survey.pdf', 'page': 18, 'chunk_id': 'P18-T0'}, page_content='18\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nEFPIA member companies welcome the expansion of the pilot and \nflexibility adopting the inspection approach\nThere is opportunity to put all in common practice\nMinimise increased efforts by using existing procedures e.g.,\nLocal inspectorate as lead and coordinating inspector\nUse other time zones for document inspection; use core inspection time for \nclarifications and interactions\nOne inspection report with one set of agreed observations; co-\ninspectors/observers to reference as needed\nNote: Why companies did not apply in 2024 pilot - responses received\nUncertainty about the return on investment\nBusiness priorities / resource allocation / management decisions\nLimited scope in the first round\nA reflections on the positive outcome of the ICMRA \nCollaborative Hybrid Inspection Pilot (CHIP)\nRELIANCE APPROACH - ANSWERS TO QUESTIONS'), Document(id='321a9682-f1ac-4db4-97a0-2ad2b87891e4', metadata={'source': 'inspection_survey.pdf', 'page': 91, 'chunk_id': 'P91-T0'}, page_content='91\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nEU/EEA: Trend of less remote inspections year by year after the pandemic\nUS: No trend but slightly decreasing\nRemote (evaluations) and document inspections\nINSPECTION ACTIVITY – EU/EEA AND US'), Document(id='12290a0d-111f-4373-bec0-5eac7eef3c06', metadata={'source': 'inspection_survey.pdf', 'page': 58, 'chunk_id': 'P58-T0'}, page_content='58\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nFor most inspectorates domestic inspections have more follow up action \nthan foreign inspections – but not e.g., US-FDA\nThe order of countries is different on follow up actions reported to be \naddressed in domestic versus foreign sites – different behaviours?\nRate of inspections with follow-up actions\nINSPECTIONS AT MANUFACTURING SITES\n\uf0a8Green: EU Member State')]}}

--- NODE TRANSITION ---

--- FINAL ANSWER ---
SEGMENT: The ICH E6(R3) Guideline primarily focuses on good clinical practice for design and conduct of clinical trials on medicinal products. It aims to harmonize ethical and scientific quality standards for all aspects of clinical trial conduct. Regarding remote inspections, the EFPIA 2024 inspection survey reveals that while there is a trend of fewer remote inspections in the EU/EEA post-pandemic, the US shows no clear trend, with a slight decrease. The survey also highlights the potential for minimizing increased efforts through existing procedures such as utilizing local inspectorates as leads, leveraging different time zones for document inspection, and ensuring one inspection report with agreed observations. However, uncertainty about return on investment and business priorities were cited as reasons for not applying in the 2024 pilot.

ISREL: Relevant
ISSUP: Fully Supported
ISUSE: 5

<|END
{'finalize_answer': {'query': 'What is the primary purpose of the ICH E6(R3) Guideline and what are the key findings from the EFPIA 2024 inspection survey regarding remote inspections?', 'retrieved_docs': [Document(id='399b5b94-41dd-460a-84ad-69d9da64aedb', metadata={'source': 'inspection_survey.pdf', 'page': 30, 'chunk_id': 'P30-T0'}, page_content='30\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nA strong basis of an inspection using physical \npresence by a strengthened domestic inspectorate\nCollaboration, Reliance, Recognition\nSCIENCE AND RISK-BASED INSPECTION APPROACH\nReal time \nremote presence \nDocument\nreview\nMutual Recognition \nAgreement (MRA)\nUnilateral \nreliance*\n*based on information from other stringent regulatory authorities'), Document(id='f487f42e-80e6-4514-b511-5b5aeeb2efce', metadata={'source': 'inspection_survey.pdf', 'page': 21, 'chunk_id': 'P21-T0'}, page_content='21\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\n• Guidance on good practices for desk assessment… for medical products regulatory decisions, WHO, \nTRS 1010 (2018), Annex 9.\n• Good reliance practices in the regulation of medical products: high level principles and \nconsiderations, WHO, TRS 1033, Annex 10, 2022, 237-267.\n• International regulators recommend use of remote inspections as complementary tool beyond \npandemic, EMA-News, 13. Dec 2022.\n• Guidance related to GMP/GDP and PMF: distant assessments. EMA/335293/2020, 15. Oct. 2020\n• Remote Interactive Evaluations of Drug…, FDA , Guidance for Industry, FDA-2020-D-1136, April 22\n• Conducting Remote Regulatory Assessments, Q&A, FDA draft guidance for industry, July 22\n• Joint Audit Programme for EEA GMP inspectorates - JAP Procedure (Rev.3)\n• Report on the review of regulatory flexibilities/agilities as implemented by National Regulatory'), Document(id='f0861537-f5b9-4d90-b13e-bf4041c79bc6', metadata={'source': 'inspection_survey.pdf', 'page': 18, 'chunk_id': 'P18-T0'}, page_content='18\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nEFPIA member companies welcome the expansion of the pilot and \nflexibility adopting the inspection approach\nThere is opportunity to put all in common practice\nMinimise increased efforts by using existing procedures e.g.,\nLocal inspectorate as lead and coordinating inspector\nUse other time zones for document inspection; use core inspection time for \nclarifications and interactions\nOne inspection report with one set of agreed observations; co-\ninspectors/observers to reference as needed\nNote: Why companies did not apply in 2024 pilot - responses received\nUncertainty about the return on investment\nBusiness priorities / resource allocation / management decisions\nLimited scope in the first round\nA reflections on the positive outcome of the ICMRA \nCollaborative Hybrid Inspection Pilot (CHIP)\nRELIANCE APPROACH - ANSWERS TO QUESTIONS'), Document(id='321a9682-f1ac-4db4-97a0-2ad2b87891e4', metadata={'source': 'inspection_survey.pdf', 'page': 91, 'chunk_id': 'P91-T0'}, page_content='91\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nEU/EEA: Trend of less remote inspections year by year after the pandemic\nUS: No trend but slightly decreasing\nRemote (evaluations) and document inspections\nINSPECTION ACTIVITY – EU/EEA AND US'), Document(id='12290a0d-111f-4373-bec0-5eac7eef3c06', metadata={'source': 'inspection_survey.pdf', 'page': 58, 'chunk_id': 'P58-T0'}, page_content='58\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nFor most inspectorates domestic inspections have more follow up action \nthan foreign inspections – but not e.g., US-FDA\nThe order of countries is different on follow up actions reported to be \naddressed in domestic versus foreign sites – different behaviours?\nRate of inspections with follow-up actions\nINSPECTIONS AT MANUFACTURING SITES\n\uf0a8Green: EU Member State')], 'generation_history': ['SEGMENT: The ICH E6(R3) Guideline primarily focuses on good clinical practice for design and conduct of clinical trials on medicinal products. It aims to harmonize ethical and scientific quality standards for all aspects of clinical trial conduct. Regarding remote inspections, the EFPIA 2024 inspection survey reveals that while there is a trend of fewer remote inspections in the EU/EEA post-pandemic, the US shows no clear trend, with a slight decrease. The survey also highlights the potential for minimizing increased efforts through existing procedures such as utilizing local inspectorates as leads, leveraging different time zones for document inspection, and ensuring one inspection report with agreed observations. However, uncertainty about return on investment and business priorities were cited as reasons for not applying in the 2024 pilot.\n\nISREL: Relevant\nISSUP: Fully Supported\nISUSE: 5\n\n<|END'], 'critique_score': 5.0, 'segment_count': 1, 'finish_generation': True}}

--- NODE TRANSITION ---
--- LANGGRAPH EXECUTION COMPLETE ---

In [ ]:

Copied!





# Final Answer Extraction and Review


final_state = None
# The query and inputs are reused from Step 10
user_query = "What is the primary purpose of the ICH E6(R3) Guideline and what are the key findings from the EFPIA 2024 inspection survey regarding remote inspections?"
inputs = {"query": user_query, "generation_history": []}


print("\n--- RE-RUNNING EXECUTION FOR FINAL EXTRACTION ---", flush=True)

for step in app.stream(inputs):
    for key, value in step.items():
        # The key tells us which node just ran (e.g., 'finalize_answer')
        # The value is the state output of that node
        if key == "finalize_answer":
            final_state = value 
        elif key == END:
            # If the END node is hit, the graph is finished
            final_state = value 

# 2. Extract and Format the Final Answer
if final_state and "generation_history" in final_state:
    # Join all generated segments into one cohesive answer
    final_answer = "\n".join(final_state["generation_history"]).strip()

    print("\n==============================================", flush=True)
    print(" RAG PIPELINE COMPLETE", flush=True)
    print("==============================================", flush=True)
    print(f"USER QUERY:\n{user_query}\n", flush=True)
    print(f"FINAL GENERATED ANSWER ({final_state['segment_count']} segments):", flush=True)
    print("----------------------------------------------", flush=True)
    print(final_answer, flush=True)
    print("----------------------------------------------", flush=True)
else:
    print("\n EXECUTION FAILED or final state was not captured.", flush=True)
    print(f"Last recorded state: {final_state}", flush=True)
# Final Answer Extraction and Review


final_state = None
# The query and inputs are reused from Step 10
user_query = "What is the primary purpose of the ICH E6(R3) Guideline and what are the key findings from the EFPIA 2024 inspection survey regarding remote inspections?"
inputs = {"query": user_query, "generation_history": []}


print("\n--- RE-RUNNING EXECUTION FOR FINAL EXTRACTION ---", flush=True)

for step in app.stream(inputs):
    for key, value in step.items():
        # The key tells us which node just ran (e.g., 'finalize_answer')
        # The value is the state output of that node
        if key == "finalize_answer":
            final_state = value 
        elif key == END:
            # If the END node is hit, the graph is finished
            final_state = value 

# 2. Extract and Format the Final Answer
if final_state and "generation_history" in final_state:
    # Join all generated segments into one cohesive answer
    final_answer = "\n".join(final_state["generation_history"]).strip()

    print("\n==============================================", flush=True)
    print(" RAG PIPELINE COMPLETE", flush=True)
    print("==============================================", flush=True)
    print(f"USER QUERY:\n{user_query}\n", flush=True)
    print(f"FINAL GENERATED ANSWER ({final_state['segment_count']} segments):", flush=True)
    print("----------------------------------------------", flush=True)
    print(final_answer, flush=True)
    print("----------------------------------------------", flush=True)
else:
    print("\n EXECUTION FAILED or final state was not captured.", flush=True)
    print(f"Last recorded state: {final_state}", flush=True)

--- RE-RUNNING EXECUTION FOR FINAL EXTRACTION ---
Decision: Retrieval required.
Retrieving documents for: 'What is the primary purpose of the ICH E6(R3) Guid...'
Generating Segment 1...
  -> ISREL: Relevant, ISSUP: Fully Supported, ISUSE: 5

--- FINAL ANSWER ---
SEGMENT: The ICH E6(R3) Guideline primarily focuses on good clinical practice for design and conduct of clinical trials on medicinal products. It aims to harmonize these practices across different regions to ensure the protection of human subjects involved in clinical trials and the quality and integrity of the data generated. Regarding remote inspections, the EFPIA 2024 inspection survey reveals that while there is a trend of fewer remote inspections in the EU/EEA post-pandemic, the US shows no clear trend, with a slight decrease. The survey also highlights the potential for minimizing increased efforts through strategies like utilizing local inspectorates as leads, leveraging different time zones for document reviews, and producing one inspection report with agreed observations. However, uncertainty about return on investment and business priorities were cited as reasons for not applying in the 2024 pilot.

ISREL: Relevant
ISSUP: Fully Supported
ISUSE: 5

<|END

==============================================
✅ RAG PIPELINE COMPLETE
==============================================
USER QUERY:
What is the primary purpose of the ICH E6(R3) Guideline and what are the key findings from the EFPIA 2024 inspection survey regarding remote inspections?

FINAL GENERATED ANSWER (1 segments):
----------------------------------------------
SEGMENT: The ICH E6(R3) Guideline primarily focuses on good clinical practice for design and conduct of clinical trials on medicinal products. It aims to harmonize these practices across different regions to ensure the protection of human subjects involved in clinical trials and the quality and integrity of the data generated. Regarding remote inspections, the EFPIA 2024 inspection survey reveals that while there is a trend of fewer remote inspections in the EU/EEA post-pandemic, the US shows no clear trend, with a slight decrease. The survey also highlights the potential for minimizing increased efforts through strategies like utilizing local inspectorates as leads, leveraging different time zones for document reviews, and producing one inspection report with agreed observations. However, uncertainty about return on investment and business priorities were cited as reasons for not applying in the 2024 pilot.

ISREL: Relevant
ISSUP: Fully Supported
ISUSE: 5

<|END
----------------------------------------------

Once the agent either hits the maximum number of segments or completes its multi-segment answer, it produces the final output to the user question. The .stream() method is then used to run the compiled graph, represented by the app object.

The initial state, which contains the detailed user_query, is passed in through the inputs dictionary.

As the graph streams, each loop processes one node at a time based on the system’s internal logic. Every node’s output is printed as it runs, letting us watch the agent refine its reasoning in real time and build its multi-part response ultimately ending with a well-supported final answer. The final step reruns the full self-RAG workflow to create a refined answer. It executes the LangGraph and watches the streaming state updates until the finalize_answer or END node shows up. It pulls the generated segments and joins them into a grounded final answer whenever the final state is reached.

In [ ]:

Copied!





# 1. Define the query and inputs for the graph execution
user_query = "What does the ICH ER6 guideline say about Quality Assurance and Quality Control?"
inputs = {"query": user_query, "generation_history": []}

# 2. Rerun stream and capture the final state
print("--- Rerunning stream to answer the new query ---", flush=True)

final_state = None
# This loop runs the graph and prints the state update after each node completes.
for step in app.stream(inputs):
    print(step, flush=True)
    print("\n--- NODE TRANSITION ---\n", flush=True)
    # Capture the last yielded state (which contains the compiled final history)
    for key, value in step.items():
        if key != END:
            final_state = value

# 3. Extract and Format the Final Answer
if final_state and "generation_history" in final_state:
    # Join all generated segments (which should now be clean)
    final_answer_text = "\n".join(final_state["generation_history"]).strip()
    
    # Run a final cleanup pass
    final_answer_text = re.sub(r'\s*(Relevant|Irrelevant)\s*(Fully Supported|Partially Supported|No Support)\s*\d', '', final_answer_text).strip()
    final_answer_text = final_answer_text.replace("<|END", "").strip()

    # 4. Present the Results
    print("\n\n#####################################################", flush=True)
    print("            FINAL SELF-RAG ANSWER                 ", flush=True)
    print("#####################################################\n", flush=True)

    print("--- ANSWER ---", flush=True)
    print(final_answer_text, flush=True)
    print("\n#####################################################", flush=True)

else:
    print("\n EXECUTION FAILED or final state was not captured.", flush=True)

# 1. Define the query and inputs for the graph execution
user_query = "What does the ICH ER6 guideline say about Quality Assurance and Quality Control?"
inputs = {"query": user_query, "generation_history": []}

# 2. Rerun stream and capture the final state
print("--- Rerunning stream to answer the new query ---", flush=True)

final_state = None
# This loop runs the graph and prints the state update after each node completes.
for step in app.stream(inputs):
    print(step, flush=True)
    print("\n--- NODE TRANSITION ---\n", flush=True)
    # Capture the last yielded state (which contains the compiled final history)
    for key, value in step.items():
        if key != END:
            final_state = value

# 3. Extract and Format the Final Answer
if final_state and "generation_history" in final_state:
    # Join all generated segments (which should now be clean)
    final_answer_text = "\n".join(final_state["generation_history"]).strip()
    
    # Run a final cleanup pass
    final_answer_text = re.sub(r'\s*(Relevant|Irrelevant)\s*(Fully Supported|Partially Supported|No Support)\s*\d', '', final_answer_text).strip()
    final_answer_text = final_answer_text.replace("<|END", "").strip()

    # 4. Present the Results
    print("\n\n#####################################################", flush=True)
    print("            FINAL SELF-RAG ANSWER                 ", flush=True)
    print("#####################################################\n", flush=True)

    print("--- ANSWER ---", flush=True)
    print(final_answer_text, flush=True)
    print("\n#####################################################", flush=True)

else:
    print("\n EXECUTION FAILED or final state was not captured.", flush=True)

--- Rerunning stream to answer the new query ---
Decision: Retrieval required.
{'initial_decision': {'query': 'What does the ICH ER6 guideline say about Quality Assurance and Quality Control?', 'retrieved_docs': [], 'critique_score': 0.0, 'segment_count': 0, 'finish_generation': False}}

--- NODE TRANSITION ---

Retrieving documents for: 'What does the ICH ER6 guideline say about Quality ...'
{'retrieve_docs': {'retrieved_docs': [Document(id='7908cddc-7be4-46e8-b8dd-1109af8da414', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 84, 'chunk_id': 'P84-T0'}, page_content='ICH E6(R3) Guideline \n \n77 \n \nQuality Assurance (QA) \n \nAll those planned and systematic actions that are established to ensure that the trial is performed \nand the data are generated, documented (recorded) and reported in compliance with GCP and \nthe applicable regulatory requirement(s). \n \nQuality Control (QC) \n \nThe operational techniques and activities undertaken to verify that the requirements for quality \nof the trial-related activities have been fulfilled. \n \nRandomisation \n \nThe process of deliberately including an element of chance when assigning participants to \ngroups that receive different treatments in order to reduce bias. \n \nReference Safety Information (RSI) \n \nContains a cumulative list of ADRs that are expected for the investigational product being \nadministered to participants in a clinical trial. The RSI is included in the Investigator’s \nBrochure or alternative documents according to applicable regulatory requirements. Refer to'), Document(id='33d0ab06-cc53-4d46-80bb-42bcf7069aed', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 36, 'chunk_id': 'P36-T0'}, page_content='ICH E6(R3) Guideline \n \n29 \n \n3.10.1.5 Risk Review  \n \nThe sponsor should periodically review risk control measures to ascertain whether the \nimplemented quality management activities remain effective and relevant, taking into \naccount emerging knowledge and experience. Additional risk control measures may \nbe implemented as needed. \n \n3.10.1.6 Risk Reporting \n \nThe sponsor should summarise and report important quality issues (including \ninstances in which acceptable ranges are exceeded, as detailed in section 3.10.1.3) and \nthe remedial actions taken and document them in the clinical trial report (see ICH E3).  \n \n3.11 \nQuality Assurance and Quality Control \n \nThe sponsor is responsible for establishing, implementing and maintaining \nappropriate quality assurance and quality control processes and documented \nprocedures to ensure that trials are conducted and data are generated, recorded and \nreported in compliance with the protocol, GCP and the applicable regulatory \nrequirement(s).'), Document(id='8311fdb3-701b-4698-a31d-b1e26efdee8d', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 69, 'chunk_id': 'P69-T0'}, page_content='ICH E6(R3) Guideline \n \n62 \n \nB.12 \nQuality Control and Quality Assurance  \n \nB.12.1 Description of identified critical to quality factors, associated risks and risk mitigation \nstrategies in the trial unless documented elsewhere. \n \nB.12.2 Summary of the monitoring approaches that are part of the quality control process for \nthe clinical trial. \n \nB.12.3 Description of the process for the handling of noncompliance with the protocol or \nGCP. \n \nB.13 \nEthics  \n \nDescription of ethical considerations relating to the trial.  \n \nB.14 \nData Handling and Record Keeping  \n \nB.14.1 Specification of data to be collected and the method of its collection. Where necessary, \nadditional details should be contained in a clinical trial-related document.  \n \nB.14.2 The identification of data to be recorded directly into the data acquisition tools (i.e., \nno prior written or electronic record of data) and considered to be the source record.'), Document(id='8d26d7ee-6b6a-4d98-9fd4-f0cd145d7f51', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 3, 'chunk_id': 'P3-T0'}, page_content='ICH E6(R3) Guideline \n \ni \n \nICH HARMONISED GUIDELINE \nGUIDELINE FOR GOOD CLINICAL PRACTICE \nE6(R3) \nICH Consensus Guideline \nTABLE OF CONTENTS \nI. \nINTRODUCTION........................................................................................................ 1 \nGuideline Scope ......................................................................................................................... 1 \nGuideline Structure .................................................................................................................... 1 \nII. \nPRINCIPLES OF ICH GCP ....................................................................................... 2 \nIII. \nANNEX 1 ...................................................................................................................... 7 \n1. \nINSTITUTIONAL REVIEW BOARD/INDEPENDENT ETHICS \nCOMMITTEE (IRB/IEC) ........................................................................................... 7 \n1.1'), Document(id='7a470ca1-74f8-4dec-bca6-2d770a96b517', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 1, 'chunk_id': 'P1-T0'}, page_content='INTERNATIONAL COUNCIL FOR HARMONISATION OF TECHNICAL \nREQUIREMENTS FOR PHARMACEUTICALS FOR HUMAN USE \n \n \n \nICH HARMONISED GUIDELINE \nGUIDELINE FOR GOOD CLINICAL PRACTICE \nE6(R3) \n \n \nFinal version  \nAdopted on 06 January 2025 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \nThis Guideline has been developed by the appropriate ICH Expert Working Group and has been \nsubject to consultation by the regulatory parties, in accordance with the ICH Process. At Step 4 \nof the Process the final draft is recommended for adoption to the regulatory bodies of ICH regions.')]}}

--- NODE TRANSITION ---

Generating Segment 1...
  -> ISREL: Relevant, ISSUP: Fully Supported, ISUSE: 5
{'generate_segment': {'generation_history': ['"""\n\n\n## Segment 1:\n\nThe ICH E6(R3) guideline outlines the principles for ensuring that clinical trials are conducted in compliance with Good Clinical Practice (GCP) and applicable regulatory requirements. It defines Quality Assurance (QA) as all planned and systematic actions to ensure compliance with GCP and regulations during the trial and data generation. Quality Control (QC) involves operational techniques and activities to verify that quality requirements for trial-related activities are met.\n\nThe guideline emphasizes the sponsor\'s responsibility to establish, implement, and maintain appropriate QA and QC processes with documented procedures. This ensures that trials are conducted and data are generated, recorded, and reported correctly. The sponsor must periodically review risk control measures to ensure their effectiveness and relevance, considering emerging knowledge and experience. Important quality issues should be summarized and reported in the clinical trial report.\n\nThe guideline also includes sections on risk review, risk reporting, and ethical considerations. It requires specifying data collection methods and identifying data to be recorded directly into data acquisition tools as source records. Additionally, it addresses critical-to-quality factors, associated risks, and risk mitigation strategies, as well as monitoring approaches for quality control.\n\nISREL: Relevant\n\nISSUP: Fully Supported\n\nISUSE: 5\n\n<|END'], 'segment_count': 1, 'finish_generation': True, 'critique_score': 5.0, 'retrieved_docs': [Document(id='7908cddc-7be4-46e8-b8dd-1109af8da414', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 84, 'chunk_id': 'P84-T0'}, page_content='ICH E6(R3) Guideline \n \n77 \n \nQuality Assurance (QA) \n \nAll those planned and systematic actions that are established to ensure that the trial is performed \nand the data are generated, documented (recorded) and reported in compliance with GCP and \nthe applicable regulatory requirement(s). \n \nQuality Control (QC) \n \nThe operational techniques and activities undertaken to verify that the requirements for quality \nof the trial-related activities have been fulfilled. \n \nRandomisation \n \nThe process of deliberately including an element of chance when assigning participants to \ngroups that receive different treatments in order to reduce bias. \n \nReference Safety Information (RSI) \n \nContains a cumulative list of ADRs that are expected for the investigational product being \nadministered to participants in a clinical trial. The RSI is included in the Investigator’s \nBrochure or alternative documents according to applicable regulatory requirements. Refer to'), Document(id='33d0ab06-cc53-4d46-80bb-42bcf7069aed', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 36, 'chunk_id': 'P36-T0'}, page_content='ICH E6(R3) Guideline \n \n29 \n \n3.10.1.5 Risk Review  \n \nThe sponsor should periodically review risk control measures to ascertain whether the \nimplemented quality management activities remain effective and relevant, taking into \naccount emerging knowledge and experience. Additional risk control measures may \nbe implemented as needed. \n \n3.10.1.6 Risk Reporting \n \nThe sponsor should summarise and report important quality issues (including \ninstances in which acceptable ranges are exceeded, as detailed in section 3.10.1.3) and \nthe remedial actions taken and document them in the clinical trial report (see ICH E3).  \n \n3.11 \nQuality Assurance and Quality Control \n \nThe sponsor is responsible for establishing, implementing and maintaining \nappropriate quality assurance and quality control processes and documented \nprocedures to ensure that trials are conducted and data are generated, recorded and \nreported in compliance with the protocol, GCP and the applicable regulatory \nrequirement(s).'), Document(id='8311fdb3-701b-4698-a31d-b1e26efdee8d', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 69, 'chunk_id': 'P69-T0'}, page_content='ICH E6(R3) Guideline \n \n62 \n \nB.12 \nQuality Control and Quality Assurance  \n \nB.12.1 Description of identified critical to quality factors, associated risks and risk mitigation \nstrategies in the trial unless documented elsewhere. \n \nB.12.2 Summary of the monitoring approaches that are part of the quality control process for \nthe clinical trial. \n \nB.12.3 Description of the process for the handling of noncompliance with the protocol or \nGCP. \n \nB.13 \nEthics  \n \nDescription of ethical considerations relating to the trial.  \n \nB.14 \nData Handling and Record Keeping  \n \nB.14.1 Specification of data to be collected and the method of its collection. Where necessary, \nadditional details should be contained in a clinical trial-related document.  \n \nB.14.2 The identification of data to be recorded directly into the data acquisition tools (i.e., \nno prior written or electronic record of data) and considered to be the source record.'), Document(id='8d26d7ee-6b6a-4d98-9fd4-f0cd145d7f51', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 3, 'chunk_id': 'P3-T0'}, page_content='ICH E6(R3) Guideline \n \ni \n \nICH HARMONISED GUIDELINE \nGUIDELINE FOR GOOD CLINICAL PRACTICE \nE6(R3) \nICH Consensus Guideline \nTABLE OF CONTENTS \nI. \nINTRODUCTION........................................................................................................ 1 \nGuideline Scope ......................................................................................................................... 1 \nGuideline Structure .................................................................................................................... 1 \nII. \nPRINCIPLES OF ICH GCP ....................................................................................... 2 \nIII. \nANNEX 1 ...................................................................................................................... 7 \n1. \nINSTITUTIONAL REVIEW BOARD/INDEPENDENT ETHICS \nCOMMITTEE (IRB/IEC) ........................................................................................... 7 \n1.1'), Document(id='7a470ca1-74f8-4dec-bca6-2d770a96b517', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 1, 'chunk_id': 'P1-T0'}, page_content='INTERNATIONAL COUNCIL FOR HARMONISATION OF TECHNICAL \nREQUIREMENTS FOR PHARMACEUTICALS FOR HUMAN USE \n \n \n \nICH HARMONISED GUIDELINE \nGUIDELINE FOR GOOD CLINICAL PRACTICE \nE6(R3) \n \n \nFinal version  \nAdopted on 06 January 2025 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \nThis Guideline has been developed by the appropriate ICH Expert Working Group and has been \nsubject to consultation by the regulatory parties, in accordance with the ICH Process. At Step 4 \nof the Process the final draft is recommended for adoption to the regulatory bodies of ICH regions.')]}}

--- NODE TRANSITION ---


--- FINAL ANSWER ---
"""


## Segment 1:

The ICH E6(R3) guideline outlines the principles for ensuring that clinical trials are conducted in compliance with Good Clinical Practice (GCP) and applicable regulatory requirements. It defines Quality Assurance (QA) as all planned and systematic actions to ensure compliance with GCP and regulations during the trial and data generation. Quality Control (QC) involves operational techniques and activities to verify that quality requirements for trial-related activities are met.

The guideline emphasizes the sponsor's responsibility to establish, implement, and maintain appropriate QA and QC processes with documented procedures. This ensures that trials are conducted and data are generated, recorded, and reported correctly. The sponsor must periodically review risk control measures to ensure their effectiveness and relevance, considering emerging knowledge and experience. Important quality issues should be summarized and reported in the clinical trial report.

The guideline also includes sections on risk review, risk reporting, and ethical considerations. It requires specifying data collection methods and identifying data to be recorded directly into data acquisition tools as source records. Additionally, it addresses critical-to-quality factors, associated risks, and risk mitigation strategies, as well as monitoring approaches for quality control.

ISREL: Relevant

ISSUP: Fully Supported

ISUSE: 5

<|END
{'finalize_answer': {'query': 'What does the ICH ER6 guideline say about Quality Assurance and Quality Control?', 'retrieved_docs': [Document(id='7908cddc-7be4-46e8-b8dd-1109af8da414', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 84, 'chunk_id': 'P84-T0'}, page_content='ICH E6(R3) Guideline \n \n77 \n \nQuality Assurance (QA) \n \nAll those planned and systematic actions that are established to ensure that the trial is performed \nand the data are generated, documented (recorded) and reported in compliance with GCP and \nthe applicable regulatory requirement(s). \n \nQuality Control (QC) \n \nThe operational techniques and activities undertaken to verify that the requirements for quality \nof the trial-related activities have been fulfilled. \n \nRandomisation \n \nThe process of deliberately including an element of chance when assigning participants to \ngroups that receive different treatments in order to reduce bias. \n \nReference Safety Information (RSI) \n \nContains a cumulative list of ADRs that are expected for the investigational product being \nadministered to participants in a clinical trial. The RSI is included in the Investigator’s \nBrochure or alternative documents according to applicable regulatory requirements. Refer to'), Document(id='33d0ab06-cc53-4d46-80bb-42bcf7069aed', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 36, 'chunk_id': 'P36-T0'}, page_content='ICH E6(R3) Guideline \n \n29 \n \n3.10.1.5 Risk Review  \n \nThe sponsor should periodically review risk control measures to ascertain whether the \nimplemented quality management activities remain effective and relevant, taking into \naccount emerging knowledge and experience. Additional risk control measures may \nbe implemented as needed. \n \n3.10.1.6 Risk Reporting \n \nThe sponsor should summarise and report important quality issues (including \ninstances in which acceptable ranges are exceeded, as detailed in section 3.10.1.3) and \nthe remedial actions taken and document them in the clinical trial report (see ICH E3).  \n \n3.11 \nQuality Assurance and Quality Control \n \nThe sponsor is responsible for establishing, implementing and maintaining \nappropriate quality assurance and quality control processes and documented \nprocedures to ensure that trials are conducted and data are generated, recorded and \nreported in compliance with the protocol, GCP and the applicable regulatory \nrequirement(s).'), Document(id='8311fdb3-701b-4698-a31d-b1e26efdee8d', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 69, 'chunk_id': 'P69-T0'}, page_content='ICH E6(R3) Guideline \n \n62 \n \nB.12 \nQuality Control and Quality Assurance  \n \nB.12.1 Description of identified critical to quality factors, associated risks and risk mitigation \nstrategies in the trial unless documented elsewhere. \n \nB.12.2 Summary of the monitoring approaches that are part of the quality control process for \nthe clinical trial. \n \nB.12.3 Description of the process for the handling of noncompliance with the protocol or \nGCP. \n \nB.13 \nEthics  \n \nDescription of ethical considerations relating to the trial.  \n \nB.14 \nData Handling and Record Keeping  \n \nB.14.1 Specification of data to be collected and the method of its collection. Where necessary, \nadditional details should be contained in a clinical trial-related document.  \n \nB.14.2 The identification of data to be recorded directly into the data acquisition tools (i.e., \nno prior written or electronic record of data) and considered to be the source record.'), Document(id='8d26d7ee-6b6a-4d98-9fd4-f0cd145d7f51', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 3, 'chunk_id': 'P3-T0'}, page_content='ICH E6(R3) Guideline \n \ni \n \nICH HARMONISED GUIDELINE \nGUIDELINE FOR GOOD CLINICAL PRACTICE \nE6(R3) \nICH Consensus Guideline \nTABLE OF CONTENTS \nI. \nINTRODUCTION........................................................................................................ 1 \nGuideline Scope ......................................................................................................................... 1 \nGuideline Structure .................................................................................................................... 1 \nII. \nPRINCIPLES OF ICH GCP ....................................................................................... 2 \nIII. \nANNEX 1 ...................................................................................................................... 7 \n1. \nINSTITUTIONAL REVIEW BOARD/INDEPENDENT ETHICS \nCOMMITTEE (IRB/IEC) ........................................................................................... 7 \n1.1'), Document(id='7a470ca1-74f8-4dec-bca6-2d770a96b517', metadata={'source': 'ICH_E6(R3)_Guideline.pdf', 'page': 1, 'chunk_id': 'P1-T0'}, page_content='INTERNATIONAL COUNCIL FOR HARMONISATION OF TECHNICAL \nREQUIREMENTS FOR PHARMACEUTICALS FOR HUMAN USE \n \n \n \nICH HARMONISED GUIDELINE \nGUIDELINE FOR GOOD CLINICAL PRACTICE \nE6(R3) \n \n \nFinal version  \nAdopted on 06 January 2025 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \nThis Guideline has been developed by the appropriate ICH Expert Working Group and has been \nsubject to consultation by the regulatory parties, in accordance with the ICH Process. At Step 4 \nof the Process the final draft is recommended for adoption to the regulatory bodies of ICH regions.')], 'generation_history': ['"""\n\n\n## Segment 1:\n\nThe ICH E6(R3) guideline outlines the principles for ensuring that clinical trials are conducted in compliance with Good Clinical Practice (GCP) and applicable regulatory requirements. It defines Quality Assurance (QA) as all planned and systematic actions to ensure compliance with GCP and regulations during the trial and data generation. Quality Control (QC) involves operational techniques and activities to verify that quality requirements for trial-related activities are met.\n\nThe guideline emphasizes the sponsor\'s responsibility to establish, implement, and maintain appropriate QA and QC processes with documented procedures. This ensures that trials are conducted and data are generated, recorded, and reported correctly. The sponsor must periodically review risk control measures to ensure their effectiveness and relevance, considering emerging knowledge and experience. Important quality issues should be summarized and reported in the clinical trial report.\n\nThe guideline also includes sections on risk review, risk reporting, and ethical considerations. It requires specifying data collection methods and identifying data to be recorded directly into data acquisition tools as source records. Additionally, it addresses critical-to-quality factors, associated risks, and risk mitigation strategies, as well as monitoring approaches for quality control.\n\nISREL: Relevant\n\nISSUP: Fully Supported\n\nISUSE: 5\n\n<|END'], 'critique_score': 5.0, 'segment_count': 1, 'finish_generation': True}}

--- NODE TRANSITION ---



#####################################################
           ✅ FINAL SELF-RAG ANSWER ✅                 
#####################################################

--- ANSWER ---
"""


## Segment 1:

The ICH E6(R3) guideline outlines the principles for ensuring that clinical trials are conducted in compliance with Good Clinical Practice (GCP) and applicable regulatory requirements. It defines Quality Assurance (QA) as all planned and systematic actions to ensure compliance with GCP and regulations during the trial and data generation. Quality Control (QC) involves operational techniques and activities to verify that quality requirements for trial-related activities are met.

The guideline emphasizes the sponsor's responsibility to establish, implement, and maintain appropriate QA and QC processes with documented procedures. This ensures that trials are conducted and data are generated, recorded, and reported correctly. The sponsor must periodically review risk control measures to ensure their effectiveness and relevance, considering emerging knowledge and experience. Important quality issues should be summarized and reported in the clinical trial report.

The guideline also includes sections on risk review, risk reporting, and ethical considerations. It requires specifying data collection methods and identifying data to be recorded directly into data acquisition tools as source records. Additionally, it addresses critical-to-quality factors, associated risks, and risk mitigation strategies, as well as monitoring approaches for quality control.

ISREL: Relevant

ISSUP: Fully Supported

ISUSE: 5

#####################################################

In [ ]:

Copied!





# USER QUERY: According to EFPIA 2024 data on multiple inspections at manufacturing sites, which countries recorded the highest inspection counts per site, and what does this reveal about their regulatory significance?
# 1. Define the query and inputs for the graph execution
user_query = "According to EFPIA 2024 data on multiple inspections at manufacturing sites, which countries recorded the highest inspection counts per site, and what does this reveal about their regulatory significance?"
inputs = {"query": user_query, "generation_history": []}

# 2. Rerun stream and capture the final state
print("--- Rerunning stream to answer the new combined query ---", flush=True)

final_state = None
# This loop runs the graph and prints the state update after each node completes.
for step in app.stream(inputs):
    print(step, flush=True)
    print("\n--- NODE TRANSITION ---\n", flush=True)
    # Capture the last yielded state (which contains the compiled final history)
    for key, value in step.items():
        if key != END:
            final_state = value

# 3. Extract and Format the Final Answer
if final_state and "generation_history" in final_state:
    # Join all generated segments (which should now be clean)
    final_answer_text = "\n".join(final_state["generation_history"]).strip()
    
    # Run a final cleanup pass
    final_answer_text = re.sub(r'\s*(Relevant|Irrelevant)\s*(Fully Supported|Partially Supported|No Support)\s*\d', '', final_answer_text).strip()
    final_answer_text = final_answer_text.replace("<|END", "").strip()

    # 4. Present the Results
    print("\n\n#####################################################", flush=True)
    print("            FINAL SELF-RAG ANSWER                  ", flush=True)
    print("#####################################################\n", flush=True)

    print("--- ANSWER ---", flush=True)
    print(final_answer_text, flush=True)
    print("\n#####################################################", flush=True)

else:
    print("\n EXECUTION FAILED or final state was not captured.", flush=True)
# USER QUERY: According to EFPIA 2024 data on multiple inspections at manufacturing sites, which countries recorded the highest inspection counts per site, and what does this reveal about their regulatory significance?
# 1. Define the query and inputs for the graph execution
user_query = "According to EFPIA 2024 data on multiple inspections at manufacturing sites, which countries recorded the highest inspection counts per site, and what does this reveal about their regulatory significance?"
inputs = {"query": user_query, "generation_history": []}

# 2. Rerun stream and capture the final state
print("--- Rerunning stream to answer the new combined query ---", flush=True)

final_state = None
# This loop runs the graph and prints the state update after each node completes.
for step in app.stream(inputs):
    print(step, flush=True)
    print("\n--- NODE TRANSITION ---\n", flush=True)
    # Capture the last yielded state (which contains the compiled final history)
    for key, value in step.items():
        if key != END:
            final_state = value

# 3. Extract and Format the Final Answer
if final_state and "generation_history" in final_state:
    # Join all generated segments (which should now be clean)
    final_answer_text = "\n".join(final_state["generation_history"]).strip()
    
    # Run a final cleanup pass
    final_answer_text = re.sub(r'\s*(Relevant|Irrelevant)\s*(Fully Supported|Partially Supported|No Support)\s*\d', '', final_answer_text).strip()
    final_answer_text = final_answer_text.replace("<|END", "").strip()

    # 4. Present the Results
    print("\n\n#####################################################", flush=True)
    print("            FINAL SELF-RAG ANSWER                  ", flush=True)
    print("#####################################################\n", flush=True)

    print("--- ANSWER ---", flush=True)
    print(final_answer_text, flush=True)
    print("\n#####################################################", flush=True)

else:
    print("\n EXECUTION FAILED or final state was not captured.", flush=True)

--- Rerunning stream to answer the new combined query ---
Decision: Retrieval required.
{'initial_decision': {'query': 'According to EFPIA 2024 data on multiple inspections at manufacturing sites, which countries recorded the highest inspection counts per site, and what does this reveal about their regulatory significance?', 'retrieved_docs': [], 'critique_score': 0.0, 'segment_count': 0, 'finish_generation': False}}

--- NODE TRANSITION ---

Retrieving documents for: 'According to EFPIA 2024 data on multiple inspectio...'
{'retrieve_docs': {'retrieved_docs': [Document(id='6f221e35-3bae-4e4d-a47f-8e6df714b254', metadata={'source': 'inspection_survey.pdf', 'page': 44, 'chunk_id': 'P44-T0'}, page_content='44\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nMultiple inspections at one manufacturing site (6 and more)\nINSPECTIONS AT MANUFACTURING SITES\n.\n.\nUS\n• AMA\n• Tanzania\n• Brazil\n• Russia\n• Türkiye\nGermany\n• US-FDA (2)\n• Belarus\n• Türkiye\nFrance\n• US-FDA\n• Türkiye\n• Rep. of Korea\n• China\n• EAEU\nDenmark\n• Japan (4)\n• Brazil (2)\n• US-FDA\n• Türkiye\n• Kenya\n• Chinese Taipei\n• Rep. of Korea\nGermany\n• Japan\n• Türkiye\n• Belarus\nGermany\n• China\n• Russia\n• Türkiye\n• Libya\nUS\n• Japan\nGermany\n• Russia\nDenmark\n• Japan (3)\n• Rep. of Korea (3)\n• Brazil \n• Chinese Taipei\n• Türkiye\n• Kenya\nBrazil\n• Mexico\nMost sites exposed to multiple foreign \ninspections are in Germany (4) and Denmark (2)'), Document(id='ca2bdd03-24e9-4ff7-bb79-1e94ef40bd87', metadata={'source': 'inspection_survey.pdf', 'page': 5, 'chunk_id': 'P5-T0'}, page_content='5\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nNumber of foreign inspections at manufacturing sites \n(EU as one entity; all inspection types and modes)\nEFPIA’S ANNUAL INSPECTION SURVEY - DATA\n> 28\n*Inspectorate is a PIC/S participating authority   **PIC/S Applicant     ***PIC/S Pre-Applicant\nPerformed by\n\uf0b3 9\n+ 9 countries with one  foreign inspection\n[same level as last years]'), Document(id='695b6077-e2cc-46a7-929f-22e68f24c037', metadata={'source': 'inspection_survey.pdf', 'page': 9, 'chunk_id': 'P9-T0'}, page_content='9\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nOversight\n80% of foreign inspections are in EU/EEA, US and Switzerland; \nthis demonstrates where research-based industry is manufacturing \nEU/EEA (50%), US (27%), Switzerland (5%), China (4%), Singapore (3%), \nand Brazil (3%)\nFor the EU/EEA it’s mainly in Germany, France, Belgium, Denmark, \nIreland, Italy\nSites most exposed to multiple foreign inspections are in Germany (4) \nand Denmark (2)\nNumber of foreign inspections at manufacturing sites post pandemic is \nslightly decreasing to the baseline before the pandemic\nIndividual sites receive more foreign inspections - higher than before the \npandemic and highest ever\nForeign inspections at manufacturing sites\nEFPIA’S ANNUAL INSPECTION SURVEY - DATA\nManufacturing'), Document(id='36885416-d5ff-49cf-84e8-de7320c30d56', metadata={'source': 'inspection_survey.pdf', 'page': 50, 'chunk_id': 'P50-T0'}, page_content='50\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nChina, EU / EEA, Japan, US, Pakistan and Brazil are the countries with the \nmost inspections by sites for local / regional supply \nInspections in country at sites reported for \ndomestic or regional supply only – several years history\nINSPECTIONS AT MANUFACTURING SITES - SUPPLY'), Document(id='1d87fb73-27c4-4f08-aee5-66d5a74e8a04', metadata={'source': 'inspection_survey.pdf', 'page': 59, 'chunk_id': 'P59-T0'}, page_content='59\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nEven if FDA is doing announced foreign inspection companies report to have \nmore inspections with follow up in 3rd countries than for sites in the US\nRate of inspections with follow-up actions\nINSPECTIONS AT MANUFACTURING SITES')]}}

--- NODE TRANSITION ---

Generating Segment 1...
  -> ISREL: Relevant, ISSUP: Fully Supported, ISUSE: 5
{'generate_segment': {'generation_history': ['SEGMENT:\nThe countries with the highest inspection counts per manufacturing site, according to the EFPIA 2024 data, are Germany and Denmark, each with four multiple inspections at their sites. This indicates significant regulatory scrutiny and importance in the pharmaceutical manufacturing sector. Germany stands out with additional sites also facing inspections from Belarus, Türkiye, Russia, and the US-FDA, while Denmark has inspections from Japan, Brazil, US-FDA, Türkiye, Kenya, Chinese Taipei, and the Rep. of Korea. The high number of inspections suggests that these countries play crucial roles in global pharmaceutical manufacturing oversight, likely due to their central positions in the industry and stringent regulatory environments.\n\nISREL: Relevant\nISSUP: Fully Supported\nISUSE: 5\n\n<|END'], 'segment_count': 1, 'finish_generation': True, 'critique_score': 5.0, 'retrieved_docs': [Document(id='6f221e35-3bae-4e4d-a47f-8e6df714b254', metadata={'source': 'inspection_survey.pdf', 'page': 44, 'chunk_id': 'P44-T0'}, page_content='44\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nMultiple inspections at one manufacturing site (6 and more)\nINSPECTIONS AT MANUFACTURING SITES\n.\n.\nUS\n• AMA\n• Tanzania\n• Brazil\n• Russia\n• Türkiye\nGermany\n• US-FDA (2)\n• Belarus\n• Türkiye\nFrance\n• US-FDA\n• Türkiye\n• Rep. of Korea\n• China\n• EAEU\nDenmark\n• Japan (4)\n• Brazil (2)\n• US-FDA\n• Türkiye\n• Kenya\n• Chinese Taipei\n• Rep. of Korea\nGermany\n• Japan\n• Türkiye\n• Belarus\nGermany\n• China\n• Russia\n• Türkiye\n• Libya\nUS\n• Japan\nGermany\n• Russia\nDenmark\n• Japan (3)\n• Rep. of Korea (3)\n• Brazil \n• Chinese Taipei\n• Türkiye\n• Kenya\nBrazil\n• Mexico\nMost sites exposed to multiple foreign \ninspections are in Germany (4) and Denmark (2)'), Document(id='ca2bdd03-24e9-4ff7-bb79-1e94ef40bd87', metadata={'source': 'inspection_survey.pdf', 'page': 5, 'chunk_id': 'P5-T0'}, page_content='5\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nNumber of foreign inspections at manufacturing sites \n(EU as one entity; all inspection types and modes)\nEFPIA’S ANNUAL INSPECTION SURVEY - DATA\n> 28\n*Inspectorate is a PIC/S participating authority   **PIC/S Applicant     ***PIC/S Pre-Applicant\nPerformed by\n\uf0b3 9\n+ 9 countries with one  foreign inspection\n[same level as last years]'), Document(id='695b6077-e2cc-46a7-929f-22e68f24c037', metadata={'source': 'inspection_survey.pdf', 'page': 9, 'chunk_id': 'P9-T0'}, page_content='9\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nOversight\n80% of foreign inspections are in EU/EEA, US and Switzerland; \nthis demonstrates where research-based industry is manufacturing \nEU/EEA (50%), US (27%), Switzerland (5%), China (4%), Singapore (3%), \nand Brazil (3%)\nFor the EU/EEA it’s mainly in Germany, France, Belgium, Denmark, \nIreland, Italy\nSites most exposed to multiple foreign inspections are in Germany (4) \nand Denmark (2)\nNumber of foreign inspections at manufacturing sites post pandemic is \nslightly decreasing to the baseline before the pandemic\nIndividual sites receive more foreign inspections - higher than before the \npandemic and highest ever\nForeign inspections at manufacturing sites\nEFPIA’S ANNUAL INSPECTION SURVEY - DATA\nManufacturing'), Document(id='36885416-d5ff-49cf-84e8-de7320c30d56', metadata={'source': 'inspection_survey.pdf', 'page': 50, 'chunk_id': 'P50-T0'}, page_content='50\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nChina, EU / EEA, Japan, US, Pakistan and Brazil are the countries with the \nmost inspections by sites for local / regional supply \nInspections in country at sites reported for \ndomestic or regional supply only – several years history\nINSPECTIONS AT MANUFACTURING SITES - SUPPLY'), Document(id='1d87fb73-27c4-4f08-aee5-66d5a74e8a04', metadata={'source': 'inspection_survey.pdf', 'page': 59, 'chunk_id': 'P59-T0'}, page_content='59\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nEven if FDA is doing announced foreign inspection companies report to have \nmore inspections with follow up in 3rd countries than for sites in the US\nRate of inspections with follow-up actions\nINSPECTIONS AT MANUFACTURING SITES')]}}

--- NODE TRANSITION ---


--- FINAL ANSWER ---
SEGMENT:
The countries with the highest inspection counts per manufacturing site, according to the EFPIA 2024 data, are Germany and Denmark, each with four multiple inspections at their sites. This indicates significant regulatory scrutiny and importance in the pharmaceutical manufacturing sector. Germany stands out with additional sites also facing inspections from Belarus, Türkiye, Russia, and the US-FDA, while Denmark has inspections from Japan, Brazil, US-FDA, Türkiye, Kenya, Chinese Taipei, and the Rep. of Korea. The high number of inspections suggests that these countries play crucial roles in global pharmaceutical manufacturing oversight, likely due to their central positions in the industry and stringent regulatory environments.

ISREL: Relevant
ISSUP: Fully Supported
ISUSE: 5

<|END
{'finalize_answer': {'query': 'According to EFPIA 2024 data on multiple inspections at manufacturing sites, which countries recorded the highest inspection counts per site, and what does this reveal about their regulatory significance?', 'retrieved_docs': [Document(id='6f221e35-3bae-4e4d-a47f-8e6df714b254', metadata={'source': 'inspection_survey.pdf', 'page': 44, 'chunk_id': 'P44-T0'}, page_content='44\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nMultiple inspections at one manufacturing site (6 and more)\nINSPECTIONS AT MANUFACTURING SITES\n.\n.\nUS\n• AMA\n• Tanzania\n• Brazil\n• Russia\n• Türkiye\nGermany\n• US-FDA (2)\n• Belarus\n• Türkiye\nFrance\n• US-FDA\n• Türkiye\n• Rep. of Korea\n• China\n• EAEU\nDenmark\n• Japan (4)\n• Brazil (2)\n• US-FDA\n• Türkiye\n• Kenya\n• Chinese Taipei\n• Rep. of Korea\nGermany\n• Japan\n• Türkiye\n• Belarus\nGermany\n• China\n• Russia\n• Türkiye\n• Libya\nUS\n• Japan\nGermany\n• Russia\nDenmark\n• Japan (3)\n• Rep. of Korea (3)\n• Brazil \n• Chinese Taipei\n• Türkiye\n• Kenya\nBrazil\n• Mexico\nMost sites exposed to multiple foreign \ninspections are in Germany (4) and Denmark (2)'), Document(id='ca2bdd03-24e9-4ff7-bb79-1e94ef40bd87', metadata={'source': 'inspection_survey.pdf', 'page': 5, 'chunk_id': 'P5-T0'}, page_content='5\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nNumber of foreign inspections at manufacturing sites \n(EU as one entity; all inspection types and modes)\nEFPIA’S ANNUAL INSPECTION SURVEY - DATA\n> 28\n*Inspectorate is a PIC/S participating authority   **PIC/S Applicant     ***PIC/S Pre-Applicant\nPerformed by\n\uf0b3 9\n+ 9 countries with one  foreign inspection\n[same level as last years]'), Document(id='695b6077-e2cc-46a7-929f-22e68f24c037', metadata={'source': 'inspection_survey.pdf', 'page': 9, 'chunk_id': 'P9-T0'}, page_content='9\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nOversight\n80% of foreign inspections are in EU/EEA, US and Switzerland; \nthis demonstrates where research-based industry is manufacturing \nEU/EEA (50%), US (27%), Switzerland (5%), China (4%), Singapore (3%), \nand Brazil (3%)\nFor the EU/EEA it’s mainly in Germany, France, Belgium, Denmark, \nIreland, Italy\nSites most exposed to multiple foreign inspections are in Germany (4) \nand Denmark (2)\nNumber of foreign inspections at manufacturing sites post pandemic is \nslightly decreasing to the baseline before the pandemic\nIndividual sites receive more foreign inspections - higher than before the \npandemic and highest ever\nForeign inspections at manufacturing sites\nEFPIA’S ANNUAL INSPECTION SURVEY - DATA\nManufacturing'), Document(id='36885416-d5ff-49cf-84e8-de7320c30d56', metadata={'source': 'inspection_survey.pdf', 'page': 50, 'chunk_id': 'P50-T0'}, page_content='50\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nChina, EU / EEA, Japan, US, Pakistan and Brazil are the countries with the \nmost inspections by sites for local / regional supply \nInspections in country at sites reported for \ndomestic or regional supply only – several years history\nINSPECTIONS AT MANUFACTURING SITES - SUPPLY'), Document(id='1d87fb73-27c4-4f08-aee5-66d5a74e8a04', metadata={'source': 'inspection_survey.pdf', 'page': 59, 'chunk_id': 'P59-T0'}, page_content='59\nEFPIA ANNUAL INSPECTION SURVEY - 2024 DATA - PUBLIC VERSION\nEven if FDA is doing announced foreign inspection companies report to have \nmore inspections with follow up in 3rd countries than for sites in the US\nRate of inspections with follow-up actions\nINSPECTIONS AT MANUFACTURING SITES')], 'generation_history': ['SEGMENT:\nThe countries with the highest inspection counts per manufacturing site, according to the EFPIA 2024 data, are Germany and Denmark, each with four multiple inspections at their sites. This indicates significant regulatory scrutiny and importance in the pharmaceutical manufacturing sector. Germany stands out with additional sites also facing inspections from Belarus, Türkiye, Russia, and the US-FDA, while Denmark has inspections from Japan, Brazil, US-FDA, Türkiye, Kenya, Chinese Taipei, and the Rep. of Korea. The high number of inspections suggests that these countries play crucial roles in global pharmaceutical manufacturing oversight, likely due to their central positions in the industry and stringent regulatory environments.\n\nISREL: Relevant\nISSUP: Fully Supported\nISUSE: 5\n\n<|END'], 'critique_score': 5.0, 'segment_count': 1, 'finish_generation': True}}

--- NODE TRANSITION ---



#####################################################
           ✅ FINAL SELF-RAG ANSWER ✅                 
#####################################################

--- ANSWER ---
SEGMENT:
The countries with the highest inspection counts per manufacturing site, according to the EFPIA 2024 data, are Germany and Denmark, each with four multiple inspections at their sites. This indicates significant regulatory scrutiny and importance in the pharmaceutical manufacturing sector. Germany stands out with additional sites also facing inspections from Belarus, Türkiye, Russia, and the US-FDA, while Denmark has inspections from Japan, Brazil, US-FDA, Türkiye, Kenya, Chinese Taipei, and the Rep. of Korea. The high number of inspections suggests that these countries play crucial roles in global pharmaceutical manufacturing oversight, likely due to their central positions in the industry and stringent regulatory environments.

ISREL: Relevant
ISSUP: Fully Supported
ISUSE: 5

#####################################################

The self-reflective retrieval augmented generation setup in this tutorial offers major advantages over standard RAG, mainly in terms of reliability and smart efficiency. Its biggest strength is improved factual accuracy and traceability, made possible by the Granite LLM running its own self-critiques with reflection tokens. These critiques produce a score that guides the workflow, allowing adaptive retrieval and the model only pulls new context when a segment isn’t well supported. This approach also makes it easier to work with complex, multi-modal documents, since image captions can be added to the vector store. The result is a more trustworthy, flexible query agent that checks and segments its answers against the knowledge base before giving the final result.

Build a self-RAG agent with IBM granite LLMs: A practical guide¶

How self-RAG works¶

1. On-demand retrieval¶

2. Passage retrieval and generation¶

3. Generate and reflect on retrieved passages¶

4. Inference¶

5. Training the self-RAG¶

Use case: Building a self-RAG query agent over multi-modal documents¶

Prerequisites¶

Steps¶

Step 1. Set up your environment¶

Step 2. Set up watsonx.ai runtime service and API key¶

Step 3. Installation of the packages¶

Step 4. Import required libraries¶

Step 5. Load watsonx credentials¶

Step 6. Initialize models¶

Step 7. PDF data retrieval from cloud object storage¶

Step 8. Multi-modal PDF parsing and captioning¶

Step 9. Indexing and retriever setup¶

Step 10. LangGraph state and core self-RAG logic¶

Step 11. LangGraph state and core self-RAG logic¶