Multi-agent RAG with AutoGen: Build locally with Granite¶
Authors: Kelly Abuelsaad, Anna Gutowska
Can you build agentic workflows without massive, costly large language models (LLMs)? The answer is yes. In this tutorial, we will demonstrate how to build a multi-agent RAG system locally with AutoGen by using IBM® Granite™.
Agentic RAG overview¶
Retrieval-augmented generation (RAG) is an effective way of providing an LLM with additional datasets from various data sources without the need for expensive fine-tuning. Similarly, agentic RAG leverages an AI agent’s ability to plan and execute subtasks along with the retrieval of relevant information to supplement an LLM's knowledge base. This ability allows for the optimization and greater scalability of RAG applications compared to traditional chatbots. No longer do we need to write complex SQL queries to extract relevant data from a knowledge base.
The future of agentic RAG is multi-agent RAG, where several specialized agents collaborate to achieve optimal latency and efficiency. We will demonstrate this collaboration by using a small, efficient model such as Granite 3.2 and combining it with a modular agent architecture. We will use multiple specialized "mini agents" that collaborate to achieve tasks through adaptive planning and tool or function calling. Like humans, a team of agents, or a multi-agent system, often outperforms the heroic efforts of an individual, especially when they have clearly defined roles and effective communication.
For the orchestration of this collaboration, we can use AutoGen (AG2) as the core framework to manage workflows and decision-making, alongside other tools like Ollama for local LLM serving and Open WebUI for interaction. AutoGen is a framework for creating multi-agent AI applications developed by Microsoft.1 Notably, every component leveraged in this tutorial is open source. Together, these tools enable you to build an AI system that is both powerful and privacy-conscious, without leaving your laptop.
Multi-agent architecture: When collaboration beats competition¶
Our Granite retrieval agent relies on a modular architecture in which each agent has a specialized role. Like humans, agents perform best when they have targeted instructions and just enough context to make an informed decision. Too much extraneous information, such as an unfiltered chat history, can create a “needle in the haystack” problem, where it becomes increasingly difficult to decipher signal from noise.
In this agentic AI architecture, the agents work together sequentially to achieve the goal. Here is how the generative AI system is organized:
Planner agent: Creates the initial high-level plan, once, at the beginning of the workflow. For example, if a user asks, “What are comparable open source projects to the ones my team is using?” the agent will put together a step-by-step plan that might look something like this: “1. Search team documents for open source technologies. 2. Search the web for similar open source projects to the ones found in step 1.” If any of these steps fail or provide insufficient results, they can later be adapted by the reflection agent.
Research assistant: The research assistant is the workhorse of the system. It takes in and executes instructions such as “Search team documents for open source technologies.” For step 1 of the plan, it uses the initial instruction from the planner agent. For subsequent steps, it also receives curated context from the outcomes of previous steps.
For example, if asked to “Search the web for similar open source projects,” it will also receive the output from the previous document search step. Depending on the instruction, the research assistant can use tools like web search or document search, or both, to fulfill its task.
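To make the research assistant's tool use concrete, here is a minimal sketch of tool dispatch. In the real agent, the LLM decides which tool or tools to call through function calling; in this illustration, a simple keyword heuristic stands in for that decision, and all function names and signatures are hypothetical.

```python
def research_assistant(instruction, context, doc_search, web_search):
    """Illustrative tool dispatch for the research assistant.

    `doc_search` and `web_search` stand in for the document search and
    web search tools; `context` carries output from previous steps.
    """
    results = []
    text = instruction.lower()
    # A keyword heuristic stands in for the LLM's tool-selection decision.
    if "document" in text or "notes" in text:
        results.append(doc_search(instruction, context))
    if "web" in text or "online" in text:
        results.append(web_search(instruction, context))
    return results
```

Depending on the wording of the instruction, one tool, both tools, or neither may be invoked.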
Step critic: The step critic is responsible for deciding whether the output of the previous step satisfactorily fulfilled the instruction it was given. It receives two pieces of information: the single-step instruction that was just executed and the output of that instruction. Having a step critic weigh in on the conversation brings clarity around whether the goal was achieved, which is needed for the planning of the next step.
Goal judge: The goal judge determines whether the ultimate objective has been met, based on all of the requirements of the provided goal, the plans drafted to achieve it, and the information gathered so far. The output of the judge is either "YES" or "NOT YET" followed by a brief explanation that is no longer than one or two sentences.
Reflection agent: The reflection agent is our executive decision-maker. It decides what step to take next, whether that is advancing to the next planned step, changing course to recover from a failed step or confirming that the goal has been completed. Like a real-life CEO, it makes its best decisions when it has a clear goal in mind and is presented with concise findings on the progress made, or not made, toward that goal. The output of the reflection agent is either the next step to take or the instruction to terminate if the goal has been reached. We present the reflection agent with the following items:
- The goal
- The original plan
- The last step that was executed
- The result of the last step indicating success or failure
- A concise sequence of previously executed instructions (just the instructions, not their output)
Presenting these items in a structured format makes it clear to our decision maker what has been done so that it can decide what needs to happen next.
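As a rough illustration, these items might be assembled into a structured prompt along the following lines. The function and field names here are hypothetical; the actual Granite Retrieval Agent may format this context differently.

```python
def build_reflection_context(goal, plan, last_step, last_result, history):
    """Assemble the reflection agent's input as a structured prompt.

    `history` is the concise sequence of previously executed instructions
    (just the instructions, not their output).
    """
    previous = "\n".join(f"- {step}" for step in history)
    return (
        f"GOAL: {goal}\n"
        f"ORIGINAL PLAN: {plan}\n"
        f"LAST STEP EXECUTED: {last_step}\n"
        f"RESULT OF LAST STEP: {last_result}\n"
        f"PREVIOUSLY EXECUTED STEPS:\n{previous}"
    )
```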
Report generator: Once the goal is achieved, the report generator synthesizes all findings into a cohesive output that directly answers the original query. While each step in the process generates targeted outputs, the report generator ties everything together into a final report.
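Putting these roles together, the overall control flow can be sketched as a simple sequential loop. This is an illustrative sketch under the assumption that each agent can be modeled as a plain callable; it is not the actual Granite Retrieval Agent implementation, and all names and signatures are hypothetical.

```python
def run_workflow(goal, planner, assistant, critic, judge, reflector, reporter,
                 max_steps=10):
    """Orchestrate the specialized agents sequentially.

    Each agent argument is a callable standing in for an LLM-backed agent.
    """
    plan = planner(goal)                      # high-level plan, created once
    step, history = plan[0], []
    for _ in range(max_steps):
        output = assistant(step, history)     # execute one instruction
        verdict = critic(step, output)        # did the step succeed?
        history.append((step, verdict))
        if judge(goal, history).startswith("YES"):
            break                             # goal judge says we are done
        # Reflection agent picks the next step (or adapts the plan).
        step = reflector(goal, plan, step, verdict, [s for s, _ in history])
    return reporter(goal, history)            # final synthesized answer
```

The loop makes the division of labor explicit: the planner runs once, the research assistant and step critic run every iteration, and the goal judge and reflection agent decide whether and how to continue.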
Leveraging open source tools¶
For beginners, it can be difficult to build an agentic AI application from scratch. Hence, we will use a set of open source tools. The Granite Retrieval Agent integrates multiple tools for agentic RAG.
Open WebUI: The user interacts with the system through an intuitive chat interface hosted in Open WebUI. This interface acts as the primary point for submitting queries (such as “Fetch me the latest news articles pertaining to my project notes”) and viewing the outputs.
Python-based agent (AG2 framework): At the core of the system is a Python-based agent built by using AutoGen (AG2). This agent coordinates the workflow by breaking down tasks and dynamically calling tools to execute steps.
The agent has access to two primary tools:
Document search tool: Fetches relevant information from a vector database containing uploaded project notes or documents stored as embeddings. This vector search leverages the built-in document retrieval APIs inside Open WebUI, rather than setting up an entirely separate data store.
Web search tool: Performs web-based searches to gather external knowledge and real-time information. In this case, we are using SearXNG as our metasearch engine.
Ollama: The IBM Granite 3.2 LLM serves as the language model powering the system. It is hosted locally by using Ollama, ensuring fast inference, cost efficiency and data privacy. If you are interested in running this project with larger models, API access through IBM watsonx.ai® or OpenAI, for example, is preferable; this approach, however, requires a watsonx.ai or OpenAI API key. In this tutorial, we instead use locally hosted Ollama.
Other common open source agent frameworks not covered in this tutorial include LangChain, LangGraph and crewAI.
Steps¶
Detailed setup instructions as well as the entire project can be viewed on the IBM Granite Community GitHub. The Jupyter Notebook version of this tutorial can be found on GitHub as well.
The following steps provide a quick setup for the Granite Retrieval agent.
Step 1: Install Ollama¶
Installing Ollama is as simple as downloading the client from the official Ollama site. After installing Ollama, run the following command to pull the Granite 3.2 LLM.
ollama pull granite3.2:8b
You are now up and running with Ollama and Granite.
Step 2: Build a simple agent (optional)¶
Before we begin the setup of the complete multi-agent RAG project, let’s unpack a simpler example. To continue, set up a Jupyter Notebook in your preferred integrated development environment (IDE) and activate a virtual environment by running the following commands in your terminal.
python3.11 -m venv venv
source venv/bin/activate
We'll need a few libraries and modules for this simple agent. Install and import the following.
!pip install -qU langchain chromadb tf-keras pyautogen "ag2[ollama]" sentence_transformers
import getpass
from autogen.agentchat.contrib.retrieve_assistant_agent import AssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
There are several configuration parameters to set locally to invoke the correct LLM that we pulled by using Ollama.
ollama_llm_config = {
"config_list": [
{
"model": "granite3.2:8b",
"api_type": "ollama",
}
],
}
We can pass these configuration parameters in the llm_config
parameter of the AssistantAgent
class to instantiate our first AI agent.
assistant = AssistantAgent(
name="assistant",
system_message="You are a helpful assistant.",
llm_config=ollama_llm_config,
)
This agent uses Granite 3.2 to synthesize the information returned by the ragproxyagent
agent. The document we provide to the RAG agent as additional context is the raw README Markdown file found in the AutoGen repository on GitHub. Additionally, we can pass a new dictionary of configurations specific to the retrieval agent. Some additional keys that you might find useful are vector_db
, chunk_token_size
and embedding_model
. For a full list of configuration keys, refer to the official documentation.
ragproxyagent = RetrieveUserProxyAgent(
name="ragproxyagent",
max_consecutive_auto_reply=3,
is_termination_msg=lambda msg: msg.get("content") is not None and "TERMINATE" in msg["content"],
system_message = "Context retrieval assistant.",
retrieve_config={
"task": "qa",
"docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
"get_or_create": True,
"collection_name": "autogen_docs",
"overwrite": True
},
code_execution_config=False,
human_input_mode="NEVER",
)
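As noted above, the retrieve_config dictionary accepts additional keys such as vector_db, chunk_token_size and embedding_model. The following variant illustrates how they might be set; the values shown are examples only, and you should check the official AG2 documentation for the full list of accepted options.

```python
# An alternative retrieve_config illustrating the optional keys mentioned
# above. Values are illustrative, not recommendations.
custom_retrieve_config = {
    "task": "qa",
    "docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
    "vector_db": "chroma",                   # which vector store backend to use
    "chunk_token_size": 2000,                # tokens per document chunk
    "embedding_model": "all-MiniLM-L6-v2",   # sentence-transformers model name
    "get_or_create": True,
    "collection_name": "autogen_docs",
}
```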
Now, we can initiate a chat with our RAG agent to ask a question that pertains to the document provided as context.
qs = "What languages does AutoGen support?"
result = ragproxyagent.initiate_chat(
assistant, message=ragproxyagent.message_generator, problem=qs
)
print(result)
Trying to create collection.
2025-07-21 12:20:36,125 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 1 chunks.
2025-07-21 12:20:36,129 - autogen.agentchat.contrib.vectordb.chromadb - INFO - No content embedding is provided. Will use the VectorDB's embedding function to generate the content embedding.
VectorDB returns doc_ids: [['8e9131c7']]
Adding content of doc 8e9131c7 to context.
ragproxyagent (to assistant):
You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the context provided by the user. If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`. You must give as short an answer as possible.
User's question is: What languages does AutoGen support?
Context is: ... (the retrieved AutoGen README content is omitted here for brevity) ...
--------------------------------------------------------------------------------
assistant (to ragproxyagent):
The provided text appears to be a README file for the AutoGen project, an open-source initiative by Microsoft. Here's a summary of its content:
1. **Project Overview**: AutoGen is a system for creating and managing complex, distributed applications. It supports multiple languages (C#, Python) and provides a runtime environment for deploying and scaling these applications.
...
3. **Languages Supported**: C# and Python are currently supported.
... (the remainder of the assistant's summary is omitted for brevity) ...
--------------------------------------------------------------------------------
>>>>>>>> TERMINATING RUN (601a53dc-8a5d-4e19-8503-1517fe3c7634): Termination message condition on agent 'ragproxyagent' met
**Key Components**:\n - **AutoGen Core**: The core library containing the fundamental classes and interfaces for building AutoGen applications.\n - **AutoGen Agent**: A runtime component responsible for managing the lifecycle of AutoGen applications.\n - **AutoGen Runtime Gateway**: A service that facilitates communication between agents and enables load balancing, scaling, and fault tolerance.\n\n3. **Languages Supported**: C# and Python are currently supported.\n\n4. **Getting Started**: The README provides instructions on how to install the necessary packages, create a new project, and build/run an AutoGen application.\n\n5. **Documentation**: Links to detailed documentation for reference, including API references, guides, and tutorials.\n\n6. **Community & Contribution**: Guidelines for contributing to the project, including information on issue tracking, pull requests, and coding standards.\n\n7. **Legal Notices**: Licensing information and trademark notices.\n\n8. **Support & FAQ**: Information on how to ask questions, report issues, and find answers to common queries.\n\nThe README also includes a table summarizing the available packages for each supported language (C# and Python) and their respective package managers (NuGet and PyPI). This makes it easy for developers to quickly identify the necessary components for getting started with AutoGen in their preferred language.", 'role': 'user', 'name': 'assistant'}], summary="The provided text appears to be a README file for the AutoGen project, an open-source initiative by Microsoft. Here's a summary of its content:\n\n1. **Project Overview**: AutoGen is a system for creating and managing complex, distributed applications. It supports multiple languages (C#, Python) and provides a runtime environment for deploying and scaling these applications.\n\n2. 
**Key Components**:\n - **AutoGen Core**: The core library containing the fundamental classes and interfaces for building AutoGen applications.\n - **AutoGen Agent**: A runtime component responsible for managing the lifecycle of AutoGen applications.\n - **AutoGen Runtime Gateway**: A service that facilitates communication between agents and enables load balancing, scaling, and fault tolerance.\n\n3. **Languages Supported**: C# and Python are currently supported.\n\n4. **Getting Started**: The README provides instructions on how to install the necessary packages, create a new project, and build/run an AutoGen application.\n\n5. **Documentation**: Links to detailed documentation for reference, including API references, guides, and tutorials.\n\n6. **Community & Contribution**: Guidelines for contributing to the project, including information on issue tracking, pull requests, and coding standards.\n\n7. **Legal Notices**: Licensing information and trademark notices.\n\n8. **Support & FAQ**: Information on how to ask questions, report issues, and find answers to common queries.\n\nThe README also includes a table summarizing the available packages for each supported language (C# and Python) and their respective package managers (NuGet and PyPI). This makes it easy for developers to quickly identify the necessary components for getting started with AutoGen in their preferred language.", cost={'usage_including_cached_inference': {'total_cost': 0.0, 'granite3.2:8b': {'cost': 0.0, 'prompt_tokens': 2048, 'completion_tokens': 357, 'total_tokens': 2405}}, 'usage_excluding_cached_inference': {'total_cost': 0.0, 'granite3.2:8b': {'cost': 0.0, 'prompt_tokens': 2048, 'completion_tokens': 357, 'total_tokens': 2405}}}, human_input=[])
Great! Our assistant agent and RAG agent successfully synthesized the additional context to correctly respond to the user query with the programming languages currently supported by AutoGen. You can think of this as a group chat between agents exchanging information. This example is a simple demonstration of agentic RAG implemented locally with AutoGen.
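For reference, the glue between AG2 and a locally served Granite model is just an OpenAI-style LLM configuration. The sketch below assumes Ollama's default OpenAI-compatible endpoint on port 11434; the URL, model ID and placeholder key should be adjusted to your setup.

```python
# Sketch: pointing AG2 (autogen) at a local Ollama server.
# Assumes Ollama's OpenAI-compatible endpoint at the default port.
llm_config = {
    "config_list": [
        {
            "model": "granite3.2:8b",                 # local Granite model served by Ollama
            "base_url": "http://localhost:11434/v1",  # Ollama's OpenAI-style API
            "api_key": "ollama",                      # placeholder; Ollama ignores the key
        }
    ],
    "temperature": 0,  # deterministic answers work well for RAG
}

# With AG2 installed, this config plugs straight into the agents, e.g.:
#   from autogen import AssistantAgent
#   assistant = AssistantAgent("assistant", llm_config=llm_config)
```

Because inference never leaves your machine, the usage cost reported by AG2 stays at zero.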
Step 3. Install Open WebUI¶
Now, let’s move on to building a more advanced agentic RAG system. In your terminal, install and run Open WebUI.
pip install open-webui
open-webui serve
Step 4. Set up web search¶
For web search, we will leverage the built-in web search capabilities in Open WebUI.
Open WebUI supports a number of search providers. Broadly, you can either use a third-party application programming interface (API) service, for which you will need to obtain an API key, or you can locally set up a SearXNG Docker container. In either case, you will need to configure your search provider in the Open WebUI console.
This configuration, either a pointer to SearXNG or input of your API key, is under Admin Panel > Settings > Web Search in the Open WebUI console.
Please refer to the Open WebUI documentation for more detailed instructions.
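If you choose the fully local route, SearXNG is typically run as a Docker container. The invocation below is an assumed, minimal example (container name, host port and config directory are placeholders, and Open WebUI additionally requires SearXNG's JSON output format to be enabled in its settings.yml):

```shell
# Run a local SearXNG instance for Open WebUI's web search.
# Host port 8081 and the ./searxng config directory are assumptions;
# adjust them to your environment.
docker run -d --name searxng \
  -p 8081:8080 \
  -v ./searxng:/etc/searxng \
  --restart unless-stopped \
  searxng/searxng:latest
```

Once the container is up, point Open WebUI's web search settings at the SearXNG query URL for your host and port.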
Step 5. Import the agent into Open WebUI¶
- In your browser, go to http://localhost:8080/ to access Open WebUI. If it is your first time opening the Open WebUI interface, register a username and password. This information is kept entirely local to your machine.
- After logging in, click the icon on the lower left side where your username is. From the menu, click Admin panel.
- In the Functions tab, click + to add a new function.
- Give the function a name, such as "Granite RAG Agent," and a description, both of str type.
- Paste the granite_autogen_rag.py Python script into the text box provided, replacing any existing content.
- Click Save at the bottom of the screen.
- Back on the Functions page, make sure that the agent is toggled to Enabled.
- Click the gear icon next to the enablement toggle to customize any settings, such as the inference endpoint, the SearXNG endpoint or the model ID.
Now, your brand-new AutoGen agent shows up as a model in the Open WebUI interface. You can select it and provide it with user queries.
Step 6. Load documents into Open WebUI¶
- In Open WebUI, navigate to Workspace > Knowledge.
- Click + to create a new collection.
- Upload documents for the Granite retrieval agent to query.
Step 7. Configure Web Search in Open WebUI¶
To set up a search provider (for example, SearXNG), follow this guide.
The configuration parameters are as follows:
Parameter | Description | Default Value |
---|---|---|
task_model_id | Primary model for task execution | granite3.2:8b |
vision_model_id | Vision model for image analysis | granite-vision3.2:2b |
openai_api_url | API endpoint for OpenAI-style model calls | http://localhost:11434 |
openai_api_key | API key for authentication | ollama |
vision_api_url | Endpoint for vision-related tasks | http://localhost:11434 |
model_temperature | Controls response randomness | 0 |
max_plan_steps | Maximum steps in agent planning | 6 |
Note: These parameters can be configured through the gear icon in the "Functions" section of the Open WebUI Admin Panel after adding the function.
Step 8. Query the agentic system¶
The Granite retrieval agent performs AG2-based RAG by querying local documents and web sources, performing multi-agent task planning and enforcing adaptive execution. Start a chat and provide your agentic system with a query related to the documents provided to see the RAG chain in action.
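Conceptually, the agent's adaptive execution can be pictured as a bounded plan-and-execute loop: a planner proposes the next step, a worker carries it out, and the observation feeds back into the next planning decision until the planner is satisfied or max_plan_steps is reached. The sketch below is a deliberate simplification; the propose_step and execute_step callables stand in for the LLM-driven planner and tool-calling agents and are not part of AutoGen's API.

```python
from typing import Callable, Optional

def run_adaptive_plan(
    goal: str,
    propose_step: Callable[[str, list], Optional[str]],  # planner: next step, or None when done
    execute_step: Callable[[str], str],                  # worker: run the step, return an observation
    max_plan_steps: int = 6,                             # mirrors the max_plan_steps parameter
) -> list:
    """Bounded plan-and-execute loop: plan a step, execute it, feed the
    observation back into the plan, and stop when the planner is satisfied."""
    history = []
    for _ in range(max_plan_steps):
        step = propose_step(goal, history)
        if step is None:  # planner decided the goal is met
            break
        history.append((step, execute_step(step)))
    return history

# Toy usage with stub agents in place of LLM calls:
steps = iter(["search web", "read docs"])
history = run_adaptive_plan(
    "answer the query",
    propose_step=lambda goal, h: next(steps, None),
    execute_step=lambda s: f"result of {s}",
)
# history -> [("search web", "result of search web"), ("read docs", "result of read docs")]
```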
Summary¶
A multi-agent setup enables the creation of practical, usable tools by getting the most out of moderately sized, open source models like Granite 3.2. This agentic RAG architecture, built with fully open source tools, can serve as a launching point to design and customize your question answering agents and AI algorithms. It can also be used out of the box for a wide array of use cases. In this tutorial, you had the opportunity to delve into simple and complex agentic systems, leveraging the capabilities of AutoGen. The Granite LLM was invoked by using Ollama, allowing for a fully local exploration of these systems. As a next step, consider integrating more custom tools into your agentic system.
Footnotes:
1 Wu, Qingyun, et al. “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework.” GitHub, 2023, github.com/microsoft/autogen.