Back to list

How To: Build a Multi Agent System From Scratch

Date

February 18, 2025

Author

Tim Shea

By far, the most compelling use case for Generative AI isn’t replacing human workers or automating strictly defined workflows, but rather vastly increasing the dimensions that people can explore when making a decision. In this way, LLM’s and diffusion models are incredible brainstorming partners. We can instruct LLM’s about the knowledge domain we’d like it to represent and then precisely define how we’d like to see the output. For example, we could ask:

“I want to create a database of the luxury retail holding companies like VF Corporation, New Guard Group, Only The Brave, and LVMH, as well as the brands they've recently acquired. Can you please generate an Excel spreadsheet with the following columns: Holding Company, Brand Name, Acquisition Amount, Acquisition Date, Rationale for Acquiring"

Queries like this are now considered table stakes, a staggering upgrade over traditional search engines that now seem quaint in the era of Generative AI.

However, some decision processes can be exceedingly complex. Sometimes we’d like to be able to model decision-making that might occur between multiple specialized personas (such as a CEO, CMO, or CTO) or specialized modules (such as an API connector, K1 parser, or S.W.O.T. analyzer).

In the real world, we’d take the results of that Excel spreadsheet and run it through a gauntlet of human experts. Strategists, investors, and domain experts would chime in. We’d construct “what ifs” about the acquisitions, we’d dig into the P&Ls, and we’d dream up go-to-market plans for potential future acquisitions. But thanks to modern LLMs, we can now create individualized AI agents that can model each of these behaviors and simulate having a room full of experts who inform our decisions.

Multi Agent Systems

Multi-Agent AI systems to model the proprietary decision-making processes that we call our “Super Vision OS.” With it, we can scour massive business databases in search of weak P&L models where AI can deliver outsized improvements. We can task the agents in our OS with hypotheticals about Retail, CPG, Print, Entertainment, or e-commerce. The agents can crunch numbers, pontificate, and debate with each other. The agents can even produce deliverables that other agents criticize and iterate on.

Multi-Agent Systems have a number of advantages over simple LLM chat interfaces:Each agent in the system can be trained with an enormous amount of specialized knowledge

The agents can be architected so that specific types of back-and-forth and checks-and-balance occur.
The agents can be strung together with deterministic modules - like a Python script - that perform specialized tasks.
The agent system can be given an almost infinite compute space to iterate and come up with options.
The agent system can present us with a dazzling array of options, each of which contains extremely well-crafted detail.

The Multi-Agent System Ecosystem

We also have a huge array of off-the-shelf options for standing up customized Multi-Agent Systems like AutoGen, CrewAI, and LangChain. We chose AutoGen because it’s free, there are tons of introductory videos, and they provide sample code for multiple different Multi-Agent architectures that can be configured and stacked together. AutoGen catalogs an extensive list of examples of what it calls “Conversation Patterns” (such as two-agent chat, sequential chat, group chat, and nested chat).

These patterns, or architectures, are an important part of the design process of the agent system. It’s similar to how you’d build a team of human experts and predefine the rules of engagement between them. Different designs can produce wildly different outcomes.

The simplest pattern that AutoGen provides is a Two-Agent Chat, where we define each agent’s persona, where one agent simply asks the other agent the original prompt question, and where the agents go back and forth on a problem a set number of times. It’s easy to set up, and we can read each interaction within their conversation verbatim. The sample code is below:

import os
from autogen import ConversableAgent


agent__venture_studio_founder = ConversableAgent (
   name="Agent__Venture_Studio_Founder",
   system_message="You are the founder owner of a venture studio with expertise in funding and building technology and retail brands.",
   llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
)


agent__business_analyst = ConversableAgent (
   name="Agent__Business_Analyst",
   system_message="You are an expert business analyst with deep expertise in finance, strategy, marketing, and operations.",
   llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
)


chat_result = agent__venture_studio_founder.initiate_chat (
   agent__business_analyst,
   message="""I want to identify companies that traditionally have weak P&L's such as
     thin profit margins, fickle customers, or are high susceptibility to market swings.


     Our thesis is we want to focus on industries such as Retail, CPG, Print, Entertainment, and e-Commerce
     so we only want to look at types of companies in those specific industries.


     We want to choose company types where Generative AI can produce outsized efficiencies to the P&L's.
     Don't want to consider company types where Generative AI cannot help.
   """,
   summary_method="reflection_with_llm",
   max_turns=2,
)

The output is pretty useful, and it gives us humans some ideas of what to do next. But it's not significantly more valuable than simply asking ChatGPT the same question in a single prompt.

A slightly more advanced pattern is a Sequential Chat. In this case, we set up multiple Two-Agent chats, where the output of each Two-Agent chat is passed onto multiple downstream Two-Agent chats. Think of this design like a CEO handing a task to a lower-level employee, who makes decisions, and then passes the decisions through multiple levels of approval. Sample code for this design is below.

# Define the Orchestrator
agent__orchestrator = ConversableAgent(
   name="Agent__Orchestrator",
   system_message="You coordinate the analysis of an industry by passing tasks to other agents and summarizing their feedback.",
   llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
   human_input_mode="NEVER",
)


# Define the Brainstorm Agent
agent__brainstorm_analyst = ConversableAgent(
   name="Agent__Brainstorm_Analyst",
   system_message="You analyze the given industry and generate a list of company types with weak P&L models where AI can deliver substantial improvements. Return this list.",
   llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
   human_input_mode="NEVER",
)


# Define the CFO
agent__cfo = ConversableAgent(
   name="Agent__CFO",
   system_message="You criticize the provided list of company types from a financial POV, identifying weaknesses in margins, capex, opex, inventory, and liquidity.  Remove companies that are too risky.",
   llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
   human_input_mode="NEVER",
)


# Define the CTO
agent__cto = ConversableAgent(
   name="Agent__CTO",
   system_message="You criticize the provided list of company types from a technology POV, identifying risks in tech implementation, cost, complexity, and feasibility.  Remove companies that are too risky.",
   llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
   human_input_mode="NEVER",
)


# Define the CMO
agent__cmo = ConversableAgent(
   name="Agent__CMO",
   system_message="You criticize the provided list of company types from a marketing POV, identifying risks in go-to-market, media costs, and customer acquisition and retention.   Remove companies that are too risky.",
   llm_config={"config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]},
   human_input_mode="NEVER",
)

This pattern produces significantly more pointed and detailed feedback now that it’s been vetted by a gauntlet of experts. And it emphasizes the importance of designing the right agent system for the right task.

AutoGen provides examples for many different design patterns, including Group Chats and Nested Chats, but the field is incredibly nascent and the potential designs and architectures are limitless. We’ve looked at proposals and whitepapers for:

Magentic-One: A Multi-Agent System from Microsoft where agents, like a "Web Surfer" or "Coder," work together under the guidance of a central "Orchestrator" to solve complex tasks.
Human Personas: A Stanford-associated project where agents were trained to behave like the real-life subjects of qualitative interviews and then asked to converse and make decisions across a variety of topics.
Project Sid: A large-scale simulation of agent “societies” where agents concurrently participated in jobs, trade, decision-making, and developed social systems, simulating human-like communities.

It’s a fascinating field and we’re actively experimenting with new frameworks and design patterns to provide rocket fuel to our decision making, and so can you. If this is an interesting field for you, please reach out to us, we’d love to engage with other smart practitioners in the field.

February 18, 2025