[Framework Review] Microsoft's AutoGen Framework

Reviewing Microsoft's AutoGen Framework: what view of agency it has, what is interesting about it and where it might go next.

Jan 02, 2025

AutoGen in a nutshell

AutoGen is an application framework for automating complex tasks (i.e. tasks that require multiple steps and potentially multiple attempts to get to a solution) using LLMs. Tasks are solved by creating a team of at least two agents that collaborate and share information.

AutoGen was presented at the 2024 International Conference on Learning Representations (ICLR), where it won best paper in the LLM Agents Workshop.

Testing across a varied set of scenarios showed that AutoGen’s approach yielded better results than standalone LLMs or other approaches [1].

AutoGen is still nascent technology: a peek into a possible future for application development, but with considerable work to be done. The most interesting aspect is the use of conversation protocols as a means to co-ordinate the work of AI Agents. It is disappointing that the authors do not reference the rich prior work in agent communication protocols. The thing to watch out for is teams adopting AutoGen because it is exciting, but for problems that do not really require conversing agents to be solved.

Why was AutoGen created?

The principal thesis behind AutoGen, or the “bet” as they describe it, is that the path to reliably automating the completion of large, complex tasks using large foundational models is by applying a multi-agent view on the problem [2].

The expectation is that multiple co-operating agents can:

help encourage divergent thinking
improve factuality and reasoning
provide guardrails

The core justification for this is that the best way to take advantage of an LLM’s broad range of capabilities towards the solution of a complex task is to break that task down into simpler sub-tasks [3]. Instead of providing an LLM the entire problem in a single prompt and expecting it to carry out all the steps required in one go, you present the problem to a team of agents that will work together to solve it. Each agent specialises in one of the skills required and works with the other agents to get to an overall solution.

An agent in AutoGen becomes the means through which to represent a specific skill (e.g. code generation and execution, browsing and interacting with the web, soliciting user feedback) and the conversation between agents becomes the means through which different skills share information and collaborate to solve the problem.

Agent Design Perspective: AutoGen’s View on Agents and Autonomy

For every framework we analyse at AgentsDecoded we look for a deeper understanding of how AI Agents can be thought of and engineered. Our reference is our own conceptual framework for understanding agency and autonomy that provides a consistent set of terms to fall back on.

AutoGen has what I would term a very pragmatic view of agency. Essentially they don’t worry too much about what an agent is at the design level. Ultimately an agent is code that can send and receive messages and has access to an LLM with, optionally, some tools. The rest is up to the developer to further refine. You can see this in the definition of the Assistant Agent from the documentation. AutoGen calls these entities customisable and conversable Agents.

There is no explicit representation of an agent’s specific goals or motivations and no explicit way to determine when a goal was achieved. The goal is defined at runtime by whatever instructions are passed in through a prompt (e.g. a request from the user to solve a specific math problem) but it is not explicitly captured. The individual agents in AutoGen are essentially wrappers for prompts with access to tools.
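To make the "wrapper for a prompt with access to tools" idea concrete, here is a minimal sketch. It is not AutoGen's API: the class `ConversableAgent`, the `CALL tool: argument` convention, `fake_llm` and `calculator` are all hypothetical names invented for illustration, and the LLM is a stub so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# In this sketch a message is just a dict with a sender and some text.
Message = Dict[str, str]


@dataclass
class ConversableAgent:
    """Hypothetical stand-in (not AutoGen's class): a prompt, an LLM, optional tools."""
    name: str
    system_prompt: str
    llm: Callable[[str, List[Message]], str]
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def receive(self, history: List[Message]) -> Message:
        """Read the conversation so far and produce one reply message."""
        draft = self.llm(self.system_prompt, history)
        # Assumed convention: a draft of the form "CALL tool_name: argument"
        # means the model wants a tool to be run on its behalf.
        if draft.startswith("CALL "):
            tool_name, _, argument = draft[5:].partition(":")
            tool = self.tools.get(tool_name.strip())
            if tool is not None:
                draft = tool(argument.strip())
            else:
                draft = f"unknown tool {tool_name.strip()}"
        return {"sender": self.name, "text": draft}


def fake_llm(system_prompt: str, history: List[Message]) -> str:
    """Stub standing in for a real model call; it always asks for the calculator."""
    return f"CALL calculator: {history[-1]['text']}"


def calculator(expression: str) -> str:
    """Toy tool that only handles 'a + b' with integers."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


agent = ConversableAgent("solver", "You solve sums.", fake_llm, {"calculator": calculator})
reply = agent.receive([{"sender": "user", "text": "2 + 3"}])
print(reply)
```

Note that nothing in the agent records what it is trying to achieve: the goal only exists in the text of the incoming message, which is the point made above.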

The true innovation lies in the fact that they are co-ordinated through a conversation protocol instead of a more rigid workflow. This protocol is wrapped in the concept of a Team. The Team defines when the task is complete, at which point the Agents stop talking to each other. AutoGen calls all of this Conversation Programming.
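As a rough illustration of a conversation protocol with a halting condition, the sketch below runs agents in round-robin order until a termination check passes or a turn budget runs out. This is a simplified model written for this review, not AutoGen's Team implementation; `run_round_robin`, `writer`, `critic` and `approved` are hypothetical names, and the agents are plain functions rather than LLM calls.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]                            # (sender, text)
Agent = Tuple[str, Callable[[List[Message]], str]]   # (name, reply function)


def run_round_robin(
    agents: List[Agent],
    task: str,
    is_done: Callable[[List[Message]], bool],
    max_turns: int = 10,
) -> List[Message]:
    """Let agents speak in a fixed rotation until is_done() holds or turns run out."""
    history: List[Message] = [("user", task)]
    for turn in range(max_turns):
        name, reply = agents[turn % len(agents)]
        history.append((name, reply(history)))
        if is_done(history):
            break
    return history


def writer(history: List[Message]) -> str:
    """Toy agent: produces a new numbered draft each time it speaks."""
    drafts_so_far = sum(1 for sender, _ in history if sender == "writer")
    return f"draft v{drafts_so_far + 1}"


def critic(history: List[Message]) -> str:
    """Toy agent: approves the second draft and rejects anything else."""
    return "APPROVE" if history[-1][1] == "draft v2" else "please revise"


def approved(history: List[Message]) -> bool:
    """Halting condition: stop as soon as the latest message is an approval."""
    return history[-1][1] == "APPROVE"


transcript = run_round_robin(
    [("writer", writer), ("critic", critic)], "Write a haiku", approved
)
for sender, text in transcript:
    print(f"{sender}: {text}")
```

The protocol (who speaks next) and the halting condition (when to stop) both live outside the agents, which is what distinguishes this from hard-coding a sequence of calls.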

These conversation protocols are perhaps the most interesting aspect of AutoGen. In fact, the AutoGen researchers indicate work on how different protocols can enable different types of problem solving as an interesting next area of research.

In terms of AgentsDecoded’s view of agency we are dealing with passive or self-directed agents. When self-directed (i.e. they have some choice about how to complete a task) it is, typically, a minimal degree of self-direction. Their core capability or role determines what they can do to solve a problem (e.g. execute code, search the web, etc). There is no real sense of autonomy (in terms of the agents’ ability to generate their own goals). The agents can exhibit proactive behaviour since they can make multiple attempts to achieve a goal, but it is the system that is proactive through the conversation protocol, not necessarily the individual agents. Similarly, they can be reactive since they will react to information provided by other agents, tools and humans.

Interestingly, if one were to take a step back and see an entire AutoGen application as a single agent that system exhibits more sophisticated behaviour. Looking at any of their example applications as a single agent from a design perspective we can see how the conversation protocol combined with the halting conditions essentially map to a planning protocol (e.g. keep exchanging ideas in a round-robin fashion) and the definition of a goal (i.e. stop when you get to a good enough answer based on a certain metric). In essence, what you have is a way of constructing sophisticated agents with goals out of simpler conversing agents that do not each necessarily need to have goals explicitly defined. These AutoGen internal agents don’t have a broader understanding of why they are doing something - they are simply executing until something tells them to stop.
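The sketch below illustrates this "system as a single agent" reading with a deliberately small numeric toy. It is an assumption-laden illustration, not AutoGen code: `CompositeAgent`, `newton_step`, `keep_positive` and `closeness_to_goal` are hypothetical names. The rotation over inner steps plays the role of the plan, and the score threshold plays the role of the goal; the inner steps know nothing about either.

```python
from typing import Callable, Sequence

Step = Callable[[float], float]  # an inner agent: current answer in, new answer out


class CompositeAgent:
    """Hypothetical wrapper: from outside it looks like one agent with a goal."""

    def __init__(
        self,
        inner: Sequence[Step],
        score: Callable[[float], float],
        good_enough: float,
        max_rounds: int = 20,
    ) -> None:
        self.inner = inner
        self.score = score
        self.good_enough = good_enough
        self.max_rounds = max_rounds

    def solve(self, start: float) -> float:
        answer = start
        for round_index in range(self.max_rounds):
            # The "goal": stop once the answer scores well enough.
            if self.score(answer) >= self.good_enough:
                break
            # The "plan": hand the answer to the next inner agent in rotation.
            step = self.inner[round_index % len(self.inner)]
            answer = step(answer)
        return answer


def newton_step(x: float) -> float:
    """Inner agent 1: one Newton refinement towards the square root of 2."""
    return (x + 2.0 / x) / 2.0


def keep_positive(x: float) -> float:
    """Inner agent 2: a simple guardrail that keeps the estimate positive."""
    return max(x, 0.1)


def closeness_to_goal(x: float) -> float:
    """Metric owned by the system, not the inner agents: higher is better."""
    return -abs(x * x - 2.0)


system = CompositeAgent([newton_step, keep_positive], closeness_to_goal, good_enough=-1e-6)
result = system.solve(1.0)
print(result)
```

Neither `newton_step` nor `keep_positive` knows why it is being called or when the work is finished; they simply execute until the wrapper stops calling them.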

AutoGen is most interesting as a means to study conversation protocols between these simpler agents, how information can be stored and shared between agents and how that helps improve on our capability to solve more complex tasks. It is not particularly opinionated about individual agent design or the concepts of goals, self-direction or autonomy (and nor does it need to be to achieve its own stated research goals).

When to use AutoGen (for Builders)

If you are a developer evaluating different frameworks when should you consider using AutoGen?

Taking a pragmatic view of what AutoGen can do, I would say it is worth considering when your problem can naturally be thought of as one where different components need to exchange information in a sequence of steps that is not clearly predetermined.

What I would watch out for is using AutoGen to just implement a simple workflow that uses LLMs. Looking at their code examples and the broad descriptions it is tempting to use it for a number of simple scenarios. However, those can be achieved with much simpler frameworks and tools. You would be doing yourself a disservice by adding more complexity to your own system.
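For contrast, here is what a simple fixed workflow looks like without any agent framework: a list of instructions applied in order. This is a generic sketch rather than a recommendation of a specific tool; `run_pipeline` and `echo_llm` are hypothetical names, and `echo_llm` is a stub that stands in for a real model call.

```python
from typing import Callable, List

LLM = Callable[[str], str]


def run_pipeline(steps: List[str], llm: LLM, text: str) -> str:
    """Apply each instruction in order, feeding each output into the next step."""
    for instruction in steps:
        text = llm(f"{instruction}\n\n{text}")
    return text


def echo_llm(prompt: str) -> str:
    """Stub model: tags the input with the instruction so the flow is visible."""
    instruction, _, body = prompt.partition("\n\n")
    return f"[{instruction}] {body}"


result = run_pipeline(["Summarise", "Translate to French"], echo_llm, "long report")
print(result)
```

If the order of steps is known up front and no step needs to argue with another, a loop like this is usually all the co-ordination that is required.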

If, however, you are faced with a problem that might require multiple attempts between co-operating agents to be solved, and you can also define clear halting conditions, then AutoGen becomes interesting as a way to experiment with solutions.

What AutoGen adds to our understanding of Agent Systems (for Thinkers)

The key contribution from my perspective is the view on conversation protocols (or conversation programming as the AutoGen team calls it). There is already a rich body of research on multi-agent system control and coordination protocols and one can imagine AutoGen as the platform through which these protocols can be implemented and tested.

What AutoGen means for organisations

Organisations considering AutoGen should ensure that they have the teams and resources to explore and experiment in the space extensively. An experienced team that is able to wrap the system built on AutoGen with appropriate control mechanisms may find some of the conversation programming capabilities particularly interesting. However, expecting reliable, production-grade solutions from an inexperienced team with limited resources would quickly lead to disappointment. As it currently stands, it feels like primarily a tool for experimentation and learning. Of course, if you are a startup and think you have an interesting take on how to solve a complex multi-step problem using LLMs, AutoGen may help you get to that proof of concept quickly and you can worry about scaling later.

Ultimately, AutoGen is a peek into the future of application development with LLMs. It is not quite the thing yet, but it shows that we can start thinking of problems as a set of tasks that are delegated to entities to solve together, with a co-ordination mechanism (conversation programming) to help them do so. Early days, but exciting to see.

[1] AutoGen results against other approaches

[2] https://www.microsoft.com/en-us/research/video/autogen-update-complex-tasks-and-agents/

[3] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang (Eric) Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Ahmed Awadallah, Ryen W. White, Doug Burger, Chi Wang, COLM 2024, August 2024