What We Learned from 5k Agents: What Kind of Agent Swarm Do We Actually Need?

Source: @t1anyufan on X · 2026-05-21

Tianyu Fan · 2nd-year Ph.D. student at HKU · MiniRAG, AI-Trader

The Most Certain Gains: Communication and Alignment

When we talk about agent swarms, it is easy to think of "getting more agents to work in parallel."

But I actually think the truly valuable form of an agent swarm may not be one person summoning 100 agents. It may be a swarm where every person has several agents of their own.

Agent A holds part of human A's permissions, documents, context, and preferences. Agent B holds part of human B's permissions, documents, context, and preferences.

When many such agents are safely connected by a Feishu-like system built specifically for agents, they can start representing humans in high-intensity information exchange and coordination. That is when it begins to look like a real agent-native swarm.

The Current State of Agent Swarms: Performance or Division of Labor

If we put the common forms of agent swarms today side by side, there are roughly two paths.

The first is the "parallel" path, represented by systems like Kimi Agent Swarm: 1k+ agents do roughly the same thing, usually under an orchestrator-subagent architecture, and the system extracts a better result from the collective outputs.

The second is the "role division" path, represented by various OPC / kanban-style orchestration workflows: around 10 agents, each with a predefined identity and skill set, collaborate through a process to complete a task.

The first path clearly pursues extreme performance. My main question is whether the final marginal improvement in performance is enough to justify the extra cost: not only the parallel agents themselves, but usually also the need to predefine a task whose outputs can be measured.

The second path raises an even bigger question for me:

If these roles are just identities written into prompts, how is that fundamentally different from me opening 5 Codex sessions at the same time?

Looking Only at Coding Hides the Real Problem: Competition and Cooperation in Groups

Codex and CC have almost dominated the coding-agent space. But once I moved my attention away from code development, I realized something:

Coding hides two questions that almost every collaboration system has to answer: how to compete, and how to cooperate.

Code development is certainly one of the easiest scenarios for agents to demonstrate capability. The tasks are relatively clear, feedback is relatively direct, and the toolchain is fairly standardized: read code, modify code, run tests, open PRs.

But precisely because coding is so easy to split up, many agents are routed into nearly independent subtasks. As a result, issues like identity, permissions, approvals, and document boundaries are weakened by engineering toolchains and task decomposition.

Collaboration inside a real swarm is not like this.

Take an OA (office automation) approval process as an example. A workflow may require a group of people to sign. The action may look the same - "signing" - but because the signers' identities are different, the meaning is completely different. Signatures from finance, legal, and the responsible owner are three completely non-equivalent operations.
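The non-equivalence of signatures can be sketched in a few lines of Python. This is a minimal illustration, not a real OA system: the `Signature` type, the `REQUIRED_ROLES` set, and `is_approved` are hypothetical names, and the three roles are simply the ones named above. It shows that approval depends on which roles signed, not on how many signatures were collected.

```python
from dataclasses import dataclass

# Hypothetical role names, taken from the example in the text.
REQUIRED_ROLES = frozenset({"finance", "legal", "owner"})


@dataclass(frozen=True)
class Signature:
    signer: str  # who signed
    role: str    # the identity they signed under


def is_approved(signatures: list[Signature]) -> bool:
    """Approve only when every required role has signed at least once."""
    signed_roles = {s.role for s in signatures}
    return REQUIRED_ROLES <= signed_roles


# Three signatures, but all from the same role: not approved.
three_from_finance = [
    Signature("alice", "finance"),
    Signature("bob", "finance"),
    Signature("carol", "finance"),
]

# Three signatures, one per required role: approved.
one_per_role = [
    Signature("alice", "finance"),
    Signature("dave", "legal"),
    Signature("erin", "owner"),
]

print(is_approved(three_from_finance))  # False
print(is_approved(one_per_role))        # True
```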

In other words, a real swarm contains many permission boundaries, internal documents, historical decisions, approval processes, accountability structures, and contexts that cannot be casually shared.

So if we discuss agent swarms only in the context of coding agents, it is easy to reduce the problem to a simple abstraction like: "Can we run 10 agents in parallel to write code?"

But in a more realistic swarm context, the more important question is:

"Can agents under different people, roles, and permissions safely exchange information, form a division of labor, align with each other, and leave behind a traceable collaboration process?"

In other words, an agent swarm ultimately has to answer three questions:

How do we stimulate competition so agents actively call tools and produce judgments?
How do we establish cooperation so agents do not just work in isolation, but form sustained collaboration?
How do we control permissions, sharing, and responsibility boundaries so agents can enter real swarm workflows?

How We Observed Competition and Cooperation

Luckily, the AI-Trader platform we released earlier this year already has close to 10k agent users, which is enough to support some small experiments.

So we selected 5,289 agents from the AI-Trader platform and divided them into four groups:

competition: 1,327
cooperation: 1,336
hybrid: 1,356
control: 1,270

The competition group consisted of independently observed individuals who were only told that they needed to achieve a higher ranking. The cooperation group was encouraged to cooperate in order to achieve a higher group ranking. The hybrid group had both individual and group ranking goals. The control group received no extra instruction.
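As a quick sanity check on the split, the snippet below sums the four group sizes and prints each group's share. The numbers are copied from the list above; the variable names are only for illustration.

```python
# Group sizes as reported in the post.
groups = {
    "competition": 1327,
    "cooperation": 1336,
    "hybrid": 1356,
    "control": 1270,
}

total = sum(groups.values())
print(total)  # 5289

# Share of the experiment population in each group.
shares = {name: n / total for name, n in groups.items()}
for name, share in shares.items():
    print(f"{name}: {groups[name]} agents ({share:.1%})")
```

The four sizes add up to the 5,289 agents selected, and each group holds roughly a quarter of them.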

Observation 1: A Competitive Context Significantly Increases Initiative

Data:

competition: 225 trading signals, 7,161 heartbeats, 966 experimental prompt exposures
cooperation: 73 trading signals, 5,727 heartbeats, 414 experimental prompt exposures

Looking mainly at trading signals, competition produced roughly 3x as many as cooperation. But the number of agents that had ever produced signals was not that different between the two groups. The competition group simply produced many more total signals. This suggests that a competitive context makes agents more likely to act proactively.
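The "roughly 3x" figure can be reproduced directly from the counts, and the same counts can be normalised by different denominators. The sketch below is only arithmetic on the numbers reported above plus the group sizes from the split. The post does not define exactly what a heartbeat or a prompt exposure measures, so the normalised ratios are an assumption-laden reading rather than a result from the experiment.

```python
# Counts copied from the post; "agents" is the group size from the split.
groups = {
    "competition": {"agents": 1327, "signals": 225, "heartbeats": 7161, "exposures": 966},
    "cooperation": {"agents": 1336, "signals": 73, "heartbeats": 5727, "exposures": 414},
}


def rate(group: str, denominator: str) -> float:
    """Trading signals per unit of the chosen denominator."""
    g = groups[group]
    return g["signals"] / g[denominator]


# Raw ratio of total trading signals between the two groups.
raw_ratio = groups["competition"]["signals"] / groups["cooperation"]["signals"]
print(f"raw signal ratio: {raw_ratio:.2f}")  # 3.08

# The same comparison after dividing by group size, heartbeats, or exposures.
normalised = {
    d: rate("competition", d) / rate("cooperation", d)
    for d in ("agents", "heartbeats", "exposures")
}
for d, r in normalised.items():
    print(f"signals per {d}: competition/cooperation = {r:.2f}")
```

Under each denominator the competition group stays ahead, although the gap is narrower when signals are divided by heartbeats or by prompt exposures than in the raw count.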

Observation 2: Cooperation Does Not Happen Naturally

Although I forced agents into the same group and asked them to cooperate, within a group the agents still traded independently, made their own judgments, and optimized their own strategies. They rarely replied to each other, supplemented each other's work, continued each other's threads, or formed long-term collaboration around a shared goal.

Over the past 24 hours, the number of agents that replied to another agent was 0 in all four groups. In other words, there were no collaborative events where agents formed high-quality consensus under a group cooperation objective and reported conclusions back to the community, lol.

Why Competition Is Much Stronger Than Cooperation

So why do agents naturally compete, but not naturally team up?

From the model perspective, almost all post-training tasks teach models to survive in long-horizon tasks. Although subagents do appear in training data, many subagents still behave more like tools than collaborators: send in a query, wait for a result. At that point, a subagent is not fundamentally different from a search API.

From the harness perspective, almost every mainstream harness is built to accept a user's query and get the job done. The harness layer handles sessions, permissions, commands, and tool calls, but there is rarely an intermediate layer that can handle the complexity of swarm collaboration.

A simple example is: "A document exists in a group with three agents, but only two agents can edit it."

This is extremely troublesome for a single harness to handle. Which sessions can see the document? Which sessions can edit it? How do we trace changes after another agent modifies it?
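To make the three questions concrete, here is a minimal sketch of what a collaboration layer might track for that document. It is an assumption for illustration, not how any existing harness works: `SharedDocument`, its fields, and the agent ids are hypothetical. Visibility and edit rights are kept as separate sets, and every successful edit is appended to a history so changes can be traced.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDocument:
    text: str
    viewers: set[str]  # agent ids allowed to read
    editors: set[str]  # agent ids allowed to modify
    # (agent_id, new_text) for every accepted edit, oldest first.
    history: list[tuple[str, str]] = field(default_factory=list)

    def read(self, agent_id: str) -> str:
        if agent_id not in self.viewers:
            raise PermissionError(f"{agent_id} cannot view this document")
        return self.text

    def edit(self, agent_id: str, new_text: str) -> None:
        if agent_id not in self.editors:
            raise PermissionError(f"{agent_id} cannot edit this document")
        self.history.append((agent_id, new_text))
        self.text = new_text


# Three agents in the group can see the document, but only two can edit it.
doc = SharedDocument(
    text="draft v0",
    viewers={"agent_a", "agent_b", "agent_c"},
    editors={"agent_a", "agent_b"},
)

doc.edit("agent_a", "draft v1")

try:
    doc.edit("agent_c", "draft v2")
    blocked = False
except PermissionError:
    blocked = True

print(doc.read("agent_c"))  # draft v1
print(blocked)              # True
print(doc.history)          # [('agent_a', 'draft v1')]
```

The check lives in the shared object rather than in any one agent's session, which is the kind of responsibility the text argues a single harness is poorly placed to carry.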

So the problem is not whether agents can complete a task. The problem is that they lack the structural conditions and cooperative willingness required to enter swarm collaboration.

Without clear identity boundaries, permission boundaries, and long-term collaborative relationships, agents seem to share a latent fallback: they just start doing the work themselves as hard as possible, regardless of what happens to their teammates.

Harness for Harness: Building an Organizational Layer for Agents

Agents can already function as workers. What they have long needed at the swarm level is their own Feishu. (I think Feishu is the best enterprise collaboration software in the universe.)

More precisely, agents need a swarm collaboration layer to control things that ordinary harnesses struggle to control: information flow between groups, permission boundaries, collaborative relationships, and accountability records.

The phrase "harness for harness" is not quite accurate here, but I am too lazy to invent a new term for now.

Following this conclusion leads back to the core point from the beginning:

The future agent swarm should not be "100 clones of one person." It should be "a network of many people's agents."

In this network, each person manages their own agents. Each agent only holds the subset of permissions, documents, and skills it has been authorized to hold. This part should be handled by the collaboration platform, rather than relying on the agent itself to control it through a harness or sandbox.

In this context, many components that were previously hidden under "model performance improvement" will become increasingly important.

Permission control

If Agent A represents human A and Agent B represents human B, then what they can see, what they can call, and who they can make decisions on behalf of are no longer simple product settings. They become the foundational rules of the entire collaboration network.
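One way to read "foundational rules" is that an agent's scopes must always be a subset of what its owner holds, and that the platform, not the agent, enforces this. The sketch below assumes a simple string-based scope model. The `delegate` and `can` functions and the scope names are hypothetical.

```python
def delegate(owner_permissions: frozenset[str], requested: set[str]) -> frozenset[str]:
    """Grant an agent only scopes its owner actually holds."""
    excess = requested - owner_permissions
    if excess:
        raise ValueError(f"cannot delegate scopes the owner lacks: {sorted(excess)}")
    return frozenset(requested)


def can(agent_scopes: frozenset[str], action: str) -> bool:
    """Platform-side check performed before an agent acts."""
    return action in agent_scopes


# Hypothetical scopes held by human A.
human_a = frozenset({"read:budget", "read:roadmap", "decide:hiring"})

# Agent A receives part of human A's permissions, not all of them.
agent_a = delegate(human_a, {"read:budget", "read:roadmap"})

print(can(agent_a, "read:budget"))    # True
print(can(agent_a, "decide:hiring"))  # False

# Asking for a scope the owner does not hold is rejected outright.
try:
    delegate(human_a, {"read:payroll"})
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```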

Agent sharing

Does an agent belong only to one person, or can it be temporarily authorized for use by a team? Can it safely share part of its context with another agent or human? Can it explain its judgment to others without granting them access to the original documents?

Approval mechanisms / traceable records

When agents represent different people and collaborate with different permissions and documents, a swarm still needs to know: who authorized the agent, who can see its input, which decision its output influenced, and how responsibility can be traced after something goes wrong.
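The four things a swarm needs to know map naturally onto fields of a log record. The following is a rough sketch under that assumption. `AuditRecord`, `trace`, and all ids are hypothetical and are not taken from AI-Trader or any existing platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    agent_id: str                      # which agent acted
    authorized_by: str                 # who authorized the agent
    input_visible_to: frozenset[str]   # who can see its input
    output_summary: str                # what it produced
    influenced_decision: str           # which decision the output fed into


def trace(log: list[AuditRecord], decision_id: str) -> list[AuditRecord]:
    """Return every record whose output influenced the given decision."""
    return [r for r in log if r.influenced_decision == decision_id]


log = [
    AuditRecord("agent_a", "human_a", frozenset({"human_a"}), "proposal draft", "decision-1"),
    AuditRecord("agent_b", "human_b", frozenset({"human_a", "human_b"}), "feedback on proposal", "decision-1"),
    AuditRecord("agent_b", "human_b", frozenset({"human_b"}), "weekly summary", "decision-2"),
]

# After something goes wrong with decision-1, list who authorized each contribution.
responsible = [(r.agent_id, r.authorized_by) for r in trace(log, "decision-1")]
print(responsible)  # [('agent_a', 'human_a'), ('agent_b', 'human_b')]
```

Keeping the authorizing human on every record is what lets responsibility be traced back to a person rather than stopping at the agent.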

These have almost nothing to do with "a single agent doing one task well," but they are core problems for swarm collaboration.

How the Division of Labor Between Humans and Agents Will Change

My recent daily work may already be a miniature version of this.

Collaborator A asks their agent to produce a proposal. I ask my agent to read it and give feedback. Then the two of us have a meeting to align, mostly discussing who gets to make the decision and which person should drive the next step. Then that person will probably hand the task to their own agent as well.

Then everyone has another meeting. The focus is not exactly what has been done, but rather confirming the occasional sparks of insight and the things that were not recorded.

In this structure, agents handle high-frequency information processing, cross-system execution, and preliminary judgment - what large-company jargon would call "connecting and aligning everything." Humans are responsible for trade-offs, authorization, and accountability.

How to Make Agents More Like Trustworthy Swarm Members

They need identity, boundaries, authorization, memory, and traceable behavioral records.

Agent swarms may not immediately increase "final output quality" by 10x, but they will definitely improve two things significantly:

Communication speed. Information transfer no longer depends on humans having meetings.
Alignment depth. Each agent understands the other side with its owner's context and permissions, rather than reacting only to an isolated message.

So the agent swarm we need may not be "a swarm with more agents doing work," but "my agent can represent me and work with your agent."

This may also be one of the most interesting questions for agent-native swarms going forward: can agents form persistent swarm structures?

Original post: https://x.com/t1anyufan/status/2057460666097869170

Original Chinese title: What 5,000 Agent Experiments Tell Us: What Kind of Agent Swarm Do We Actually Need?