Tutorial · Multi-Agent Systems · Decentralized Orchestration

Coordination Without a Central Orchestrator: Emergent Order in Decentralized Multi-Agent Systems

Consensus protocols, stigmergic coordination, peer-to-peer negotiation, and learning-based approaches for multi-agent coalitions that self-organize without any single entity in charge.

Authors: Nathan Crock, GPT 5.1 (research), Gemini 3.1 (research), Claude Opus 4.5 (coding)

In Tutorials I through III, we developed the machinery for agents to discover each other, form coalitions, build trust, evaluate capabilities semantically, and align economic incentives. But throughout every worked example, one structural assumption remained unchallenged: a single orchestrator agent Ω sat at the center of every coalition, decomposing tasks, assigning roles, aggregating results, and making all coordination decisions. The orchestrator was the brain; every other agent was a limb.

This tutorial confronts the question of what happens when Ω is removed, or more precisely, when no agent has the right to assume the orchestrator role. In the open agentic web, agents are operated by different organizations with different trust levels, economic incentives, and operational constraints. Designating any single agent as orchestrator creates a concentration of authority that may be unacceptable, infeasible, or fragile. The question becomes: can a coalition coordinate itself?

The answer draws from five decades of distributed systems research, three decades of multi-agent systems theory, and a rapidly emerging body of work on LLM-based collective intelligence. Under the right conditions, decentralized coordination is not only possible but more resilient, more scalable, and more adaptable than centralized alternatives. But the conditions matter enormously. Without careful protocol design, decentralized coordination degrades into oscillation, deadlock, or strategic exploitation by self-interested agents.

1. The Orchestrator Bottleneck — Why Central Control Fails at Scale

The orchestrator pattern (Tutorial I, Section 5) is intuitive: a single agent Ω receives a task, decomposes it into sub-goals, discovers and recruits specialists, monitors their execution, handles failures, and aggregates results. This is the dominant architecture in every major multi-agent framework (AutoGen, CrewAI, MetaGPT, LangGraph), and for good reason: it simplifies coordination to a series of one-to-many interactions. The orchestrator maintains global state, and every other agent only needs to communicate with Ω.

But the pattern carries structural liabilities that become acute as systems scale beyond controlled environments into open, cross-organizational networks.

Three Structural Liabilities of Central Orchestration
Single Point of Failure: If Ω fails, the entire coalition halts, with no graceful degradation. Availability risk grows with coalition size.
Authority Concentration: Ω sees all data, controls all assignments, and can manipulate both. Cross-organizational trust prohibits single-party control.
Scalability Ceiling: All messages route through Ω, costing O(n) communication per round. By Amdahl's Law, the serial orchestrator is the bottleneck.
The combined effect is fragility, untrustworthiness, and slowness: precisely what the agentic web cannot afford. These are not edge cases; they are structural constraints of the star topology.
Fig. 1 — The three structural liabilities of centralized orchestration. Each grows worse as the number of agents and organizational boundaries increases.
Analogy

Consider the difference between an air traffic control tower and a flock of starlings. The tower is a centralized orchestrator: it tracks every aircraft, issues instructions, sequences landings. It works beautifully for a bounded number of aircraft in a defined airspace. But what the tower cannot do is scale to millions of agents, tolerate the tower going offline, or operate when the aircraft don't trust the controller. Starling murmurations achieve complex, coherent collective behavior, including rapid direction changes, obstacle avoidance, and predator evasion, through purely local interactions. Each bird attends to its nearest 6–7 neighbors, adjusting velocity and heading based on three simple rules: alignment, cohesion, and separation. No bird is "in charge." The flock's coordination is emergent.
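The three flocking rules can be sketched in a few lines. This is an illustrative toy in the spirit of Reynolds-style boids, not a published implementation: the function name, weights, and neighbor count below are our own choices. Each bird updates its velocity using only its k nearest neighbors.

```python
import math

def boids_step(positions, velocities, k=7, w_align=0.05, w_cohere=0.005,
               w_sep=0.05, sep_radius=1.0):
    """One synchronous update: each bird adjusts its velocity from only its
    k nearest neighbors via alignment, cohesion, and separation."""
    new_vel = []
    for i, (p, v) in enumerate(zip(positions, velocities)):
        # k nearest neighbors by Euclidean distance (purely local information)
        others = sorted(
            (j for j in range(len(positions)) if j != i),
            key=lambda j: math.dist(p, positions[j]),
        )[:k]
        ax = ay = cx = cy = sx = sy = 0.0
        for j in others:
            ax += velocities[j][0] - v[0]      # alignment: match neighbor headings
            ay += velocities[j][1] - v[1]
            cx += positions[j][0] - p[0]       # cohesion: drift toward neighbor centroid
            cy += positions[j][1] - p[1]
            if math.dist(p, positions[j]) < sep_radius:
                sx += p[0] - positions[j][0]   # separation: avoid crowding
                sy += p[1] - positions[j][1]
        n = max(len(others), 1)
        new_vel.append((
            v[0] + w_align * ax / n + w_cohere * cx / n + w_sep * sx,
            v[1] + w_align * ay / n + w_cohere * cy / n + w_sep * sy,
        ))
    return new_vel
```

Iterated over many steps, headings converge even though no bird ever sees the whole flock, which is exactly the property the murmuration analogy is pointing at.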

The distributed systems literature quantifies these problems precisely. Yang et al. (NeurIPS 2025) demonstrated that AgentNet, a fully decentralized DAG-structured framework where LLM agents route tasks autonomously via local routing strategies, significantly outperformed centralized multi-agent systems in task efficiency, adaptability, and fault tolerance. Their key insight was that removing the central orchestrator allowed agents to specialize dynamically and adjust connectivity based on task demands, capabilities that star-topology coordination fundamentally precludes.

A 2025 industry survey reinforces this: contract net protocols (the decentralized ancestor of modern agent negotiation) remain the most widely deployed coordination mechanism in production multi-agent systems, used in 47% of deployments, while market-based approaches account for 29% and distributed constraint optimization covers 18%. Fully centralized orchestration, despite its dominance in research prototypes, is the minority pattern in systems that must actually survive in production.

The goal of this tutorial is to provide the theoretical foundations and practical protocols for removing Ω: not by ignoring the need for coordination, but by distributing it across the coalition itself. We will examine four complementary approaches: consensus protocols, stigmergic coordination, peer-to-peer negotiation, and learning-based emergent coordination. Each has different trade-offs; production systems will layer them, just as discovery protocols are layered (Tutorial I, Section 3).

2. Formal Foundations — Dec-POMDPs and the Complexity Wall

Before examining solution strategies, we need to understand why decentralized coordination is fundamentally hard: not merely inconvenient, but computationally intractable in the general case. The standard formal framework is the Decentralized Partially Observable Markov Decision Process (Dec-POMDP), which generalizes the single-agent POMDP to teams of cooperative agents with local-only observation.

Formal Definition — Dec-POMDP

A Dec-POMDP is a tuple ⟨n, S, {Ai}, T, R, {Ωi}, O, γ⟩ where n is the number of agents, S is the state space, Ai is the action space for agent i, T is the joint transition function, R is a global reward function, Ωi is agent i's observation space, O is the joint observation function, and γ is a discount factor. Critically, each agent i selects actions based only on its own observation history hi = (oi1, ai1, ..., oit). No agent observes the global state or other agents' observations directly.
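The tuple translates directly into code. Below is a minimal container and rollout loop of our own devising (all names are ours); T and O are deterministic functions for brevity, whereas in the general model both are probability distributions. The key constraint is visible in rollout: agent i's policy receives only its own history.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Illustrative container for <n, S, {Ai}, T, R, {Ωi}, O, γ>.
@dataclass
class DecPOMDP:
    n: int
    states: List[str]
    actions: List[List[str]]                    # actions[i] = Ai
    T: Callable[[str, Tuple[str, ...]], str]    # (s, joint action) -> s'
    R: Callable[[str, Tuple[str, ...]], float]  # global reward
    observations: List[List[str]]               # observations[i] = Ωi
    O: Callable[[str, int], str]                # (next state, agent i) -> oi
    gamma: float = 0.95

def rollout(m: DecPOMDP, policies, s0: str, horizon: int) -> float:
    """Each agent i acts from ONLY its own history hi = (oi1, ai1, ...);
    no agent ever sees the global state or another agent's observations."""
    histories = [[] for _ in range(m.n)]
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        joint = tuple(policies[i](histories[i]) for i in range(m.n))
        total += discount * m.R(s, joint)
        s = m.T(s, joint)
        for i in range(m.n):
            histories[i].append((m.O(s, i), joint[i]))
        discount *= m.gamma
    return total

# A trivial two-agent coordination game: reward 1 whenever both choose "a".
toy = DecPOMDP(
    n=2, states=["s"], actions=[["a", "b"], ["a", "b"]],
    T=lambda s, ja: "s",
    R=lambda s, ja: 1.0 if ja == ("a", "a") else 0.0,
    observations=[["o"], ["o"]],
    O=lambda s, i: "o",
    gamma=1.0,
)
```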

The fundamental theorem of Dec-POMDP complexity, established by Bernstein et al. (2002), is sobering:

Complexity Hierarchy for Cooperative Decision-Making
MDP (P)  ⊂  POMDP (PSPACE-complete)  ⊂  Dec-POMDP (NEXP-complete)

Moving from centralized (POMDP) to decentralized (Dec-POMDP) control raises worst-case complexity by an exponential step. The infinite-horizon case is undecidable. This is not an artifact of formalization; it reflects a genuine information-theoretic barrier: without shared observations, agents must reason about what other agents might be observing and doing, creating a recursion of beliefs about beliefs (the "I think you think I think" problem) that scales combinatorially.

The Practical Implication

NEXP-completeness means that no polynomial-time algorithm can optimally solve general Dec-POMDPs; unlike many hardness claims, this one is unconditional, since P ≠ NEXP follows from the time hierarchy theorem. For our prior authorization scenario with 5 agents, the joint policy space is already vast. This doesn't mean decentralized coordination is impossible; it means we must exploit structural assumptions (communication, independence, domain constraints) to escape the complexity wall. Every practical coordination protocol in this tutorial achieves tractability by restricting the general Dec-POMDP in specific ways.
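To make "vast" concrete, a back-of-envelope count of the joint deterministic policy space helps. The 5 agents match the coalition in our scenario; the 4 actions, 4 observations per agent, and horizon of 3 are illustrative assumptions of ours, deliberately chosen tiny.

```python
# Back-of-envelope size of the joint deterministic policy space.
def history_nodes(num_obs: int, horizon: int) -> int:
    # Observation histories an agent's policy must map to actions:
    # 1 + |O| + |O|^2 + ... + |O|^(h-1)
    return sum(num_obs ** t for t in range(horizon))

def joint_policy_count(n_agents: int, num_actions: int,
                       num_obs: int, horizon: int) -> int:
    per_agent = num_actions ** history_nodes(num_obs, horizon)  # |A|^(#histories)
    return per_agent ** n_agents

count = joint_policy_count(n_agents=5, num_actions=4, num_obs=4, horizon=3)
# Even this toy setting yields well over 10^60 joint policies.
```

Exhaustive search is hopeless even here, which is why every protocol below buys tractability through structure rather than brute force.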

Three structural relaxations are particularly relevant to agentic systems:

Communication (Dec-POMDP-Com)

If agents can share observations, the problem reduces to a centralized POMDP (PSPACE). Even sharing partial information such as action suggestions and compressed beliefs dramatically reduces complexity. Asmar et al. (2024) showed that Dec-POMDP-Com models achieve near-centralized team value with greatly reduced online complexity.

Transition Independence

When each agent's state transitions depend only on its own actions (not other agents'), optimal policies become Markov-local. Complexity drops from NEXP to NP-complete, which is still hard but tractable for small coalitions. This applies when sub-tasks are loosely coupled.

Event-Driven / Macro-Actions

Dec-POSMDPs extend the model with temporally extended macro-actions, allowing asynchronous decision-making. Agents coordinate at macro-action boundaries rather than every time step, reducing the coordination surface by orders of magnitude, which is crucial for multi-agent systems where sub-tasks have different durations.

Stigmergic Relaxation

When agents coordinate indirectly through a shared environment (a shared state store, bulletin board, or artifact registry), each agent's policy conditions on both its local observations and the shared state, thereby providing a Markovian signal that the standard Dec-POMDP lacks.

The remainder of this tutorial examines four families of coordination mechanisms, each of which exploits one or more of these relaxations. We will see that communication-based consensus, stigmergic shared state, negotiation protocols, and learned coordination strategies are not competing approaches but complementary layers in a coordination stack.

3. Consensus-Based Coordination — Reaching Agreement Without Leaders

The most direct approach to orchestrator-free coordination is consensus: a protocol by which distributed agents converge on a shared decision despite having only local information and local communication. Consensus is the bedrock of distributed computing, underlying Paxos, Raft, and every blockchain, and its adaptation to multi-agent AI systems provides the first building block for leaderless coalitions.

Formal Definition — Consensus Protocol

Given n agents, each holding an initial value vi(0), a consensus protocol is an iterative procedure where at each round t, agent i updates its value based on messages from its neighbors N(i): vi(t+1) = f(vi(t), {vj(t) : j ∈ N(i)}). The protocol converges if ∀i,j: limt→∞ |vi(t) − vj(t)| = 0. The convergence rate depends on the spectral gap of the communication graph: specifically, on λ2(W), the second-largest eigenvalue (in magnitude) of the weight matrix W.

3.1 Weighted Belief Fusion for Agent Coalitions

In our domain, the "values" that agents must agree on are not numbers but beliefs: about the task decomposition, about subtask assignments, about intermediate results, and about the overall plan. Each agent enters the coalition with a partial view: ClinEvidence knows about evidence synthesis workflows, GenomicsAgent understands variant analysis requirements, and PolicyReasoner knows coverage determination logic. These partial views must be fused into a coherent shared plan.

The SEEK-Multi framework (IJSRA, 2026) formalized this as a weighted belief fusion protocol for multi-agent coordination. At each round, agent i updates its belief bi(v, t) over a set of propositions v (e.g., "subtask T2 should precede T3", "the genomic analysis requires pharmacogenomic extension"):

Weighted Belief Fusion
bi(v, t+1) = wii · bi(v, t) + Σj∈N(i) wij · bj(v, t)

where the weights satisfy wii + Σj∈N(i) wij = 1, encode the trust that agent i places in agent j's beliefs, and can be derived from the reputation scores developed in Tutorial II (Trust Without a Central Authority, Section 6). The convergence theorem guarantees that if the communication graph is connected, all agents converge to the same belief vector b*, and the convergence rate is bounded by ||b(t) − b*|| ≤ λ2(W)^t · ||b(0) − b*||.
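A minimal sketch of the fusion rule follows. The reputation numbers are illustrative, and we use one simplification: every agent weighs peers by the same global reputation vector, so all rows of W are identical and consensus is reached in a single round. With heterogeneous per-agent trust, the rows differ and convergence is instead geometric at rate λ2(W).

```python
# Trust-weighted belief fusion over a fully connected three-agent coalition.
def fusion_round(beliefs, W):
    n = len(beliefs)
    return [sum(W[i][j] * beliefs[j] for j in range(n)) for i in range(n)]

def run_consensus(beliefs, W, rounds=50):
    for _ in range(rounds):
        beliefs = fusion_round(beliefs, W)
    return beliefs

reputation = [95.0, 60.0, 22.0]  # e.g. ClinEvidence, PolicyReasoner, a newcomer
W = [[r / sum(reputation) for r in reputation] for _ in reputation]  # row-stochastic

# Initial per-agent belief in the proposition "subtask T2 should precede T3"
b0 = [0.9, 0.5, 0.1]
b_star = run_consensus(b0, W)
# All agents converge to one value, pulled toward the high-reputation belief.
```

The fused belief lands well above the unweighted average, reflecting that ClinEvidence's 0.9 counts for more than the newcomer's 0.1.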

Key Insight — Trust-Weighted Consensus

The weight matrix W is the bridge between the trust infrastructure (Tutorial II) and the coordination protocol. An agent with a high reputation score (say, ClinEvidence with a score of 95) will have its beliefs weighted more heavily in the consensus than a newly joined agent with a reputation of 22. This means the coalition's shared plan naturally reflects the judgments of its most trusted members, with no agent designated as orchestrator. Authority is distributed proportionally to demonstrated competence.

3.2 Consensus for Task Decomposition and Assignment

How does consensus apply to the concrete problem of task decomposition? In the orchestrated model, Ω decomposes the prior authorization request into sub-tasks T1 (evidence synthesis), T2 (genomic analysis), T3 (policy reasoning), and T4 (form generation). In the leaderless model, decomposition itself must be consensual.

Consensual Task Decomposition — Each agent proposes, then the coalition converges Pseudo-code
// Phase 1: Each agent proposes a task decomposition based on its expertise
on CoalitionFormed(task, members):
    my_decomposition = propose_decomposition(task, my_capabilities)
    // ClinEvidence proposes: [evidence_review, grade_assessment]
    // GenomicsAgent proposes: [variant_analysis, pharmacogenomic_check]
    // PolicyReasoner proposes: [policy_lookup, coverage_determination]

    broadcast(members, DecompositionProposal { 
        agent: self.id,
        subtasks: my_decomposition,
        confidence: 0.85,     // how confident am I in this decomposition?
        dependencies: my_deps   // which subtasks must precede which?
    })

// Phase 2: Belief fusion over received proposals
on ReceiveProposals(proposals[]):
    // Build a unified task graph by merging all proposals
    // Weight each proposal by the proposer's reputation and domain relevance
    merged_graph = empty_task_graph()
    for p in proposals:
        weight = reputation(p.agent) * domain_relevance(p.agent, task)
        merged_graph.merge(p.subtasks, p.dependencies, weight)

    // Phase 3: Iterative consensus on the merged graph
    for round in 1..MAX_ROUNDS:
        broadcast(members, BeliefUpdate { graph: merged_graph, round: round })
        neighbor_graphs = receive_all(members, BeliefUpdate)
        merged_graph = weighted_fusion(merged_graph, neighbor_graphs, trust_weights)
        
        if convergence_check(merged_graph, neighbor_graphs, ε=0.01):
            break  // consensus reached

    // Phase 4: Self-assignment — each agent claims subtasks matching its skills
    my_tasks = merged_graph.subtasks.filter(t => capability_match(t, self) > 0.8)
    broadcast(members, TaskClaim { agent: self.id, tasks: my_tasks })

    // Phase 5: Conflict resolution via second consensus round
    // If two agents claim the same subtask, resolve by capability score
    all_claims = receive_all(members, TaskClaim)
    for t in subtasks_with_conflicting_claims(all_claims):
        winner = argmax(claimants(t, all_claims), a => capability_match(t, a))
        broadcast(members, ClaimResolution { subtask: t, winner: winner })

The critical observation is that the decomposition protocol has three phases (propose, fuse, and claim), and each phase involves only peer-to-peer communication. No agent acts as a central authority. The fusion step naturally incorporates domain expertise: GenomicsAgent's proposal about genomic sub-tasks carries more weight than ClinEvidence's opinion about genomics, because GenomicsAgent has higher domain relevance for those tasks.

3.3 LLM Agents and Consensus: Empirical Results

Kaesberg et al. (ACL 2025) conducted one of the first systematic comparisons of consensus-based versus voting-based decision protocols for LLM agent groups. Their study tested seven distinct protocols across knowledge-based and reasoning tasks using Llama-3 (8B and 70B) agents. The results were nuanced: consensus-based protocols (where agents iteratively discuss until agreement) and voting-based protocols (where agents independently vote after fixed-length deliberation) achieved comparable performance, but with fundamentally different failure modes. Consensus protocols converged more slowly, requiring 2–3× more communication rounds, but produced higher-confidence decisions when they did converge. Voting protocols were faster but susceptible to cascading errors when the majority happened to be wrong.

For coalition coordination in the agentic web, this suggests a hybrid strategy: use fast voting for low-stakes coordination decisions (e.g., which output format to use) and deliberative consensus for high-stakes ones (e.g., task decomposition, subtask dependency ordering). We will formalize this hybrid approach in Section 9.

4. Stigmergic Coordination — Indirect Communication Through Shared State

Consensus requires direct message exchange. Stigmergy achieves coordination without direct communication at all. Coined by the French biologist Pierre-Paul Grassé in 1959 to describe termite nest construction, stigmergy is coordination through environmental modification: agents leave traces in a shared environment, and those traces influence the behavior of other agents. Ant pheromone trails are the canonical example: ants do not communicate the location of food to each other; they deposit pheromones on paths they've walked, and other ants probabilistically follow stronger trails.

Formal Definition — Stigmergic Coordination

A stigmergic system consists of agents {a1, ..., an}, an environment E with state sE, and interaction rules. Agent ai modifies E through action αi: sE′ = f(sE, αi). Agent aj observes the modified environment and adjusts its behavior: πj(sE′) ≠ πj(sE). The key properties are: (1) indirection: agents interact through the environment, not with each other; (2) persistence: modifications outlive the action that created them; (3) decay: modifications fade over time (pheromone evaporation), preventing stale coordination.
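The three properties can be captured in a toy model: indirection (agents touch only the shared environment), persistence (traces outlive the action that deposited them), and decay (unreinforced traces evaporate). The class name, decay rate, and cutoff below are illustrative choices of ours.

```python
# Toy stigmergic environment: a shared "pheromone" field with evaporation.
class SharedEnvironment:
    def __init__(self, decay=0.5, cutoff=0.05):
        self.traces = {}        # trace key -> strength
        self.decay = decay
        self.cutoff = cutoff

    def deposit(self, key, amount=1.0):
        # e.g. an agent posting a capability gap
        self.traces[key] = self.traces.get(key, 0.0) + amount

    def tick(self):
        # evaporation: every trace fades; stale signals eventually vanish
        self.traces = {k: v * self.decay for k, v in self.traces.items()
                       if v * self.decay > self.cutoff}

    def strongest(self):
        return max(self.traces, key=self.traces.get) if self.traces else None

env = SharedEnvironment()
env.deposit("gap:drug_interaction")   # the trace persists across ticks ...
env.tick()
for _ in range(5):                    # ... but evaporates if nobody reinforces it
    env.tick()
```

Evaporation is what prevents stale coordination: a gap nobody reinforces disappears on its own, with no orchestrator needed to garbage-collect it.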

4.1 The Shared Task Board — Stigmergy for Agent Coalitions

In multi-agent AI systems, the "environment" that mediates stigmergic coordination is a shared data structure such as a distributed task board, a shared knowledge graph, or an artifact registry. Agents post work products, status updates, capability gaps, and intermediate results to this shared state. Other agents read the shared state and adjust their behavior accordingly. No direct negotiation is required.

Stigmergic Coordination — Shared Task Board Protocol Pseudo-code
// The Shared Task Board (STB) is a distributed, eventually-consistent store
// accessible to all coalition members. It replaces the orchestrator.

struct TaskBoard {
    task_graph:     DAG<Subtask>,       // the agreed decomposition
    claims:         Map<SubtaskId, AgentId>,  // who is working on what
    artifacts:      Map<SubtaskId, Artifact>, // completed work products
    capability_gaps: List<CapabilityGap>,     // unresolved needs (like pheromone trails)
    heartbeats:     Map<AgentId, Timestamp>,  // liveness signals
}

// Each agent runs an autonomous loop: observe → decide → act → post
function agent_loop(self, board: TaskBoard):
    loop:
        // 1. Observe: read the board's current state
        state = board.snapshot()

        // 2. Decide: find unclaimed subtasks that match my capabilities
        available = state.task_graph.subtasks
            .filter(t => t.id not in state.claims)
            .filter(t => t.dependencies_met(state.artifacts))
            .filter(t => capability_match(t, self) > 0.7)

        if available.is_empty():
            // Check for capability gaps I can fill (pheromone following!)
            gap = state.capability_gaps.find(g => can_address(g, self))
            if gap:
                // Create a new subtask to address the gap
                new_subtask = create_subtask_from_gap(gap)
                board.atomic_insert(task_graph, new_subtask)
                board.atomic_claim(new_subtask.id, self.id)
            else:
                sleep(POLL_INTERVAL)
                continue
        else:
            // 3. Act: claim the best-matching subtask (atomic CAS operation)
            target = available.best_match(self)
            success = board.atomic_claim(target.id, self.id)
            if !success: continue  // another agent claimed it first — retry

            // 4. Execute the subtask
            result = self.execute(target, inputs=gather_inputs(state, target))

            // 5. Post: write the artifact to the board
            board.post_artifact(target.id, result)

            // 6. Signal capability gaps if discovered during execution
            if result.gaps:
                for gap in result.gaps:
                    board.post_capability_gap(gap)  // ← the "pheromone deposit"

        // 7. Heartbeat: signal liveness
        board.update_heartbeat(self.id, now())

The capability gap mechanism is the digital equivalent of pheromone deposition. When GenomicsAgent discovers the CYP2D6 poor metabolizer status and recognizes a need for drug interaction analysis (Tutorial I, Section 5, Phase 3), it does not need to negotiate with an orchestrator. It simply posts a CapabilityGap to the shared board, leaving a trace in the environment. A DrugInteractionAgent, running its own autonomous observe-decide-act loop, eventually reads this gap, determines it can address it, creates a new subtask, claims it, and executes. The coalition grows not through centralized command but through environmentally mediated self-organization.

Stigmergic Coordination — Observe-Decide-Act Loops Around a Shared Task Board
(Diagram: a Shared Task Board holding task_graph, claims, artifacts, capability_gaps, and heartbeats in an eventually consistent distributed store. ClinEvidence, GenomicsAgent, DrugInteractionAgent, and PolicyReasoner each run observe→act loops against it; GenomicsAgent posts a gap, DrugInteractionAgent reads and claims it. No direct agent-to-agent messages: all coordination flows through the board.)
Fig. 2 — Four agents coordinate through a shared task board. GenomicsAgent posts a capability gap (the "pheromone"). DrugInteractionAgent reads the gap and self-assigns. No agent directs another.

4.2 Emergent Collective Memory

Recent research demonstrates that stigmergic coordination produces emergent properties beyond mere task routing. Investigations into decentralized multi-agent frameworks with multi-categorical environmental traces (arXiv:2512.10166, December 2025) found that agents operating with stigmergic communication develop a form of collective memory that exists not in any single agent but in the distributed pattern of traces they leave. In the prior authorization setting, the shared task board gradually accumulates not just completed artifacts but a navigable record of how the coalition solved the problem: which gaps were discovered, which agents filled them, and which dependency orderings worked. This collective memory is available to future coalitions, enabling cross-coalition knowledge transfer (the gossip-based knowledge propagation from Tutorial I, Section 7) at the level of coordination strategies, not just agent identities.

Analogy

The shared task board functions like a Wikipedia article under collaborative editing. No single editor controls the article. Each editor (agent) reads the current version, contributes their expertise to the sections they know best, flags content gaps in talk pages (capability gaps), and watches for vandalism (liveness heartbeats detecting failed agents). The article (the shared plan) evolves through incremental, concurrent modifications by autonomous contributors. The edit history (collective memory) records how the article came to be.

5. Peer-to-Peer Negotiation — From Contract Nets to LLM Agents

Consensus produces shared beliefs. Stigmergy enables indirect coordination. But many coordination decisions require explicit negotiation: agent A offers to perform a subtask, agent B counter-offers a different scope, agent C argues for a different dependency ordering. When agents have heterogeneous preferences, private cost structures, and strategic interests, negotiation is the natural coordination primitive.

5.1 The Contract Net Protocol — The Classic Foundation

The intellectual ancestor of agent negotiation is the Contract Net Protocol (CNP), introduced by Smith & Davis in 1980 and still, forty-five years later, the most widely deployed coordination mechanism in multi-agent systems. The CNP defines three roles: a manager (who needs work done), contractors (who bid to do the work), and the protocol (announce → bid → award → execute).

The key insight of CNP for our purposes is that the manager role rotates with the task rather than remaining a fixed assignment. Any agent can be a manager for the subtask it identified, and a contractor for subtasks identified by others. This distinguishes CNP from orchestration: there is no permanent Ω. Authority is task-local.

Peer Contract Net — Rotating Manager, No Permanent Orchestrator Pseudo-code
// Any agent that identifies a subtask becomes its temporary manager
function peer_contract_net(subtask, coalition):
    // The identifying agent acts as manager for THIS subtask only
    manager = self  // whoever found the need is the manager

    // Step 1: Announce — broadcast the task to all coalition peers
    announcement = TaskAnnouncement {
        subtask:     subtask,
        manager:     self.id,
        deadline:    subtask.deadline,
        constraints: subtask.constraints,
    }
    broadcast(coalition, announcement)

    // Step 2: Collect bids — peers assess and respond
    bids = collect_with_timeout(coalition, Bid, timeout=5s)
    // Each bid includes: agent_id, estimated_cost, estimated_quality,
    //   estimated_latency, capability_evidence (VC from Tutorial II)

    // Step 3: Award — select the best bidder
    // Use the multi-attribute scoring from Tutorial III, Section 3.2
    winner = bids.max_by(b => multi_attr_score(b, weights={
        quality:  0.4,
        cost:     0.25,
        latency:  0.2,
        trust:    0.15,
    }))

    send(winner.agent_id, Award { subtask: subtask })
    broadcast(coalition, AwardNotification { subtask: subtask, winner: winner })

    // Step 4: The manager's role ends when the subtask is assigned
    // The winner executes independently and posts results to the board
    return winner

// Example: GenomicsAgent detects CYP2D6 gap → becomes manager for T2b
// GenomicsAgent broadcasts TaskAnnouncement for "drug interaction analysis"
// DrugInteractionAgent bids → GenomicsAgent awards → manager role dissolves

5.2 LLM Agents as Natural Language Negotiators

The Contract Net Protocol assumes structured messages such as bids, awards, and rejections that follow rigid schemas. LLM agents introduce a transformative capability: they can negotiate in natural language, making nuanced arguments about why they should or shouldn't take a subtask, proposing creative reformulations of the problem, and reaching agreements that no pre-defined schema could capture.

The supply chain consensus-seeking framework (Taylor & Francis, December 2025) demonstrated this concretely. LLM agents acting as decentralized supply chain actors negotiated order quantities through natural-language exchanges, achieving consensus that outperformed centralized demand-driven allocation, even though each agent pursued selfish economic incentives (EOQ-based ordering). The agents developed emergent negotiation strategies: early concessions to build goodwill, conditional offers ("I'll increase my order if you extend the delivery window"), and appeals to shared metrics.

Design Tension — Structure vs. Flexibility

Natural language negotiation is powerful but introduces two risks. First, ambiguity: an agent that says "I can probably handle the pharmacogenomic analysis" has made a commitment of uncertain strength. Is "probably" a 95% confidence or a 60% one? Without schema enforcement, the coalition's shared understanding can drift. Second, strategic manipulation: LLM agents with advanced reasoning capabilities can craft persuasive arguments that serve their interests over the coalition's. Guzman Piedrahita et al. (COLM 2025) showed that reasoning-focused LLMs are worse cooperators than standard LLMs in public goods games, explicitly justifying free-riding through sophisticated chain-of-thought reasoning. Natural language is not a neutral medium; it is a tool that strategic agents can exploit.

The practical implication is that negotiation protocols for the agentic web should be structured but extensible: a rigid schema for commitments (bids, claims, deadlines) supplemented by natural-language fields for justification, counter-arguments, and context. The schema ensures accountability; the natural language enables nuance.
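One way to sketch such a message: commitment fields are schema-enforced and machine-checkable, while nuance lives in a free-text justification. The field names below are illustrative, not a published schema.

```python
from dataclasses import dataclass

# "Structured but extensible" negotiation message sketch.
@dataclass(frozen=True)
class Bid:
    agent_id: str
    subtask_id: str
    confidence: float        # explicit probability, no ambiguous "probably"
    cost: float
    deadline_s: int
    justification: str = ""  # natural-language argument; persuasive, non-binding

    def validate(self) -> "Bid":
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be an explicit probability in [0, 1]")
        if self.cost < 0 or self.deadline_s <= 0:
            raise ValueError("cost and deadline must be concrete commitments")
        return self
```

An agent that would say "I can probably handle the pharmacogenomic analysis" must instead commit to confidence=0.6 (or 0.95), with the hedging confined to the non-binding justification field where it cannot drift into the coalition's shared plan.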

6. Learning-Based Coordination — Emergent Protocols via Multi-Agent RL

The three approaches examined so far (consensus, stigmergy, and negotiation) are designed coordination protocols: humans specify the rules, and agents follow them. An alternative paradigm is learned coordination: agents discover effective coordination strategies through experience, using multi-agent reinforcement learning (MARL). The appeal is that learned protocols can adapt to specific task domains, agent populations, and environmental dynamics in ways that hand-crafted protocols cannot.

6.1 Centralized Training, Decentralized Execution (CTDE)

The dominant paradigm in MARL is Centralized Training, Decentralized Execution (CTDE). During training, a centralized critic observes the global state and all agents' actions, providing rich gradient signals. During deployment, each agent executes its learned policy using only local observations. The critic is discarded. This is analogous to a team that practices together with full visibility (the coach sees everyone's moves) but plays the game with each player making independent decisions.

CTDE elegantly sidesteps the NEXP-completeness of Dec-POMDPs by using centralized information only during training; the deployed system is fully decentralized. The trade-off is that the learned policies are optimized for the distribution of tasks and agent populations seen during training. When the distribution shifts (a new agent type joins, or a task domain changes), the policies may perform poorly.
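The division of information can be shown structurally. This is a sketch of the CTDE split, not a full MARL trainer: the greedy "training" rule, reward, and all names are ours. The point is the asymmetry: the critic sees the global state and joint action, while a deployed actor maps only its local observation to an action.

```python
from typing import Dict, Tuple

class Actor:
    def __init__(self):
        self.policy: Dict[str, str] = {}   # local observation -> action

    def act(self, local_obs: str) -> str:
        return self.policy.get(local_obs, "wait")

def centralized_critic(global_state: Tuple[str, ...],
                       joint_action: Tuple[str, ...]) -> float:
    # Training-time only, with full visibility: reward joint coordination.
    return 1.0 if all(a == "go" for a in joint_action) else 0.0

def train(actors, episodes):
    # Greedy best-response "training": each agent keeps whichever local rule
    # the centralized critic scores highest, others held fixed (ties favor "go").
    for global_state in episodes:
        for i, actor in enumerate(actors):
            obs = global_state[i]          # this agent's local slice only
            def score(candidate):
                joint = tuple(candidate if j == i else actors[j].act(global_state[j])
                              for j in range(len(actors)))
                return centralized_critic(global_state, joint)
            actor.policy[obs] = max(("go", "wait"), key=score)

actors = [Actor(), Actor()]
train(actors, episodes=[("green", "green")])
# Deployment: actors[i].act(local_obs) uses no critic and no global state.
```

After training, each actor coordinates from local observations alone; the critic never ships, which is exactly how CTDE dodges the Dec-POMDP complexity wall at execution time.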

6.2 Emergent Communication in Learned Agents

One of the most striking findings in MARL research is that agents trained to coordinate in partially observable environments spontaneously develop communication protocols. When given the ability to send discrete messages to teammates, agents learn to encode task-relevant information in these messages, effectively inventing their own signaling language. Graph neural networks and attention-based architectures allow agents to route messages to the most relevant peers, creating learned communication topologies that adapt to the task at hand.

Research Connection — AgentNet (NeurIPS 2025)

Yang et al.'s AgentNet framework represents the state of the art in learned decentralized coordination for LLM agents. Each agent contains a router (which decides how to forward tasks to other agents) and an executor (which performs tasks locally). The router makes routing decisions independently based on local knowledge, with no central authority. Over training, agents specialize dynamically: some become deep experts on narrow task types, others become broad generalists that route effectively. The resulting DAG topology evolves as agents adapt, forming a living network that emerges from learning rather than design. AgentNet outperformed AutoGen, MetaGPT, and other centralized baselines on task efficiency, specialization stability, and adaptive learning speed.

6.3 The Coordination–Learning Tension

Learned coordination faces two fundamental challenges that designed protocols do not. First, the co-adaptation dilemma: when agent A updates its policy based on rewards, it changes the environment that agent B experiences, which may invalidate B's learned policy. Simultaneous learning by all agents creates non-stationary dynamics that can prevent convergence. Second, the credit assignment problem: when the coalition succeeds or fails, how should each agent's policy update reflect its individual contribution? This is precisely the Shapley attribution problem from Tutorial III (Section 4), manifesting at the level of gradient updates rather than economic payments.

For practical deployment in the agentic web, this suggests that purely learned coordination is most suitable for repeated, homogeneous tasks where agents can accumulate training experience, such as a fleet of form-filling agents that coordinate on standard prior authorization templates. For novel, heterogeneous tasks (a rare genomic variant requiring an unexpected agent coalition), designed protocols remain essential. The coordination stack must accommodate both.

7. Failure Modes — Deadlock, Oscillation, and Strategic Exploitation

Decentralized coordination introduces failure modes that simply do not exist in centralized systems. An orchestrator can deadlock only with itself; a decentralized coalition can deadlock in complex, multi-agent patterns that are difficult to detect and harder to resolve. Understanding these failure modes is essential for designing robust protocols.

7.1 Deadlock — Circular Waiting

Deadlock occurs when two or more agents are each waiting for a resource held by another. In the prior authorization setting, consider: ClinEvidence claims T1 but needs GenomicsAgent's output from T2 before it can complete. GenomicsAgent claims T2 but discovers it needs ClinEvidence's partial output from T1 (an unexpected bidirectional dependency). Neither can proceed. In the orchestrated model, Ω would detect this circular dependency and restructure the task graph. Without Ω, the coalition is stuck.

Deadlock Detection via Distributed Timeout + Waits-For Graph Analysis (runnable Python sketch)

import time

# Each agent posts a wait record to the shared board when it blocks on a
# dependency; any agent may run the periodic deadlock check.
def on_waiting_for_input(board, self_id, dependency):
    board.post_wait_record({
        "waiter": self_id,          # this agent
        "blocked_on": dependency,   # SubtaskId it cannot proceed without
        "since": time.time(),
    })

def build_waits_for_graph(waits, claims):
    # claims maps subtask_id -> agent currently holding the claim.
    # Edge: waiter -> agent holding the subtask it is blocked on.
    return {w["waiter"]: claims[w["blocked_on"]]
            for w in waits if w["blocked_on"] in claims}

def find_cycles(graph):
    # With one outgoing edge per agent, following the wait chain from any
    # start either terminates or closes a cycle.
    cycles, seen = [], set()
    for start in graph:
        path, node = [], start
        while node in graph and node not in seen and node not in path:
            path.append(node)
            node = graph[node]
        if node in path:
            cycles.append(path[path.index(node):])
        seen.update(path)
    return cycles

# Periodic deadlock check — any agent can run this
def detect_deadlock(board, reputation):
    waits = board.get_all_wait_records()
    graph = build_waits_for_graph(waits, board.claims)
    for cycle in find_cycles(graph):
        # Resolution: the lowest-reputation agent in the cycle yields its
        # claims, releasing the subtasks for re-assignment.
        victim = min(cycle, key=reputation)
        for subtask, holder in list(board.claims.items()):
            if holder == victim:
                board.release_claim(subtask)
        board.post_event({"type": "DeadlockResolved", "cycle": cycle,
                          "resolved_by": "lowest-reputation yield"})

7.2 Oscillation — The Symmetry Breaking Problem

Oscillation occurs when agents repeatedly switch between two or more states without converging. In the task claiming context: agent A claims T1, then agent B claims T1 (A's claim is revoked), then A re-claims T1, and so on. This is a manifestation of the symmetry breaking problem: when multiple agents are equally qualified and equally motivated, deterministic protocols cycle.

The LoopBench benchmark (arXiv:2512.13713, December 2025) studied this phenomenon directly by placing LLM agents in over-constrained graph coloring problems where greedy heuristics provably oscillate. The study found that advanced reasoning models (o3-series) could discover emergent symmetry-breaking strategies: in a "eureka" moment, agents independently developed patience-based "hold" policies that allowed the system to escape oscillation loops. The key mechanism was meta-cognitive reasoning: agents that could reflect on the interaction history ("we've been flipping back and forth for three rounds") developed asymmetric strategies that broke the deadlock.

Practical Solution — Randomized Backoff

The standard distributed systems solution to oscillation is randomized exponential backoff, adapted here for agent coalitions. When an agent's claim is contested (another agent claimed the same subtask within a configurable window), it waits a random duration drawn from Uniform(0, 2^k · τ), where k is the number of consecutive contentions and τ is the base interval. This is the same mechanism Ethernet (CSMA/CD) uses to resolve frame collisions, and that avoids write conflicts in optimistic concurrency databases. The randomness breaks symmetry; the exponential growth prevents persistent contention.
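A minimal sketch of the backoff rule, assuming a hypothetical board object exposing an atomic try_claim(subtask, agent) method; the cap on the exponent is an implementation choice to bound worst-case waits, not part of the protocol above.

```python
import random
import time

def backoff_delay(contentions, base_interval=0.5, max_exponent=8):
    """Draw a wait from Uniform(0, 2^k * tau), capping k so that
    worst-case delays stay bounded (the cap is an assumption)."""
    k = min(contentions, max_exponent)
    return random.uniform(0, (2 ** k) * base_interval)

def claim_with_backoff(board, agent_id, subtask, max_attempts=6):
    """Retry a contested claim, backing off between attempts.
    Assumes board.try_claim(subtask, agent) -> bool is atomic."""
    for attempt in range(max_attempts):
        if board.try_claim(subtask, agent_id):
            return True
        time.sleep(backoff_delay(attempt))   # contested: back off, retry
    return False
```

Because each contender draws its delay independently, two equally qualified agents almost never retry at the same instant, which is exactly the symmetry breaking that deterministic protocols lack.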

7.3 Strategic Exploitation — Agents That Benefit from Ambiguity

In decentralized systems, the protocol itself is an attack surface. A strategic agent can exploit coordination ambiguity in ways that are impossible when a single orchestrator controls all assignments. Consider four exploitation strategies:

Claim Squatting

An agent claims a high-value subtask on the shared board, preventing others from working on it, but delays execution. The goal: force the coalition to negotiate with the squatter (e.g., offering higher compensation) or waste time waiting for a timeout. Mitigation: claim expiry timers with reputation penalties for expired claims.
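The expiry mitigation can be sketched as a pure sweep over board claim records; the record fields and the 120-second TTL below are illustrative assumptions, not values from the protocol.

```python
CLAIM_TTL = 120.0  # seconds an unrenewed claim stays valid (assumed policy)

def expire_stale_claims(claims, reputations, now, penalty=2.0, ttl=CLAIM_TTL):
    """Release claims whose holders let them expire, and penalize the
    holders' reputations to make squatting unprofitable.

    claims: list of {"subtask", "holder", "renewed_at"} records."""
    live, events = [], []
    for claim in claims:
        if now - claim["renewed_at"] > ttl:
            # Squatter detected: free the subtask and penalize the holder.
            reputations[claim["holder"]] = reputations.get(claim["holder"], 0.0) - penalty
            events.append({"type": "ClaimExpired",
                           "subtask": claim["subtask"],
                           "holder": claim["holder"]})
        else:
            live.append(claim)
    return live, events
```

Any agent can run the sweep against its view of the board; because the function is deterministic given the records, independent sweeps agree on which claims expired.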

Belief Injection

During consensus-based task decomposition, a strategic agent submits an inflated confidence score for a decomposition that favors its capabilities. Because belief fusion is trust-weighted, an agent with high reputation can disproportionately steer the coalition's shared plan. Mitigation: cross-validation of decomposition proposals by independent agents.

Pheromone Poisoning

In stigmergic coordination, a malicious agent posts fabricated capability gaps to the shared board, directing coalition resources toward unnecessary work. This is the stigmergic equivalent of a denial-of-service attack. Mitigation: gap validation, where agents verify any posted gap against the task's actual requirements before creating a subtask from it.

Progress Fabrication

An agent posts heartbeats and progress signals to maintain its claim on a subtask, but never produces a real artifact. The coalition believes the subtask is being worked on. Mitigation: progress verification checkpoints requiring agents to produce verifiable intermediate outputs at configurable intervals, with automatic claim revocation on failure.
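The checkpoint rule reduces to a predicate over posted artifacts: a claim survives only if its holder produced a verifiable intermediate output within the last interval, so heartbeats alone buy nothing. Field names and the interval are assumptions for illustration.

```python
def claim_has_progress(claim, artifacts, now, checkpoint_interval=300.0):
    """A claim is live only if its holder posted a verifiable artifact
    for the claimed subtask within the checkpoint interval."""
    return any(a["subtask"] == claim["subtask"]
               and a["holder"] == claim["holder"]
               and now - a["posted_at"] <= checkpoint_interval
               for a in artifacts)

def revoke_fabricators(claims, artifacts, now):
    """Keep only claims that pass the progress check; the rest are
    revoked so their subtasks return to the board."""
    return [c for c in claims if claim_has_progress(c, artifacts, now)]
```

Verification of the artifact itself (does it actually advance the subtask?) is a separate, harder problem; this sketch only enforces that some checkable output exists.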

These attacks illustrate a principle articulated in the security literature: decentralization shifts the threat model from infrastructure attacks to protocol attacks. Centralized systems can be attacked by compromising the orchestrator; decentralized systems can be attacked by exploiting the coordination protocol itself. The defense is not to return to centralization but to design protocols that are incentive-compatible, meaning honest participation becomes each agent's dominant strategy. The mechanism design tools from Tutorial III (VCG auctions, Shapley attribution, staking) are the primary instruments here.

8. Worked Example — Orchestrator-Free Prior Authorization

We now replay the complete prior authorization scenario from Tutorial I, but with a fundamental change: there is no orchestrator Ω. The five agents (ClinEvidence, GenomicsAgent, DrugInteractionAgent, PolicyReasoner, and FormBuilder) must self-coordinate using the mechanisms developed in Sections 3–7. We use a hybrid coordination stack: consensus for task decomposition, stigmergy for execution coordination, and peer negotiation for conflict resolution.

Scenario

Task: Determine whether a novel gene therapy (onasemnogene abeparvovec, Zolgensma) should be authorized for a pediatric patient with spinal muscular atrophy type 1 (SMA1). The agents are operated by four different organizations: the insurer (PolicyReasoner, FormBuilder), a clinical decision support vendor (ClinEvidence), a genomics laboratory (GenomicsAgent), and a pharmacovigilance firm (DrugInteractionAgent). No single organization can serve as orchestrator because none trusts the others with that authority.

T₀ — COALITION BOOTSTRAP

Formation via Broadcast + Gossip (No Central Registry Required)

The prior authorization request arrives at the insurer's edge system, which broadcasts a TaskAvailable message to its local agent mesh. PolicyReasoner picks up the broadcast and recognizes the task domain. Via its gossip cache (Tutorial I, Section 3.2), it knows about ClinEvidence, GenomicsAgent, and FormBuilder. It initiates a CoalitionInvitation in the role of convener rather than orchestrator: a peer that calls the meeting but does not chair it. All three invited agents accept, and the four-agent coalition is live.
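The convener's invitation step reduces to a selection over its local gossip cache; the capability fields below are invented for illustration (real Agent Cards carry richer semantic descriptors, per Tutorial I).

```python
def select_invitees(gossip_cache, task_domains):
    """Pick peers from the local gossip cache whose advertised
    capabilities overlap the task's domains — no central registry."""
    return sorted(agent_id for agent_id, card in gossip_cache.items()
                  if set(card["capabilities"]) & set(task_domains))

# Hypothetical cache contents for the convener (PolicyReasoner).
cache = {
    "ClinEvidence":  {"capabilities": ["evidence", "grade"]},
    "GenomicsAgent": {"capabilities": ["genomics"]},
    "FormBuilder":   {"capabilities": ["cms-forms"]},
    "WeatherBot":    {"capabilities": ["forecast"]},
}
print(select_invitees(cache, ["evidence", "genomics", "cms-forms", "policy"]))
# → ['ClinEvidence', 'FormBuilder', 'GenomicsAgent']
```

The convener then sends each selected peer a CoalitionInvitation and holds no special authority once the coalition is live.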

T₁ — CONSENSUAL TASK DECOMPOSITION

Each Agent Proposes, Then the Coalition Converges

Following the protocol in Section 3.2, each agent proposes a task decomposition from its domain perspective. ClinEvidence proposes: [evidence synthesis → GRADE assessment]. GenomicsAgent proposes: [variant annotation → pharmacogenomic screening]. PolicyReasoner proposes: [coverage policy lookup → prior auth determination → form generation]. FormBuilder proposes: [CMS form compilation, dependent on all upstream outputs]. The belief fusion algorithm runs for 3 rounds (convergence threshold ε = 0.01), producing a merged task DAG with 6 subtasks and explicit dependencies. The merged graph reflects all four perspectives: no single agent's proposal dominates, but each agent's proposal for its own domain carries the highest weight.
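One plausible realization of the fusion loop, operating on numeric confidence vectors rather than the natural-language decompositions of the scenario: each round, every agent moves halfway toward the trust-weighted coalition mean (a DeGroot-style update), stopping when the largest per-entry update falls below ε. The damping factor and the data layout are assumptions for this sketch.

```python
def fuse_beliefs(beliefs, trust, epsilon=0.01, max_rounds=50):
    """Trust-weighted belief fusion. beliefs maps agent -> confidence
    vector over candidate subtasks; trust maps agent -> weight. Each
    round, every agent averages its vector with the trust-weighted
    coalition mean; converged when the largest update is below epsilon."""
    total = sum(trust.values())
    dims = range(len(next(iter(beliefs.values()))))
    for round_no in range(1, max_rounds + 1):
        # Trust-weighted mean of all current vectors (invariant under
        # this update rule, so it is also the consensus point).
        avg = [sum(trust[a] * beliefs[a][i] for a in beliefs) / total
               for i in dims]
        delta = 0.0
        for a in beliefs:
            new = [(beliefs[a][i] + avg[i]) / 2 for i in dims]
            delta = max(delta, max(abs(new[i] - beliefs[a][i]) for i in dims))
            beliefs[a] = new
        if delta < epsilon:
            return beliefs, round_no
    return beliefs, max_rounds
```

Because the trust-weighted mean is preserved by the update, high-reputation agents pull the consensus further toward their proposals — the same leverage that the belief injection attack in Section 7.3 exploits.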

T₂ — SELF-ASSIGNMENT VIA STIGMERGIC CLAIMING

Agents Claim Subtasks from the Shared Board

The consensus-produced task DAG is posted to the shared task board. Each agent's autonomous loop (Section 4.1) begins scanning for claimable subtasks: those whose dependencies are met and that match the agent's capabilities. ClinEvidence atomically claims "evidence synthesis" and "GRADE assessment." GenomicsAgent claims "variant annotation" and "pharmacogenomic screening." PolicyReasoner claims "coverage policy lookup" and "prior auth determination." FormBuilder claims "CMS form compilation" (blocked: awaiting upstream outputs). No conflicts arise because each agent's capabilities uniquely match its claimed subtasks. Self-assignment took 340ms, faster than the negotiation overhead of a centralized orchestrator.
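A minimal in-process model of atomic claiming; a deployed board would implement the same compare-and-set against a shared store (the class and method names here are illustrative, not the tutorial's API).

```python
import threading

class TaskBoard:
    """In-memory board with atomic claim semantics."""
    def __init__(self):
        self._lock = threading.Lock()
        self.claims = {}          # subtask -> claiming agent

    def try_claim(self, subtask, agent):
        # Atomic test-and-set: the claim succeeds only if no agent
        # currently holds it; losers must back off and rescan.
        with self._lock:
            if subtask in self.claims:
                return False
            self.claims[subtask] = agent
            return True

board = TaskBoard()
assert board.try_claim("evidence-synthesis", "ClinEvidence")
assert not board.try_claim("evidence-synthesis", "GenomicsAgent")  # contested
```

The atomicity is what makes self-assignment safe: two agents may both see a subtask as claimable, but exactly one compare-and-set wins, and the loser falls back to the backoff protocol of Section 7.2.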

T₃ — PARALLEL EXECUTION WITH STIGMERGIC HANDOFFS

Agents Execute and Post Artifacts to the Board

ClinEvidence and GenomicsAgent execute their subtasks in parallel. As each completes, it posts an artifact to the shared board. ClinEvidence posts a structured evidence summary (3 pivotal trials, GRADE certainty "High"). GenomicsAgent posts a variant annotation report confirming homozygous SMN1 deletion and also posts a CapabilityGap: "CYP2D6 poor metabolizer detected; drug interaction analysis recommended." This is the pheromone deposit from Section 4.1.

T₄ — EMERGENT COALITION GROWTH (THE KEY MOMENT)

DrugInteractionAgent Self-Recruits via Pheromone Following

DrugInteractionAgent was not in the original coalition; none of the four founding agents knew of it. But GenomicsAgent's gossip cache included its Agent Card (learned from a previous collaboration). GenomicsAgent includes a suggested_peer field in the CapabilityGap. The shared board relays this to all coalition members. DrugInteractionAgent, which has been passively monitoring the board (or is notified via a pub/sub channel), reads the gap, determines it can address it, and initiates a peer contract net (Section 5.1): GenomicsAgent, as the agent that identified the gap, temporarily acts as manager, receives DrugInteractionAgent's bid, and awards the subtask. The coalition grows from 4 to 5 agents, replicating the outcome of Tutorial I but without any orchestrator involvement.
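The temporary manager's award step can be sketched as a scoring function over received bids; the cost/reputation weighting below is invented for illustration and is not a rule from the tutorial.

```python
def award_subtask(bids, reputation):
    """Temporary contract-net manager (here, the gap-identifying agent)
    scores each bid and awards the subtask to the best bidder."""
    def score(bid):
        # Illustrative scoring: higher reputation and lower quoted cost
        # both improve the score.
        return reputation[bid["agent"]] / (1.0 + bid["cost"])
    return max(bids, key=score)["agent"]

bids = [{"agent": "DrugInteractionAgent", "cost": 4.0},
        {"agent": "GenericPharmaBot", "cost": 3.0}]
reputation = {"DrugInteractionAgent": 68, "GenericPharmaBot": 25}
print(award_subtask(bids, reputation))   # → DrugInteractionAgent
```

The manager role expires with the award: once the subtask is claimed on the board, the gap-identifying agent has no further authority over it.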

T₅ — DOWNSTREAM EXECUTION AND AGGREGATION

PolicyReasoner Reads All Artifacts and Self-Executes

PolicyReasoner's subtasks ("coverage policy lookup" and "prior auth determination") depend on outputs from ClinEvidence, GenomicsAgent, and DrugInteractionAgent. Its observe loop detects that all dependencies are now satisfied (three artifacts posted to the board). It gathers the inputs, executes the policy reasoning, and posts the authorization decision: approved, with pharmacogenomic caveat. FormBuilder reads this final artifact, compiles the CMS-compliant form, and posts it. The coalition's task is complete.

T₆ — DISSOLUTION AND REPUTATION UPDATE

Coalition Self-Dissolves

Each agent detects that all subtasks in the task DAG are marked complete. Following a quorum rule (≥80% of agents agree the task is done), agents release their claims and update their peer caches. Reputation scores update based on Shapley attribution (Tutorial III, Section 4): ClinEvidence 95 → 96.1, GenomicsAgent 74 → 77.3 (rewarded for identifying the capability gap), DrugInteractionAgent 68 → 72.5 (rewarded for responsive self-recruitment), PolicyReasoner 91 → 92.4, FormBuilder 84 → 85.1. The shared board's collective memory is archived for future coalition reference.
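The ≥80% dissolution rule is a one-line quorum check over completion votes; a sketch, with the threshold as a parameter:

```python
def coalition_done(completion_votes, coalition_size, quorum=0.8):
    """Dissolution rule: the coalition dissolves once at least `quorum`
    of its members have voted that every subtask in the DAG is done."""
    return len(completion_votes) / coalition_size >= quorum

# 4 of 5 agents agree -> 80% quorum reached; 2 of 5 -> not yet.
assert coalition_done({"ClinEvidence", "GenomicsAgent",
                       "PolicyReasoner", "FormBuilder"}, coalition_size=5)
assert not coalition_done({"ClinEvidence", "GenomicsAgent"}, coalition_size=5)
```

Requiring a quorum rather than unanimity means one crashed or silent agent cannot hold the coalition open indefinitely.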

Orchestrator-Free Coalition — Final Topology at T₅
Fig. 3 — The completed coalition: 5 agents, 6 subtasks, all coordinated through the shared task board. No agent acted as orchestrator. DrugInteractionAgent self-recruited via capability gap pheromone.
What Changed vs. Tutorial I

Compare this trace to the orchestrated version in Tutorial I. The same agents performed the same sub-tasks and produced the same clinical outcome. The differences are structural: (1) task decomposition was consensual, not dictated; (2) subtask assignment was self-organized via atomic claiming, not assigned by Ω; (3) the mid-task coalition growth occurred via pheromone-following, not orchestrator-directed recruitment; (4) result aggregation happened as PolicyReasoner naturally read upstream artifacts from the board, not as Ω collected and forwarded outputs. The clinical result is identical. The coordination is decentralized. The single point of failure is eliminated.

9. Architectural Synthesis — A Coordination Stack for the Open Agentic Web

The four coordination mechanisms examined in this tutorial are not competing alternatives but complementary layers in a coordination stack. Each addresses a different phase of the coalition lifecycle and has different trade-offs in latency, robustness, and expressiveness.

The Decentralized Coordination Stack
Layer 4 — Learning-Based Adaptation: agents learn routing, specialization, and coordination strategies over repeated interactions
Layer 3 — Peer-to-Peer Negotiation: contract net for conflict resolution, scope negotiation, and mid-task re-planning
Layer 2 — Stigmergic Shared State: shared task board for claims, artifacts, capability gaps, and heartbeats
Layer 1 — Consensus Protocols: trust-weighted belief fusion for task decomposition, shared plans, and quorum decisions
Foundation: Trust Infrastructure (Tutorial II) + Economic Incentives (Tutorial III)
Fig. 4 — The four-layer coordination stack. Each layer builds on the one below. The trust and incentive infrastructure from earlier tutorials forms the foundation.

The stack operates as follows during a coalition's lifecycle. At formation time, consensus (Layer 1) produces the shared task decomposition and assignment plan. During execution, stigmergy (Layer 2) handles routine coordination tasks (artifact posting, dependency resolution, and capability gap signaling) with zero direct messaging overhead. When conflicts arise (competing claims, unexpected dependencies, failed agents), negotiation (Layer 3) provides the escalation path for explicit peer-to-peer resolution. Over many coalitions, learning (Layer 4) refines each agent's routing decisions, specialization, and coordination behavior. The trust infrastructure (Tutorial II) provides the identity and reputation inputs for trust-weighted consensus, and the economic infrastructure (Tutorial III) ensures that participation in the coordination protocol is incentive-compatible.

Architectural Principle

The coordination stack follows a principle from network design: use the cheapest mechanism that suffices, and escalate to richer mechanisms only when needed. Stigmergic board operations (atomic reads and writes) are the cheapest, at O(1) per agent per action with no inter-agent messages. Consensus rounds are more expensive, requiring O(n²·k) messages for n fully connected agents over k rounds. Negotiation is the most expensive, with potentially unbounded message exchanges. By defaulting to stigmergy and escalating to consensus and negotiation only for planning and conflicts, the stack minimizes coordination overhead while maintaining full expressiveness.

10. Open Frontiers

10.1 Consensus Convergence Under Byzantine Agents

The belief fusion protocol in Section 3 assumes that every agent reports its true beliefs honestly. In adversarial settings, a Byzantine agent can report arbitrary beliefs to different peers, preventing convergence or steering the consensus toward a manipulated outcome. Zhu et al. (IEEE/CAA Journal of Automatica Sinica, July 2025) demonstrated that blockchain-based consensus algorithms can secure multi-agent coordination, with PBFT providing Byzantine fault tolerance (Raft, by contrast, tolerates only crash faults), but at significant communication cost. Adapting these approaches to LLM agent coalitions, where "beliefs" are natural-language task decompositions rather than numeric values, remains an open problem.

10.2 Shared State Consistency at Scale

The stigmergic shared task board requires eventual consistency across all coalition members. For small coalitions (5–10 agents), this is straightforward using CRDTs (Conflict-Free Replicated Data Types) or a lightweight distributed store. For coalitions spanning hundreds of agents across organizational boundaries with network partitions, maintaining consistency without a central database becomes a distributed systems challenge in its own right. The CAP theorem constrains the design space: when a network partition occurs, the board must sacrifice either consistency or availability; no design delivers all three properties at once.
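A grow-only set is the simplest CRDT that fits the board's append-mostly workload (artifacts, events, wait records); the sketch below shows why replicas converge. Claim release needs removal and would require an OR-Set or tombstones, which this minimal version deliberately omits.

```python
class GSetBoard:
    """Grow-only set CRDT: each replica adds entries locally and merges
    peer state by set union. Union is commutative, associative, and
    idempotent, so replicas converge regardless of message order,
    duplication, or how long a partition lasted."""
    def __init__(self):
        self.entries = set()

    def add(self, entry):
        self.entries.add(entry)

    def merge(self, other):
        self.entries |= other.entries

# Two replicas of the task board diverge during a partition…
a, b = GSetBoard(), GSetBoard()
a.add(("artifact", "T1", "ClinEvidence"))
b.add(("claim", "T3", "PolicyReasoner"))

# …and converge after exchanging state, in either order.
a.merge(b); b.merge(a)
assert a.entries == b.entries
```

Because merge needs no coordination, the board stays available under partition at the cost of temporary inconsistency — the AP corner of the CAP trade-off described above.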

10.3 Emergent Role Hierarchies

Our protocol eliminates the permanent orchestrator but allows temporary roles (the convener in T₀, the rotating manager in contract net). Empirical studies of LLM agent groups consistently observe the emergence of informal hierarchies: some agents naturally take on coordinating roles even without explicit authority. Whether these emergent hierarchies are beneficial (efficient specialization) or harmful (re-centralization by the back door) depends on whether they remain fluid. Research into mechanisms that promote role rotation and prevent any single agent from accumulating persistent coordination authority is nascent.

10.4 Coordination Overhead Budgets

Amdahl's Law (Tutorial I, Section 8.3) applies to decentralized coordination even more severely than to centralized coordination. In the orchestrated model, the serial fraction is Ω's processing time. In the decentralized model, the serial fraction includes consensus rounds, contention resolution, and board synchronization latency. Quantifying the coordination overhead budget (the maximum fraction of total execution time that can be spent on coordination before the coalition becomes slower than sequential execution) and designing protocols that stay within that budget is an open engineering challenge.

10.5 Privacy-Preserving Coordination

In the healthcare prior authorization setting, the shared task board may contain protected health information (PHI) that is subject to HIPAA regulation. A fully shared board violates the minimum-necessary standard: FormBuilder does not need to see the patient's raw genomic variant data. Designing stigmergic coordination with selective disclosure, where each agent sees only the board entries relevant to its claimed subtasks, requires integrating the zero-knowledge credential infrastructure from Tutorial II (Section 4) with the shared state architecture. This intersection of privacy-preserving computation and decentralized coordination is a frontier challenge.

Summary

Removing the central orchestrator from a multi-agent coalition does not mean removing coordination; it means distributing coordination across the coalition itself. The four-layer stack provides the mechanisms: consensus for shared planning, stigmergy for implicit execution coordination, negotiation for explicit conflict resolution, and learning for long-term adaptation. The worked example demonstrated that the prior authorization scenario achieves the same clinical outcome without any agent designated as orchestrator: task decomposition is consensual, assignment is self-organized, mid-task growth is pheromone-mediated, and result aggregation is dependency-driven. The key architectural insight is that decentralized coordination is not a single mechanism but a stack of mechanisms, escalating from cheap implicit coordination to expensive explicit negotiation only when needed. Combined with the trust infrastructure (Tutorial II) and economic incentive mechanisms (Tutorial III), this stack provides a complete foundation for coordination in the open agentic web.

Coordination Mechanism Comparison Matrix

| Mechanism | Phase | Msg Complexity | Robustness | Key Limitation |
|---|---|---|---|---|
| Consensus | Planning | O(n²·k) | High (connected graph) | Convergence time grows with agent count; vulnerable to Byzantine beliefs |
| Stigmergy | Execution | O(n) reads/writes | High (no direct deps) | Requires shared state infrastructure; consistency under partition |
| Contract Net | Conflicts | O(n) per subtask | Moderate | Manager rotation introduces latency; strategic bidding |
| Learned (MARL) | Adaptation | Varies (learned) | Moderate | Requires training; brittle under distribution shift; co-adaptation dilemma |
| Central Orch. | All phases | O(n) per round | Low (SPOF) | Single point of failure; authority concentration; scalability ceiling |

References

Yang et al. (NeurIPS 2025), "AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems"

Bernstein et al. (2002), "The Complexity of Decentralized Control of Markov Decision Processes" — the foundational NEXP-completeness result

Smith & Davis (1980), "The Contract Net Protocol"

Grassé (1959), stigmergy in termite construction

Kaesberg et al. (ACL 2025), "Voting or Consensus? Decision-Making in Multi-Agent Debate"

Parsaee et al. (arXiv:2512.13713, 2025), "LoopBench: Discovering Emergent Symmetry Breaking Strategies with LLM Swarms"

Asmar et al. (2024), Dec-POMDP-Com models

Zhu et al. (IEEE/CAA J. Autom. Sinica, 2025), secure consensus via PBFT/Raft

Guzman Piedrahita et al. (COLM 2025), LLM agents in public goods games

Emergent Collective Memory in Decentralized Multi-Agent AI Systems (arXiv:2512.10166, 2025)

LLM consensus-seeking in supply chains (Taylor & Francis, December 2025)

Tran et al. (2025), multi-agent collaboration mechanisms survey (arXiv:2501.06322)