# Agentic architecture — building production multi-agent systems on Claude
The real shift in enterprise AI isn't smarter models — it's coordinated agent fleets that run document-heavy workflows for hours at a time, with governance, audit trails, and human oversight by exception. Here's the architecture that holds up in production, and the three hard problems most teams skip.
Willie Prosek · 9 min read
When most people think about enterprise AI, they imagine a chatbot. You type a question, it answers. Maybe it has access to your documents.

That is not where AI is going. It is where AI was in 2024.

The real shift over the last eighteen months is not smarter models — though they have got dramatically smarter. It is what you can build *with* those models. Specifically, it is the rise of **agent fleets**: coordinated groups of specialised Claude agents that run your workflows autonomously for hours at a time, hand off work between each other, remember what they have learned, and escalate only when a human needs to decide.

We run forty agents across eight internal teams every day. Some of those agents run for days at a time, doing real work — not demos, not prototypes. The lesson from that fleet is counter-intuitive: the hard part is not the AI. **The hard part is the orchestration.**

This is what the orchestration actually looks like in production, why most enterprises still approach AI with the wrong mental model, and how Anthropic's A/A/A × 4D AI Fluency framework keeps us honest about what to ship and what to leave alone.

## The chatbot ceiling

Every enterprise AI deployment we have audited hits the same ceiling. You start with a chatbot. It works well for simple Q&A. You plug it into your knowledge base. It works even better. You start to trust it.

Then you ask it to *do* something — process a customer request end-to-end, generate a report, review a contract, triage a support ticket. And it falls apart.

Not because the model is too dumb. The model is plenty smart. It falls apart because a single agent, operating in a single context window, with no persistent memory, no ability to hand off work, and no defined role, cannot reliably handle a multi-step business process.

This is the chatbot ceiling. It has hit most enterprises harder than they expected.
They have invested in AI platforms, bought licences, trained teams — and their biggest use case is still "help me write an email."

The reason is architectural, not technological. A single AI with a single mission is not how work actually happens. Work happens through specialisation, coordination, and handoffs. You do not have one employee who does everything — you have a team. You need an agent team.

## What an agent fleet looks like

Here is the shape of a typical document-heavy professional-services fleet. The same pattern works in legal review, claims handling, accounting workpapers, allied health intake, and any other workflow where the work is high-volume, structured, and judgement-heavy in defined places.

**Agent 1 — Intake.** Monitors the incoming workflow trigger (an email, a portal upload, a calendar event). Parses the trigger, extracts structured data, creates a record, assigns a reference. Runs on Claude Opus 4.7. Triggered on event arrival.

**Agent 2 — Plan.** Takes the new record, reads the attached documents, and produces a prioritised plan: what data to gather, what witnesses or sources to consult, what documents to request, in what order. Updates dynamically as new evidence arrives.

**Agent 3 — Document processor.** Handles the unstructured input. Runs Opus 4.7's high-resolution vision over scanned PDFs, handwritten notes, photographs. Classifies each document, extracts key facts, indexes into a searchable store.

**Agent 4 — Drafter.** The centrepiece. Trained on your firm's best 15–20 prior outputs (style, voice, structure, depth). When the work is complete, it drafts the full narrative — every fact linked to its evidence source.

**Agent 5 — QA.** Reviews the draft. Checks facts against sources. Verifies completeness. Flags inconsistencies. Cross-references compliance and regulatory requirements.

**Agent 6 — Delivery.** Packages the final output, generates the cover correspondence, and routes to the appropriate channel.

Six agents.
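The handoff structure above can be sketched as role definitions over a shared record. This is a minimal illustration, not the Agent SDK's API — the agent names, schema fields, and `reads`/`writes` contract are all assumptions made for the example:

```python
from dataclasses import dataclass, field

@dataclass
class CaseRecord:
    """Shared state every agent reads from and writes to (illustrative schema)."""
    reference: str
    plan: list = field(default_factory=list)    # prioritised steps from the Plan agent
    facts: dict = field(default_factory=dict)   # extracted facts, keyed by source document
    draft: str = ""
    qa_flags: list = field(default_factory=list)

@dataclass
class AgentRole:
    name: str
    reads: list    # CaseRecord fields this agent consumes
    writes: list   # CaseRecord fields it is allowed to update

# One entry per agent; each agent's writes are a later agent's reads —
# that overlap is the handoff contract.
FLEET = [
    AgentRole("intake",    reads=[],                    writes=["reference"]),
    AgentRole("plan",      reads=["reference"],         writes=["plan"]),
    AgentRole("documents", reads=["plan"],              writes=["facts"]),
    AgentRole("drafter",   reads=["plan", "facts"],     writes=["draft"]),
    AgentRole("qa",        reads=["draft", "facts"],    writes=["qa_flags"]),
    AgentRole("delivery",  reads=["draft", "qa_flags"], writes=[]),
]
```

Making the contract explicit like this is what lets each agent stay small: it never sees the whole history, only the fields its role declares.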
Each with a specific role, a defined handoff, and access to shared context. The human practitioner becomes the editor — reviewing, approving, and handling anything genuinely complex. Time on the workflow drops by a factor most teams did not believe was possible until they sat with it.

That is what agent fleets look like in practice. Not a single smart AI. A coordinated team of specialised workers, each good at one thing, handing off between them.

## The three hard problems

If the value is so clear, why isn't every enterprise running agent fleets today? Because three problems are genuinely hard.

### 1. Memory and context

Each agent in a fleet needs to know what happened before — what the upstream agent decided, what evidence is in scope, what constraints apply. A naive approach (passing the full conversation history) blows through context windows immediately and costs a fortune.

The fix is shared state: a persistent store that agents read from and write to. Vector databases for semantic retrieval. Key-value stores for structured data. Explicit schemas for what each agent produces.

This is not glamorous work. It is database design and API plumbing. It is also 40% of a real agent-fleet project. In Anthropic's 4D vocabulary, this is **Description** done at the architecture level — describing not just to one model in one prompt but to a fleet across handoffs.

### 2. Orchestration and handoff

When Agent 1 finishes, how does Agent 2 know to start? How do we handle errors? What if an agent produces output the next agent cannot use? What if two agents need to run in parallel and merge their results?

This is workflow orchestration — the same problem that exists in any distributed system. With a twist: agent outputs are non-deterministic. The same input does not always produce the same output. You need validation layers, retry logic, and human checkpoints.
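The validate-retry-escalate loop that wraps every handoff can be sketched in a few lines. A minimal sketch under stated assumptions: `run_agent` stands in for whatever model call your fleet makes, `validate` for a deterministic check or reviewer agent, and the dict-shaped result is an invention for the example:

```python
def run_step(run_agent, validate, payload, max_retries=2):
    """Run one agent step; retry on invalid output, escalate to a human after that."""
    reason = "no attempts made"
    for attempt in range(max_retries + 1):
        output = run_agent(payload)      # non-deterministic model call
        ok, reason = validate(output)    # deterministic check or reviewer agent
        if ok:
            return {"status": "pass", "output": output, "attempts": attempt + 1}
    # Retries exhausted: stop the pipeline and queue the step for human review.
    return {"status": "escalate", "reason": reason, "attempts": max_retries + 1}

# Usage: a drafter that fails its first attempt, then produces a usable draft.
calls = {"n": 0}
def flaky_drafter(payload):
    calls["n"] += 1
    return "" if calls["n"] == 1 else "Draft report text"

result = run_step(flaky_drafter, lambda out: (bool(out), "empty draft"), payload={})
```

The point is not the ten lines of Python — it is that *every* edge between agents gets a wrapper like this, so a bad output can never silently propagate downstream.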
We use Anthropic's Agent SDK for the agent-level coordination, with AWS Step Functions or Azure Service Bus for the higher-level workflow. Both patterns work. The important thing is having *something* — ad-hoc orchestration fails at scale.

This is **Discernment** at the architecture level: every output gets evaluated by something (another agent, a deterministic check, a human) before the next step proceeds.

### 3. Governance and trust

This is the problem that kills more deployments than the other two combined.

When an AI agent makes a decision, someone has to be accountable for that decision. When it generates a report, someone has to verify the facts. When it processes customer data, someone has to guarantee the data stayed in the right jurisdiction and was not used to train a model. In Australia, this means Privacy Act 1988 compliance, APRA CPS 234 for financial services, the upcoming automated-decision provisions (commencing 10 December 2026), and IRAP for government workloads.

You do not solve governance by bolting it on at the end. You design for it from the first architectural decision. Every agent deployment we ship includes:

- Australian data sovereignty (AWS Sydney `ap-southeast-2` or Azure `australiaeast`)
- An audit trail for every agent decision (who, what, when, why)
- Human-in-the-loop checkpoints at legally sensitive decisions
- Content sovereignty (no training on client data, ever)

This is not overhead. It is the price of running AI that enterprises can actually trust. In the 4D vocabulary, this is **Diligence** — and it is the moat. Agency commoditises; Diligence does not.

## What changes with Opus 4.7 and Managed Agents

Two recent developments materially change what is possible.

**Claude Opus 4.7 (February 2026)** shipped with three features that matter for agent fleets specifically:

1. **Adaptive thinking.** Hard problems get more reasoning tokens, easy ones fewer, inside the same request.
Two-tier routing pipelines collapse to single-tier on workloads with mixed-difficulty inputs.
2. **1M-token context (beta).** Whole-codebase reviews, regulatory corpora resident in context, multi-document review without retrieval loss. Cost scales linearly — use it as a specific tool, not a default.
3. **Compaction API.** Long-running sessions automatically summarise and compress earlier turns. We have seen agent deployments drop monthly API spend meaningfully after enabling compaction, with no measurable change in output quality.

For document-heavy fleets, Opus 4.7 moves the drafter from "acceptable draft" to "production-ready draft" on most test corpora. It also reads scanned PDFs, handwritten notes, and photographs at a quality that approaches dedicated OCR.

**Anthropic Managed Agents** (public beta, April 2026) is the other shift. Anthropic now offers a fully managed agent harness — describe what you want the agent to do, and Anthropic handles the execution infrastructure, memory, tool calling, and error handling. For simple agents, this is genuinely a 10× productivity improvement. You are not writing orchestration code; you are describing behaviour.

For enterprise deployments with strict data-sovereignty requirements, Managed Agents currently runs in US regions, which means Australian customers still need a locally hosted alternative. We have built a Managed-Agents-equivalent pattern using the Agent SDK directly, running in AWS Sydney. Same behaviour, Australian data sovereignty.

## The mental-model shift

If you take one thing from this piece, take this:

**Stop asking "which AI should I use?" and start asking "which of my workflows should be owned by an agent?"**

The model is increasingly a commodity. The orchestration, the memory, the governance, the specialised roles — that is where the value is. The organisations that win with AI over the next five years will not be the ones with the best model.
They will be the ones who figured out which parts of their work are agent-shaped, built fleets to run those parts, and kept the human experts focused on everything else.

## How we work

We're a generalist Claude-native agency. We build tailored agent solutions for organisations with document-heavy workflows that fit Anthropic's A/A/A × 4D framework — Automation, Augmentation or Agency, plus Delegation, Description, Discernment, and Diligence.

You pick how you pay. Four equal commercial formats:

- **PAYG** — per-task pricing on a hosted agent
- **Upfront** — paid build, paid delivery, fixed scope
- **Self-hosted** — buy outright + code/IP transfer
- **Managed** — monthly hosted, we operate it for you

For prospects burnt by past AI pilots, we also offer **free scope, free build, payment only on acceptance** — one de-risking path that overlays the four formats.

Eight teams. Forty agents in production. One Adelaide boutique that runs its own business on the fleet it sells.

If you have a specific document-heavy workflow you want to scope:

- Free 30-min scope-out call: [adaptation.ai/book](/book)
- See how we run our practice: [adaptation.ai/trust](/trust)
- Pick the format that fits: [adaptation.ai/pricing](/pricing)

— Willie Prosek, Founder, Adaptation AI

---

*Methodology: Anthropic's AI Fluency framework © 2025 Rick Dakan, Joseph Feller, and Anthropic. Released under CC BY-NC-SA 4.0. We use it under attribution; we do not rebadge it.*