

The Claude Opus 4.7 enterprise upgrade playbook — what to migrate Monday, what to leave alone

Claude Opus 4.7 brings adaptive thinking, 1M-token context (beta), and the Compaction API. Most enterprises running Claude in production should not migrate every workload yet. This is the migration checklist we walk through with every managed-agent client the week after a flagship model lands: pre-production, staging, canary, and the three scenarios where you should stay on 4.6.

Willie Prosek · 9 min read

Claude Opus 4.7 launched in February 2026 as the most capable Anthropic model to date. It introduced three changes that meaningfully shift enterprise deployment patterns: adaptive thinking, 1M-token context (beta), and the Compaction API. This is the playbook we walk through with every managed-agent client the week after an Opus model lands — what is genuinely new, what is marketing, what to test before you switch, and which production workloads should migrate Monday versus wait a quarter.

In Anthropic's A/A/A × 4D framework vocabulary, this whole exercise is a **Discernment** problem layered on **Diligence** — you are asking how to evaluate a new model on real workloads, then asking what governance has to change before the new model goes live. The model is just the input; the playbook is what keeps you safe.

## What actually changed in Opus 4.7

Most model upgrades ship a benchmark jump and call it a release. 4.7 does something more structural: three capability shifts matter for production.

### 1. Adaptive thinking

Previous Claude generations exposed extended thinking as an all-or-nothing toggle — you paid for deep reasoning on every call, or you got none. Opus 4.7's adaptive thinking lets the model spend more tokens on hard problems and fewer on easy ones, inside the same request, based on its own assessment of difficulty.

In practice: agents that used to be built as two-tier routing pipelines ("simple calls go to Sonnet, hard calls get promoted to Opus with extended thinking") can often collapse to single-tier on 4.7 at comparable cost. We have seen review pipelines shed roughly a third of their routing and orchestration code with no measurable quality regression on internal evaluation sets.

It does not mean you should throw Opus at everything. See the cost section below — adaptive thinking reduces waste but does not eliminate the per-token premium of the flagship tier.

### 2. One million-token context (beta)

The 1M-context beta is the headline number. What it actually unlocks in production:

- Whole-codebase reviews in a single context instead of RAG-and-stitch
- Regulatory corpora resident in context (Corporations Act 2001 + ASIC regulatory guides + relevant case law in one conversation)
- Multi-document review without retrieval loss
- Genuinely long-running agent sessions with full conversation memory

The trap is that cost scales linearly with context. A 1M-context call carries 5× the input tokens of a full 200K one, you pay for those tokens again on every turn, and the 1M beta tier is priced separately on top. Treat 1M context as a specific tool for specific workloads, not a default.
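To make the linear-scaling trap concrete, here is a back-of-envelope sketch. The per-million-token rate is a placeholder for illustration, not published pricing:

```python
# Back-of-envelope context-cost comparison.
# RATE_PER_MTOK is a placeholder price, not Anthropic's published rate.
RATE_PER_MTOK = 15.00  # USD per million input tokens (illustrative)

def input_cost(context_tokens: int, turns: int, rate: float = RATE_PER_MTOK) -> float:
    """Input tokens are re-billed on every turn of a conversation."""
    return context_tokens * turns * rate / 1_000_000

small = input_cost(context_tokens=200_000, turns=10)
large = input_cost(context_tokens=1_000_000, turns=10)
print(f"200K context, 10 turns: ${small:,.2f}")
print(f"1M context, 10 turns:   ${large:,.2f} ({large / small:.0f}x)")
```

The multiplier compounds with conversation length, which is exactly why Compaction (below) matters for long sessions.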

### 3. Compaction API

This one is under-marketed and will save you more money than the first two combined. The Compaction API lets long-running sessions automatically summarise and compress earlier conversation turns, keeping the system prompt and recent context intact while reducing the token footprint of older turns by 70–90%.

For agent fleets that maintain context across hours or days of work (our typical managed deployment), this is a structural cost improvement. We have seen internal deployments cut monthly API spend meaningfully after enabling Compaction without measurable output-quality change.

### Also worth knowing

- `max` effort level added — explicit opt-in to the most thorough mode
- 128K output tokens (up from 64K) — relevant for long-form generation
- Model ID for programmatic use: `claude-opus-4-7`
- Pricing: unchanged from 4.6 at the base tier; 1M-context beta is priced separately

## What about Sonnet 4.6 and Haiku 4.5?

This is the question every customer asks when Opus ships. The answer, almost always: don't migrate to the top tier — migrate to the right tier.

| Model | When to use |
|-------|-------------|
| Claude Opus 4.7 | Governance-critical, complex reasoning, long-context analysis, regulated synthesis |
| Claude Sonnet 4.6 | The workhorse. Most agent workflows, coding assistance, document generation, customer-facing |
| Claude Haiku 4.5 | High-volume routing, classification, enrichment, first-pass filtering, cost-sensitive bulk |

Our internal routing rule: run the cheap tier first, escalate only when the cheap tier's confidence or output quality fails an evaluation. Most customers save 60–70% versus a naive "always use the best model" approach without losing meaningful quality.

Opus 4.7's adaptive thinking narrows this gap because Opus can now self-regulate token spend. But Sonnet at a fraction of the price still wins on economics for the vast majority of routine agent tasks.
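Our routing rule above, as a minimal sketch. The Haiku and Sonnet model IDs and both callback functions are assumptions for illustration; wire in your own SDK call and evaluation check.

```python
# Escalate-on-failure router: try the cheapest tier first, promote only when
# the output fails your evaluation. `call_model` and `passes_eval` are
# hypothetical stand-ins; the Haiku/Sonnet IDs are assumed, not confirmed.
TIERS = ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-7"]

def route(task, call_model, passes_eval):
    """Return (model_used, output) from the cheapest tier that passes eval."""
    for model in TIERS:
        output = call_model(model, task)
        if passes_eval(task, output):
            return model, output
    return TIERS[-1], output  # flagship answer is final even if eval still fails
```

The evaluation check is the whole trick: a cheap confidence score or rubric pass is enough to keep most traffic off the flagship tier.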

## Migration checklist

These are the items we walk through with every enterprise upgrading an existing Claude deployment to 4.7. Work through them in order.

### Before you touch production

1. **Identify which workloads would actually benefit.** Not every workload needs 4.7. Adaptive thinking matters for variable-difficulty tasks. 1M context matters for long-document workloads. Compaction matters for persistent agent sessions. If none of these apply to a workload, keep it on Sonnet 4.6 or downgrade to Haiku and save real money.
2. **Run your evaluation suite.** You do have an evaluation suite. If you do not, *stop*, build one, then return to this step. Twenty to fifty regression cases is enough to start. Compare 4.7 output against your current model on the same cases. (This is the **Discernment** loop the 4D framework names — it is not optional.)
3. **Compare tokeniser counts.** Tokenisation has not materially changed across recent Claude generations, but costs scale with token counts, and adaptive thinking can silently increase output tokens on hard inputs. Record baseline token counts per task before migrating.
4. **Build a cost model.** Project spend at current volume on the new model, add a 20% buffer, then add any planned 1M-context usage as its own line item. Get finance sign-off before flipping traffic, not after.
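The budgeting step reduces to a one-line formula. A sketch with an assumed 1.2× buffer; plug in your own numbers:

```python
def projected_monthly_spend(current_spend: float,
                            rate_multiplier: float = 1.0,
                            long_context_spend: float = 0.0,
                            buffer: float = 1.2) -> float:
    """Current spend at the new model's rates, buffered 20%, plus any
    separately-priced 1M-context usage as its own line item."""
    return current_spend * rate_multiplier * buffer + long_context_spend
```

If the number that comes out surprises finance, better to find out here than at step 9.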

### In staging

5. **Side-by-side for two weeks.** Run 4.6 and 4.7 in parallel on a traffic slice. Measure quality, latency, token cost. Latency matters — 4.7 with adaptive thinking on hard inputs will be slower than 4.6 flat. This is by design but will affect user experience in real-time workflows.
6. **Test failure modes.** Force the model into known-hard edge cases. Compare how each version degrades. A new model can be better on average but fail differently on edge cases — you need to know the shape of the regression before production.
7. **Governance sign-off.** If your deployment is subject to APRA CPS 234, Privacy Act 2026 amendments, or industry-specific regulations, the model change constitutes a system change. Document it. Get the sign-off you would need for an audit. (This is **Diligence** — Creation Diligence and Transparency Diligence in the 4D vocabulary.)
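The side-by-side comparison in step 5 can be as simple as pairing per-case metric records from each model and averaging the deltas. A minimal sketch, assuming each record is a flat dict of numeric metrics:

```python
from statistics import mean

# Sketch of a side-by-side diff: run both models on the same cases, collect
# one metrics dict per case (e.g. {"quality": 0.91, "p99_latency_s": 4.2,
# "tokens": 1800}), then compute the mean delta per metric.
def compare(baseline: list[dict], candidate: list[dict]) -> dict:
    """Mean per-metric delta (candidate minus baseline) over paired cases."""
    metrics = baseline[0].keys()
    return {m: mean(c[m] - b[m] for b, c in zip(baseline, candidate))
            for m in metrics}
```

A positive quality delta alongside a positive latency delta is the expected 4.7 signature; decide per workload whether that trade is worth it.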

### In production

8. **Canary deployment.** Flip 5% of traffic. Monitor for 48 hours. Flip 25%. Monitor. Flip the rest. This is our default staged rollout.
9. **Cost alerts.** Set budget alerts 20% above your projected new baseline. If they fire, somebody enabled 1M context somewhere unexpectedly.
10. **Rollback plan.** Know exactly how to revert to 4.6 in a single config change. Do not delete the old config until you have been stable on 4.7 for a full month.
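Steps 8 and 10 together can be expressed as one rollout loop with the revert built in. `set_traffic_split` and `healthy` are hypothetical hooks into your own gateway and monitoring, not a real API:

```python
import time

# Staged canary rollout with automatic revert. Each stage is
# (traffic fraction on 4.7, hold time in seconds before health check).
STAGES = [(0.05, 48 * 3600), (0.25, 48 * 3600), (1.0, 0)]

def canary_rollout(set_traffic_split, healthy, sleep=time.sleep):
    """Advance through STAGES; on any failed health check, revert to 0%
    (i.e. back to 4.6) in a single config change and report failure."""
    for fraction, hold in STAGES:
        set_traffic_split(fraction)
        sleep(hold)
        if not healthy():
            set_traffic_split(0.0)
            return False
    return True
```

The `sleep` parameter exists so the loop is testable without waiting 48 hours; in production, the "health check" is your quality, latency, and cost alerts.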

## When not to upgrade

Three production scenarios where we advise clients to stay on 4.6 (at least for a while):

**Tight-loop latency workloads.** Real-time chat, inline coding assistants, voice interfaces. Adaptive thinking can add tail latency on hard inputs. If p99 latency is your critical SLO, test carefully before switching.

**Highly tuned prompts.** If your deployment involves prompts that were hand-optimised against 4.6 over months, expect some re-tuning work. Not massive, but budget for it.

**Very cost-sensitive bulk processing.** If you are already running Haiku for economic reasons, 4.7 does not change that maths. Stay on Haiku.

## What Australian teams specifically should know

Two items on top of the general playbook for teams operating inside Australian regulatory regimes:

**AWS Sydney availability.** Claude Opus 4.7 via AWS Bedrock in `ap-southeast-2` is live as of the general release. Data residency is preserved. If you are on Google Vertex, check the `australia-southeast1` region — Opus availability there can lag Bedrock by a few weeks.

**The 1M-context beta and PII.** The same rules apply as for smaller contexts — but the volume of PII you might be putting into a single request becomes qualitatively different. A 1M-context call could contain a customer's entire file. Your prompt-logging, data-retention, and incident-response plans should be reviewed before you turn 1M context on in production. The OAIC guidance on automated decision-making applies, and the substantive provisions of the Privacy Act 2026 amendments commence on 10 December 2026.

## The economic argument

If you run the numbers honestly, most enterprise Claude deployments are not cost-bound by the API. Labour is the dominant cost (typically 60–80% of a deployment's total cost), governance and compliance infrastructure is 10–15%, and API cost — even on the most expensive model tier — is usually 5–15% of the whole.

Optimising blindly for API cost is therefore usually optimising the wrong variable. If Opus 4.7 produces 10% better output on your governance-critical workload, that 10% is almost always worth the 2–3× per-token premium over Sonnet, because the downstream cost of a wrong answer (review time, rework, incident response) dominates the model cost.

The right framing for the upgrade conversation is not "what does 4.7 cost?" It is "what does a wrong answer cost us, and how does 4.7 change that probability?" For the workloads where that answer is clear, migrate. For the rest, keep your routing and your economics intact.

## Where this fits in our practice

We're a generalist Claude-native agency. Every engagement we scope gets walked against Anthropic's A/A/A × 4D AI Fluency framework — Automation, Augmentation or Agency, plus Delegation, Description, Discernment, and Diligence. A model upgrade is an Augmentation-mode exercise (you augment your evaluation team with a new model and decide together) backed by Diligence (you document the change for governance).

You pick how you pay. Four equal commercial formats:

- **PAYG** — per-task pricing on a hosted agent
- **Upfront** — paid build, paid delivery, fixed scope
- **Self-hosted** — buy outright + code/IP transfer
- **Managed** — monthly hosted, we operate it for you

For prospects burnt by past AI pilots, we also offer **free scope, free build, payment only on acceptance** — one de-risking path that overlays the four formats.

If you want help running a 4.7 migration audit on a real workload:

- Free 30-min scope-out call: [adaptation.ai/book](/book)
- See how we run our own fleet: [adaptation.ai/trust](/trust)
- Pick the format that fits: [adaptation.ai/pricing](/pricing)

— Willie Prosek, Founder, Adaptation AI

---

*Methodology: Anthropic's AI Fluency framework © 2025 Rick Dakan, Joseph Feller, and Anthropic. Released under CC BY-NC-SA 4.0. We use it under attribution; we do not rebadge it.*

Want this applied to your workflow?

Free scope, free build, you only pay if it works. 30-min call books straight to a real engineer.

Book a 30-min Scope-out call →