
Claude Opus 4.6 Is Now 67% Cheaper — What Changes for Your Stack

Anthropic cut Opus 4.6 input pricing from $15 to $5 per million tokens. At that price it now undercuts Gemini and GPT-5.4 on input — and breaks the conventional cost-justification for tiered routing.

Anthropic cut Claude Opus 4.6 input pricing from $15 to $5 per million tokens this month, and output from $75 to $25. That's a 67% drop on the input side for what is still arguably the strongest general-purpose model on the market. The move lands alongside the Advisor Strategy release, and together they force a rethink of routing logic that most production stacks have been quietly accreting for the past year.

The comparison table tells the story faster than any analyst note. Opus 4.6 at $5 input is now undercutting Gemini 3.1 Pro at $7 and GPT-5.4 standard at $12. On output, $25 sits below GPT-5.4's $60 and roughly matches Gemini. A year ago, routing a user query to Opus meant eating a 5x premium over the mid-tier; today that premium is gone, and on some workloads Opus is the cheapest frontier option you can call. Layer in the Batch API's 50% discount for async workloads and Opus input drops to $2.50 per million — territory that used to belong to Haiku-class models.
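The arithmetic above is easy to sanity-check. Here is a minimal sketch, using only the list prices quoted in this article; the Gemini output price is an assumption (the article says Opus's $25 "roughly matches" it), and the token counts in the example are illustrative:

```python
# Per-call cost at the quoted list prices (USD per million tokens).
# Gemini's output price is assumed to roughly match Opus, per the article.
PRICES = {  # model: (input $/M, output $/M)
    "opus-4.6": (5.00, 25.00),
    "gemini-3.1-pro": (7.00, 25.00),  # output price assumed
    "gpt-5.4": (12.00, 60.00),
}
BATCH_DISCOUNT = 0.5  # Batch API: 50% off for async workloads

def cost(model: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Dollar cost of one call at the quoted list prices."""
    p_in, p_out = PRICES[model]
    c = (input_tokens * p_in + output_tokens * p_out) / 1_000_000
    return c * BATCH_DISCOUNT if batch else c

# An illustrative 10k-in / 1k-out request across the three models:
for model in PRICES:
    print(model, round(cost(model, 10_000, 1_000), 4))

# Batched Opus input works out to $2.50 per million tokens:
print(cost("opus-4.6", 1_000_000, 0, batch=True))  # 2.5
```

Nothing deep, but it makes the headline concrete: on this example request, Opus is cheaper per call than both rivals, and batching halves it again.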

This breaks the conventional cost-justification for tiered routing. The standard playbook was a classifier or cheap model upfront, escalating to Opus only when confidence dropped or the task demanded it. That architecture paid for itself when Opus was 5-10x the price of your fallback. At 3x or less, the routing logic, the evaluation harness, the drift monitoring, and the cognitive overhead of operating two models start costing more in engineering time than they save in inference. For most teams below a certain scale, the honest answer is now to delete the router and send everything to Opus.
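A back-of-envelope break-even model shows how the math flips at small scale. Everything here is an illustrative assumption, not a figure from the article: the fallback model's prices, the escalation rate, the per-request token counts, and the router's maintenance hours are all hypothetical knobs you would replace with your own numbers:

```python
# Break-even sketch: router stack vs. sending everything to Opus.
# All inputs below are illustrative assumptions, not measurements.
OPUS_IN, OPUS_OUT = 5.00, 25.00    # $/M tokens, the new Opus pricing
CHEAP_IN, CHEAP_OUT = 1.00, 5.00   # hypothetical mid-tier fallback
IN_TOK, OUT_TOK = 2_000, 500       # assumed tokens per request

def per_request(p_in: float, p_out: float) -> float:
    return (IN_TOK * p_in + OUT_TOK * p_out) / 1_000_000

def router_monthly(requests: int, escalation_rate: float,
                   eng_hours: float, hourly_rate: float = 150.0) -> float:
    """Blended inference cost plus amortized router upkeep."""
    inference = requests * (
        escalation_rate * per_request(OPUS_IN, OPUS_OUT)
        + (1 - escalation_rate) * per_request(CHEAP_IN, CHEAP_OUT)
    )
    return inference + eng_hours * hourly_rate

def all_opus_monthly(requests: int) -> float:
    return requests * per_request(OPUS_IN, OPUS_OUT)

# 100k requests/month, 30% escalated, 10 hours/month of router upkeep:
print(round(router_monthly(100_000, 0.30, 10), 2))  # 2490.0
print(round(all_opus_monthly(100_000), 2))          # 2250.0
```

At these assumed numbers the router loses: the $1,500/month of upkeep swamps the roughly $1,000 of inference it saves. Scale the request count up by 10x and the router wins again, which is exactly the "below a certain scale" caveat.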

The Advisor Strategy complicates that conclusion in an interesting direction. Our own BrowseComp numbers showed a Haiku-plus-Opus advisor configuration doubling task performance at 85% less cost than Sonnet solo. At the new pricing, the cost side of that equation compresses further, but the performance multiplier still holds — because the advisor pattern is about capability composition, not just cheap-model offloading. The interesting routing question is no longer "can I avoid Opus" but "how many Opus calls should I chain together."
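The control flow behind that pattern is simple to sketch. This is a hypothetical skeleton, not Anthropic's implementation: the model calls are stubbed out, and the checkpoint interval is the knob the closing question is really about:

```python
# Minimal sketch of an advisor-pattern control loop: a cheap worker
# model drives the task, and an expensive advisor is consulted only
# at checkpoints. Both model calls are stubs; the shape is the point.

def worker_step(trace: list[str]) -> list[str]:
    """Stand-in for a Haiku-class call that advances the task."""
    return trace + ["worker"]

def advisor_review(trace: list[str]) -> list[str]:
    """Stand-in for an Opus-class call that reviews the trajectory."""
    return trace + ["advisor"]

def run(task_steps: int, review_every: int = 3) -> list[str]:
    trace: list[str] = []
    for i in range(1, task_steps + 1):
        trace = worker_step(trace)
        if i % review_every == 0:  # escalate only at checkpoints
            trace = advisor_review(trace)
    return trace

print(run(6))  # six worker steps, two advisor reviews
```

Under the new pricing, tightening `review_every` (more Opus calls per task) is cheap enough to tune empirically, which is what "how many Opus calls should I chain together" looks like in practice.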

None of this touches the sub-cent tier. OpenAI's Nano models at $0.10 to $0.20 per million remain the right call for classification, extraction, and anything you're running at millions of requests per day. What's ending is the middle-tier cost war as a meaningful architectural constraint. When frontier models converge around $5 to $15 input, the decisions that matter shift from price-per-token to latency budgets, tool-calling fidelity, and which model's failure modes your product can actually tolerate in production.
