The Recommendation I Almost Made
Three weeks ago, if a CEO had asked me how to turn their company AI-native fast, my answer would have fit on one slide:
- Claude Cowork for every knowledge worker — sales, finance, HR, ops, marketing — so the AI lives on their desktop, reads their files, and actually completes work end to end
- Claude Code for every engineer, so they ship faster and stop reinventing scaffolding
- Take the software-licensing budget you would have spent elsewhere and put it into intensive enablement — not training videos, real time with real problems
- Then get out of the way and let your own people build what they need
I would have meant it. Technology is not what limits an enterprise today. Creativity is. Anthropic had, until recently, the most refined frontier stack I have ever used. Then the company started doing what every dominant platform eventually does. It began extracting value from the users who built its network effect.
The signal is clear if you read the last nine months of Anthropic product changes in sequence. Weekly rate limits. Peak-hour throttles. An enforced ban on third-party clients that were the main reason developers paid for Max in the first place. None of these are individually fatal. Together they redraw the economic picture for anyone considering a Claude-only enterprise rollout.
Diagnosis: What Anthropic Actually Changed
The tightening is not speculation. It is documented in Anthropic's own terms and in their engineers' public statements.
Weekly rate limits. Anthropic announced weekly caps for Claude Pro and Max in July 2025, effective 28 August. TechCrunch's briefing surfaced the concrete numbers Anthropic declined to publish on the pricing page. Max 20x at $200 per month gets roughly 240 to 480 hours of Sonnet per week, but only 24 to 40 hours of Opus. Opus is already effectively rationed for subscribers.
Peak-hour squeeze. On 26 March 2026, Anthropic tightened the 5-hour rolling window during peak global hours. An Anthropic engineer told The Register that "about 7 percent of users will hit session limits they wouldn't have before." The weekly quota did not move. The hourly distribution did.
Third-party client ban. On 20 February 2026, Anthropic updated its Claude Code legal and compliance doc to make the rules explicit. OAuth credentials from Pro, Max, Team and Enterprise subscriptions cannot be used by third-party developers. OpenCode removed Claude subscription support the same day, citing "anthropic legal requests." Crush followed. The wrappers that let developers get more value from their Max plans are gone.
Decoupled enterprise billing. Anthropic's Enterprise plan now bills seats and usage separately: seat access costs roughly $20 per user per month, and all tokens are metered at API rates. The all-in-one subscription model is being phased out exactly where it would cost Anthropic margin.
```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#00d4ff', 'lineColor': '#00d4ff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#00d4ff', 'edgeLabelBackground': '#0a0f1e'}}}%%
graph LR
A["Jul 2025
Weekly rate limits
announced"] --> B["Aug 2025
Weekly limits
effective"]
B --> C["Jan 2026
Bans for third-party
harness users"]
C --> D["Feb 2026
Legal doc bans OAuth
in third-party clients"]
D --> E["Feb 2026
OpenCode and Crush
drop Claude Max"]
E --> F["Mar 2026
Peak-hour
session squeeze"]
classDef brand fill:#1a2540,stroke:#00d4ff,stroke-width:2px,color:#ffffff;
class A,B,C,D,E,F brand;
```
Reframe: The Frequent Flyer Playbook Comes To AI
This is not new corporate behaviour. Airlines have run this playbook for forty years.
Step one, issue generous points to build loyalty. Step two, let status benefits accumulate until switching airlines feels psychologically expensive. Step three, once the network effect is locked, devalue the points. Raise the redemption thresholds. Add blackout dates. Restrict partner bookings. The miles are still there. They just buy less.
Anthropic is running step three. The product is still excellent. The capacity is just worth less per dollar than it was last quarter, and the tools that let power users extract more value have been declared out of scope. The public justification is capacity management and abuse prevention. Both are real. But the pattern is the pattern, and the direction of travel for enterprise pricing is the same direction airline elite status has travelled for a generation. More tiers. Harder thresholds. Metered add-ons for what used to be included.
The uncomfortable part for enterprise buyers is that consumers and developers are the canary. Anthropic tightens the consumer plans first because the switching cost is lowest there and the margin pressure is highest. Enterprises are next, once the muscle memory, the codebases, the MCPs, the internal Claude-specific prompts and the embedded Claude Code workflows have reached a level where migration looks genuinely expensive. That point arrives faster than most CIOs expect.
Framework: The Diversification Decision
The question is not whether Claude is the best model. For many workloads it still is. The question is whether your architecture lets you move when the rules change again.
Three numbers matter for that decision: blended cost per million tokens, switching cost per application, and sovereignty exposure.
```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1a2540', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#00d4ff', 'lineColor': '#00d4ff', 'background': '#0a0f1e', 'mainBkg': '#1a2540', 'nodeBorder': '#00d4ff', 'edgeLabelBackground': '#0a0f1e'}}}%%
graph LR
A["Monthly token volume"] --> B{"Under 300M?"}
B -->|"Yes"| C["Stay on Claude API
or Max subscription
Lock-in risk is reversible"]
B -->|"No"| D{"Regulated or EU-sovereign?"}
D -->|"Yes"| E["Mistral or Qwen3 on
AWS Sovereign Bedrock
or private cloud"]
D -->|"No"| F{"Over 2B tokens/month?"}
F -->|"No"| G["Managed open-source models:
Kimi K2.5, GLM-5.1,
MiniMax M2.7"]
F -->|"Yes"| H["Self-host on 8xH200
with MLOps team
or negotiate enterprise deal"]
classDef brand fill:#1a2540,stroke:#00d4ff,stroke-width:2px,color:#ffffff;
classDef decision fill:#0a0f1e,stroke:#ffb347,stroke-width:2px,color:#ffffff;
class A,C,E,G,H brand;
class B,D,F decision;
```
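The diagram's branching logic fits in a few lines of code. This is a minimal Python sketch, not a procurement tool. The function name, its two inputs and the returned labels are hypothetical. Only the 300M and 2B tokens-per-month thresholds and the branch outcomes come from the diagram.

```python
def recommend_deployment(monthly_tokens: int, regulated_or_eu_sovereign: bool) -> str:
    """Walk the decision diagram: volume first, then sovereignty, then scale."""
    if monthly_tokens < 300_000_000:
        # Small enough that lock-in is still reversible.
        return "Claude API or Max subscription"
    if regulated_or_eu_sovereign:
        return "Mistral or Qwen3 on AWS Sovereign Bedrock or private cloud"
    if monthly_tokens <= 2_000_000_000:
        return "Managed open-source models (Kimi K2.5, GLM-5.1, MiniMax M2.7)"
    return "Self-host on 8xH200 with an MLOps team, or negotiate an enterprise deal"


print(recommend_deployment(120_000_000, regulated_or_eu_sovereign=False))
print(recommend_deployment(900_000_000, regulated_or_eu_sovereign=True))
print(recommend_deployment(900_000_000, regulated_or_eu_sovereign=False))
```

The order of the checks matters. As drawn, the sovereignty question is only reached above 300M tokens per month, so a regulated buyer with lower volume would want that check moved to the top.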
Cost. The price gap between Claude and the leading open-source alternatives — models you can either rent through an API or download and run on your own hardware — is no longer marginal. All figures verified against vendor pricing pages on 15 April 2026. Blended column assumes a 3:1 input-to-output ratio, typical for code-assist and retrieval workloads.
| Model | Input $/M | Output $/M | Blended 3:1 | Run yourself? |
|---|---|---|---|---|
| Claude Opus 4.6 | 5.00 | 25.00 | 10.00 | No |
| Claude Sonnet 4.6 | 3.00 | 15.00 | 6.00 | No |
| GPT-5.4 | 2.50 | 15.00 | 5.63 | No |
| GLM-5.1 (Z.ai) | 1.40 | 4.40 | 2.15 | Yes (MIT) |
| Qwen 3.6 Plus (Alibaba) | 0.50 | 3.00 | 1.13 | No (API-only) |
| Kimi K2.5 (Moonshot) | 0.60 | 3.00 | 1.20 | Yes (modified MIT) |
| GLM-4.6 (legacy tier) | 0.60 | 2.20 | 1.00 | Yes (MIT) |
| MiniMax M2.7 | 0.30 | 1.20 | 0.53 | Yes (modified MIT) |
| DeepSeek V3.2 | 0.28 | 0.42 | 0.32 | Yes (MIT) |
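
The Blended 3:1 column is a weighted average of three parts input price to one part output price. The short Python sketch below reproduces the column so you can re-run it for a different ratio. The prices are copied from the table, and the helper name `blended_price` is illustrative.

```python
# Prices in US dollars per million tokens (input, output), copied from the table above.
PRICES = {
    "Claude Opus 4.6": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.4": (2.50, 15.00),
    "GLM-5.1": (1.40, 4.40),
    "Qwen 3.6 Plus": (0.50, 3.00),
    "Kimi K2.5": (0.60, 3.00),
    "GLM-4.6": (0.60, 2.20),
    "MiniMax M2.7": (0.30, 1.20),
    "DeepSeek V3.2": (0.28, 0.42),
}


def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average $/M for a given input:output token ratio (default 3:1)."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


for model, (inp, out) in PRICES.items():
    print(f"{model:<20} {blended_price(inp, out):6.3f}")
```

Switching to a 1:1 ratio for a chat-heavy workload raises every blended figure in the table. The rise is steepest for models with expensive output tokens, such as Opus.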
Quality on coding-heavy benchmarks tells the same story. The frontier is no longer Anthropic alone.
| Model | SWE-Bench Verified | Notable strength |
|---|---|---|
| Claude Opus 4.6 | 80.9% | Coding leader, strongest production track record |
| MiniMax M2.5 (M2.7, the current flagship, reports SWE-Pro 56.2 instead) | 80.2%* | Tied with Opus on coding at ~5% of the price (vendor-reported) |
| Qwen 3.6 Plus | 78.8%* | Per Alibaba: beats Opus on Terminal-Bench 2.0 (61.6 vs 59.3) & OmniDocBench (91.2), ~1.7× faster |
| GLM-5.1 | 77.8% | AIME 2026 95.3, GPQA-Diamond 86.2 — reasoning leader in open source |
| Kimi K2.5 | 76.8% | Strongest on LiveCodeBench, cache-hit input at $0.10/M |
| DeepSeek V3.2 | ~73% | Cheapest in the set, IMO 2025 gold medal |

Figures marked * are vendor-reported.
The takeaways from these two tables, in plain terms:
- You are no longer choosing between quality and price. On vendor-reported numbers, MiniMax M2.5 effectively ties Claude Opus 4.6 on SWE-Bench Verified (80.2 vs 80.9) at roughly 5 percent of the cost. Qwen 3.6 Plus reports beating Opus on Terminal-Bench 2.0 and document understanding. Even discounted by the typical 2–5 point independent-replication gap, the field has closed in.
- The Chinese open-source labs ship faster than Anthropic. Qwen 3.6 Plus on 2 April, GLM-5.1 on 7 April, MiniMax M2.7 weights on 12 April. Three major releases in April alone, all from labs that did not exist as serious model providers eighteen months ago.
- Watch the licence column carefully. GLM-5.1 (MIT) and MiniMax M2.7 (modified MIT) you can download and run yourself. Qwen 3.6 Plus is Alibaba Cloud API-only — it closes the capability gap, not the sovereignty gap. Kimi K2.5's modified MIT adds an attribution clause that affects very large consumer products.
- Claude is still the right answer for some workloads. Production reliability, MCP ecosystem maturity and sustained coding performance keep Sonnet 4.6 and Opus 4.6 in the mix. They are no longer the only answer.
Sources: Claude pricing, Z.ai pricing, GLM-5.1 docs, MiniMax M2.7, Qwen 3.6 Plus announcement, Alibaba Cloud pricing, Kimi K2.5, DeepSeek V3.2, OpenAI API pricing.
Switching cost. This is where Anthropic's restrictions actually bite. Every Claude-specific prompt, every MCP server configured against Anthropic's tool-use schema, every internal fine-tune on Anthropic's evals, and every developer whose muscle memory is built around Claude Code becomes a migration cost. The fix is architectural. Route every LLM call through an abstraction layer from day one. LiteLLM, Portkey or a bespoke gateway: it does not matter which. What matters is that no application code knows which vendor it is talking to.
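Here is a minimal sketch of what "no application code knows which vendor it is talking to" means in practice. Gateways such as LiteLLM and Portkey provide this with real vendor adapters. To stay self-contained, this Python version uses hypothetical stub providers instead of real SDK calls. The aliases `quality`, `default` and `bulk` are also invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A provider is anything that turns a prompt into a completion.
# Real adapters would wrap each vendor's SDK; these stubs are hypothetical.
Provider = Callable[[str], str]


def fake_claude(prompt: str) -> str:
    return f"[claude-sonnet] {prompt}"


def fake_kimi(prompt: str) -> str:
    return f"[kimi-k2.5] {prompt}"


def fake_glm(prompt: str) -> str:
    return f"[glm-5.1] {prompt}"


@dataclass
class Gateway:
    """Maps capability aliases to providers so application code never names a vendor."""
    routes: Dict[str, Provider] = field(default_factory=dict)

    def route(self, alias: str, provider: Provider) -> None:
        # Re-pointing an alias is a configuration change, not a code change.
        self.routes[alias] = provider

    def complete(self, alias: str, prompt: str) -> str:
        if alias not in self.routes:
            raise KeyError(f"no provider configured for alias {alias!r}")
        return self.routes[alias](prompt)


gateway = Gateway()
gateway.route("quality", fake_claude)
gateway.route("default", fake_kimi)
gateway.route("bulk", fake_glm)

# Application code asks for a capability tier, never for a vendor.
print(gateway.complete("default", "refactor this function"))

# When a vendor reprices, swap the provider behind the alias at runtime.
gateway.route("quality", fake_kimi)
print(gateway.complete("quality", "review this PR"))
```

The design choice that matters is the alias. Callers depend on a tier name, so moving a tier from one vendor to another touches one `route` call and no application code.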
Sovereignty. The EU AI Act fully applies from August 2026 with penalties up to 7 percent of global turnover. AWS Sovereign Cloud launched in January 2026 with Bedrock included. For regulated EU buyers, Claude is often already blocked at procurement, and the question is not Claude versus alternatives but which open-source model runs inside the sovereign boundary. Mistral, the smaller Qwen3 variants and GLM-5.1 are the practical choices.
Application: The 100-Engineer Scenario
The strategy I described in the opening — Claude on every desk plus heavy enablement — would put Claude Pro or Max on 100 engineers at roughly $100 per seat per month. Under the new weekly caps, heavy users hit limits during exactly the sessions where they need the tool most. The obvious workaround, third-party harnesses, has been removed by legal enforcement. The tier above, Anthropic Enterprise, decouples seat fees from usage and meters every token at API rates.
The alternative is not a single model swap. It is a gateway in front of three. Claude Sonnet 4.6 for the 20 percent of tasks where the quality gap still matters. Kimi K2.5 via Moonshot's own API for the 60 percent of daily coding work where SWE-Bench parity is real. GLM-5.1 via the GLM Coding Plan Lite tier, billed quarterly at $30 per seat (roughly $10 per developer per month), for the long tail. Same 100 engineers, same 4 billion tokens per month, very different bill.
| | Claude-Only (Enterprise) | Multi-Vendor Gateway |
|---|---|---|
| Model routing | 100% Claude Sonnet 4.6 | 20% Sonnet 4.6, 60% Kimi K2.5, 20% GLM-5.1 |
| Seat / subscription | ~$20/seat × 100 × 12 = $24K (Enterprise pricing is negotiated) | GLM Coding Plan Lite: $30/quarter × 100 × 4 = $12K |
| Token cost (4B/mo, 3:1 I:O) | 4B × $6/M × 12 = $288K | Sonnet 20%: $57.6K; Kimi 60%: $34.6K; GLM-5.1 20%: included in Coding Plan; total $92K |
| Gateway engineering (amortised, yr 1) | n/a | $10K |
| Total annual | $312K | $114K |
| Delta | Baseline | −$198K / year |
| Lock-in | Single vendor, high switching cost | Swappable at runtime |
| Sovereignty | None | Route to EU-sovereign provider as needed |
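
The table's arithmetic can be checked with a few lines of Python. The sketch hard-codes the scenario's assumptions:

- 4B tokens per month
- the 3:1 blended prices from the cost table
- the 20/60/20 routing split
- ~$20 Enterprise seats
- $30-per-quarter GLM Coding Plan Lite seats

The function names are hypothetical.

```python
ENGINEERS = 100
MONTHLY_TOKENS_M = 4_000      # 4B tokens per month, expressed in millions
SONNET_BLENDED = 6.00         # $/M, 3:1 blend from the cost table
KIMI_BLENDED = 1.20           # $/M, 3:1 blend from the cost table


def claude_only_annual() -> float:
    seats = 20 * ENGINEERS * 12                        # ~$20/seat/month, negotiable
    tokens = MONTHLY_TOKENS_M * SONNET_BLENDED * 12    # every token on Sonnet 4.6
    return seats + tokens


def multi_vendor_annual(gateway_engineering: float = 10_000) -> float:
    seats = 30 * ENGINEERS * 4                         # GLM Coding Plan Lite, $30/quarter
    sonnet = 0.20 * MONTHLY_TOKENS_M * SONNET_BLENDED * 12
    kimi = 0.60 * MONTHLY_TOKENS_M * KIMI_BLENDED * 12
    # The remaining 20% runs on GLM-5.1 inside the Coding Plan, so no extra token cost.
    return seats + sonnet + kimi + gateway_engineering


claude = claude_only_annual()
multi = multi_vendor_annual()
print(f"Claude-only:  ${claude:,.0f}")
print(f"Multi-vendor: ${multi:,.0f}")
print(f"Delta:        ${claude - multi:,.0f}")
```

Running it gives $312,000 versus $114,160, a gap of $197,840, which the table rounds to $198K. Changing the routing split or the seat price in one place shows how sensitive the delta is to each assumption.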
The saved $198K is not a cost reduction story. It is the enablement program I would otherwise compromise on, plus the gateway engineering that keeps the architecture portable, plus a reserve for the next time a frontier lab decides to reprice its subscribers.
What about running your own AI in-house?
This is the question I get whenever someone hears the word "open source." Why pay anyone? Just download Kimi K2.5 or GLM-5.1 and run it on your own GPUs.
The answer used to be hardware. Frontier models needed 64-GPU clusters that only hyperscalers could afford. Today's mixture-of-experts (MoE) designs activate only a fraction of their parameters per token (MiniMax M2.7 activates 10 billion of 230 billion, GLM-5.1 activates 40 billion of 744 billion), so compute per token falls sharply. As long as the full set of weights fits in memory, a single 8-GPU node is enough to serve a frontier model at production speed. Hardware is no longer the gate.
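Sparse activation cuts compute per token, but every expert still has to sit in GPU memory. The back-of-envelope Python check below shows why numeric precision matters as much as parameter count. It rests on three stated assumptions:

- 141 GB of HBM per H200
- 1 byte per parameter at FP8 and 2 bytes at BF16
- 20% of node memory reserved for KV cache and runtime overhead (an assumed figure)

```python
H200_MEMORY_GB = 141              # HBM3e per H200 GPU
GPUS_PER_NODE = 8
RESERVED_FRACTION = 0.2           # assumption: headroom for KV cache and runtime


def weight_memory_gb(total_params_billion: float, bytes_per_param: float) -> float:
    """GB for the weights alone (1e9 params x bytes / 1e9 bytes per GB)."""
    return total_params_billion * bytes_per_param


def fits_on_node(total_params_billion: float, bytes_per_param: float) -> bool:
    usable_gb = H200_MEMORY_GB * GPUS_PER_NODE * (1 - RESERVED_FRACTION)
    return weight_memory_gb(total_params_billion, bytes_per_param) <= usable_gb


# (total, active) parameters in billions, as quoted in the text.
MODELS = {"MiniMax M2.7": (230, 10), "GLM-5.1": (744, 40)}

for name, (total, active) in MODELS.items():
    print(
        f"{name}: {active / total:.1%} of weights active per token; "
        f"FP8 fits: {fits_on_node(total, 1)}, BF16 fits: {fits_on_node(total, 2)}"
    )
```

Under these assumptions GLM-5.1 fits on one node at FP8 (744 GB of weights) but not at BF16 (1,488 GB). MiniMax M2.7 fits at either precision. Long contexts and large batches grow the KV cache, so the reserved fraction is the number to revisit first.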
The new gate is people, and the math is honest about it.
| Approach | Inference $/M | Annual ops overhead | Pencils out for |
|---|---|---|---|
| Claude Sonnet 4.6 (Anthropic API) | $6.00 | $0 | Quality-critical workloads |
| Managed open-source (Kimi K2.5, GLM-5.1, MiniMax M2.7) | $0.50–$2.15 | ~$10K (gateway) | Most enterprises, any scale |
| Self-host on rented GPUs (8×H200 @ $28–$70/hr, utilisation-dependent) | $5–$19 | $300K–$500K | Almost no one: worst of both worlds |
| Self-host on owned hardware (8×H200 DGX, ~$400–$500K capex, 3-yr amortised) | $0.80–$3.00 (depends on utilisation) | $500K–$700K | Very heavy users only (see below) |
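
Both self-host rows carry the same caveat. You pay for the node every hour, busy or idle, so cost per token depends on how much of each hour is spent serving. The minimal Python sketch below shows that relationship. The throughput it uses is a placeholder, not a measured H200 figure. Power, space and staff are excluded, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def owned_hourly_cost(capex: float, years: float = 3) -> float:
    """Straight-line hardware amortisation; power, space and staff are excluded."""
    return capex / (years * HOURS_PER_YEAR)


def cost_per_million(hourly_cost: float, tokens_per_second: float,
                     utilisation: float) -> float:
    """$ per million tokens when the node costs `hourly_cost` whether busy or idle."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    million_tokens_per_hour = tokens_per_second * 3600 * utilisation / 1e6
    return hourly_cost / million_tokens_per_hour


# Placeholder aggregate throughput for the whole node, NOT a benchmark: picked so the
# owned-hardware output lands near the table's range. Substitute your own measurement.
THROUGHPUT_TOK_S = 8_000
owned = owned_hourly_cost(450_000)   # midpoint of the ~$400-500K capex range

for util in (0.75, 0.30):
    print(f"utilisation {util:.0%}: ${cost_per_million(owned, THROUGHPUT_TOK_S, util):.2f}/M")
```

With these placeholder inputs the node costs roughly $0.79/M at 75% utilisation and $1.98/M at 30%. Whatever throughput you plug in, cost per token scales with 1/utilisation, so halving utilisation doubles the price of every token.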
The headline number is the inference cost. The decisive number is the annual ops overhead. Running a production AI inference platform needs two to three senior MLOps or SRE engineers at roughly $250K each fully burdened, plus monitoring, evaluation tooling, model swaps and on-call cover. Vendor cost calculators leave this out. Real-world TCO studies put it at $500K to $700K per year before you serve a single token.
The break-even point depends heavily on what you assume. Published analyses span roughly three orders of magnitude:
- AIPricingMaster and SitePoint's 36-month model put it as low as ~50–100M tokens per month vs. budget commercial APIs.
- My own derivation — on-prem at $0.80/M tokens vs. managed Kimi K2.5 at $1.20/M, $500K MLOps to recoup — lands at roughly 100B tokens/month sustained.
- At 30% utilisation (closer to a typical bursty internal workload, not a 24/7 product) on-prem inference cost rises to ~$2/M, the spread inverts, and self-hosting never breaks even on price alone.
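The derivation in the second bullet is a one-line formula. Divide the annual ops overhead by the per-million-token saving, then convert the result to a monthly volume. Here is a minimal Python sketch using the bullet's inputs; the function name is hypothetical.

```python
from typing import Optional


def break_even_tokens_per_month(
    self_host_per_m: float, managed_per_m: float, annual_ops_overhead: float
) -> Optional[float]:
    """Monthly token volume at which per-token savings repay the fixed ops overhead.

    Returns None when self-hosting is not cheaper per token, so it never breaks even.
    """
    saving_per_m = managed_per_m - self_host_per_m
    if saving_per_m <= 0:
        return None
    million_tokens_per_year = annual_ops_overhead / saving_per_m
    return million_tokens_per_year * 1e6 / 12


# On-prem $0.80/M vs managed Kimi K2.5 at $1.20/M, $500K overhead to recoup.
volume = break_even_tokens_per_month(0.80, 1.20, 500_000)
print(f"{volume / 1e9:.0f}B tokens/month")                # about 104B
# At ~30% utilisation on-prem cost is ~$2/M, so the spread is negative.
print(break_even_tokens_per_month(2.00, 1.20, 500_000))  # None
```

The ops figure alone moves the answer a long way. Using the top of the $500K to $700K range instead of the bottom pushes break-even from about 104B to about 146B tokens per month.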
The honest read is that the answer falls in a wide range, and your number depends on whether your workload looks like a 24/7 consumer product (high utilisation, self-host wins early) or a typical office workday (bursty, self-host loses badly). For the 100-engineer scenario above, the workload is the latter. Self-hosting loses by hundreds of thousands of dollars per year before you have written a single eval or paid a single on-call shift.
There is a second, non-financial reason to self-host that the table does not capture. Sovereignty. If your data cannot leave your perimeter — financial regulator, defence supplier, healthcare operator under specific data-residency rules — the comparison is not about price. It is about whether you have a usable model at all. In that case, GLM-5.1 or Mistral on your own hardware is not a cost-optimisation play. It is the only legal option.
Implication
Vendor choice is not a purchasing decision. It is an architecture decision, and it decides your operating model. Bet the enterprise on a single frontier lab and your enablement strategy, your developer tooling, your agent designs and your sovereignty posture are all downstream of that lab's next pricing meeting. The frequent flyer devaluation is coming for every frontier provider eventually. The enterprises that notice early will be the ones that treated the model as a swappable component all along.
I will still reach for Claude on specific workloads. I will not default to it anymore. The difference is the gateway in front of it.
Sources
- Anthropic, Claude Code legal and compliance, February 2026
- Anthropic, Claude pricing
- TechCrunch, Anthropic unveils new rate limits to curb Claude Code power users, July 2025
- The Register, Anthropic tweaks usage limits, March 2026
- The Register, Anthropic clarifies ban on third-party Claude access, February 2026
- MiniMax, M2.7 announcement (12 April 2026) and platform release notes
- Z.ai, GLM-5.1 documentation (7 April 2026), pricing page, and GLM Coding Plan
- Moonshot AI, Kimi K2.5 pricing and Hugging Face model card
- Alibaba Cloud, Qwen 3.6 Plus announcement (2 April 2026) and Model Studio pricing
- DeepSeek, V3.2 API pricing
- OpenAI, API pricing
- IntuitionLabs, Inference unit economics: true cost per million tokens
- TechPlusTrends, EU sovereign AI infrastructure stack 2026
Daniel Piatkowski, Data & Analytics veteran shaping AI-native enterprises, elicify.ai