The List Price Is Lying: Why AI Costs Decoupled From Pricing in May 2026

Anthropic did not raise the price of Opus 4.7. The list price still reads $5 per million input tokens and $25 per million output tokens, exactly as it did for Opus 4.6. And yet teams running production workloads on Opus 4.7 this week are paying 27% more for the same work.
Something has shifted in how AI pricing works. It shifted quietly, through three separate vendor moves announced within days of each other, each using a different mechanism. The shift is large enough to break most enterprise AI budgets that were drafted for 2026, and subtle enough that it does not show up by reading the pricing pages.
This article unpacks what actually happened, why the list price has become a weak proxy for real cost, and what controls whether your team sees a 5% delta or a 90% one over the next 60 days.
Three vendors, three mechanisms, one direction
OpenAI: the explicit increase
GPT-5.5 launched with a clear, visible price change. Input tokens went from $2.50 per million to $5.00 per million. Output tokens went from $15 per million to $30 per million. A textbook 2x increase, announced openly.
OpenRouter analyzed a switcher cohort: users whose primary model was GPT-5.4 before launch and who moved to GPT-5.5 immediately after. Same users, same workflows, different model version. The result: real cost increases ranged from 49% to 92%, depending on prompt length.
Why not the full 100%? Because GPT-5.5 is less verbose on long prompts. For prompts above 10K tokens, the model produces 19% to 34% fewer completion tokens than GPT-5.4 did for the same task. That partially offsets the doubled per-token price. For shorter prompts the offset disappears, which is why the worst hit is on prompts under 2K tokens, where real cost increased 92%.
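The arithmetic is easy to check. Below is a minimal sketch of the cost-per-task calculation; the per-million-token rates are the announced ones, while the token counts per task are illustrative assumptions, not measured figures.

```python
# Cost of one task = input_tokens * input_rate + output_tokens * output_rate,
# with rates quoted per million tokens. Token counts here are assumptions.

def cost_per_task(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Dollar cost of a single task at per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Long prompt (>10K tokens): rates doubled, but ~25% fewer completion tokens.
old = cost_per_task(12_000, 2_000, input_rate=2.50, output_rate=15.00)
new = cost_per_task(12_000, 1_500, input_rate=5.00, output_rate=30.00)
print(f"long prompt:  +{100 * (new / old - 1):.0f}%")   # ~ +75%, inside the 49-92% band

# Short prompt (<2K tokens): rates doubled, verbosity offset mostly gone.
old = cost_per_task(1_500, 600, input_rate=2.50, output_rate=15.00)
new = cost_per_task(1_500, 560, input_rate=5.00, output_rate=30.00)
print(f"short prompt: +{100 * (new / old - 1):.0f}%")   # ~ +91%, near the worst case
```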
The headline is straightforward: OpenAI doubled the price, and your bill went up by somewhere between half and almost double. The mechanism is visible. The magnitude is not.
Anthropic: the invisible increase
Anthropic took a different path. The list price for Opus 4.7 is identical to Opus 4.6. No announcement of a price change, because there is no price change to announce.
Instead, Opus 4.7 ships with a new tokenizer. By OpenRouter's switcher-cohort measurement, it produces 32% to 45% more native tokens for the same text, depending on prompt size. Anthropic's official documentation discloses an inflation range of 1.0x to 1.35x; in practice, measured inflation sits at the top of that range and beyond.
The real cost impact: 12% to 27% more per task on prompts above 2K tokens. There is one striking exception. Prompts under 2K tokens actually cost 1.6% less, because Opus 4.7 is significantly more concise on short queries (62% fewer completion tokens at the median).
There is also a hidden mitigation. Prompt caching, billed at a 90% discount, absorbs a large share of the token inflation if your prompt structure is stable enough to hit the cache. On prompts of 128K tokens or more, 93% of the extra tokens land in cache. On prompts in the 10K to 25K range, only 9% do. This is not a minor implementation detail. This is the difference between a 33% cost increase and a 5% one, on the same model, with the same volume.
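To make that gradient concrete, here is a rough model of the input-cost impact, assuming the extra tokens from inflation are the only thing that changes and that a given share of them is served from cache at the 90% discount. It ignores caching of the base prompt itself, so treat it as an approximation, not a billing formula.

```python
# Effective input cost under tokenizer inflation, with a share of the extra
# tokens landing in prompt cache (billed at a 90% discount). Rates are per
# million tokens; absorption figures are from the article, base sizes are
# illustrative.

INPUT_RATE = 5.00                 # $/M input tokens (Opus list price)
CACHE_RATE = INPUT_RATE * 0.10    # 90% discount on cache reads

def inflated_input_cost(base_tokens: int, inflation: float,
                        cache_absorption: float) -> float:
    """Input cost after inflating base_tokens by `inflation`, with
    `cache_absorption` of the extra tokens served from cache."""
    extra = base_tokens * (inflation - 1.0)
    cached = extra * cache_absorption
    uncached = base_tokens + extra - cached
    return (uncached * INPUT_RATE + cached * CACHE_RATE) / 1_000_000

# 20K-token prompt, 1.35x inflation, 9% of extra tokens cached:
mid = inflated_input_cost(20_000, 1.35, 0.09)
# 150K-token prompt, 1.35x inflation, 93% of extra tokens cached:
big = inflated_input_cost(150_000, 1.35, 0.93)

base_mid = 20_000 * INPUT_RATE / 1_000_000
base_big = 150_000 * INPUT_RATE / 1_000_000
print(f"10K-25K bucket: +{100 * (mid / base_mid - 1):.0f}% input cost")  # ~ +32%
print(f">128K bucket:   +{100 * (big / base_big - 1):.0f}% input cost")  # ~ +6%
```

Those two outputs are, roughly, the 33%-versus-5% split described above, produced by nothing but the cache absorption rate.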
The list price did not move. The bill did. And how much the bill moved depends almost entirely on how your prompts are structured.
GitHub Copilot: the structural change
GitHub announced that on June 1, 2026, Copilot will move from request-based billing to usage-based billing. Plan prices stay the same: Copilot Pro at $10 per month, Pro+ at $39, Business at $19 per user. What changes is what those prices buy.
Today each interaction with Copilot consumes one premium request unit, with a model multiplier on top. From June 1, every interaction will consume tokens, priced at the underlying model's API rate, converted into GitHub AI Credits.
For developers who mainly use inline code completion and Next Edit Suggestions, the impact will be modest. Those features remain included in the base plans. For developers who rely on Copilot Chat, agentic coding sessions, large context windows, or code review, the cost curve changes substantially. Multipliers for annual plan holders staying on request-based billing will rise sharply: Opus 4.7, for instance, moves from a 7.5x multiplier to 27x.
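A back-of-the-envelope comparison, under assumed usage (the multipliers are GitHub's announced figures; the session counts and per-session token volumes are invented for illustration):

```python
# Rough monthly comparison for an agentic Copilot user. Multipliers
# (7.5x -> 27x) are from the announcement; usage numbers are assumptions.

SESSIONS_PER_MONTH = 200    # assumed agentic interactions per month

# Request-based billing: premium request units = sessions * multiplier.
old_units = SESSIONS_PER_MONTH * 7.5    # Opus 4.7 before June 1
new_units = SESSIONS_PER_MONTH * 27     # Opus 4.7 after, for annual holders
print(f"premium request units: {old_units:.0f} -> {new_units:.0f} "
      f"({new_units / old_units:.1f}x)")   # 3.6x

# Token-based billing: tokens priced at the underlying model's API rate.
# Assume 40K input + 4K output tokens per agentic session at Opus list prices.
tokens_cost = SESSIONS_PER_MONTH * (40_000 * 5.00 + 4_000 * 25.00) / 1_000_000
print(f"token-based estimate at API rates: ${tokens_cost:.0f}/month")  # $60
```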
GitHub also paused new sign-ups for Copilot Pro, Pro+, and Student plans starting April 20, 2026. The official reason is to protect the experience for existing customers. The implicit reason is that demand is exceeding what GitHub can profitably serve at current flat-rate pricing.
Three vendors. Three mechanisms (price change, tokenizer change, billing model change). One direction.
| Vendor | Mechanism | List price change | Real cost impact | What drives the gap |
|---|---|---|---|---|
| OpenAI (GPT-5.5) | Explicit price increase | +100% (input and output) | +49% to +92% | Less verbose completions on long prompts (-19% to -34% above 10K) |
| Anthropic (Opus 4.7) | New tokenizer | 0% | +12% to +27% above 2K; -1.6% below 2K | 32-45% more native tokens for the same text; cache absorbs 9-93% depending on prompt size |
| GitHub (Copilot) | Billing model change | 0% on plans | Up to 3-4x for agentic users from June 1 | Shift from request-based to token-based; multipliers up to 27x for some models |
Why this matters more than it looks
For most of the last three years, AI costs in enterprise budgets have been a rounding error. Procurement teams treated model APIs roughly the same way they treat S3 storage: a per-unit rate from a public pricing page, multiplied by an estimated volume, plus a contingency.
That approach has stopped working. To produce a credible AI cost estimate today you need three inputs, not one:
- The list price for the model. This still matters, but it is now necessary, not sufficient.
- The tokenizer behavior of the specific model version. Different vendors produce different token counts for the same text, and different versions of the same model can differ by 32% to 45%. The unit on the pricing page is no longer a stable unit.
- Your actual usage pattern: prompt length distribution, completion length distribution, cache hit rate, retry rate. These determine which segment of the vendor's pricing structure you are actually exposed to.
Only the intersection of these three gives you a real cost-per-task figure; a minimal estimator is sketched below. Without all three, you are not budgeting. You are guessing.
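Here is what such an estimator can look like. The function and field names are ours, not any vendor's API, and the default numbers are illustrative; the point is that all three inputs appear in the formula.

```python
# Cost per task = list price x tokenizer behavior x usage pattern.
# All names and numbers are illustrative; plug in measured distributions.

from dataclasses import dataclass

@dataclass
class UsageProfile:
    input_tokens: int       # median prompt length (pre-inflation tokens)
    output_tokens: int      # median completion length
    cache_hit_rate: float   # share of input tokens served from cache
    retry_rate: float       # fraction of tasks that are re-run

def real_cost_per_task(input_rate: float, output_rate: float,
                       tokenizer_inflation: float,
                       profile: UsageProfile,
                       cache_discount: float = 0.90) -> float:
    """Dollar cost per task from the three inputs the article names."""
    inp = profile.input_tokens * tokenizer_inflation
    cached = inp * profile.cache_hit_rate
    uncached = inp - cached
    input_cost = (uncached + cached * (1 - cache_discount)) * input_rate
    output_cost = profile.output_tokens * output_rate
    return (input_cost + output_cost) / 1_000_000 * (1 + profile.retry_rate)

profile = UsageProfile(input_tokens=8_000, output_tokens=1_200,
                       cache_hit_rate=0.40, retry_rate=0.05)
print(f"${real_cost_per_task(5.00, 25.00, 1.35, profile):.4f} per task")
```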
The structural implication for anyone running a technical organization: the line item labeled "AI" in your 2026 budget is becoming the most volatile line in the entire technology stack. Volatility of 50% to 90% inside 48 hours has no precedent in enterprise commodity infrastructure. AWS never exhibited this risk profile in a decade of price changes; neither did the database market, the CDN market, or the observability market.
This volatility is not going away. It is the natural consequence of a market where compute supply, frontier-model capability, and provider business models are all evolving simultaneously. The vendors changing their pricing this month are not done. They will change it again, and the next vendors will use mechanisms we have not seen yet.
What actually controls cost in production
The same OpenRouter data that shows the headline cost increases also shows large dispersion within those increases. Two organizations using Opus 4.7 with the same volume can see a +5% or +27% real impact. The difference is not which contract they signed. It is how they built the system around the model. Four levers do most of the work.
| Lever | What it controls | Mechanism | Estimated impact |
|---|---|---|---|
| Cache-aware context structure | Tokenizer inflation absorption | Stable prompt prefix that hits cache (90% discount) | 9% absorption at 10K-25K prompts; 93% absorption above 128K |
| Model routing | Exposure to single-vendor pricing changes | Routing layer that directs tasks across model tiers | Buffer against 50-90% volatility events; option value across vendors |
| Prompt length discipline | Per-task token consumption | Compress irrelevant context; avoid padding "to be safe" | 2K-10K bucket is the most punished (+27% to +69%) |
| Completion control | Output token volume | Strict schemas, tight stop conditions, concise output design | Compounds vendor-driven completion reductions |
Cache-aware context structure. Anthropic's prompt caching, billed at a 90% discount, absorbs a large share of tokenizer inflation when prompts are structured in stable, cacheable layers. The Anthropic data shows the gradient clearly: 9% of extra tokens absorbed at 10K-25K prompts, 77% at 50K-128K, 93% above 128K. The variable that controls cache hit rate is whether your prompt assembly produces the same prefix across requests. Prompts assembled ad-hoc per request lose this benefit entirely. Prompts assembled with a fixed structural layer (system instructions, fixed context, then variable input at the end) capture it.
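A sketch of that layering, assuming a provider whose caching matches stable prompt prefixes (the exact cache-control mechanics differ per vendor and are omitted here; the content strings are placeholders):

```python
# Prompt assembly with a fixed structural prefix. The layering is the point:
# system instructions and fixed context first, variable input last, so every
# request shares the same cacheable prefix.

SYSTEM_INSTRUCTIONS = "You are a code-review assistant. Follow the style guide."
FIXED_CONTEXT = "<project conventions, API reference -- changes rarely>"

def build_prompt(user_input: str) -> list[dict]:
    """Identical prefix on every call; only the final message varies."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTIONS},  # stable: cacheable
        {"role": "user", "content": FIXED_CONTEXT},          # stable: cacheable
        {"role": "user", "content": user_input},             # variable: last
    ]

# Anti-pattern: injecting per-request data (timestamps, request IDs, freshly
# retrieved snippets) ahead of the stable layers. That breaks the shared
# prefix and drives the cache hit rate toward zero.
```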
Model routing. A workflow hard-wired to a single model is fully exposed to that model's pricing changes. A workflow with a routing layer that directs simple tasks to smaller models and reserves frontier models for complex tasks has a natural buffer. The routing layer also creates option value: when a vendor changes pricing unfavorably, you can shift load to alternatives without rewriting application code. This is not a hypothetical lever. With OpenAI, Anthropic, and GitHub all changing terms in the same week, organizations with routing layers reconfigured their traffic. Organizations without one filed change requests.
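A minimal routing table makes the decoupling visible. The model names and the complexity heuristic below are invented for illustration; production routers use classifiers or explicit task metadata.

```python
# Simple tasks go to a cheaper tier, complex tasks to a frontier model, and
# the mapping lives in one place: a pricing change means editing a table,
# not the application code.

ROUTES = {
    "simple":   {"provider": "anthropic", "model": "claude-haiku"},
    "standard": {"provider": "openai",    "model": "gpt-5.4"},
    "complex":  {"provider": "anthropic", "model": "claude-opus-4.7"},
}

def classify(task: str) -> str:
    """Toy complexity heuristic, for illustration only."""
    if len(task) < 500:
        return "simple"
    if "refactor" in task or "architecture" in task:
        return "complex"
    return "standard"

def route(task: str) -> dict:
    return ROUTES[classify(task)]

# When a vendor reprices, reconfigure ROUTES and redeploy; callers never change.
print(route("Rename this variable across the file."))
```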
Prompt length discipline. Both OpenRouter analyses show the 2K-10K bucket as one of the most punished. GPT-5.5 is 69% more expensive in that range. Opus 4.7 is 27% more expensive. Many engineering teams instinctively pad prompts with context "to be safe", inflating the per-task token count without proportional benefit. Disciplined prompt construction (compressing irrelevant context, splitting long contexts across calls when caching applies, removing dead context that no longer matters) has direct economic return.
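Since the damage concentrates in specific length buckets, the first practical step is knowing your own distribution. A small sketch that buckets prompt token counts from request logs (the bucket edges mirror the analyses above):

```python
# Share of traffic per prompt-length bucket -- i.e., your exposure to each
# segment of the pricing structure. Feed it token counts from request logs.

from collections import Counter

def bucket_of(tokens: int) -> str:
    if tokens < 2_000:
        return "<2K"
    if tokens < 10_000:
        return "2K-10K"      # the most punished bucket in both analyses
    if tokens < 25_000:
        return "10K-25K"
    if tokens < 128_000:
        return "25K-128K"
    return ">=128K"

def exposure(prompt_token_counts: list[int]) -> dict[str, float]:
    counts = Counter(bucket_of(t) for t in prompt_token_counts)
    total = len(prompt_token_counts)
    return {bucket: round(n / total, 2) for bucket, n in counts.items()}

# Illustrative log sample:
print(exposure([1_500, 4_000, 6_500, 8_000, 12_000, 3_000, 150_000]))
# {'<2K': 0.14, '2K-10K': 0.57, '10K-25K': 0.14, '>=128K': 0.14}
```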
Completion control. GPT-5.5 produces 19% to 34% fewer completion tokens above 10K. Opus 4.7 produces 62% fewer below 2K. These are vendor-driven changes, but workflows can compound them. Strict output schemas, tight stop conditions, and tasks designed to require concise answers all reduce completion length. Teams that guide the model toward short, structured outputs pay less, regardless of the model's nominal price.
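What completion control looks like at the request level, as a sketch. The parameter names follow the common chat-completions style and the model id is illustrative; adapt both to your client library.

```python
# Cap output length, stop early, and demand a terse schema, then reject
# anything verbose downstream. Model id and schema are illustrative.

import json

bug_report = "Login button unresponsive on Safari after the 3.2 release."

request = {
    "model": "claude-opus-4.7",       # illustrative id
    "max_tokens": 400,                # hard ceiling on completion length
    "stop_sequences": ["\n\n"],       # cut off once the JSON answer ends
    "messages": [{
        "role": "user",
        "content": (
            "Classify this bug report. Respond ONLY with JSON of the form "
            '{"severity": "low|medium|high", "component": "<string>"} '
            "and nothing else.\n\n" + bug_report
        ),
    }],
}

def parse_or_retry(completion_text: str) -> dict:
    """Reject verbose or malformed output instead of paying for it again
    as downstream context."""
    payload = json.loads(completion_text)     # raises on non-JSON chatter
    if set(payload) != {"severity", "component"}:
        raise ValueError("schema violation -- retry with a firmer instruction")
    return payload
```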
All four levers are properties of the system around the model, not of the model itself. They are not prompt engineering. Prompt engineering optimizes a single call. These optimize the architecture in which calls happen.
The CFO conversation has changed
Six months ago, the CTO defending the AI budget to the CFO walked in with the vendor's pricing page. That conversation worked, because the pricing page was a reasonable proxy for what the company would actually pay.
Today that pricing page is incomplete. In 60 days, after Copilot's June 1 transition, it will be explicitly wrong for one of the most common enterprise AI tools.
The conversation a CTO needs to be ready to have now is a different one. It is not "How much does the model cost?" It is "How much does a representative task cost for us, and why?" The answer to that second question depends almost entirely on the architecture, not on the model selection.
Organizations that can answer the second question are in a strong position. They can budget with confidence, negotiate with vendors from a position of measurement rather than hope, and absorb pricing changes without operational disruption. Organizations that cannot answer it are exposed to every vendor decision made over the next year, and there will be many.
The shift is structural, not cyclical
It would be reassuring to read this week's pricing changes as a temporary squeeze. Capacity-constrained vendors raising prices until supply catches up. That reading is wrong, or at least incomplete.
The deeper shift is that AI infrastructure is moving from a commodity-style pricing model (predictable per-unit rates, predictable units) to a structured-product pricing model (rates depend on tokenizer, billing model, plan tier, and usage pattern). Structured products require structured procurement. They reward sophistication and punish naive consumption.
This is not a problem to wait out. It is a property of the market that is here to stay, and it favors organizations that have invested in the architectural layer around the model: the constraints, the routing, the caching strategy, the feedback loops that measure real cost-per-task continuously.
At FairMind, we call this layer the harness. Building it is not optional anymore. It is the part of the AI stack that determines whether the next pricing change costs you 5% or 50%.
Read more about Harness Engineering.
Sources
- OpenRouter, "Opus 4.7's New Tokenizer: What It Actually Costs" (April 27, 2026): https://openrouter.ai/announcements/opus-47-tokenizer-analysis
- OpenRouter, "GPT-5.5 Price Increase: What It Actually Costs": https://openrouter.ai/announcements/gpt55-cost-analysis
- GitHub Blog, "GitHub Copilot is moving to usage-based billing": https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
- GitHub Docs, "Models and pricing for GitHub Copilot": https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing