Weekly analysis, honest takes, and hidden gems. No engagement bait.
DeepSeek V4 ships under the MIT license at $0.30 per million output tokens, 83x cheaper than Claude Opus 4.7, while scoring 80.6% on SWE-bench Verified. The agentic-coding price floor just moved by an order of magnitude.
Gemini 3.1 Pro scores 77.1% on ARC-AGI-2, 24 points above GPT-5.5, yet Arena Elo places it in a three-way tie. Here's what the leaderboard hides, and when the reasoning gap actually changes your routing decision.
Google committed up to $40B to Anthropic on April 24, the same week OpenAI launched a separate enterprise JV and GPT-5.5 doubled in price. The frontier market is hardening into two distribution channels, and the model is becoming the cheap part of the stack.
xAI's April 30 release prices Grok 4.3 at $1.25 input and $2.50 output per million tokens, undercutting Gemini 3.1 Pro by 79% and GPT-5.5 by 92% on output while landing between Opus 4.7 and Gemini on agentic Elo.
OpenAI shipped GPT-5.5 yesterday at $5/$30 per million tokens, exactly double GPT-5.4. Anthropic spent April cutting prices; OpenAI just opted out of the cost war and bet on agents instead.
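The percentages in the Grok 4.3 and GPT-5.5 teasers can be checked against each other. Here is a minimal sketch using only the prices quoted above; the helper name `percent_cheaper` is made up for illustration, and the GPT-5.4 figures are derived from the "exactly double" claim rather than quoted anywhere on this page.

```python
def percent_cheaper(price: float, baseline: float) -> float:
    """Return how much cheaper `price` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (1 - price / baseline) * 100


# USD per million tokens, as quoted in the teasers above.
GROK_4_3_OUTPUT = 2.50
GPT_5_5_INPUT = 5.00
GPT_5_5_OUTPUT = 30.00

# Grok 4.3 output price relative to GPT-5.5 output price.
grok_vs_gpt = percent_cheaper(GROK_4_3_OUTPUT, GPT_5_5_OUTPUT)
print(f"Grok 4.3 vs GPT-5.5 on output: {grok_vs_gpt:.1f}% cheaper")

# Assumption: "exactly double GPT-5.4" applies to both input and output,
# so the earlier prices are half of the GPT-5.5 ones (derived, not quoted).
gpt_5_4_input = GPT_5_5_INPUT / 2
gpt_5_4_output = GPT_5_5_OUTPUT / 2
print(f"Implied GPT-5.4 pricing: ${gpt_5_4_input:.2f} / ${gpt_5_4_output:.2f}")
```

The first figure rounds to the 92% quoted in the Grok teaser, so the two teasers are at least internally consistent.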
Opus 4.7 ships at the same $5 input price as 4.6, with the same 1M context window, and a 1503 Elo that tops the Arena. That nothing about pricing moved is precisely the story.
Anthropic cut Opus 4.6 input pricing from $15 to $5 per million tokens. At that price it now undercuts Gemini and GPT-5.4 on input, and it breaks the conventional cost justification for tiered routing.
Pair a cheap executor with an expensive Opus advisor that speaks only at decision forks. The numbers are hard to dismiss, and the mental model behind them matters more than the benchmarks.
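The executor/advisor idea can be sketched as a small control loop. This is only an illustration under stated assumptions: the class `AdvisorLoop`, the stub functions, and the rule that a step starting with "decide" counts as a fork are all hypothetical, and plain Python callables stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AdvisorLoop:
    """Run every step on a cheap executor; consult the advisor only at forks."""
    executor: Callable[[str, str], str]  # (step, guidance) -> result
    advisor: Callable[[str], str]        # step -> guidance
    is_fork: Callable[[str], bool]       # step -> does this need advice?
    advisor_calls: int = 0

    def run(self, steps: List[str]) -> List[str]:
        results = []
        for step in steps:
            guidance = ""
            if self.is_fork(step):
                # The expensive model is only invoked here.
                guidance = self.advisor(step)
                self.advisor_calls += 1
            results.append(self.executor(step, guidance))
        return results


# Hypothetical stand-ins for the cheap and expensive models.
def stub_executor(step: str, guidance: str) -> str:
    return f"{step} [advised: {guidance}]" if guidance else step


def stub_advisor(step: str) -> str:
    return "take option A"


def stub_is_fork(step: str) -> bool:
    return step.startswith("decide")


loop = AdvisorLoop(stub_executor, stub_advisor, stub_is_fork)
results = loop.run(["read file", "decide: refactor or patch", "write tests"])
print(results)
print("advisor calls:", loop.advisor_calls)
```

In this toy run the advisor is consulted once across three steps, which is the cost lever the pattern relies on: expensive tokens are spent only where a wrong turn would be costly.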
Anthropic built a model that dominates 17 of 18 benchmarks and achieves 100% on cybersecurity tasks, then decided the world isn't ready for it. Here's what that tells us about where AI is headed.
Don't sleep on Mistral 7B. This openly available model punches above its weight, pairing impressive performance with a permissive Apache 2.0 license, which makes it a strong choice for developers.
Emu, a lightweight agent from DeepMind, consistently outperforms many larger models on complex reasoning tasks, and it’s incredibly accessible. Here’s why this tiny titan is a serious contender.
Grok is rapidly emerging as a surprisingly capable model from xAI, and it deserves your attention beyond the media circus. Here’s an honest look at what it does well and where it still falls short.