Your unbiased guide to the world's smartest AIs
One-click access to today's frontier leaders. Ranked by capability, updated weekly.
Best planning, debugging, and self-correction in the game. The AI coworker developers actually trust.
Dominates reasoning and multimodal tasks. Scored 77% on ARC-AGI-2 — double its predecessor — and leads on graduate-level science benchmarks.
Multi-agent architecture with 1M context and built-in reasoning on every query. Now the cheapest frontier-quality option in the set — best for high-stakes research without the sugarcoating.
First fully retrained base since GPT-4.5. Leads agentic coding benchmarks with 82.7% on Terminal-Bench 2.0 and 73.1% on Expert-SWE. The enterprise default for autonomous multi-step work.
Elo ratings from Chatbot Arena blind votes. These shift weekly — here's the current snapshot.
No API key, no usage limits, no data leaving your machine. The best open-weight models ranked by Elo.
Google's Gemma 4 31B ranks #3 among all open-weight models on LMArena with an Elo of 1452 — beating models 20x its size. Fits a single RTX 4090 at Q4_K_M and ships Apache 2.0.
Alibaba's Qwen3 flagship dense model. Matches Qwen2.5-72B performance in a 32B package that fits a single RTX 4090, with hybrid thinking/non-thinking modes and Apache 2.0.
The strongest model that comfortably fits a single 24 GB GPU. Adds vision and tool-use over 3.1, tightly instruction-tuned, and Apache 2.0 with no usage restrictions — the best entry point for local AI.
No hype. Where each model actually leads, based on benchmarks and real-world usage as of today.
Currently #1 in blind user votes on LMArena with 1503 Elo. Gemini 3.1 Pro close behind at 1493.
Crushes it on planning, debugging, and self-correction. Many devs have switched and aren't looking back.
Leads on PhD-level benchmarks like GPQA and ARC-AGI subsets. Claude and Grok are strong contenders.
Shines for maximally truthful, witty conversation. Great for brainstorming without corporate polish.
Weekly analysis, honest takes, and hidden gems. No engagement bait.
Gemini 3.5 Flash beats its own 3.1 Pro flagship on agentic and coding benchmarks at $1.50/$9 per million tokens — the most cost-competitive frontier-class model launched this week.
Z.ai's GLM-4.6 lands within ~60 Elo of the closed frontier and ships under a no-strings MIT license — the best open-weight model nobody in Western dev circles is discussing.
DeepSeek V4 ships under MIT with $0.30/M output tokens — 83x cheaper than Claude Opus 4.7 — while scoring 80.6% on SWE-bench Verified. The agentic-coding price floor just moved an order of magnitude.
Gemini 3.1 Pro scores 77.1% on ARC-AGI-2 — 24 points above GPT-5.5 — yet Arena Elo places it in a three-way tie. Here's what the leaderboard hides, and when the reasoning gap actually changes your routing decision.
Google committed up to $40B in Anthropic on April 24 — the same week OpenAI launched a separate enterprise JV and GPT-5.5 doubled in price. The frontier market is hardening into two distribution channels, and the model is becoming the cheap part of the stack.
xAI's April 30 release prices Grok 4.3 at $1.25 input and $2.50 output per million tokens — undercutting Gemini 3.1 Pro by 79% and GPT-5.5 by 92% on output, while landing between Opus 4.7 and Gemini on agentic Elo.
OpenAI shipped GPT-5.5 yesterday at $5/$30 per million tokens — exactly double GPT-5.4. Anthropic spent April cutting prices; OpenAI just opted out of the cost war and bet on agents instead.
Opus 4.7 ships at the same $5 input price as 4.6, with the same 1M context window, and a 1503 Elo that tops the Arena. That nothing about pricing moved is precisely the story.
Anthropic cut Opus 4.6 input pricing from $15 to $5 per million tokens. At that price it now undercuts Gemini and GPT-5.4 on input — and breaks the conventional cost-justification for tiered routing.
Pairing a cheap executor with an expensive Opus advisor that only speaks at decision forks. The numbers are hard to dismiss — and the mental model behind them matters more than the benchmarks.