
Opus 4.7 is disappointing

April 17, 2026
Tags: intermediate, ai-models

Summary

Anthropic just released Claude Opus 4.7, and if you were hoping for a major upgrade, you're in for disappointment. Despite marketing claims of 10% improvements on benchmarks like SWE-bench Pro and promises of better instruction following, real-world performance tells a different story. The model fails basic reasoning tests that previous versions handled better, such as the classic "car wash 50 meters away" question, where it confidently suggests walking instead of driving. On custom benchmarks designed to catch common Opus regressions, version 4.7 performs no better than 4.6, while GPT-4o scores 75% on the same tests. Even simple coding tasks, like building a space shooter game, produce worse results than competitors such as DeepSeek's GLM-5.1.

The community response has been harsh, with Reddit users reporting serious regressions rather than improvements. The likely culprit is RLHF training that dumbs down the consumer model while Anthropic reserves superior performance for enterprise customers using its expensive "Mefos" system through Project Glasswing. Companies like Apple, Google, and Cisco get access to better models, while regular users pay similar prices for degraded performance.

There is a silver-lining theory: Anthropic may be temporarily scaling the model back because of launch congestion, and performance could improve once traffic normalizes. The tokenization changes also mean you'll pay 1.0-1.3x the cost of Opus 4.6 for the same work. Meanwhile, Chinese competitors like DeepSeek have doubled their prices from $30 to $72, betting that Opus isn't delivering.

Bottom line: Opus 4.7 feels like March 2024's Opus 4.6, not the game-changer you were promised.
