Hacker News

Comment by dools | original | Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
dools · 2026-07-02 Thu 06:37 UTC
Yeah, I’ve been consistently underwhelmed by Anthropic models, but then I don’t use their harness, so maybe that’s it.