Hacker News
Comment by dools | Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
dools · 2026-07-02 Thu 06:37 UTC
Yeah, I’ve been consistently underwhelmed by Anthropic models, but then I don’t use their harness, so maybe that’s it.