Hacker News

Comment by dozerly | Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
dozerly · 2026-07-02 Thu 04:54 UTC
Just wait for the next 100 rounds. People love seeing the 65% -> 85% jump, seemingly over and over again, for every new model.