Hacker News
Comment by Madmallard
Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
Madmallard
· 2026-07-02 Thu 04:11 UTC ·
next round of trust me bro benchmarks
dozerly
· 2026-07-02 Thu 04:54 UTC ·
Just wait for the next 100 rounds. People love seeing the 65% -> 85% seemingly over and over again for every new model.