Hacker News
Comment by pishpash | Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
pishpash · 2026-07-02 Thu 05:09 UTC
Maybe models should ask for human-in-the-loop input, as a matter of convention.
sinuhe69 · 2026-07-02 Thu 06:24 UTC
A model that can ask questions or ask for help when in doubt is indeed a major feat. None of the current frontier models can do that.