Hacker News
Comment by sinuhe69
Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
sinuhe69
· 2026-07-02 Thu 06:24 UTC ·
A model that can ask questions or ask for help when in doubt is indeed a major feat. None of the current frontier models can do that.