Hacker News

Comment by sinuhe69 | Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
sinuhe69 · 2026-07-02 Thu 06:24 UTC
A model that can ask questions or ask for help when in doubt is indeed a major feat. None of the current frontier models can do that.