Hacker News

Comment by hypfer | Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
hypfer · 2026-07-02 Thu 06:44 UTC
Similarly, it explains to me why people found Claude so amazing, while I just thought "eh."

Tool expectations