Hacker News
Comment by e9 |
Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
e9
2026-07-02 Thu 05:39 UTC
I agree with you on the harness. I find that Claude can be good in any harness but GPT is only superior inside Codex.