iLoveOncall | Hacker News

Comment by iLoveOncall | original | Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers

iLoveOncall · 2026-07-02 Thu 09:03 UTC

My point is that it is much faster for me to solve the problem by writing the code than to write specifications detailed enough for the model to do the right thing in the right way.

nsingh2 · 2026-07-02 Thu 09:20 UTC

A highly detailed specification is not what I mean here. It's closer to plugging in a few-sentence description (or a totally cluttered brain dump) and having the model interview you to pin down critical details before continuing.

In my own work, it's usually a few critical assumptions the model made silently (ones I never even thought of initially) that end up being the difference between passable results on the first try and me having to go back and fix things. Occasionally, some questions force me to rethink the problem entirely.

I basically always begin any long-running session with this kind of brainstorming. I don't find the existing plan modes in Claude Code/Codex to be critical enough.

reactordev · 2026-07-02 Thu 10:23 UTC

[delayed]