Hacker News

Comment by olmo23 | Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
olmo23 · 2026-07-02 Thu 06:33 UTC
How do you prevent degenerate strategies? I could trivially give a model a SHA256 hash and ask it to provide the source input.

In class, you'd probably want a rule saying at least one LLM should be able to figure out the answer, but in a head-to-head I'm not sure how you'd solve it.
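A minimal Python sketch of the asymmetry, assuming a setter/solver setup. `make_hash_challenge`, `brute_force_solver`, and `is_admissible` are hypothetical names, and the brute-force search is only a stand-in for a model. Publishing a SHA-256 digest costs one hash and checking an answer costs one hash, but solving means searching the input space. `is_admissible` encodes the "at least one solver must crack it" rule:

```python
import hashlib
import secrets
from typing import Callable, Optional

# A solver takes a hex digest and returns a candidate preimage (or None).
Solver = Callable[[str], Optional[bytes]]


def make_hash_challenge(num_bytes: int) -> tuple[str, bytes]:
    """Setter side: pick a random secret, publish only its SHA-256 digest."""
    secret = secrets.token_bytes(num_bytes)
    return hashlib.sha256(secret).hexdigest(), secret


def check_answer(digest: str, answer: Optional[bytes]) -> bool:
    """Verification is a single hash call, however hard solving was."""
    return answer is not None and hashlib.sha256(answer).hexdigest() == digest


def brute_force_solver(max_len: int) -> Solver:
    """Hypothetical stand-in 'model': tries every byte string up to max_len bytes."""
    def solve(digest: str) -> Optional[bytes]:
        for length in range(max_len + 1):
            for i in range(256 ** length):
                candidate = i.to_bytes(length, "big")
                if hashlib.sha256(candidate).hexdigest() == digest:
                    return candidate
        return None
    return solve


def is_admissible(digest: str, solvers: list[Solver]) -> bool:
    """Classroom-style rule: keep a challenge only if some solver cracks it."""
    return any(check_answer(digest, solve(digest)) for solve in solvers)
```

With a 1-byte secret, the brute-force solver finds the preimage after at most 257 hashes, so the challenge is admitted. With a 16-byte (128-bit) secret, no feasible search succeeds, so the filter rejects it. In a head-to-head with no shared solver panel, there is nothing to apply that filter.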

wwind123 · 2026-07-02 Thu 07:34 UTC
Who knows. Maybe Mythos 5 already found a hole in SHA256, so this won't be too hard. :)