olmo23 | Hacker News

Comment by olmo23 on: Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers

olmo23 · 2026-07-02 Thu 06:37 UTC

With a human at its disposal, it could probably count the number of R's in strawberry!

In all seriousness, though, adding capabilities should not normally reduce the effectiveness of a model (within reason: don't pollute the context window with millions of useless tools).