Hacker News
Comment by TZubiri | The Underhanded C Contest
TZubiri · 2026-07-02 Thu 00:05 UTC
2026 calls for an Underhanded prompt contest
theteapot · 2026-07-02 Thu 00:56 UTC
Or better, sleeper agents. Anthropic released a study on this in 2024, "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training":
https://www.anthropic.com/research/sleeper-agents-training-d...
https://www.youtube.com/watch?v=_y9j2BoHg2c