Hacker News
Comment by theteapot
The Underhanded C Contest
theteapot · 2026-07-02 Thu 00:56 UTC
Or better, sleeper agents. Anthropic released a study on this in 2024, "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training":
https://www.anthropic.com/research/sleeper-agents-training-d...
https://www.youtube.com/watch?v=_y9j2BoHg2c