I’ve never had a refusal coding, and in some areas (AI red teaming specifically) I’ve found it quite good at recognizing and discussing “white hat” stuff that in the past I think would have got refusals.
But when there was the Hantavirus thing a while back, I asked it if there was a vaccine under development and got a refusal immediately. I’ve had a few like that. It seems they’ve implemented really poor guardrails on certain topics (CBRN and cyber) that have lots of false positives. But if you actually chat with the model itself it’s quite lucid about what is legitimately dangerous and what is just performative “AI Safety” style refusal.
Yeah, I’ve had Opus (and Fable) perform full security audits on my codebases that would run for 30mins. That’s what I think would have tripped it but went just fine.
[−]InvertedRhodium · 2026-07-01 Wed 22:45 UTC ·
link
Try using it as an agent to perform black box security testing on a live instance of your codebase (assuming it's a hosted service).
But when there was the Hantavirus thing a while back, I asked it if there was a vaccine under development and got a refusal immediately. I’ve had a few like that. It seems they’ve implemented really poor guardrails on certain topics (CBRN and cyber) that have lots of false positives. But if you actually chat with the model itself it’s quite lucid about what is legitimately dangerous and what is just performative “AI Safety” style refusal.
But that's the only refusal I managed to get.