Claude Opus 4.6 accuracy on BridgeBench hallucination test drops from 83% to 68%

Posted by bratao | 2 hours ago | 1 comment

Reubend 11 minutes ago

Because the website doesn't seem to report how many runs were performed, I assume they ran the suite only once.

The models are nondeterministic, and therefore it's pretty normal for different runs to give different results.

I don't see this as evidence that Opus 4.6 has gotten worse.
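
For a rough sense of scale, here is a minimal sketch of a pooled two-proportion z-test on the two reported accuracies (83% and 68%). The question counts are hypothetical, since the site doesn't seem to report the benchmark size, and the test treats each question as an independent pass/fail trial, which is only an approximation of run-to-run variation.

```python
import math

def two_proportion_z(p1, p2, n1, n2):
    """Pooled z statistic for the difference between two observed accuracies."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical question counts; the real suite size is not given in the thread.
for n in (20, 50, 200):
    z = two_proportion_z(0.83, 0.68, n, n)
    print(f"n={n:>3}  z={z:.2f}  beyond 1.96: {abs(z) > 1.96}")
```

Under these assumptions, whether a 15-point gap clears the usual 1.96 threshold depends heavily on how many questions are in the suite, so the number of questions and runs would need to be known before reading much into the drop.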