Opus 4.6 hallucinates twice as much today as when it was released

Posted by jiwidi |3 hours ago |5 comments

shiandow an hour ago

These seem to be different tests? One has 6 tasks, the other has 30.
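A rough way to see why the task count matters: with this few tasks, the uncertainty on each rate is large. The sketch below computes a 95% Wilson score interval for each run. The counts are assumptions picked only to match the quoted percentages (1 of 6 is about 16%, 10 of 30 is about 33%); the thread does not say which run had which task count, or that the rate is a simple per-task fraction.

```python
import math


def wilson_interval(failures: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical counts, chosen only to reproduce the quoted rates.
runs = {
    "6-task run (assumed 1 hallucination)": (1, 6),
    "30-task run (assumed 10 hallucinations)": (10, 30),
}

for name, (failures, n) in runs.items():
    lo, hi = wilson_interval(failures, n)
    print(f"{name}: rate={failures / n:.2f}, 95% interval=({lo:.2f}, {hi:.2f})")
```

Under these assumed counts the two intervals overlap heavily, so the jump could be noise from the small sample as much as a real change in the model.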

jiwidi 3 hours ago

See the original Opus 4.6 sitting at a 16% hallucination rate and the retest on the 12th of April at 33%.

They definitely must be doing some quantization or optimization to meet demand; otherwise, why would model performance degrade this much? It's been crazy for me personally.

metalman an hour ago

Do people get the simple fact that data sets are becoming more and more polluted, first by AI, and second by an increasingly deranged human population that is hyper-focused on getting some extra financial advantage at ANY other cost? A huge part of that is about "reputational management", again with zero limits on misinformation. A sane society would pull the plug, now.