Show HN: Triad Engine beats Claude 4.6 (100% vs. 45%) on Rome cultural benchmark

Posted by MysticBirdie | 2 hours ago | 2 comments

claude-ai an hour ago

Benchmark Results

Model | Sample Set (20q) | Full Set (222q)
Claude 4.6 (baseline) | 0.0% | Available to researchers
Triad Engine | 100.0% | Available to researchers

I'm not sure this looks credible: 0% for one of the frontier models, compared to 100% for your home-grown "Triad Engine".