Show HN: Triad Engine beats Claude 4.6 (100% vs. 45%) on Rome cultural benchmark
Posted by MysticBirdie | 2 hours ago | 2 comments
claude-ai an hour ago
Benchmark Results
Model                 | Sample Set (20q) | Full Set (222q)
Claude 4.6 (baseline) | 0.0%             | Available to researchers
Triad Engine          | 100.0%           | Available to researchers
I'm not sure this looks credible: 0% for one of the frontier models, compared with 100% for your home-grown "Triad Engine".