
First Proof

Posted by samasblack | 3 hours ago | 28 comments

Syzygies 41 minutes ago

I'm a mathematician relying heavily on AI as an association engine of massive scope, to organize and expand my thoughts. One doesn't get the best results by "testing" AI.

A surfboard is also an amazing tool, but there's more to operating one than telling it which way to go.

Many people want self-driving cars so they can drink in the back seat while watching movies. They'll find their jobs replaced by AI, with a poor quality of life, because we're a selfish species. In contrast, Niki Lauda trusted fellow Formula 1 driver James Hunt to race centimeters apart. Some people want AI to help them drive that well. They'll have great jobs as AI evolves.

Garry Kasparov pioneered "freestyle" chess tournaments after his defeat by Deep Blue, where the best human players were paired with computers, coining the "centaur" model of human-machine cooperation. This is frequently cited in the finance literature, where it is recognized that AI-guided human judgement can outperform either humans or machines alone.

Any math professor knows how to help graduate students confidently complete a PhD thesis, or how to humiliate students in an oral exam. It’s a choice. To accomplish more work than one can complete alone, choose the former. This is the arc of human evolution: we develop tools to enhance our abilities. We meld with an abacus or a slide rule, and it makes us smarter. We learn to anticipate computations, like we’re playing a musical instrument in our heads. Or we pull out a calculator that makes us dumber. The role we see for our tools matters.

Programmers who actually write better code using AI know this. These HN threads are filled with despair over the poor quality of vibe coding. At the same time, Anthropic is successfully coding Claude using Claude.

_alternator_ 2 hours ago

These are very serious research level math questions. They are not “Erdős style” questions; they look more like problems or lemmas that I encountered while doing my PhD. Things that don’t make it into the papers but were part of an interesting diversion along the way.

It seems likely that PhD students in the subfields of the authors are capable of solving these problems. What makes them interesting is that they seem to require fairly high research level context to really make progress.

It’s a test of whether the LLMs can really synthesize results from knowledge that takes a human several years of postgraduate preparation in a specific research area to acquire.

blenderob 2 hours ago

Can someone explain how this would work?

> the answers are known to the authors of the questions but will remain encrypted for a short time.

Ok. But humans may be able to solve the problems too. What prevents Anthropic or OpenAI from hiring mathematicians, having them write the proofs, and passing them off as LLM-written? I'm not saying that's what they'll do. But shouldn't the paper say something about how they're going to validate that this doesn't happen?

Honest question here, not trying to start a flame war. I'm honestly confused about how this is going to test what it wants to test. Or maybe I'm just plain confused. Can someone help me understand this?

falloutx 2 hours ago

Anything special about these questions? Are they unsolved by humans? I'm not working in mathematics research, so it's hard for me to judge their importance.

samasblack 3 hours ago

richard_chase 2 hours ago

Interesting questions. I think I'll attempt #7.

happa 2 hours ago

February 13th is a pretty close deadline. They should have given at least a month.

baal80spam 2 hours ago

I'll patiently wait for the "goalpost moving olympics" after this is published.

Aressplink 44 minutes ago

For policy feedback: (Gas^∆ / 2) · diag(u) · (Gas^∆ / 2)^T. A dampened shock propagates forward, is treated as independent, then feeds back into the system; that's a quadratic form.
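One possible reading, with heavy caveats: the comment doesn't define Gas^∆ or u, so take G = Gas^∆ to be a propagation matrix, u a per-component dampening vector, and s a shock vector; those roles are my assumptions, not the commenter's. Under that reading the sentence matches the quadratic form

\[
  Q(s) \;=\; s^{\top}\,\Bigl(\tfrac{G}{2}\Bigr)\,\operatorname{diag}(u)\,\Bigl(\tfrac{G}{2}\Bigr)^{\!\top}\, s ,
\]

read right to left: (G/2)^T s propagates the shock forward, diag(u) weights each component independently, and the outer (G/2) together with s^T feeds the result back into the system, yielding a scalar that is quadratic in s.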