thenaturalist 2 hours ago
The one difference between "can do" and "should be trusted to do" is the ability to systematically prove that "can do" holds up in close to 100% of task instances, including under adversarial conditions.
Hacking and pentesting are already scaling fully autonomously - and systematically.
For now, lower-level targets aren't yet attractive, since attacking at that scale still requires sophisticated (state) actors, but that is going to change.
So building systems that white-hat prove your code is not only functional but competent is going to be critical if you don't want to be ripped apart by black hats later on.
One example that applies this quite nicely is roborev [0] by the legendary Wes McKinney.