
Stop training your security ML on labeled attack data

Posted by projectnexus | 2 hours ago | 1 comment

projectnexus 2 hours ago

Signature-based detection and labeled ML classifiers only see what they’ve been told to see. In a SOC, the real threat is the behavior that doesn't show up in a feed.

I’ve been researching Energy-Based Models (EBMs) as a way to ditch labels entirely. Instead of teaching a model what "bad" looks like, we teach it what "normal" looks like across 40 PB of data. The result: a 0.97 ROC-AUC, and detection of scripted service-account activity that mimicked normal logins but showed minor behavioral deviations.
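To make the idea concrete, here is a minimal, hedged sketch of energy-based anomaly scoring. The energy function below is a simple Gaussian negative log-density (Mahalanobis distance) fitted to normal traffic; a production EBM would typically parameterize E(x) with a neural network, but the core move is the same: low energy means "looks like the normal data we trained on," and alerts fire on high energy. All names and the synthetic data are illustrative, not the author's actual pipeline.

```python
import numpy as np

def fit_energy_model(normal_x):
    """Fit a Gaussian energy model to normal behavior.

    Returns the mean and precision matrix; low energy under this
    model means "typical" relative to the training distribution.
    """
    mu = normal_x.mean(axis=0)
    cov = np.cov(normal_x, rowvar=False) + 1e-6 * np.eye(normal_x.shape[1])
    return mu, np.linalg.inv(cov)

def energy(x, mu, prec):
    """Mahalanobis energy E(x); higher = less 'normal'."""
    d = x - mu
    return np.einsum("ij,jk,ik->i", d, prec, d)

rng = np.random.default_rng(0)
# Hypothetical per-event feature vectors (timing, volume, host stats).
normal = rng.normal(0.0, 1.0, size=(5000, 8))
# Scripted activity that mimics normal logins with a subtle shift.
scripted = rng.normal(1.5, 1.0, size=(50, 8))

mu, prec = fit_energy_model(normal)
# Threshold at the 99th percentile of normal energies: ~1% FP budget.
threshold = np.quantile(energy(normal, mu, prec), 0.99)
flags = energy(scripted, mu, prec) > threshold
```

No labels appear anywhere: the threshold is set purely from the distribution of energies on normal data, which is what keeps the false-positive rate explicit and tunable.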

I’m sharing the research on why EBMs outperform static rules and how to implement them without drowning in the false positives that usually plague unsupervised learning.
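One practical way to keep unsupervised alerts manageable is to calibrate the score cutoff on held-out normal data against a fixed analyst budget, rather than picking an arbitrary threshold. A hedged sketch (the function name, score distribution, and event counts are illustrative assumptions, not from the research):

```python
import numpy as np

def calibrate_threshold(heldout_scores, daily_events, alert_budget):
    """Pick the energy cutoff that yields ~alert_budget alerts per day.

    heldout_scores: energies computed on clean, held-out traffic.
    The target false-positive rate is alerts / total daily events.
    """
    fp_rate = alert_budget / daily_events
    return np.quantile(heldout_scores, 1.0 - fp_rate)

rng = np.random.default_rng(1)
# Stand-in for energy scores on a day of known-clean events.
heldout = rng.gamma(4.0, 2.0, size=100_000)

# Cap alerts at ~200/day out of 1M events (0.02% FP rate).
thr = calibrate_threshold(heldout, daily_events=1_000_000, alert_budget=200)
```

Because the cutoff is tied to an alert budget, retraining on drifted "normal" data automatically re-centers the threshold instead of silently flooding the queue.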