
Found a CVSS 10.0 bypass in Hugging Face's model scanner. We open-sourced ours

Posted by yashchhabria |3 hours ago |1 comments

yashchhabria 3 hours ago

Co-Author here. I worked on model scanning at Databricks before joining Promptfoo to build ModelAudit.

The short version: ML model files execute code at load time. A pickle can use `__reduce__` to name any importable callable plus its arguments, and the unpickler calls it during deserialization, so loading a file means running arbitrary Python. About 45% of popular HuggingFace models still use pickle (CCS 2025). Every major framework has had a deserialization CVE in the last year: PyTorch (CVSS 9.3), Keras (CVSS 9.8), ONNX (CVSS 8.8).
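For anyone who hasn't seen the load-time execution in action, here is a minimal, harmless sketch. `Payload` is a made-up class for illustration, and `os.getcwd` stands in for whatever a real attacker would call.

```python
import os
import pickle


class Payload:
    """Hypothetical class: its pickle stores a function call, not object state."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call os.getcwd()".
        # A malicious file would name something far less benign here.
        return (os.getcwd, ())


blob = pickle.dumps(Payload())

# Nothing on the loading side asks for getcwd; the call happens inside loads().
result = pickle.loads(blob)

print(isinstance(result, Payload))  # False: we got the call's return value instead
print(result == os.getcwd())        # True
```

The point is that the loader never opts in to the call. Whatever callable the file names gets run as a side effect of `pickle.loads`.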

Existing scanners use blocklists: they maintain a list of known-dangerous functions and allow everything else. We kept finding gaps:

- *picklescan* (used by HuggingFace): 60+ published GHSAs. We found a CVSS 10.0 universal bypass via `pkgutil.resolve_name()` - one opcode sequence that renders the entire blocklist irrelevant.
- *fickling* (Trail of Bits): We found an opcode handler bug where function calls vanish from the AST if you POP the result. Fickling reports `LIKELY_SAFE` on a pickle that spawns a reverse shell.
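As a rough illustration of why `pkgutil.resolve_name()` is awkward for a name-based blocklist: it is a standard-library function (Python 3.9+) that turns a plain string into the object it names. The snippet below only shows that stdlib behavior. It is not the disclosed opcode sequence, and `os.getcwd` is just a harmless stand-in target.

```python
import os
import pkgutil

# The target is passed as ordinary string data, not as an imported global.
fn = pkgutil.resolve_name("os:getcwd")

print(fn is os.getcwd)  # True: the string resolved to the real function object
```

A scanner that only compares imported global names against a deny list would see `pkgutil.resolve_name` here, while the module and function actually being reached appear only inside a string argument.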

We also found 4 malicious models currently on HuggingFace that bypass every scanner in their pipeline (VirusTotal, JFrog, ClamAV, picklescan, ModelScan).

ModelAudit takes the opposite approach: allowlist-first. We maintain ~1,500 individually vetted safe globals for ML frameworks, and everything else is flagged. It covers 42+ formats (not just pickle), runs entirely offline, has no ML framework deps, and produces SARIF for CI/CD.
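To sketch the allowlist-first idea in a few lines (this is not ModelAudit's code; `SAFE_GLOBALS`, `referenced_globals`, and `scan` are hypothetical names, and the `STACK_GLOBAL` handling is deliberately simplified), one can walk the opcodes with the standard `pickletools.genops` and flag every global that is not explicitly vetted, without ever loading the file:

```python
import collections
import pickle
import pickletools
import pkgutil

# Hypothetical one-entry allowlist; a real one would be much larger and vetted.
SAFE_GLOBALS = {("collections", "OrderedDict")}


def referenced_globals(data: bytes):
    """Yield (module, name) for each global the pickle imports, without loading it."""
    strings = []  # string arguments seen so far, consulted by STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these as a single "module name" string.
            module, name = arg.split(" ", 1)
            yield module, name
        elif opcode.name == "STACK_GLOBAL":
            # Simplification: assume the two most recent string pushes are
            # the module and the name. A real scanner must model the stack.
            if len(strings) >= 2:
                yield strings[-2], strings[-1]
            else:
                yield "<unknown>", "<unknown>"
        elif isinstance(arg, str):
            strings.append(arg)


def scan(data: bytes):
    """Return every referenced global that is NOT on the allowlist."""
    return [g for g in referenced_globals(data) if g not in SAFE_GLOBALS]


class Payload:
    """Hypothetical test object; unpickling it would call pkgutil.resolve_name."""

    def __reduce__(self):
        return (pkgutil.resolve_name, ("os:getcwd",))


benign = pickle.dumps(collections.OrderedDict(a=1))
suspicious = pickle.dumps(Payload())

print(scan(benign))      # []
print(scan(suspicious))  # [('pkgutil', 'resolve_name')]
```

Note that nothing here needed to know `pkgutil.resolve_name` is dangerous. It is flagged simply because nobody vetted it, which is the inversion the allowlist approach relies on.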

We filed 7 GHSAs total across fickling and picklescan through coordinated disclosure. All fixed by maintainers.

MIT licensed: https://github.com/promptfoo/modelaudit

Happy to answer questions about pickle VM internals, the bypass research, or the scanner architecture.