
Show HN: Content Addressable Storage for ML Checkpoints

Posted by TotallyNotOla | 2 hours ago | 1 comment

TotallyNotOla 2 hours ago

ML training workflows store checkpoints as full model files, even when most of the model hasn't changed. I wanted to see what happens if you treat checkpoints as structured objects instead of opaque blobs, and deduplicate at the tensor level.
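The core idea can be sketched in a few lines. This is a toy illustration (not my actual implementation): a checkpoint is a dict of tensor name -> raw bytes, each tensor payload is stored once under its content hash, and a checkpoint becomes just a manifest of name -> digest.

```python
import hashlib

class TensorStore:
    """Toy content-addressable store keyed at the tensor level."""

    def __init__(self):
        self.blobs = {}  # digest -> tensor bytes, each stored once

    def put_checkpoint(self, checkpoint):
        """Store a checkpoint (name -> bytes); return a manifest (name -> digest)."""
        manifest = {}
        for name, data in checkpoint.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # dedup: identical tensors share a blob
            manifest[name] = digest
        return manifest

    def get_checkpoint(self, manifest):
        """Reassemble a checkpoint from its manifest."""
        return {name: self.blobs[digest] for name, digest in manifest.items()}
```

With this, two checkpoints that share a frozen embedding table store its bytes exactly once; only the manifests differ.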

A few things surprised me:

- Delta compression mostly doesn’t work during training (deltas can be larger than the original)

- File-level deduplication (e.g. DVC) doesn’t capture most of the redundancy

- Almost all storage savings come from exact tensor identity, not partial overlap
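To make the last two points concrete, here's a small demo (stdlib only, not from my project) of why chunk-level or delta schemes struggle during training: a gradient step perturbs nearly every float in a tensor, so the serialized bytes change almost everywhere and fixed-size chunks find nothing to share. A frozen tensor, by contrast, stays byte-identical and dedups trivially by its hash.

```python
import hashlib
import struct

def chunks(data, size=64):
    """Split a byte string into fixed-size chunks (64 bytes = 16 float32s)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

# Two versions of a 256-element float32 tensor; v2 nudges every value
# slightly, the way a training step would.
v1 = b"".join(struct.pack("<f", 0.01 * i) for i in range(256))
v2 = b"".join(struct.pack("<f", 0.01 * i + 1e-5) for i in range(256))

# Whole-tensor identity fails: the digests differ.
assert hashlib.sha256(v1).digest() != hashlib.sha256(v2).digest()

# Chunk-level dedup fares no better: every chunk contains changed floats.
shared = sum(c1 == c2 for c1, c2 in zip(chunks(v1), chunks(v2)))
print(f"{shared} shared chunks out of {len(chunks(v1))}")

# A frozen tensor is the only easy win: identical bytes, identical digest.
assert hashlib.sha256(v1).digest() == hashlib.sha256(v1).digest()
```

This is the mechanism behind "savings come from exact identity": unless a tensor is bit-for-bit unchanged (frozen layers, warm-started subtrees), there's essentially no partial overlap to exploit.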

For things like warm-start tree models and transfer learning, this ends up working really well. Curious if anyone has seen different behavior with larger models or different chunk sizes.