Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

Posted by ABS | 2 hours ago | 1 comment

ABS 2 hours ago

lol, it took me 48 hours to do (and re-do, and re-do) this test and write it up, and now that I've convinced myself to stop changing bits and just publish it... Google has just announced the Gemma 4 QAT models :-D

That wouldn't change the core of my article, though, since the bottleneck on the old M1 16GB is still memory bandwidth.
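
For anyone who wants the back-of-the-envelope version of the bandwidth argument, here is a rough sketch. It assumes that decoding each token streams all the weights from memory once, that the base M1 peaks at about 68.25 GB/s (Apple's published figure, not something I measured here), and that the model has roughly 12 billion parameters. The function name and the bit widths are only illustrative, and real throughput will land below this ceiling.

```python
def max_decode_tokens_per_s(params_billion, bits_per_weight, bandwidth_gb_s):
    """Rough upper bound on decode speed for a bandwidth-bound model.

    Assumes every generated token reads all weights from memory once and
    ignores KV cache traffic, activations, and compute time.
    """
    weight_gb = params_billion * bits_per_weight / 8  # GB of weights read per token
    return bandwidth_gb_s / weight_gb


# Assumption: published peak memory bandwidth of the base M1, in GB/s.
M1_BANDWIDTH_GB_S = 68.25

# Assumption: ~12B parameters, at two illustrative quantization widths.
for bits in (4, 8):
    ceiling = max_decode_tokens_per_s(12, bits, M1_BANDWIDTH_GB_S)
    print(f"{bits}-bit weights: at most ~{ceiling:.1f} tokens/s")
```

Under those assumptions the ceiling is set by bytes per weight and nothing else, so a different set of weights at the same bit width would hit the same bandwidth limit.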