satvikpendem 43 minutes ago
Personal I'm using the 2B model for web search and structured JSON output back via Unsloth Studio and its API, works very well for that even with the model embedded on phones.
somewhatrandom9 6 minutes ago
minimaxir an hour ago
It's good that this post lists the expected VRAM usage for the models with Q4_0 Gemma 4 12B being 6.7GB, which will indeed fit Google's claims of fitting within 16GB comfortably, altough it confirms that only the quantized version will do so.
Relatedly, in Google's newly released Edge Gallery for macOS, Gemma 4 12B is explicitly listed as unsupported due to not enough RAM even on a 16GB machine, but given the expected VRAM usage here the Q4_0 variant definitely should fit and Google should fix that.
cr3cr3 10 minutes ago
netdur an hour ago
The E4B model doesn’t fit on my phone TPU, so it swaps to RAM, the QAT version means more accuracy, good!
refulgentis an hour ago