minimaxir an hour ago
> Vision: We replaced Gemma 4’s vision encoder with a lightweight embedding module consisting of a single matrix multiplication, positional embedding and normalizations.
That's technically still encoding, just without a dedicated model like SigLIP for it? The Developer's Guide elaborates: it's still a 35M layer, and I'm curious whether that is robust enough. https://developers.googleblog.com/gemma-4-12b-the-developer-...
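For a rough picture of what "a single matrix multiplication, positional embedding and normalizations" could look like, here is a minimal sketch in plain Python. It is not the actual Gemma module: the patch size, the widths, the choice of RMS normalization, and the order of the steps are all assumptions, and every function name is hypothetical.

```python
import math
import random

def rms_norm(vec, eps=1e-6):
    """Scale a vector so its root-mean-square is ~1 (no learned gain, for brevity)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches

def embed(image, weight, pos_emb, patch):
    """Assumed order: normalize patch, one matmul, add position, normalize again."""
    tokens = []
    for i, p in enumerate(patchify(image, patch)):
        p = rms_norm(p)
        # The single matrix multiplication: one dot product per output channel.
        projected = [sum(x * w for x, w in zip(p, row)) for row in weight]
        tokens.append(rms_norm([v + b for v, b in zip(projected, pos_emb[i])]))
    return tokens

random.seed(0)
PATCH, D_MODEL = 4, 8  # toy sizes, chosen only for illustration
image = [[random.random() for _ in range(8)] for _ in range(8)]
weight = [[random.gauss(0, 0.02) for _ in range(PATCH * PATCH)] for _ in range(D_MODEL)]
pos_emb = [[random.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(4)]
tokens = embed(image, weight, pos_emb, PATCH)
print(len(tokens), len(tokens[0]))  # 4 tokens of width 8
```

The point is only that there is no attention and no stack of layers here: each patch becomes a token through one linear projection, so whatever visual understanding the model has must come from the language model itself.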
> Small enough to run locally on consumer laptops with 16GB of RAM, it unlocks powerful multimodal and agentic experiences right on your machine.
I am assuming that involves quantization, which, given the quality loss, makes that statement somewhat misleading IMO.
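A back-of-the-envelope check of why quantization seems likely, assuming roughly 12 billion parameters (inferred only from the "gemma-4-12b" slug in the URL above, not a confirmed figure). It counts weights only and ignores the KV cache, activations, and the per-block scale overhead that real quantization formats add.

```python
def weight_gib(params, bits_per_weight):
    """Memory needed for the weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30

PARAMS = 12e9  # assumption: taken from the "12b" in the URL slug

for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: {weight_gib(PARAMS, bits):.1f} GiB")
```

Under those assumptions the 16-bit weights alone come to about 22 GiB, which does not fit in 16GB of RAM, while 8-bit (about 11 GiB) and 4-bit (about 6 GiB) do.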
ethanpil an hour ago
Is it simply goodwill and/or marketing? Or am I missing something strategic?
spott 4 minutes ago
I'm curious how they pre-trained it... I feel like it must have had audio/image output that they chopped off.
I wonder how hard it would be to add it back on.
mlmonkey 6 minutes ago
ComputerGuru 31 minutes ago
A model that comfortably fits in 16GB of VRAM (allowing room for context) is a welcome upgrade.
lxgr 29 minutes ago
Havoc 35 minutes ago
Zambyte an hour ago
[0] https://ollama.com/library/gemma4/tags
Edit: MLX being Mac-only is independent of the model being MLX (and therefore Mac) only. The latter is what I am asking about.
dwa3592 an hour ago
randomNumber7 an hour ago
I would be interested in how this actually works. I couldn't find a description of the model architecture (and I did check the links in the Google blog).
djyde an hour ago
nickandbro an hour ago
BiraIgnacio 30 minutes ago
an hour ago
Comment deleted
digdugdirk 36 minutes ago
claysmithr 31 minutes ago
zuminator an hour ago
jdelman an hour ago
26 minutes ago
Comment deleted