DiffusionGemma: 4x Faster Text Generation

Posted by meetpateltech | 3 hours ago | 10 comments

vineyardmike 2 hours ago

I recently switched to OpenCode to try out many of the Non-US-Frontier-Labs models. My unexpected favorite model to use was Mercury (a diffusion model). Not because it was “smart” but because it was stupid fast. It was more of a pair-programming experience than the SOTA agentic experience of prompting and waiting. Honestly, it was also way more fun and brought back some of the pre-AI coding experience while still getting some of the benefits of AI. It felt less like a slot machine where you prompt, wait, and hope it went in the right direction. It even got me using tiny models like Gemini Flash Lite and GPT Mini/Nano more.

Anyway, I’m so excited for an open-weight model, and I hope it performs well. I’ll be testing this ASAP.

xnx 2 hours ago

Is the diffusion approach of any use for Multi-Token Prediction (MTP) drafters? https://blog.google/innovation-and-ai/technology/developers-...

beklein 2 hours ago

A good visual explanation of how text diffusion models like DiffusionGemma work: https://newsletter.maartengrootendorst.com/p/a-visual-guide-...

minimaxir 3 hours ago

A few days ago I was just thinking that Google never talked about their diffusion text generation model again after demoing it at I/O a year ago. The rumor is that it was too expensive to run, but according to the provided chart, which compares DiffusionGemma with regular Gemma on the same 1x H100 hardware, that shouldn't be the case. I'm curious what the trade-off for this speed is, aside from being slightly weaker than Gemma.

kkukshtel 2 hours ago

I think this is the future. The sort of left-field rumble that turns into a quake in 5 years.

rvz 3 hours ago

We need more local open-weight models that are performant and as good as (or good enough compared to) the best frontier ones.

Then you could benefit from Jevons Paradox and enjoy the same “productivity gains” without paying the extortionate token prices charged by closed-model providers, or at least get them as cheaply as possible.

And especially, no silent nerfing of the model.

hmate9 2 hours ago

I can’t help but feel like there’s something here that will matter for future LLMs.

The bidirectionality could be a big deal: being able to refine a sentence with both left and right context feels closer to how editing/thinking actually works than committing to each token forever.
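To make the bidirectionality point concrete, here is a toy sketch of one common decoding scheme for masked text-diffusion models: confidence-based parallel unmasking. DiffusionGemma's actual sampler isn't described in this thread, so don't read this as its implementation. The "denoiser" is a hypothetical stand-in (it simply knows the target sentence and uses made-up per-slot confidences), and `diffusion_decode`, `make_toy_denoiser`, and the numbers are illustrative assumptions. Each step sees the whole sequence, so a masked slot can be boosted by a filled-in neighbor on its right as well as its left. Several tokens are committed per step, which is roughly where the speed advantage over one-token-per-step autoregressive decoding comes from. In this simplest variant a committed token is never revisited; revising already-placed tokens would need a sampler that can remask them, which the sketch leaves out.

```python
from typing import Callable, Dict, List, Tuple

MASK = "_"

# A denoiser maps the full (partially masked) sequence to a
# (token, confidence) guess for every still-masked position.
Denoiser = Callable[[List[str]], Dict[int, Tuple[str, float]]]


def diffusion_decode(length: int, denoiser: Denoiser, tokens_per_step: int) -> List[List[str]]:
    """Start fully masked; each step, commit the most confident predictions.

    Every prediction is made with the whole sequence visible, so a masked
    slot can use context on both its left and its right.
    """
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    seq = [MASK] * length
    history = [seq.copy()]
    while MASK in seq:
        preds = denoiser(seq)
        # Highest confidence first; ties broken by position for determinism.
        ranked = sorted(preds.items(), key=lambda kv: (-kv[1][1], kv[0]))
        for pos, (tok, _conf) in ranked[:tokens_per_step]:
            seq[pos] = tok
        history.append(seq.copy())
    return history


def make_toy_denoiser(target: List[str], prior: List[float]) -> Denoiser:
    """Hypothetical stand-in for a trained model.

    It "knows" the target sentence and starts from a made-up prior
    confidence per slot, which rises by 0.25 for each already-filled
    neighbor (left and/or right).
    """
    def denoise(seq: List[str]) -> Dict[int, Tuple[str, float]]:
        preds: Dict[int, Tuple[str, float]] = {}
        for i, tok in enumerate(seq):
            if tok != MASK:
                continue
            left = i > 0 and seq[i - 1] != MASK
            right = i + 1 < len(seq) and seq[i + 1] != MASK
            preds[i] = (target[i], prior[i] + 0.25 * left + 0.25 * right)
        return preds
    return denoise


if __name__ == "__main__":
    target = "the cat sat on the mat".split()
    prior = [0.3, 0.9, 0.4, 0.2, 0.3, 0.8]  # made-up per-slot confidences
    history = diffusion_decode(len(target), make_toy_denoiser(target, prior), tokens_per_step=2)
    for step, state in enumerate(history):
        print(step, " ".join(state))
    # 0 _ _ _ _ _ _
    # 1 _ cat _ _ _ mat
    # 2 the cat sat _ _ mat
    # 3 the cat sat on the mat
```

Six tokens take three denoising steps instead of six. In step 2, "the" at position 0 is placed using only its right-hand neighbor "cat", something a strictly left-to-right decoder never does.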

Maybe the current models aren’t good enough yet, but the direction feels important.