
1-Bit Bonsai Image 4B Image Generation for Local Devices

Posted by modinfo | 3 hours ago | 29 comments

smallerize 4 minutes ago

"To our knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone."

Isn't SD XL 3.5B? And the refiner model is even larger. Those can run on an iPhone 13 Pro.

lumost an hour ago

I actually can’t wait for the future where I upgrade hardware in order to upgrade my AI, as an alternative to an expensive subscription.

There are many problems I want to work on which require billions of tokens. These are completely inaccessible without corporate project sponsorship at the moment. An ASIC generation machine that can pump out a few tens of thousands of tokens per second at Opus 4.6 quality would be more than sufficient.
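As a rough back-of-envelope for the numbers in this comment, here is a minimal sketch of how long such a job would take. The 1-billion-token job size and the 20,000 tokens/s rate are assumed stand-ins for "billions of tokens" and "a few tens of thousands of tokens per second"; they are not figures from the comment, and the function name is hypothetical.

```python
def hours_to_generate(total_tokens: float, tokens_per_second: float) -> float:
    """Wall-clock hours needed to generate total_tokens at a steady rate."""
    seconds = total_tokens / tokens_per_second
    return seconds / 3600


# Assumed example values, chosen only for illustration.
job_tokens = 1e9          # one billion tokens
rate = 20_000             # tokens per second

print(f"{hours_to_generate(job_tokens, rate):.1f} hours")  # roughly 13.9 hours
```

Under those assumptions, each billion tokens costs a bit over half a day of continuous generation on a single machine.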

jeroenhd 10 minutes ago

Couldn't try it because the demo app is iOS-only and the web version just crashes my browser. The small model is impressive, but if you front-load a 1.8GB text encoder model, the savings aren't quite as useful.

I do wonder how these compare to existing image generation models. I've tried https://github.com/alichherawalla/off-grid-mobile-ai for a while but I find the image generation models rather lacking.

captainregex 7 minutes ago

What trade-off would one need to clear to justify the hardware and the work to get this running locally as part of a broader system? It’s a lot of work setting up and maintaining a production harness/system on a local device. I don’t personally generate images repeatedly at a scale where using a lab’s app somehow burns all my tokens. I like the idea of local AI, but I don’t see widespread adoption of it happening in commercial or customer situations anytime soon, no matter how small or good enough the models get. Even with Uber's token-burn whiplash, I doubt their answer will be “run some of it local”. IT nightmare, I’d imagine.

woadwarrior01 4 minutes ago

The text encoder is still 4-bit quantized.
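To put the size complaint in this sub-thread into numbers, here is a rough sketch. It assumes the image model stores exactly 1 bit per weight for 4B parameters (ignoring scales, metadata, and any layers kept at higher precision) and takes the 1.8GB and 4-bit figures from the comments above at face value; the implied encoder parameter count is an estimate derived from those two numbers, not a published figure.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return params * bits_per_weight / 8 / 1e9


# Assumption: 4B parameters at exactly 1 bit each.
image_model_gb = weights_gb(4e9, 1)

# Figure quoted in the thread for the text encoder download.
text_encoder_gb = 1.8

# Share of the combined download taken up by the text encoder.
encoder_share = text_encoder_gb / (image_model_gb + text_encoder_gb)

# Parameter count implied by 1.8 GB at 4 bits per weight (estimate only).
implied_encoder_params = text_encoder_gb * 1e9 * 8 / 4

print(f"image model: {image_model_gb:.2f} GB")
print(f"text encoder share of total: {encoder_share:.0%}")
print(f"implied encoder size: {implied_encoder_params / 1e9:.1f}B parameters")
```

Under these assumptions the image model's weights come to about 0.5 GB, so the text encoder would account for most of the combined footprint.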

sorenjan 2 hours ago

They call it a diffusion model, but it's based on Flux.2 which is a rectified flow model.

wiradikusuma an hour ago

Is there a benchmark of local image generation models? Local = can run on a 16 GB MacBook or 8 GB+ NVIDIA card.

a1o an hour ago

Could anyone pick out the minimum hardware requirements for this? Both RAM and storage?

sudb 36 minutes ago

Very interested to see where this kind of work goes for on-device video generation!

iJohnDoe 12 minutes ago

Does anyone ever get their stuff to actually work? Like, actually load?

potatoman22 43 minutes ago

I wonder why they didn't use a Bonsai model as the text encoder

MitPitt 2 hours ago

Lately I've noticed posts with barely 10 points getting to the HN front page. Was it always like this?

janniks an hour ago

I was expecting to see images of Bonsai trees when I clicked this

SilentM68 an hour ago

Question,

Is it compatible with Ollama or ComfyUI, or are those providers unneeded? And is it compatible with low-end hardware?

Also, where does "./setup.sh" drop the components in Linux?

Thank you, Sol

yieldcrv 2 hours ago

impressive, combines a couple techniques that I always wanted the frontier models to have

having trouble loading the WebGL browser demo on my phone but no biggie