ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math

Posted by steveharing1 | 3 hours ago | 24 comments

adityashankar 4 minutes ago

I used their online API and asked it to create code for a timer I can copy-paste into about:blank to test out (prompt below).

It did it successfully, but it did need a follow-up correction prompt. Overall pretty impressive for a model with 760M active parameters, but definitely not DeepSeek-R1 level.

That being said, if something with 760M active parameters can be this good, there's a good chance API-based models get cheaper in the future.

Prompt ------

``` can you write me some js code (that i can put in the console for about:blank) which will basically create a timer for for me that i can start, stop, and store current values for (or rather lap)

so i want it to create buttons (start, stop, lap buttons) on the page for me with labels and divs and other elements that accordingly record the current information and display the current information, and can accordingly start, stop and lap :)

the js code that i copy paste automatically creates the html buttons and divs and other elements that can manage the timer and accordingly the timer works with them ```
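For reference, here is a minimal sketch of the kind of stopwatch the prompt asks for. This is not the model's actual output; the names (`createStopwatch`, the injectable `nowFn` clock) are my own, and the clock is injectable purely so the timing logic can be exercised outside a browser.

```javascript
// Timing logic, kept separate from the DOM wiring so it also runs in Node.
// `nowFn` is an injectable millisecond clock (defaults to Date.now).
function createStopwatch(nowFn = Date.now) {
  let running = false;
  let startedAt = 0;   // clock value when the current run began
  let accumulated = 0; // ms elapsed across previous runs
  const laps = [];

  const elapsed = () => accumulated + (running ? nowFn() - startedAt : 0);

  return {
    start() { if (!running) { running = true; startedAt = nowFn(); } },
    stop()  { if (running)  { accumulated += nowFn() - startedAt; running = false; } },
    lap()   { laps.push(elapsed()); return laps[laps.length - 1]; },
    elapsed,
    laps,
  };
}

// Browser-only part: pasted into the about:blank console, this builds the
// start/stop/lap buttons and display divs and wires them to the stopwatch.
if (typeof document !== "undefined") {
  const sw = createStopwatch();
  const display = document.createElement("div");
  const lapsDiv = document.createElement("div");
  for (const [label, fn] of [
    ["start", () => sw.start()],
    ["stop",  () => sw.stop()],
    ["lap",   () => {
      const d = document.createElement("div");
      d.textContent = `lap: ${(sw.lap() / 1000).toFixed(2)}s`;
      lapsDiv.appendChild(d);
    }],
  ]) {
    const b = document.createElement("button");
    b.textContent = label;
    b.addEventListener("click", fn);
    document.body.appendChild(b);
  }
  document.body.appendChild(display);
  document.body.appendChild(lapsDiv);
  // refresh the elapsed-time display ~10x per second
  setInterval(() => {
    display.textContent = `${(sw.elapsed() / 1000).toFixed(1)}s`;
  }, 100);
}
```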

throwaw12 2 hours ago

> The math and coding part is impressive but the agentic one is not.

I think this is very important for it to eventually become a viable replacement for coding models, because most of the time coding harnesses leverage tool calls to gather the context and then write a solution.

I am hopeful that one day we can replace Claude and OpenAI models with local SOTA LLMs.

Havoc an hour ago

0.76B active params and vaguely competitive at coding sounds promising.

LM Studio doesn't let me actually run this yet though: "Unsupported safetensors format: null"

2ndorderthought 2 hours ago

I've been saying it for a long time now: I think small models are the future for LLMs. It's been fun watching experiments push models to insane sizes just to see how much better they get, but it's not sustainable.

No, I am not saying this model is a drop-in Claude replacement. But I think in two years we might be really surprised by what can be done on a desktop with commodity hardware, no internet connection, and a few models that each cover a subset of tasks.

Really happy to see AMD put their hat in the ring. It's a good day for AMD investors. I know a lot of AI bros will scoff at this, but completing your first training run is a big deal for a new lab. AMD is on their way, despite Nvidia having years of runway.

immanuwell 25 minutes ago

[flagged]