
Q8 KV cache lets a 30B model fit 100K context on a 24 GB RTX 5090

Posted by bozdemir | 4 hours ago | 0 comments
