Q8 KV cache lets a 30B model fit 100K context on a 24 GB RTX 5090
Posted by bozdemir | 4 hours ago | 0 comments