Show HN: Quicktok, an exact BPE tokenizer 7x faster than tiktoken

Posted by dmatth1 | 3 hours ago | 1 comment

dmatth1 3 hours ago

quicktok runs the same algorithm as bpe-openai (exact backtracking BPE) but adds a number of data-structure optimizations that cut memory accesses, which is where the speedup comes from (~7x over tiktoken). Output is byte-identical to tiktoken's, so it can serve as a drop-in replacement for anyone doing heavy corpus ingestion, search indexing, etc.

Happy to answer any questions. If you find an input where quicktok's ids differ from tiktoken's, that's a bug! Please report it.
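
If you want to hunt for mismatches, here is a rough differential-test sketch. The post doesn't show quicktok's Python API, so the harness just takes two encode callables; `first_mismatch`, `random_inputs`, and the sample alphabet are hypothetical names and choices for illustration, not part of either library.

```python
import random
from typing import Callable, Iterable, List, Optional

# Any function mapping text to a list of token ids.
Encoder = Callable[[str], List[int]]


def first_mismatch(
    reference: Encoder, candidate: Encoder, inputs: Iterable[str]
) -> Optional[str]:
    """Return the first input whose token ids differ, or None if all agree."""
    for text in inputs:
        if list(reference(text)) != list(candidate(text)):
            return text
    return None


def random_inputs(n: int, max_len: int = 64, seed: int = 0) -> List[str]:
    """Generate n reproducible strings mixing ASCII, whitespace, and non-ASCII.

    The alphabet is an arbitrary choice for this sketch; real corpus
    samples are likely to be a better source of test inputs.
    """
    rng = random.Random(seed)
    alphabet = "abc XYZ019\n\t.,!?'\"-_é漢😀"
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
        for _ in range(n)
    ]
```

For the reference side you could pass `tiktoken.get_encoding("cl100k_base").encode`; for the candidate, pass whatever encode function quicktok exposes. A non-`None` result from `first_mismatch` is an input worth attaching to a bug report.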