Animats 2 hours ago
I was expecting a secondary market in tokens, perhaps crypto-powered, but no.
The cost difference for languages roughly correlates with how much text it takes to say something in that language. English is relatively terse. (This is a common annoyance when internationalizing dialog boxes. If sized for English, boxes need to be expanded.) They don't list any of the ideographic languages, which would be interesting.
aprentic 2 hours ago
The broader questions are still interesting.
If an AI is trained more on language A than language B but has some training in translating B to A, what is the overhead of that translation?
If the abilities are combined in the same model, how much lower is the overhead than doing it as separate operations?
i.e., is f(a) < f(b) < f(t(B,A))? Here a and b are prompts in A and B, f() is the cost of processing a prompt, and t() is the cost of translating a prompt.
Then there's the additional question of what happens with character-based languages. It's not obvious how it would make sense to assign multiple tokens to a single character, but there's the question of how much information is in character-based vs. phonetic-based words, and what the information content of sentences in either one is.
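To make the comparison above concrete, here is a rough sketch under a deliberately simple assumption: cost is linear in token count, and translating B to A means paying to read the B prompt and to write the A prompt. The function names, token counts, and price are all hypothetical and not taken from any real tokenizer or price list.

```python
# Toy cost model: every token costs the same, whether read or written.
# All numbers below are made up for illustration.

def prompt_cost(tokens: int, price_per_token: int) -> int:
    """f(): cost of processing a prompt of the given token length."""
    return tokens * price_per_token


def translate_then_process_cost(tokens_b: int, tokens_a: int, price_per_token: int) -> int:
    """Cost of translating a B prompt into A, then processing the A prompt."""
    # Translation step: read the B prompt, write the A prompt.
    translation = prompt_cost(tokens_b, price_per_token) + prompt_cost(tokens_a, price_per_token)
    # Processing step: handle the translated prompt as a normal A prompt.
    processing = prompt_cost(tokens_a, price_per_token)
    return translation + processing


if __name__ == "__main__":
    price = 1       # hypothetical price units per token
    tokens_a = 100  # hypothetical length of the prompt in language A
    tokens_b = 160  # hypothetical length of the same prompt in language B
    print("f(a)        =", prompt_cost(tokens_a, price))
    print("f(b)        =", prompt_cost(tokens_b, price))
    print("translate+f =", translate_then_process_cost(tokens_b, tokens_a, price))
```

In this toy model, translating as a separate step always costs more than processing the B prompt directly, because the B tokens are paid for either way. Whether a single combined model does better is exactly the open question, and this sketch does not answer it.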
Mindless2112 2 hours ago
simonw 2 hours ago
I also don't like how this article presents numbers for language differences - in the "The Language Tax" section - but fails to clarify which tokenizer was used and where those numbers came from.
cyberge99 36 minutes ago
lxgr 2 hours ago
The product itself seems genuinely useful, but the article reads very sensationalist about something that should be pretty obvious.
In other news: French publishers are paying 30% more for paper than English publishers!!
simianwords 2 hours ago
charcircuit 2 hours ago
simianwords 2 hours ago
AI commits a racism.
AI commits an environmentalism.
Now use my product (that won't solve either)
vfalbor 2 hours ago
You pay for what you use. That's the deal. Except it's not.
When you use an AI model — GPT-4, Claude, Gemini — you do not pay per word. You pay per token. And that tiny technical detail is quietly costing you, depending on which company you choose, up to 60% more for the exact same request.
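As a rough illustration of what per-token pricing means for the same request, the sketch below computes how much more one tokenization costs than another at the same price per token. The function name and the token counts are hypothetical. They are not measurements from GPT-4, Claude, Gemini, or any other real tokenizer.

```python
# Relative extra cost of one tokenization over another, assuming both
# are billed at the same price per token. Token counts are made up.

def relative_overhead_percent(baseline_tokens: int, other_tokens: int) -> float:
    """Percent more tokens (and therefore cost) than the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100 * (other_tokens - baseline_tokens) / baseline_tokens


if __name__ == "__main__":
    # Hypothetical: the same request is 1000 tokens under one tokenizer
    # and 1600 tokens under another.
    print(relative_overhead_percent(1000, 1600))
```

A real comparison would also need each provider's actual per-token price, since a tokenizer that produces more tokens can still be cheaper overall if its tokens cost less.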