
Show HN: Anitag2vec – tag embeddings for recommendations

Posted by michael-0acf4 | 2 hours ago | 1 comment

michael-0acf4 2 hours ago

This is similar in spirit to Deep Sets (learning embeddings over unordered collections), but instead of a permutation-invariant MLP it uses a small Transformer encoder without positional encoding. In practice this helps capture things like spelling variation and co-occurrence structure between the tags, since tag sets are usually much better behaved than generic sets.
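The key property here is that self-attention without positional encodings is order-agnostic: permuting the input tokens permutes the per-token outputs, so any symmetric pooling (e.g. a mean) gives the same set embedding. A minimal NumPy sketch of that idea (a single attention head plus mean pooling, not the actual anitag2vec architecture):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Single attention head with no positional encoding:
    # outputs depend only on the multiset of input rows.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V

def set_embedding(X, Wq, Wk, Wv):
    # Mean-pooling the token outputs yields a permutation-invariant vector.
    return self_attention(X, Wq, Wk, Wv).mean(axis=0)

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                        # 5 "tag" vectors
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(5)

e1 = set_embedding(X, Wq, Wk, Wv)
e2 = set_embedding(X[perm], Wq, Wk, Wv)            # shuffled tag order
assert np.allclose(e1, e2)                         # same set embedding
```

Adding positional encodings would break this invariance, which is exactly why they are omitted here.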

The approach itself is very general; however, the available models were trained on imageboard- and anime-focused tags. I ran some experiments with these clearly biased models on completely nonsensical tags, and they still performed fairly well: cosine scores decrease only slightly when one tag set is a subset of the other, suggesting some latent understanding of set structure beyond the training vocabulary.
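For comparisons like the subset experiment above, the metric is just cosine similarity between the two set embeddings. A small sketch with a hypothetical stand-in embedder (`toy_embed` is an illustrative bag-of-tags average, not the anitag2vec model):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def toy_embed(tags, dim=16):
    # Hypothetical stand-in: average of per-tag random vectors,
    # deterministic within a run so shared tags map to shared vectors.
    vecs = [np.random.default_rng(abs(hash(t)) % 2**32).normal(size=dim)
            for t in tags]
    return np.mean(vecs, axis=0)

full = toy_embed(["blue_hair", "school_uniform", "smile", "outdoors"])
subset = toy_embed(["blue_hair", "school_uniform"])
score = cosine(full, subset)          # how close is a subset to its superset?
assert -1.0 <= score <= 1.0
```

With a learned model like anitag2vec, the interesting question is how much this score drops as the subset shrinks relative to the full tag set.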

More links:

- https://huggingface.co/michael-0acf4/anitag2vec
- https://blog.afmichael.dev/posts/2026/set-embeddings-and-ani...