WordPiece Tokenization

August 23, 2025

WordPiece is a subword tokenization algorithm used by BERT-like models to balance vocabulary size against coverage: common words stay as single tokens, while rare words are split into smaller learned pieces (continuation pieces are conventionally marked with a `##` prefix). At inference time, each word is segmented by greedy longest-match-first lookup against the learned vocabulary; a word with no matching piece falls back to an unknown token.
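
Below is a minimal sketch of that greedy longest-match-first inference step, assuming a toy hand-written vocabulary and a hypothetical `wordpiece_tokenize` helper. Real vocabularies are learned from a corpus (BERT's has roughly 30k entries), and production implementations live in libraries such as Hugging Face `tokenizers`.

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Split one word into subword pieces via greedy longest-match-first lookup."""
    tokens: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the window from the right until a vocabulary entry matches.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word maps to the unknown token
        tokens.append(piece)
        start = end
    return tokens

# Toy vocabulary: "tokenization" is rare, so it splits into learned pieces.
vocab = {"token", "##ization", "play", "##ing", "the"}
print(wordpiece_tokenize("tokenization", vocab))  # ['token', '##ization']
print(wordpiece_tokenize("playing", vocab))       # ['play', '##ing']
print(wordpiece_tokenize("xyz", vocab))           # ['[UNK]']
```

Learning the vocabulary itself is a separate training step: WordPiece greedily adds merges that most increase the likelihood of the training corpus, which is what distinguishes it from BPE's purely frequency-based merges.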