Mixture of Experts (MoE)

August 22, 2025

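An architecture in which a learned router sends each token to a small subset of expert MLPs, typically the top-k experts by router score. Because only the selected experts run for a given token, the model's total parameter count (its capacity) can grow without a proportional increase in per-token compute.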
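The routing step can be illustrated with a minimal sketch in pure Python. Everything here is an assumption made for illustration, not a reference implementation: the names `MoELayer`, `Expert`, and `route`, the two-layer ReLU expert, the random weight initialization, and the choice to apply softmax over only the selected experts' logits (some designs instead apply softmax over all experts before selecting). It processes a single token vector and omits batching, load balancing, and training.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class Expert:
    """Hypothetical expert: a two-layer MLP, d_model -> d_hidden -> d_model."""

    def __init__(self, d_model, d_hidden, rng):
        self.w_in = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                     for _ in range(d_hidden)]
        self.w_out = [[rng.gauss(0, 0.1) for _ in range(d_hidden)]
                      for _ in range(d_model)]

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.w_in, x)]  # ReLU
        return matvec(self.w_out, hidden)


class MoELayer:
    """Hypothetical MoE layer with top-k routing for one token at a time."""

    def __init__(self, d_model, d_hidden, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one row of weights per expert, producing one logit each.
        self.router = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                       for _ in range(num_experts)]
        self.experts = [Expert(d_model, d_hidden, rng)
                        for _ in range(num_experts)]

    def route(self, x):
        """Return the indices of the top-k experts and their mixing weights."""
        logits = matvec(self.router, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[:self.top_k]
        # Assumption: normalize over the selected experts only.
        weights = softmax([logits[i] for i in chosen])
        return chosen, weights

    def __call__(self, x):
        chosen, weights = self.route(x)
        out = [0.0] * len(x)
        # Only the chosen experts are evaluated; the rest cost nothing.
        for idx, w in zip(chosen, weights):
            y = self.experts[idx](x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    layer = MoELayer(d_model=4, d_hidden=8, num_experts=4, top_k=2)
    token = [1.0, -0.5, 0.25, 2.0]
    experts_used, mix = layer.route(token)
    print("experts used:", experts_used)
    print("mixing weights:", mix)
    print("output:", layer(token))
```

With `num_experts=4` and `top_k=2`, the layer stores four experts' worth of parameters but evaluates only two per token, which is the capacity-versus-compute trade the definition describes.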