MoE
Full Form: Mixture of Experts
Category: AI Architecture
📖 Definition
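MoE architectures replace a single large subnetwork with several 'expert' subnetworks plus a router that decides, for each input (typically each token), which few experts to run. Because only the selected experts are computed, the model can hold far more parameters than it uses on any single input, which gives more capacity for roughly the same compute per input.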
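The routing step can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation of any particular model: the names `top_k_route`, `moe_forward`, `make_scaling_expert`, and `ROUTER_WEIGHTS` are hypothetical, the experts are toy functions that just scale their input, and taking a softmax over only the top-k router scores is one common choice among several.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a short list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Assumption: weights are normalized over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    routes = top_k_route(router(x), k)
    out = [0.0] * len(x)
    for idx, w in routes:
        y = experts[idx](x)  # experts that were not selected are never called
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]

def make_scaling_expert(scale):
    # Toy stand-in for an expert network: multiplies every value by `scale`.
    return lambda x: [scale * v for v in x]

experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical fixed router: one weight row per expert.
ROUTER_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

def router(x):
    return [sum(w * v for w, v in zip(row, x)) for row in ROUTER_WEIGHTS]

output, chosen = moe_forward([2.0, 1.0], experts, router, k=2)
print(chosen)  # indices of the two experts that actually ran
```

In a real model the router and experts are learned neural network layers, and the selection happens independently for every token in every MoE layer.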
🔑 Key Points
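- Experts tend to specialize in different kinds of inputs; the specialization is learned during training, not assigned by hand
- Only the few experts the router selects are activated for each input
- Total parameter count can grow without a matching increase in compute per input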
- Used in Mixtral, DeepSeek-V2, and others
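The gap between total and active parameters can be made concrete with a little arithmetic. The sketch below uses made-up numbers chosen only for illustration (they do not describe Mixtral, DeepSeek-V2, or any real model), and the helper `moe_param_counts` is hypothetical. It assumes every input uses all shared parameters (attention, embeddings) plus a fixed number of experts.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts, active_experts):
    """Return (total, active) parameter counts for a simplified MoE model.

    shared_params: parameters every input uses (assumed, e.g. attention layers)
    params_per_expert: parameters in one expert, summed over all MoE layers
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + active_experts * params_per_expert
    return total, active

# Hypothetical configuration: 8 experts, 2 active per input.
total, active = moe_param_counts(
    shared_params=2_000_000_000,
    params_per_expert=5_000_000_000,
    num_experts=8,
    active_experts=2,
)
print(f"total:  {total / 1e9:.0f}B parameters")
print(f"active: {active / 1e9:.0f}B parameters per input")
print(f"fraction active: {active / total:.0%}")
```

With these illustrative numbers the model stores 42B parameters but uses only 12B of them for any one input, which is the sense in which an MoE model is "larger" than its per-input compute suggests. All 42B still have to be kept in memory.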
💡 Why It Matters
MoE lets model capacity scale up without a proportional rise in the compute spent on each input, so larger, more capable models become cheaper to train and run. It is becoming common in frontier AI models.