MoE

Full Form: Mixture of Experts

Category: AI Architecture

📖 Definition

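MoE architectures replace a single dense subnetwork (typically the feed-forward block in each Transformer layer) with several 'expert' subnetworks, plus a learned router (gating network) that picks a small subset of experts for each input token. Because only the selected experts run, the model can hold far more parameters than it uses for any one token, which increases capacity without a proportional increase in compute per token.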
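A minimal sketch of the routing step, in pure Python, assuming top-k gating with a softmax over the selected experts' scores (one common scheme, not the only one). The names `moe_layer` and `make_expert`, the tiny dimensions, and the random linear-plus-ReLU experts are all hypothetical. Real MoE layers are batched, trained end to end, and usually add a load-balancing loss, none of which is shown here.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def moe_layer(x, router_W, experts, k=2):
    """Sparse MoE forward pass for one token vector x (hypothetical helper).

    router_W: one weight row per expert, so router scores are router_W @ x.
    experts:  list of callables mapping a d-vector to a d-vector.
    Only the k highest-scoring experts are evaluated.
    """
    logits = matvec(router_W, x)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]  # weighted sum of expert outputs
    return out, top

# Usage: 8 hypothetical experts, 2 active per token.
random.seed(0)
d, num_experts = 4, 8
router_W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_experts)]
calls = [0] * num_experts  # counts how often each expert actually runs

def make_expert(idx):
    W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    def expert(x):
        calls[idx] += 1
        return [max(0.0, v) for v in matvec(W, x)]  # linear map + ReLU
    return expert

experts = [make_expert(i) for i in range(num_experts)]
x = [0.5, -1.0, 0.25, 2.0]
y, chosen = moe_layer(x, router_W, experts, k=2)
print("chosen experts:", chosen, "expert calls:", sum(calls))
```

Only the two chosen experts execute; the other six hold parameters but cost no compute for this token, which is the source of MoE's efficiency.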

🔑 Key Points

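Experts tend to specialize in different kinds of inputs; this specialization is learned during training, not assigned by hand
The router activates only the top-scoring experts for each token (e.g. Mixtral 8x7B sends each token to 2 of its 8 experts in every layer)
Total parameter count can grow while per-token compute scales only with the active experts, though all experts must still be held in memory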
Used in Mixtral, DeepSeek-V2, and others
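To make the "larger model, less compute" trade-off concrete, here is a rough parameter count for the feed-forward part of a hypothetical MoE model. The dimensions are illustrative, not taken from Mixtral or DeepSeek-V2, and the sketch assumes a plain two-matrix FFN per expert, no biases, and ignores attention and embeddings (real models such as Mixtral use gated FFNs and have substantial shared parameters).

```python
def moe_ffn_params(d_model, d_ff, num_layers, num_experts, k):
    """Return (total, active) FFN parameter counts for a hypothetical MoE stack."""
    per_expert = 2 * d_model * d_ff       # up-projection + down-projection
    router = num_experts * d_model        # router weights, always used
    total = num_layers * (num_experts * per_expert + router)
    active = num_layers * (k * per_expert + router)
    return total, active

# Hypothetical config: 24 layers, 8 experts per layer, top-2 routing.
total, active = moe_ffn_params(d_model=1024, d_ff=4096, num_layers=24, num_experts=8, k=2)
print(f"total: {total:,}  active per token: {active:,}  ratio: {total / active:.2f}")
```

The ratio approaches `num_experts / k` (here 4) because the small router is the only part counted in both totals: memory grows with every expert, while per-token compute grows only with the `k` that are active.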

💡 Why It Matters

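MoE makes it possible to train and serve models with much larger total parameter counts at roughly the per-token compute cost of a smaller dense model. It has become a common design choice in frontier AI models.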
