Quantization
Category: AI Techniques
📖 Definition
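Quantization reduces the size of an AI model by storing its numbers (mainly the weights) with fewer bits, for example as 8-bit or 4-bit integers instead of 16-bit floating point. It trades some accuracy for a much smaller model that needs less memory and often runs faster.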
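To make the idea concrete, here is a minimal sketch of one simple scheme, symmetric 8-bit quantization with a single shared scale. The function names and sample weights are made up for illustration, and real methods are more elaborate (per-group scales, calibration data, and so on).

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Assumption: a single scale for the whole list (per-tensor scaling).
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [x * scale for x in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per value is at most half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each value is stored as a small integer plus one shared scale, and the gap between `weights` and `restored` is the accuracy that gets traded away.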
🔑 Key Points
- Reduces model size by lowering numeric precision (e.g., from 16-bit to 8-bit or 4-bit)
- Enables running large models on consumer hardware
- Quality loss is often minimal with modern techniques
- GPTQ and AWQ are popular methods; GGML (and its successor GGUF) is a popular format for running quantized models
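The size reduction can be estimated with simple arithmetic. The sketch below assumes a hypothetical 7-billion-parameter model and counts only the weights; it ignores the small overhead of stored scales as well as memory used at run time, so treat the numbers as rough lower bounds.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: a 7-billion-parameter model, used only as an example size.
params = 7_000_000_000
sizes = {bits: weight_memory_gb(params, bits) for bits in (16, 8, 4)}

for bits, gb in sizes.items():
    print(f"{bits}-bit: about {gb} GB")
```

Under these assumptions the weights shrink from about 14 GB at 16-bit to 7 GB at 8-bit and 3.5 GB at 4-bit, which is what brings such a model within reach of consumer hardware.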
💡 Why It Matters
Quantization makes powerful AI accessible to more people. It enables running large models on regular computers and even phones.