Quantization

Category: AI Techniques

📖 Definition

Quantization reduces AI model size by using fewer bits to represent the model's numbers: its weights, and sometimes its activations. It trades some accuracy for much smaller models that run faster and need less memory.
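To make the definition concrete, here is a minimal sketch of one simple scheme, symmetric "absmax" quantization, in plain Python. It is an illustration only, not how GPTQ or AWQ work internally; the function names and the sample weights are made up for this example.

```python
def quantize_absmax(values, bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.05, 0.88, -0.31]

q8, s8 = quantize_absmax(weights, bits=8)
restored8 = dequantize(q8, s8)
error8 = max(abs(a - b) for a, b in zip(weights, restored8))

q4, s4 = quantize_absmax(weights, bits=4)
restored4 = dequantize(q4, s4)
error4 = max(abs(a - b) for a, b in zip(weights, restored4))

print("8-bit integers:", q8, "max error:", error8)
print("4-bit integers:", q4, "max error:", error4)
```

Each value is stored as a small integer plus one shared floating-point scale. With fewer bits the integer grid is coarser, so the rounding error grows; this is the accuracy cost the definition refers to.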

🔑 Key Points

Reduces model size by lowering precision (e.g., from 16-bit to 8-bit or 4-bit weights)
Enables running large models on consumer hardware
Quality loss is often minimal with modern techniques
GPTQ and AWQ are popular methods; GGML (and its successor format, GGUF) is a popular library and file format for running quantized models
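The size reduction in the points above is simple arithmetic. The sketch below estimates weight storage at different bit widths, assuming a hypothetical model with 7 billion parameters. It counts only the weights and ignores the extra space real formats need for scales, as well as memory used during inference.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB (weights only, no metadata)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


NUM_PARAMS = 7_000_000_000  # hypothetical model size, for illustration only

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 13 GiB at 16 bits, 6.5 GiB at 8 bits, and 3.3 GiB at 4 bits, which is why 4-bit versions of models can fit on consumer hardware when the 16-bit originals cannot.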

💡 Why It Matters

Quantization makes powerful AI accessible to more people. By cutting memory and compute requirements, it lets models that would otherwise need datacenter hardware run on ordinary computers and even phones.
