What Is Quantization
Reducing numerical precision to speed up computation
Quantization is a neural-network optimization technique in which a model's weights and activations are converted from a high-precision format (FP32) to a lower-precision one (INT8, INT4), reducing model size and speeding up inference.
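The conversion described above can be sketched as an affine (scale plus zero point) mapping from floats to 8-bit integers. This is a minimal illustration in plain Python, not the scheme of any particular library: the function names, the signed range [-128, 127], and the sample weights are assumptions made for the example.

```python
def quantize_params(x_min, x_max, qmin=-128, qmax=127):
    """Derive scale and zero point for an asymmetric (affine) INT8 mapping."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard against an all-zero range
    zero_point = round(qmin - x_min / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers: q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate floats: x ~ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


# Hypothetical FP32 weights, used only for illustration.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zp = quantize_params(min(weights), max(weights))
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
```

Each restored value differs from the original by at most about one quantization step (`scale`); that rounding error is the precision given up in exchange for the smaller integer representation.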
Types of Quantization
- Post-Training Quantization (PTQ): applied after the model has been trained
- Quantization-Aware Training (QAT): quantization is simulated during training
- Dynamic quantization: activation ranges are computed on the fly during inference
- Static quantization: activation ranges are fixed in advance using calibration data
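The difference between static and dynamic quantization comes down to when the activation scale is chosen. The sketch below assumes symmetric INT8 quantization and uses made-up calibration and runtime values; the helper names are hypothetical and do not correspond to any framework API.

```python
def symmetric_scale(values, qmax=127):
    """Scale for symmetric INT8 quantization from the largest magnitude seen."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, qmax=127):
    """Round to the nearest integer step and clamp to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


# Static: the scale is fixed ahead of time from calibration batches (hypothetical data).
calibration_batches = [[0.1, -0.8, 0.4], [1.2, -0.3, 0.9]]
static_scale = symmetric_scale([v for batch in calibration_batches for v in batch])

# Dynamic: the scale is recomputed from each input at inference time.
runtime_input = [2.4, -0.6, 0.3]
dynamic_scale = symmetric_scale(runtime_input)

# With the static scale, 2.4 lies outside the calibrated range and is clipped.
q_static = quantize(runtime_input, static_scale)
# With the dynamic scale, the range adapts to this particular input.
q_dynamic = quantize(runtime_input, dynamic_scale)
```

Static quantization avoids the per-input range computation but clips values outside the calibrated range, so it depends on representative calibration data. Dynamic quantization adapts to each input at the cost of extra work during inference.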
Precision Formats
- FP32: 32-bit floating point (baseline)
- FP16: 16-bit floating point (half precision)
- INT8: 8-bit integer (4x compression relative to FP32)
- INT4: 4-bit integer (8x compression relative to FP32)
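The compression factors above follow directly from the number of bits stored per value. A short sketch of the arithmetic, assuming a hypothetical model with 7 billion parameters and ignoring the small overhead of storing scales and zero points:

```python
# Bits used to store one value in each format.
BITS = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}


def model_size_mb(num_params, fmt):
    """Approximate weight storage in megabytes (10^6 bytes)."""
    return num_params * BITS[fmt] / 8 / 1e6


def compression_ratio(fmt, baseline="FP32"):
    """How many times smaller a format is than the baseline."""
    return BITS[baseline] / BITS[fmt]


num_params = 7_000_000_000  # hypothetical parameter count
for fmt in BITS:
    size = model_size_mb(num_params, fmt)
    print(f"{fmt}: {size:,.0f} MB, {compression_ratio(fmt):.0f}x vs FP32")
```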
Benefits
- Model size reduced 2-8x, depending on the target format
- Inference sped up 2-4x
- Lower energy consumption
- Ability to run on edge devices
Tools
- TensorRT (NVIDIA)
- ONNX Runtime
- PyTorch quantization
- TensorFlow Lite