Mobilint’s NPU operates primarily at INT8 precision. Accordingly, our qb compiler quantizes pre-trained deep learning models to INT8 by default for the best performance and efficiency.
However, to provide flexibility for various model requirements, we offer additional precision options:
- INT4 Quantization: Available for layers or operations that are robust to quantization, further reducing latency and memory footprint.
- INT16 Quantization: Supported for specific operations that are highly sensitive to quantization error, helping preserve model accuracy.
By supporting precisions from INT4 to INT16, our SDK lets you trade off performance against accuracy for your specific use case.
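
To make the trade-off concrete, the sketch below applies generic symmetric per-tensor quantization to a small list of values at 4, 8, and 16 bits and compares the worst-case rounding error. It is an illustration only: it does not use the qb compiler or reflect its actual quantization scheme, and the names `quantize_symmetric`, `dequantize`, and `max_abs_error`, as well as the sample `weights`, are hypothetical.

```python
# Illustrative only: generic symmetric quantization, NOT the qb compiler's API or algorithm.

def quantize_symmetric(values, bits):
    """Map floats onto a signed integer grid with `bits` bits (symmetric, per-tensor)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8, 32767 for INT16
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer grid."""
    return [x * scale for x in q]


def max_abs_error(values, bits):
    """Largest absolute difference between the original and dequantized values."""
    q, scale = quantize_symmetric(values, bits)
    restored = dequantize(q, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical sample values standing in for a layer's weights.
weights = [-1.0, -0.42, -0.07, 0.0, 0.013, 0.31, 0.77, 0.95]

errors = {bits: max_abs_error(weights, bits) for bits in (4, 8, 16)}
for bits, err in errors.items():
    print(f"INT{bits}: max abs error = {err:.6f}")
```

With this scheme the rounding error is bounded by half the step size, so each additional bit roughly halves it. That is why narrower formats suit layers that tolerate coarse rounding, while wider formats suit operations where small errors would noticeably degrade accuracy.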