What quantization precisions are supported by Mobilint's NPU?

Mobilint’s NPU operates primarily at INT8 precision. Accordingly, our qb compiler quantizes pre-trained deep learning models to INT8 by default for optimal performance and efficiency.

However, to provide flexibility for various model requirements, we offer additional precision options:

  • INT4 Quantization: Available for layers or operations that are robust to quantization, further reducing latency and memory footprint.
  • INT16 Quantization: Supported for specific operations that are highly sensitive to quantization error, to help preserve model accuracy.

By supporting INT4, INT8, and INT16, our SDK lets you trade off speed and memory footprint against accuracy for your specific use case.
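
To give a rough sense of why bit width matters, the sketch below quantizes the same set of values at 4, 8, and 16 bits and compares the round-trip error. It is a generic illustration only, not the qb compiler's API: the helper names `quantize_symmetric` and `rms_error` are hypothetical, the weights are random sample data, and symmetric per-tensor quantization is an assumed scheme that may differ from what the compiler actually applies.

```python
import math
import random


def quantize_symmetric(values, bits):
    """Quantize floats to signed `bits`-bit integers and dequantize them again.

    Assumption: symmetric per-tensor scaling (one scale for all values).
    Returns (dequantized_values, integer_codes).
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8, 32767 for INT16
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [c * scale for c in codes], codes


def rms_error(original, restored):
    """Root-mean-square difference between two equally long sequences."""
    total = sum((x - y) ** 2 for x, y in zip(original, restored))
    return math.sqrt(total / len(original))


random.seed(0)
# Hypothetical stand-in for a layer's weights.
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

errors = {}
for bits in (4, 8, 16):
    restored, _ = quantize_symmetric(weights, bits)
    errors[bits] = rms_error(weights, restored)
    print(f"INT{bits}: RMS round-trip error = {errors[bits]:.6f}")
```

The error shrinks as the bit width grows, which is why a quantization-sensitive operation may be better served by INT16, while a robust layer can tolerate INT4 in exchange for a smaller footprint.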