Quantization

12th October 2025

Enhancing GPTQv2 Format Support in vLLM: Analysis and Implementation

Deep technical analysis of GPTQv2 format limitations in vLLM, and implementation of CUDA kernel adaptations to enable efficient low-bit/asymmetric quantization inference.

← All tags