Latest posts

  • 12th October 2025

Enhancing GPTQv2 Format Support in vLLM: Analysis and Implementation

Deep technical analysis of GPTQv2 format limitations in vLLM, and implementation of CUDA kernel adaptations to enable efficient low-bit/asymmetric quantization inference.

  • 16th September 2025

Vision-Language-Action (VLA) Models: A Review of Recent Progress

Recent VLAs have evolved from discrete to continuous, and from single-system (System 1 only) to dual-system designs.

  • 2nd August 2025
  • 9th January 2025