Posts
Enhancing GPTQv2 Format Support in vLLM: Analysis and Implementation
Deep technical analysis of GPTQv2 format limitations in vLLM, and implementation of CUDA kernel adaptations to enable efficient low-bit/asymmetric quantization inference.
Vision-Language-Action (VLA) Models: A Review of Recent Progress
Recent VLAs have evolved from discrete to continuous, and from single-system (System 1 only) to dual-system designs.
Cheatsheet for Setting up Android Smartphones
Quickly setting up Android smartphones for development.
Cheatsheet for Setting up Termux on Android Smartphones
Quickly setting up Termux on Android smartphones for development.