Xiangyu Li

Enhancing GPTQv2 Format Support in vLLM: Analysis and Implementation

October 12, 2025

Deep technical analysis of GPTQv2 format limitations in vLLM, and implementation of CUDA kernel adaptations to enable efficient low-bit/asymmetric quantization inference.

Vision-Language-Action (VLA) Models: A Review of Recent Progress

September 16, 2025

Recent VLAs have evolved from discrete to continuous, and from single-system (System 1 only) to dual-system designs.

Reading Notes of Dario Amodei's Blog

August 2, 2025

Reading notes on Dario Amodei's blog.

Cheatsheet for Setting up Android Smartphones

January 9, 2025

Quickly setting up Android smartphones for development.

Cheatsheet for Setting up Termux on Android Smartphones

January 9, 2025

Quickly setting up Termux on Android smartphones for development.

Cheatsheet for Setting up Pi Devices

January 3, 2025

Quickly setting up new single-board computers like Raspberry Pi.

"GPT in Your Pocket": How Far Away Is It?

November 21, 2023

A casual chat about deploying large models on edge devices.