Blog

Enhancing GPTQv2 Format Support in vLLM: Analysis and Implementation

Deep technical analysis of GPTQv2 format limitations in vLLM, and implementation of CUDA kernel adaptations to enable efficient low-bit/asymmetric quantization inference.

Vision-Language-Action (VLA) Models: A Review of Recent Progress

Recent VLAs have evolved from discrete to continuous, and from single-system (System 1 only) to dual-system designs.

Reading Notes of Dario Amodei's Blog

Reading Notes of Dario Amodei's Blog.

Cheatsheet for Setting up Android Smartphones

Quickly setting up Android smartphones for development.

Cheatsheet for Setting up Termux on Android Smartphones

Quickly setting up Termux on Android smartphones for development.

Cheatsheet for Setting up Pi Devices

Quickly setting up new single-board computers like Raspberry Pi.

"GPT in Your Pocket": How Far Away Is It?

A casual chat about deploying large models on-device.