News
- [Mar. 2026] "OxyGen: Unified KV Cache Management for Vision-Language-Action Models under Multi-Task Parallelism" released. Paper / Code
- [Mar. 2026] "Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices" accepted to MobiSys 2026. Paper / Code / Model
- [Aug. 2025] "An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint" accepted to EMNLP 2025. Paper / Page
Selected Publications
Latest Posts
Enhancing GPTQv2 Format Support in vLLM: Analysis and Implementation
A deep technical analysis of the GPTQv2 format limitations in vLLM, and the CUDA kernel adaptations implemented to enable efficient low-bit, asymmetric quantization inference.
Vision-Language-Action (VLA) Models: A Review of Recent Progress
Recent VLA models are evolving from discrete to continuous action generation, and from single-system (System 1 only) to dual-system architectures.
Reading Notes of Dario Amodei's Blog