News
- [Mar. 2026] "OxyGen: Unified KV Cache Management for Vision-Language-Action Models under Multi-Task Parallelism" released. Paper / Code
- [Mar. 2026] "Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices" accepted to MobiSys 2026. Paper / Code / Model
- [Aug. 2025] "An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint" accepted to EMNLP 2025. Paper / Page
Selected Publications
Latest Posts
Enhancing GPTQv2 Format Support in vLLM: Analysis and Implementation
A deep technical analysis of the GPTQv2 format limitations in vLLM, and the CUDA kernel adaptations implemented to enable efficient low-bit, asymmetric quantization inference.
Vision-Language-Action (VLA) Models: A Review of Recent Progress
Recent VLA models are evolving from discrete to continuous action generation, and from single-system (System 1 only) to dual-system architectures.
Reading Notes of Dario Amodei's Blog