Papers
Building Efficient Inference Systems for Resource-Constrained Edge AI Deployment
ACM MobiSys Companion 2026
OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism
ArXiv preprint, 2026
Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices
ACM MobiSys 2026. Featured Paper (On-Device AI), Results Reproduced @AE
Squeezer: Efficient Multi-DNN Inference for Edge Video Analytics via Cross-Model Scheduling
IEEE Transactions on Mobile Computing (TMC), 2025
ChainStream: An LLM-based Framework for Unified Synthetic Sensing
ArXiv preprint, 2024
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
ArXiv preprint, 2024. Survey & Position, “Efficiency” Section Lead