News
- 2026/05 Vec-LUT selected as a featured paper for the On-Device AI session of ACM MobiSys 2026. Paper · Code · Model
- 2026/05 OxyGen updated: released arXiv v2 and added PyTorch support (previously JAX-only) for on-board deployment (e.g., on Jetson AGX Thor). Paper · Code
- 2026/05 EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents released. Paper
Selected Papers
OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism
arXiv preprint, 2026
Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices
ACM MobiSys 2026 · Featured Paper (On-Device AI), Results Reproduced @AE
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
arXiv preprint, 2024 · Survey & Position, “Efficiency” Section Lead
Latest Posts
Enhancing GPTQv2 Format Support in vLLM: Analysis and Implementation
A deep technical analysis of GPTQv2 format limitations in vLLM, and an implementation of CUDA kernel adaptations that enable efficient low-bit and asymmetric quantization inference.
Vision-Language-Action (VLA) Models: A Review of Recent Progress
Recent VLAs have evolved from discrete to continuous, and from single-system (System 1 only) to dual-system designs.
Reading Notes of Dario Amodei's Blog