Tao Luo

I am a CS Ph.D. candidate at the University of Pennsylvania (defending 2026), advised by Profs. Boon Thau Loo and Vincent Liu. I build agent infrastructure spanning RL post-training, LLM serving, and retrieval.

I am seeking full-time industry roles starting in 2026.

At Alibaba, I designed and shipped Partial Overlapping, a runtime scheduling mechanism for asynchronous agentic RL in ROLL that expands agent rollouts onto idle training GPUs, improving rollout throughput by 3.5x (featured in the ROME technical report). I built it entirely with coding agents (Claude Code, Codex) and zero human-written code, the first feature to ship this way in Alibaba’s flagship post-training framework. It now powers agentic RL post-training of multiple production agents at 100B+ parameters on thousands of GPUs. I also designed and built RLix, an orchestration layer for concurrent agentic RL pipelines (2.6x rollout throughput in SWE-agent RL training; GitHub stars from engineers at NVIDIA, Google, xAI, Anthropic, ByteDance, Zhipu AI, and others). My work spans vLLM, Megatron-LM, and Ray.

During my Ph.D. at Penn, I led ParaFlex, a multiplexed heterogeneous LLM serving system that eliminates head-of-line blocking via stage-aligned parallelism. Earlier, during my M.S. at Columbia University, I introduced Privacy Budget Scheduling and developed DPF, the first scheduling algorithm for ML training under differential-privacy constraints. I also contribute to retrieval and data-systems research, spanning vector search and query optimization. My work has appeared at OSDI, SOSP, and SoCC.

Before academia, I spent roughly four years in quantitative investment, developing strategies and building infrastructure. I hold a B.S. in Financial Mathematics from Southern University of Science and Technology.

Selected Projects

Agentic RL Post-Training Infrastructure @Alibaba, DAMO Academy

  • Proposed and shipped Partial Overlapping, a runtime scheduling mechanism for asynchronous agentic RL that expands agent rollouts to idle training GPUs, improving rollout throughput by 3.5x (featured in the ROME technical report).
  • Leveraged coding agents (Claude Code, Codex) extensively to design, implement, and debug Partial Overlapping with zero human-written code, the first high-priority feature in alibaba/ROLL shipped this way; featured in technical blogs (English/Chinese) as a case study in AI-assisted systems engineering.
  • Deployed in production for agentic RL post-training of models with hundreds of billions of parameters on thousands of GPUs, including Qoder IDE (coding agent), iFlow CLI (terminal agent), Amap (travel-planning agent), and Alimama (ads).
  • Extended Partial Overlapping to async multi-LoRA fine-tuning via per-adapter optimizers on a shared Megatron base model.
  • Designed and built RLix, an orchestration layer for concurrent agentic RL pipelines that enables elastic GPU sharing and higher cluster utilization with minimal changes to training recipes (2.6x rollout throughput in SWE-agent RL training; GitHub stars from engineers at NVIDIA, Google, xAI, Anthropic, ByteDance, Zhipu AI, and others).
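The core idea behind Partial Overlapping can be sketched in a few lines: while training is blocked waiting on rollout data, its GPUs are lent to the rollout pool and reclaimed when a step is ready. This toy is my own illustration; `GpuPool` and its methods are hypothetical names, not ROLL's actual API.

```python
# Toy sketch of partial overlapping in async RL (illustrative names only,
# not the ROLL API): idle training GPUs temporarily join the rollout pool.

class GpuPool:
    def __init__(self, rollout_gpus, train_gpus):
        self.rollout = set(rollout_gpus)   # GPUs dedicated to rollouts
        self.train = set(train_gpus)       # GPUs dedicated to training
        self.lent = set()                  # training GPUs currently doing rollouts

    def lend_idle_train_gpus(self):
        """Training is blocked on rollout data: expand the rollout pool."""
        self.lent = set(self.train)
        return self.rollout | self.lent

    def reclaim_for_training(self):
        """A training step is ready: shrink rollouts back to their own GPUs."""
        self.lent.clear()
        return self.train

pool = GpuPool(rollout_gpus={0, 1}, train_gpus={2, 3})
assert pool.lend_idle_train_gpus() == {0, 1, 2, 3}  # rollouts span all 4 GPUs
assert pool.reclaim_for_training() == {2, 3}        # training gets its GPUs back
```

The real mechanism must additionally migrate in-flight rollouts and keep weights in sync; this sketch only captures the elastic lend/reclaim cycle.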

Heterogeneous Multi-Model LLM Serving at Scale @University of Pennsylvania

  • Architected a multiplexed serving system with stage-aligned parallelism that eliminates head-of-line blocking, increasing token throughput by 1.6x while reducing median latency.
  • Built multi-model KV cache management, distributed execution in vLLM and Ray, and NCCL concurrency controls.
  • Developed algorithms for efficient model sharding, replication, placement, and scheduling across heterogeneous serving workloads.
  • ParaFlex, SoCC’25 paper
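The head-of-line blocking that motivates this design is easy to demonstrate with a toy queueing calculation. The costs below are made-up numbers for illustration, not ParaFlex's stage-aligned mechanism itself.

```python
# Illustrative toy, not ParaFlex: why heterogeneous models behind one FCFS
# queue suffer head-of-line blocking.

def fcfs_latency(jobs):
    """Average completion time when jobs run serially in arrival order."""
    t, total = 0.0, 0.0
    for cost in jobs:
        t += cost          # each job finishes after everything ahead of it
        total += t
    return total / len(jobs)

# A 10-unit request from a large model ahead of four 1-unit requests:
# every short request waits behind the long one.
print(fcfs_latency([10, 1, 1, 1, 1]))   # → 12.0

# Isolating the long request on its own lane restores short-job latency.
print(fcfs_latency([1, 1, 1, 1]))       # → 2.5
```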

Query Optimization for Declarative Smart Contracts @UPenn

  • Framed efficiency of Datalog-compiled smart contracts as a view selection problem under a non-standard, history-dependent cost model (Ethereum gas).
  • Designed and implemented a selective view materialization algorithm with simplification-based pruning; formally proved algorithm correctness and pruning completeness.
  • Reduced storage gas by ~78% and total gas by >50% over naive compilation, matching expert hand-tuned Solidity on a benchmark of widely deployed contracts.
  • DeSCO paper, FAB’24 (co-located with VLDB).
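The materialization decision at the heart of this work can be caricatured as a cost comparison: store a view (pay write gas) versus recompute it on every read. The formula and numbers below are simplified assumptions for illustration, not DeSCO's actual history-dependent gas model.

```python
# Toy view-selection rule under a gas-style cost model (made-up formula,
# not DeSCO's): materialize only if storage writes beat repeated recomputation.

def should_materialize(write_gas, recompute_gas, n_reads, read_gas):
    materialized = write_gas + n_reads * read_gas   # pay storage once, cheap reads
    on_demand = n_reads * recompute_gas             # recompute on every read
    return materialized < on_demand

# Hot view, expensive recomputation: worth storing.
print(should_materialize(write_gas=100, recompute_gas=50, n_reads=10, read_gas=5))
```

The actual problem is harder because Ethereum storage writes are history-dependent (writing a fresh slot costs more than updating one), which is what makes the cost model non-standard.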

Privacy-Preserving Scheduling for ML Training @Columbia University

  • Designed the first fair-allocation scheduling algorithm for ML training under differential-privacy constraints.
  • Improved job throughput by 2x over FCFS under the same privacy budget, verified in large-scale simulations; proved formal efficiency and fairness guarantees.
  • Privacy Budget Scheduling, OSDI’21
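A DPF-style grant order can be sketched as follows. This is a hypothetical simplification of the OSDI'21 algorithm: jobs demand privacy budget (epsilon) from data blocks and are granted in increasing order of their dominant share, the largest fraction of any block's budget they would consume. The real algorithm additionally handles dynamically arriving blocks and jobs.

```python
# Simplified DPF-style allocation sketch (not the exact OSDI'21 algorithm):
# grant jobs by increasing dominant share of the blocks' privacy budgets.

def dpf_schedule(jobs, block_budget):
    """jobs: {name: {block_id: epsilon_demand}}; block_budget: {block_id: epsilon}."""
    remaining = dict(block_budget)
    # Dominant share = max fraction of any block's budget a job demands.
    order = sorted(jobs, key=lambda j: max(
        eps / block_budget[b] for b, eps in jobs[j].items()))
    granted = []
    for j in order:
        if all(remaining[b] >= eps for b, eps in jobs[j].items()):
            for b, eps in jobs[j].items():
                remaining[b] -= eps      # consume budget from each touched block
            granted.append(j)
    return granted

jobs = {"small": {"day1": 0.1}, "big": {"day1": 0.9, "day2": 0.9}}
# Small-demand job is granted first; the big job is denied once day1 runs low,
# instead of starving everyone else as FCFS would.
print(dpf_schedule(jobs, {"day1": 0.5, "day2": 1.0}))   # → ['small']
```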

Honors & Service

  • Program Committee: ACM Symposium on Cloud Computing 2025
  • Manjushri Fellowship, University of Pennsylvania, 2021
  • China Merchants Bank Scholarship, 2012-2014
  • Pioneering Undergraduate Fellowship, 2011-2014