Tao Luo
I am a CS Ph.D. candidate at the University of Pennsylvania (defending 2026), advised by Profs. Boon Thau Loo and Vincent Liu.
I build GPU scheduling, agentic RL post-training, and inference systems for large-scale LLMs. At Alibaba, I shipped Partial Overlapping, a high-priority feature in ROLL used for production RL training of models with 100s of billions of parameters on 1000s of GPUs; open-sourced RLix; and contributed to the ROME technical report. My work spans vLLM, Megatron-LM, and Ray.
My work has appeared at OSDI, SOSP, and SoCC. During my M.S. studies at Columbia University, advised by Prof. Asaf Cidon, I coined Privacy Budget Scheduling, the first study of scheduling ML training under differential privacy constraints.
Before academia, I developed quantitative investment algorithms in finance. I hold a B.S. in Financial Mathematics from Southern University of Science and Technology.
Selected Projects
GPU Scheduling and RL Infrastructure @Alibaba, DAMO Academy
- Proposed Partial Overlapping, a scheduling mechanism for asynchronous agentic RL that reassigns idle training GPUs to rollout workers, improving rollout throughput by 3.5x.
- Built Partial Overlapping entirely through AI coding (English/Chinese); it was a high-priority feature in alibaba/ROLL and the first built with zero human-written code.
- Partial Overlapping is used in production for RL training of models with 100s of billions of parameters on 1000s of GPUs, for products including Qoder IDE (coding), iFlow CLI (coding), Amap (travel planning), and Alimama (ads).
- Extended Partial Overlapping to async multi-LoRA RL via per-adapter optimizers on a shared Megatron base model.
- Generalized Partial Overlapping into RLix, a recipe-transparent GPU time-sharing control plane for full fine-tuning and async multi-LoRA RL, improving GPU utilization while preserving training semantics; open-sourced on GitHub as rlops/rlix.
ParaFlex: Multiplexed Heterogeneous LLM Serving via Stage-Aligned Parallelism @University of Pennsylvania
- Proposed a novel LLM serving architecture that eliminates head-of-line blocking and improves token throughput by 1.6x.
- Built multi-model KV cache management and robust NCCL concurrency controls.
- Optimized sharding, replication, placement, and scheduling algorithms for heterogeneous serving workloads.
- SoCC’25 paper
Privacy Budget Scheduling in ML Training @Columbia University
- Coined Privacy Budget Scheduling and showed how to schedule 2x more jobs than FCFS under the same privacy budget.
- Proposed DPF (Dominant Private Block Fairness), a dynamic scheduling algorithm derived from DRF.
- Developed formal proofs for the algorithm’s game-theoretic properties.
- OSDI’21 paper
Honors & Service
- Program Committee: ACM Symposium on Cloud Computing 2025
- Manjushri Fellowship, University of Pennsylvania, 2021
- Financial Risk Manager (FRM) Certification, 2015
- China Merchants Bank Scholarship, 2012-2014
- Pioneering Undergraduate Fellowship, 2011-2014
- First Prize, China High School Biology Olympiad, 2010
