Tao Luo

I am a CS Ph.D. candidate at the University of Pennsylvania (defending 2026), advised by Profs. Boon Thau Loo and Vincent Liu.

My research is on GPU scheduling and post-training infrastructure for extreme-scale LLMs, with a focus on agentic RL and heterogeneous LLM serving. I advanced GPU resource allocation in Alibaba’s ROLL framework, with production systems deployed across 1000+ GPUs training 100B+ parameter models.

Previously, as an M.S. student at Columbia University, I coined the term "privacy budget scheduling," leading the first study of scheduling ML training under differential privacy constraints. I was advised by Prof. Asaf Cidon and collaborated with Profs. Ethan Katz-Bassett, Ryan Stutsman, Mathias Lécuyer, and Roxana Geambasu.

Before academia, I developed quantitative investment algorithms in the financial industry. I hold a B.S. in Financial Mathematics from Southern University of Science and Technology, as a member of its founding cohort.

Selected Projects

GPU Scheduling for Agentic RL @Alibaba

  • Designed and implemented a Partial Overlapping GPU scheduling algorithm for asynchronous agentic RL that reassigns idle training GPUs to rollout workers, maximizing GPU utilization.
  • Enabled concurrent multi-LoRA RL via per-adapter optimizer states on a shared Megatron base model with cross-engine weight synchronization.
  • Architected multi-tenant RL scheduling: decoupled per-job training logic from global GPU allocation via heartbeat progress, versioned checkpoints, and selective weight syncing (open-source in progress: rlops/rlix).
  • Deployed in production (100B+ parameters, 1000+ GPUs): Qoder IDE (coding), iFlow CLI (coding), Amap (travel planning), and Alimama (ads).
  • Pioneered vibe coding in production: Partial Overlapping was the first feature shipped, from first commit to production, with zero human-written code, through disciplined human-AI collaboration loops (English/Chinese).
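
In outline, the Partial Overlapping idea can be sketched as a small role-reassignment loop. This is an illustrative sketch under my own simplified assumptions (GPU, Scheduler, and the event hooks are hypothetical names, not the ROLL implementation):

```python
from dataclasses import dataclass, field
from enum import Enum

class Role(Enum):
    TRAIN = "train"
    ROLLOUT = "rollout"

@dataclass
class GPU:
    gpu_id: int
    role: Role

@dataclass
class Scheduler:
    gpus: list
    # GPUs temporarily lent from training to rollout, reclaimed later.
    loaned: list = field(default_factory=list)

    def on_trainer_idle(self, idle_ids):
        """Training is blocked waiting on rollout data:
        reassign the idle training GPUs to rollout workers."""
        for g in self.gpus:
            if g.gpu_id in idle_ids and g.role is Role.TRAIN:
                g.role = Role.ROLLOUT
                self.loaned.append(g)

    def on_batch_ready(self):
        """Enough rollout data has accumulated for a training step:
        reclaim every loaned GPU back to the trainer."""
        for g in self.loaned:
            g.role = Role.TRAIN
        self.loaned.clear()

    def count(self, role):
        return sum(g.role is role for g in self.gpus)
```

With 4 training and 4 rollout GPUs, lending two idle trainer GPUs raises the rollout pool to 6, and the next ready batch restores the original split.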

ParaFlex: Multiplexed Heterogeneous LLM Serving via Stage-Aligned Parallelism @UPenn

  • Eliminated head-of-line blocking with a novel LLM serving architecture, raising token throughput by 1.6×.
  • Built efficient multi-model KV cache management and robust NCCL concurrency controls.
  • Optimized sharding, replication, placement, and scheduling strategies.
  • SoCC’25 paper
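
A minimal sketch of the shared KV-cache idea, assuming a block-pool design where sequences from different models draw from one free list (class and method names here are hypothetical, not ParaFlex's API):

```python
class SharedKVPool:
    """Fixed pool of KV-cache blocks multiplexed across models;
    each (model, sequence) pair owns its own block table."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))
        self.tables = {}  # (model, seq_id) -> [block ids]

    def alloc(self, model, seq_id, n):
        """Reserve n blocks for a sequence, all-or-nothing."""
        if len(self.free) < n:
            return False  # caller must queue or evict first
        blocks = [self.free.pop() for _ in range(n)]
        self.tables.setdefault((model, seq_id), []).extend(blocks)
        return True

    def release(self, model, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free.extend(self.tables.pop((model, seq_id), []))
```

All-or-nothing allocation keeps one model's long sequence from stranding a partial reservation that blocks the other models sharing the pool.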

Privacy Budget Scheduling in ML Training @Columbia

  • Proposed DPF (Dominant Private Block Fairness), a dynamic scheduling algorithm modeled on DRF (dominant resource fairness).
  • Scheduled more jobs than FCFS under identical privacy budgets.
  • Developed rigorous proofs of the algorithm's game-theoretic properties.
  • OSDI’21 paper
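
The core of DPF can be sketched as follows: a job's dominant share is the largest fraction it demands of any single data block's privacy budget, and pending jobs are granted in increasing dominant-share order whenever their full demand still fits. This is a deliberately simplified sketch; the full algorithm in the paper also handles gradual budget unlocking as data arrives.

```python
def dpf_schedule(jobs, capacity):
    """Greedy DPF sketch (illustrative, not the paper's full algorithm).
    jobs: {job: {block: epsilon demanded}}
    capacity: {block: total epsilon budget}
    Returns the set of jobs granted their full demand."""
    remaining = dict(capacity)

    def dominant(job):
        # Largest demanded fraction of any block's total budget.
        return max(d / capacity[b] for b, d in jobs[job].items())

    granted = set()
    for job in sorted(jobs, key=dominant):
        # All-or-nothing grant: every demanded block must still fit.
        if all(remaining[b] >= d for b, d in jobs[job].items()):
            for b, d in jobs[job].items():
                remaining[b] -= d
            granted.add(job)
    return granted
```

Favoring small dominant shares mirrors DRF's fairness intuition: jobs that consume little of any block's budget are admitted first, so one budget-hungry job cannot starve many frugal ones.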

Honors & Service

  • Program Committee: ACM Symposium on Cloud Computing (SoCC) 2025
  • Manjushri Fellowship, University of Pennsylvania, 2021
  • Financial Risk Manager (FRM) Certification, 2015
  • China Merchants Bank Scholarship, 2012-2014
  • Pioneering Undergraduate Fellowship, 2011-2014
  • First Prize, China High School Biology Olympiad, 2010