Jeffrey W. Li

I’m a 5th-year CSE PhD student at the University of Washington, grateful to be advised by Ludwig Schmidt and Alexander Ratner. During my PhD, I have also interned at Apple MLR, hosted by Fartash Faghri, and at Snorkel AI. Before my PhD, I obtained an MS in Machine Learning from CMU, where I was fortunate to be advised and introduced to ML research by Ameet Talwalkar.
I am broadly interested in data-centric approaches to improving the efficiency and performance of ML models. Most recently, I have focused on LLM pre-training, with recent projects on web-scale data curation and continual-learning methods for reusing and updating previous models.
selected publications
- TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining. ACL Main (Oral), 2025
- DataComp-LM: In search of the next generation of training sets for language models. NeurIPS Datasets and Benchmarks, 2024
- Characterizing the Impacts of Semi-supervised Learning for Weak Supervision. NeurIPS, 2023