Jeffrey W. Li


I’m a 5th-year CSE PhD student at the University of Washington, grateful to be advised by Ludwig Schmidt and Alexander Ratner. During my PhD, I have also interned at Apple MLR, hosted by Fartash Faghri, and at Snorkel AI. Before my PhD, I obtained an MS in Machine Learning from CMU, where I was fortunate to be advised and introduced to ML research by Ameet Talwalkar.

I am broadly interested in data-centric approaches to improving the efficiency and performance of ML models. Most recently, I have focused on LLM pre-training, with projects on web-scale data curation and on continual methods for reusing and updating previously trained models.

selected publications

  1. TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
     Jeffrey Li*, Mohammadreza Armandpour*, Iman Mirzadeh, and 8 more authors
     ACL Main (Oral), 2025
  2. DataComp-LM: In search of the next generation of training sets for language models
     Jeffrey Li, Alex Fang, Georgios Smyrnis, and 56 more authors
     NeurIPS Datasets and Benchmarks, 2024
  3. Characterizing the Impacts of Semi-supervised Learning for Weak Supervision
     Jeffrey Li, Jieyu Zhang, Ludwig Schmidt, and 1 more author
     NeurIPS, 2023