Jeffrey W. Li


I’m a 5th-year CSE PhD student at the University of Washington, grateful to be advised by Ludwig Schmidt and Alexander Ratner. During my PhD, I have also interned at Apple MLR, hosted by Fartash Faghri, and at Snorkel AI. Before my PhD, I obtained an MS in Machine Learning from CMU, where I was fortunate to be advised and introduced to ML research by Ameet Talwalkar.

I am broadly interested in data-centric approaches to improving the efficiency and performance of ML models. Most recently, I have focused on LLM pre-training, with projects on web-scale data curation and on continual methods for reusing and updating previously trained models.

selected publications

  1. TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
     Jeffrey Li*, Mohammadreza Armandpour*, Iman Mirzadeh, and 8 more authors
     ACL Main (Oral), 2025
  2. DataComp-LM: In search of the next generation of training sets for language models
     Jeffrey Li, Alex Fang, Georgios Smyrnis, and 56 more authors
     NeurIPS Datasets and Benchmarks, 2024
  3. Characterizing the Impacts of Semi-supervised Learning for Weak Supervision
     Jeffrey Li, Jieyu Zhang, Ludwig Schmidt, and 1 more author
     NeurIPS, 2023