About Me

Hi! I’m a Ph.D. student in EECS at UC Berkeley, where I’m fortunate to be advised by Prof. Yi Ma, Prof. Jiantao Jiao, and Prof. Jason Lee. I’m affiliated with BAIR and supported by a UC Berkeley College of Engineering fellowship. Before my Ph.D., I completed a BA in CS and an MS in EECS, also at UC Berkeley. Outside of academia, I’ve previously worked at Google DeepMind, Google Research, and NexusFlow (recently acquired by Nvidia).

My research interests broadly lie in principled methodology for large-scale deep learning. I work to develop scientific and mathematical principles for deep learning, apply these principles to analyze, simplify, and improve existing methods, and build and scale new principled approaches. As such, my work tends to combine theory, controlled experiments, and larger-scale experiments. I’m particularly interested in how the structure of high-dimensional data (including environmental feedback) interacts with deep learning methods, and how this impacts representation learning and generalization.

Notes for undergraduate and master's students
Note 1: I'm happy to chat about research, graduate school, etc. Please send me an email and we can work out a time; include "[Advising Chat]" in the subject line.

Note 2: I am currently wrapping up my ongoing research projects and making a transition to industry, so I do not have additional bandwidth to mentor new undergraduate or master's student collaborators over an extended period. Thank you for your understanding.


Selected Recent Work

Learning Deep Representations of Data Distributions
Open-Source Textbook
Website | GitHub

Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
ICLR 2025 (Spotlight)
Paper | Code

On the Edge of Memorization in Diffusion Models
Paper | Code

White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
JMLR 2024 (parts at NeurIPS 2023, ICLR 2024, CPAL 2024)
Paper | Code

Simplifying DINO via Coding Rate Regularization
ICML 2025
Paper | Code

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Paper | Code