About Me
Hi! I’m a Member of Technical Staff at Thinking Machines Lab and a Ph.D. student in EECS at UC Berkeley. In academia, I’m fortunate to be advised by Prof. Yi Ma, Prof. Jiantao Jiao, and Prof. Jason Lee. I’m affiliated with BAIR and supported by a UC Berkeley College of Engineering fellowship. Prior to my Ph.D., I completed a BA in CS and an MS in EECS, also at UC Berkeley. Outside of academia, I’ve worked at Google DeepMind, Google Research, and NexusFlow (recently acquired by Nvidia).
My research interests broadly lie in developing principled methodology for large-scale deep learning. I work to develop scientific and mathematical principles for deep learning, apply them to analyze, simplify, and improve existing methods, and build and scale new principled approaches. As such, my work tends to combine theory, controlled experiments, and larger-scale experiments. I’m particularly interested in how the structure of high-dimensional data (including environmental feedback) interacts with deep learning methods, and how this interaction shapes representation learning, generalization, and scaling laws.
Notes for undergraduate and master's students.
Note 1: I'm happy to chat about research, graduate school, etc. Please send me an email and we can work out a time. Please include "[Advising Chat]" in your email subject line.
Note 2: I am currently wrapping up my ongoing research projects and transitioning to industry, so I do not have the bandwidth to mentor new undergraduate or master's student collaborators over an extended period of time. Thank you for your understanding.
Selected Recent Work
- Learning Deep Representations of Data Distributions
- Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
- On the Edge of Memorization in Diffusion Models
- White-Box Transformers via Sparse Rate Reduction: Compression is all There Is?
- Simplifying DINO via Coding Rate Regularization
- Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
<!-- Recent Updates
- (September 2025) We gave a tutorial on Learning Deep Representations of Data Distributions at IAISS 2025.
- (August 2025) Our new open-source textbook Learning Deep Representations of Data Distributions was released.
- (May 2025) Our paper Simplifying DINO via Coding Rate Regularization was accepted to ICML 2025.
- (February 2025) Our papers Simplifying DINO via Coding Rate Regularization, Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs, and Attention-Only Transformers via Unrolled Subspace Denoising were accepted to CPAL 2025 (non-archival track).
- (January 2025) Our paper Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction was accepted (spotlight) to ICLR 2025.
- (September 2024) Our paper Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs was accepted (oral) to NeurIPS 2024 M3L Workshop.
- (September 2024) Our paper Scaling White-Box Transformers for Vision was accepted to NeurIPS 2024.
- (May 2024) Started a summer research scientist internship at NexusFlow.
- (May 2024) Our comprehensive paper White-Box Transformers via Sparse Rate Reduction: Compression is all There Is?, which reviews our “White-Box Transformers” line of work on deriving efficient, interpretable, and performant transformer-like architectures from first principles in information theory and signal processing, was accepted to JMLR.
- (May 2024) Our paper A Global Geometric Analysis of Maximal Coding Rate Reduction was accepted to ICML 2024.
- (January 2024) Our paper Masked Completion via Structured Diffusion with White-Box Transformers, which develops a connection between iterative denoising in diffusion models and representation learning in transformer-like deep networks, and uses it to construct a performant, efficient, and interpretable transformer-like autoencoder, was accepted to ICLR 2024.
- (November 2023) Our papers Emergence of Segmentation with Minimalistic White-Box Transformers, Closed-Loop Transcription via Convolutional Sparse Coding, and Masked Completion via Structured Diffusion with White-Box Transformers were accepted to CPAL 2024.
- (October 2023) Our paper Emergence of Segmentation with Minimalistic White-Box Transformers was accepted to NeurIPS 2023 XAIA Workshop.
- (September 2023) Our paper White-Box Transformers via Sparse Rate Reduction, proposing an interpretable and parameter-efficient transformer-like architecture derived from first principles, was accepted to NeurIPS 2023.
- (August 2023) Started my Ph.D. program in EECS at UC Berkeley! -->
