About Me
Hi! I’m a Ph.D. student in EECS at UC Berkeley, where I’m fortunate to be advised by Prof. Yi Ma and Prof. Jiantao Jiao. I’m affiliated with BAIR and supported by a UC Berkeley College of Engineering fellowship. Prior to my Ph.D., I completed a BA in CS and an MS in EECS, also at UC Berkeley.
My research interests lie broadly in developing principled methodology for large-scale deep learning. I work to develop scientific and mathematical principles for deep learning; to apply these principles to analyze, simplify, and improve existing methods; and to build and scale new principled approaches. As such, my work tends to combine theory, controlled experiments, and larger-scale experiments. I’m particularly interested in how the structure of high-dimensional data (including environmental feedback) interacts with deep learning methods, and how this interaction shapes representation learning and generalization.
Notes for undergraduate and master's students.
Note 1: I'm happy to chat about research, graduate school, etc. Please send me an email and we can work out a time. Please include "[Advising Chat]" in your email subject line.
Note 2: If you are interested in working with me on deep learning research, please send me an email with your background and your specific interests (the more detailed, the better), and mention what you would like to work on. Please include "[Research Collaboration Request]" in your email subject line. The ideal candidate is:
- able and willing to invest at least 15 hours per week;
- highly self-sufficient;
- able to read and understand deep learning papers;
- comfortable with advanced linear algebra and probability;
- proficient with either PyTorch (preferred) or JAX.
PS: I get many serious inquiries about advising chats or potential research collaborations, and I try my best to reply to every single one of them. If you don't see a reply after, say, a week, feel free to bump the email thread. In return, if you're writing to ask for a research collaboration, please think seriously about whether you are interested in the work and are willing to invest time in it.
Selected Recent Work
- White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
- Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
- Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
- Simplifying DINO via Coding Rate Regularization