About Me
Hi! I’m a Ph.D. student in EECS at UC Berkeley, where I’m fortunate to be advised by Prof. Yi Ma and Prof. Jiantao Jiao. I’m affiliated with BAIR and supported by a UC Berkeley College of Engineering fellowship. Prior to my Ph.D., I completed a B.A. in CS and an M.S. in EECS, also at UC Berkeley.
My research interests broadly lie in developing theory for large-scale empirical deep learning methodology. I pursue this goal through two intertwined threads:
- Finding theoretical principles for deep learning that are relevant at large scales.
- Building theoretically principled deep learning systems at large scales.
I’m particularly interested in domains where data is high-dimensional yet richly structured, such as computer vision and natural language processing, and in how this structure interacts with the mechanisms for representation and generation within deep neural networks.
Here are some specific problems I’m interested in:
Large Language Models (LLMs): What concepts and algorithms do LLMs learn, and how are they represented mechanistically? How do approximate retrieval and approximate reasoning manifest in LLMs? How do the (pre-)training dynamics of LLMs adapt to the structure of the training data and produce high-level model behaviors?
Diffusion Models: What allows diffusion models to generalize beyond the empirical distribution of their training data? What structures within data and network architecture enable diffusion models to succeed in some domains and not others?
Multi-Modal Deep Learning: What are the key information-theoretic principles of cross-modality learning? What is the relationship between the representations of text and visual data (both in modern vision-language models and in conditional diffusion models), and how is this relationship mechanistically enforced by the underlying deep neural network?
Vision Self-Supervised Learning: How can we learn faithful, high-quality representations of visual data for recognition tasks? I’m especially interested in developing and applying principles for two problems: (1) continual self-supervised learning, and (2) self-supervised learning from temporally correlated data (such as video frames).
Finally: How can we leverage answers to the above questions to build more powerful, more sample-efficient, multi-modal deep learning models at large scale?
Notes for undergraduate and master’s students.
Note 1: I’m happy to chat about my research or offer general advising. Please send me an email and we can work out a time. Please include "[Advising Inquiry]" in your email subject line.
Note 2: If you are interested in a research collaboration, please send me an email with your background and specific interests (the more detailed, the better). Please include "[Research Collaboration Inquiry]" in your email subject line. The recommended time investment is at least 15 hours per week. Unfortunately, my schedule is currently tight and generally does not permit consistent long-term mentoring of younger students, so some degree of self-sufficiency is highly valued. For a fruitful collaboration, it is best to have the technical background to read and understand deep learning papers, especially theory-oriented work. Thank you for your understanding.
Recent Updates
- (September 2024) Our paper Scaling White-Box Transformers for Vision was accepted to NeurIPS 2024.
- (May 2024) Started a summer research scientist internship at NexusFlow.
- (May 2024) Our new comprehensive paper White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?, which reviews our “White-Box Transformers” line of work on deriving efficient, interpretable, and performant transformer-like architectures from first-principles information theory and signal processing, was accepted to JMLR.
- (May 2024) Our paper A Global Geometric Analysis of Maximal Coding Rate Reduction was accepted to ICML 2024.
- (January 2024) Our paper Masked Completion via Structured Diffusion with White-Box Transformers, which develops a connection between iterative denoising in diffusion models and representation learning in transformer-like deep networks, and uses it to construct a performant, efficient, and interpretable transformer-like autoencoder, was accepted to ICLR 2024.
- (November 2023) Our papers Emergence of Segmentation with Minimalistic White-Box Transformers, Closed-Loop Transcription via Convolutional Sparse Coding, and Masked Completion via Structured Diffusion with White-Box Transformers were accepted to CPAL 2024.
- (October 2023) Our paper Emergence of Segmentation with Minimalistic White-Box Transformers was accepted to NeurIPS 2023 XAIA Workshop.
- (September 2023) Our paper White-Box Transformers via Sparse Rate Reduction, proposing an interpretable and parameter-efficient transformer-like architecture derived from first principles, was accepted to NeurIPS 2023.
- (August 2023) Started my Ph.D. program in EECS at UC Berkeley!