About Me
Hi! I’m a Ph.D. student in EECS at UC Berkeley, where I’m fortunate to be advised by Prof. Yi Ma and Prof. Jiantao Jiao. I’m affiliated with BAIR and supported by a UC Berkeley College of Engineering fellowship. Prior to my Ph.D., I completed a BA in CS and an MS in EECS, also at UC Berkeley.
My research interests broadly lie in developing theoretically principled methodology for large-scale empirical deep learning. I work on building theoretical foundations for existing approaches as well as developing new approaches from first principles. As such, my work tends to combine theory, controlled experiments, and larger-scale experiments.
I’m particularly interested in problem settings where data is high-dimensional yet richly structured, such as those in computer vision and natural language processing, and in how this structure interacts with the mechanisms for representation and generation inside deep neural networks.
If this sounds too vague to you, here are some specific problems I'm interested in.
Large Language Models (LLMs): What concepts and algorithms do LLMs learn, and how are they represented mechanistically? How do approximate retrieval and approximate reasoning manifest in LLMs? How do the (pre-)training dynamics of LLMs adapt to the structure of the training data and produce high-level model behaviors?
Diffusion Models: What allows diffusion models to generalize beyond the empirical distribution of their training data? What structures within data and network architecture enable diffusion models to succeed in some domains and not others?
Multi-Modal Deep Learning: What are the key information-theoretic principles of cross-modality learning? What is the relationship between the representations of text and visual data (both in modern vision-language models and in conditional diffusion models), and how is this relationship mechanistically enforced by the underlying deep neural network?
Vision Self-Supervised Learning: How can we learn faithful, high-quality representations of visual data for recognition tasks? I'm especially interested in developing and applying principles for two problems: (1) continual self-supervised learning, and (2) self-supervised learning from dynamic, time-correlated data (such as frames of videos).
Finally: How can we leverage answers to the above questions to build more powerful, more sample-efficient, multi-modal deep learning models at large scale?
Notes for undergraduate and master's students.
Note 1: I'm happy to chat about my research or general advising. Please send me an email and we can work out a time. Please include "[Advising Inquiry]" in your email subject line.
Note 2: If you are interested in a research collaboration, please send me an email with your background and specific interests (the more detailed, the better), and mention what you would like to work on. Please include "[Research Collaboration Inquiry]" in your email subject line. The recommended time investment is at least 15 hours per week, and self-sufficiency is highly valued. To ensure a fruitful collaboration, you should be able to read and understand deep learning papers, be comfortable with advanced linear algebra and probability, and be acquainted with either PyTorch or JAX. Thank you for your understanding.
PS: I try my best to reply to every serious inquiry about an advising chat or potential research collaboration. If you don't see a reply after, say, a week, feel free to bump the email thread. In return, if you're writing to ask to work with me, I ask that you think carefully about whether you are genuinely interested in the work and willing to spend the time on it.
Recent Updates
- (September 2024) Our paper Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs was accepted (oral) to NeurIPS 2024 M3L Workshop.
- (September 2024) Our paper Scaling White-Box Transformers for Vision was accepted to NeurIPS 2024.
- (May 2024) Started a summer research scientist internship at NexusFlow.
- (May 2024) Our new comprehensive paper White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?, which reviews our “White-Box Transformers” line of work on deriving efficient, interpretable, and performant transformer-like architectures from first-principles information theory and signal processing, was accepted to JMLR.
- (May 2024) Our paper A Global Geometric Analysis of Maximal Coding Rate Reduction was accepted to ICML 2024.
- (January 2024) Our paper Masked Completion via Structured Diffusion with White-Box Transformers, which develops a connection between iterative denoising in diffusion models and representation learning in transformer-like deep networks, and uses it to construct a performant, efficient, and interpretable transformer-like autoencoder, was accepted to ICLR 2024.
- (November 2023) Our papers Emergence of Segmentation with Minimalistic White-Box Transformers, Closed-Loop Transcription via Convolutional Sparse Coding, and Masked Completion via Structured Diffusion with White-Box Transformers were accepted to CPAL 2024.
- (October 2023) Our paper Emergence of Segmentation with Minimalistic White-Box Transformers was accepted to NeurIPS 2023 XAIA Workshop.
- (September 2023) Our paper White-Box Transformers via Sparse Rate Reduction, proposing an interpretable and parameter-efficient transformer-like architecture derived from first principles, was accepted to NeurIPS 2023.
- (August 2023) Started my Ph.D. program in EECS at UC Berkeley!