Tech Blog
I believe in two things: 1. Mathematically grounded work, once verified on small models, is more likely to scale to large models; 2. As a rigorous AI researcher, I should understand the nitty-gritty details of deep learning systems so that I can run large-scale experiments with high iteration speed.
Math
- VAE and Diffusion
- LeJEPA: Isotropic Gaussian Latents for JEPA
- Linear Attention
- The Cumulant Generating Function: From Moments to Max
- Meta-Learning and Reward Learning Algorithms
DLSys
- Understand FSDP2
- FSDP2 Small Tricks
- Ring All-Reduce
- Ring Flash Attention (As An Example of Context Parallelism)
- Tensor Parallel, Sequence Parallel and Loss Parallel
Paper Reading
- Don’t engineer around a base model that can’t utilize a long context to reason. Do research instead.
- Is RL just distribution sharpening?
- Multi-agent Papers
- Similar teacher is better than a strong but different teacher
Intelligence
- A Brief Survey on Animal Behavior Studies
- Evolution of Cooperation[1]: Recursive Belief and Common Ground
- Evolution of Cooperation[2]: Ape Vocalizations and Gestures
- Evolution of Cooperation[3]: Joint Attention and Pre-Linguistic Gestures
- Evolution of Cooperation[4]: Origins of Cooperative Motivation
- Evolution of Cooperation[5]: Brain Evolution and MARL Reflections