I work on multimodal models that learn predictive representations of the physical world — how machines build internal models that support prediction, planning, and generalization. I'm currently on the World Models team at Waymo, and previously spent ~3 years as an AI researcher at Meta FAIR.
My work spans self-supervised video representation learning, 3D vision-language grounding, and large-scale multimodal training. I'm increasingly interested in interpretability — understanding what these predictive models actually learn internally.
Before moving into research, I led the AI efforts at deep dive. I majored in Applied Mathematics.
World models and multimodal foundation models for autonomous driving
World models for robotics, 3D vision-language grounding for robotic manipulation, and physical world modeling
Visual representations for robot control, language models for planning, and embodied AI research
Computer Vision and Natural Language Processing systems
Graduated with highest honors (magna cum laude), in the top 3% of students
Learning predictive models of the world for planning and decision making
Grounding language in 3D space for embodied understanding
Learning visual representations that transfer across embodiments and tasks
Enabling robots to chain skills and plan complex behaviors