Sergio Arnaud

AI Researcher @ Meta FAIR

About

I spend most of my time thinking about how to build robots that can understand and interact with the physical world. I'm currently working on world models and foundation models for robotics at Meta FAIR.

My research focuses on representation learning, 3D vision-language grounding, and cross-embodiment transfer. I'm particularly interested in how we can get robots to learn from diverse data sources and generalize across different embodiments and tasks without extensive task-specific training.

Before this, I led AI development at deep dive and studied Applied Mathematics at ITAM in Mexico City.

Background

Senior Research Engineer

Meta FAIR · March 2024 – Present · Menlo Park, CA

World models for robotics, 3D vision-language grounding for robotic manipulation, and physical world modeling

AI Resident

Meta FAIR · September 2022 – February 2024 · Menlo Park, CA

Visual representations for robot control, language models for planning, and embodied AI research

Tech Lead (AI)

deep dive (dive.ai) · March 2020 – July 2022 · Mexico City, Mexico

Computer Vision and Natural Language Processing systems

BSc Applied Mathematics

Instituto Tecnológico Autónomo de México (ITAM) · 2020 · Mexico City, Mexico

Graduated with highest honors (Magna Cum Laude), top 3% of students

Featured Publications

World Modeling

Learning predictive models of the world for planning and decision making

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

M. Assran*, A. Bardes*, D. Fan*, Q. Garrido*, R. Howes*, M. Komeili*, M. Muckley*, A. Rizvi*, C. Roberts*, K. Sinha*, A. Zholus*, Sergio Arnaud*, et al.

arXiv 2025

Heterogeneous World Models for Cross-Embodiment Transfer

Sergio Arnaud, et al.

In Progress

Visuo-Tactile World Models

Carolina Higuera, Sergio Arnaud, Byron Boots, Mustafa Mukadam, Francois Robert Hogan, Franziska Meier

ICLR 2026 · In Press

Human-level Learning of Complex Novel Tasks as Theory-Based Modeling, Exploration, and Planning

P.A. Tsividis, J. Loula, J. Burga, J.P. Rodriguez, Sergio Arnaud, N. Foss, A. Campero, A. Subramanian, T. Pouncy, S.J. Gershman, J.B. Tenenbaum

Philosophical Transactions of the Royal Society · In Press

3D Vision & Spatial Reasoning

Grounding language in 3D space for embodied understanding

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Sergio Arnaud*, P. McVay*, A. Martin*, A. Majumdar, K.M. Jatavallabhula, P. Thomas, R. Partsey, D. Dugas, A. Gejji, A. Sax, et al.

ICML 2025 · Spotlight · Top 2.6%

From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation (LiftGS)

A. Cao, Sergio Arnaud, O. Maksymets, J. Yang, A. Jain, S. Yenamandra, A. Martin, V.-P. Berges, P. McVay, R. Partsey, et al.

ICML 2025

Unifying 2D and 3D Vision-Language Understanding (UniVLG)

A. Jain, A. Swerdlow, Y. Wang, Sergio Arnaud, A. Martin, A. Sax, F. Meier, K. Fragkiadaki

ICML 2025

OpenEQA: Embodied Question Answering in the Era of Foundation Models

A. Majumdar*, A. Ajay*, X. Zhang*, P. Putta, S. Yenamandra, M. Henaff, S. Silwal, P. McVay, O. Maksymets, Sergio Arnaud, K. Yadav, Q. Li, B. Newman, et al.

CVPR 2024

Representation Learning for Robotics

Learning visual representations that transfer across embodiments and tasks

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? (VC-1)

A. Majumdar*, K. Yadav*, Sergio Arnaud*, Y.J. Ma, C. Chen, S. Silwal, A. Jain, V.-P. Berges, T. Wu, J. Vakil, et al.

NeurIPS 2023

What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?

S. Silwal*, K. Yadav*, T. Wu*, J. Vakil*, A. Majumdar*, Sergio Arnaud*, C. Chen, V.-P. Berges, D. Batra, A. Rajeswaran, et al.

ICRA 2024

Robot Planning & Skill Coordination

Enabling robots to chain skills and plan complex behaviors

ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation

N. Yokoyama, A. Clegg, J. Truong, E. Undersander, T.-Y. Yang, Sergio Arnaud, S. Ha, D. Batra, A. Rai

IEEE RA-L 2025

Language-Guided Skill Coordination (LSC)

Sergio Arnaud, et al.

CVPR 2024 Demo

Contact