Sergio Arnaud

Background

Senior Research Engineer

Meta FAIR · March 2024 – Present · Menlo Park, CA

World models for robotics, 3D vision-language grounding for robotic manipulation, and physical world modeling

AI Resident

Meta FAIR · September 2022 – September 2023 · Menlo Park, CA

Visual representations for robot control, language models for planning, and embodied AI research

Tech Lead (AI)

deep dive (dive.ai) · March 2020 – July 2022 · Mexico City, Mexico

Computer Vision and Natural Language Processing systems

BSc Applied Mathematics

Instituto Tecnológico Autónomo de México (ITAM) · 2020 · Mexico City, Mexico

Graduated with highest honors (Magna Cum Laude), top 3% of students

Featured Publications

World Modeling

Learning predictive models of the world for planning and decision making

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

M. Assran*, A. Bardes*, D. Fan*, Q. Garrido*, R. Howes*, M. Komeili*, M. Muckley*, A. Rizvi*, C. Roberts*, K. Sinha*, A. Zholus*, Sergio Arnaud*, et al.

arXiv 2025

paper blog code

Heterogeneous World Models for Cross-Embodiment Transfer

Sergio Arnaud, et al.

In Progress

Visuo-Tactile World Models

Carolina Higuera, Sergio Arnaud, Byron Boots, Mustafa Mukadam, Francois Robert Hogan, Franziska Meier

ICLR 2026 · In Press

Human-level Learning of Complex Novel Tasks as Theory-Based Modeling, Exploration, and Planning

P.A. Tsividis, J. Loula, J. Burga, J.P. Rodriguez, Sergio Arnaud, N. Foss, A. Campero, A. Subramanian, T. Pouncy, S.J. Gershman, J.B. Tenenbaum

Philosophical Transactions of the Royal Society · In Press

3D Vision & Spatial Reasoning

Grounding language in 3D space for embodied understanding

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Sergio Arnaud*, P. McVay*, A. Martin*, A. Majumdar, K.M. Jatavallabhula, P. Thomas, R. Partsey, D. Dugas, A. Gejji, A. Sax, et al.

ICML 2025 · Spotlight (Top 2.6%)

paper blog code demo

From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation (LiftGS)

A. Cao, Sergio Arnaud, O. Maksymets, J. Yang, A. Jain, S. Yenamandra, A. Martin, V.-P. Berges, P. McVay, R. Partsey, et al.

ICML 2025

paper project

Unifying 2D and 3D Vision-Language Understanding (UniVLG)

A. Jain, A. Swerdlow, Y. Wang, Sergio Arnaud, A. Martin, A. Sax, F. Meier, K. Fragkiadaki

ICML 2025

paper code project

OpenEQA: Embodied Question Answering in the Era of Foundation Models

A. Majumdar*, A. Ajay*, X. Zhang*, P. Putta, S. Yenamandra, M. Henaff, S. Silwal, P. McVay, O. Maksymets, Sergio Arnaud, K. Yadav, Q. Li, B. Newman, et al.

CVPR 2024

paper blog code project

Representation Learning for Robotics

Learning visual representations that transfer across embodiments and tasks

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? (VC-1)

A. Majumdar*, K. Yadav*, Sergio Arnaud*, Y.J. Ma, C. Chen, S. Silwal, A. Jain, V.-P. Berges, T. Wu, J. Vakil, et al.

NeurIPS 2023

paper blog code project

What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?

S. Silwal*, K. Yadav*, T. Wu*, J. Vakil*, A. Majumdar*, Sergio Arnaud*, C. Chen, V.-P. Berges, D. Batra, A. Rajeswaran, et al.

ICRA 2024

paper project

Robot Planning & Skill Coordination

Enabling robots to chain skills and plan complex behaviors

ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation

N. Yokoyama, A. Clegg, J. Truong, E. Undersander, T.-Y. Yang, Sergio Arnaud, S. Ha, D. Batra, A. Rai

IEEE RA-L 2025

paper blog project

Language-Guided Skill Coordination (LSC)