REPLAY: Robot Embodiment via Intent-aware Policy Imitation by Replicating Human Demonstrations From Video

  • Sung-Gil Park
  • Yong-Jun Lee
  • Jeong-Seop Park
  • Myo Taeg Lim
  • Han-Byeol Kim
  • ... 안우진
  • and 4 others
Citations

Web of Science: 0
Scopus: 1

Abstract

Learning manipulation skills from human demonstrations traditionally relies on structured datasets, teleoperation, or environment-specific supervision, which limits scalability and generalization. In this paper, we introduce REPLAY, an imitation learning framework that enables robots to acquire intent-aware behaviors directly from raw monocular videos, including unstructured sources such as YouTube. REPLAY decomposes each demonstration video into a sequence of semantically meaningful sub-tasks by combining action segmentation, scene and task reasoning with vision-language models, and fine-grained action understanding based on 3D human pose estimation. The extracted trajectories are then retargeted to robot embodiments through intent-aware motion adaptation that accounts for embodiment differences and environmental constraints. To support scalable evaluation, we also present Video2Sim, a method that reconstructs realistic 3D simulation environments directly from demonstration videos, enabling repeatable testing and training. We demonstrate that REPLAY outperforms strong baselines in both motion fidelity and task success on complex object manipulation tasks, illustrating its potential as a scalable, generalizable approach to real-world imitation learning from in-the-wild video data.

Keywords

Cross-embodiment imitation, intent-aware manipulation, video-based learning, vision-language-action
Title
REPLAY: Robot Embodiment via Intent-aware Policy Imitation by Replicating Human Demonstrations From Video
Authors
Sung-Gil Park, Yong-Jun Lee, Jeong-Seop Park, Myo Taeg Lim, Han-Byeol Kim, Yong-Geon Kim, Seuk-Woo Ryu, Byeong-Gil Yoo, Sungeun Chung, 안우진
DOI
10.1007/s12555-025-0505-8
Publication date
2025-12
Type
Article
Journal
International Journal of Control, Automation, and Systems
Volume
23
Issue
12
Pages
3599-3609