Robust Vision-Language-Action Models via Object-Centric Learning and Distance-Based Chunk Alignment

  • Park, Sung-Gil
  • Kim, Yong-Geon
  • Ryu, Seuk-Woo
  • Yoo, Byeong Gil
  • Chung, Sungeun
  • ... Ahn, Woo-Jin
  • and 2 others

Abstract

Vision-language-action (VLA) models have shown strong potential for enabling robots to interpret goals and perform complex manipulation tasks by integrating perception, language, and control. However, existing VLAs rely heavily on large-scale, diverse demonstration datasets, which are difficult and expensive to collect. When trained with limited data, they often overfit to irrelevant visual cues such as background, lighting, or viewpoint, resulting in weak generalization. To overcome this limitation, we propose a simple yet effective object-centric learning framework for VLA. For each sub-task, the framework leverages an instance segmentation foundation model to identify and track task-relevant objects, and trains the policy on both the original RGB scene and two object-focused representations: (i) a masked image emphasizing the target object and (ii) an object-only crop. These multiple visual inputs share the same action supervision, encouraging the policy to attend to the manipulated object rather than the surrounding context. Furthermore, a distance-based chunk alignment mechanism ensures smooth control transitions between consecutive predicted action segments. Experiments conducted in both simulation and real hardware demonstrate that the proposed method achieves robust performance and stable trajectories across various manipulation tasks, validating its practicality and efficiency in training object-aware robotic behaviors.
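
The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the segmentation model has already produced a boolean mask for the task-relevant object, and it reads "distance-based chunk alignment" as starting the next action chunk at the step closest (in Euclidean distance) to the last executed action, which is only one plausible interpretation. The names `object_views`, `align_next_chunk`, and `background_scale` are hypothetical.

```python
import numpy as np


def object_views(rgb, mask, background_scale=0.2):
    """Build two object-focused views from an RGB frame and a boolean mask.

    rgb:  (H, W, 3) uint8 image.
    mask: (H, W) boolean array, True on the task-relevant object.
    background_scale: hypothetical knob, fraction of background intensity kept.
    """
    if not mask.any():
        raise ValueError("mask contains no object pixels")
    # (i) Masked image: keep object pixels, dim everything else.
    weights = np.where(mask[..., None], 1.0, background_scale)
    masked = (rgb.astype(np.float32) * weights).astype(np.uint8)
    # (ii) Object-only crop: tight bounding box with the background zeroed.
    rows, cols = np.nonzero(mask)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    crop = (rgb * mask[..., None])[r0:r1, c0:c1]
    return masked, crop


def align_next_chunk(last_action, next_chunk):
    """Assumed alignment rule: skip ahead in the new chunk to the step
    nearest to the last executed action, so control does not jump.

    last_action: (D,) array, the action executed most recently.
    next_chunk:  (T, D) array, the newly predicted action segment.
    Returns the remaining actions and the chosen start index.
    """
    dists = np.linalg.norm(next_chunk - last_action, axis=1)
    start = int(np.argmin(dists))
    return next_chunk[start:], start
```

Under these assumptions, a training step would pair each of the three views (original frame, masked image, crop) with the same action label, for example `samples = [(view, action) for view in (rgb, masked, crop)]`, which is how the shared action supervision described above could be realized.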

Keywords

vision-language-action model; object-centric learning; robotics; deep learning
Title
Robust Vision-Language-Action Models via Object-Centric Learning and Distance-Based Chunk Alignment
Authors
Park, Sung-Gil; Kim, Yong-Geon; Ryu, Seuk-Woo; Yoo, Byeong Gil; Chung, Sungeun; Park, Jeong-Seop; Ahn, Woo-Jin; Lim, Myo-Taeg
DOI
10.3390/app16073376
Publication date
2026-03-31
Type
Article
Journal
APPLIED SCIENCES-BASEL
APPLIED SCIENCES-BASEL
Volume
16
Issue
7