CrossModalSync: joint temporal-spatial fusion for semantic scene segmentation in large-scale scenes
- Tan, Shuyi;
- Zhang, Yi;
- Li, Yan;
- Shin, Byeong-Seok
Web of Science: 0
Scopus: 0
Abstract
Point cloud semantic segmentation enables precise perception of dynamic, complex environments and has become a critical task for autonomous vehicles in recent years. However, in such scenes, existing semantic segmentation methods suffer from cumulative errors and the "many-to-one" mapping problem, both of which limit their accuracy and efficiency. To address these challenges, this paper introduces a new framework that balances accuracy and computational efficiency by using temporal alignment (TA), projection multi-scale convolution (PMC), and priority point retention (PPR). First, by combining TA and PMC, the framework captures inter-frame correlations, which improves local detail, reduces error accumulation, and preserves detailed scene features. Second, the PPR mechanism ensures that critical three-dimensional information is retained, thereby resolving the information loss caused by the "many-to-one" mapping problem. Finally, multimodal fusion of LiDAR and camera data provides complementary perspectives, further enhancing segmentation performance. The method achieves state-of-the-art performance on the SemanticKITTI and nuScenes benchmark datasets. Notably, the proposed framework excels at detecting occluded objects and dynamic entities.
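In projection-based LiDAR pipelines, the "many-to-one" problem commonly refers to several 3D points landing on the same 2D pixel, so that all but one are discarded. The abstract does not say how PPR chooses which points to keep, so the following is only a minimal sketch of the problem itself. It assumes a spherical range-image projection and uses a simple keep-the-nearest rule as a stand-in, not the paper's PPR. The function names, image size, and field-of-view values are all hypothetical.

```python
import math
from collections import defaultdict


def project_to_range_image(points, width=64, height=16,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Group 3D points by the range-image pixel they project onto.

    Assumed spherical projection; image size and FOV are illustrative only.
    Returns {(row, col): [(range, point_index), ...]}.
    """
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    pixels = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        yaw = math.atan2(y, x)
        pitch = math.asin(z / r)
        # Normalise the angles to [0, 1] and scale to pixel coordinates.
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        pixels[(row, col)].append((r, idx))
    return pixels


def keep_nearest(pixels):
    """Stand-in retention rule (NOT the paper's PPR): keep the closest point."""
    return {pix: min(hits)[1] for pix, hits in pixels.items()}


# Points 0 and 1 lie along the same ray, so they share one pixel.
points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pixels = project_to_range_image(points)
kept = keep_nearest(pixels)
dropped = sorted(set(range(len(points))) - set(kept.values()))
print("collisions:", {p: h for p, h in pixels.items() if len(h) > 1})
print("dropped point indices:", dropped)
```

Any rule that keeps one point per pixel loses the 3D information of the rest. That loss is what the abstract says the retention mechanism is designed to resolve.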
Keywords
- Title: CrossModalSync: joint temporal-spatial fusion for semantic scene segmentation in large-scale scenes
- Authors: Tan, Shuyi; Zhang, Yi; Li, Yan; Shin, Byeong-Seok
- Publication date: 2025-07-14
- Type: Article
- Volume: 15
- Issue: 1