Exploiting large-scale pre-trained vision foundation models in 3D point cloud segmentation
Publication Type
Journal Article
Date Issued
2025
Language
English
Volume
132
Issue
4
Start Page
158
End Page
167
Citation
Allgemeine Vermessungs Nachrichten 132 (4): 158-167 (2025)
Publisher
VDE
The recent surge of large-scale pre-trained Vision Foundation Models (VFM) follows the success of Large Language Models (LLM), e.g., the GPT series. Models like the Segment Anything Model (SAM) or Contrastive Language-Image Pre-training (CLIP) demonstrate strong generalization capabilities and open up a new range of tasks, including various downstream tasks. The vision research community mainly focuses on the 2D and text modalities, resulting in textually or visually promptable models capable of zero-shot object recognition or segmentation, whose application and transfer to the 3D domain are the subject of current research. This article gives an overview of approaches that leverage large-scale pre-trained VFMs in the 3D domain. Moreover, we dive deeper into the application of SAM and propose an approach for class-agnostic 3D segmentation in large-scale scenes.
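As a rough illustration of the lifting step that such SAM-based pipelines typically rely on (not the specific approach proposed in the article), the sketch below assigns 2D instance masks, e.g. the boolean 'segmentation' arrays produced by SAM's automatic mask generator, to the 3D points whose pinhole projection falls inside them. The function name, array shapes, and camera model are assumptions for illustration only.

```python
# Minimal sketch (assumptions, not the authors' method): lift 2D instance masks,
# e.g. from SAM's automatic mask generator, onto a 3D point cloud by projecting
# the points into the camera view and reading the mask index at each pixel.
import numpy as np

def lift_masks_to_points(points, masks, K, R, t, image_size):
    """Assign each 3D point the index of the first 2D mask covering its projection.

    points: (N, 3) world-space coordinates.
    masks:  list of (H, W) boolean arrays (one per 2D instance).
    K:      (3, 3) camera intrinsics; R, t: world-to-camera rotation/translation.
    Returns an (N,) array of mask indices; -1 marks unlabelled points.
    """
    H, W = image_size
    cam = points @ R.T + t                 # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6            # keep points in front of the camera
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]            # perspective division -> pixel coordinates
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    visible = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    labels = np.full(len(points), -1, dtype=int)
    for idx, mask in enumerate(masks):
        cand = np.where(visible & (labels == -1))[0]   # still-unlabelled, visible points
        inside = mask[v[cand], u[cand]]                # which of them land inside this mask
        labels[cand[inside]] = idx
    return labels
```

For large-scale scenes, a single view covers only part of the point cloud, so per-view labels like these would typically be fused across many renderings or images (e.g., by majority voting per point); that aggregation step is omitted from the sketch.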
Subjects
3D segmentation
SAM
VFM
Vision Foundation Model
DDC Class
600: Technology
006.31: Machine Learning
006.37: Machine Vision