Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings

Citation Link: https://doi.org/10.15480/882.14053
Publication Type
Conference Paper
Date Issued
2024-05
Language
English
Author(s)
Moenck, Keno
Flugzeug-Produktionstechnik M-23
Thieu, Duc Trung
Flugzeug-Produktionstechnik M-23
Koch, Julian
Flugzeug-Produktionstechnik M-23
Schüppstuhl, Thorsten
Flugzeug-Produktionstechnik M-23
TORE-DOI
10.15480/882.14053
TORE-URI
https://tore.tuhh.de/handle/11420/52571
Volume
130
Start Page
250
End Page
263
Citation
57th CIRP Conference on Manufacturing Systems, CMS 2024
Contribution to Conference
57th CIRP Conference on Manufacturing Systems, CMS 2024  
Publisher DOI
10.1016/j.procir.2024.10.084
Scopus ID
2-s2.0-85214979896
Publisher
Elsevier
Peer Reviewed
true
Is New Version of
10.15480/882.13084
In recent years, the rise of Large Language Models (LLMs) has also encouraged the computer vision community to build substantial multimodal datasets and to train models at scale in a self- or semi-supervised manner, resulting in Vision Foundation Models (VFMs) such as Contrastive Language–Image Pre-training (CLIP). These models generalize well and perform outstandingly on everyday objects and scenes, even on downstream tasks, that is, tasks the model has not been trained on. Their application in specialized domains, such as industrial contexts, is still an open research question. Here, fine-tuning the models or applying transfer learning on domain-specific data is unavoidable when aiming for adequate performance. In this work, we first introduce a pipeline to generate the Industrial Language-Image Dataset (ILID) from web-crawled data. Second, we demonstrate effective self-supervised transfer learning and discuss downstream tasks after training on the cheaply acquired ILID, which requires no human labeling or intervention. With the proposed approach, we contribute by transferring state-of-the-art research on foundation models, transfer learning strategies, and applications to the industrial domain.
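As an illustration of why training on image-text pairs needs no human labels, the following is a minimal sketch of a CLIP-style symmetric contrastive loss: each image is pushed toward its own caption and away from the other captions in the batch, so the pairing itself acts as the supervision signal. This is not the authors' implementation. The function names are hypothetical, the embeddings are plain Python lists standing in for encoder outputs, and the default temperature of 0.07 is an assumed value.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against a target index."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to belong to the same
    image-text pair; every other combination is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] is the scaled cosine similarity of image i and text j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: row i should select column i.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: column j should select row j.
    loss_t2i = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print("matched pairs:", clip_style_loss(images, matched))
    print("swapped pairs:", clip_style_loss(images, swapped))
```

In this toy batch, correctly paired embeddings yield a loss close to zero, while swapping the captions yields a much larger loss. In a transfer-learning setting, that gradient signal would adapt the encoders to domain-specific pairs.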
Subjects
industrial dataset
self-supervised
CLIP
vision foundation model
DDC Class
629.1: Aviation
Publication version
publishedVersion
License
https://creativecommons.org/licenses/by-nc-nd/4.0/
Name
1-s2.0-S2212827124012411-main.pdf
Type
Main Article
Size
5.05 MB
Format
Adobe PDF