Industrial Language-Image Dataset (ILID): adapting vision foundation models for industrial settings
Citation Link: https://doi.org/10.15480/882.13084
Publication Type
Preprint
Date Issued
2024-06-14
Language
English
TORE-DOI
10.15480/882.13084
Citation
arXiv: 2406.09637 (2024)
Publisher DOI
https://doi.org/10.48550/arXiv.2406.09637
ArXiv ID
2406.09637
Publisher
arXiv
Peer Reviewed
true
In recent years, the rise of Large Language Models (LLMs) has also
encouraged the computer vision community to assemble substantial multimodal
datasets and to train models at scale in a self-/semi-supervised manner,
resulting in Vision Foundation Models (VFMs) such as Contrastive
Language-Image Pre-training (CLIP). These models generalize well and perform
outstandingly on everyday objects and scenes, even on downstream tasks they
were never trained on, while their application in specialized domains, such
as industrial contexts, remains an open research question. Here, fine-tuning
the models or transfer learning on domain-specific data is unavoidable when
aiming for adequate performance. In this work, we, on the one hand, introduce
a pipeline to generate the Industrial Language-Image Dataset (ILID) from
web-crawled data; on the other hand, we demonstrate effective self-supervised
transfer learning and discuss downstream tasks after training on the cheaply
acquired ILID, which requires no human labeling or intervention. With the
proposed approach, we contribute by transferring state-of-the-art research on
foundation models, transfer learning strategies, and applications to the
industrial domain.
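
For orientation, the sketch below shows the kind of CLIP-style contrastive
fine-tuning the abstract alludes to: a pretrained model is updated on batches
of image-caption pairs with the standard symmetric InfoNCE objective. It is a
minimal illustration under stated assumptions, not the paper's implementation;
the open_clip checkpoint, learning rate, and the contrastive_step helper are
choices made for the example.

    # Minimal sketch of CLIP-style contrastive fine-tuning on image-caption
    # pairs (symmetric InfoNCE). Checkpoint, hyperparameters, and helper
    # names are illustrative assumptions, not details from the paper.
    import torch
    import torch.nn.functional as F
    import open_clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    model = model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6, weight_decay=0.1)

    def contrastive_step(images, captions):
        # images: batch of tensors already transformed with `preprocess`;
        # captions: list of strings describing the corresponding images.
        image_features = F.normalize(model.encode_image(images.to(device)), dim=-1)
        text_tokens = tokenizer(captions).to(device)
        text_features = F.normalize(model.encode_text(text_tokens), dim=-1)
        logits = model.logit_scale.exp() * image_features @ text_features.t()
        targets = torch.arange(len(captions), device=device)
        # Symmetric loss: match images to texts and texts to images.
        loss = (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Each diagonal entry of the logits matrix corresponds to a true image-caption
pair from the dataset, so the cross-entropy over rows and columns pulls
matching pairs together and pushes mismatched ones apart.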
Subjects
cs.CV
DDC Class
620: Engineering
Publication version
publishedVersion
Name
2406.09637v1.pdf
Type
Main Article
Size
9.51 MB
Format
Adobe PDF