Surgical instrument-tissue interaction recognition with multi-task-attention video transformer
Citation Link: https://doi.org/10.15480/882.16144
Publication Type
Journal Article
Date Issued
2025-11-11
Language
English
TORE-DOI
Citation
International Journal of Computer Assisted Radiology and Surgery (in press), 2025
Publisher DOI
Scopus ID
Publisher
Springer Science and Business Media LLC
Purpose The recognition of surgical instrument-tissue interactions can enhance surgical workflow analysis, improve automated safety systems, and enable skill assessment in minimally invasive surgery. However, current deep learning methods for surgical instrument-tissue interaction recognition often rely on static images or coarse temporal sampling, which limits their ability to capture rapid surgical dynamics. This study therefore systematically investigates the impact of incorporating fine-grained temporal context into deep learning models for interaction recognition.
Methods We conduct extensive experiments on multiple curated video-based datasets to investigate the influence of fine-grained temporal context on instrument-tissue interaction recognition, using video transformers capable of spatio-temporal feature extraction. Additionally, we propose a multi-task-attention module (MTAM) that uses cross-attention and a gating mechanism to improve communication between the subtasks of identifying the surgical instrument, the atomic action, and the anatomical target.
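The general idea of letting subtasks exchange information through cross-attention followed by a gate can be illustrated with a minimal, dependency-free sketch. This is not the authors' MTAM: the identity query/key/value projections, the single scalar gate per token, and the hypothetical parameters `gate_w` and `gate_b` are simplifying assumptions chosen only to show the data flow (each subtask's tokens attend to the other subtasks' tokens, and a sigmoid gate controls how much of the attended signal is added back).

```python
import math
from typing import Dict, List

Vector = List[float]
Tokens = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_attention(queries: Tokens, context: Tokens) -> Tokens:
    """Scaled dot-product attention: every query token attends over all context tokens.
    Assumption: identity Q/K/V projections (a real module would learn these)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in context])
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def gated_fusion(x: Tokens, attended: Tokens, gate_w: Vector, gate_b: float) -> Tokens:
    """Residual update x + g * attended with a per-token gate g in (0, 1).
    The gate is computed from the concatenation [x; attended] using the
    hypothetical parameters gate_w (length 2 * d) and gate_b."""
    fused = []
    for xi, ai in zip(x, attended):
        g = sigmoid(dot(gate_w, xi + ai) + gate_b)  # xi + ai concatenates the two lists
        fused.append([xv + g * av for xv, av in zip(xi, ai)])
    return fused


def multi_task_attention(
    features: Dict[str, Tokens], gate_w: Vector, gate_b: float = 0.0
) -> Dict[str, Tokens]:
    """For each subtask (e.g. instrument, verb, target), cross-attend to the tokens
    of all other subtasks, then apply the gated residual update.
    Requires at least two subtasks so that the context is non-empty."""
    out = {}
    for task, tokens in features.items():
        context = [t for other, toks in features.items() if other != task for t in toks]
        attended = cross_attention(tokens, context)
        out[task] = gated_fusion(tokens, attended, gate_w, gate_b)
    return out


if __name__ == "__main__":
    # Two toy subtasks with one 2-d token each; a zero gate_w gives g = 0.5.
    fused = multi_task_attention({"a": [[1.0, 2.0]], "b": [[3.0, 4.0]]}, gate_w=[0.0] * 4)
    print(fused["a"])  # → [[2.5, 4.0]]
```

With a strongly negative `gate_b` the gate closes and each subtask keeps its own features, so the gate lets the model decide per token how much cross-task information to admit rather than always mixing it in.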
Results Our study demonstrates the benefit of fine-grained temporal context for recognizing instrument-tissue interactions, with an optimal sampling rate of 6–8 Hz identified for the examined datasets. Furthermore, our proposed MTAM significantly outperforms a state-of-the-art multi-task video transformer on the CholecT45-Vid and GraSP-Vid datasets, achieving relative improvements of 4.8% and 5.9%, respectively, in surgical instrument-tissue interaction recognition.
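To make the notion of a sampling rate concrete, the helper below converts a target rate into a frame stride and returns the indices of a fixed-length input clip. This is a hypothetical sketch: the 25 fps source frame rate, the 16-frame clip length, ending the clip at the frame being classified, and clamping indices before the video start to frame 0 are all illustrative assumptions, not settings taken from the paper.

```python
from typing import List


def frame_stride(video_fps: float, sample_hz: float) -> int:
    # Nearest integer number of source frames between two sampled frames (at least 1).
    return max(1, round(video_fps / sample_hz))


def effective_rate_hz(video_fps: float, sample_hz: float) -> float:
    # Integer strides only approximate the target rate (e.g. 8 Hz from 25 fps).
    return video_fps / frame_stride(video_fps, sample_hz)


def sample_clip(end_frame: int, video_fps: float, sample_hz: float, clip_len: int) -> List[int]:
    """Indices of clip_len frames sampled at roughly sample_hz, ending at end_frame.
    Indices before the start of the video are clamped to 0 (an assumption)."""
    stride = frame_stride(video_fps, sample_hz)
    start = end_frame - (clip_len - 1) * stride
    return [max(0, start + i * stride) for i in range(clip_len)]


def temporal_span_seconds(video_fps: float, sample_hz: float, clip_len: int) -> float:
    # Time between the first and last sampled frame of the clip.
    return (clip_len - 1) * frame_stride(video_fps, sample_hz) / video_fps


if __name__ == "__main__":
    print(sample_clip(100, 25.0, 8.0, 16))  # stride 3: frames 55, 58, ..., 100
    print(effective_rate_hz(25.0, 8.0))  # roughly 8.33 Hz
    print(temporal_span_seconds(25.0, 8.0, 16))  # → 1.8
```

With a fixed clip length, the sampling rate trades temporal resolution against the length of the covered window: at about 8 Hz the 16 frames above span 1.8 s, whereas at 1 Hz they would span 15 s. A clip length of 1 corresponds to the static-image setting.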
Conclusions In this work, we demonstrate the benefits of using fine-grained temporal context, rather than static images or coarse temporal context, for surgical instrument-tissue interaction recognition. We also show that applying cross-attention to spatio-temporal features from the different subtasks improves surgical instrument-tissue interaction recognition performance. The project is available at: https://lennart-maack.github.io/InstrTissRec-MTAM
Subjects
Deep learning
Video transformer
Surgical triplet recognition
Surgical activity recognition
DDC Class
617: Surgery, Regional Medicine, Dentistry, Ophthalmology, Otology, Audiology
006: Special computer methods
Publication version
publishedVersion
Name
s11548-025-03546-3.pdf
Type
Main Article
Size
5.02 MB
Format
Adobe PDF