TUHH Open Research

Surgical instrument-tissue interaction recognition with multi-task-attention video transformer

Citation Link: https://doi.org/10.15480/882.16144
Publication Type
Journal Article
Date Issued
2025-11-11
Language
English
Author(s)
Maack, Lennart  
Medizintechnische und Intelligente Systeme E-1  
Cam, Berk
Latus, Sarah
Medizintechnische und Intelligente Systeme E-1  
Maurer, Tobias  
Schlaefer, Alexander  
Medizintechnische und Intelligente Systeme E-1  
TORE-DOI
10.15480/882.16144
TORE-URI
https://hdl.handle.net/11420/58746
Journal
International Journal of Computer Assisted Radiology and Surgery
Citation
International Journal of Computer Assisted Radiology and Surgery (in press), 2025
Publisher DOI
10.1007/s11548-025-03546-3
Scopus ID
2-s2.0-105021418856
Publisher
Springer Science and Business Media LLC
Purpose The recognition of surgical instrument-tissue interactions can enhance surgical workflow analysis, improve automated safety systems, and enable skill assessment in minimally invasive surgery. However, current deep learning methods for surgical instrument-tissue interaction recognition often rely on static images or coarse temporal sampling, limiting their ability to capture rapid surgical dynamics. Therefore, this study systematically investigates the impact of incorporating fine-grained temporal context into deep learning models for interaction recognition.
Methods We conduct extensive experiments on multiple curated video-based datasets to investigate the influence of fine-grained temporal context on the task of instrument-tissue interaction recognition, using a video transformer with spatio-temporal feature extraction capabilities. Additionally, we propose a multi-task-attention module (MTAM) that utilizes cross-attention and a gating mechanism to improve communication between the subtasks of identifying the surgical instrument, atomic action, and anatomical target.
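To make the cross-attention-with-gating idea more concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation; the module name, tensor shapes, and fusion details are illustrative assumptions. One subtask's tokens query the concatenated tokens of the other subtasks, and a learned gate controls how much of the attended cross-task information is mixed back into the query features.

import torch
import torch.nn as nn

class CrossTaskAttention(nn.Module):
    """Hypothetical sketch of cross-attention with a gating mechanism between subtasks."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Cross-attention: queries come from one subtask, keys/values from the other subtasks.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gate decides, per feature channel, how much attended cross-task information to mix in.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats: torch.Tensor, context_feats: torch.Tensor) -> torch.Tensor:
        # query_feats:   (B, Nq, dim) spatio-temporal tokens of one subtask (e.g. instrument)
        # context_feats: (B, Nc, dim) concatenated tokens of the other subtasks (e.g. action, target)
        attended, _ = self.cross_attn(query_feats, context_feats, context_feats)
        g = self.gate(torch.cat([query_feats, attended], dim=-1))
        return self.norm(query_feats + g * attended)

# Usage: three subtask token sets (instrument, action, target) from a video backbone.
B, N, D = 2, 16, 256
instr, action, target = (torch.randn(B, N, D) for _ in range(3))
module = CrossTaskAttention(dim=D)
instr_refined = module(instr, torch.cat([action, target], dim=1))
print(instr_refined.shape)  # torch.Size([2, 16, 256])

In this sketch the instrument tokens attend to the action and target tokens; the sigmoid gate then weights, per channel, how much of that cross-task context is added back to the instrument representation before layer normalization.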
Results Our study demonstrates the benefit of utilizing fine-grained temporal context for the recognition of instrument-tissue interactions, with an optimal sampling rate of 6-8 Hz identified for the examined datasets. Furthermore, our proposed MTAM significantly outperforms a state-of-the-art multi-task video transformer on the CholecT45-Vid and GraSP-Vid datasets, achieving relative increases of 4.8% and 5.9% in surgical instrument-tissue interaction recognition, respectively.
Conclusions In this work, we demonstrate the benefits of using a fine-grained temporal context rather than static images or coarse temporal context for the task of surgical instrument-tissue interaction recognition. We also show that leveraging cross-attention with spatio-temporal features from various subtasks leads to improved surgical instrument-tissue interaction recognition performance. The project is available at: https://lennart-maack.github.io/InstrTissRec-MTAM
Subjects
Deep learning
Video transformer
Surgical triplet recognition
Surgical activity recognition
DDC Class
617: Surgery, Regional Medicine, Dentistry, Ophthalmology, Otology, Audiology
006: Special computer methods
Funding(s)
Centre of Excellence of AI for Sustainable Living and Working
Projekt DEAL  
License
https://creativecommons.org/licenses/by/4.0/
Publication version
publishedVersion
Name
s11548-025-03546-3.pdf
Type
Main Article
Size
5.02 MB
Format
Adobe PDF