Language grounding in deep reinforcement learning for dynamic goal-oriented robotics
Citation Link: https://doi.org/10.15480/882.17098
Publication Type
Doctoral Thesis
Date Issued
2026
Language
English
Author(s)
Advisor
Referee
Title Granting Institution
Technische Universität Hamburg
Place of Title Granting Institution
Hamburg
Examination Date
2026-02-10
Institute
TORE-DOI
Citation
Technische Universität Hamburg (2026)
Researchers have long attempted to teach robots and other embodied artificial agents to follow instructions, approaching language as the primary medium for communication, knowledge transfer, and cognition. While toddlers excel at acquiring language and using it for problem-solving, robots and voice-based assistants struggle to achieve a grounded and robust understanding of natural language due to conversational noise, such as disfluencies and polysemy. This thesis investigates the limitations in language grounding that currently hinder the development of intelligent agents able to comprehend and execute language goals, as well as their capacity to revise misinterpretations arising from underspecified or ambiguous instructions. We use a sparse-reward, language-conditioned reinforcement learning setup and leverage insights from cognitive science and developmental psychology, organized into two pillars. The first pillar explores linguistic feedback and egocentric speech as mechanisms for learning from unsuccessful outcomes. For linguistic feedback, we implement a synthetic caretaker that provides feedback when the agent deviates from the expected course of actions. Unintended deviations may prove beneficial as alternative goal specifications, potentially satisfying different objectives. For instance, a robot assigned to prepare a cup of tea might end up brewing coffee instead, thereby accomplishing a different, unintended goal. For egocentric speech, our research focuses on developing a multimodal translation model designed to generate appropriate goal specifications based on observed behaviors. The model retrospectively predicts goal commands that align with the observed actions, which are then used for learning in hindsight. Both linguistic feedback and egocentric speech aim to emulate aspects of language development in young children, and both significantly enhance sample efficiency in robotic reinforcement learning.
The second pillar addresses the challenge of action correction, specifically targeting erroneous behaviors stemming from misinterpretations of goal specifications. We identify three distinct categories of misunderstanding: ambiguities arising from underspecified statements, unintentional miscommunications (e.g., erroneously conveyed intentions), and discrepancies in common ground between the instructor and the robotic agent. Instead of learning with a different goal specification in hindsight, as in the first pillar, we aim to correct the misunderstanding through further verbal input from the operator. This poses an additional challenge for the agent, which needs to reconsider the original language goal given the new context and the action correction it receives. By implementing a novel approach that incorporates the uncertainty about the actual goal and builds on our methods from the first pillar, we demonstrate that egocentric speech significantly improves learning by generating action corrections in hindsight. We highlight this context-sensitive hindsight approach as the first in this domain to enhance the resolution of misunderstandings.
Subjects
Reinforcement Learning
Language Grounding
Developmental Robotics
Embodied Intelligence
Deep Learning
Machine Learning
DDC Class
006.31: Machine Learning
Name
Roeder_Frank_Language-Grounding-in-Deep-Reinforcement-Learning-for-Dynamic-Goal-Oriented-Robotics.pdf
Size
15.39 MB
Format
Adobe PDF