Conversational natural language is subject to noise, incompletions and grammatically ambiguous phrasing. To increase the robustness of communication, human conversation partners typically build on conversational repair (CR) to iteratively and interactively resolve misunderstandings. In the context of human-robot interaction, CR provides the possibility to interrupt and to repair a misunderstood instruction that is already being executed. However, current approaches do not consider the conversational repair of misunderstandings in human-robot dialog, even though this would significantly increase the robustness of human-robot interaction. The goal of this project is to fill this gap by addressing two core problems that have hindered existing approaches to successfully address conversational action repair for human-robot interaction. The first problem is the realization of an adaptive context-specific state model that integrates language with action. Most dialog systems consider only verbal communication, and they ignore that human communication is an embodied multi-modal process that is grounded in physical interaction. So how can we realize a scalable model that considers situated conceptual state representations for mixed verbal-physical interaction? To address this first problem, this project builds on a neuro-symbolic approach that integrates our previous work on embodied semantic parsing with our expertise in deep reinforcement learning. Herein, we will research a hybrid data- and knowledge-driven model for compositional interaction states that link the physical world state with semantics in language and dialog.The second problem pertains to the noise, disfluency, and polysemy of spoken natural language. Existing learning-based parsers are robust enough to parse noisy spoken language but they require large amounts of training data. So how can we realize a robust semantic parser that is data efficient while considering the mixed verbal-physical interaction? To address this second problem, this project complements our previous semantic parsing methods with a neural machine-translation approach. To this end, we will exploit the reward signal of the reinforcement learning as an additional data source to improve the data efficiency of the neural parser. The data required for this project will be generated using crowdsourcing, and the evaluation will be conducted on a humanoid robot. We expect the project to generate impact as a new approach for human-robot interaction, and to contribute novel methods for representation learning to the scientific communities in the fields of natural language understanding, machine learning, and intelligent robotics.