TUHH Open Research
Comparing continuous single-agent reinforcement learning controls in a simulated logistic environment using NVIDIA Omniverse

Citation Link: https://doi.org/10.15480/882.9060
Publication Type
Conference Paper
Date Issued
2023-10-11
Language
English
Author(s)
Wesselhöft, Mike
Technische Logistik W-6
Braun, Philipp Maximilian
Technische Logistik W-6
Kreutzfeldt, Jochen
Technische Logistik W-6
TORE-DOI
10.15480/882.9060
TORE-URI
https://hdl.handle.net/11420/45143
Journal
Logistics Journal : Proceedings
Volume
2023
Article Number
5825
Citation
Logistics Journal : Proceedings 2023: 5825 (2023)
Contribution to Conference
19. Fachkolloquium
Publisher DOI
10.2195/lj_proc_wesselhoeft_en_202310_01
Publisher
WGTL, Wissenschaftliche Gesellschaft für Technische Logistik e. V.
With the transition to Logistics 4.0, the increasing demand for autonomous mobile robots (AMR) in logistics has amplified the complexity of fleet control in dynamic environments. Reinforcement learning (RL), particularly decentralized RL algorithms, has emerged as a potential solution given its ability to learn in uncertain terrains. While discrete RL structures have shown merit, their adaptability in logistics remains questionable due to their inherent limitations. This paper presents a comparative analysis of three continuous RL algorithms, Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO), in the context of controlling a Turtlebot3 within a warehouse scenario. Our findings reveal A2C as the frontrunner in terms of success rate and training time, while DDPG excels at step minimization and PPO distinguishes itself primarily through its relatively short training duration. This study underscores the potential of continuous RL algorithms, especially A2C, in the future of AMR fleet management in logistics. Significant work remains to be done, particularly in the area of algorithmic fine-tuning.
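The kind of comparison described in the abstract can be outlined with off-the-shelf RL implementations. The following is a minimal, hypothetical sketch assuming stable-baselines3 and Gymnasium, with Pendulum-v1 standing in as a generic continuous-control task; the paper's actual Omniverse/Turtlebot3 warehouse environment, reward design, and hyperparameters are not given by this record.

    # Hypothetical sketch: comparing A2C, DDPG and PPO on a continuous-control
    # task. Pendulum-v1 is only a placeholder for the Omniverse/Turtlebot3
    # warehouse scenario described in the paper.
    import time

    import gymnasium as gym
    from stable_baselines3 import A2C, DDPG, PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    ALGORITHMS = {"A2C": A2C, "DDPG": DDPG, "PPO": PPO}

    for name, algo in ALGORITHMS.items():
        env = gym.make("Pendulum-v1")            # continuous action space
        model = algo("MlpPolicy", env, verbose=0)

        start = time.perf_counter()
        model.learn(total_timesteps=20_000)      # placeholder training budget
        train_time = time.perf_counter() - start

        mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
        print(f"{name}: mean reward {mean_reward:.1f} +/- {std_reward:.1f}, "
              f"training time {train_time:.0f} s")
        env.close()

Success rate and step count, the metrics compared in the paper, would require an environment that reports goal completion; this placeholder reports episodic reward and wall-clock training time instead.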
Subjects
artificial intelligence
autonomous mobile robots
logistics 4.0
reinforcement learning
robotics
DDC Class
620: Engineering
004: Computer Sciences
Publication version
publishedVersion
License
https://creativecommons.org/licenses/by/4.0/
File
wesselhoeft_en_2023.pdf (Main Article, Adobe PDF, 657.34 KB)