Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: A critical analysis

Publication Type
Journal Article
Date Issued
2013-11-04
Language
English
Author(s)
Ghazi-Zahedi, Keyan Mahmoud  
Martius, Georg  
Ay, Nihat  
TORE-URI
http://hdl.handle.net/11420/14501
Journal
Frontiers in Psychology
Volume
4
Issue
11
Article Number
801
Citation
Frontiers in Psychology 4 (11): 801 (2013)
Publisher DOI
10.3389/fpsyg.2013.00801
Scopus ID
2-s2.0-84889677996
Publisher
Frontiers Research Foundation
One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviors. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviors, because maximizing the PI corresponds to exploring morphology- and environment-dependent behavioral regularities. The idea is that these regularities can then be exploited to solve any given task. Three different experiments are presented, and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks can a great speed-up be achieved, and then at the cost of a loss in asymptotic performance.
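As an illustration of the quantities named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the sensor stream has already been discretized into symbols, estimates the one-step PI as the plug-in mutual information between consecutive sensor states, and adds it to an episode's external reward with a weight. The function names, the `weight` parameter, and the choice of a plug-in estimator are all hypothetical; the abstract does not specify any of them.

```python
import math
from collections import Counter


def one_step_predictive_information(symbols):
    """Plug-in estimate (in bits) of I(S_t; S_{t+1}) for a discrete sequence.

    Assumption: `symbols` is one episode of an already-discretized sensor stream.
    """
    pairs = list(zip(symbols[:-1], symbols[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                    # counts of (s_t, s_{t+1})
    past = Counter(p for p, _ in pairs)       # counts of s_t
    future = Counter(f for _, f in pairs)     # counts of s_{t+1}
    pi = 0.0
    for (p, f), c in joint.items():
        # p(s_t, s_{t+1}) * log2( p(s_t, s_{t+1}) / (p(s_t) * p(s_{t+1})) )
        pi += (c / n) * math.log2(c * n / (past[p] * future[f]))
    return pi


def combined_reward(external_reward, symbols, weight):
    """Hypothetical linear combination: external reward + weight * one-step PI."""
    return external_reward + weight * one_step_predictive_information(symbols)


# A strictly alternating stream is fully predictable one step ahead,
# while a constant stream carries no information at all.
alternating = [0, 1, 0, 1, 0, 1, 0, 1, 0]
constant = [0, 0, 0, 0]
print(one_step_predictive_information(alternating))
print(one_step_predictive_information(constant))
print(combined_reward(2.0, alternating, weight=0.5))
```

In an episodic policy gradient setting, a value such as `combined_reward` would stand in for the episode return; how the weight is chosen is exactly the kind of trade-off the abstract cautions about.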
Subjects
Embodied artificial intelligence
Embodied machine learning
Information-driven self-organization
Predictive information
Reinforcement learning
DDC Class
004: Computer Science