Ay, Nihat. "From the Euclidean to the Natural." In: Foundational Papers in Complexity Science, Volume 4. SFI Press, 2024. ISBN 9781947864559. DOI 10.37911/9781947864559.85. https://hdl.handle.net/11420/55752

The idea of learning as an optimization process can be traced back to the early years of artificial neural networks. This idea has been very fruitful, ultimately leading to the recent successes of deep neural networks as learning machines. Although consistent with optimization, however, the first learning algorithms for neurons were inspired by neurophysiological and neuropsychological paradigms, most notably by the celebrated work of Donald Hebb (1949). Building on such paradigms, Frank Rosenblatt (1957) proposed an algorithm for training a simple neuronal model, which Warren McCulloch and Walter Pitts had introduced in their seminal article in 1943. The convergence of this algorithm can be formally proved with elementary arguments from linear algebra (perceptron convergence theorem; see Novikoff 1962). The idea of learning as an optimization process, however, not only offers a unified conceptual foundation of learning but also allows us to study learning from a rich mathematical perspective. In this context, the stochastic gradient descent method plays a fundamentally important role (Widrow 1963; Amari 1967; Rumelhart, Hinton, and Williams 1986). Nowadays, it represents the main instrument for training artificial neural networks, which brings us to Shun-ichi Amari's article "Natural Gradient Works Efficiently in Learning." Let us unfold this title and thereby reveal the main insights of Amari's work.
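To make the historical starting point concrete, here is a minimal sketch of Rosenblatt-style perceptron learning, the algorithm whose convergence Novikoff's theorem guarantees for linearly separable data. The function name, the toy data, and the choice of labels in {-1, +1} are illustrative assumptions, not taken from Amari's article.

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Hypothetical sketch of Rosenblatt's learning rule.

    Cycle through the samples; whenever one is misclassified,
    nudge the weights toward (or away from) it. The perceptron
    convergence theorem guarantees that, for linearly separable
    data, the loop terminates with zero errors.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified (or on the boundary)
                w += yi * xi            # Rosenblatt's update
                b += yi
                errors += 1
        if errors == 0:                 # converged: every sample correct
            break
    return w, b

# Illustrative linearly separable data: label = sign of first coordinate
X = np.array([[2.0, 1.0], [1.5, -0.5], [-1.0, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
```

Note that, unlike gradient descent, this update is driven purely by classification mistakes; viewing such rules as steps of an optimization procedure is precisely the unifying perspective the text goes on to describe.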