On dissipativity of cross-entropy loss in training ResNets — A turnpike towards architecture search
Citation Link: https://doi.org/10.15480/882.16637
Publication Type
Journal Article
Date Issued
2026-04-01
Language
English
TORE-DOI
https://doi.org/10.15480/882.16637
Journal
Automatica
Volume
186
Article Number
112767
Citation
Automatica 186: 112767 (2026)
Publisher
Elsevier
Abstract
The training of ResNets and neural ODEs can be formulated and analyzed from the perspective of optimal control. This paper proposes a dissipative formulation of the training of ResNets and neural ODEs for classification problems. Specifically, we consider a label-smoothing variant of the cross-entropy as the loss function and as a regularization term in the stage cost. Based on this dissipative formulation of the training, we prove that the training optimal control problems (OCPs) for ResNets and neural ODEs alike exhibit the turnpike phenomenon. We illustrate this finding with numerical results on the two-spirals and MNIST datasets. Crucially, our training formulation ensures that the transformation of the data from input to output is achieved in the first layers. In the subsequent layers, which constitute the turnpike, the data remains at an equilibrium state, so these layers do not contribute to the learned transformation. In principle, they can be pruned after training, yielding a network with only the necessary number of layers and thus simplifying hyperparameter tuning.
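To make the two ingredients of the abstract concrete, below is a minimal NumPy sketch of one standard label-smoothing convention for the cross-entropy, followed by a toy version of the post-training layer pruning the abstract describes. The function names, the smoothing parameter eps, the smoothing convention (eps / (K - 1) spread over the non-target classes), and the pruning tolerance are illustrative assumptions, not taken from the paper.

import numpy as np

def smoothed_cross_entropy(logits, target, eps=0.1):
    # Cross-entropy against a label-smoothed target distribution.
    # The one-hot target is replaced by (1 - eps) on the true class and
    # eps / (K - 1) on each remaining class (one common convention;
    # eps / K over all classes is another). The exact variant used in
    # the paper may differ.
    K = logits.shape[-1]
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    q = np.full(K, eps / (K - 1))
    q[target] = 1.0 - eps
    return -(q * log_probs).sum()

# Example: 3-class problem, true class 0.
print(smoothed_cross_entropy(np.array([2.0, 0.5, -1.0]), target=0))

def prune_turnpike_layers(states, tol=1e-3):
    # Given hidden states x_0, ..., x_L recorded after training, flag
    # residual layers whose update ||x_{k+1} - x_k|| falls below tol.
    # On the turnpike the state sits near an equilibrium, so such layers
    # perform (numerically) no transformation and can be dropped.
    # Returns the indices of the layers to keep. This is a toy heuristic
    # illustrating the idea, not the paper's procedure.
    return [k for k in range(len(states) - 1)
            if np.linalg.norm(states[k + 1] - states[k]) >= tol]

# Toy example: a 6-layer trajectory that settles after layer 2.
traj = [np.array([1.0, 0.0]), np.array([0.4, 0.8]),
        np.array([0.1, 1.0]), np.array([0.1, 1.0]),
        np.array([0.1, 1.0]), np.array([0.1, 1.0])]
print(prune_turnpike_layers(traj))  # -> [0, 1]

Under these assumptions, the second sketch mirrors the architecture-search angle of the title: the turnpike makes the surplus layers detectable from the trained trajectory alone, so depth can be trimmed after a single training run.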
Subjects
Deep learning
Dissipativity
Huber loss
Label smoothing
Manifold turnpike
Neural networks
Optimal control
DDC Class
006: Special computer methods
515: Analysis
004: Computer Sciences
Publication version
publishedVersion
Name
1-s2.0-S000510982500665X-main.pdf
Type
Main Article
Size
2.6 MB
Format
Adobe PDF