Geometric learning of latent parameters with Helmholtz Machines
Citation Link: https://doi.org/10.15480/882.14213
Publication Type
Doctoral Thesis
Date Issued
2025
Language
English
Author(s)
Várady, Csongor-Huba
Title Granting Institution
Technische Universität Hamburg
Place of Title Granting Institution
Hamburg
Examination Date
2024-11-28
In this thesis, we use concepts from Information Geometry (IG), such as Natural Gradient Descent (NG), to improve the training of a Helmholtz Machine (HM) through the design and implementation of a novel algorithm called the Natural Reweighted Wake-Sleep (NRWS).
First, we prove that for any Directed Acyclic Graph (DAG), the associated Fisher Information Matrix (FIM), which describes the geometry of the statistical manifold, has a fine-grained block-diagonal structure that is efficient to invert. By exploiting the fact that the HM is composed of two DAG networks, we adapt its training algorithm into the NRWS, which implements NG.
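To illustrate why this block-diagonal structure matters computationally, the following minimal NumPy sketch applies one natural-gradient step by solving against each per-block FIM instead of inverting the full matrix; the function and variable names (and the damping term) are our own illustrative choices, not taken from the thesis.

```python
import numpy as np

def natural_gradient_step(params, grads, fim_blocks, lr=0.1, damping=1e-4):
    """One natural-gradient update under a block-diagonal FIM approximation.

    params, grads : lists of 1-D parameter/gradient vectors, one per block
    fim_blocks    : list of square FIM blocks matching each parameter vector
    Solving each small block is far cheaper than inverting the full FIM.
    """
    new_params = []
    for theta, g, F in zip(params, grads, fim_blocks):
        # Damping keeps the block well-conditioned when it is near-singular.
        F_damped = F + damping * np.eye(F.shape[0])
        nat_grad = np.linalg.solve(F_damped, g)  # F^{-1} g without an explicit inverse
        new_params.append(theta - lr * nat_grad)
    return new_params
```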
Compared to other training methods, such as the Reweighted Wake-Sleep (RWS) and the Bidirectional Helmholtz Machine, the NRWS not only reaches a lower minimum of the optimization loss but also converges faster in both epochs and wall-clock time. In particular, we show how the NRWS achieves state-of-the-art performance on standard benchmark datasets (MNIST, FashionMNIST, and the Toronto Face Dataset), measured by the importance-sampling estimation of the log-likelihood of the HM.
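For reference, a hedged sketch of the importance-sampling log-likelihood estimate commonly used for latent-variable models of this kind: hidden samples are drawn from the recognition network q and weighted by p(x, h) / q(h | x). The callables `sample_hidden`, `log_joint`, and `log_q`, and the sample count K, are placeholders of our own, not names from the thesis.

```python
import numpy as np

def is_log_likelihood(x, sample_hidden, log_joint, log_q, K=100):
    """Importance-sampling estimate of log p(x) for a latent-variable model.

    sample_hidden(x) -> h      draws a hidden configuration from q(h | x)
    log_joint(x, h)  -> float  log p(x, h) under the generative network
    log_q(h, x)      -> float  log q(h | x) under the recognition network
    """
    log_weights = np.empty(K)
    for k in range(K):
        h = sample_hidden(x)
        log_weights[k] = log_joint(x, h) - log_q(h, x)
    # log-mean-exp of the importance weights, computed stably
    m = log_weights.max()
    return m + np.log(np.mean(np.exp(log_weights - m)))
```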
By adapting Accelerated Gradient (AG) methods to operate within the geometry defined by the FIM of the HM, we further improve the performance of the NRWS. Using first-order AG methods, such as Momentum and Nesterov Momentum, improves the convergence rate of the NRWS without any computational overhead. Additionally, we develop a regularization method based on the Maximum Entropy Principle, named the Entropy Regularizer (ER), which further improves the NRWS by reaching a lower optimization loss and narrowing the generalization gap without extra time penalty; the ER can also be applied to non-geometric training methods. Conveniently, the NRWS framework is compatible with continuous random variables; we therefore show how the FIM can be derived for normally distributed hidden variables.
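As a rough illustration of why first-order acceleration adds no overhead here, the sketch below applies a Momentum/Nesterov update to an already-computed natural gradient, so no additional FIM solves are needed; it follows a standard Nesterov-style SGD formulation and is not the thesis' own implementation.

```python
import numpy as np

def momentum_step(theta, velocity, nat_grad, lr=0.1, beta=0.9, nesterov=True):
    """Momentum / Nesterov update applied to a precomputed natural gradient.

    Acceleration reuses the natural gradient F^{-1} g already obtained for the
    plain NG step, so it introduces no extra FIM inversions.
    """
    velocity = beta * velocity - lr * nat_grad
    if nesterov:
        # Nesterov look-ahead: step along the updated velocity plus the gradient term
        theta = theta + beta * velocity - lr * nat_grad
    else:
        theta = theta + velocity
    return theta, velocity
```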
Finally, we explore the possibilities of using HMs with Convolutional Neural Networks (CNNs) by computing the FIM for such network topologies and showing that the resulting matrix also has a fine-grained block-diagonal structure. We finish by presenting a hypothesis on the difficulties of using CNNs with HMs and the NRWS. We make significant contributions to the fields of IG and HMs, with numerous findings that could be further explored or reused in other research fields. Our results can serve as a starting point for future research on improving training algorithms for neural networks and deep learning models using geometric methods, such as the NG.
Subjects
Helmholtz machine
Natural gradient
Natural reweighted wake sleep
DDC Class
006.3: Artificial Intelligence
510: Mathematics
Name
Varady_Csongor_Huba-Geometric_Learning_of_Latent_Parameters_with_Helmholtz_Machines.pdf
Size
11.87 MB
Format
Adobe PDF