Geometric learning of latent parameters with Helmholtz Machines
Citation Link: https://doi.org/10.15480/882.14213
Publication Type
Doctoral Thesis
Date Issued
2025
Language
English
Author(s)
Várady, Csongor-Huba
Title Granting Institution
Technische Universität Hamburg
Place of Title Granting Institution
Hamburg
Examination Date
2024-11-28
In this thesis, we use concepts from Information Geometry (IG), such as Natural Gradient Descent (NG), to improve the training of a Helmholtz Machine (HM) through the design and implementation of a novel algorithm called the Natural Reweighted Wake-Sleep (NRWS).
First, we prove that for any Directed Acyclic Graph (DAG), the associated Fisher Information Matrix (FIM), which describes the geometry of the statistical manifold, has a fine-grained block-diagonal structure that is efficient to invert. By exploiting the fact that the HM is composed of two DAG networks, we adapt its training algorithm into the NRWS, which implements NG.
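To illustrate why this block-diagonal structure matters computationally, the following minimal NumPy sketch applies one natural-gradient step by solving against each per-block FIM instead of inverting the full matrix; the function and variable names (and the damping term) are our own illustrative choices, not taken from the thesis.

```python
import numpy as np

def natural_gradient_step(params, grads, fim_blocks, lr=0.1, damping=1e-4):
    """One natural-gradient update under a block-diagonal FIM approximation.

    params, grads : lists of 1-D parameter/gradient vectors, one per block
    fim_blocks    : list of square FIM blocks matching each parameter vector
    Solving each small block is far cheaper than inverting the full FIM.
    """
    new_params = []
    for theta, g, F in zip(params, grads, fim_blocks):
        # Damping keeps the block well-conditioned when it is near-singular.
        F_damped = F + damping * np.eye(F.shape[0])
        nat_grad = np.linalg.solve(F_damped, g)  # F^{-1} g without an explicit inverse
        new_params.append(theta - lr * nat_grad)
    return new_params
```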
Compared to other training methods, such as the Reweighted Wake-Sleep (RWS) and the Bidirectional Helmholtz Machine, the NRWS not only reaches a lower minimum of the optimization loss but also converges faster in both epochs and wall-clock time. In particular, we show how the NRWS achieves state-of-the-art performance on standard benchmark datasets (MNIST, FashionMNIST, and the Toronto Face Dataset), measured by the importance-sampling estimation of the log-likelihood of the HM.
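For reference, a hedged sketch of the importance-sampling log-likelihood estimate commonly used for latent-variable models of this kind: hidden samples are drawn from the recognition network q and weighted by p(x, h) / q(h | x). The callables `sample_hidden`, `log_joint`, and `log_q`, and the sample count K, are placeholders of our own, not names from the thesis.

```python
import numpy as np

def is_log_likelihood(x, sample_hidden, log_joint, log_q, K=100):
    """Importance-sampling estimate of log p(x) for a latent-variable model.

    sample_hidden(x) -> h      draws a hidden configuration from q(h | x)
    log_joint(x, h)  -> float  log p(x, h) under the generative network
    log_q(h, x)      -> float  log q(h | x) under the recognition network
    """
    log_weights = np.empty(K)
    for k in range(K):
        h = sample_hidden(x)
        log_weights[k] = log_joint(x, h) - log_q(h, x)
    # log-mean-exp of the importance weights, computed stably
    m = log_weights.max()
    return m + np.log(np.mean(np.exp(log_weights - m)))
```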
By adapting Accelerated Gradient (AG) methods to operate within the geometry defined by the FIM of the HM, we further improve the performance of the NRWS. Using first-order AG methods, such as Momentum and Nesterov Momentum, improves the convergence rate of the NRWS without any computational overhead. Additionally, we develop a regularization method based on the Maximum Entropy Principle, named the Entropy Regularizer (ER), which further improves the NRWS by reaching a lower optimization loss and narrowing the generalization gap without extra time penalty; the ER can also be applied to non-geometric training methods. Conveniently, the NRWS framework is compatible with continuous random variables; we therefore show how the FIM can be derived for normally distributed hidden variables.
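As a rough illustration of why first-order acceleration adds no overhead here, the sketch below applies a Momentum/Nesterov update to an already-computed natural gradient, so no additional FIM solves are needed; it follows a standard Nesterov-style SGD formulation and is not the thesis' own implementation.

```python
import numpy as np

def momentum_step(theta, velocity, nat_grad, lr=0.1, beta=0.9, nesterov=True):
    """Momentum / Nesterov update applied to a precomputed natural gradient.

    Acceleration reuses the natural gradient F^{-1} g already obtained for the
    plain NG step, so it introduces no extra FIM inversions.
    """
    velocity = beta * velocity - lr * nat_grad
    if nesterov:
        # Nesterov look-ahead: step along the updated velocity plus the gradient term
        theta = theta + beta * velocity - lr * nat_grad
    else:
        theta = theta + velocity
    return theta, velocity
```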
Finally, we explore the possibilities of using HMs with Convolutional Neural Networks (CNNs) by computing the FIM for such network topologies and showing that the resulting matrix also has a fine-grained block-diagonal structure. We finish by presenting a hypothesis on the difficulties of using CNNs with HMs and the NRWS. We make significant contributions to the fields of IG and HMs, with numerous findings that could be further explored or reused in other research fields. Our results can serve as a starting point for future research on improving training algorithms for neural networks and deep learning models using geometric methods, such as the NG.
Subjects
Helmholtz machine
Natural gradient
Natural reweighted wake sleep
DDC Class
006.3: Artificial Intelligence
510: Mathematics
Name
Varady_Csongor_Huba-Geometric_Learning_of_Latent_Parameters_with_Helmholtz_Machines.pdf
Size
11.87 MB
Format
Adobe PDF