Dynamic structure investigation and spectra prediction of biomolecules using machine learning techniques
Citation Link: https://doi.org/10.15480/882.9689
Publication Type
Doctoral Thesis
Date Issued
2024
Language
English
Author(s)
Advisor
Referee
Title Granting Institution
Technische Universität Hamburg
Place of Title Granting Institution
Hamburg
Examination Date
2024-06-06
Institute
TORE-DOI
Citation
Technische Universität Hamburg (2024)
The investigation of biomolecular structures and the prediction of their spectra using experimental and theoretical studies in the gas phase represent fundamental steps in comprehending their intrinsic properties and biological functions. Nonetheless, the complexity of the potential energy surface of biomolecules, combined with limited computational resources, hampers the interpretation of experimental observations. Integrating supervised and unsupervised machine learning (ML) techniques into theoretical calculations is considered an effective way to address these challenges.
Infrared (IR) and X-ray absorption spectroscopy (XAS) have proven to be powerful experimental techniques for studying the electronic and spatial structure of biomolecules such as peptides and proteins. Reproducing and validating the features observed in spectra from these experiments often requires sophisticated ab initio calculations and a comprehensive understanding of the biomolecules’ configurational space.
In this thesis, I introduce a novel approach to interpreting the experimental IR spectrum of a peptide, which aims to enhance the exploration of configurational space by combining replica-exchange molecular dynamics (REMD) simulations, unsupervised machine learning, and ab initio calculations. This scheme relies on a set of structural descriptors and a data-driven clustering technique that accounts for the canonical ensemble under real experimental conditions to obtain an accurate computed spectrum. We show that by partitioning the configurational space into subensembles of similar conformations, i.e. clusters, an accurate IR spectrum can be calculated by averaging the IR contributions of the representative conformers of each cluster, weighted according to the population of each cluster. While this approach unravels important fingerprints of experimental spectroscopic data, the calculation of IR and particularly XAS spectra is, due to their inherently expensive theoretical computation, often a computationally prohibitive task even for medium-sized molecules.
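The population-weighted averaging described above can be sketched in a few lines. This is only an illustrative sketch, not the thesis implementation: the function name `population_weighted_spectrum`, the list-based data layout, and the toy numbers are assumptions, and each cluster is assumed to contribute one representative spectrum sampled on a shared frequency grid.

```python
def population_weighted_spectrum(cluster_spectra, cluster_populations):
    """Average representative spectra, weighted by normalized cluster populations.

    cluster_spectra: one intensity list per cluster, all on the same grid
                     (hypothetical layout chosen for this sketch).
    cluster_populations: number of sampled conformations in each cluster.
    """
    if len(cluster_spectra) != len(cluster_populations):
        raise ValueError("need exactly one population per cluster spectrum")
    total = sum(cluster_populations)
    if total <= 0:
        raise ValueError("cluster populations must sum to a positive number")
    n_points = len(cluster_spectra[0])
    if any(len(spectrum) != n_points for spectrum in cluster_spectra):
        raise ValueError("all spectra must share the same frequency grid")

    # Normalize populations so that the weights sum to one.
    weights = [population / total for population in cluster_populations]

    # Weighted sum over clusters at every grid point.
    return [
        sum(w * spectrum[i] for w, spectrum in zip(weights, cluster_spectra))
        for i in range(n_points)
    ]


# Toy input: two clusters holding 3 and 1 conformations, three grid points.
spectra = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
populations = [3, 1]
averaged = population_weighted_spectrum(spectra, populations)
```

With these toy numbers the weights are 0.75 and 0.25, so the more populated cluster dominates the averaged spectrum.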
To remedy the computational obstacles associated with spectra prediction, we develop a data-driven supervised ML framework based on graph neural networks (GNNs), which are trained on a custom-generated XAS dataset to find a mapping between structures and spectroscopic signals, thus bypassing the need for expensive ab initio quantum chemistry calculations. To ensure the interpretability of the GNN models’ predictions, we employ feature attribution to determine the respective contributions of the various atoms in the molecules to the peaks observed in the XAS spectrum. Within this approach, we show that it is possible to link the peaks observed in the spectra to certain core and virtual orbitals from the quantum chemical calculations and obtain an in-depth understanding of the ML-predicted XAS spectrum.
The results presented in this thesis show that the integration of supervised and unsupervised ML techniques can effectively enhance the interpretation of spectroscopic data and make efficient use of the expensive ab initio calculations.
Subjects
Machine learning
Infrared (IR)
X-ray absorption spectroscopy (XAS)
Graph neural networks (GNN)
Explainability AI
DDC Class
540: Chemistry
570: Life Sciences, Biology
510: Mathematics
Funding(s)
Funding Organisations
Name
Amir_Kotobi_dynamic_structure_investigation_and_spectra_prediction_of_biomolecules_using_machine_learning_techniques.pdf
Size
26.45 MB
Format
Adobe PDF