# Algebraic Statistics

Citation Link: https://doi.org/10.15480/882.1268

Publication type

Book

Publication date

2015

Language

English

Author

Institute

Algebraic statistics brings together ideas from algebraic geometry, commutative algebra, and combinatorics to address problems in statistics and its applications. Computer algebra provides powerful tools for the study of algorithms and software. However, these tools are rarely tailored to statistical challenges, so new algebraic results often need to be developed. This interplay between algebra and statistics enriches both disciplines.

Algebraic statistics is a relatively new branch of mathematics that has developed and changed rapidly over the last ten years. The seminal work in this field was the paper of Diaconis and Sturmfels (1998), which introduced the notion of Markov bases for toric statistical models and showed the connection to commutative algebra. Later on, the connection between algebra and statistics spread to a number of different areas, including parametric inference, phylogenetic invariants, and algebraic tools for maximum likelihood estimation. These connections were highlighted in the celebrated book Algebraic Statistics for Computational Biology by Pachter and Sturmfels (2005) and in subsequent publications.

In this report, statistical models for discrete data are viewed as solutions of systems of polynomial equations. This makes it possible to treat statistical models for sequence alignment, hidden Markov models, and phylogenetic tree models. These models are connected in the sense that, when they are interpreted in the tropical algebra, the famous dynamic programming algorithms (Needleman-Wunsch, Viterbi, and Felsenstein) arise naturally. More generally, when the models are interpreted in a higher-dimensional analogue of the tropical algebra, the polytope algebra, parametric versions of these dynamic programming algorithms can be established.
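As a small illustration of the tropical interpretation, the following Python sketch evaluates the same forward recursion of a toy hidden Markov model over two semirings. It is not taken from the manuscript, which works with Maple, Singular, and R. With ordinary addition and multiplication the recursion returns the marginal probability of the observations; with the parameters replaced by their negative logarithms and the operations replaced by (min, +), it returns the Viterbi score, that is, the negative log-probability of the most likely hidden path. The function names and all parameter values are assumptions chosen for illustration.

```python
import math
from itertools import product

def hmm_forward(init, trans, emit, obs, add, mul):
    """Forward recursion of an HMM over an arbitrary semiring (add, mul)."""
    states = range(len(init))
    f = [mul(init[s], emit[s][obs[0]]) for s in states]
    for o in obs[1:]:
        new = []
        for t in states:
            acc = None
            for s in states:
                term = mul(mul(f[s], trans[s][t]), emit[t][o])
                acc = term if acc is None else add(acc, term)
            new.append(acc)
        f = new
    total = f[0]
    for v in f[1:]:
        total = add(total, v)
    return total

def neg_log(x):
    """Map probabilities (possibly nested in lists) to tropical weights."""
    if isinstance(x, list):
        return [neg_log(v) for v in x]
    return -math.log(x)

def brute_force(init, trans, emit, obs):
    """Enumerate all hidden paths; return (sum, max) of their probabilities."""
    probs = []
    for path in product(range(len(init)), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for k in range(1, len(obs)):
            p *= trans[path[k - 1]][path[k]] * emit[path[k]][obs[k]]
        probs.append(p)
    return sum(probs), max(probs)

# Hypothetical toy model: 2 hidden states, 3 output symbols.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

# Ordinary semiring (+, *): marginal probability of the observations.
marginal = hmm_forward(init, trans, emit, obs,
                       add=lambda a, b: a + b, mul=lambda a, b: a * b)

# Tropical semiring (min, +) on negative logs: Viterbi score.
tropical = hmm_forward(neg_log(init), neg_log(trans), neg_log(emit), obs,
                       add=min, mul=lambda a, b: a + b)

total, best = brute_force(init, trans, emit, obs)
print(marginal, total)             # the two values agree up to rounding
print(math.exp(-tropical), best)   # the two values agree up to rounding
```

The recursion stays valid under the change of semiring because addition distributes over min, just as multiplication distributes over addition. That distributive law is what makes the dynamic programming algorithms appear once the model is read tropically.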

Markov bases make it possible to sample data in a given fibre using Markov chain Monte Carlo algorithms. In this way, Markov bases provide a means to increase the sample size and make statistical tests in inferential statistics more reliable. We will calculate Markov bases using Gröbner bases in commutative polynomial rings.
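The sampling idea can be sketched in Python for the simplest case, the independence model for two-way contingency tables. Here the fibre consists of all non-negative integer tables with the same row and column sums as the observed table, and the basic moves that add +1 and -1 at the corners of a 2x2 subtable are known to form a Markov basis (Diaconis and Sturmfels, 1998). The sketch below hard-codes these moves instead of computing them from a Gröbner basis, and uses a Metropolis step that targets the hypergeometric distribution on the fibre. The function names, the observed table, and the number of steps are assumptions for illustration only.

```python
import math
import random

def basic_move(n_rows, n_cols, rng):
    """Random basic move: +1/-1 on the corners of a 2x2 subtable."""
    i, j = rng.sample(range(n_rows), 2)
    k, l = rng.sample(range(n_cols), 2)
    return [(i, k, 1), (j, l, 1), (i, l, -1), (j, k, -1)]

def log_weight(table):
    """Log hypergeometric weight up to a constant: -sum of log(u_ij!)."""
    return -sum(math.lgamma(x + 1) for row in table for x in row)

def markov_basis_walk(table, steps, seed=0):
    """Metropolis walk on the fibre of `table` using basic moves."""
    rng = random.Random(seed)
    current = [row[:] for row in table]
    samples = []
    for _ in range(steps):
        proposal = [row[:] for row in current]
        for r, c, delta in basic_move(len(table), len(table[0]), rng):
            proposal[r][c] += delta
        # Proposals with a negative entry leave the fibre and are rejected.
        if all(x >= 0 for row in proposal for x in row):
            log_ratio = log_weight(proposal) - log_weight(current)
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                current = proposal
        samples.append([row[:] for row in current])
    return samples

def margins(table):
    """Row sums and column sums of a table."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]

# Hypothetical observed 3x3 contingency table.
observed = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
samples = markov_basis_walk(observed, steps=2000, seed=1)

# Every sampled table lies in the same fibre as the observed one.
print(all(margins(t) == margins(observed) for t in samples))
```

Each move changes four cells but leaves every row and column sum unchanged, so the walk never leaves the fibre. The Markov basis property guarantees that any two tables in the fibre are connected by such moves, which a real application would rely on when estimating a p-value from the sampled tables.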

The manuscript grew out of lectures on algebraic statistics given to Master's students of Computer Science at the Hamburg University of Technology. It appears that the first lecture, held in the summer term of 2008, was the first course of this kind in Germany. The current manuscript is the basis of a four-hour introductory course. The use of computer algebra systems is at the heart of the course: Maple is employed for symbolic computations, Singular for algebraic computations, and R for statistical computations. The second edition at hand is a streamlined version of the first one.

Keywords

Sequence alignment, hidden Markov model, tree models, Gröbner bases, Markov bases, inferential statistics

DDC Class

510: Mathematics