IEEE-754 precision-p base-β Arithmetic Implemented in binary

Rump, Siegfried M.

Publication Type

Journal Article

Date Issued

2023-12-15

Language

English

Author(s)

Rump, Siegfried M.

Zuverlässiges Rechnen E-19 (H)

TORE-URI

https://hdl.handle.net/11420/45259

Journal

ACM transactions on mathematical software

Volume

49

Issue

4

Article Number

32

Citation

ACM Transactions on Mathematical Software 49 (4): 32 (2023-12-15)

Publisher DOI

10.1145/3596218

Scopus ID

2-s2.0-85181742863

Publisher

Association for Computing Machinery

We show how an IEEE-754 conformant precision-p base-β arithmetic can be implemented based on some binary floating-point and/or integer arithmetic. This includes the four basic operations and square root subject to the five IEEE-754 rounding modes, namely the nearest roundings with roundTiesToEven and roundTiesToAway, the directed roundings downwards and upwards, as well as rounding towards zero. Exceptional values like ∞ or NaN are covered according to the IEEE-754 arithmetic standard. The results of the precision-p base-β operations are computed using some underlying precision-q binary arithmetic. We distinguish two cases. When using a precision-q binary integer arithmetic, the base-β precision p is limited for all operations by β^{2p} ≤ 2^q, whereas using a precision-q binary floating-point arithmetic imposes stronger limits on the base-β precision, namely β^{2p} ≤ 2^q for addition and multiplication, β^{2p} ≤ 2^{q-1} for division and β^{2p} ≤ 2^{q-3} for the square root. Those limitations cannot be improved. The algorithms are implemented in a Matlab/Octave flbeta-toolbox with the choice of using uint64 or binary64 as underlying arithmetic. The former allows larger precisions, the latter is advantageous for the square root, whereas computing times are similar. The flbeta-toolbox offers precision-p base-β scalar, vector and matrix operations including sparse matrices as well as corresponding interval operations. The base β can be chosen in the range β ∈ [2,64]. The flbeta-toolbox will be part of Version 13 of INTLAB [18], the Matlab/Octave toolbox for reliable computing.
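The precision limits stated in the abstract can be made concrete with a small calculation. The following Python sketch is not part of the flbeta-toolbox; the names `max_precision`, `FLOAT_SLACK` and `limits` are hypothetical, and it assumes q = 64 for uint64 and q = 53 for binary64 (the significand width). It only evaluates the inequalities β^{2p} ≤ 2^{q-s} in exact integer arithmetic, with s = 0, 1, 3 as given for the floating-point case.

```python
def max_precision(beta: int, q: int, slack: int = 0) -> int:
    """Largest p >= 0 with beta**(2*p) <= 2**(q - slack), in exact integers."""
    if beta < 2:
        raise ValueError("beta must be at least 2")
    bound = 1 << (q - slack)
    p = 0
    # Increase p while the next candidate still satisfies the inequality.
    while beta ** (2 * (p + 1)) <= bound:
        p += 1
    return p


# Exponent reduction per operation for a floating-point backend,
# read off the abstract: 2^q, 2^q, 2^(q-1), 2^(q-3).
FLOAT_SLACK = {"add": 0, "mul": 0, "div": 1, "sqrt": 3}


def limits(beta: int, q_int: int = 64, q_float: int = 53) -> dict:
    """Maximal p per backend and operation.

    Assumption: q_int = 64 models uint64 and q_float = 53 models binary64.
    """
    result = {"integer": max_precision(beta, q_int)}
    for op, slack in FLOAT_SLACK.items():
        result[f"float_{op}"] = max_precision(beta, q_float, slack)
    return result


if __name__ == "__main__":
    for beta in (2, 10):
        print(beta, limits(beta))
```

Under these assumptions, base 10 admits up to p = 9 digits with the integer backend and p = 7 with the floating-point backend, while base 2 admits p = 32 versus p = 26 (25 for the square root), which is consistent with the remark that uint64 allows larger precisions.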

Subjects

base-β

double rounding

Floating-point arithmetic

IEEE-754

interval arithmetic

INTLAB

precision-p

DDC Class

510: Mathematics
