Borweins’ formula for \(\pi\) using GNU Multi-Precision library
I was watching Mathologer’s video “Pi Must Die”, in which he talks about the vanity of memorizing the digits of \(\pi\). Well, I thought, let me kill some time by computing the digits of \(\pi\) for others to memorize. If you’re looking for a faster version than the code below, check out Y-cruncher.
There are many formulas for computing \(\pi\). Y-cruncher provides two of them:
- Ramanujan’s formula (pdf) (1913)
\[\frac{1}{\pi} = \frac{\sqrt{8}}{9801} \sum_{n=0}^{\infty}\frac{(4n)! (26390n + 1103)}{(n!)^4\;396^{4n}}\]
- Chudnovsky formula (1987)
\[\frac{1}{\pi} = 12 \sum_{n=0}^\infty (-1)^n \frac{(6n)!\,(545140134n + 13591409)}{(3n)!\,(n!)^3\,(640320)^{3n+3/2}}\]
Interestingly, Ramanujan didn’t give a proof of his equation in 1913–14. It was Jonathan Borwein and Peter Borwein who proved it in 1987. Both these formulas converge very fast: each term adds about 8 digits and 14 digits of \(\pi\), respectively.
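To get a feel for that convergence, here’s a throwaway check in plain double arithmetic (nothing to do with the GMP code later): evaluating just the \(n = 0\) term of the Chudnovsky series already agrees with \(\pi\) to more than a dozen digits.
// Throwaway check of the n = 0 Chudnovsky term in plain double precision:
// 1/pi ~= 12 * 13591409 / 640320^(3/2), i.e. pi ~= 640320^(3/2) / (12 * 13591409).
#include <math.h>
#include <stdio.h>

int main(void)
{
    double est = 640320.0 * sqrt(640320.0) / (12.0 * 13591409.0);
    printf("n = 0 term: %.15f\n", est);
    printf("pi        : %.15f\n", 3.141592653589793);
    return 0;
}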
Quadratic convergence
That’s less impressive compared to what the Borwein brothers discovered next. They came up with iterative formulas that reach millions of digits in just a handful of iterations! The first one has quadratic (\(O(2^n)\)) convergence:
\[\begin{aligned} x_1 &= \frac{1}{2}\left(2^{1/4} + \frac{1}{2^{1/4}}\right) \\ y_1 &= 2^{1/4} \\ \pi_0 &= 2 + 2^{1/2} \\ \\ x_{n+1} &= \frac{1}{2}\left(x_n^{1/2} + x_n^{-1/2}\right) \\ y_{n+1} &= \frac{y_n x_n^{1/2} + x_n^{-1/2}}{y_n + 1} \\ \pi_n &= \pi_{n-1} \left( \frac{x_n + 1}{y_n + 1} \right) \end{aligned}\]
where \(n \geq 1\). This formula converges so fast that 10 iterations give you about 2000 digits, 20 iterations about 2 million digits, and 30 iterations more than a billion digits!
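If you want to size the loop up front, the doubling behaviour gives a rough estimate. A minimal sketch, assuming a quadratic_iterations() helper of my own (it is not part of the linked code):
// Rough sizing helper (hypothetical, not from the linked code): the number of
// correct digits roughly doubles each iteration, so about log2(digits)
// iterations suffice; add one for safety.
#include <math.h>

unsigned long quadratic_iterations(unsigned long digits)
{
    return (unsigned long) ceil(log2((double) digits)) + 1;
}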
I’ve implemented the above algorithm in C using GMP. The latest implementation can be found here. The GNU Multi-Precision library provides a nice API for arbitrary-precision computing; the floating-point routines all carry the mpf_ prefix. You have to use a lot of temporary variables because C lacks operator overloading, so the C++ version would probably be more readable. One gotcha with GMP: don’t call mpf_clear() on a variable unless you actually want to free its memory and are done with it. I got a core dump and double-free errors at the beginning.
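For context, the setup and teardown around the loop look roughly like this. This is a sketch with hypothetical names (the code in the repository may differ); the key points are setting the precision before any mpf_t is initialized and clearing variables only at the very end.
#include <gmp.h>

int main(void)
{
    unsigned long digits = 10000000;          // decimal digits wanted
    mpf_t x, y, pi, tmp[16];

    // Precision must be set before any mpf_t is initialized (see caveats
    // below): roughly 3.33 bits per decimal digit, plus some slack.
    mpf_set_default_prec((mp_bitcnt_t)(digits * 10 / 3) + 64);

    mpf_init(x); mpf_init(y); mpf_init(pi);
    for (int i = 0; i < 16; i++)
        mpf_init(tmp[i]);

    // x_1 = (2^(1/4) + 2^(-1/4)) / 2,  y_1 = 2^(1/4),  pi_0 = 2 + sqrt(2)
    mpf_sqrt_ui(tmp[0], 2);        // sqrt(2)
    mpf_sqrt(y, tmp[0]);           // y_1 = 2^(1/4)
    mpf_ui_div(tmp[1], 1, y);      // 2^(-1/4)
    mpf_add(x, y, tmp[1]);
    mpf_div_ui(x, x, 2);           // x_1
    mpf_add_ui(pi, tmp[0], 2);     // pi_0 = 2 + sqrt(2)

    // ... the loop shown below runs here ...

    // Clear variables only when you are completely done with them.
    mpf_clear(x); mpf_clear(y); mpf_clear(pi);
    for (int i = 0; i < 16; i++)
        mpf_clear(tmp[i]);
    return 0;
}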
The main body of the loop looks thus:
// Iteration for 'y'
mpf_sqrt(tmp[3], x); // √x
mpf_ui_div(tmp[4], 1, tmp[3]); // 1/√x
mpf_mul(tmp[5], tmp[3], y); // y√x
mpf_add(tmp[6], tmp[4], tmp[5]); // y√x + (1/√x)
mpf_add_ui(tmp[7], y, 1); // 1+y
mpf_div(y, tmp[6], tmp[7]); // y -> (y√x + (1/√x)) / (1+y)
// Iteration for 'pi' (tmp[7] still holds 1+y from before y was updated, which is what the recurrence needs)
mpf_add_ui(tmp[8], x, 1); // 1+x
mpf_div(tmp[9], tmp[8], tmp[7]); // (1+x)/(1+y)
mpf_mul(tmp[10], pi, tmp[9]); // pi * (1+x)/(1+y)
mpf_set(pi, tmp[10]); // overwrite pi with above value
// Iteration for 'x'
mpf_sqrt(tmp[11], x); // √x
mpf_ui_div(tmp[12], 1, tmp[11]); // 1/√x
mpf_add(tmp[13], tmp[11], tmp[12]); // √x + 1/√x
mpf_div_ui(x, tmp[13], 2); // x -> 1/2 (√x + 1/√x)
The GMP API is well-documented. The ui in these function names stands for unsigned long int.
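Getting the digits out at the end is a one-liner; something like this should work (a sketch, not necessarily what the linked code does):
// Print the result. gmp_printf understands %Ff for mpf_t variables,
// so the whole value can be printed in one call; here to 100 decimal places.
gmp_printf("%.100Ff\n", pi);
// Alternatively, mpf_out_str() writes it in mantissa-exponent form:
// mpf_out_str(stdout, 10, 100, pi);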
You can’t use this program to compute trillions of digits because you’ll hit GMP’s limit for floating-point resolution (max \(2^{68719476736}\)). You should be fine up to a few billion digits though.
The above program runs pretty fast. Computing 10 million digits took about 3 minutes on a 2010-era laptop. A billion digits took about a day.
Quartic convergence and an erroneous formula
Can we do better? The Borweins say yes! They came up with another recurrence which converges quartically (\(O(4^n)\)). This is so fast that just 10 iterations will spit out 2 million digits, and 15 iterations are enough for 2 billion digits.
The Mathworld page provides the algorithm. However, they have this equation wrong (eqn 11):
\[\alpha_n = (1 + y_n)^4 \alpha_{n-1} - 4^n \sqrt{r} y_n (1+y_n+y_n^2)\]
where \(r = 2\). After some head-scratching as to why my program didn’t work, I went back to the original paper (pdf) by the Borweins and Bailey. The correct equation is:
\[\alpha_{n+1} = (1 + y_{n+1})^4 \alpha_{n} - 2^{2n+3} y_{n+1} (1+y_{n+1}+y_{n+1}^2)\]
which is the same as (shifting \(n \to n-1\))
\[\alpha_n = (1 + y_n)^4 \alpha_{n-1} - 2^{2n+1} y_{n} (1+y_{n}+y_{n}^2)\]
That is, the constant \(4^n \sqrt{2}\) should actually be \(2^{2n+1} = 4^n \cdot 2\).
The implementation is available in the same place. Obviously, this program runs faster than the quadratic one: 10 million digits took about a minute or so.
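For reference, here’s roughly how one step of the quartic recurrence looks in GMP with the corrected constant. This is just a sketch with hypothetical variable names: y and a hold \(y_n\) and \(\alpha_n\) (starting from \(y_0 = \sqrt{2} - 1\) and \(\alpha_0 = 6 - 4\sqrt{2}\)), t1–t3 are temporaries, n is the loop counter starting at 0, and \(\alpha_n\) converges to \(1/\pi\).
// One step n -> n+1 of the quartic recurrence (sketch, not the repo code)
mpf_pow_ui(t1, y, 4);             // y^4
mpf_ui_sub(t1, 1, t1);            // 1 - y^4
mpf_sqrt(t1, t1);
mpf_sqrt(t1, t1);                 // s = (1 - y^4)^(1/4)
mpf_ui_sub(t2, 1, t1);            // 1 - s
mpf_add_ui(t3, t1, 1);            // 1 + s
mpf_div(y, t2, t3);               // y -> (1 - s) / (1 + s)
mpf_add_ui(t1, y, 1);             // 1 + y
mpf_pow_ui(t1, t1, 4);            // (1 + y)^4
mpf_mul(a, a, t1);                // a (1 + y)^4
mpf_mul(t2, y, y);                // y^2
mpf_add(t2, t2, y);               // y + y^2
mpf_add_ui(t2, t2, 1);            // 1 + y + y^2
mpf_mul(t2, t2, y);               // y (1 + y + y^2)
mpf_mul_2exp(t2, t2, 2 * n + 3);  // 2^(2n+3) y (1 + y + y^2) -- the corrected constant
mpf_sub(a, a, t2);                // a -> a (1 + y)^4 - 2^(2n+3) y (1 + y + y^2)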
Quintic convergence
Why stop at the fourth power? There’s also a quintic (\(O(5^n)\)) convergence algorithm, described on MathWorld. Well, it’s got to wait for another day.
Caveats
- Make sure to call mpf_set_default_prec() before calling any other GMP functions, including variable initializations or assignments.
- These algorithms are not self-correcting, meaning you have to perform every iteration of the loop at the maximum precision you want. This becomes a bottleneck as precision gets higher (for example, 1 billion digits) because every floating-point operation costs more time. The Chudnovsky sum is not prone to this because each term adds a different set of digits to \(\pi\).
- These algorithms are not parallelizable because the current value of \(\pi_{n}\) depends on \(\pi_{n-1}\) computed in the previous iteration. Once again, summation-style formulas are immune to this.
If you want an even faster version, just use gmp-chudnovsky.c from the GMP website itself.