The classic Monte Carlo "Hello World" is to calculate $\pi$ by generating random numbers and accepting or rejecting them. The algorithm below is taken from one of my graduate textbooks on molecular simulations - I apologize that I can't remember which, and that book is long gone. Essentially one throws darts into the unit square and measures whether each one lands within the quarter of the unit circle. The fraction of darts inside the circle approximates $\pi/4$; since it is only a quarter of the unit circle, we multiply by 4 to estimate $\pi$.
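The reasoning behind the factor of 4 is a simple area ratio: a uniformly thrown dart lands inside the quarter circle with probability equal to its area divided by the area of the unit square,

$$\frac{\text{accepted}}{n} \approx \frac{\pi/4}{1} \qquad\Longrightarrow\qquad \pi \approx 4\,\frac{\text{accepted}}{n}.$$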
I thought it would be fun to try some of the available tooling to see what speedups are possible in Python, as well as to use some inline Fortran, since Fortran was the major language I used in graduate school. Note that fortranmagic is located in the
```python
%load_ext fortranmagic
import numpy as np
from numba import jit, njit, prange

n = 10000000
```
```python
def calc_pi_p(n):
    accept = 0.0
    for i in range(n):
        r = np.random.random(2)
        if np.dot(r, r) <= 1.0:
            accept += 1.0
    return 4.0 * accept / float(n)
```
```
CPU times: user 47.9 s, sys: 25.6 ms, total: 47.9 s
Wall time: 48.1 s
```
```python
@jit
def calc_pi_numba(n):
    accept = 0.0
    for i in range(n):
        r = np.random.random(2)
        if np.dot(r, r) <= 1.0:
            accept += 1.0
    return 4.0 * accept / float(n)
```
```
CPU times: user 2.29 s, sys: 13.2 ms, total: 2.31 s
Wall time: 2.31 s
```
```python
@njit(parallel=True)
def calc_pi_numba_parallel(n):
    accept = 0.0
    for i in prange(n):
        r = np.random.random(2)
        if np.dot(r, r) <= 1.0:
            accept += 1.0
    return 4.0 * accept / float(n)
```
```
CPU times: user 1.24 s, sys: 9.69 ms, total: 1.25 s
Wall time: 1.09 s
```
```fortran
%%fortran
function calc_pi_f(n) result(pi)
    implicit none
    integer(8), intent(in) :: n
    integer(8) :: accept, i
    real(8) :: r(2), pi

    accept = 0
    do i = 1, n
        call random_number(r)
        if (dot_product(r, r) <= 1.0) then
            accept = accept + 1
        end if
    end do
    pi = 4.0d0 * dble(accept) / dble(n)
end function calc_pi_f
```
```
CPU times: user 406 ms, sys: 0 ns, total: 406 ms
Wall time: 405 ms
```
Looks like Fortran is the winner in this case. We didn't even parallelize the operation, but Numba does give a significant speedup compared to plain Python/NumPy. There might be a way to vectorize some of the operations in NumPy to perform better, but I didn't manage to find one for this case.
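For completeness, here is one possible NumPy vectorization of the same dart-throwing loop (a sketch of my own; the name `calc_pi_vec` and the use of `np.random.default_rng` are my choices, not from the original code). It draws all $n$ points in one call and counts the hits with array operations instead of a Python loop:

```python
import numpy as np

def calc_pi_vec(n, rng=None):
    """Estimate pi by vectorized rejection sampling over n points."""
    if rng is None:
        rng = np.random.default_rng()
    # draw all n darts at once: an (n, 2) array of points in the unit square
    pts = rng.random((n, 2))
    # count points whose squared distance from the origin is <= 1
    accept = np.count_nonzero((pts ** 2).sum(axis=1) <= 1.0)
    return 4.0 * accept / n
```

The trade-off is memory: this allocates the full $(n, 2)$ array up front, so for very large $n$ one would process the samples in chunks.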