Speed comparisons with Python, Numba, Fortran
A quick comparison of calculating Pi the Monte Carlo way
The classic Monte Carlo "Hello World" is to estimate $\pi$ by generating random numbers and accepting or rejecting them. The algorithm below is taken from one of my graduate textbooks on molecular simulations - I apologize that I can't remember which one, and that book is long gone. Essentially one throws darts into the unit square and measures whether each dart lands within the quarter of the unit circle. The fraction of darts landing inside the quarter circle approximates $\pi/4$, so we multiply the ratio of accepted darts to total attempts by 4.
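The reasoning in one line: a uniformly thrown dart lands in the quarter disc with probability equal to the ratio of the quarter disc's area to the unit square's area, which gives the estimator directly:

$$P(\text{accept}) = \frac{\pi/4 \cdot 1^2}{1^2} = \frac{\pi}{4} \quad\Rightarrow\quad \pi \approx 4\,\frac{N_{\text{accept}}}{N}$$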
I thought it would be fun to try some of the available tooling to compare speed in Python, as well as some inline Fortran, since Fortran was the major language I used in graduate school. Note that fortranmagic is provided by the fortran-magic package.
%load_ext fortranmagic
import numpy as np
from numba import jit, njit, prange
n = 10000000
def calc_pi_p(n):
    accept = 0.0
    for i in range(n):
        r = np.random.random(2)
        if np.dot(r, r) <= 1.0:
            accept += 1.0
    return 4.0 * accept / float(n)
%%time
calc_pi_p(n)
@jit
def calc_pi_numba(n):
    accept = 0.0
    for i in range(n):
        r = np.random.random(2)
        if np.dot(r, r) <= 1.0:
            accept += 1.0
    return 4.0 * accept / float(n)
%%time
calc_pi_numba(n)
@njit(parallel=True)
def calc_pi_numba_parallel(n):
    accept = 0.0
    for i in prange(n):
        r = np.random.random(2)
        if np.dot(r, r) <= 1.0:
            accept += 1.0
    return 4.0 * accept / float(n)
%%time
calc_pi_numba_parallel(n)
%%fortran
function calc_pi_f(n) result(pi)
    implicit none
    integer(8), intent(in) :: n
    integer(8) :: accept, i
    real(8) :: r(2), pi
    accept = 0
    do i = 1, n
        call random_number(r)
        if (dot_product(r, r) <= 1.0d0) then
            accept = accept + 1
        end if
    end do
    pi = 4.0d0 * dble(accept) / dble(n)
end function calc_pi_f
%%time
calc_pi_f(n)
Looks like Fortran is the winner in this case, even without parallelizing the operation. Numba does give a significant speedup compared to plain Python/NumPy - keep in mind that the first call to a Numba-jitted function includes compilation time, so subsequent calls are faster still. There might be a way to vectorize some of the operations in NumPy to perform better, but I failed to find out how to do that in this case.
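One straightforward way to vectorize the loop in NumPy is to draw all the samples up front and count the accepted points with array operations, trading memory (an $n \times 2$ array) for speed. This is just a sketch; the name `calc_pi_vectorized` and the use of the newer `default_rng` generator are my own choices, not from the timings above.

```python
import numpy as np

def calc_pi_vectorized(n, rng=None):
    # Draw all n points in the unit square at once instead of one per iteration.
    rng = np.random.default_rng() if rng is None else rng
    r = rng.random((n, 2))
    # A point is accepted if it lies inside the quarter of the unit circle.
    accept = np.count_nonzero((r * r).sum(axis=1) <= 1.0)
    return 4.0 * accept / n

print(calc_pi_vectorized(1_000_000))  # ≈ 3.14
```

For very large `n`, the same idea can be applied in fixed-size chunks to keep memory bounded while retaining most of the speedup.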