Pure Python is slow at number crunching
Want C-like speed but without writing C (or Fortran!)
Many algorithms have irregular data access, per-element branching, etc.
Fit for interactive use
Mandelbrot (20 iterations):
CPython | 1x |
Numpy array-wide operations | 13x |
Numba (CPU) | 120x |
Numba (NVidia Tesla K20c) | 2100x |
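For context, a minimal sketch of the kind of per-point kernel such a Mandelbrot benchmark times. The function name `mandel` and the escape test are illustrative assumptions, not the code behind the figures above. The `jit` fallback is there only so the sketch still runs as plain Python where Numba is not installed.

```python
try:
    from numba import jit
except ImportError:
    # Assumption for this sketch only: without Numba, run as plain Python.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def mandel(x, y, max_iters):
    """Return the iteration at which the point (x, y) escapes, or max_iters."""
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iters):
        z = z * z + c
        # Per-element branch: stop as soon as |z|^2 reaches 4.
        if z.real * z.real + z.imag * z.imag >= 4.0:
            return i
    return max_iters
```

The early `return` inside the loop is the sort of per-element branching that is awkward to express with array-wide NumPy operations but natural in a compiled loop.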
Constant-time arithmetic series
Assembler output
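A hedged sketch of how these two points could be explored together: a plain summation loop that the LLVM optimizer may be able to reduce to a closed form, plus a look at the generated assembly through the dispatcher's `inspect_asm()` method. The function name `arith_sum` is an assumption, and whether the loop is actually eliminated depends on the Numba and LLVM versions in use.

```python
try:
    from numba import jit
except ImportError:
    # Assumption for this sketch only: without Numba, run as plain Python.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def arith_sum(n):
    """Sum 0 + 1 + ... + (n - 1) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


# The first call triggers compilation for integer arguments.
result = arith_sum(10)

# inspect_asm() maps each compiled signature to its assembly text.
# It is only available when the function was really compiled by Numba.
if hasattr(arith_sum, "inspect_asm"):
    for signature, asm in arith_sum.inspect_asm().items():
        print(signature, "->", len(asm.splitlines()), "lines of assembly")
```

Reading the printed assembly for a loop instruction is one way to check whether the optimizer replaced the iteration with arithmetic.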
Python 2.6, 2.7, 3.3, 3.4, 3.5
Can run side by side with regular Python code
int8, int16, int32, int64, uint8, ... (int8, float64)
Opens opportunities for inlining and other optimizations
Constructors: np.empty, etc.
Iterating, indexing, slicing
Reductions: .argmax(), .prod(), etc.
Scalar types and values (including datetime64 and timedelta64)
numpy.random
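A small sketch that combines several of the NumPy features listed above inside one compiled function: a constructor (`np.empty`), indexing, slicing, and the `.argmax()` and `.prod()` reductions. The function name `summarize` and the values it fills in are purely illustrative.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption for this sketch only: without Numba, run as plain Python.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def summarize(n):
    """Fill an array with 1..n, then reduce two slices of it."""
    out = np.empty(n, dtype=np.float64)   # constructor
    for i in range(n):                    # iterating and indexing
        out[i] = i + 1.0
    tail = out[1:]                        # slicing
    return tail.argmax(), out[:3].prod()  # reductions
```

For `n = 5` the array is `[1, 2, 3, 4, 5]`, so the largest element of the tail sits at index 3 and the product of the first three elements is 6.0.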
@jit-decorate a function to designate it for JIT compilation
Automatic lazy compilation (recommended):
@numba.jit
def my_function(x, y, z):
    ...
Manual specialization:
@numba.jit("(int32, float64, float64)")
def my_function(x, y, z):
    ...
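To make the difference between the two modes concrete, here is a minimal sketch with a hypothetical function body (`x + y * z`). The lazy version compiles a specialization on first call for whatever argument types it sees, while the manual version is compiled up front for the one signature given. The `signatures` attribute is read defensively because it only exists on a real Numba dispatcher.

```python
try:
    from numba import jit
except ImportError:
    # Assumption for this sketch only: without Numba, run as plain Python.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def lazy_version(x, y, z):
    # Compiled on first call, once per distinct combination of argument types.
    return x + y * z


@jit("(int32, float64, float64)", nopython=True)
def manual_version(x, y, z):
    # Compiled immediately for exactly this signature.
    return x + y * z


lazy_result = lazy_version(1, 2.0, 3.0)
manual_result = manual_version(1, 2.0, 3.0)

# List the specializations that exist so far (empty without Numba).
print(getattr(lazy_version, "signatures", []))
print(getattr(manual_version, "signatures", []))
```

The lazy form is usually the simpler choice, which matches the recommendation above; the manual form trades flexibility for control over exactly which types are accepted.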
N-core scalability by releasing the Global Interpreter Lock:
@numba.jit(nogil=True)
def my_function(x, y, z):
    ...
No protection from race conditions!
Tip
Use concurrent.futures.ThreadPoolExecutor on Python 3
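A sketch of how the tip could be applied, assuming a hypothetical CPU-bound function `count_primes` and a helper `parallel_count` that splits the work into ranges. Each thread works on its own range and only returns a number, so there is no shared mutable state and no race condition to guard against.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    from numba import jit
except ImportError:
    # Assumption for this sketch only: without Numba, run as plain Python.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True, nogil=True)
def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division; runs without the GIL."""
    count = 0
    for n in range(max(lo, 2), hi):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def parallel_count(limit, workers=4):
    """Split [0, limit) into chunks and count primes in each on its own thread."""
    step = max(1, (limit + workers - 1) // workers)
    bounds = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: count_primes(*b), bounds))
```

With Numba installed, the threads can run on several cores at once because the compiled function releases the GIL; without it, the same code still gives the right answer but runs one thread at a time.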
CPython | 1x |
Numba (CPU) | 130x |
Fortran | 275x |
Recommended: precompiled binaries with Anaconda or Miniconda:
conda install numba
Otherwise: install LLVM 3.6.x, compile llvmlite, install numba from source