Pure Python is slow at number crunching
Want C-like speed but without writing C (or Fortran!)
Many algorithms have irregular data access, per-element branching, etc.
Fit for interactive use
Mandelbrot (20 iterations):
| Implementation | Speed relative to CPython |
| --- | --- |
| CPython | 1x |
| NumPy array-wide operations | 13x |
| Numba (CPU) | 120x |
| Numba (NVIDIA Tesla K20c) | 2100x |
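For orientation, here is a minimal sketch of the kind of per-pixel Mandelbrot kernel such a benchmark typically times. It is not the code behind the figures above: the names `mandel` and `mandel_grid`, the grid size, and the pure-Python fallback for `jit` are assumptions made for illustration.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, run as plain Python so the sketch still executes.
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]):
            return args[0]
        return lambda func: func


@jit(nopython=True)
def mandel(c, max_iter):
    """Return the iteration at which c escapes, or max_iter if it never does."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if z.real * z.real + z.imag * z.imag >= 4.0:
            return i
    return max_iter


@jit(nopython=True)
def mandel_grid(xs, ys, max_iter):
    """Evaluate mandel() on every point of the grid spanned by xs and ys."""
    out = np.empty((ys.shape[0], xs.shape[0]), dtype=np.int64)
    for row in range(ys.shape[0]):
        for col in range(xs.shape[0]):
            out[row, col] = mandel(complex(xs[col], ys[row]), max_iter)
    return out


xs = np.linspace(-2.0, 1.0, 60)
ys = np.linspace(-1.0, 1.0, 40)
image = mandel_grid(xs, ys, 20)
```

The early `return` inside the inner loop is the per-element branching that is awkward to express as array-wide NumPy operations.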
Constant time arithmetic series
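A sketch of what this bullet presumably refers to: a loop that sums an arithmetic series, which the compiler's optimizer may collapse into the closed form n(n-1)/2. Whether that happens depends on the LLVM version and optimization level, and is not something this code can verify. The function names and the pure-Python fallback for `jit` are assumptions.

```python
try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, run as plain Python so the sketch still executes.
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]):
            return args[0]
        return lambda func: func


@jit(nopython=True)
def series_sum(n):
    """Sum 0 + 1 + ... + (n - 1) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def closed_form(n):
    """The same sum written as the closed-form expression."""
    return n * (n - 1) // 2


loop_result = series_sum(1000)
formula_result = closed_form(1000)
```

Both functions return the same value; the point is that the compiled loop does not have to cost O(n).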
 
Assembler output
 
Python 2.6, 2.7, 3.3, 3.4, 3.5
Can run side by side with regular Python code
 
int8, int16, int32, int64, uint8, ...
(int8, float64)
Opens opportunities for inlining and other optimizations
Constructors: np.empty, etc.
Iterating, indexing, slicing
Reductions: .argmax(), .prod(), etc.
Scalar types and values (including datetime64 and timedelta64)
numpy.random
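The NumPy features listed above can be combined inside a single compiled function. The following is a minimal sketch, not an exhaustive tour: `column_stats` and the pure-Python fallback for `jit` are hypothetical names introduced for illustration.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, run as plain Python so the sketch still executes.
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]):
            return args[0]
        return lambda func: func


@jit(nopython=True)
def column_stats(data):
    """Return per-column means, the index of the largest mean, and the product of all entries."""
    n_rows, n_cols = data.shape
    means = np.empty(n_cols)            # constructor
    for col in range(n_cols):
        total = 0.0
        for value in data[:, col]:      # slicing, then iterating
            total += value
        means[col] = total / n_rows     # indexing
    return means, means.argmax(), data.prod()   # reductions


means, best, product = column_stats(np.array([[1.0, 2.0], [3.0, 4.0]]))
```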
@jit-decorate a function to designate it for JIT compilation
Automatic lazy compilation (recommended):
@numba.jit
def my_function(x, y, z):
    ...
Manual specialization:
@numba.jit("(int32, float64, float64)")
def my_function(x, y, z):
    ...
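A small sketch contrasting the two modes above. With a bare `@jit`, compilation is deferred until the first call and specialized on the argument types seen there. With a signature string, the function is compiled for those types when it is decorated. The function names and the pure-Python fallback for `jit` are assumptions for illustration.

```python
try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, run as plain Python so the sketch still executes.
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]):
            return args[0]
        return lambda func: func


@jit
def add_lazy(x, y):
    """Compiled on first call, once per distinct combination of argument types."""
    return x + y


@jit("(int32, float64, float64)")
def scale_add(n, x, y):
    """Compiled at decoration time for exactly this signature."""
    return n * x + y


int_result = add_lazy(1, 2)          # first call with integer arguments
float_result = add_lazy(1.5, 2.0)    # first call with float arguments
eager_result = scale_add(2, 1.5, 0.5)
```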
N-core scalability by releasing the Global Interpreter Lock:
@numba.jit(nogil=True)
def my_function(x, y, z):
    ...
No protection from race conditions!
Tip
Use concurrent.futures.ThreadPoolExecutor on Python 3
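A minimal sketch of the tip above: a `nogil=True` function is mapped over chunks of an array by a thread pool. Each thread only reads its own chunk and returns a private partial sum, so this particular pattern involves no shared writes and hence no races. The name `sum_of_squares`, the chunk count, and the pure-Python fallback for `jit` are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption: without Numba, run as plain Python so the sketch still executes.
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]):
            return args[0]
        return lambda func: func


@jit(nopython=True, nogil=True)
def sum_of_squares(chunk):
    """Sum value**2 over one chunk; reads its input and writes nothing shared."""
    total = 0.0
    for value in chunk:
        total += value * value
    return total


data = np.arange(100_000, dtype=np.float64)
chunks = np.array_split(data, 4)

# One task per chunk; with Numba present, the threads can run the compiled code in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum_of_squares, chunks))

result = sum(partials)
```

If threads had to write into a shared output array, keeping their index ranges disjoint would be the caller's responsibility.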
 
 
| Implementation | Speed relative to CPython |
| --- | --- |
| CPython | 1x |
| Numba (CPU) | 130x |
| Fortran | 275x |
 
Recommended: precompiled binaries with Anaconda or Miniconda:
conda install numba
Otherwise: install LLVM 3.6.x, compile llvmlite, install numba from source