NumPy Arrays

NumPy's numpy.ndarrays are similar to arrays (please read Buffers and Arrays before reading this) but with the following new conditions.

Basic 1D Examples

Let's wrap the sum() function from totals.c but use NumPy instead of array. The C code remains the same but the type checking/normalisation in Python differs. The following is a correct way to wrap with NumPy.

import numpy as np


def sum_(arr):
    # Ensure `arr` is an array, is of the correct dtype and is C contiguous.
    arr = np.asarray(arr, dtype=np.double, order="C")

    # Find its sum.
    return slug.dll.sum(ptr(arr), len(arr))
  • The dtype=np.double enforces correct data types (see Data types).

  • The order="C" enforces contiguity (see Contiguity).

And for completeness, the cumsum() example for writing an array.

def cumsum(x):
    # Normalise `x`.
    x = np.asarray(x, dtype=np.int32, order="C")

    # Create an empty array of the correct size and dtype.
    # Note that ``order="C"`` is already the default - no need to set it.
    out = np.empty(len(x), dtype=np.int32)

    # Call cumsum() C function.
    slug.dll.cumsum(ptr(x), ptr(out), len(x))
    return out

Contiguity

Contiguity describes how an array's elements are laid out in memory. In C this is always just one after the other in order with no gaps in between. Multi dimensional arrays are flattened column major so a \(2 \times 3\) array is really:

a := [ a[0, 0], a[0, 1], a[0, 2], a[1, 0], a[1, 1], a[1, 2] ]

This is the opposite of Fortran which uses column minor arrays. NumPy, which is written in both C and Fortran, not only bounces between the two formats but invents all kinds of other formats in between.

The effect of contiguity will be easier to explain when you see it go wrong. Let's start by bypassing the contiguity enforcement in the sum example above. The following creates an array of the correct dtype but without setting order and passes it to our C function.

>>> arr = np.arange(10, dtype=np.double)
>>> slug.dll.sum(ptr(arr), len(arr))
45.0
# Compare to the correct answer.
>>> np.sum(arr)
45.0

Hmm, everything seems to be working ok without the order="C". But… that is only because NumPy writes C contiguous arrays by default so arr is already contiguous.

>>> arr.flags.c_contiguous
True

NumPy only creates non-contiguous arrays if it thinks it can avoid a redundant copy operation by using a strided array instead. Slicing with a step is one example of when this may happen.

>>> arr[::2].flags.c_contiguous
False

Instead of copying out every second element of arr into a new array, NumPy has simply said "use the same raw data as arr but skip every second element".

>>> np.shares_memory(arr, arr[::2])
True

Whilst NumPy is smart enough to work with strided arrays:

>>> arr = np.arange(10, dtype=np.double)[::2]
>>> arr
array([0., 2., 4., 6., 8.])
>>> arr.sum()
20.0  # <-- Yes, that is the correct answer.

Our C code isn't. cslug.ptr() actively prohibits us from making this mistake. It raises a ValueError error if its given a non-contiguity buffer:

>>> ptr(arr)
ValueError: ndarray is not C-contiguous and, by using `ptr()`,
you have specified that a contiguous array is required. See
the bottom of `help(cslug.ptr)` for how to resolve this.

To remove this safety net and see why arr should be contiguous, use cslug.nc_ptr():

>>> from cslug import nc_ptr
>>> slug.dll.sum(nc_ptr(arr), len(arr))
10.0  # <-- This is the wrong answer.

Because arr was created using arr = np.arange(10, dtype=np.double)[::2], under the hood arr is just np.arange(10, dtype=np.double) but with a flag set to say, "skip every second element". The above code is therefore really running:

>>> slug.dll.sum(ptr(np.arange(10, dtype=np.double)), 5)
10.0

Another way you can see that a sliced array is the same array as the one it was sliced from is to compare pointers. Again, because arr[::2] is non-contiguous, you will have to use nc_ptr().

>>> a = np.arange(10, dtype=np.double)
>>> ptr(a)
<Void Pointer 94425197003296>
>>> nc_ptr(a[::2])
<Void Pointer 94425197003296>
>>> nc_ptr(a[::2]) == nc_ptr(a)
True

Data types

Data types are controlled in NumPy with the dtype property. The mapping of C types to numpy.dtypes is relatively intuitive par a few special cases. Again there are no safety checks. If you get it wrong, anything can happen.

Note

NumPy's default data types are platform dependent. For example, np.arange(10).dtype returns int32 on Windows but int64 on Linux. If your code works perfectly on one OS but spits out nonsense on another then the cause is likely a forgotten explicit dtype.

Integers

  • The basic int in C is numpy.intc. Not numpy.int which is just the builtin Python int type.

  • Inexact integer types such as short, long and long long are named the same (minus any spaces): numpy.short, numpy.long, numpy.longlong.

  • Exact types such as int32_t have the _t stripped, giving numpy.int32.

  • Unsigned integers have a u prefix: numpy.ushort, numpy.uint16.

  • ssize_t and size_t are numpy.intp and numpy.uintp respectively.

Floats

  • float is numpy.single.

  • double is numpy.double.

String-like types

  • char is numpy.void.

  • wchar_t is numpy.unicode_.

Multidimensional arrays

Assuming C contiguity, multidimensional arrays are really just 1D arrays. Each row is laid out end to end in memory. For example the array:

[[0, 1, 2], [3, 4, 5]]

Is in fact just:

[0, 1, 2, 3, 4, 5]

With the metadata specifying that the array is shaped (2, 3) stored elsewhere (in numpy.ndarray.shape). Some basic arithmetic is needed to convert multidimensional indices into one dimensional ones.

Note

C does allow arrays of arrays but they're not fully featured. Most importantly, all but the outermost dimension must be known at compile time. This rules out almost all practical usages.

Contiguous Multidimensional Arrays

For the less masochistically mathematically inclined, below are the formula you will need to convert multidimensional indices into flat ones. In these formula, shape is the numpy.ndarray.shape attribute which may be passed directly to C as an size_t * using ptr(arr.ctypes.shape).

NumPy syntax

Convert dimensional indices (i, j, k) to flat index (n).

arr[i]

arr[i]

arr[i, j]

arr[i * shape[1] + j]

arr[i, j, k]

arr[(i * shape[1] + j) * shape[2] + k]

And, in the less likely case that you want to convert flat indices back to multidimensional ones, the formula below will undo the formula above. Note that, in Python, you may simply use numpy.ravel_multi_index() and numpy.unravel_index() which perform the same calculation as these formula.

NumPy syntax

Convert flat index (n) to dimensional indices (i, j, k).

arr[i]

arr[i]

arr[i, j]

arr[n // shape[1], n % shape[1]]

arr[i, j, k]

arr[n // (shape[1] * shape[2]), (n // shape[2]) % shape[1], n % shape[2]]

The following trivial reimplementation of numpy.ravel() demonstrates the above for a 3D array.

#include <stddef.h>

void flatten_3D(double * in, double * out, size_t * shape) {
  /* Copy the contents of a contiguous 3D array into a flat array. */
  size_t n = 0;
  for (size_t i = 0; i < shape[0]; i++) {
    for (size_t j = 0; j < shape[1]; j++) {
      for (size_t k = 0; k < shape[2]; k++) {
        // Convert from 3D indices to flat index.
        size_t flat_index = (i * shape[1] + j) * shape[2] + k;
        // Copy the value.
        out[n++] = in[flat_index];
      }
    }
  }
}
def flatten_3D(arr):
    """Wrapper for ``flatten_3D()``."""

    # Normalise `arr`.
    arr = np.asarray(arr, dtype=np.double, order="C")
    # Sanity check that `arr` is 3D.
    assert arr.ndim == 3

    # Create a flat empty output array to populate.
    out = np.empty(arr.size, arr.dtype)

    # Pass `arr`, `out` and the shape of `arr` to the C function.
    # Note the use of `arr.ctypes.shape` which is a ctypes size_t array,
    # instead of `arr.shape` which is a tuple.
    slug.dll.flatten_3D(ptr(arr), ptr(out), ptr(arr.ctypes.shape))

    return out

The more mathematically inclined may observe that n is always equal to flat_index in the above example and that the whole triple for loop is therefore redundant in this case.

Strided Arrays

For those wanting to squeeze out every last clock cycle, strided arrays can be supported directly (without conversion) relatively easily if the number of dimensions is fixed. Use the numpy.ndarray.strides attribute and the formula below to locate elements. In these formula, itemsize is the arr.itemsize attribute.

NumPy syntax

Convert dimensional indices (i, j, k) to flat index (n).

arr[i]

arr[(i * strides[0]) / itemsize]

arr[i, j]

arr[(i * strides[0] + j * strides[1]) / itemsize]

arr[i, j, k]

arr[(i * strides[0] + j * strides[1] + k * strides[2]) / itemsize]

To demonstrate, the code below is functionally equivalent to the flatten example above except that it is faster for non contiguous input.

void flatten_strided_3D(double * in, double * out, size_t * shape, size_t * strides) {
  /* Copy the contents of a non-contiguous 3D array into a flat array. */
  size_t n = 0;
  for (size_t i = 0; i < shape[0]; i++) {
    for (size_t j = 0; j < shape[1]; j++) {
      for (size_t k = 0; k < shape[2]; k++) {
        // Convert from 3D strided indices to flat.
        size_t flat_index = (i * strides[0] + j * strides[1] + k * strides[2]);
        // NumPy's stride offsets are in bytes rather than items.
        flat_index /= sizeof(double);
        // Copy the value.
        out[n++] = in[flat_index];
      }
    }
  }
}
def flatten_strided_3D(arr):
    """Wrapper for ``flatten_strided_3D()``."""

    # Normalise array dtype but no need to enforce contiguity.
    arr = np.asarray(arr, dtype=np.double)
    assert arr.ndim == 3

    # As before, create a flat empty output array to populate.
    out = np.empty(arr.size, arr.dtype)

    # Pass the arrays, the shape and strides to C.
    # Note, you must use `nc_ptr(arr)` instead of `ptr()` because `arr` is
    # not necessarily contiguous.
    slug.dll.flatten_strided_3D(nc_ptr(arr), ptr(out), ptr(arr.ctypes.shape),
                                ptr(arr.ctypes.strides))
    return out

Vectorization

Often in NumPy, parallelizable functions are applied to arrays simply for speed, rather than because they expect an array of inputs. A simplest example of this is:

arr + 1

which will work if arr is scalar or implicitly use a for loop if arr is an array of any dimension. Getting this to work for arbitrary dimensions and strided arrays efficiently is far harder than it looks. Travis E. Oliphant (the original author of NumPy) wrote his PhD on it. There is cheat however that's almost as efficient.

  • Ravel arbitrary dimensional arrays into 1D arrays.

  • Write your C function to use a single for loop to traverse a 1D array.

  • Unravel the output back to the original shape.

So to vectorize the add_1() hello cslug example add a single for loop to the C function:

#include <stddef.h>

void add_1(int * in, int * out, size_t len) {
  /* Add 1 to every element in the 1D array `in`.
     Write to the 1D array `out`. */
  for (size_t i = 0; i < len; i++) {
    out[i] = in[i] + 1;
  }
}

Then to wrap it with:

def add_1(arr):
    # Ensure `arr` is an array, is of the correct dtype and is C contiguous.
    arr = np.asarray(arr, dtype=np.intc, order="C")

    # Store the original shape, then flatten `arr` to make it 1D.
    old_shape = arr.shape
    arr = arr.ravel()

    # Set up an empty output array with the same type and shape as `arr`.
    out = np.empty_like(arr)

    # Call the C function on our 1D arrays.
    slug.dll.add_1(ptr(arr), ptr(out), len(arr))

    # Return the output after restoring the original shape.
    return out.reshape(old_shape)

Ravelling and reshaping have no effect on the underlying contents of an array provided that the array was C contiguous before being reshaped or ravelled.

In the above Python code, order="C" was explicitly specified, meaning that arr and arr.ravel() use the same memory and therefore ptr(arr) == ptr(arr.ravel()) is guaranteed to be true. A consequence of this is that the use of arr.ravel() and out.reshape(old_shape) is redundant and the above wrapper can be simplified to:

def add_1_(arr):
    # Ensure `arr` is an array, is of the correct dtype and is C contiguous.
    arr = np.asarray(arr, dtype=np.intc, order="C")

    # Create an empty output array with the same type and shape as `arr`.
    out = np.empty_like(arr)

    # Call the C function on our arrays.
    # It doesn't matter that they're not 1D - they look the same to C.
    # Note the length parameter ``arr.size`` instead of ``len(arr)``.
    slug.dll.add_1(ptr(arr), ptr(out), arr.size)

    return out