NumPy and the ndarray
When working with large amounts of data, the built-in Python data structures begin to show their limitations.
- The standard `list` is designed for heterogeneous data, meaning it uses memory less efficiently than a structure designed for homogeneous numerical data.
- NumPy's array structures are optimized for mathematics such as linear algebra.
- They are the backbone of many SciPy libraries and data structures.
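The homogeneity point above can be seen directly by inspecting an array's `dtype` — a minimal sketch (the example values are made up for illustration):

```python
import numpy as np

# A Python list can mix types freely; an ndarray stores one fixed dtype.
mixed = [1, "two", 3.0]        # heterogeneous list: each element carries its own type
arr = np.array([1, 2, 3])      # homogeneous numerical data

print(type(mixed[1]))          # the str element has its own type object
print(arr.dtype)               # one dtype describes every element in the array
```

Because every element shares one dtype, NumPy can store the values in a single contiguous block of memory instead of an array of pointers to separate Python objects.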
Note: It is strongly advised to consult NumPy's official documentation. The official docs are very well written and go into more depth than this presentation.
Basic operations on NumPy arrays and their standard-library counterparts
# uci_bootcamp_2021/examples/numpy_example.py
import numpy as np
# Libraries in the SciPy stack tend to shorthand their module names upon import.
# In an effort to be consistent with existing documentation, I will follow that convention.
# Generating some test data
from secrets import randbelow  # randbelow comes from the standard-library secrets module
vanilla_list = [randbelow(2 ** 32) for _ in range(500)]
# Declaring a 1-D array from the vanilla list.
array = np.array(vanilla_list)
# Taking the sum via vanilla means
print(sum(vanilla_list))
# Taking the sum via numpy
print(array.sum())
# Taking the subset of values that are even
evens = array[array % 2 == 0]
# and the equivalent pure-Python list:
evens_list = [value for value in vanilla_list if value % 2 == 0] # "list comprehension"
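A quick sanity check that the boolean mask and the list comprehension select the same values — a minimal sketch using a small hand-written list rather than the random data above:

```python
import numpy as np

values = [7, 2, 9, 4, 6]             # illustrative values, not from the example above
array = np.array(values)

evens = array[array % 2 == 0]        # boolean-mask indexing
evens_list = [v for v in values if v % 2 == 0]

# Both approaches keep the even values in their original order.
print(np.array_equal(evens, np.array(evens_list)))  # True
```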
# multiplying all values of the array by a scalar
double_array = array * 2
double_list = [value * 2 for value in vanilla_list] # "list comprehension"
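Multiplying by a scalar is the simplest case of NumPy's broadcasting rules, which also let arrays of compatible shapes combine without explicit loops. A minimal sketch with made-up values:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)

# The 1-D row is broadcast across each row of the 2-D matrix.
print(matrix + row)                  # [[11 22 33]
                                     #  [14 25 36]]
```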
# Taking the dot product of two arrays:
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
dot = v1.dot(v2)
# alternatively,
dot = v1 @ v2
print(dot)
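The `@` operator is not limited to 1-D dot products: for 2-D arrays it performs matrix multiplication. A minimal sketch with made-up matrices:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# For 2-D operands, @ is standard matrix multiplication.
print(a @ b)   # [[19 22]
               #  [43 50]]
```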