Programming for Data Science¶

NumPy¶

Dr. Bhargavi R

SCOPE, VIT Chennai

NumPy - Basics¶

  • NumPy is the core Python library for scientific computing, used mostly for array and matrix manipulation.
  • It provides a high-performance multidimensional array object, the ndarray.
  • It also provides tools for working with these ndarrays.
  • It is a package you must know for Programming for Data Science.
  • It will help you well beyond this course.

Array¶

  • A NumPy array is a grid of values, all of the same type. (Think of it as something different from a Python list, which can mix types.)
  • Any element in the NumPy array is indexed by a tuple of integers, one per dimension (negative integers count from the end).
  • The number of dimensions is the rank of the array.
  • Shape of an array is a tuple of integers giving the size of the array along each dimension.
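To make the terms above concrete, here is a minimal sketch that inspects the rank, shape, and element type of a small array. The names `grid` and `mixed` are made up for illustration.

```python
import numpy as np

# A 2 x 3 grid of integers (names are illustrative only).
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.ndim)    # rank: number of dimensions
print(grid.shape)   # size along each dimension
print(grid[1, 2])   # element indexed by the tuple (1, 2)

# Unlike a Python list, every element shares one type:
# mixing an int and a float upcasts the whole array to float.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```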
In [1]:
# First import numpy library
import numpy as np
In [2]:
%%timeit -n 100
a = [[1, 2, 3],
    [4, 5, 6],
    [7, 8,  9]]
b = [[1, 2, 3],
    [4, 5, 6],
    [7, 8,  9]]
c = [[0, 0, 0],
    [0, 0, 0],
    [0, 0, 0]]
for i in range(0, 3):
    for j in range(0, 3):
        for k in range(0,  3):
            c[i][j] += a[i][k] * b[k][j]
# print(c)
The slowest run took 6.80 times longer than the fastest. This could mean that an intermediate result is being cached.
20.8 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [3]:
%%timeit -n 100

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8,  9]])
b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8,  9]])
c = a @ b
# print(c)
7.61 µs ± 826 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
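Timings like the two above only matter if both approaches compute the same thing. The sketch below wraps the triple loop in a helper (the name `matmul_loops` is hypothetical, not part of NumPy) and checks it against `@`.

```python
import numpy as np

def matmul_loops(a, b):
    """Multiply two matrices given as nested lists using explicit loops."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                c[i][j] += a[i][k] * b[k][j]
    return c

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
loop_result = matmul_loops(m, m)
numpy_result = np.array(m) @ np.array(m)

# Both approaches should agree element by element.
print(np.array_equal(loop_result, numpy_result))
```

On a 3 x 3 matrix the speed gap is small; it usually grows with matrix size, so it is worth repeating the timing with larger inputs.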

numpy arrays¶

In [4]:
array1            =  np.array([2,5,7,4,5,9])
print(array1)
[2 5 7 4 5 9]
In [5]:
array2            =  np.array((3,5,7,4,5,9))
print(array2)
[3 5 7 4 5 9]
In [6]:
my_list           =  [1,4,6,8,2,9]
array_from_list   =  np.array(my_list)
print(array_from_list)
[1 4 6 8 2 9]
In [7]:
array_with_range  =  np.array(range(1,11))
print(array_with_range)
[ 1  2  3  4  5  6  7  8  9 10]
In [8]:
array_with_arange =  np.arange(5)
print(array_with_arange)
[0 1 2 3 4]
In [9]:
# Check the type of the array
print(type(array1))
<class 'numpy.ndarray'>
In [10]:
print('Shape of the array - ',array_from_list.shape)
Shape of the array -  (6,)
In [11]:
# Individual elements can be accessed as follows

print('array2[0]'.ljust(10, ' ') , ':', array2[0])
print('array1[-1]'.ljust(10, ' '), ':', array1[-1])
array2[0]  : 3
array1[-1] : 9

Matrices¶

$$ \begin{bmatrix} 1 & 2 & 3 & 4\\ 6 & 7 & 8 & 9\\ 11 & 12 & 13 & 14\\ 16 & 17 & 18 & 19 \end{bmatrix} $$
  • Questions we can ask about a matrix:
  • What are its dimensions (#rows, #columns)?
  • What is the total # of elements? What is its transpose? Its inverse?
  • What is the $3^{rd}$ element of the $2^{nd}$ row? What is the $4^{th}$ column?

Let's see how numpy answers these questions.

In [12]:
array2D = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25]
])                               #Hey, it's a 5x5 matrix. Can you recognize it?

print(array2D)
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]
In [13]:
array2D.shape   # 'shape' is an attribute giving the size of the array along each dimension
                # An array can be 1-D, 2-D (a matrix), or n-D.
Out[13]:
(5, 5)
In [14]:
print(array2D.size)              #Number of elements in this matrix (numpy array)
25
In [15]:
print('2nd row 3rd element is', array2D[1,2])
2nd row 3rd element is 8
In [16]:
matrix_2_3 = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print(matrix_2_3.shape)
print('original Matrix',matrix_2_3)
# We can change (reshape) it to 3 by 2. Note that a reshape is not always the same as a transpose!
matrix_3_2 = matrix_2_3.reshape((3, 2))
print('Reshaped matrix',matrix_3_2)
matrix_3_2.shape
(2, 3)
original Matrix [[1 2 3]
 [4 5 6]]
Reshaped matrix [[1 2]
 [3 4]
 [5 6]]
Out[16]:
(3, 2)
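The comment in the cell above notes that a reshape is not the same as a transpose. A small sketch of the difference, using illustrative names, might look like this:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

reshaped = m.reshape((3, 2))   # refills row by row: [[1 2] [3 4] [5 6]]
transposed = m.T               # swaps the axes:     [[1 4] [2 5] [3 6]]
inferred = m.reshape((-1, 2))  # -1 lets NumPy work out the row count

print(reshaped)
print(transposed)
print(inferred.shape)
```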

Slicing¶

Let's extract rows and columns of a matrix¶

In [17]:
fourth_row = array2D[3]                   # Extract the 4th row (index 3, since indexing starts at 0)
print('Fourth row of matrix:', fourth_row)
Fourth row of matrix: [16 17 18 19 20]
In [18]:
fourth_col = array2D[:, 3]                # Extract the 4th column (index 3)
print('Fourth column of matrix:', fourth_col)
Fourth column of matrix: [ 4  9 14 19 24]
In [19]:
# properties of the rows and columns
print('fourth_row.shape  :', fourth_row.shape)
print('fourth_col.shape  :', fourth_col.shape)
print('fourth_row\'s type :', type(fourth_row))
print('fourth_col\'s type :', type(fourth_col))
fourth_row.shape  : (5,)
fourth_col.shape  : (5,)
fourth_row's type : <class 'numpy.ndarray'>
fourth_col's type : <class 'numpy.ndarray'>
In [20]:
# Extract a sub-matrix

sub_matrix = array2D[1:3, 2:4]        # Rows at index 1 and 2 (2nd and 3rd rows; the stop index 3 is excluded)
                                      # Columns at index 2 and 3 (3rd and 4th columns; the stop index 4 is excluded)
    
print('sub_matrix.shape  :', sub_matrix.shape)
print('sub_matrix\'s type :', type(sub_matrix))
sub_matrix
sub_matrix.shape  : (2, 2)
sub_matrix's type : <class 'numpy.ndarray'>
Out[20]:
array([[ 8,  9],
       [13, 14]])

Difference Between Indexing and Slicing¶

  • Mixing integer indexing with slices yields an array of lower rank
  • while using only slices yields an array of the same rank as the original array
In [21]:
a = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

a.ndim           #Number of array dimensions. Simply, you need 2 values to access an element.
Out[21]:
2
In [22]:
b = np.array([
    [1, 2],
    [4, 5],
    [6, 7]
])

b.ndim
Out[22]:
2
In [23]:
a = np.array([
    [1,2,3,4], 
    [5,6,7,8], 
    [9,10,11,12]
])

row_r1 = a[1, :]            # Rank 1 view of the second row of a
row_r2 = a[1:2, :]          # Rank 2 view of the second row of a

print(row_r1, 'has shape -'.rjust(14, ' '), row_r1.shape)  
print(row_r2, 'has shape -'.rjust(12, ' '), row_r2.shape)  
[5 6 7 8]    has shape - (4,)
[[5 6 7 8]]  has shape - (1, 4)
In [24]:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, 'has shape -', col_r1.shape)  
print(col_r2, 'has shape -', col_r2.shape)  
                                           
[ 2  6 10] has shape - (3,)
[[ 2]
 [ 6]
 [10]] has shape - (3, 1)
  • The array produced by slicing is always a view of a subarray of the original array.

  • In contrast, integer array indexing lets you construct arbitrary arrays using the data from another array.

In [25]:
original = np.arange(9).reshape(3,3)
# original = np.array([
#     [0, 1, 2], 
#     [3, 4, 5], 
#     [6, 7, 8]
# ])
print(original)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
In [26]:
sub = original[1: , 1: ]
print(sub)
[[4 5]
 [7 8]]
In [27]:
sub[0, 1] = 22
print(sub)
[[ 4 22]
 [ 7  8]]
In [28]:
print(original)
[[ 0  1  2]
 [ 3  4 22]
 [ 6  7  8]]
In [29]:
a = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2
b = a[:2, 1:3]
print(b)
[[2 3]
 [6 7]]
In [30]:
# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"
print(a)
2
77
[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
In [31]:
sub[: , :] = sub + 10
print(sub)
print(original)
[[14 32]
 [17 18]]
[[ 0  1  2]
 [ 3 14 32]
 [ 6 17 18]]
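The cells above show a slice sharing data with the original array. As a rough check of the contrast with integer array indexing, the sketch below uses `np.shares_memory`; the array names are illustrative.

```python
import numpy as np

base = np.arange(9).reshape(3, 3)

view = base[1:, 1:]             # slicing: a view into base
picked = base[[1, 2], [1, 2]]   # integer array indexing: a new array

print(np.shares_memory(base, view))     # True
print(np.shares_memory(base, picked))   # False

picked[0] = 99                  # does not touch base
print(base[1, 1])               # still 4
```

If an independent slice is needed, calling `.copy()` on it is one way to break the link with the original.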

Integer Array Indexing¶

In [32]:
a = np.array([[1,2], [3, 4], [5, 6]])

sub = a[[0, 1, 2, 1] , [0, 1, 0, 1]]
print(sub)    # Prints "[1 4 5 4]"
print('sub\'s shape:',  sub.shape)

# The above example of integer array indexing is equivalent to the following:
sub = np.array([a[0, 0], a[1, 1], a[2, 0], a[1, 1]])
print(sub)  
print('Shape of sub is -' ,  sub.shape)
[1 4 5 4]
sub's shape: (4,)
[1 4 5 4]
Shape of sub is - (4,)
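A common use of integer array indexing is to pick (or update) one element from each row. The following is a minimal sketch; the column choices in `cols` are arbitrary and only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

cols = np.array([0, 2, 0, 1])   # hypothetical column choice for each row
rows = np.arange(a.shape[0])    # [0 1 2 3]

print(a[rows, cols])            # [ 1  6  7 11]

a[rows, cols] += 10             # update exactly those elements in place
print(a)
```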

Boolean Indexing¶

It will be a helpful tool when we work with the pandas library or on other data science tasks.¶

In [33]:
a = np.array([i for i in range(-3, 6)]).reshape(3, 3)
print(a)
[[-3 -2 -1]
 [ 0  1  2]
 [ 3  4  5]]
In [34]:
bool_indx = a < 0
print(bool_indx)
[[ True  True  True]
 [False False False]
 [False False False]]
In [35]:
c = a[bool_indx]
# c = a[a<0]
print(c)
[-3 -2 -1]

Basic (Matrix) Operations¶

In [36]:
a1 = np.array(range(1,10)).reshape(3, 3)    # Create a 1-D array of the 9 values 1 to 9 and reshape it to a 3x3 matrix
print(a1)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
In [37]:
a2 = np.identity(3)   #Identity Matrix
print(a2)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
In [38]:
# Add a1 and a2
a_add = a1 + a2
print('Result of a1 + a2 is', a_add, sep='\n')       #We've seen this 'sep' argument in print long back

# Alternatively
a_add = np.add(a1, a2)
print('Result of a1 + a2 is', a_add, sep='\n')       #You should have recognized by this time what 'sep' does.
Result of a1 + a2 is
[[ 2.  2.  3.]
 [ 4.  6.  6.]
 [ 7.  8. 10.]]
Result of a1 + a2 is
[[ 2.  2.  3.]
 [ 4.  6.  6.]
 [ 7.  8. 10.]]
In [39]:
# Subtract a2 from a1
a_sub = a1 - a2
print('Result of a1 - a2 is', a_sub, sep='\n')   #Still you've not recognized 'sep'

# Alternatively
a_sub = np.subtract(a1, a2)
print('Result of a1 - a2 is', a_sub, sep='\n')   # By default, print inserts a space between its arguments;
                                                 # setting the 'sep' argument changes that separator
                                                 # to a newline character.
Result of a1 - a2 is
[[0. 2. 3.]
 [4. 4. 6.]
 [7. 8. 8.]]
Result of a1 - a2 is
[[0. 2. 3.]
 [4. 4. 6.]
 [7. 8. 8.]]
In [40]:
# Multiply a1 and a2
# Element by element multiplication

a_multiply = a1 * a2                               # Equivalent to np.multiply(a1, a2)
print('Result of a1 * a2' , a_multiply, sep='\n')  #You've got it. Good.
Result of a1 * a2
[[1. 0. 0.]
 [0. 5. 0.]
 [0. 0. 9.]]
In [41]:
# Multiply a1 and a2
# Matrix multiplication

a_mat_multiply = a1 @ a2
print('Result of a1 @ a2' , a_mat_multiply, sep='\n') 

a_dot = np.dot(a1, a2)
print('Result of a1 dot a2' , a_dot, sep='\n')
Result of a1 @ a2
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
Result of a1 dot a2
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
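The difference between `*` and `@` is easier to see with non-square matrices. The sketch below uses made-up arrays `p` and `q` whose shapes are valid for a matrix product but not for an element-wise one.

```python
import numpy as np

p = np.arange(6).reshape(2, 3)   # shape (2, 3)
q = np.arange(6).reshape(3, 2)   # shape (3, 2)

product = p @ q                  # inner dimensions match: result is (2, 2)
print(product)

try:
    p * q                        # element-wise needs compatible shapes
except ValueError as err:
    print('Element-wise product failed:', err)
```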
In [42]:
a = np.arange(25)
a = a.reshape((5, 5))
print(a)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
In [43]:
b = np.array([10, 62, 1, 14, 2, 56, 79, 2, 1, 45,
              4, 92, 5, 55, 63, 43, 35, 6, 53, 24,
              56, 3, 56, 44, 78])
b = b.reshape((5,5))
print(b)
[[10 62  1 14  2]
 [56 79  2  1 45]
 [ 4 92  5 55 63]
 [43 35  6 53 24]
 [56  3 56 44 78]]
In [44]:
print('Less than')
print(a < b)
Less than
[[ True  True False  True False]
 [ True  True False False  True]
 [False  True False  True  True]
 [ True  True False  True  True]
 [ True False  True  True  True]]
In [45]:
print('Greater than')
print(a > b)
Greater than
[[False False  True False  True]
 [False False  True  True False]
 [ True False  True False False]
 [False False  True False False]
 [False  True False False False]]
In [46]:
print('Dot Product')
print(a.dot(b))
Dot Product
[[ 417  380  254  446  555]
 [1262 1735  604 1281 1615]
 [2107 3090  954 2116 2675]
 [2952 4445 1304 2951 3735]
 [3797 5800 1654 3786 4795]]

More Operations on Arrays¶

In [47]:
a = np.array([item for item in range(1, 8, 2)]).reshape(2,2)  # using list comprehension
print(a)
[[1 3]
 [5 7]]
In [48]:
b = np.full((2,2), 3)   # usage of np.full
print(b)
[[3 3]
 [3 3]]
In [49]:
c = a ** b
print(c)
[[  1  27]
 [125 343]]
  • axis is one of the most frequently used arguments, and it appears in other packages (such as pandas) too. So, pay close attention to it.
In [50]:
#  Find the sum of the elements of an array
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(a)
print('Sum of a'.ljust(13, ' '), ':', a.sum()) # sum of all elements

print('Column sum'.ljust(13, ' '), ':', a.sum(axis = 0)) # Compute sum of each column
print('row sum'.ljust(13, ' '), ':', a.sum(axis = 1)) # Compute sum of each row
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Sum of a      : 45
Column sum    : [12 15 18]
row sum       : [ 6 15 24]
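One way to remember `axis` is that it names the dimension that gets collapsed. A non-square array makes this visible in the result shapes; the sketch below uses illustrative names.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)

col_sums = m.sum(axis=0)         # collapses the rows    -> shape (3,)
row_sums = m.sum(axis=1)         # collapses the columns -> shape (2,)
kept = m.sum(axis=1, keepdims=True)   # keeps the rank   -> shape (2, 1)

print(col_sums, row_sums, kept.shape)
```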
In [51]:
# Find the max element in each column of the matrix (axis = 0)
array1 = np.array([
    [2,56,34],
    [67,35,46],
    [72,47,7]
])
a_max = array1.max(axis = 0)
print(a_max)
array1.max()
[72 56 46]
Out[51]:
72
In [52]:
a_max = array1.max(axis = 1)
print(a_max)
[56 67 72]
In [53]:
col_cumsum = a.cumsum(axis = 0)   # cumulative (running) sum down each column
print(col_cumsum)
[[ 1  2  3]
 [ 5  7  9]
 [12 15 18]]
In [54]:
# Create identity matrices using np.eye and np.identity
print(np.eye(4))
print(np.identity(5))
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
In [55]:
# Find the transpose of array1
array1 = np.array([[2,56,34],
                 [67,35,46],
                 [72,47,7]])
print('Array\n', array1)

a_Transpose = array1.T
print('Transpose\n',a_Transpose)
Array
 [[ 2 56 34]
 [67 35 46]
 [72 47  7]]
Transpose
 [[ 2 67 72]
 [56 35 47]
 [34 46  7]]
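The inverse was one of the questions raised about matrices earlier. A minimal sketch using `np.linalg.inv` follows; the 2 x 2 matrix is chosen only because it is easy to check by hand and is not taken from the notebook.

```python
import numpy as np

m = np.array([[4.0, 7.0],
              [2.0, 6.0]])       # determinant is 4*6 - 7*2 = 10, so m is invertible

m_inv = np.linalg.inv(m)
print(m_inv)

# A matrix times its inverse should give the identity (up to rounding).
print(np.allclose(m @ m_inv, np.eye(2)))
```

Not every matrix has an inverse; `np.linalg.inv` raises `LinAlgError` for a singular matrix.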

More (Interesting & Useful) functions¶

In [56]:
# Create a 3 x 3 array of 0s

zero = np.zeros((3,3))
print(zero)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
In [57]:
# Create a 4 x 3 array of 1s

one = np.ones((4, 3))
print(one)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
In [58]:
#Create 2 x 2 x 2 array with random numbers 
print(np.random.random((2,2,2)))
[[[0.08273355 0.01468043]
  [0.71287094 0.28305381]]

 [[0.16472826 0.75087823]
  [0.75140571 0.55458617]]]
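The random values above change on every run. When repeatable numbers are wanted, one option is a seeded generator from `np.random.default_rng`; the seed value 42 below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is arbitrary
first = rng.random((2, 2, 2))

rng = np.random.default_rng(seed=42)   # same seed, same numbers
second = rng.random((2, 2, 2))

print(first.shape)
print(np.array_equal(first, second))
```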

More Examples¶

In [59]:
import numpy as np
import matplotlib.pyplot as plt
In [60]:
# Draw the digit 0 as a 5 x 5 image
mat = np.zeros((5,5))
num_rows, num_cols = mat.shape

mat[ :, 1:2]   = 1
mat[ :, 3:4]   = 1
mat[0, 1:4]    = 1
mat[4, 1:4]    = 1

plt.imshow(mat, cmap = 'gray')
plt.xticks([])
plt.yticks([]) 
Out[60]:
([], [])
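To check the pattern without a plotting library, the same matrix can be rendered as text. This is only a sketch: it rebuilds the matrix with single-column indexing instead of the one-column slices used above, which should set the same elements.

```python
import numpy as np

mat = np.zeros((5, 5))
mat[:, 1] = 1       # left stroke
mat[:, 3] = 1       # right stroke
mat[0, 1:4] = 1     # top stroke
mat[4, 1:4] = 1     # bottom stroke

# Print '#' where the matrix holds 1 and '.' where it holds 0.
lines = [''.join('#' if v else '.' for v in row) for row in mat]
print('\n'.join(lines))
```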