Programming for Data Science¶

NumPy- Linear Algebra¶

Dr. Bhargavi R

SCOPE, VIT Chennai

Linear Algebra Module¶

  • The linalg module of NumPy provides most of the commonly used linear algebra functions.
  • Data science applications routinely deal with images and other multidimensional data.
  • To handle and process this multidimensional data, we apply linalg functions to it.
  • The data undergoes several transformations, and these transformations are expressed with linear algebra.
  • In this session we look at some of the commonly used linear algebra functions supported by the linalg module.
In [1]:
import numpy as np
import pandas as pd
import numpy.linalg as linalg
import matplotlib.pyplot as plt
from random import randint
from sklearn.datasets import load_digits
In [2]:
# Define three arrays and check their dimension

v1 = np.array([1, 2, 3, 4, 5, 6])     # Single Dim - one index is required to access each element
v2 = np.array([[1, 2, 3, 4, 5, 6]])   # Two dim - Two indices required
v3 = np.array([[[1, 2, 3, 4, 4, 6]]]) # 3 dim - 3 indices to access each element
print(v1.shape)
print(v2.shape)
print(v3.shape)
(6,)
(1, 6)
(1, 1, 6)
In [3]:
# Change the dimension of the above defined arrays
v2 = v2.reshape(2,3)
print(v2.shape)
print(v2)
(2, 3)
[[1 2 3]
 [4 5 6]]
In [4]:
# Change v3 dim as 2 elements with each element as 1 x 3 dim
v3 = v3.reshape(2, 1, 3)
print(v3)
print(v3.shape)
[[[1 2 3]]

 [[4 4 6]]]
(2, 1, 3)

Practical Examples - ndarray¶

In [5]:
# Let us now load the MNIST digits data and take a look at it
digits = load_digits()

data   = digits.data
labels = digits.target
In [6]:
# pick a random sample and display the image
random_idx = randint(0, 1796)   # randint is inclusive at both ends; valid indices are 0..1796
plt.imshow(digits.images[random_idx], cmap='gray')
plt.title(f'Label: {digits.target[random_idx]}');
In [7]:
data.shape
Out[7]:
(1797, 64)
In [8]:
data
Out[8]:
array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ..., 10.,  0.,  0.],
       [ 0.,  0.,  0., ..., 16.,  9.,  0.],
       ...,
       [ 0.,  0.,  1., ...,  6.,  0.,  0.],
       [ 0.,  0.,  2., ..., 12.,  0.,  0.],
       [ 0.,  0., 10., ..., 12.,  1.,  0.]])

dot product¶

  • {magnitude / length / norm} of a vector $$ ||x|| = \sqrt{x_{1}^{2} + x_{2}^{2}} $$

  • length squared $$ \langle x,x\rangle = x^{T}x = ||x||^{2}$$

  • similarity measure $$ \langle x,y\rangle = x_{1}y_{1} + x_{2}y_{2} $$ $$ \langle x,y\rangle = x^{T}y $$

  • numpy.dot returns the dot product of two arrays
  • dot product of two 1-D arrays is equal to their inner product
  • dot product of two 2-D arrays is equivalent to matrix multiplication.
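These identities are easy to check numerically; a minimal sketch with two made-up 2-D vectors:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([4.0, 3.0])

# length squared: <x, x> = x^T x = ||x||^2
length_sq = np.dot(x, x)
assert np.isclose(length_sq, np.linalg.norm(x) ** 2)   # both are 25.0

# cosine similarity: normalize each vector, then take the dot product
cos_sim = np.dot(x / np.linalg.norm(x), y / np.linalg.norm(y))
print(cos_sim)   # 24/25 = 0.96
```

Normalizing before the dot product is exactly what the `person` examples below do: identical directions give 1, and the value shrinks as the vectors diverge.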
In [9]:
person1 = [23, 6]     #vector(age, height)
person2 = [24, 5.3]
person3 = [23, 6]
person4 = [85, 5]
person5 = [0.5,0.25]
In [10]:
# normalized dot product = cosine similarity; identical vectors give 1
np.dot(person1/np.linalg.norm(person1), person3/np.linalg.norm(person3))
Out[10]:
0.9999999999999999
In [11]:
np.dot(person1/np.linalg.norm(person1), person2/np.linalg.norm(person2))
Out[11]:
0.9992842523879324
In [12]:
# np.dot(person3/np.linalg.norm(person3), person4/np.linalg.norm(person4))
np.dot(person4/np.linalg.norm(person4), person5/np.linalg.norm(person5))
Out[12]:
0.9191450300180578
In [13]:
# Define two matrices of shape 2 x 2 (2 vectors with 2 components)
m1 = np.array([[1, 2],
               [3, 4]]) 
m2 = np.array([[5, 6],
               [7, 8]]) 
In [14]:
# Check the dimensions of m1
print(m1.shape)
(2, 2)
In [15]:
# Find the dot product
print(np.dot(m1, m2))
[[19 22]
 [43 50]]

Rank & determinant of a Matrix¶

$$(X^{T}X)^{-1}X^{T}y$$
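The formula above is the normal-equation solution to least squares; it only exists when $X^{T}X$ is invertible, i.e. when $X$ has full column rank, which is why rank and determinant matter. A minimal sketch with a small made-up design matrix:

```python
import numpy as np

# hypothetical data: y = 1 + x exactly
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # column of ones + feature column
y = np.array([2.0, 3.0, 4.0])

# (X^T X)^{-1} X^T y -- requires X to have full column rank
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)   # [1. 1.] : intercept 1, slope 1
```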
In [16]:
dataset = np.array([
    [1, 2, 3],
    [3, 1, 2],
    [4, 7, 5]
])
np.linalg.matrix_rank(dataset)  # all rows and columns are independent
Out[16]:
3
In [17]:
dataset = np.array([
    [1, 2, 3],
    [1, 2, 3],
    [7, 8, 4]
])
np.linalg.matrix_rank(dataset)   #rows (samples) are dependent
Out[17]:
2
In [18]:
dataset = np.array([
    [1, 2, 9],
    [2, 4, 8],
    [3, 6, 7]
])
np.linalg.matrix_rank(dataset)   #columns (features) are dependent  
Out[18]:
2
In [19]:
dataset = np.array([
    [1, 2, 9],
    [2, 4, 8],
    [3, 6, 7]
])
np.linalg.inv(dataset)   # rank-deficient (singular) matrix: this "inverse" is numerically meaningless
# np.linalg.det(dataset)
Out[19]:
array([[ 6.00479950e+15, -1.20095990e+16,  6.00479950e+15],
       [-3.00239975e+15,  6.00479950e+15, -3.00239975e+15],
       [ 2.00000000e-01, -1.00000000e-01,  0.00000000e+00]])
In [20]:
np.linalg.det(dataset)
Out[20]:
-3.3306690738754795e-15
In [21]:
my_array = np.array([
    [6, 2, 3],
    [4, -2, 7],
    [2, 8, 7]
])
print(my_array)
[[ 6  2  3]
 [ 4 -2  7]
 [ 2  8  7]]
In [22]:
Rank = np.linalg.matrix_rank(my_array)

# Alternatively

Rank = linalg.matrix_rank(my_array)

print("Rank of my_array is ", Rank)
Rank of my_array is  3
In [23]:
# Find the determinant of my_array
Det = linalg.det(my_array)
print("Determinant of my_array is ", Det)
Determinant of my_array is  -340.0000000000001

norm¶

  • Returns the norm (length) of a vector; for a 2-D array the default is the Frobenius norm.
In [24]:
# Find the (Frobenius) norm of the matrix X
X = np.arange(9).reshape(3,3)
print(X)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
In [25]:
Norm = linalg.norm(X)
print("Norm of X is ", Norm)
Norm of X is  14.2828568570857

inv¶

  • inv() function is used to calculate the inverse of a matrix.
In [26]:
# Find the inverse of a matrix
X = np.array([[1, 5, 3],
             [2, 3, 4],
             [9, 3, 3 ]])
print(X)
Inv = linalg.inv(X)
print("Inverse of X is ", Inv, sep = '\n')
[[1 5 3]
 [2 3 4]
 [9 3 3]]
Inverse of X is 
[[-0.03571429 -0.07142857  0.13095238]
 [ 0.35714286 -0.28571429  0.02380952]
 [-0.25        0.5        -0.08333333]]
In [27]:
result = np.dot(X, Inv)
print(result)
[[ 1.00000000e+00  0.00000000e+00  1.38777878e-17]
 [-1.11022302e-16  1.00000000e+00  5.55111512e-17]
 [ 1.11022302e-16  0.00000000e+00  1.00000000e+00]]
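The tiny off-diagonal entries above are floating-point round-off; a quick way to confirm the product really is the identity (re-using the same matrix):

```python
import numpy as np

X = np.array([[1, 5, 3],
              [2, 3, 4],
              [9, 3, 3]])
Inv = np.linalg.inv(X)

# X @ Inv equals the 3x3 identity up to round-off
identity_check = np.allclose(X @ Inv, np.eye(3))
print(identity_check)   # True
```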

matrix_power¶

In [28]:
# define a 3 x 3 matrix initialized with 2s
a = np.full((3,3), 2)
print(a)
[[2 2 2]
 [2 2 2]
 [2 2 2]]
In [29]:
a_power_2 = np.linalg.matrix_power(a, 2)   # a @ a
print(a_power_2)
[[12 12 12]
 [12 12 12]
 [12 12 12]]
In [30]:
a ** 3   # element-wise power, not matrix multiplication
Out[30]:
array([[8, 8, 8],
       [8, 8, 8],
       [8, 8, 8]])
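The two cells above behave differently: `matrix_power` repeats matrix multiplication, while `**` raises each element independently. A side-by-side sketch:

```python
import numpy as np

a = np.full((3, 3), 2)

# matrix_power(a, 3) computes a @ a @ a
mp = np.linalg.matrix_power(a, 3)
print(mp[0, 0])   # 72: each entry of a @ a is 2*2*3 = 12, then 12*2*3 = 72

# ** is element-wise exponentiation
ew = a ** 3
print(ew[0, 0])   # 8
```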

solve¶

  • Used for solving a system of linear equations
In [31]:
# x + 2y = 8
# 3x + 4y = 18
# Solve x and y

X = np.array([[1, 2],
             [3,  4]])
y = np.array([8,18])

x1, x2  = np.linalg.solve(X, y)
print(x1, x2)
1.9999999999999993 3.0000000000000004
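It is worth verifying that the returned values actually satisfy both equations, and noting that `solve` agrees with multiplying by the inverse while avoiding forming it explicitly:

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4]])
y = np.array([8, 18])

sol = np.linalg.solve(X, y)

# the solution satisfies X @ sol == y
assert np.allclose(X @ sol, y)
# solve agrees with inv(X) @ y, but is the preferred (more stable) route
assert np.allclose(sol, np.linalg.inv(X) @ y)
print(sol)   # [2. 3.]
```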

trace¶

  • Return the sum along diagonals of an array.
  • If the array is 2-D, the sum along its diagonal with the given offset is returned.
In [32]:
X = np.array([[1, 2, 3],
             [5, -1, 6],
             [6, 7, 5]])
result = np.trace(X)
print(" Trace of X is ", result)
 Trace of X is  5
In [33]:
print(np.trace(X, 1))   # offset 1: sum of the diagonal above the main one (2 + 6)
8

inner¶

  • It returns the inner product of vectors for 1-D arrays.
  • For higher dimensions, it returns the sum product over the last axes.
In [34]:
a = np.array([[5, 6], [3, 4]]) 

b = np.array([[1, 12], [10, 14]]) 

print(a,b, sep = '\n')

print ('Inner product:' , np.inner(a,b), sep = '\n')
[[5 6]
 [3 4]]
[[ 1 12]
 [10 14]]
Inner product:
[[ 77 134]
 [ 51  86]]
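For 2-D arrays, "sum product over the last axes" means `np.inner(a, b)` matches `a @ b.T`; a sketch re-using the same arrays:

```python
import numpy as np

a = np.array([[5, 6], [3, 4]])
b = np.array([[1, 12], [10, 14]])

# inner sums over the last axis of each input, so it equals a @ b.T for 2-D arrays
same = np.array_equal(np.inner(a, b), a @ b.T)
print(same)   # True
```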

outer¶

  • This returns the outer product of two vectors.
In [35]:
a = np.array([[5, 6], [3, 4]]) 

b = np.array([[1, 12], [10, 14]]) 

print ('Outer product:' , np.outer(a,b), sep = '\n')
Outer product:
[[ 5 60 50 70]
 [ 6 72 60 84]
 [ 3 36 30 42]
 [ 4 48 40 56]]
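Unlike `inner`, `outer` first flattens both inputs to 1-D, which is why two 2 x 2 matrices produce a 4 x 4 result above:

```python
import numpy as np

a = np.array([[5, 6], [3, 4]])
b = np.array([[1, 12], [10, 14]])

op = np.outer(a, b)
print(op.shape)                                            # (4, 4)
# identical to the outer product of the flattened vectors
print(np.array_equal(op, np.outer(a.ravel(), b.ravel())))  # True
```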

lstsq¶

  • Returns the least-squares solution to a linear matrix equation.
  • Solves the equation A x = b by computing a vector x that minimizes the Euclidean 2-norm || b - A x ||^2.
  • If A is square and of full rank, then x is the exact solution of the equation.
  • The arrays below give the years of experience of employees and their corresponding salaries.
  • Find the least-squares line that best models this data.
In [36]:
years = np.array([3, 8, 9, 13, 3, 6, 11, 21, 1, 16])      # years of experience
exp = np.array([30, 57, 64, 72, 36, 43, 59, 90, 20, 83])  # corresponding salaries

X = np.vstack([np.ones(len(years)), years]).T
print(X)
[[ 1.  3.]
 [ 1.  8.]
 [ 1.  9.]
 [ 1. 13.]
 [ 1.  3.]
 [ 1.  6.]
 [ 1. 11.]
 [ 1. 21.]
 [ 1.  1.]
 [ 1. 16.]]
In [37]:
plt.figure(figsize = (6,6))
plt.scatter(years, exp)
Out[37]:
<matplotlib.collections.PathCollection at 0x13d9a24d0>
In [38]:
c, m = np.linalg.lstsq(X, exp, rcond=None)[0]
In [39]:
print(m, c)
3.5374756199498503 23.208971858456362
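Since `X` here has full column rank, `lstsq` must agree with the normal-equation formula $(X^{T}X)^{-1}X^{T}y$ shown earlier in the session; a quick cross-check (the salary array is renamed here for clarity):

```python
import numpy as np

years = np.array([3, 8, 9, 13, 3, 6, 11, 21, 1, 16])
salary = np.array([30, 57, 64, 72, 36, 43, 59, 90, 20, 83])
X = np.vstack([np.ones(len(years)), years]).T

coef_lstsq = np.linalg.lstsq(X, salary, rcond=None)[0]
coef_normal = np.linalg.inv(X.T @ X) @ X.T @ salary

# both give the same intercept and slope
print(np.allclose(coef_lstsq, coef_normal))   # True
```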