CSCI 2033 sections 001 and 010: Elementary Computational Linear Algebra (2024 Spring)
Assignment 1
Due 11:59pm, February 27, 2024 on Gradescope. A 25% penalty will be applied for submissions
that are up to 24 hours late, and a 50% penalty for submissions up to 48 hours late. Any later
submission will not be graded and will get zero automatically.
Notation Scalars are small letters (e.g., a, b, λ, α, β), vectors are boldface small letters (e.g., v, w),
and matrices are boldface capital letters (e.g., A, B). Vectors by default are column vectors; they
are matrices with single columns. Row vectors are matrices with single rows.
Instruction
• This homework set totals 20 points (final grade percentage will be either 8% or 17% depending
on the other homework scores);
• We assume that you know basic concepts in a high-level programming language such as
Python, C++, Java, Matlab, Julia—this is a prerequisite for this course. But we are using Python
throughout this course, because it is the No. 1 language used in modern scientific computing
and industrial applications, especially in areas related to modern artificial intelligence, data
science, machine/deep learning, computer vision, AR/VR where most modern applications
and job positions gravitate. Please find resources to pick up Python yourself; there are tons of
options online, for example https://www.pythonlikeyoumeanit.com/index.html.
• Problems 0–2 are designed to familiarize you with NumPy [1], the de facto standard for scientific computing in Python. Problems 3–4 are about applications using NumPy functions.
• We assume that you are using the Google Colab environment (https://colab.research.google.com/),
which provides a convenient and free Jupyter notebook environment ready for
computing. Please watch this video tutorial https://youtu.be/oCngVVBSsmA or search and
watch tons of similar video tutorials to get started. If you are advanced and comfortable with
local installation and running of Jupyter Notebook or JupyterLab (https://jupyter.org/),
feel free to do so. But we will not provide support for these and you will need to resolve your
own installation and running issues.
• Please show all your work in the 4 Colab files (.ipynb) we release with this homework.
Do not modify any provided code and only write your code in regions marked "YOUR
CODE STARTS HERE". In your final submission, submit the 4 files separately for their
corresponding problems in Gradescope.
Problem 0 NumPy Tutorial
You will need to work through the Prob0_Numpy_Tutorial file to master the minimal background
necessary to proceed. We will point you to additional tutorial materials as we move on; they are
mostly linked from the clickable words and phrases that are in blue.
The problems in this homework are closely related to the textbook of this course — Linear
Algebra: Step by Step by Kuldeep Singh, 2013. In the following problems, we will simply call it the
textbook.
[1] https://numpy.org/
Problem 1 Vector Operations (5 points)
Create 3 random vectors $u, v, w \in \mathbb{R}^{10000}$ as follows:

    import numpy as np
    rng = np.random.default_rng(20232033)  # fix a random seed; please do not modify it
    u = rng.random((10000, 1))  # generate random vector u
    v = rng.random((10000, 1))  # generate random vector v
    w = rng.random((10000, 1))  # generate random vector w

We will use these vectors for all the following questions in Problem 1.
1.1 (1.5/5) Vector indexing and concatenation (textbook section 1.3) Please obtain the following
element or subvectors; we have provided some examples in the Prob0_Numpy_Tutorial file:
(a) The 2023rd element of vector u. NOTE: Python/NumPy indexing starts from 0 instead of 1;
(b) The 2023rd to 2033rd elements of vector v (including the 2023rd and 2033rd elements). NOTE:
Python/NumPy slicing excludes the end index. Make sure that the
size of the subvector you obtain is 11. You may want to use the built-in numpy.ndarray.shape
to help you check the size of your subvector;
(c) Make a new vector by combining the first 30 elements of v and the last 100 elements of w.
You need to use the NumPy built-in function numpy.concatenate.
Note: If you want to learn more about this, you can go to this NumPy tutorial.
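To illustrate the indexing conventions in 1.1, here is a minimal sketch on a toy length-10 column vector. The vector `x` and the chosen positions are illustrative assumptions, not the assignment's data. It shows that the k-th element in 1-based counting sits at index k − 1, and that a slice excludes its end index.

```python
import numpy as np

x = np.arange(1, 11).reshape(10, 1)  # toy column vector [1, 2, ..., 10]^T (illustrative)

third = x[2, 0]            # the 3rd element lives at index 2 (0-based indexing)
sub = x[2:6, :]            # 3rd to 6th elements inclusive: end index 6 is excluded
head_tail = np.concatenate((x[:2, :], x[-3:, :]), axis=0)  # first 2 and last 3 elements

print(third)               # 3
print(sub.shape)           # (4, 1)
print(head_tail.ravel())   # [ 1  2  8  9 10]
```

Checking `.shape` after every slice is a cheap way to catch off-by-one mistakes.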
1.2 (1/5) Linear combinations (textbook section 1.3) Calculate the following linear combinations:
u + v + w, 2u + 3v + 3w.
1.3 (1.5/5) Inner products (textbook section 1.3) Calculate the following inner products using
the built-in function numpy.inner:
⟨u,u⟩, ⟨u − 2v, w⟩, ⟨3u, 2v + w⟩.
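One detail worth checking when applying numpy.inner to column vectors: for 2-D inputs, numpy.inner sums over the last axis, so two (n, 1) arrays produce an n-by-n array rather than a single number. The sketch below uses toy vectors `a` and `b` (illustrative names) and shows two possible ways to obtain the scalar inner product. Which form the course template expects is an assumption to verify against the Prob0_Numpy_Tutorial file.

```python
import numpy as np

a = np.array([[1.0], [2.0], [3.0]])  # toy column vector, shape (3, 1)
b = np.array([[4.0], [5.0], [6.0]])  # toy column vector, shape (3, 1)

pairwise = np.inner(a, b)                 # sums over the last axis (length 1): shape (3, 3)
as_rows = np.inner(a.T, b.T)              # row vectors of shape (1, 3): result has shape (1, 1)
scalar = np.inner(a.ravel(), b.ravel())   # 1-D inputs: a plain scalar

print(pairwise.shape)  # (3, 3)
print(as_rows)         # [[32.]]
print(scalar)          # 32.0
```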
1.4 (1/5) Vector norms (textbook section 2.1) Calculate the following vector norms using the
NumPy built-in function numpy.linalg.norm:
∥u∥ , ∥v + 3w∥ .
Problem 2. Matrix Operations (5 points)
Reminder about our notation and convention: Scalars are small letters (e.g., a, b, λ, α, β), vectors
are boldface small letters (e.g., v, w), and matrices are boldface capital letters (e.g., A, B). Vectors
by default are column vectors; they are matrices with single columns. Row vectors are matrices
with single rows.
We start by generating a few random matrices and vectors:

    import numpy as np
    rng = np.random.default_rng(20232033)  # fix a random seed; please do not modify it
    A = rng.random((100, 100))  # generate random matrix A
    B = rng.random((100, 200))  # generate random matrix B
    C = rng.random((100, 200))  # generate random matrix C
    D = rng.random((100, 100))  # generate random matrix D
    u = rng.random((100, 1))  # generate random vector u
    v = rng.random((200, 1))  # generate random vector v

We will use these matrices and vectors for all the following questions in Problem 2. We also
provide some examples in the Prob0_Numpy_Tutorial file.
2.1 (0.5/5) Matrix norms (textbook section 2.1 & section 1.6) The magnitude of a matrix can be
measured similarly to that of vectors. For any matrix $M \in \mathbb{R}^{m \times n}$, its (Frobenius) norm is defined as
$$\|M\|_F = \sqrt{\langle M, M \rangle}, \tag{1}$$
where F is for Frobenius (a famous German mathematician). Call the NumPy built-in function
numpy.linalg.norm, and calculate the following:
(a) $\|A\|_F$,
(b) $\|B - C\|_F$. This is the distance between B and C.
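As a sanity check on Eq. (1), the following sketch compares numpy.linalg.norm with a direct evaluation of $\sqrt{\langle M, M \rangle}$. It assumes the matrix inner product $\langle M, N \rangle$ means the sum of elementwise products; the toy matrix `M` is illustrative.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])            # toy matrix (illustrative)

fro_builtin = np.linalg.norm(M)        # for a 2-D array the default is the Frobenius norm
fro_manual = np.sqrt(np.sum(M * M))    # square root of the sum of squared entries

print(np.isclose(fro_builtin, fro_manual))  # True
```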
2.2 (0.5/5) Matrix indexing (Discussion Session) Please obtain these submatrices:
(a) The top-left 50-by-50 submatrix of A;
(b) The bottom-right 30-by-25 submatrix of B.
Note: If you want to learn more about this, you can go to this NumPy tutorial.
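A short sketch of submatrix slicing on a toy 4-by-5 matrix (the matrix `M` and the block sizes are illustrative assumptions). Negative indices count from the end, which is convenient for bottom-right blocks.

```python
import numpy as np

M = np.arange(20).reshape(4, 5)   # toy 4-by-5 matrix with entries 0..19 (illustrative)

top_left = M[:2, :3]              # top-left 2-by-3 block: first 2 rows, first 3 columns
bottom_right = M[-3:, -2:]        # bottom-right 3-by-2 block: last 3 rows, last 2 columns

print(top_left)
print(bottom_right.shape)         # (3, 2)
```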
2.3 (0.5/5) Matrix-vector multiplication (textbook section 1.4) Calculate the following matrix-vector multiplications using the built-in function numpy.matmul (for matrix multiplication) and
numpy.transpose (for matrix transpose). NOTE: The @ operator can be used as a shorthand for
numpy.matmul on ndarrays; M.T can be used as a shorthand for numpy.transpose of matrix M:
$$Au, \quad C^\top u, \quad Bv.$$
2.4 (0.5/5) Matrix-matrix multiplication (textbook section 1.4 & section 1.6) Calculate the
following matrix-matrix multiplications using the built-in function numpy.matmul (for matrix
multiplication) and numpy.transpose (for matrix transpose). NOTE: The @ operator can be used as
a shorthand for numpy.matmul on ndarrays; M.T can be used as a shorthand for numpy.transpose
of matrix M:
$$AB, \quad BC^\top, \quad C^\top B, \quad uv^\top.$$
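Before multiplying, it helps to confirm that the inner dimensions agree. The sketch below uses small random matrices (`P`, `Q`, `s`, `t` are illustrative names and sizes, not the assignment's A, B, C, u, v) to show the resulting shapes for each kind of product in 2.3 and 2.4.

```python
import numpy as np

rng = np.random.default_rng(0)    # illustrative seed, unrelated to the assignment's seed
P = rng.random((3, 4))            # toy 3-by-4 matrix
Q = rng.random((3, 4))            # toy 3-by-4 matrix
s = rng.random((3, 1))            # toy column vector of length 3
t = rng.random((4, 1))            # toy column vector of length 4

print((P @ t).shape)              # (3, 1): matrix-vector product
print((Q.T @ s).shape)            # (4, 1): transpose first, then multiply
print((P @ Q.T).shape)            # (3, 3)
print((s @ t.T).shape)            # (3, 4): outer product of two column vectors
print(np.allclose(P @ t, np.matmul(P, t)))  # True: @ is shorthand for numpy.matmul
```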
2.5 (1.5/5) Matrix power (textbook section 1.5) For any square matrix $M \in \mathbb{R}^{n \times n}$, its p-th power
is defined naturally as
$$M^p = \underbrace{M M \cdots M}_{p \text{ times}}. \tag{2}$$
We have two identities for matrix power parallel to those for scalar power:
$$(M^p)(M^q) = M^{p+q}, \quad (M^p)^q = M^{pq}. \tag{3}$$
Follow these steps to numerically verify the two identities:
(a) Implement your own matrix power function mat_pow(): it should take any square matrix
M and an integer power p ≥ 0, and output the matrix $M^p$. NOTE: To debug,
you are encouraged to test your implementation against the NumPy built-in matrix power
function numpy.linalg.matrix_power. But this is not required in your submission.
(b) Use your own mat_pow() function to calculate $(A^6)(A^8)$ and $A^{6+8}$, and also calculate the
relative distance (see definition below) between $(A^6)(A^8)$ and $A^{6+8}$; the relative distance
should be very close to 0;
(c) Use your own mat_pow() function to calculate $(A^6)^8$ and $A^{6 \cdot 8}$, and also calculate the relative
distance between $(A^6)^8$ and $A^{6 \cdot 8}$; the relative distance should be very close to 0.
Definition: the relative distance of matrices M and N of the same size equals $\|M - N\|_F / \|M\|_F$.
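The relative-distance check can be sketched as follows. This is only an illustration under stated assumptions: `rel_dist` is a hypothetical helper name, the toy matrix `S` and the exponents are not the assignment's, and the built-in numpy.linalg.matrix_power stands in for a hand-written mat_pow().

```python
import numpy as np

def rel_dist(M, N):
    """Relative distance ||M - N||_F / ||M||_F (hypothetical helper name)."""
    return np.linalg.norm(M - N) / np.linalg.norm(M)

rng = np.random.default_rng(0)         # illustrative seed
S = rng.random((4, 4))                 # toy square matrix

mp = np.linalg.matrix_power            # built-in, standing in for a hand-written mat_pow
d_sum = rel_dist(mp(S, 2) @ mp(S, 3), mp(S, 5))   # checks (S^2)(S^3) = S^(2+3)
d_prod = rel_dist(mp(mp(S, 2), 3), mp(S, 6))      # checks (S^2)^3 = S^(2*3)

print(d_sum, d_prod)                   # both should be tiny, on the order of rounding error
```

The two sides are computed through different sequences of floating-point operations, so the distances are typically tiny but not exactly 0.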
2.6 (1.5/5) Inverse and transpose of matrices (textbook section 1.6) Complete the following
calculations using the NumPy built-in function numpy.linalg.inv (for matrix inverse):
(a) $(AD)^{-1}$ and $D^{-1}A^{-1}$, and the relative distance between them; the relative distance should
be very close to 0;
(b) $(A^{-1})^\top$ and $(A^\top)^{-1}$, and the relative distance between them; the relative distance should be
very close to 0;
(c) $(AB)^\top$ and $B^\top A^\top$, and the relative distance between them; the relative distance should be
very close to 0.
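The order reversal in the inverse of a product is easy to get wrong, so here is a small sketch on toy 3-by-3 matrices. The names `P` and `Q` and the diagonal shift that keeps them safely invertible are assumptions of this example, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)               # illustrative seed
P = rng.random((3, 3)) + 3 * np.eye(3)       # toy matrices; the diagonal shift makes them
Q = rng.random((3, 3)) + 3 * np.eye(3)       # strictly diagonally dominant, hence invertible

lhs = np.linalg.inv(P @ Q)                   # (PQ)^{-1}
rhs = np.linalg.inv(Q) @ np.linalg.inv(P)    # Q^{-1} P^{-1}: note the reversed order
wrong = np.linalg.inv(P) @ np.linalg.inv(Q)  # P^{-1} Q^{-1}: generally not equal to (PQ)^{-1}

print(np.linalg.norm(lhs - rhs) / np.linalg.norm(lhs))    # close to 0
print(np.linalg.norm(lhs - wrong) / np.linalg.norm(lhs))  # typically not close to 0
```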
Problem 3. Gaussian Elimination and Back Substitution (5 points)
In this problem, we will implement Gaussian elimination and back substitution. In the end, we will
solve a large linear system Ax = b using our implementation. The Gaussian elimination algorithm
is largely based on Section 1.2 of the textbook; we make small necessary changes to ensure that it works
reliably on computers. Check the Colab file Prob3_Gaussian_Elimination_n_Back_Substitution
for code template.
3.0 (0/5) Preparation Gaussian elimination involves three types of row operations:
(a) Multiply a row by a non-zero factor. For example, multiplying the i-th row by λ (λ ≠ 0) to
produce the new i-th row can be written as

    M[[i], :] = lamb * M[[i], :]

(b) Subtract a multiple of a top row from a bottom row. For example, subtracting λ times the
i-th row from the j-th row of M, where i < j, to produce the new j-th row, can be written as

    M[[j], :] = M[[j], :] - lamb * M[[i], :]

(c) Exchange rows. For example, exchanging the i-th and j-th rows of the matrix M can be
written as

    M[[i, j], :] = M[[j, i], :]
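The three row operations can be tried on a small matrix as in the sketch below. The matrix and the factors are illustrative. A float dtype is used on purpose: assigning fractional results into an integer array would silently truncate them.

```python
import numpy as np

# toy augmented matrix; float dtype so that scaling and subtraction are not truncated
M = np.array([[1., -1., 1., 1.],
              [2., -1., 3., 4.],
              [2.,  0., 3., 5.]])

M[[0], :] = 2.0 * M[[0], :]              # (a) scale row 0 by lamb = 2
M[[1], :] = M[[1], :] - 1.0 * M[[0], :]  # (b) subtract 1 times row 0 from row 1
M[[0, 2], :] = M[[2, 0], :]              # (c) exchange rows 0 and 2

print(M)
```

The exchange in (c) works in one statement because indexing with a list on the right-hand side produces a copy of the two rows before the assignment happens.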
3.1 (1.5/5) Gaussian elimination (Version 0) (textbook section 1.2) Implement Gaussian elimination following the pseudocode in Algorithm 1.
Algorithm 1 Gaussian Elimination Version 0
Input: A, b
1: U = concatenate(A, b) ▷ generate the augmented matrix U by concatenating A and b
2: n = number of rows in U ▷ n is the number of rows of U
3: for k = 0 : (n − 1) do ▷ k will iterate from 0 to (n − 2) (included) with increment 1
4: for j = (k + 1) : n do ▷ iteratively eliminate the rows below using the current row
5: λ = U[j, k]/U[k, k] ▷ U[k, k] is the current leading number
6: U[[j], :] = U[[j], :] − λ ∗ U[[k], :] ▷ subtract λ multiple of the k-th row from the j-th row
7: end for
8: end for
9: return U ▷ return the final augmented matrix
Your function should be called gauss_elim_v0; it (i) takes a square matrix $A \in \mathbb{R}^{n \times n}$, a vector $b \in \mathbb{R}^n$, and a print flag print_flag that
controls whether we print the intermediate augmented matrix after each row operation, and (ii)
returns a matrix $U \in \mathbb{R}^{n \times (n+1)}$ where the left n × n submatrix of U is in the row echelon form. Hint:
Suppose that two matrices M and N have the same number of rows. To concatenate them in the
horizontal direction, we can call the built-in function numpy.concatenate:

    P = np.concatenate((M, N), axis=1)
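To test your implementation, let us take a test case
$$A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 3 \\ 2 & 0 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}. \tag{4}$$
Your Gaussian elimination should produce the following sequence of intermediate augmented
matrices in the right order (Note: the elements marked red are the leading numbers that we are
currently using to eliminate non-zeros below them):
$$
\begin{bmatrix} {\color{red}1} & -1 & 1 & 1 \\ 2 & -1 & 3 & 4 \\ 2 & 0 & 3 & 5 \end{bmatrix}
\xrightarrow{R_1 = R_1 - 2R_0}
\begin{bmatrix} {\color{red}1} & -1 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 2 & 0 & 3 & 5 \end{bmatrix}
\xrightarrow{R_2 = R_2 - 2R_0}
\begin{bmatrix} 1 & -1 & 1 & 1 \\ 0 & {\color{red}1} & 1 & 2 \\ 0 & 2 & 1 & 3 \end{bmatrix}
\xrightarrow{R_2 = R_2 - 2R_1}
\begin{bmatrix} 1 & -1 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & -1 & -1 \end{bmatrix}
\tag{5}
$$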
To get full credit, you need to print out the intermediate augmented matrix after each row
operation.
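One possible reading of Algorithm 1 in NumPy is sketched below, run on the first augmented matrix of Eq. (5). It is an illustration rather than the reference solution: the default value of print_flag and the conversion to float are assumptions of this sketch, and the provided Colab template may structure the function differently.

```python
import numpy as np

def gauss_elim_v0(A, b, print_flag=False):
    """Sketch of Algorithm 1: forward elimination without row exchanges."""
    # build the augmented matrix [A | b] as floats so divisions are not truncated
    U = np.concatenate((A, b), axis=1).astype(float)
    n = U.shape[0]
    for k in range(n - 1):              # pivot rows 0 .. n-2
        for j in range(k + 1, n):       # rows below the pivot row
            lamb = U[j, k] / U[k, k]    # assumes U[k, k] is non-zero (no pivoting here)
            U[[j], :] = U[[j], :] - lamb * U[[k], :]
            if print_flag:
                print(U)                # intermediate matrix after each row operation
    return U

A = np.array([[1., -1., 1.], [2., -1., 3.], [2., 0., 3.]])
b = np.array([[1.], [4.], [5.]])
U = gauss_elim_v0(A, b, print_flag=True)
```

With print_flag=True the three printed matrices should match the three arrows in Eq. (5).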
3.2 (2/5) Back substitution (textbook section 1.2) We first implement back substitution, and
then combine Gaussian elimination and back substitution into a linear system solver for cases where
A is square. Finally, we test our linear solver against the NumPy built-in.
Algorithm 2 Backward Substitution
Input: U ▷ U is the output matrix from Gaussian elimination
1: n = number of rows in U ▷ n is the number of rows of U
2: x = 0 ▷ initialize x as an all-zero vector
3: c = U[:, [−1]] ▷ c: the last column of the augmented matrix, i.e., updated b
4: D = U[:, : −1] ▷ D: the rest part of the augmented matrix, i.e., updated A
5: x[n − 1] = c[n − 1]/D[n − 1, n − 1] ▷ obtain $x_{n-1}$ first
6: for i = n − 2 : −1 : −1 do ▷ i will iterate from n − 2 to 0 (included) with increment −1
7: $x[i] = \left\{ c[i] - \sum_{j=i+1}^{n-1} D[i, j]\, x[j] \right\} / D[i, i]$ ▷ x[i] is the newly solved variable
8: end for
9: return x
(a) Implement back substitution following the pseudocode in Algorithm 2. Your function should
be called back_subs; it (i) takes an augmented matrix $U \in \mathbb{R}^{n \times (n+1)}$ in the row echelon
form, and a print flag print_flag that controls whether we print the newly solved variable
value after each substitution step, and (ii) returns an $x \in \mathbb{R}^n$ as a solution to Ax = b. As a
test, take our previous final augmented matrix in Eq. (5); back substitution should give us
$$\begin{aligned}
R_2 &: x_2 = (-1)/(-1) = 1 \\
R_1 &: x_1 = (2 - 1 \cdot 1)/1 = 1 \\
R_0 &: x_0 = (1 - (-1) \cdot 1 - 1 \cdot 1)/1 = 1
\end{aligned} \tag{6}$$
as we move from bottom to top, row by row. To get full credit, you need to print out the
intermediate newly solved variable after each substitution step (i.e., $x_2$, $x_1$, and $x_0$ in our
test).
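A hedged sketch of Algorithm 2 follows, applied to the final augmented matrix of Eq. (5). Two choices here are assumptions of the sketch rather than requirements: the default value of print_flag, and folding line 5 of the pseudocode into the loop (for the last row the sum over already-solved variables is empty, so the same formula applies).

```python
import numpy as np

def back_subs(U, print_flag=False):
    """Sketch of Algorithm 2: solve an upper-triangular augmented system from the bottom up."""
    n = U.shape[0]
    x = np.zeros((n, 1))
    c = U[:, [-1]]                      # last column: the updated right-hand side
    D = U[:, :-1]                       # remaining columns: the updated coefficient matrix
    for i in range(n - 1, -1, -1):      # rows n-1 down to 0
        # entries x[i+1:] are already solved; for i = n-1 this product is an empty sum (zero)
        known = (D[[i], i + 1:] @ x[i + 1:, :]).item()
        x[i, 0] = (c[i, 0] - known) / D[i, i]   # assumes D[i, i] is non-zero
        if print_flag:
            print(f"x{i} = {x[i, 0]}")
    return x

U = np.array([[1., -1., 1., 1.],
              [0., 1., 1., 2.],
              [0., 0., -1., -1.]])      # final augmented matrix from Eq. (5)
x = back_subs(U, print_flag=True)
```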
(b) Implement a function my_solver_v0 by combining the gauss_elim_v0 and back_subs functions implemented above: this function takes a square matrix $A \in \mathbb{R}^{n \times n}$ and a vector $b \in \mathbb{R}^n$,
and returns a vector $x \in \mathbb{R}^n$ so that Ax = b. In other words, my_solver_v0 solves the linear
system Ax = b when given A and b. To test your solver, in the code template, we provide a
randomly generated $A \in \mathbb{R}^{300 \times 300}$ and $b \in \mathbb{R}^{300}$. Please
(i) solve the given 300 × 300 linear system using your solver; we will denote this solution
by $x_1$;
(ii) validate your solution $x_1$ by calculating the relative error $\|Ax_1 - b\| / \|A\|_F$, which
should be very close to 0 if your solver works well;
(iii) call the NumPy built-in function numpy.linalg.solve to solve the given linear system to
give a solution $x_2$. Ideally, $x_1$ and $x_2$ should be the same. Please calculate the relative
distance between $x_1$ and $x_2$, i.e., $\|x_1 - x_2\| / \|x_2\|$. The relative distance should be very
close to 0 if your solver works well.
Congratulations! Now you have a simple solver for large linear systems!
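The two validation quantities from steps (ii) and (iii) can be computed as in the sketch below. Everything here is illustrative: the 5-by-5 system is a toy stand-in for the template's 300 × 300 data, and `np.linalg.inv(A) @ b` merely plays the role of a hand-written solver's output so that the snippet runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)                 # illustrative seed, not the template's data
A = rng.random((5, 5)) + 5 * np.eye(5)         # toy system; diagonal shift keeps it well-behaved
b = rng.random((5, 1))

x1 = np.linalg.inv(A) @ b                      # stand-in for the output of a hand-written solver
x2 = np.linalg.solve(A, b)                     # NumPy built-in reference solution

rel_err = np.linalg.norm(A @ x1 - b) / np.linalg.norm(A)   # ||A x1 - b|| / ||A||_F
rel_dist = np.linalg.norm(x1 - x2) / np.linalg.norm(x2)    # ||x1 - x2|| / ||x2||

print(rel_err, rel_dist)                       # both should be close to 0
```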
3.3 (1.5/5) Gaussian elimination (Version 1) (textbook section 1.2) Gaussian elimination Version
0 works for “typical” augmented U’s, but can fail for certain U’s. Consider
$$U = \begin{bmatrix} {\color{red}0} & 1 & 1 & -1 \\ 2 & 6 & 4 & 6 \\ 1 & 2 & 3 & 6 \end{bmatrix}.$$
We cannot use the red 0 (the top-left element) to eliminate the 2 and 1 below it by row subtractions only. To make progress,
we need another row operation: row exchange. If we exchange row 0 with row 1 or row
2, the top-left element becomes non-zero and then we can make progress in elimination. Between
the 2 possibilities, we take the row whose element in the current column is largest in magnitude, i.e., row 1, to be exchanged
with row 0. For subsequent elimination steps, we do the same if we encounter elimination
difficulties due to 0’s.
The above modification sounds straightforward. However, we need another consideration when
working on actual computers: when we calculate in floating-point precision, it is hard to tell zero from
non-zero (try 1 − 1/2023 ∗ 2023 in Python or NumPy; do you get exactly 0?). This means that it
might be tricky to decide when to perform a row exchange. This also suggests an always-exchange
strategy that works the best in practice: we always exchange the current row with the row at or below it
whose element in the current column is largest in magnitude, no matter whether the current leading element is close
to 0 or not. Let us work through an example to understand this.
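On the floating-point caveat mentioned above, a quick sketch shows why an exact comparison with 0 is unreliable. The tolerance 1e-12 is an arbitrary illustrative choice, not a value prescribed by the assignment.

```python
x = 0.1 + 0.2 - 0.3          # mathematically 0, but not in floating-point arithmetic
print(x == 0.0)              # False
print(abs(x) < 1e-12)        # True: comparing against a small tolerance is more robust

y = 1 - 1 / 2023 * 2023      # the expression from the text; inspect the result yourself
print(y)
```

Always exchanging rows sidesteps the question of which tolerance to pick, which is one motivation for the always-exchange strategy.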
So we arrive at Gaussian elimination Version 1 described in Algorithm 3. Compared to Algorithm 1,
we only need two extra steps: finding the row with the largest leading element in magnitude (line 4) and exchanging rows (line 6).
Algorithm 3 Gaussian Elimination Version 1
Input: A, b
1: U = concatenate(A, b) ▷ generate the augmented matrix U by concatenating A and b
2: n = number of rows in U ▷ n is the number of rows of U
3: for k = 0 : (n − 1) do ▷ k will iterate from 0 to (n − 2) (included) with increment 1
4: Find the first i so that abs{U[i, k]} is largest among abs{U[k, k]}, abs{U[k + 1, k]}, · · ·
5: ▷ here abs{} means absolute value
6: U[[k], :] ↔ U[[i], :] ▷ exchange the two rows to get the largest number (in abs{}) on top
7: for j = (k + 1) : n do ▷ iteratively eliminate the rows below using the current row
8: λ = U[j, k]/U[k, k] ▷ U[k, k] is the current leading number
9: U[[j], :] = U[[j], :] − λ ∗ U[[k], :] ▷ subtract λ multiple of the k-th row from the j-th row
10: end for
11: end for
12: return U ▷ return the final augmented matrix
To implement Algorithm 3, you will need to use the following two NumPy built-in functions:
(a) numpy.absolute takes the element-wise absolute value of a given vector or matrix: vector (matrix)
in, vector (matrix) out

    u = np.array([[1], [-1], [2], [-2]])
    v = np.abs(u)  # shorthand for np.absolute(u)
    # v now is [[1], [1], [2], [2]]

(b) numpy.argmax returns the index (not value) of the maximum value of an input vector (when
ties occur, it returns the first one)

    u = np.array([[1], [-1], [2], [-2]])
    idx = np.argmax(u)
    # idx is 2
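Combining the two functions, the row search and exchange for one elimination step might look like the sketch below, applied to the matrix U from the text with k = 0. Note one easy-to-miss detail: numpy.argmax is applied to the sub-column starting at row k, so its result must be offset by k to get a row index of U.

```python
import numpy as np

U = np.array([[0., 1., 1., -1.],
              [2., 6., 4., 6.],
              [1., 2., 3., 6.]])       # the matrix U from the text

k = 0                                   # current elimination step / column
# np.argmax works on the sub-column U[k:, k], so its result is relative to row k
i = k + np.argmax(np.abs(U[k:, k]))
U[[k, i], :] = U[[i, k], :]             # exchange rows k and i

print(i)                                # 1
print(U)
```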
Now we are ready to go!
(a) Implement Algorithm 3. Your function should be called gauss_elim_v1; it (i) takes a
square matrix $A \in \mathbb{R}^{n \times n}$, a vector $b \in \mathbb{R}^n$, and a print flag print_flag that controls whether
we print the intermediate augmented matrix after each row operation, and (ii) returns a
matrix $U \in \mathbb{R}^{n \times (n+1)}$ where the left n × n submatrix of U is in the row echelon form. To test
and debug your implementation, please take the worked example in Eq. (7). To get full credit,
you need to print out the intermediate augmented matrix after each row operation.
(b) Implement a function my_solver_v1 by combining the gauss_elim_v1 and back_subs functions implemented above: this function takes a square matrix $A \in \mathbb{R}^{n \times n}$ and a vector $b \in \mathbb{R}^n$,
and returns a vector $x \in \mathbb{R}^n$ so that Ax = b. In other words, my_solver_v1 solves the linear
system Ax = b when given A and b. To test your solver, in the code template, we provide a
randomly generated $A \in \mathbb{R}^{300 \times 300}$ and $b \in \mathbb{R}^{300}$. Please
(i) solve the given 300 × 300 linear system using your solver; we will denote this solution
by $x_1$;
(ii) validate your solution $x_1$ by calculating the relative error $\|Ax_1 - b\| / \|A\|_F$, which
should be very close to 0 if your solver works well;
(iii) call the NumPy built-in function numpy.linalg.solve to solve the given linear system to
give a solution $x_2$. Ideally, $x_1$ and $x_2$ should be the same. Please calculate the relative
distance between $x_1$ and $x_2$, i.e., $\|x_1 - x_2\| / \|x_2\|$. The relative distance should be very
close to 0 if your solver works well.
Congratulations! Now you have a mature solver for large linear systems!
Problem 4. Nearest Neighbor Classification (5 points)
The MNIST (Modified National Institute of Standards and Technology) dataset [2]
comprises tens of thousands of images
of handwritten digits, i.e., from 0 to 9; check out Fig. 1 for a few examples. Each of the images is a
28×28 matrix. For convenience, we “flatten” each of these matrices into a length-784 (28×28 = 784)
row vector by stacking the rows.
Figure 1: 25 images of handwritten digits from the MNIST dataset. Each image is of size 28 × 28, and can
be represented by a length-784 vector.
Classification here means assigning a label from {0, 1, · · · , 9} to each given image/row vector,
where hopefully the assigned label is the true digit contained in the image. This is easy for human
eyes, but took several decades for computer scientists to develop reliable methods. Today, these
technologies (which can also classify letters, symbols, and so on), collectively known as optical
character recognition (OCR), are hidden in every corner of our digital lives; for interested minds,
please check out this Wikipedia article https://en.wikipedia.org/wiki/Optical_character_recognition.
In this problem, we explore and implement the k-nearest neighbor (KNN) method for digit
recognition on the MNIST dataset. The method goes like this: we have a dictionary (called training
set) with numerous pairs of (image, label), where the label from {0, 1, · · · , 9} is the true digit
contained in each image. For each image whose label we want to predict (called a test image),
we search the dictionary for the k most similar images (i.e., k-nearest neighbors) and assign the
most common label among those k images to the current test image (i.e., majority voting). To assess the
performance, on a bunch of test images (called test set), we can compare the majority-voting labels
with the true labels. A visual illustration of the k-nearest neighbor (KNN) method is shown in
Fig. 2. We strongly suggest you read this blog article before attempting the following questions.
[2] Available from http://yann.lecun.com/exdb/mnist/.
Figure 2: A visual illustration of the KNN algorithm. Image credit: https://medium.com/swlh/k-nearest-neighbor-ca2593d7a3c4.
In the Colab file, we provide the training set Xtrain (an Ntrain × 784 NumPy array) and the test
set Xtest (an Ntest × 784 NumPy array). Each row of Xtrain and Xtest is a flattened image. Their
corresponding true labels are ytrain (an Ntrain × 1 NumPy array) and ytest (an Ntest × 1 NumPy array). In
this problem, Ntrain = 600 and Ntest = 100.
4.1 (1/5) Data visualization Visualize the first and third images (row vectors) in Xtrain, and the
last 5 images (row vectors) in Xtest. What are their corresponding true labels? (Note: this problem
can be solved in one line by calling the provided function visualization(). )
4.2 (1.5/5) Distance calculation Calculate
(1) the distance between v1 and w; (2) the distance between v2 and w,
where v1, v2, w are provided in the Colab file. Compare the two distance values, and explain the
physical meaning of distance in this problem.
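As a small illustration of a distance computation, the sketch below assumes that "distance" means the Euclidean norm of the difference, as in line 6 of Algorithm 4. The row vectors `p` and `q` are toy stand-ins, not the v1, v2, w provided in the Colab file.

```python
import numpy as np

p = np.array([[0., 0., 3.]])     # toy "image" as a row vector (illustrative)
q = np.array([[4., 0., 0.]])     # another toy row vector

dist = np.linalg.norm(p - q)     # Euclidean distance ||p - q||
print(dist)                      # 5.0
```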
4.3 (2.5/5) KNN implementation Algorithm 4 is the pseudocode of the k-nearest neighbor
method. Implement the algorithm and assess performance using the validation code provided;
the validation code compares ypredict and ytest and calculates the prediction accuracy (Note: the
prediction accuracy should be more than 80%). Please use k = 7 in this problem.
To implement Algorithm 4, you will need to use the following three NumPy built-in functions:
(a) numpy.argsort takes in a column vector, sorts the elements into ascending order, and
returns the corresponding element indices (i.e., sorted indices) as a column vector. For example:

    u = np.array([[1], [-1], [2], [-2]])
    v = np.argsort(u, axis=0)
    # v now is [[3], [1], [0], [2]], a column vector (i.e., 2-D array with a single column)
    v = v.flatten()  # this turns the 2-D array into a 1-D array
    # v now is [3, 1, 0, 2]

(b) numpy.bincount takes in a 1-D array with non-negative integer values, finds the largest
integer Nmax, and counts the occurrences of each integer between 0 and Nmax (both ends
included) inside the array. It returns the occurrence counts as a 1-D array of size Nmax + 1. For
example, for an input [0, 1, 1, 3, 2, 1, 7], this function generates the output [1, 3, 1, 1, 0, 0, 0, 1]
because there are one 0, three 1’s, one 2, one 3, zero 4, zero 5, zero 6, and one 7, inside the
input array.

    u = np.array([0, 1, 1, 3, 2, 1, 7])
    v = np.bincount(u)
    # v now is [1, 3, 1, 1, 0, 0, 0, 1]

(c) numpy.argmax returns the index (not value) of the maximum value of an input 1-D array
(when ties occur, it returns the first one)

    u = np.array([1, -1, 2, -2])
    idx = np.argmax(u)
    # idx is 2

Algorithm 4 k-nearest neighbor algorithm
Input: k = 7, training set $X_{\text{train}} \in \mathbb{R}^{600 \times 784}$ and labels $y_{\text{train}} \in \mathbb{R}^{600 \times 1}$, test set $X_{\text{test}} \in \mathbb{R}^{100 \times 784}$ and
labels $y_{\text{test}} \in \mathbb{R}^{100 \times 1}$.
Output: ypredict
1: ypredict = −1 ▷ all predicted labels initialized as −1; provided in the code template
2: for i = 0 : Ntest do ▷ iterate over all test images
3: x = Xtest[[i], :] ▷ x stores the current test image as a row vector
4: d = 0 ▷ $d \in \mathbb{R}^{600 \times 1}$ stores the distances of the current test image to all training images
5: for j = 0 : Ntrain do ▷ iterate over all training/dictionary images
6: d[j] = ∥x − Xtrain[[j], :]∥ ▷ distance between the test image and the j-th training image
7: end for
8: Obtain the indices of the k smallest values in d ▷ Try using np.argsort
9: Get the most frequent label of these k training images ▷ Use np.bincount and np.argmax
10: Save the predicted label of the test image in the corresponding index of ypredict
11: end for
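The inner part of Algorithm 4 (lines 3 to 10 for a single test image) can be sketched on toy 2-D points as follows. All data here are illustrative assumptions: six made-up training points with labels 0 and 1, one test point, and k = 3 instead of the assignment's k = 7. The labels are assumed to be stored as non-negative integers, which numpy.bincount requires; if the template stores them differently, a cast may be needed.

```python
import numpy as np

X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])  # toy points
y_train = np.array([[0], [0], [0], [1], [1], [1]])                                # toy labels
x = np.array([[4.5, 5.2]])       # one toy test point as a row vector
k = 3                            # illustrative; the assignment uses k = 7

d = np.zeros((X_train.shape[0], 1))
for j in range(X_train.shape[0]):
    d[j, 0] = np.linalg.norm(x - X_train[[j], :])    # distance to the j-th training point

nearest = np.argsort(d, axis=0).flatten()[:k]        # indices of the k smallest distances
votes = np.bincount(y_train[nearest, 0])             # count the labels among the k neighbors
label = np.argmax(votes)                             # most frequent label (first one on ties)

print(nearest, label)
```

Wrapping this in a loop over the test images and storing `label` in the matching entry of ypredict gives the outer loop of Algorithm 4.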
4.4 (Optional, 3 Bonus Points) ℓ1 norm and vectorization The norm $\|v\| = \sqrt{\langle v, v \rangle}$ we introduced in the lecture is not the only way to measure magnitudes of vectors, and hence $\|a - b\|$ is
also not the only way to measure distance between vectors a, b. Another norm, the ℓ1 norm (its induced distance is also
known as the Manhattan distance), is calculated by
$$\|v\|_1 = \sum_{i=1}^{n} |v_i| \quad \text{for } v \in \mathbb{R}^n,$$
where |·| denotes the absolute value. This also leads naturally to the ℓ1 distance between a, b: $\|a - b\|_1$.
Please redo problem 4.3 with the distance in line 6 of the pseudocode replaced by the ℓ1
distance and run the prediction and validation again. In order to receive full marks, please use
the numpy.absolute and numpy.sum functions to write the function l1_norm. This is based on the idea
of vectorization: many scalar operations are broadcast componentwise and performed in parallel
on vectors and matrices, which speeds up Python code by avoiding explicit loops. You can
check out this webpage https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html or the like for more information.
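To make the idea of vectorization concrete, the sketch below computes the ℓ1 norm twice: once with numpy.absolute and numpy.sum, and once with an explicit Python loop for comparison. The function name l1_norm comes from the text; its exact signature in the template, the comparison function `l1_norm_loop`, and the toy vectors are assumptions of this example.

```python
import numpy as np

def l1_norm(v):
    """Vectorized l1 norm: sum of absolute values via numpy.absolute and numpy.sum."""
    return np.sum(np.absolute(v))

def l1_norm_loop(v):
    """Same quantity with an explicit Python loop, for comparison only."""
    total = 0.0
    for value in v.ravel():
        total += abs(value)
    return total

a = np.array([[1., -2., 3.]])
b = np.array([[4., 0., -1.]])

print(l1_norm(a - b))        # 9.0
print(l1_norm_loop(a - b))   # 9.0
```

Both versions return the same number; the vectorized one hands the element-wise work to NumPy's compiled routines, which tends to matter on length-784 image vectors inside a double loop.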
12
Assignment 1
Due 11:59pm, February 27, 2024 on Gradescope. A 25% penalty will be applied for submissions
that are up to 24 hours late, and a 50% penalty for submissions up to 48 hours late. Any later
submission will not be graded and will get zero automatically.
Notation Scalars are small letters (e.g., a, b, λ, α, β), vectors are boldface small letters (e.g., v, w),
and matrices are boldface capital letters (e.g., A, B). Vectors by default are column vectors; they
are matrices with single columns. Row vectors are matrices with single rows.
Instruction
• This homework set totals 20 points (final grade percentage will be either 8% or 17% depending
on the other homework scores);
• We assume that you know basic concepts in a high-level programming language such as
Python, C++, Java, Matlab, Julia—this is a prerequisite for this course. But we are using Python
throughout this course, because it is the No. 1 language used in modern scientific computing
and industrial applications, especially in areas related to modern artificial intelligence, data
science, machine/deep learning, computer vision, AR/VR where most modern applications
and job positions gravitate. Please find resources to pick up Python yourself; there are tons of
options online, for example https://www.pythonlikeyoumeanit.com/index.html.
• Problems 0–2 are designed to familiarize you with NumPy1—the de-facto standard for scientific computing in Python. Problems 3–4 are about applications using NumPy functions.
• We assume that you are using the Google Colab environment (https://colab.research.
google.com/), which provides a convenient and free Jupyter notebook environment ready for
computing. Please watch this video tutorial https://youtu.be/oCngVVBSsmA or search and
watch tons of similar video tutorials to get started. If you are advanced and comfortable with
local installation and running of Jupyter Notebook or JupyterLab (https://jupyter.org/),
feel free to do so. But we will not provide support for these and you will need to resolve your
own installation and running issues.
• Please show all your work in the 4 Colab files (.ipynb) we release with this homework.
Do not modify any provided code and only write your code in regions marked "YOUR
CODE STARTS HERE". In your final submission, submit the 4 files separately for their
corresponding problems in Gradescope.
Problem 0 NumPy Tutorial
You will need to work through the Prob0_Numpy_Tutorial file to master the minimal background
necessary to proceed. We will point you to additional tutorial materials as we move on; they are
mostly linked from the clickable words and phrases that are in blue.
The problems in this homework are closely related to the textbook of this course — Linear
Algebra: Step by Step by Kuldeep Singh, 2013. In the following problems, we will simply call it the
textbook.
1
https://numpy.org/
1
Problem 1 Vector Operations (5 points)
Create 3 random vectors u, v, w ∈ R
10000 as follows:
1 import numpy as np
2 rng = np . random . default_rng (20232033) # fix a random seed . Please do not modify it
3 u = rng . random ((10000 ,1) ) # generate random vector u
4 v = rng . random ((10000 ,1) ) # generate random vector v
5 w = rng . random ((10000 ,1) ) # generate random vector w
we will use these vectors for all the following questions in Problem 1.
1.1 (1.5/5) Vector indexing and concatenation (textbook section 1.3) Please obtain the following
element or subvectors; we have provided some examples in the Prob0_Numpy_Tutorial file:
(a) The 2023rd element of vector u. NOTE: Python/NumPy indexing starts from 0 instead of 1;
(b) The 2023rd to 2033rd elements of vector v (including the 2023rd and 2033rd elements). NOTE:
Python/NumPy indexing will not include the last element in indexing. Make sure that the
size of the subvector you obtain is 11. You may want to use the built-in numpy.ndarray.shape
to help you check the size of your subvector;
(c) Make a new vector by combining the first 30 elements of v and the last 100 elements of w.
You need to use the Numpy built-in function numpy.concatenate.
Note: If want to learn more about this, you can go to this NumPy tutorial.
1.2 (1/5) Linear combinations (textbook section 1.3) Calculate the following linear combinations:
u + v + w, 2u + 3v + 3w.
1.3 (1.5/5) Inner products (textbook section 1.3) Calculate the following inner products using
the built-in function numpy.inner:
⟨u,u⟩, ⟨u − 2v, w⟩, ⟨3u, 2v + w⟩.
1.4 (1/5) Vector norms (textbook section 2.1) Calculate the following vector norms using the
NumPy built-in function numpy.linalg.norm:
∥u∥ , ∥v + 3w∥ .
Problem 2. Matrix Operations (5 points)
Reminder about our notation and convention: Scalars are small letters (e.g., a, b, λ, α, β), vectors
are boldface small letters (e.g., v, w), and matrices are boldface capital letters (e.g., A, B). Vectors
by default are column vectors; they are matrices with single columns. Row vectors are matrices
with single rows.
2
We start by generating a few random matrices and vectors:
1 import numpy as np
2 rng = np . random . default_rng (20232033) # fix a random seed . Please do not modify it
3 A = rng . random ((100 ,100) ) # generate random matrix A
4 B = rng . random ((100 ,200) ) # generate random matrix B
5 C = rng . random ((100 ,200) ) # generate random matrix C
6 D = rng . random ((100 ,100) ) # generate random matrix D
7 u = rng . random ((100 ,1) ) # generate random vector u
8 v = rng . random ((200 ,1) ) # generate random vector v
We will use these matrices and vectors for all the following questions in Problem 2. We also
provided some examples in Prob0_Numpy_Tutorial file.
2.1 (0.5/5) Matrix norms (textbook section 2.1 & section 1.6) The magnitude of a matrix can be
measured similarly to that of vectors. For any matrix M ∈ R
m×n
, its (Frobenius) norm is defined as
∥M∥F =
q
⟨M,M⟩, (1)
where F is for Frobenius (a famous German mathematician). Call the NumPy built-in function
numpy.linalg.norm, and calculate the following
(a) ∥A∥F
,
(b) ∥B − C∥F
. This is the distance between B and C.
2.2 (0.5/5) Matrix indexing (Discussion Session) Please obtain these submatrices:
(a) The top-left 50-by-50 submatrix of A;
(b) The bottom-right 30-by-25 submatrix of B.
Note: If you want to learn more about this, you can go to this NumPy tutorial.
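As a small illustration of slicing, the sketch below extracts a top-left and a bottom-right block from a made-up 4-by-6 matrix (not the assignment's A or B). Negative indices count from the end, which is convenient for bottom-right blocks.

```python
import numpy as np

# Hypothetical 4-by-6 matrix with entries 0, 1, ..., 23, for illustration only.
M = np.arange(24).reshape(4, 6)

top_left = M[:2, :3]          # first 2 rows, first 3 columns
bottom_right = M[-2:, -3:]    # last 2 rows, last 3 columns
```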
2.3 (0.5/5) Matrix-vector multiplication (textbook section 1.4) Calculate the following
matrix-vector products using the built-in functions numpy.matmul (for matrix multiplication) and
numpy.transpose (for matrix transpose). NOTE: The @ operator can be used as a shorthand for
numpy.matmul on ndarrays; M.T can be used as a shorthand for numpy.transpose of matrix M:
$$Au, \quad C^\top u, \quad Bv.$$
2.4 (0.5/5) Matrix-matrix multiplication (textbook section 1.4 & section 1.6) Calculate the
following matrix-matrix products using the built-in functions numpy.matmul (for matrix
multiplication) and numpy.transpose (for matrix transpose). NOTE: The @ operator can be used as
a shorthand for numpy.matmul on ndarrays; M.T can be used as a shorthand for numpy.transpose
of matrix M:
$$AB, \quad BC^\top, \quad C^\top B, \quad uv^\top.$$
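The shape rules behind these products can be checked on small arrays. The following is a minimal sketch with made-up 3-by-4 matrices P, Q and column vectors x, y (hypothetical names and seed, unrelated to the assignment's data).

```python
import numpy as np

rng = np.random.default_rng(0)   # hypothetical seed, for illustration only
P = rng.random((3, 4))
Q = rng.random((3, 4))
x = rng.random((3, 1))
y = rng.random((4, 1))

Py = P @ y                 # (3, 4) @ (4, 1) -> (3, 1)
PTx = np.matmul(P.T, x)    # (4, 3) @ (3, 1) -> (4, 1)
PQT = P @ Q.T              # (3, 4) @ (4, 3) -> (3, 3)
outer = x @ y.T            # (3, 1) @ (1, 4) -> (3, 4)
```

Trying P @ x instead raises a ValueError, because the inner dimensions (4 and 3) do not match.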
2.5 (1.5/5) Matrix power (textbook section 1.5) For any square matrix $M \in \mathbb{R}^{n \times n}$, its p-th power
is defined naturally as
$$M^p = \underbrace{M M \cdots M}_{p \text{ times}}. \tag{2}$$
We have two identities for matrix power parallel to those for scalar power:
$$(M^p)(M^q) = M^{p+q}, \qquad (M^p)^q = M^{pq}. \tag{3}$$
Follow these steps to numerically verify the two identities:
(a) Implement your own matrix power function mat_pow(): it should take any square matrix
M and an integer power p ≥ 0, and output the matrix $M^p$ (by convention, $M^0 = I$). NOTE: To debug,
you are encouraged to test your implementation against the NumPy built-in matrix power
function numpy.linalg.matrix_power. But this is not required in your submission.
(b) Use your own mat_pow() function to calculate $(A^6)(A^8)$ and $A^{6+8}$, and also calculate the
relative distance (see definition below) between $(A^6)(A^8)$ and $A^{6+8}$; the relative distance
should be very close to 0;
(c) Use your own mat_pow() function to calculate $(A^6)^8$ and $A^{6 \cdot 8}$, and also calculate the relative
distance between $(A^6)^8$ and $A^{6 \cdot 8}$; the relative distance should be very close to 0.
Definition: the relative distance of matrices M and N of the same size equals $\|M - N\|_F / \|M\|_F$.
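The relative distance and the first identity in Eq. (3) can be illustrated on a tiny matrix. This is only a sketch: rel_dist is a hypothetical helper name, the 2-by-2 matrix is made up, and the built-in numpy.linalg.matrix_power stands in for your own mat_pow().

```python
import numpy as np

def rel_dist(M, N):
    """Relative distance ||M - N||_F / ||M||_F (hypothetical helper name)."""
    return np.linalg.norm(M - N) / np.linalg.norm(M)

# Hypothetical 2-by-2 matrix; its p-th power is [[1, p], [0, 1]], so results are exact.
M = np.array([[1.0, 1.0],
              [0.0, 1.0]])

lhs = np.linalg.matrix_power(M, 2) @ np.linalg.matrix_power(M, 3)   # (M^2)(M^3)
rhs = np.linalg.matrix_power(M, 5)                                   # M^(2+3)
d = rel_dist(lhs, rhs)   # exactly 0 for this matrix
```

For random 100-by-100 matrices the relative distance is typically tiny but not exactly 0, because of floating-point rounding.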
2.6 (1.5/5) Inverse and transpose of matrices (textbook section 1.6) Complete the following
calculations using the NumPy built-in function numpy.linalg.inv (for matrix inverse):
(a) $(AD)^{-1}$ and $D^{-1}A^{-1}$, and the relative distance between them; the relative distance should
be very close to 0;
(b) $(A^{-1})^\top$ and $(A^\top)^{-1}$, and the relative distance between them; the relative distance should be
very close to 0;
(c) $(AB)^\top$ and $B^\top A^\top$, and the relative distance between them; the relative distance should be
very close to 0.
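The first two identities can be checked numerically on small matrices. In this minimal sketch, P and Q are made-up 4-by-4 matrices (a multiple of the identity is added so that they are safely invertible), and rel_dist is a hypothetical helper implementing the relative distance defined in 2.5.

```python
import numpy as np

def rel_dist(M, N):
    """Relative distance ||M - N||_F / ||M||_F (hypothetical helper name)."""
    return np.linalg.norm(M - N) / np.linalg.norm(M)

rng = np.random.default_rng(1)            # hypothetical seed, for illustration only
# Strictly diagonally dominant matrices are invertible and well conditioned.
P = rng.random((4, 4)) + 4 * np.eye(4)
Q = rng.random((4, 4)) + 4 * np.eye(4)

d_inv = rel_dist(np.linalg.inv(P @ Q), np.linalg.inv(Q) @ np.linalg.inv(P))
d_tr = rel_dist(np.linalg.inv(P).T, np.linalg.inv(P.T))
```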
Problem 3. Gaussian Elimination and Back Substitution (5 points)
In this problem, we will implement Gaussian elimination and back substitution. In the end, we will
solve a large linear system Ax = b using our implementation. The Gaussian elimination algorithm
is largely based on Section 1.2 of the textbook; we make small necessary changes to ensure that it works
reliably on computers. Check the Colab file Prob3_Gaussian_Elimination_n_Back_Substitution
for the code template.
3.0 (0/5) Preparation Gaussian elimination involves three types of row operations:
(a) Multiply a row by a non-zero factor. For example, multiplying the i-th row of M by λ (λ ≠ 0)
to produce the new i-th row can be written as
    M[[i], :] = lamb * M[[i], :]
(b) Subtract a multiple of a top row from a bottom row. For example, subtracting λ times the
i-th row from the j-th row of M, where i < j, to produce the new j-th row, can be written as
    M[[j], :] = M[[j], :] - lamb * M[[i], :]
(c) Exchange rows. For example, exchanging the i-th and j-th rows of the matrix M can be
written as
    M[[i, j], :] = M[[j, i], :]
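The three row operations can be tried out on a small matrix. The sketch below applies them in sequence to a made-up 3-by-2 matrix; the values are for illustration only.

```python
import numpy as np

# Hypothetical 3-by-2 matrix, for illustration only.
M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
lamb = 2.0

M[[0], :] = lamb * M[[0], :]                 # (a) scale row 0: [1, 2] -> [2, 4]
M[[2], :] = M[[2], :] - lamb * M[[0], :]     # (b) row 2 minus 2 * row 0: [5, 6] -> [1, -2]
M[[0, 1], :] = M[[1, 0], :]                  # (c) exchange rows 0 and 1
```

The exchange in (c) works in one statement because indexing with a list on the right-hand side makes a copy before the assignment happens.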
3.1 (1.5/5) Gaussian elimination (Version 0) (textbook section 1.2) Implement Gaussian
elimination following the pseudocode in Algorithm 1.

Algorithm 1 Gaussian Elimination Version 0
Input: A, b
1: U = concatenate(A, b) ▷ generate the augmented matrix U by concatenating A and b
2: n = number of rows in U ▷ n is the number of rows of U
3: for k = 0 : (n − 1) do ▷ k will iterate from 0 to (n − 2) (included) with increment 1
4:     for j = (k + 1) : n do ▷ iteratively eliminate the rows below using the current row
5:         λ = U[j, k]/U[k, k] ▷ U[k, k] is the current leading number
6:         U[[j], :] = U[[j], :] − λ ∗ U[[k], :] ▷ subtract λ multiple of the k-th row from the j-th row
7:     end for
8: end for
9: return U ▷ return the final augmented matrix

Your function should be called gauss_elim_v0; it should: (i) take a square matrix $A \in \mathbb{R}^{n \times n}$,
a vector $b \in \mathbb{R}^n$, and a print flag print_flag that controls whether we print the intermediate
augmented matrix after each row operation, and (ii) return a matrix $U \in \mathbb{R}^{n \times (n+1)}$ where the
left n × n submatrix of U is in row echelon form. Hint: Suppose that two matrices M and N have
the same number of rows. To concatenate them in the horizontal direction, we can call the
built-in function numpy.concatenate:
    P = np.concatenate((M, N), axis=1)
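To test your implementation, let us take the test case
$$A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 3 \\ 2 & 0 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}. \tag{4}$$
Your Gaussian elimination should produce the following sequence of intermediate augmented
matrices in the right order (Note: the leading numbers used to eliminate the non-zeros below
them are U[0, 0] = 1 in the first two steps and U[1, 1] = 1 in the last step):
$$
\begin{bmatrix} 1 & -1 & 1 & 1 \\ 2 & -1 & 3 & 4 \\ 2 & 0 & 3 & 5 \end{bmatrix}
\xrightarrow{R_1 = R_1 - 2R_0}
\begin{bmatrix} 1 & -1 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 2 & 0 & 3 & 5 \end{bmatrix}
\xrightarrow{R_2 = R_2 - 2R_0}
\begin{bmatrix} 1 & -1 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 0 & 2 & 1 & 3 \end{bmatrix}
\xrightarrow{R_2 = R_2 - 2R_1}
\begin{bmatrix} 1 & -1 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & -1 & -1 \end{bmatrix}
\tag{5}
$$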
To get full credit, you need to print out the intermediate augmented matrix after each row
operation.
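For reference, a direct Python transcription of Algorithm 1 might look like the sketch below. It is only an illustration under stated assumptions: the function name elim_sketch is hypothetical, it omits the required print_flag printing, and it is run on the 3-by-3 test case whose augmented matrix starts the sequence in Eq. (5).

```python
import numpy as np

def elim_sketch(A, b):
    """Forward elimination without row exchanges (hypothetical sketch of Algorithm 1)."""
    U = np.concatenate((A, b), axis=1).astype(float)   # augmented matrix [A, b]
    n = U.shape[0]
    for k in range(n - 1):                 # k = 0, ..., n - 2
        for j in range(k + 1, n):          # rows below the current row
            lamb = U[j, k] / U[k, k]       # fails if the leading number U[k, k] is 0
            U[[j], :] = U[[j], :] - lamb * U[[k], :]
    return U

A = np.array([[1, -1, 1], [2, -1, 3], [2, 0, 3]])
b = np.array([[1], [4], [5]])
U = elim_sketch(A, b)
```

Casting to float matters here: with an integer array, the row update would be truncated back to integers.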
3.2 (2/5) Back substitution (textbook section 1.2) We first implement back substitution, and
then combine Gaussian elimination and back substitution into a linear system solver for cases where
A is square. Finally, we test our linear solver against the NumPy built-in.

Algorithm 2 Backward Substitution
Input: U ▷ U is the output matrix from Gaussian elimination
1: n = number of rows in U ▷ n is the number of rows of U
2: x = 0 ▷ initialize x as an all-zero vector
3: c = U[:, [−1]] ▷ c: the last column of the augmented matrix, i.e., updated b
4: D = U[:, :−1] ▷ D: the rest of the augmented matrix, i.e., updated A
5: x[n − 1] = c[n − 1]/D[n − 1, n − 1] ▷ obtain $x_{n-1}$ first
6: for i = n − 2 : −1 : −1 do ▷ i will iterate from n − 2 to 0 (included) with increment −1
7:     x[i] = $\left\{ c[i] - \sum_{j=i+1}^{n-1} D[i, j]\, x[j] \right\} / D[i, i]$ ▷ x[i] is the newly solved variable
8: end for
9: return x
(a) Implement back substitution following the pseudocode in Algorithm 2. Your function should
be called back_subs; it should: (i) take an augmented matrix $U \in \mathbb{R}^{n \times (n+1)}$ in row echelon
form, and a print flag print_flag that controls whether we print the newly solved variable
value after each substitution step, and (ii) return an $x \in \mathbb{R}^n$ as a solution to Ax = b. As a
test, take our previous final augmented matrix in Eq. (5); back substitution should give us
$$
\begin{aligned}
R_2 &: \; x_2 = (-1)/(-1) = 1 \\
R_1 &: \; x_1 = (2 - 1 \cdot 1)/1 = 1 \\
R_0 &: \; x_0 = (1 - (-1) \cdot 1 - 1 \cdot 1)/1 = 1
\end{aligned}
\tag{6}
$$
as we move from bottom to top, row by row. To get full credit, you need to print out the
intermediate newly solved variable after each substitution step (i.e., $x_2$, $x_1$, and $x_0$ in our
test).
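A compact transcription of Algorithm 2 is sketched below, assuming the final augmented matrix of Eq. (5) as input. The function name back_sketch is hypothetical, the print_flag printing is omitted, and the inner sum over j is written as a vector product rather than a loop.

```python
import numpy as np

def back_sketch(U):
    """Back substitution on an upper-triangular augmented matrix (hypothetical sketch)."""
    n = U.shape[0]
    x = np.zeros((n, 1))
    c = U[:, [-1]]      # last column: updated b
    D = U[:, :-1]       # remaining columns: updated A
    x[n - 1] = c[n - 1] / D[n - 1, n - 1]
    for i in range(n - 2, -1, -1):          # i = n - 2, ..., 0
        # D[i, i+1:] @ x[i+1:] is the sum of D[i, j] * x[j] over j = i + 1, ..., n - 1
        x[i] = (c[i] - D[i, i + 1:] @ x[i + 1:]) / D[i, i]
    return x

U = np.array([[1.0, -1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 2.0],
              [0.0, 0.0, -1.0, -1.0]])
x = back_sketch(U)   # all three unknowns equal 1, matching Eq. (6)
```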
(b) Implement a function my_solver_v0 by combining the gauss_elim_v0 and back_subs functions
implemented above: this function takes a square matrix $A \in \mathbb{R}^{n \times n}$ and a vector $b \in \mathbb{R}^n$,
and returns a vector $x \in \mathbb{R}^n$ so that Ax = b. In other words, my_solver_v0 solves the linear
system Ax = b when given A and b. To test your solver, in the code template, we provide a
randomly generated $A \in \mathbb{R}^{300 \times 300}$ and $b \in \mathbb{R}^{300}$. Please
(i) solve the given 300 × 300 linear system using your solver; we will denote this solution
by $x_1$;
(ii) validate your solution $x_1$ by calculating the relative error $\|A x_1 - b\| / \|A\|_F$, which
should be very close to 0 if your solver works well;
(iii) call the NumPy built-in function numpy.linalg.solve to solve the given linear system to
give a solution $x_2$. Ideally, $x_1$ and $x_2$ should be the same. Please calculate the relative
distance between $x_1$ and $x_2$, i.e., $\|x_1 - x_2\| / \|x_2\|$. The relative distance should be very
close to 0 if your solver works well.
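The two validation quantities can be computed as in the following sketch. It uses a made-up, well-conditioned 5-by-5 system rather than the provided 300 × 300 one, and x1 is obtained from numpy.linalg.inv purely as a stand-in for the output of your own solver.

```python
import numpy as np

rng = np.random.default_rng(2)            # hypothetical seed, for illustration only
A = rng.random((5, 5)) + 5 * np.eye(5)    # diagonally dominant, hence invertible
b = rng.random((5, 1))

x1 = np.linalg.inv(A) @ b                 # stand-in for the output of your own solver
x2 = np.linalg.solve(A, b)                # NumPy built-in reference solution

rel_error = np.linalg.norm(A @ x1 - b) / np.linalg.norm(A)   # ||A x1 - b|| / ||A||_F
rel_dist = np.linalg.norm(x1 - x2) / np.linalg.norm(x2)      # ||x1 - x2|| / ||x2||
```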
Congratulations! Now you have a simple solver for large linear systems!
3.3 (1.5/5) Gaussian elimination (Version 1) (textbook section 1.2) Gaussian elimination Version
0 works for "typical" augmented matrices U, but can fail for certain U. Consider
$$U = \begin{bmatrix} 0 & 1 & 1 & -1 \\ 2 & 6 & 4 & 6 \\ 1 & 2 & 3 & 6 \end{bmatrix}.$$
We cannot use the leading 0 in the top-left corner to eliminate the 2 and 1 below it by row
subtractions only: line 5 of Algorithm 1 would divide by U[0, 0] = 0. To make progress,
we need another row operation: row exchange. If we exchange row 0 with row 1 or row
2, the top-left element becomes non-zero and then we can make progress in elimination. Between
the 2 possibilities, we take the row whose element in the current column is largest in magnitude,
i.e., row 1, to be exchanged with row 0. For subsequent elimination steps, we do the same if we
encounter elimination difficulties due to 0's.
The above modification sounds straightforward. However, we need another consideration when
working on actual computers: when we calculate in floating-point precision, it is hard to tell zero
from non-zero (try 1 − 1/2023 ∗ 2023 in Python or NumPy; do you get exactly 0?). This means that it
might be tricky to decide when to perform a row exchange. This also suggests an always-exchange
strategy that works best in practice: we always exchange the current row with the row at or below
it whose element in the current column is largest in magnitude, no matter whether the current
element is close to 0 or not. Let us work through an example to understand this.
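Both points can be seen in a few lines of Python. The sketch below first shows a classic case where floating-point arithmetic does not give the exact result, and then performs the row selection and exchange for column k = 0 of the 3-by-4 matrix U above. The offset k + argmax is an assumption about how one might index the sub-column U[k:, k]; it is not taken from the code template.

```python
import numpy as np

# Floating-point arithmetic: 0.1 + 0.2 is not exactly 0.3.
exactly_equal = (0.1 + 0.2 == 0.3)   # False

U = np.array([[0.0, 1.0, 1.0, -1.0],
              [2.0, 6.0, 4.0, 6.0],
              [1.0, 2.0, 3.0, 6.0]])
k = 0
# Index (within the full matrix) of the largest |U[i, k]| among rows k, k + 1, ...
i = k + np.argmax(np.abs(U[k:, k]))
U[[k, i], :] = U[[i, k], :]          # exchange rows k and i
```

Here i is 1, so rows 0 and 1 are exchanged and the new leading number is 2.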
So we arrive at Gaussian elimination Version 1 described in Algorithm 3. Compared to Algorithm 1,
we only need two extra steps: finding the row with the largest leading element and exchanging
rows (lines 4–6 of Algorithm 3)!
Algorithm 3 Gaussian Elimination Version 1
Input: A, b
1: U = concatenate(A, b) ▷ generate the augmented matrix U by concatenating A and b
2: n = number of rows in U ▷ n is the number of rows of U
3: for k = 0 : (n − 1) do ▷ k will iterate from 0 to (n − 2) (included) with increment 1
4:     Find the first i so that abs{U[i, k]} is largest among abs{U[k, k]}, abs{U[k + 1, k]}, · · · , abs{U[n − 1, k]}
5:     ▷ here abs{} means absolute value
6:     U[[k], :] ↔ U[[i], :] ▷ exchange the two rows to get the largest number (in abs{}) on top
7:     for j = (k + 1) : n do ▷ iteratively eliminate the rows below using the current row
8:         λ = U[j, k]/U[k, k] ▷ U[k, k] is the current leading number
9:         U[[j], :] = U[[j], :] − λ ∗ U[[k], :] ▷ subtract λ multiple of the k-th row from the j-th row
10:     end for
11: end for
12: return U ▷ return the final augmented matrix

To implement Algorithm 3, you will need to use the following two NumPy built-in functions:
(a) numpy.absolute takes the element-wise absolute value of a given vector or matrix: vector
(matrix) in, vector (matrix) out
    u = np.array([[1], [-1], [2], [-2]])
    v = np.abs(u)  # shorthand for np.absolute(u)
    # v is now [[1], [1], [2], [2]]
(b) numpy.argmax returns the index (not value) of the maximum value of an input vector (when
ties occur, it returns the first one)
    u = np.array([[1], [-1], [2], [-2]])
    idx = np.argmax(u)
    # idx is 2
Now we are ready to go!
(a) Implement Algorithm 3. Your function should be called gauss_elim_v1; it should: (i) take a
square matrix $A \in \mathbb{R}^{n \times n}$, a vector $b \in \mathbb{R}^n$, and a print flag print_flag that controls whether
we print the intermediate augmented matrix after each row operation, and (ii) return a
matrix $U \in \mathbb{R}^{n \times (n+1)}$ where the left n × n submatrix of U is in row echelon form. To test
and debug your implementation, please take the worked example in Eq. (7). To get full credit,
you need to print out the intermediate augmented matrix after each row operation.
(b) Implement a function my_solver_v1 by combining the gauss_elim_v1 and back_subs functions
implemented above: this function takes a square matrix $A \in \mathbb{R}^{n \times n}$ and a vector $b \in \mathbb{R}^n$,
and returns a vector $x \in \mathbb{R}^n$ so that Ax = b. In other words, my_solver_v1 solves the linear
system Ax = b when given A and b. To test your solver, in the code template, we provide a
randomly generated $A \in \mathbb{R}^{300 \times 300}$ and $b \in \mathbb{R}^{300}$. Please
(i) solve the given 300 × 300 linear system using your solver; we will denote this solution
by $x_1$;
(ii) validate your solution $x_1$ by calculating the relative error $\|A x_1 - b\| / \|A\|_F$, which
should be very close to 0 if your solver works well;
(iii) call the NumPy built-in function numpy.linalg.solve to solve the given linear system to
give a solution $x_2$. Ideally, $x_1$ and $x_2$ should be the same. Please calculate the relative
distance between $x_1$ and $x_2$, i.e., $\|x_1 - x_2\| / \|x_2\|$. The relative distance should be very
close to 0 if your solver works well.
Congratulations! Now you have a mature solver for large linear systems!
Problem 4. Nearest Neighbor Classification (5 points)
The MNIST (Modified National Institute of Standards and Technology) dataset² comprises tens of
thousands of images of handwritten digits, i.e., from 0 to 9; check out Fig. 1 for a few examples.
Each of the images is a 28 × 28 matrix. For convenience, we "flatten" each of these matrices into a
length-784 (28 × 28 = 784) row vector by stacking the rows.
Figure 1: 25 images of handwritten digits from the MNIST dataset. Each image is of size 28 × 28, and can
be represented by a length-784 vector.
Classification here means assigning a label from {0, 1, · · · , 9} to each given image/row vector,
where hopefully the assigned label is the true digit contained in the image. This is easy for human
eyes, but took several decades for computer scientists to develop reliable methods. Today, these
technologies (which can also classify letters, symbols, and so on), collectively known as optical
character recognition (OCR), are hidden in every corner of our digital lives; for interested minds,
please check out this Wikipedia article: https://en.wikipedia.org/wiki/Optical_character_recognition.
In this problem, we explore and implement the k-nearest neighbor (KNN) method for digit
recognition on the MNIST dataset. The method goes like this: we have a dictionary (called training
set) with numerous pairs of (image, label), where the label from {0, 1, · · · , 9} is the true digit
contained in each image. For each image whose label we want to predict (called a test image),
we search the dictionary for the k most similar images (i.e., k-nearest neighbors) and assign the
most frequent label among those k images to the current test image (i.e., majority voting). To
assess the performance on a set of test images (called the test set), we compare the majority-voting
labels with the true labels. A visual illustration of the k-nearest neighbor (KNN) method is shown in
Fig. 2. We strongly suggest you read this blog article before attempting the following questions.
² Available from http://yann.lecun.com/exdb/mnist/.
Figure 2: A visual illustration of the KNN algorithm. Image credit:
https://medium.com/swlh/k-nearest-neighbor-ca2593d7a3c4.
In the Colab file, we provide the training set Xtrain (an Ntrain × 784 NumPy array) and the test
set Xtest (an Ntest × 784 NumPy array). Each row of Xtrain and Xtest is a flattened image. Their
corresponding true labels are ytrain (an Ntrain × 1 NumPy array) and ytest (an Ntest × 1 NumPy
array). In this problem, Ntrain = 600 and Ntest = 100.
4.1 (1/5) Data visualization Visualize the first and third images (row vectors) in Xtrain, and the
last 5 images (row vectors) in Xtest. What are their corresponding true labels? (Note: this problem
can be solved in one line by calling the provided function visualization(). )
4.2 (1.5/5) Distance calculation Calculate
(1) the distance between v1 and w; (2) the distance between v2 and w,
where v1, v2, w are provided in the Colab file. Compare the two distance values, and explain the
physical meaning of distance in this problem.
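For row vectors stored as 2-D arrays, the distance ∥p − q∥ can be computed with numpy.linalg.norm, as in this minimal sketch. The vectors p and q are made up for illustration and are not the v1, v2, w provided in the Colab file.

```python
import numpy as np

# Hypothetical length-4 row vectors, for illustration only.
p = np.array([[0.0, 3.0, 0.0, 0.0]])
q = np.array([[4.0, 0.0, 0.0, 0.0]])

dist = np.linalg.norm(p - q)   # sqrt((-4)^2 + 3^2) = 5
```

A smaller distance means the two flattened images differ less, pixel by pixel.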
4.3 (2.5/5) KNN implementation Algorithm 4 is the pseudocode of the k-nearest neighbor
method. Implement the algorithm and assess performance using the validation code provided;
the validation code compares ypredict and ytest and calculates the prediction accuracy (Note: the
prediction accuracy should be more than 80%). Please use k = 7 in this problem.
To implement Algorithm 4, you will need to use the following three NumPy built-in functions:
(a) numpy.argsort takes in a column vector, sorts the elements into ascending order, and
returns the corresponding element indices (i.e., sorted indices) as a column vector. For
example:
    u = np.array([[1], [-1], [2], [-2]])
    v = np.argsort(u, axis=0)
    # v is now [[3], [1], [0], [2]], a column vector (i.e., 2-D array with a single column)
    v = v.flatten()  # This turns the 2-D array into a 1-D array
    # v is now [3, 1, 0, 2]
(b) numpy.bincount takes in a 1-D array with non-negative integer values, finds the largest
integer Nmax, and counts the occurrences of each integer between 0 and Nmax (both ends
included) inside the array. It returns the occurrence counts as a 1-D array of size Nmax + 1. For
example, for an input [0, 1, 1, 3, 2, 1, 7], this function generates the output [1, 3, 1, 1, 0, 0, 0, 1]
because there are one 0, three 1's, one 2, one 3, zero 4's, zero 5's, zero 6's, and one 7 inside the
input array.
    u = np.array([0, 1, 1, 3, 2, 1, 7])
    v = np.bincount(u)
    # v is now [1, 3, 1, 1, 0, 0, 0, 1]
(c) numpy.argmax returns the index (not value) of the maximum value of an input 1-D array
(when ties occur, it returns the first one)
    u = np.array([1, -1, 2, -2])
    idx = np.argmax(u)
    # idx is 2
Algorithm 4 k-nearest neighbor algorithm
Input: k = 7, training set $X_{\text{train}} \in \mathbb{R}^{600 \times 784}$ and labels $y_{\text{train}} \in \mathbb{R}^{600 \times 1}$, test set $X_{\text{test}} \in \mathbb{R}^{100 \times 784}$ and
labels $y_{\text{test}} \in \mathbb{R}^{100 \times 1}$.
Output: ypredict
1: ypredict = −1 ▷ all predicted labels initialized as −1; provided in the code template
2: for i = 0 : Ntest do ▷ iterate over all test images
3:     x = Xtest[[i], :] ▷ x stores the current test image as a row vector
4:     d = 0 ▷ $d \in \mathbb{R}^{600 \times 1}$ stores the distances of the current test image to all training images
5:     for j = 0 : Ntrain do ▷ iterate over all training/dictionary images
6:         d[j] = ∥x − Xtrain[[j], :]∥ ▷ distance between the test image and the j-th training image
7:     end for
8:     Obtain the indices of the k smallest values in d ▷ Try using np.argsort
9:     Get the most frequent label of these k training images ▷ Use np.bincount and np.argmax
10:     Save the predicted label of the test image in the corresponding index of ypredict
11: end for
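Lines 8 and 9 of Algorithm 4 (neighbor selection and majority voting) can be illustrated in isolation. The sketch below assumes a made-up distance vector d and label vector labels for five training images and uses k = 3; these values and names are hypothetical and are not part of the code template.

```python
import numpy as np

# Hypothetical distances from one test image to 5 training images, and their labels.
d = np.array([[0.9], [0.1], [0.5], [0.3], [0.7]])
labels = np.array([[3], [8], [3], [8], [3]])
k = 3

nearest = np.argsort(d, axis=0).flatten()[:k]    # indices of the k smallest distances
votes = np.bincount(labels[nearest].flatten())   # votes[c] = number of neighbors labeled c
pred = np.argmax(votes)                          # most frequent label among the neighbors
```

The three nearest training images are those at indices 1, 3, and 2, with labels 8, 8, and 3, so the predicted label is 8.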
4.4 (Optional, 3 Bonus Points) ℓ1 norm and vectorization The norm $\|v\| = \sqrt{\langle v, v \rangle}$ we
introduced in the lecture is not the only way to measure magnitudes of vectors, and hence $\|a - b\|$ is
also not the only way to measure the distance between vectors a, b. Another norm, the ℓ1 norm (its
induced distance is also known as the Manhattan distance), is calculated by
$$\|v\|_1 = \sum_{i=1}^{n} |v_i| \quad \text{for } v \in \mathbb{R}^n,$$
where |·| denotes the absolute value. This also leads naturally to the ℓ1 distance between a, b: $\|a - b\|_1$.
Please redo problem 4.3 with the distance in line 6 of the pseudocode replaced by the ℓ1
distance and run the prediction and validation again. In order to receive full marks, please use
the numpy.absolute and numpy.sum functions to write the function l1_norm. This is based on the
idea of vectorization: many scalar operations are broadcast componentwise and performed in
parallel on vectors and matrices, which speeds up Python code by avoiding explicit loops. You can
check out this webpage
https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html
or similar resources for more information.
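A minimal sketch of a vectorized ℓ1 norm is given below, using made-up row vectors. The body of l1_norm is one possible way to combine numpy.absolute and numpy.sum, not necessarily the one expected by the code template; the last line additionally shows how broadcasting can compute the ℓ1 distances from one row vector to every row of a matrix without a loop.

```python
import numpy as np

def l1_norm(v):
    """Sum of absolute values of all entries of v (one possible vectorized form)."""
    return np.sum(np.absolute(v))

# Hypothetical data, for illustration only.
X = np.array([[1.0, -2.0, 3.0],
              [4.0, 2.0, -1.0]])
x = np.array([[4.0, 2.0, -1.0]])

dist_first = l1_norm(X[[0], :] - x)             # |-3| + |-4| + |4| = 11
all_dists = np.sum(np.absolute(X - x), axis=1)  # l1 distance from x to each row of X
```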