MIT A 2020 Vision of Linear Algebra (2/2)

Published under a Creative Commons Licence

Part 3: Orthogonal Vectors

Orthogonal means perpendicular.

Orthogonal matrices $Q$: every column $q_i$ of $Q$ is perpendicular to every other column, and every column is a unit vector (orthonormal columns).

$$Q^TQ=\begin{bmatrix}q_1^T\\ \vdots\\ q_n^T\end{bmatrix}\begin{bmatrix}q_1&\dots&q_n\end{bmatrix}=\begin{bmatrix}1&&0\\&\ddots&\\0&&1\end{bmatrix}=I_n \qquad QQ^T=\begin{bmatrix}q_1&\dots&q_n\end{bmatrix}\begin{bmatrix}q_1^T\\ \vdots\\ q_n^T\end{bmatrix}=q_1q_1^T+\dots+q_nq_n^T=I$$

These matrices are the best because they don't change the length of anything. Nothing blows up and nothing goes to zero. You can multiply together 1,000 orthogonal matrices, and you still have another orthogonal matrix.

$$\|Qx\|^2=x^TQ^TQx=x^TQ^{-1}Qx=x^Tx=\|x\|^2$$
Length is preserved!

Eigenvalues of Q.

$$Qx=\lambda x,\qquad \|Qx\|^2=|\lambda|^2\|x\|^2,\qquad |\lambda|^2=1$$
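A quick numerical check (a NumPy sketch, not from the lecture; the rotation matrix is an assumed example):

```python
# An orthogonal matrix preserves length, and its eigenvalues satisfy |λ| = 1.
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # a rotation, hence orthogonal

x = np.array([3.0, 4.0])
print(np.linalg.norm(x), np.linalg.norm(Q @ x))  # both 5.0: length preserved

lam = np.linalg.eigvals(Q)                       # complex pair e^{±iθ}
print(np.abs(lam))                               # [1. 1.]: every |λ| = 1
```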

If we have a set of columns that are not orthogonal, then often we would like to convert them like this:

$$A=QR \qquad \begin{bmatrix}a_1&\dots&a_n\end{bmatrix}=\begin{bmatrix}q_1&\dots&q_n\end{bmatrix}\begin{bmatrix}r_{11}&r_{12}&\dots&r_{1n}\\&r_{22}&\dots&r_{2n}\\&&\ddots&\vdots\\&&&r_{nn}\end{bmatrix}$$
  • Columns $a_1$ to $a_n$ are independent;
  • Columns $q_1$ to $q_n$ are orthonormal.

What's the difference between the word "orthogonal" and "orthonormal"? Two vectors are orthogonal if their inner product is zero. They are orthonormal if they are orthogonal, and additionally each vector has norm 1.
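To make this concrete, here is a minimal NumPy sketch (the matrix is an assumed example); `np.linalg.qr` produces exactly the factors described above:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])               # independent but not orthogonal columns

Q, R = np.linalg.qr(A)                   # orthonormal Q, upper triangular R
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q^T Q = I
print(np.allclose(Q @ R, A))             # True: A = QR
print(R)                                 # upper triangular
```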

Least Squares: A Major Application of A=QR

$m>n$: $m$ equations $Ax=b$, $n$ unknowns; minimize $\|b-Ax\|^2=\|e\|^2$.

Normal equations for the best $\hat{x}$: $A^Te=0$ or $A^TA\hat{x}=A^Tb$.

We put a little hat on that $x$ to show that it doesn't solve the original equation $Ax=b$, but it comes the closest: it's the closest solution we can find.

$A^TA$ (A transpose A) is a terrifically important matrix. It is square (even when $A$ is not) and also symmetric. $A^TA$ is a great matrix for theory, and $QR$ makes it work in practice.

If $A=QR$ then $R^TQ^TQR\hat{x}=R^TQ^Tb$ leads to $R\hat{x}=Q^Tb$.
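A small sketch of least squares through $QR$ (the data is an assumed line-fitting example): solving $R\hat{x}=Q^Tb$ avoids forming $A^TA$ explicitly.

```python
import numpy as np

# Fit b ≈ c + d·t at t = 0, 1, 2, 3: four equations, two unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

Q, R = np.linalg.qr(A)
x_hat = np.linalg.solve(R, Q.T @ b)          # R x̂ = Q^T b (R is triangular,
                                             # so back substitution suffices)
print(x_hat)
print(np.linalg.lstsq(A, b, rcond=None)[0])  # same answer from the built-in solver
```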

Part 4: Eigenvalues and Eigenvectors

$S=S^T$ has orthogonal eigenvectors, $x^Ty=0$. The proof is important.

Start from these facts:

$$Sx=\lambda x,\qquad Sy=\alpha y,\qquad \lambda\neq\alpha,\qquad S^T=S$$
  1. Transpose to $x^TS^T=\lambda x^T$ and use $S^T=S$: $x^TSy=\lambda x^Ty$
  2. We can also multiply $Sy=\alpha y$ by $x^T$: $x^TSy=\alpha x^Ty$
  3. Now $\lambda x^Ty=\alpha x^Ty$. Since $\lambda\neq\alpha$, $x^Ty$ must be zero.
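A numerical confirmation of the proof (the symmetric matrix is an assumed example):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])      # S = S^T with eigenvalues 1 and 3

lam, X = np.linalg.eigh(S)      # eigh: the symmetric eigensolver
print(lam)                      # [1. 3.]: distinct eigenvalues
print(X[:, 0] @ X[:, 1])        # ~0: the eigenvectors satisfy x^T y = 0
```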

Eigenvectors of S go into the Orthogonal Matrix Q

$$S\begin{bmatrix}q_1&\dots&q_n\end{bmatrix}=\begin{bmatrix}\lambda_1q_1&\dots&\lambda_nq_n\end{bmatrix}=\begin{bmatrix}q_1&\dots&q_n\end{bmatrix}\begin{bmatrix}\lambda_1&&\\&\ddots&\\&&\lambda_n\end{bmatrix}$$

That says

$$SQ=Q\Lambda \qquad S=Q\Lambda Q^{-1}=Q\Lambda Q^T$$
  • $S=Q\Lambda Q^T$ is a sum $\sum_{i=1}^n\lambda_iq_iq_i^T$ of rank-one matrices.
  • With $S=A^TA$ this will lead to the singular values of $A$.
  • $A=U\Sigma V^T$ is a sum $\sum_{i=1}^n\sigma_iu_iv_i^T$ of rank-one matrices.
  • Singular values $\sigma_1$ to $\sigma_r$ in $\Sigma$. Singular vectors in $U$ and $V$.
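A sketch of the first bullet in NumPy (assumed example matrix): the rank-one pieces $\lambda_iq_iq_i^T$ really do add back up to $S$.

```python
import numpy as np

S = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, Q = np.linalg.eigh(S)                              # S = Q Λ Q^T
rank_one_sum = sum(lam[i] * np.outer(Q[:, i], Q[:, i])  # λ_i q_i q_i^T
                   for i in range(3))
print(np.allclose(rank_one_sum, S))                     # True
```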

Eigenvalues and Eigenvectors of A: Not Symmetric

$$A\begin{bmatrix}x_1&\dots&x_n\end{bmatrix}=\begin{bmatrix}\lambda_1x_1&\dots&\lambda_nx_n\end{bmatrix},\qquad AX=X\Lambda$$

With $n$ independent eigenvectors, $A=X\Lambda X^{-1}$.

$A^2, A^3, \dots$ have the same eigenvectors as $A$.

$$A^n\to 0 \text{ when } \Lambda^n\to 0: \text{ all } |\lambda_i|<1$$
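A quick check (assumed example matrix with eigenvalues 0.5 and 0.3):

```python
import numpy as np

A = np.array([[0.5, 0.4],
              [0.0, 0.3]])            # triangular, so eigenvalues 0.5 and 0.3

print(np.abs(np.linalg.eigvals(A)))   # all |λ_i| < 1
An = np.linalg.matrix_power(A, 50)
print(np.abs(An).max())               # ~1e-15: A^50 has essentially reached zero
```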

$A^TA$ is square, symmetric, nonnegative definite

  1. Square. $A^TA=(n\times m)(m\times n)=n\times n$
  2. Symmetric. $(BA)^T=A^TB^T$, so $(A^TA)^T=A^TA^{TT}=A^TA$
  3. $S=S^T$ is nonnegative definite IF
    1. EIGENVALUE TEST 1: All eigenvalues $\geq 0$, $Sx=\lambda x$;
    2. ENERGY TEST 2: $x^TSx\geq 0$ for every vector $x$.
The matrices we see in data are rectangular, and eigenvalues don't make sense for them. SINGULAR VALUES take the place of the eigenvalues.
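Both tests are easy to see numerically (a random rectangular $A$ is assumed here); the energy test holds because $x^TA^TAx=\|Ax\|^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # rectangular: A itself has no eigenvalues
S = A.T @ A                       # square, symmetric

print(np.linalg.eigvalsh(S))      # eigenvalue test: all ≥ 0
x = rng.standard_normal(3)
print(x @ S @ x, np.linalg.norm(A @ x)**2)   # energy test: equal and ≥ 0
```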

Part 5: Singular Values and Singular Vectors

Singular value decomposition (SVD)

$A=U\Sigma V^T$ with $U^TU=I$ and $V^TV=I$.

$AV=U\Sigma$ means

$$A\begin{bmatrix}v_1&\dots&v_n\end{bmatrix}=\begin{bmatrix}u_1&\dots&u_n\end{bmatrix}\begin{bmatrix}\sigma_1&&\\&\ddots&\\&&\sigma_n\end{bmatrix}\quad\text{and}\quad Av_i=\sigma_iu_i$$

SINGULAR VALUES $\sigma_1\geq\sigma_2\geq\dots\geq\sigma_r>0$, $r$ = rank of $A$

How to choose orthonormal $v_i$ in the row space of $A$? The $v_i$ are eigenvectors of $A^TA$:

$$A^TAv_i=\lambda_iv_i=\sigma_i^2v_i$$

The $v_i$ are orthonormal. ($V^TV=I$)

How to choose $u_i$ in the column space?

$$u_i=\frac{Av_i}{\sigma_i}$$

The $u_i$ are orthonormal. (The computation below is the key step in showing $U^TU=I$.)

$$\left(\frac{Av_i}{\sigma_i}\right)^T\!\left(\frac{Av_j}{\sigma_j}\right)=\frac{v_i^TA^TAv_j}{\sigma_i\sigma_j}=\frac{v_i^T\sigma_j^2v_j}{\sigma_i\sigma_j}=\begin{cases}1 & i=j\\ 0 & i\neq j\end{cases}$$
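Here is the whole construction as a NumPy sketch (a random full-rank $A$ is assumed): eigenvectors of $A^TA$ give $V$, $\sigma_i=\sqrt{\lambda_i}$, and the $u_i=Av_i/\sigma_i$ come out orthonormal.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

lam, V = np.linalg.eigh(A.T @ A)      # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]        # reorder so σ_1 ≥ σ_2 ≥ σ_3
sigma = np.sqrt(lam)
U = (A @ V) / sigma                   # column i is u_i = A v_i / σ_i

print(np.allclose(U.T @ U, np.eye(3)))    # the u_i are orthonormal
print(np.allclose((U * sigma) @ V.T, A))  # A = U Σ V^T
```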

Full size SVD. $A=U\Sigma V^T=(m\times m)(m\times n)(n\times n)$

  • $u_{r+1}$ to $u_m$: Nullspace of $A^T$
  • $v_{r+1}$ to $v_n$: Nullspace of $A$
$$\Sigma=\begin{bmatrix}\sigma_1&&&\\&\ddots&&\\&&\sigma_r&\\&&&0\end{bmatrix}$$

Low rank approximation to a big matrix

  • Start from the SVD: $A=U\Sigma V^T=\sum_{i=1}^r\sigma_iu_iv_i^T$
  • Keep the $k$ largest, $\sigma_1$ to $\sigma_k$: $A_k=\sum_{i=1}^k\sigma_iu_iv_i^T$
  • $A_k$ is the closest rank-$k$ matrix to $A$: $\|A-A_k\|\leq\|A-B_k\|$ for every rank-$k$ matrix $B_k$ (Eckart–Young)
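A sketch of $A_k$ in NumPy (a random matrix is assumed; in practice $A$ might be an image or a data matrix). By Eckart–Young, the 2-norm error of the best rank-$k$ approximation is exactly $\sigma_{k+1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 12))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 3
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep the k largest σ_i

print(np.linalg.norm(A - Ak, 2), s[k])       # equal: the error is σ_{k+1}
```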

Norms.

$$\|A\|_2=\sigma_{\max},\qquad \|A\|_F=\sqrt{\sigma_1^2+\dots+\sigma_r^2},\qquad \|A\|_N=\sigma_1+\dots+\sigma_r$$
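All three norms can be read off the singular values (a quick check with an assumed random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)

print(np.linalg.norm(A, 2),     s[0])                   # spectral norm: σ_max
print(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))  # Frobenius norm
print(np.linalg.norm(A, 'nuc'), np.sum(s))              # nuclear norm: σ_1 + ... + σ_r
```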

Randomized Numerical Linear Algebra

For very large matrices, randomization has brought a revolution.

Example: Multiply $AB$ with column-row sampling, $(AS)(S^TB)$

$$AS=\begin{bmatrix}a_1&a_2&a_3\end{bmatrix}\begin{bmatrix}s_{11}&0\\0&0\\0&s_{32}\end{bmatrix}=\begin{bmatrix}s_{11}a_1&s_{32}a_3\end{bmatrix}\quad\text{and}\quad S^TB=\begin{bmatrix}s_{11}b_1^T\\ s_{32}b_3^T\end{bmatrix}$$

NOTICE. $SS^T$ is not close to $I$. But we can have

$$E[SS^T]=I,\qquad E[(AS)(S^TB)]=AB$$

Norm-squared sampling. Choose column-row pairs with probabilities $p_i$ proportional to $\|a_i\|\,\|b_i^T\|$.

This choice minimizes the sampling variance.
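A sketch of the whole sampling scheme (all sizes and names here are illustrative assumptions): sample $k$ column-row pairs with $p_i\propto\|a_i\|\,\|b_i^T\|$ and rescale each by $1/\sqrt{kp_i}$ so that $E[(AS)(S^TB)]=AB$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 200))
B = rng.standard_normal((200, 50))

p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p /= p.sum()                             # norm-squared sampling probabilities

k = 60                                   # sample 60 of the 200 column-row pairs
idx = rng.choice(200, size=k, p=p)
scale = 1.0 / np.sqrt(k * p[idx])        # rescaling keeps the estimate unbiased
AS  = A[:, idx] * scale                  # scaled sampled columns of A
StB = B[idx, :] * scale[:, None]         # scaled sampled rows of B

err = np.linalg.norm(AS @ StB - A @ B) / np.linalg.norm(A @ B)
print(err)                               # modest relative error from a small sample
```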


If you want to watch the videos, see MIT A 2020 Vision of Linear Algebra, Spring 2020.