Because we encounter Singular Value Decomposition in the following sections of The Elements of Statistical Learning, a major machine learning textbook:
Ridge regression
Principal Component Analysis (PCA)
Big Picture
Singular Value Decomposition (SVD) is similar to eigendecomposition. Eigendecomposition only works for a square matrix, but SVD is more general: it works on a rectangular matrix as well as a square matrix. People sometimes say that eigendecomposition is a special case of SVD, or that SVD is an extension of eigendecomposition.
The goal of SVD is to decompose a given matrix into 3 matrices:

$$A = U \Sigma V^T$$
$A$ is the given matrix. It can be rectangular or square.
$U$ is an orthogonal matrix. Formally, its columns are an orthogonal basis for the column space of $A$. The columns of $U$ are called left singular vectors.
$\Sigma$ is the diagonal matrix of singular values of $A$. The singular values sit on the diagonal elements. The size of this matrix is the same as the given matrix $A$.
$V$ is an orthogonal matrix. Its columns are an orthogonal basis for the row space of $A$. The columns of $V$ are called right singular vectors.
We can think of the SVD equation as looking like the diagonalization that eigendecomposition produces, $A = Q \Lambda Q^{-1}$. In fact, when $A$ is a square symmetric matrix, $U = V$, so $A = U \Sigma U^T = U \Sigma U^{-1}$. $\Sigma$ and $\Lambda$ are both diagonal matrices.
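As a quick numerical check (a minimal NumPy sketch with a made-up matrix, not an example from the book), np.linalg.svd returns the three pieces, and multiplying them back recovers the given matrix:

```python
import numpy as np

# A rectangular (4 x 3) matrix, made up for illustration
A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.],
              [1., 1., 1.]])

# Full SVD: U is 4x4, s holds the singular values, Vt is V^T (3x3)
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the 4x3 diagonal matrix Sigma from the singular values
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)

print(np.allclose(U.T @ U, np.eye(4)))    # U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # V is orthogonal
print(np.allclose(U @ Sigma @ Vt, A))     # A = U Sigma V^T
```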
How to Compute $U$ in SVD
$U$ of SVD can be computed by the eigendecomposition of $A A^T$.
Starting from the SVD equation,

$$A = U \Sigma V^T$$

Multiply both sides on the right by $A^T = (U \Sigma V^T)^T$:

$$A A^T = U \Sigma V^T (U \Sigma V^T)^T$$

By the LIVE EVIL rule,

$$A A^T = U \Sigma V^T (V^T)^T \Sigma^T U^T$$

Because transposing a transposed matrix goes back to the original matrix, $(V^T)^T = V$:

$$A A^T = U \Sigma V^T V \Sigma^T U^T$$

Because $V$ is an orthogonal matrix, meaning $V^T V = I$:

$$A A^T = U \Sigma \Sigma^T U^T$$

Because $\Sigma$ is a diagonal matrix, $\Sigma^T = \Sigma$, and powers of a diagonal matrix are just powers of each diagonal element:

$$A A^T = U \Sigma^2 U^T$$

Because $A A^T$ is square and the right-hand side has the form of a diagonalization, this equation can be read as the eigendecomposition of $A A^T$. So the columns of $U$ are the eigenvectors of $A A^T$, and the diagonal elements of $\Sigma^2$ (the squared singular values) are the eigenvalues of $A A^T$. Therefore $U$ can be computed by the eigendecomposition of $A A^T$.
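Here is a small sketch of that claim in NumPy (my own illustration, reusing the made-up matrix from above): the eigendecomposition of $A A^T$ reproduces the squared singular values and, up to sign flips, the columns of $U$.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.],
              [1., 1., 1.]])

# Eigendecomposition of the symmetric matrix A A^T, sorted by descending eigenvalue
evals, evecs = np.linalg.eigh(A @ A.T)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

U, s, Vt = np.linalg.svd(A)

# Eigenvalues of A A^T are the squared singular values of A
print(np.allclose(evals[:3], s**2))
# Eigenvectors of A A^T match the columns of U, up to a sign flip per column
print(np.allclose(np.abs(evecs[:, :3]), np.abs(U[:, :3])))
```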
How to Compute $V$ in SVD
$V$ of SVD can be computed by the eigendecomposition of $A^T A$.
Starting from the SVD equation,

$$A = U \Sigma V^T$$

Multiply both sides on the left by $A^T = (U \Sigma V^T)^T$ and simplify with the same steps as before (LIVE EVIL, $U^T U = I$, $\Sigma^T = \Sigma$):

$$A^T A = V \Sigma^T U^T U \Sigma V^T = V \Sigma^2 V^T$$

So to compute $V$, apply eigendecomposition to $A^T A$: the eigenvectors are the columns of $V$, and the eigenvalues are again the squared singular values.
$U$ can also be computed from the eigendecomposition of $A^T A$, using $V$ and $\Sigma$. Going back to the SVD formula,

$$A = U \Sigma V^T$$

Right-multiply it by $V$:

$$A V = U \Sigma V^T V$$

Because $V$ is an orthogonal matrix, $V^T V = I$:

$$A V = U \Sigma$$

Because $\Sigma$ is a diagonal matrix whose diagonal elements (the singular values) are non-zero for a full-rank matrix, we can right-multiply by $\Sigma^{-1}$:

$$U = A V \Sigma^{-1}$$

Looking at each vector and value in this equation,

$$u_i = \frac{A v_i}{\sigma_i}$$

So after getting $V$ and $\Sigma$ from the eigendecomposition of $A^T A$, we can compute $U$ by that division.
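The whole recipe can be sketched in NumPy (again a rough illustration, not a numerically robust implementation): get $V$ and $\Sigma$ from the eigendecomposition of $A^T A$, then divide to get $U$.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.],
              [1., 1., 1.]])

# V and the singular values from the eigendecomposition of A^T A
evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]
sigma = np.sqrt(evals[order])          # singular values
V = V[:, order]

# u_i = A v_i / sigma_i, done for all columns at once
U = (A @ V) / sigma

# Compare with NumPy's SVD (columns can differ by a sign flip)
U_np, s_np, Vt_np = np.linalg.svd(A, full_matrices=False)
print(np.allclose(sigma, s_np))
print(np.allclose(np.abs(U), np.abs(U_np)))
print(np.allclose(np.abs(V), np.abs(Vt_np.T)))
```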
How to Interpret Singular Values in $\Sigma$
Singular values are scale-dependent: you cannot compare the singular values of one matrix with the singular values of another matrix by their magnitudes, because the scale of the singular values depends on the numbers in each matrix.
When you scale up the matrix, the singular values are scaled up by the same amount. $U$ and $V$ stay intact, because their columns are kept normalized (unit length).
Singular values become interpretable after normalizing them, so that each singular value is a percentage:
Sum all the singular values.
Divide each singular value by the sum (and multiply by 100).

$$\tilde{\sigma}_i = 100 \cdot \frac{\sigma_i}{\sum_j \sigma_j}$$

When you take the maximum of the normalized singular values:
If the max is high, we can interpret the given matrix as simple: most of its information lies along one direction.
If the max is low, the matrix is complicated, made up of lots of information distributed along different directions.
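In NumPy this normalization is one line (a sketch with a random made-up matrix):

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(50, 30)              # a noisy, information-spread-out matrix

s = np.linalg.svd(A, compute_uv=False)   # singular values only

# Normalize so the singular values sum to 100 (percent)
s_pct = 100 * s / s.sum()

# For this random matrix the maximum percentage is small:
# no single direction dominates, so the matrix is "complicated"
print(s_pct.max())
```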
SVD vs. Eigendecomposition for Square Symmetric Matrix
Here the matrix needs to be square because eigendecomposition only exists for a square matrix; only then can we compare SVD and eigendecomposition.
For a symmetric matrix, SVD and eigendecomposition produce the same result, except that there can be sign flips in the eigenvectors.
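A small check of this (my own sketch, using a made-up symmetric positive definite matrix so that all eigenvalues are positive):

```python
import numpy as np

# A symmetric positive definite matrix, made up for illustration
S = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

evals, evecs = np.linalg.eigh(S)   # eigendecomposition (ascending eigenvalues)
U, s, Vt = np.linalg.svd(S)        # SVD (descending singular values)

# Same values, same vectors up to ordering and sign flips
print(np.allclose(evals[::-1], s))
print(np.allclose(np.abs(evecs[:, ::-1]), np.abs(U)))
```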
Singular Value vs. Eigenvalue
For a symmetric matrix, the singular values and eigenvalues are the same (more precisely, the singular values are the absolute values of the eigenvalues), but for a non-symmetric square matrix, the singular values and eigenvalues are not the same.
Eigenvalues of $A^T A$ = squared singular values of $A$ (where $A$ is a square symmetric matrix; in fact this holds for any matrix $A$)
Eigenvalues of $A$ = singular values of $A$ (where $A$ is a square symmetric matrix, up to the absolute value noted above)
Eigenvalues of $A$ $\neq$ singular values of $A$ (where $A$ is a non-symmetric square matrix, in general)
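A quick numerical confirmation (sketch with a made-up matrix):

```python
import numpy as np

# A non-symmetric square matrix: eigenvalues and singular values differ
A = np.array([[2., 1.],
              [0., 3.]])

eigvals = np.linalg.eigvals(A)               # 2 and 3
svals = np.linalg.svd(A, compute_uv=False)   # roughly 3.26 and 1.84
print(eigvals, svals)

# But the eigenvalues of A^T A are always the squared singular values of A
print(np.allclose(np.sort(np.linalg.eigvalsh(A.T @ A)), np.sort(svals**2)))
```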
Low-Rank Approximation by SVD
The SVD equation is,

$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$

Low-rank approximation uses only a part of the singular vectors and singular values (from the 1st vector and value up to the kth) to reconstruct the original matrix. Hopefully we throw away a lot of data but don't lose the information (or meaning) of the data.
So after reducing the size of the matrices from SVD, we still apply the same SVD equation for the matrix multiplication, and we get an approximation of the original matrix:

$$A \approx A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^{k} \sigma_i u_i v_i^T$$
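A minimal NumPy sketch of the idea (the data here is synthetic: a nearly rank-2 matrix plus a little noise):

```python
import numpy as np

np.random.seed(0)
# A 100x80 matrix that is approximately rank 2
A = np.random.randn(100, 2) @ np.random.randn(2, 80) + 0.01 * np.random.randn(100, 80)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the first k singular vectors and values
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The rank-2 reconstruction is very close to A, while storing
# 100*k + k + k*80 numbers instead of 100*80
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))
```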
SVD of Square Matrix
When the given matrix $A$ in SVD is square ($n \times n$), $U$, $\Sigma$, and $V$ are all $n \times n$ as well.
In such a case, the product $U V^T$ is possible. Because $U$ and $V^T$ are orthogonal matrices, $U V^T$ is a stack of the effects of just changing the orientation of a vector (rotating) two times, meaning no stretching.
But $A = U \Sigma V^T$ and $U V^T$ give different matrix multiplication results, because the diagonal matrix $\Sigma$ in between stretches each direction by its singular value.
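A tiny sketch of this geometric picture (my own made-up example): $U V^T$ preserves the length of a vector, while $A = U \Sigma V^T$ changes it because of the stretching by $\Sigma$.

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])
U, s, Vt = np.linalg.svd(A)

x = np.array([1., 1.])

# U V^T only rotates/reflects: the length of x is preserved
print(np.linalg.norm(U @ Vt @ x), np.linalg.norm(x))

# A = U Sigma V^T also stretches by the singular values, so the length changes
print(np.linalg.norm(A @ x))
```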
SVD of Orthogonal Matrix
Given an orthogonal matrix $Q$, $Q^T Q = I$.
The eigenvalues of an identity matrix are all 1. The singular values of $Q$ come from the eigendecomposition of $Q^T Q = I$: the squared singular values of $Q$ are the eigenvalues of $Q^T Q$. So the singular values of the SVD of an orthogonal matrix $Q$ are all 1.
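A quick check (sketch; the orthogonal matrix is generated from a QR decomposition of a random matrix):

```python
import numpy as np

np.random.seed(0)
# A random orthogonal matrix Q from a QR decomposition
Q, _ = np.linalg.qr(np.random.randn(5, 5))

s = np.linalg.svd(Q, compute_uv=False)
print(np.allclose(s, 1.0))   # True: all singular values of an orthogonal matrix are 1
```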
Moore-Penrose Pseudoinverse by SVD
A pseudoinverse is an approximate inverse of a non-invertible matrix. The Moore-Penrose pseudoinverse is not the only way to compute a pseudoinverse, but it is the most common way.
The inverse of a full-rank matrix, in terms of its SVD, is

$$A^{-1} = (U \Sigma V^T)^{-1}$$

On the right-hand side, each matrix is invertible, because $U$ and $V$ are orthogonal matrices (invertible), and $\Sigma$ is a diagonal matrix with all diagonal elements non-zero, because we assume the given matrix is full-rank. By the LIVE EVIL rule,

$$A^{-1} = (V^T)^{-1} \Sigma^{-1} U^{-1}$$

Because an orthogonal matrix satisfies $Q^{-1} = Q^T$, we have $(V^T)^{-1} = V$ and $U^{-1} = U^T$:

$$A^{-1} = V \Sigma^{-1} U^T$$

So from the SVD perspective, the inverse of the given matrix just swaps $U$ and $V$ and takes the inverse of the diagonal matrix of singular values. The inverse of a diagonal matrix is

$$\Sigma^{-1} = \mathrm{diag}\!\left(\frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_n}\right)$$

In the pseudoinverse, the inverse of the diagonal matrix is instead defined by taking the reciprocal of only the non-zero elements (the zero ones stay zero), written with a superscript $+$: $\Sigma^{+}$.
Then just do the same inverse-by-SVD construction:

$$A^{+} = V \Sigma^{+} U^T$$

So $A A^{+}$ should be close to an identity matrix.
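Here is a sketch of that construction in NumPy (with a made-up rank-deficient matrix), compared against np.linalg.pinv:

```python
import numpy as np

# A rank-2 matrix (the third column is the sum of the first two)
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.],
              [0., 1., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reciprocal of the non-zero singular values only; (near-)zero ones stay zero
tol = 1e-10
s_pinv = np.zeros_like(s)
s_pinv[s > tol] = 1.0 / s[s > tol]

A_pinv = Vt.T @ np.diag(s_pinv) @ U.T

# Matches NumPy's Moore-Penrose pseudoinverse
print(np.allclose(A_pinv, np.linalg.pinv(A)))
```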
Left Inverse by SVD
The matrix $A^T A$ is an invertible symmetric matrix (when $A$ has full column rank), so $(A^T A)^{-1} (A^T A) = I$. We say $(A^T A)^{-1} A^T$ is the left inverse of $A$. The left inverse only exists when $A$ is full-rank.
But when $A$ is reduced-rank, if we convert the left inverse into its SVD form, we can compute the Moore-Penrose pseudoinverse instead. When $A$ is a tall matrix and its SVD is $A = U \Sigma V^T$, the left inverse $(A^T A)^{-1} A^T$ can be computed as $V \Sigma^{-1} U^T$, as derived below.
From the SVD (using the economy SVD of the tall matrix $A$, so that $\Sigma$ is a square diagonal matrix), we have

$$A = U \Sigma V^T$$

The left inverse of $A$ is

$$(A^T A)^{-1} A^T$$

Substitute the SVD into the equation with $A = U \Sigma V^T$:

$$(A^T A)^{-1} A^T = \left((U \Sigma V^T)^T (U \Sigma V^T)\right)^{-1} (U \Sigma V^T)^T$$

By the LIVE EVIL rule,

$$= \left(V \Sigma^T U^T U \Sigma V^T\right)^{-1} V \Sigma^T U^T$$

Because the transpose of a transposed matrix is just the original matrix, $(V^T)^T = V$, and the transpose of a diagonal matrix remains the same diagonal matrix, $\Sigma^T = \Sigma$,

$$= \left(V \Sigma U^T U \Sigma V^T\right)^{-1} V \Sigma U^T$$

For the orthogonal matrix $U$, $U^T U = I$, and

$$= \left(V \Sigma \Sigma V^T\right)^{-1} V \Sigma U^T$$

Because matrix multiplication of diagonal matrices just takes powers of the diagonal elements, $\Sigma \Sigma = \Sigma^2$,

$$= \left(V \Sigma^2 V^T\right)^{-1} V \Sigma U^T$$

By the LIVE EVIL rule again, but applied to the inverse this time,

$$= (V^T)^{-1} \Sigma^{-2} V^{-1} V \Sigma U^T$$

Because for an orthogonal matrix $V^{-1} = V^T$, so $(V^T)^{-1} = V$,

$$= V \Sigma^{-2} V^T V \Sigma U^T$$

Because of $V^T V = I$ again,

$$= V \Sigma^{-2} \Sigma U^T$$

In $\Sigma^{-2} \Sigma$, looking at the diagonal elements, $\frac{1}{\sigma_i^2} \cdot \sigma_i = \frac{1}{\sigma_i}$, so $\Sigma^{-2} \Sigma = \Sigma^{-1}$:

$$(A^T A)^{-1} A^T = V \Sigma^{-1} U^T$$

This form is the same as the inverse of a matrix by SVD, $A^{-1} = V \Sigma^{-1} U^T$.
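The derivation can be checked numerically (a sketch with a made-up tall full-column-rank matrix, using the economy SVD so that $\Sigma$ is square):

```python
import numpy as np

np.random.seed(1)
A = np.random.randn(6, 3)                       # tall, full column rank

# Left inverse computed directly
left_inv = np.linalg.inv(A.T @ A) @ A.T

# The same thing built from the economy SVD: V Sigma^{-1} U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
left_inv_svd = Vt.T @ np.diag(1.0 / s) @ U.T

print(np.allclose(left_inv, left_inv_svd))      # same matrix
print(np.allclose(left_inv @ A, np.eye(3)))     # left inverse: L A = I
```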
Condition Number by SVD
The condition number is defined as the ratio of the largest singular value to the smallest, like below:

$$\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

The condition number of a singular matrix is not defined, because a singular matrix has at least one zero singular value, so we cannot divide by 0. As a matrix gets closer to a singular matrix, the minimum singular value gets smaller, so the divisor gets smaller and the condition number gets larger.
Based on the condition number, people say a matrix is:
ill-conditioned when the condition number is large
well-conditioned when the condition number is small
But ill-conditioned does not mean there is something wrong with the given matrix. An informative matrix can have a large condition number (so-called ill-conditioned), and a random, noisy, non-informative matrix can have a small condition number (called well-conditioned even though there is no information in it). So we cannot draw any conclusion about a matrix just by looking at its condition number.
In statistics, the condition number is used as an indicator of the reliability and sensitivity of a matrix. A matrix with a large condition number is more sensitive to a small perturbation.
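A small sketch of the condition number from singular values (np.linalg.cond computes the same ratio by default):

```python
import numpy as np

np.random.seed(2)
A = np.random.randn(5, 5)

s = np.linalg.svd(A, compute_uv=False)
print(s.max() / s.min(), np.linalg.cond(A))   # same number

# Pushing the matrix toward singularity makes the condition number blow up
A_near_singular = A.copy()
A_near_singular[:, -1] = A_near_singular[:, 0] + 1e-9 * np.random.randn(5)
print(np.linalg.cond(A_near_singular))
```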
$A v = \sigma u$ vs. $A^T u = \sigma v$
If $v$ is a right singular vector of $A$, and if $\sigma$ is the corresponding singular value, then $A v = \sigma u$, where $u$ is the left singular vector of $A$ with the same $\sigma$.
If $u$ is a left singular vector of $A$, and if $\sigma$ is the corresponding singular value, then $A^T u = \sigma v$, where $v$ is the right singular vector of $A$ with the same $\sigma$.
Start from the SVD equation,

$$A = U \Sigma V^T$$

By focusing on a single vector and value, we can think of $A$ as being made up of rank-1 layers:

$$A = \sum_{i} \sigma_i u_i v_i^T$$

Right-multiply both sides by $v_i$:

$$A v_i = \sum_{j} \sigma_j u_j v_j^T v_i$$

Because $v_i$ is a column of an orthogonal matrix, $v_j^T v_i = 1$ when $j = i$ and $0$ otherwise:

$$A v_i = \sigma_i u_i$$

When we are given $A^T A$, it is a square symmetric matrix. In that case, the SVD of the square symmetric matrix produces the same left and right singular vectors, so

$$A^T A v_i = \sigma_i^2 v_i$$

Left-multiply both sides of $A v_i = \sigma_i u_i$ by $A^T$:

$$A^T A v_i = \sigma_i A^T u_i$$

Replace $A^T A v_i$ with $\sigma_i^2 v_i$:

$$\sigma_i^2 v_i = \sigma_i A^T u_i \quad\Longrightarrow\quad A^T u_i = \sigma_i v_i$$

The equation $A^T A v_i = \sigma_i^2 v_i$ is an eigendecomposition relation with eigenvalue $\sigma_i^2$ and eigenvector $v_i$, so $v_i$ is a right singular vector of $A$ with the same singular value $\sigma_i$, and $A^T u_i = \sigma_i v_i$ maps the left singular vector $u_i$ onto it.
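Both relations are easy to verify numerically (a sketch with a made-up matrix):

```python
import numpy as np

np.random.seed(3)
A = np.random.randn(5, 3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Check A v_i = sigma_i u_i and A^T u_i = sigma_i v_i for every singular triple
for i in range(len(s)):
    u, sigma, v = U[:, i], s[i], Vt[i, :]
    print(np.allclose(A @ v, sigma * u), np.allclose(A.T @ u, sigma * v))
```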