Because we encounter Singular Value Decomposition in the following sections of The Elements of Statistical Learning, a major machine learning textbook:
Ridge regression
Principal Component Analysis (PCA)
Big Picture
Singular Value Decomposition (SVD) is similar to eigendecomposition. Eigendecomposition only works for a square matrix, but SVD is more general: it works on a rectangular matrix as well as a square matrix. People sometimes say that eigendecomposition is a special case of SVD, or that SVD is an extension of eigendecomposition.
The goal of SVD is to decompose a given matrix into 3 matrices:

$$A = U \Sigma V^T$$
$A$ is the given matrix. It can be rectangular or square.
$U$ is an orthogonal matrix. Formally, its columns are an orthogonal basis for the column space of $A$. The columns of $U$ are called left singular vectors.
$\Sigma$ is the diagonal matrix of singular values of $A$. The singular values sit on the diagonal elements. The size of this matrix is the same as the given matrix $A$.
$V$ is an orthogonal matrix. Its columns are an orthogonal basis for the row space of $A$. The columns of $V$ are called right singular vectors.
We can think of the SVD equation as looking like the diagonalization that eigendecomposition produces, $A = Q \Lambda Q^{-1}$. In fact, when $A$ is a square symmetric matrix, $U = V$, so $A = U \Sigma U^T = U \Sigma U^{-1}$. $\Sigma$ and $\Lambda$ are both diagonal matrices.
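As a quick numerical check (a minimal NumPy sketch with a made-up matrix, not an example from the book), np.linalg.svd returns the three pieces, and multiplying them back recovers the given matrix:

```python
import numpy as np

# A rectangular (4 x 3) matrix, made up for illustration
A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.],
              [1., 1., 1.]])

# Full SVD: U is 4x4, s holds the singular values, Vt is V^T (3x3)
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the 4x3 diagonal matrix Sigma from the singular values
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)

print(np.allclose(U.T @ U, np.eye(4)))    # U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # V is orthogonal
print(np.allclose(U @ Sigma @ Vt, A))     # A = U Sigma V^T
```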
How to Compute $U$ in SVD
$U$ of SVD can be computed by the eigendecomposition of $A A^T$.
Starting from the SVD equation,

$$A = U \Sigma V^T$$

Multiply both sides on the right by $A^T = (U \Sigma V^T)^T$:

$$A A^T = U \Sigma V^T (U \Sigma V^T)^T$$

By the LIVE EVIL rule,

$$A A^T = U \Sigma V^T (V^T)^T \Sigma^T U^T$$

Because transposing a transposed matrix goes back to the original matrix, $(V^T)^T = V$:

$$A A^T = U \Sigma V^T V \Sigma^T U^T$$

Because $V$ is an orthogonal matrix, meaning $V^T V = I$:

$$A A^T = U \Sigma \Sigma^T U^T$$

Because $\Sigma$ is a diagonal matrix, $\Sigma^T = \Sigma$, and powers of a diagonal matrix are just powers of each diagonal element:

$$A A^T = U \Sigma^2 U^T$$

Because $A A^T$ is square and the right-hand side has the form of a diagonalization, this equation can be read as the eigendecomposition of $A A^T$. So the columns of $U$ are the eigenvectors of $A A^T$, and the diagonal elements of $\Sigma^2$ (the squared singular values) are the eigenvalues of $A A^T$. Therefore $U$ can be computed by the eigendecomposition of $A A^T$.
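Here is a small sketch of that claim in NumPy (my own illustration, reusing the made-up matrix from above): the eigendecomposition of $A A^T$ reproduces the squared singular values and, up to sign flips, the columns of $U$.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.],
              [1., 1., 1.]])

# Eigendecomposition of the symmetric matrix A A^T, sorted by descending eigenvalue
evals, evecs = np.linalg.eigh(A @ A.T)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

U, s, Vt = np.linalg.svd(A)

# Eigenvalues of A A^T are the squared singular values of A
print(np.allclose(evals[:3], s**2))
# Eigenvectors of A A^T match the columns of U, up to a sign flip per column
print(np.allclose(np.abs(evecs[:, :3]), np.abs(U[:, :3])))
```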
How to Compute $V$ in SVD
$V$ of SVD can be computed by the eigendecomposition of $A^T A$.
Starting from the SVD equation,

$$A = U \Sigma V^T$$

Multiply both sides on the left by $A^T = (U \Sigma V^T)^T$ and simplify with the same steps as before (LIVE EVIL, $U^T U = I$, $\Sigma^T = \Sigma$):

$$A^T A = V \Sigma^T U^T U \Sigma V^T = V \Sigma^2 V^T$$

So to compute $V$, apply eigendecomposition to $A^T A$: the eigenvectors are the columns of $V$, and the eigenvalues are again the squared singular values.
$U$ can also be computed from the eigendecomposition of $A^T A$, using $V$ and $\Sigma$. Going back to the SVD formula,

$$A = U \Sigma V^T$$

Right-multiply it by $V$:

$$A V = U \Sigma V^T V$$

Because $V$ is an orthogonal matrix, $V^T V = I$:

$$A V = U \Sigma$$

Because $\Sigma$ is a diagonal matrix whose diagonal elements (the singular values) are non-zero for a full-rank matrix, we can right-multiply by $\Sigma^{-1}$:

$$U = A V \Sigma^{-1}$$

Looking at each vector and value in this equation,

$$u_i = \frac{A v_i}{\sigma_i}$$

So after getting $V$ and $\Sigma$ from the eigendecomposition of $A^T A$, we can compute $U$ by that division.
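The whole recipe can be sketched in NumPy (again a rough illustration, not a numerically robust implementation): get $V$ and $\Sigma$ from the eigendecomposition of $A^T A$, then divide to get $U$.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.],
              [1., 1., 1.]])

# V and the singular values from the eigendecomposition of A^T A
evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]
sigma = np.sqrt(evals[order])          # singular values
V = V[:, order]

# u_i = A v_i / sigma_i, done for all columns at once
U = (A @ V) / sigma

# Compare with NumPy's SVD (columns can differ by a sign flip)
U_np, s_np, Vt_np = np.linalg.svd(A, full_matrices=False)
print(np.allclose(sigma, s_np))
print(np.allclose(np.abs(U), np.abs(U_np)))
print(np.allclose(np.abs(V), np.abs(Vt_np.T)))
```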
How to Interpret Singular Values in $\Sigma$
Singular values are scale-dependent: you cannot compare the singular values of one matrix with the singular values of another matrix by their magnitudes, because the scale of the singular values depends on the numbers in each matrix.
When you scale up the matrix, the singular values are scaled up by the same amount. $U$ and $V$ stay intact, because their columns are kept normalized (unit length).
Singular values become interpretable after normalizing them, so that each singular value is a percentage:
Sum all the singular values.
Divide each singular value by the sum (and multiply by 100).

$$\tilde{\sigma}_i = 100 \cdot \frac{\sigma_i}{\sum_j \sigma_j}$$

When you take the maximum of the normalized singular values:
If the max is high, we can interpret the given matrix as simple: most of its information lies along one direction.
If the max is low, the matrix is complicated, made up of lots of information distributed along different directions.
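In NumPy this normalization is one line (a sketch with a random made-up matrix):

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(50, 30)              # a noisy, information-spread-out matrix

s = np.linalg.svd(A, compute_uv=False)   # singular values only

# Normalize so the singular values sum to 100 (percent)
s_pct = 100 * s / s.sum()

# For this random matrix the maximum percentage is small:
# no single direction dominates, so the matrix is "complicated"
print(s_pct.max())
```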
SVD vs. Eigendecomposition for Square Symmetric Matrix
Here the matrix needs to be square because eigendecomposition only exists for a square matrix; only then can we compare SVD and eigendecomposition.
For a symmetric matrix, SVD and eigendecomposition produce the same result, except that there can be sign flips in the eigenvectors.
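A small check of this (my own sketch, using a made-up symmetric positive definite matrix so that all eigenvalues are positive):

```python
import numpy as np

# A symmetric positive definite matrix, made up for illustration
S = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

evals, evecs = np.linalg.eigh(S)   # eigendecomposition (ascending eigenvalues)
U, s, Vt = np.linalg.svd(S)        # SVD (descending singular values)

# Same values, same vectors up to ordering and sign flips
print(np.allclose(evals[::-1], s))
print(np.allclose(np.abs(evecs[:, ::-1]), np.abs(U)))
```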
Singular Value vs. Eigenvalue
For a symmetric matrix, the singular values and eigenvalues are the same (more precisely, the singular values are the absolute values of the eigenvalues), but for a non-symmetric square matrix, the singular values and eigenvalues are not the same.
Eigenvalues of $A^T A$ = squared singular values of $A$ (where $A$ is a square symmetric matrix; in fact this holds for any matrix $A$)
Eigenvalues of $A$ = singular values of $A$ (where $A$ is a square symmetric matrix, up to the absolute value noted above)
Eigenvalues of $A$ $\neq$ singular values of $A$ (where $A$ is a non-symmetric square matrix, in general)
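A quick numerical confirmation (sketch with a made-up matrix):

```python
import numpy as np

# A non-symmetric square matrix: eigenvalues and singular values differ
A = np.array([[2., 1.],
              [0., 3.]])

eigvals = np.linalg.eigvals(A)               # 2 and 3
svals = np.linalg.svd(A, compute_uv=False)   # roughly 3.26 and 1.84
print(eigvals, svals)

# But the eigenvalues of A^T A are always the squared singular values of A
print(np.allclose(np.sort(np.linalg.eigvalsh(A.T @ A)), np.sort(svals**2)))
```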
Low-Rank Approximation by SVD
The SVD equation is,

$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$

Low-rank approximation uses only a part of the singular vectors and singular values (from the 1st vector and value up to the kth) to reconstruct the original matrix. Hopefully we throw away a lot of data but don't lose the information (or meaning) of the data.
So after reducing the size of the matrices from SVD, we still apply the same SVD equation for the matrix multiplication, and we get an approximation of the original matrix:

$$A \approx A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^{k} \sigma_i u_i v_i^T$$
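A minimal NumPy sketch of the idea (the data here is synthetic: a nearly rank-2 matrix plus a little noise):

```python
import numpy as np

np.random.seed(0)
# A 100x80 matrix that is approximately rank 2
A = np.random.randn(100, 2) @ np.random.randn(2, 80) + 0.01 * np.random.randn(100, 80)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the first k singular vectors and values
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The rank-2 reconstruction is very close to A, while storing
# 100*k + k + k*80 numbers instead of 100*80
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))
```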
SVD of Square Matrix
When the given matrix $A$ in SVD is square ($n \times n$), $U$, $\Sigma$, and $V$ are all $n \times n$ as well.
In such a case, the product $U V^T$ is possible. Because $U$ and $V^T$ are orthogonal matrices, $U V^T$ is a stack of the effects of just changing the orientation of a vector (rotating) two times, meaning no stretching.
But $A = U \Sigma V^T$ and $U V^T$ give different matrix multiplication results, because the diagonal matrix $\Sigma$ in between stretches each direction by its singular value.
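A tiny sketch of this geometric picture (my own made-up example): $U V^T$ preserves the length of a vector, while $A = U \Sigma V^T$ changes it because of the stretching by $\Sigma$.

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])
U, s, Vt = np.linalg.svd(A)

x = np.array([1., 1.])

# U V^T only rotates/reflects: the length of x is preserved
print(np.linalg.norm(U @ Vt @ x), np.linalg.norm(x))

# A = U Sigma V^T also stretches by the singular values, so the length changes
print(np.linalg.norm(A @ x))
```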
SVD of Orthogonal Matrix
Given an orthogonal matrix $Q$, $Q^T Q = I$.
The eigenvalues of an identity matrix are all 1. The singular values of $Q$ come from the eigendecomposition of $Q^T Q = I$: the squared singular values of $Q$ are the eigenvalues of $Q^T Q$. So the singular values of the SVD of an orthogonal matrix $Q$ are all 1.
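A quick check (sketch; the orthogonal matrix is generated from a QR decomposition of a random matrix):

```python
import numpy as np

np.random.seed(0)
# A random orthogonal matrix Q from a QR decomposition
Q, _ = np.linalg.qr(np.random.randn(5, 5))

s = np.linalg.svd(Q, compute_uv=False)
print(np.allclose(s, 1.0))   # True: all singular values of an orthogonal matrix are 1
```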
Moore-Penrose Pseudoinverse by SVD
A pseudoinverse is an approximate inverse of a non-invertible matrix. The Moore-Penrose pseudoinverse is not the only way to compute a pseudoinverse, but it is the most common way.
The inverse of a full-rank matrix, in terms of its SVD, is

$$A^{-1} = (U \Sigma V^T)^{-1}$$

On the right-hand side, each matrix is invertible, because $U$ and $V$ are orthogonal matrices (invertible), and $\Sigma$ is a diagonal matrix with all diagonal elements non-zero, because we assume the given matrix is full-rank. By the LIVE EVIL rule,

$$A^{-1} = (V^T)^{-1} \Sigma^{-1} U^{-1}$$

Because an orthogonal matrix satisfies $Q^{-1} = Q^T$, we have $(V^T)^{-1} = V$ and $U^{-1} = U^T$:

$$A^{-1} = V \Sigma^{-1} U^T$$

So from the SVD perspective, the inverse of the given matrix just swaps $U$ and $V$ and takes the inverse of the diagonal matrix of singular values. The inverse of a diagonal matrix is

$$\Sigma^{-1} = \mathrm{diag}\!\left(\frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_n}\right)$$

In the pseudoinverse, the inverse of the diagonal matrix is instead defined by taking the reciprocal of only the non-zero elements (the zero ones stay zero), written with a superscript $+$: $\Sigma^{+}$.
Then just do the same inverse-by-SVD construction:

$$A^{+} = V \Sigma^{+} U^T$$

So $A A^{+}$ should be close to an identity matrix.
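Here is a sketch of that construction in NumPy (with a made-up rank-deficient matrix), compared against np.linalg.pinv:

```python
import numpy as np

# A rank-2 matrix (the third column is the sum of the first two)
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.],
              [0., 1., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reciprocal of the non-zero singular values only; (near-)zero ones stay zero
tol = 1e-10
s_pinv = np.zeros_like(s)
s_pinv[s > tol] = 1.0 / s[s > tol]

A_pinv = Vt.T @ np.diag(s_pinv) @ U.T

# Matches NumPy's Moore-Penrose pseudoinverse
print(np.allclose(A_pinv, np.linalg.pinv(A)))
```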
Left Inverse by SVD
The matrix $A^T A$ is an invertible symmetric matrix (when $A$ has full column rank), so $(A^T A)^{-1} (A^T A) = I$. We say $(A^T A)^{-1} A^T$ is the left inverse of $A$. The left inverse only exists when $A$ is full-rank.
But when $A$ is reduced-rank, if we convert the left inverse into its SVD form, we can compute the Moore-Penrose pseudoinverse instead. When $A$ is a tall matrix and its SVD is $A = U \Sigma V^T$, the left inverse $(A^T A)^{-1} A^T$ can be computed as $V \Sigma^{-1} U^T$, as derived below.
From the SVD (using the economy SVD of the tall matrix $A$, so that $\Sigma$ is a square diagonal matrix), we have

$$A = U \Sigma V^T$$

The left inverse of $A$ is

$$(A^T A)^{-1} A^T$$

Substitute the SVD into the equation with $A = U \Sigma V^T$:

$$(A^T A)^{-1} A^T = \left((U \Sigma V^T)^T (U \Sigma V^T)\right)^{-1} (U \Sigma V^T)^T$$

By the LIVE EVIL rule,

$$= \left(V \Sigma^T U^T U \Sigma V^T\right)^{-1} V \Sigma^T U^T$$

Because the transpose of a transposed matrix is just the original matrix, $(V^T)^T = V$, and the transpose of a diagonal matrix remains the same diagonal matrix, $\Sigma^T = \Sigma$,

$$= \left(V \Sigma U^T U \Sigma V^T\right)^{-1} V \Sigma U^T$$

For the orthogonal matrix $U$, $U^T U = I$, and

$$= \left(V \Sigma \Sigma V^T\right)^{-1} V \Sigma U^T$$

Because matrix multiplication of diagonal matrices just takes powers of the diagonal elements, $\Sigma \Sigma = \Sigma^2$,

$$= \left(V \Sigma^2 V^T\right)^{-1} V \Sigma U^T$$

By the LIVE EVIL rule again, but applied to the inverse this time,

$$= (V^T)^{-1} \Sigma^{-2} V^{-1} V \Sigma U^T$$

Because for an orthogonal matrix $V^{-1} = V^T$, so $(V^T)^{-1} = V$,

$$= V \Sigma^{-2} V^T V \Sigma U^T$$

Because of $V^T V = I$ again,

$$= V \Sigma^{-2} \Sigma U^T$$

In $\Sigma^{-2} \Sigma$, looking at the diagonal elements, $\frac{1}{\sigma_i^2} \cdot \sigma_i = \frac{1}{\sigma_i}$, so $\Sigma^{-2} \Sigma = \Sigma^{-1}$:

$$(A^T A)^{-1} A^T = V \Sigma^{-1} U^T$$

This form is the same as the inverse of a matrix by SVD, $A^{-1} = V \Sigma^{-1} U^T$.
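The derivation can be checked numerically (a sketch with a made-up tall full-column-rank matrix, using the economy SVD so that $\Sigma$ is square):

```python
import numpy as np

np.random.seed(1)
A = np.random.randn(6, 3)                       # tall, full column rank

# Left inverse computed directly
left_inv = np.linalg.inv(A.T @ A) @ A.T

# The same thing built from the economy SVD: V Sigma^{-1} U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
left_inv_svd = Vt.T @ np.diag(1.0 / s) @ U.T

print(np.allclose(left_inv, left_inv_svd))      # same matrix
print(np.allclose(left_inv @ A, np.eye(3)))     # left inverse: L A = I
```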
Condition Number by SVD
The condition number is defined as the ratio of the largest singular value to the smallest, like below:

$$\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

The condition number of a singular matrix is not defined, because a singular matrix has at least one zero singular value, so we cannot divide by 0. As a matrix gets closer to a singular matrix, the minimum singular value gets smaller, so the divisor gets smaller and the condition number gets larger.
Based on the condition number, people say a matrix is:
ill-conditioned when the condition number is large
well-conditioned when the condition number is small
But ill-conditioned does not mean there is something wrong with the given matrix. An informative matrix can have a large condition number (so-called ill-conditioned), and a random, noisy, non-informative matrix can have a small condition number (called well-conditioned even though there is no information in it). So we cannot draw any conclusion about a matrix just by looking at its condition number.
In statistics, the condition number is used as an indicator of the reliability and sensitivity of a matrix. A matrix with a large condition number is more sensitive to a small perturbation.
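A small sketch of the condition number from singular values (np.linalg.cond computes the same ratio by default):

```python
import numpy as np

np.random.seed(2)
A = np.random.randn(5, 5)

s = np.linalg.svd(A, compute_uv=False)
print(s.max() / s.min(), np.linalg.cond(A))   # same number

# Pushing the matrix toward singularity makes the condition number blow up
A_near_singular = A.copy()
A_near_singular[:, -1] = A_near_singular[:, 0] + 1e-9 * np.random.randn(5)
print(np.linalg.cond(A_near_singular))
```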
$A v = \sigma u$ vs. $A^T u = \sigma v$
If $v$ is a right singular vector of $A$, and if $\sigma$ is the corresponding singular value, then $A v = \sigma u$, where $u$ is the left singular vector of $A$ with the same $\sigma$.
If $u$ is a left singular vector of $A$, and if $\sigma$ is the corresponding singular value, then $A^T u = \sigma v$, where $v$ is the right singular vector of $A$ with the same $\sigma$.
Start from the SVD equation,

$$A = U \Sigma V^T$$

By focusing on a single vector and value, we can think of $A$ as being made up of rank-1 layers:

$$A = \sum_{i} \sigma_i u_i v_i^T$$

Right-multiply both sides by $v_i$:

$$A v_i = \sum_{j} \sigma_j u_j v_j^T v_i$$

Because $v_i$ is a column of an orthogonal matrix, $v_j^T v_i = 1$ when $j = i$ and $0$ otherwise:

$$A v_i = \sigma_i u_i$$

When we are given $A^T A$, it is a square symmetric matrix. In that case, the SVD of the square symmetric matrix produces the same left and right singular vectors, so

$$A^T A v_i = \sigma_i^2 v_i$$

Left-multiply both sides of $A v_i = \sigma_i u_i$ by $A^T$:

$$A^T A v_i = \sigma_i A^T u_i$$

Replace $A^T A v_i$ with $\sigma_i^2 v_i$:

$$\sigma_i^2 v_i = \sigma_i A^T u_i \quad\Longrightarrow\quad A^T u_i = \sigma_i v_i$$

The equation $A^T A v_i = \sigma_i^2 v_i$ is an eigendecomposition relation with eigenvalue $\sigma_i^2$ and eigenvector $v_i$, so $v_i$ is a right singular vector of $A$ with the same singular value $\sigma_i$, and $A^T u_i = \sigma_i v_i$ maps the left singular vector $u_i$ onto it.
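Both relations are easy to verify numerically (a sketch with a made-up matrix):

```python
import numpy as np

np.random.seed(3)
A = np.random.randn(5, 3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Check A v_i = sigma_i u_i and A^T u_i = sigma_i v_i for every singular triple
for i in range(len(s)):
    u, sigma, v = U[:, i], s[i], Vt[i, :]
    print(np.allclose(A @ v, sigma * u), np.allclose(A.T @ u, sigma * v))
```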