Eigendecomposition
Math | 2022-03-21

Why Does Eigendecomposition Matter?

Eigendecomposition appears in the following sections of The Elements of Statistical Learning, a textbook used in machine learning classes, to explain how to compute the parameters of statistical methods, so I think understanding it will help us understand statistics better.

  • Principal Component Analysis (PCA)
  • Ridge Regression
  • Linear Discriminant Analysis (LDA)

Big Picture

Eigendecomposition is a way to break a given matrix down into scalars (numbers) and vectors (lists of numbers). The vectors represent the special directions of the given matrix, and the scalars represent how special each corresponding vector is. The scalar is called an eigenvalue, and the vector an eigenvector.

Eigenvalue

An eigenvalue is something we need in order to get an eigenvector. It tells us the scale of the eigenvector, and in statistics it is also used to measure how important a certain component of the data is.

How to Compute Eigenvalue

We start with the following equation.

$$A v = \lambda v$$

$A$ is a given square matrix. We are provided this data.

$v$ is a vector. In eigendecomposition, it must not be a 0 vector, so there needs to be at least one non-zero number in the vector.

$\lambda$ is a scalar, a number.

That equation is saying that matrix multiplication with a vector can be expressed as just a scalar, element-wise multiplication of the vector. The matrix $A$ could be huge. For example, if $A$ is a 10x10 matrix, you have 100 numbers in the matrix, and you need to use all of them to multiply with $v$. But $\lambda$ is just one number. The equation is saying that, using a single number, we can do the same thing.

In linear algebra, $Av$ means that we apply $A$ to $v$. Typically $A$ changes $v$: the change results in changing the direction of $v$ and stretching or shrinking it. But $\lambda$, a single number, applied to $v$ only stretches or shrinks $v$. So $A v = \lambda v$ says that we are only changing the length of $v$, not its direction.
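To make this concrete, here is a minimal NumPy sketch. The matrix and its eigenpair are illustrative values chosen for this sketch (not from the post): multiplying by the full matrix $A$ and scaling by the single number $\lambda$ give the same result.

```python
import numpy as np

# Illustrative example: [1, 1] is an eigenvector of this matrix
# with eigenvalue 3 (values chosen for this sketch).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])
lam = 3.0

# The 4-number matrix and the single number lambda act the same on v:
# v keeps its direction and is only stretched by a factor of 3.
print(A @ v)    # [3. 3.]
print(lam * v)  # [3. 3.]
```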

We want to compute $\lambda$, so move $\lambda v$ to the left side of the equation.

$$A v - \lambda v = 0$$

We cannot do $A - \lambda$, because $A$ is a matrix and $\lambda$ is a scalar. To make the element-wise subtraction possible, we need to make the shapes of $A$ and $\lambda$ equal by multiplying $\lambda$ by the identity matrix $I$. So if $A$ is mxm, $\lambda I$ is also mxm.

$$(A - \lambda I) v = 0$$

People say that, in $(A - \lambda I) v = 0$, the vector $v$ is in the null space of the matrix $A - \lambda I$ ($A$ shifted by $\lambda$), because the product is a 0 vector. $v \in N(A - \lambda I)$ means $v$ is in the null space of the matrix $A - \lambda I$.

In eigendecomposition, we exclude the case where $v$ is a vector of all 0s; $v$ needs to have at least one non-zero element. So to produce 0 on the right side of the equation, $A - \lambda I$ must be a singular matrix (a reduced-rank matrix). If $A - \lambda I$ is a singular matrix, its determinant is 0.

$$\det(A - \lambda I) = 0$$

In $(A - \lambda I) v = 0$, the 0 is a 0 vector, but in $\det(A - \lambda I) = 0$, the 0 is a scalar. $\det(A - \lambda I) = 0$ is called the characteristic equation.

If $A$ itself is a singular matrix (a reduced-rank matrix, i.e. a matrix where one column is a combination of the other columns), at least one eigenvalue is 0.
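As a quick check of the recipe, here is a sketch that solves the characteristic equation for an assumed 2x2 example by hand (for $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ the characteristic polynomial is $\lambda^2 - (a+d)\lambda + (ad - bc)$) and compares against NumPy's built-in routine.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# For a 2x2 matrix, det(A - lambda*I) = lambda^2 - trace(A)*lambda + det(A).
trace = A[0, 0] + A[1, 1]
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]

# The roots of the characteristic polynomial are the eigenvalues.
print(np.roots([1.0, -trace, det]))  # [3. 1.]

# Cross-check with NumPy's eigenvalue routine.
print(np.linalg.eigvals(A))          # [3. 1.]
```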

Eigenvector

The eigenvector is more important than the eigenvalue, but you need to compute the eigenvalues before computing the eigenvectors.

  1. Find all eigenvalues $\lambda_1, \dots, \lambda_m$
  2. For each $\lambda_i$, find a vector $v_i$ which is in the null space of the matrix shifted by $\lambda_i$, namely $A - \lambda_i I$. That is, $v_i \in N(A - \lambda_i I)$.

$v_i$ being in the null space of $A - \lambda_i I$ means that $(A - \lambda_i I) v_i$ gives us a vector with all-0 elements. In math,

$$(A - \lambda_i I) v_i = 0$$

Typically, after finding those vectors, people normalize them so that each vector has unit length.

To double-check the process of computing an eigenvector: first find the eigenvalues, then shift the original matrix by each eigenvalue, and find the basis vector (a normalized, unit-length vector) of the null space of the shifted matrix.
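A minimal sketch of that two-step recipe, reusing the illustrative 2x2 matrix from above. One way to get a null-space basis numerically (an implementation choice for this sketch, not the post's method) is the SVD: the right singular vector whose singular value is numerically zero spans the null space, and it already has unit length.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam = 3.0                      # an eigenvalue found beforehand

shifted = A - lam * np.eye(2)  # A - lambda*I
_, s, vt = np.linalg.svd(shifted)
v = vt[-1]                     # singular value ~0 -> null-space basis vector

print(v)                       # unit length, direction [1, 1] (up to sign)
print(shifted @ v)             # ~[0. 0.], so v is in the null space
print(A @ v, lam * v)          # A v and lambda v agree
```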

Diagonalization of a Matrix with Eigenvalue and Eigenvector

Diagonalization, in the context of eigendecomposition, extracts a special property of a given matrix in the form of a diagonal matrix by using eigenvalues and eigenvectors.

After decomposition, we have a set of eigenvalues and the corresponding eigenvectors. For example, for a 3x3 matrix $A$ we have

$$A v_1 = \lambda_1 v_1, \quad A v_2 = \lambda_2 v_2, \quad A v_3 = \lambda_3 v_3$$

These can be written in matrix form as below. Let $V$ be the matrix with the eigenvectors as column vectors, and $\Lambda$ the diagonal matrix with the eigenvalues on its diagonal:

$$V = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}, \quad \Lambda = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}$$

The matrix of eigenvectors ($V$) must sit on the left of the diagonal matrix of eigenvalues ($V \Lambda$), because the below is wrong.

$$A V = \Lambda V \quad \text{(wrong)}$$

$\Lambda V$ would mean that the first eigenvalue is multiplied with the first element of each eigenvector. That is wrong, because an eigenvalue and its eigenvector are a pair; you cannot mix an eigenvalue with a different eigenvector.

So the set of $A v_i = \lambda_i v_i$ can be written as below, where $A$ is the given matrix, $V$ is a matrix with the eigenvectors in each column, and $\Lambda$ is a diagonal matrix with the eigenvalues in the diagonal elements.

$$A V = V \Lambda$$

By right-multiplying both sides by $V^{-1}$,

$$A = V \Lambda V^{-1}$$

That can be interpreted as: the hidden insight of a given square matrix $A$ is the diagonal matrix $\Lambda = V^{-1} A V$, revealed by passing $A$ through $V^{-1}$ and $V$. It also suggests that the given square matrix can be reconstructed from the set of eigenvectors ($V$ and $V^{-1}$) and the diagonal matrix of eigenvalues $\Lambda$.
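Here is a small sketch verifying both directions on an assumed example matrix: rebuilding $A$ as $V \Lambda V^{-1}$, and recovering the diagonal matrix as $V^{-1} A V$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])     # illustrative matrix with distinct eigenvalues

eigvals, V = np.linalg.eig(A)  # columns of V are the eigenvectors
Lam = np.diag(eigvals)

# Reconstruct A from eigenvectors and eigenvalues: A = V Lam V^{-1}.
print(np.allclose(A, V @ Lam @ np.linalg.inv(V)))  # True

# Pass A through V^{-1} and V to reveal the diagonal matrix.
print(np.round(np.linalg.inv(V) @ A @ V, 10))      # diag(eigvals)
```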

Also, after diagonalization, matrix powers become easy. Start from the equation below.

$$A = V \Lambda V^{-1}$$

Suppose we raise $A$ to the 3rd power:

$$A^3 = (V \Lambda V^{-1})(V \Lambda V^{-1})(V \Lambda V^{-1})$$

By regrouping the matrices,

$$A^3 = V \Lambda (V^{-1} V) \Lambda (V^{-1} V) \Lambda V^{-1}$$

Because $V^{-1} V = I$,

$$A^3 = V \Lambda \Lambda \Lambda V^{-1} = V \Lambda^3 V^{-1}$$

Because $\Lambda$ is a diagonal matrix, $\Lambda^3$ just raises each diagonal element to the 3rd power:

$$\Lambda^3 = \begin{bmatrix} \lambda_1^3 & 0 & 0 \\ 0 & \lambda_2^3 & 0 \\ 0 & 0 & \lambda_3^3 \end{bmatrix}$$

We just did the eigendecomposition first and then took the power, but it also works to take the power first and then do the eigendecomposition, because if we have $A v = \lambda v$, then $A^3 v$ is

$$A^3 v = A A A v$$

Because $A v = \lambda v$,

$$A A A v = A A \lambda v$$

We cannot change the order of matrix multiplication, but $\lambda$ is a scalar, so we can move it to the front:

$$A A \lambda v = \lambda A A v$$

Because $A v = \lambda v$,

$$\lambda A A v = \lambda A \lambda v$$

Because $\lambda$ is a scalar,

$$\lambda A \lambda v = \lambda \lambda A v = \lambda \lambda \lambda v$$

So

$$A^3 v = \lambda^3 v$$
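A sketch of the shortcut, reusing the assumed example matrix from above: $A^3$ computed as $V \Lambda^3 V^{-1}$, where cubing $\Lambda$ only cubes its diagonal elements.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
eigvals, V = np.linalg.eig(A)

# A^3 = V Lam^3 V^{-1}; Lam^3 just cubes each eigenvalue.
A_cubed = V @ np.diag(eigvals ** 3) @ np.linalg.inv(V)

# Cross-check against repeated matrix multiplication.
print(np.allclose(A_cubed, A @ A @ A))                     # True
print(np.allclose(A_cubed, np.linalg.matrix_power(A, 3)))  # True
```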

Eigenvalue of Diagonal Matrix

Eigenvalues of a diagonal matrix are the diagonal elements of the diagonal matrix. For example,

$$\det\left( \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} - \lambda I \right) = (a - \lambda)(d - \lambda) = 0 \implies \lambda = a, d$$

So we can imagine that, even as a diagonal matrix gets bigger, the off-diagonal elements disappear through multiplication by 0 in the characteristic equation, leaving us a product of $(a_{ii} - \lambda)$ factors, so we can directly use the diagonal elements as eigenvalues.

Eigenvalue of Triangular Matrix

Eigenvalues of a triangular matrix follow the same result as a diagonal matrix. Whether it is an upper triangular matrix or a lower triangular matrix, the eigenvalues are the diagonal elements of the triangular matrix. For example, for a 2x2 upper triangular matrix,

$$\det\left( \begin{bmatrix} a & b \\ 0 & d \end{bmatrix} - \lambda I \right) = (a - \lambda)(d - \lambda) - b \cdot 0 = 0 \implies \lambda = a, d$$

When computing the determinant, the 0 on one side of the off-diagonal multiplies the element on the other side, so the off-diagonal elements disappear.
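A quick numerical check of both claims, with illustrative diagonal and upper-triangular matrices that share the same diagonal:

```python
import numpy as np

D = np.diag([4.0, -2.0, 7.0])    # diagonal matrix
U = np.array([[4.0, 5.0, 6.0],   # upper triangular, same diagonal
              [0.0, -2.0, 8.0],
              [0.0, 0.0, 7.0]])

# Both give the diagonal elements back as eigenvalues.
print(np.linalg.eigvals(D))      # [ 4. -2.  7.]
print(np.linalg.eigvals(U))      # [ 4. -2.  7.]
```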

Repeated Eigenvalue

Distinct eigenvalues $\lambda_i$ lead to distinct eigenvectors $v_i$. But when we have the same eigenvalue more than once (repeated eigenvalues), the repeated eigenvalues either share a single eigenvector direction (producing the same vector) or form an eigenplane.

For example,

$$A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$$

By the characteristic equation to find the eigenvalues,

$$\det(A - \lambda I) = (3 - \lambda)(3 - \lambda) = 0 \implies \lambda = 3, 3$$

When $\lambda = 3$,

$$(A - 3I) v = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} v = 0 \implies v = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

For the second $\lambda = 3$, it will have the same direction $v = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.
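Checking the example above with NumPy (a minimal sketch): the eigenvalue 3 comes back twice, and both returned eigenvector columns point in the same $[1, 0]$ direction.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 3.0]])

eigvals, V = np.linalg.eig(A)
print(eigvals)  # [3. 3.] -- the eigenvalue is repeated
print(V)        # both columns are (numerically) along [1, 0]
```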

Eigendecomposition of Symmetric Matrix

Eigendecomposition of a symmetric matrix has the following 2 properties, assuming it has distinct eigenvalues.

  1. Eigenvalues are real values (no complex numbers).
  2. Eigenvectors are all pair-wise orthogonal, meaning $v_i^T v_j = 0$ for $i \neq j$.

If two vectors are orthogonal, their dot product is 0, so start from forming a dot product. Below, $v_1$ and $v_2$ are eigenvectors of $A$ with the eigenvalues $\lambda_1$ and $\lambda_2$.

$$\lambda_1 v_1^T v_2 = (\lambda_1 v_1)^T v_2$$

Because $\lambda_1 v_1 = A v_1$,

$$(\lambda_1 v_1)^T v_2 = (A v_1)^T v_2$$

By the LIVE EVIL rule ($(AB)^T = B^T A^T$),

$$(A v_1)^T v_2 = v_1^T A^T v_2$$

Because the given matrix is symmetric ($A^T = A$), and because $A v_2 = \lambda_2 v_2$,

$$v_1^T A^T v_2 = v_1^T A v_2 = v_1^T \lambda_2 v_2$$

Because $\lambda_2$ is a scalar, we can move it forward by changing the order:

$$v_1^T \lambda_2 v_2 = \lambda_2 v_1^T v_2$$

Extracting the first and the last expressions,

$$\lambda_1 v_1^T v_2 = \lambda_2 v_1^T v_2 \implies (\lambda_1 - \lambda_2) v_1^T v_2 = 0$$

Because the eigenvalues are distinct, $\lambda_1$ and $\lambda_2$ won't be the same, so $\lambda_1 - \lambda_2$ is not 0. For the equation above to hold, $v_1^T v_2$ needs to be 0. When a dot product is 0, those vectors are orthogonal, so $v_1$ and $v_2$ are orthogonal. So a symmetric matrix has orthogonal eigenvectors as long as the eigenvalues are distinct.

It means that $V^T V$ gives a diagonal matrix, because the rows and columns are orthogonal. By default, eigenvectors are not guaranteed to be unit vectors (not normalized to length 1). But when we normalize each column of the matrix of eigenvectors, each column's dot product with itself produces 1, so for a matrix $V$ of normalized eigenvectors,

$$V^T V = I$$

Because $V^{-1} V = I$,

$$V^{-1} = V^T$$

It means that we don't need to do the inverse calculation (difficult to compute and time-consuming). We can replace it with just a transpose:

$$A = V \Lambda V^T$$
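A minimal sketch with an assumed symmetric matrix, using NumPy's symmetric-matrix routine np.linalg.eigh: the eigenvector matrix is orthogonal, so the transpose stands in for the inverse.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])      # symmetric matrix

eigvals, V = np.linalg.eigh(A)  # eigh: NumPy routine for symmetric matrices

print(np.allclose(V.T @ V, np.eye(2)))             # True: V^T V = I
print(np.allclose(A, V @ np.diag(eigvals) @ V.T))  # True: A = V Lam V^T
```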

Eigendecomposition of Singular Matrix

Eigendecomposition can also be applied to a singular matrix (a non-invertible matrix, whose determinant is 0). In that case, there will be at least one 0 eigenvalue. To know the rank of a singular matrix, we can count the number of non-zero eigenvalues. The determinant is equal to the product of all eigenvalues.
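A sketch with an assumed rank-1 singular matrix (its second row is twice the first): one eigenvalue is 0, the count of non-zero eigenvalues matches the rank, and the determinant equals the product of the eigenvalues.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # singular: row 2 = 2 * row 1

eigvals = np.linalg.eigvals(A)
print(eigvals)                                       # 0 and 5 (order may vary)
print(np.sum(~np.isclose(eigvals, 0.0)))             # 1 non-zero eigenvalue
print(np.linalg.matrix_rank(A))                      # 1, agrees with the count
print(np.isclose(np.linalg.det(A), eigvals.prod()))  # True: det = product
```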

Diagonalization as Sum of Matrices

From a given square matrix $A$, either a symmetric matrix or a non-symmetric matrix, eigendecomposition and diagonalization of the matrix gives us the following.

$$A = V \Lambda V^{-1}$$

$V$ is a matrix with the eigenvectors in each column, and $\Lambda$ is a diagonal matrix with the eigenvalues in each diagonal element. From one eigenvector and the corresponding eigenvalue in the decomposition, we can create a matrix like below.

$$A_i = \lambda_i v_i v_i^T$$

Because the eigenvectors are normalized and (for a symmetric matrix) orthogonal, $V^{-1} = V^T$, so the rows of $V^{-1}$ are the transposed eigenvectors $v_i^T$.

This is an mx1 column vector $v_i$ times a 1xm row vector $v_i^T$, scaled by $\lambda_i$, giving us an mxm matrix. It's actually an outer product of an eigenvector with itself, scaled by the eigenvalue. We can think of this single matrix as a single layer of the given matrix $A$. If we want to reconstruct $A$ (mxm), we can use the following equation.

$$A = \sum_{i=1}^{m} \lambda_i v_i v_i^T$$

We can arbitrarily choose a number $k$ of layers as long as it doesn't exceed $m$. It means that we can sum only the first $k$ layers to reconstruct an approximation of $A$.

$$\hat{A} = \sum_{i=1}^{k} \lambda_i v_i v_i^T$$
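A sketch of the layer view, assuming a small symmetric example so that $V^{-1} = V^T$ holds: summing all outer-product layers reproduces $A$ exactly, and summing only the layers with the largest |eigenvalue| gives a low-rank approximation.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])     # symmetric example

eigvals, V = np.linalg.eigh(A)

# One layer per eigenpair: lambda_i * v_i v_i^T (an outer product).
layers = [eigvals[i] * np.outer(V[:, i], V[:, i]) for i in range(3)]

print(np.allclose(A, sum(layers)))  # True: all m layers rebuild A exactly

# Keep only the k = 2 layers with the largest |eigenvalue|.
order = np.argsort(-np.abs(eigvals))
A_hat = sum(layers[i] for i in order[:2])
print(np.round(A_hat, 2))           # low-rank approximation of A
```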

Numpy (and most other software) by default normalizes eigenvectors to have unit length. It means that $V^T V$ gives a matrix with all 1s in the diagonal elements, and (for a symmetric matrix) 0s off the diagonal, because each pair of eigenvectors is orthogonal. So the magnitude of each eigenvector is 1: eigenvectors only give us the direction, and eigenvalues give us the importance of each eigenvector.

Eigenvectors point in the important directions in the space of the matrix $A$, and eigenvalues tell us how important those specific directions are. The larger the magnitude of an eigenvalue, the more important the corresponding direction is in the space of the matrix $A$.