Yuki's App
Principal Component Analysis
Statistics | 2022-04-06

Concept

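$n$ is the number of observations. $p$ is the number of features. $X$ is the given $n \times p$ data matrix, with one observation per row. The data is centered, meaning the mean of each feature (column) is subtracted from that feature. In code,

X = X - np.mean(X, axis=0)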

np is NumPy. axis=0 takes the mean down each column, giving one mean per feature.
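The centering step can be sketched as below. This is a minimal illustration, not necessarily the code used for this post: the random toy data and the names X_raw and col_means are assumptions introduced for the example.

```python
import numpy as np

# Hypothetical toy data: n = 5 observations (rows), p = 3 features (columns).
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(5, 3))

# axis=0 averages down the rows, giving one mean per column (feature).
col_means = np.mean(X_raw, axis=0)  # shape (3,)

# Broadcasting subtracts each column's mean from every row.
X = X_raw - col_means

# After centering, every column mean is zero up to floating-point error.
print(np.allclose(X.mean(axis=0), 0.0))
```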

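Principal component analysis (PCA) is the singular value decomposition (SVD) of this centered data.

$$X = U D V^T$$

The columns of $U$ ($n \times p$) are the left singular vectors. $D$ ($p \times p$) is a diagonal matrix with the singular values on its diagonal, in decreasing order. The columns of $V$ ($p \times p$) are the right singular vectors.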
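A minimal NumPy sketch of this decomposition follows. The random data is a stand-in, and full_matrices=False is chosen here so that the left singular vector matrix has only as many columns as there are features (this assumes at least as many observations as features).

```python
import numpy as np

# Hypothetical centered data: 6 observations, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
X = X - np.mean(X, axis=0)

# Thin SVD: U is (6, 3), d holds the 3 singular values, Vt is V transposed (3, 3).
U, d, Vt = np.linalg.svd(X, full_matrices=False)
D = np.diag(d)  # singular values placed on the diagonal
V = Vt.T        # columns of V are the right singular vectors

# The three factors multiply back to the centered data.
print(np.allclose(X, U @ D @ V.T))
```

np.linalg.svd returns the singular values as a 1-D array sorted in decreasing order, so np.diag is needed to build the diagonal matrix.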

The columns of $UD$ are called the principal components of $X$.

Dimension reduction of $X$ from $p$ to $q$ ($q \le p$) is given by the first $q$ principal components like below.

$$U_q D_q$$

$U_q$ is the first $q$ columns and all the rows of $U$. $D_q$ is the first $q$ columns and first $q$ rows of $D$.
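The truncation above can be sketched in NumPy as follows. The data, the choice of 2 retained components, and the variable names are assumptions for illustration. The last check uses the fact that projecting the centered data onto the leading right singular vectors gives the same reduced matrix.

```python
import numpy as np

# Hypothetical centered data: 6 observations, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X = X - np.mean(X, axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)

q = 2                  # number of dimensions to keep (illustrative choice)
U_q = U[:, :q]         # first q columns, all rows
D_q = np.diag(d[:q])   # first q rows and first q columns of D
Z = U_q @ D_q          # reduced data, shape (6, q)

# Same result as projecting X onto the first q right singular vectors.
print(np.allclose(Z, X @ Vt[:q].T))
```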

Scikit-learn

In sklearn.decomposition.PCA, parameter n_components is the number of dimensions to keep, $q$.

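$X$ does not have to be centered by hand before doing fit(X) or fit_transform(X): PCA subtracts the mean of each feature internally (it centers but does not scale the data) before applying the SVD.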

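Attribute singular_values_ holds the first $q$ singular values of the SVD, i.e. the diagonal elements of $D_q$.

Attribute components_ holds the first $q$ right singular vectors of the SVD as rows, i.e. the first $q$ columns of $V$, transposed. Their signs may differ from those returned by another SVD routine.

Dimension reduction by fit_transform(X) is $U_q D_q$ of the SVD, up to the same sign convention.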
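The correspondence between scikit-learn and a plain SVD can be checked with a short sketch like the one below. The toy data and variable names are assumptions. svd_solver="full" is set explicitly so that an exact SVD is used, and absolute values are compared because each singular vector is only determined up to sign.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 8 observations, 4 features, centered column-wise.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(8, 4))
X = X_raw - np.mean(X_raw, axis=0)

# Reference SVD of the centered data.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

q = 2
pca = PCA(n_components=q, svd_solver="full")
Z = pca.fit_transform(X)

# singular_values_ matches the first q singular values.
same_singular_values = np.allclose(pca.singular_values_, d[:q])

# components_ rows match the first q right singular vectors up to sign.
same_components = np.allclose(np.abs(pca.components_), np.abs(Vt[:q]))

# fit_transform output matches the first q columns of U times D_q, up to sign.
same_scores = np.allclose(np.abs(Z), np.abs(U[:, :q] @ np.diag(d[:q])))

print(same_singular_values, same_components, same_scores)
```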

Reference