Principal Component Analysis#
Principal Component Analysis (PCA) reduces the dimensionality of the data by projecting it onto a lower-dimensional subspace that captures as much of the variation in the data as possible.
Gaussian Components#
Given centered data \(X\), whose rows are the \(n\) sample points, the covariance matrix is computed from \(X\) as sketched below.
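A standard reconstruction of the formula (assuming the \(1/n\) normalization; the original is not shown):

\[
\hat\Sigma = \frac{1}{n}\, X^\top X
\]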
Project \(X\) onto the principal coordinates of dimension \(k\), which are the eigenvectors \(v_1, \ldots, v_k\) associated with the \(k\) largest eigenvalues \(\lambda_1, \ldots, \lambda_k\).
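As a sketch, stacking those eigenvectors as the columns of \(V_k = [\, v_1 \;\cdots\; v_k \,] \in \mathbb R^{d \times k}\) (the name \(V_k\) is introduced here for illustration), the projected data are

\[
X V_k \in \mathbb R^{n \times k}
\]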
Variability Accounted For#
The fraction of variability accounted for by the first \(k\) principal components is the proportion of the total sum of eigenvalues that those components contribute.
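As a sketch, with \(\lambda_1, \ldots, \lambda_d\) the eigenvalues of the covariance matrix:

\[
\text{fraction of variance explained} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}
\]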
General PCA#
In general, a Gaussian fit to the data is not needed. The goal of PCA is once again to find the direction of projection that maximizes the variance of the projected data. This can be written as a maximization problem over the direction \(w\), sketched below.
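One standard way to write the objective (assuming centered \(X\) whose rows are the sample points; the projection is onto the unit direction \(w / \|w\|\)):

\[
\max_{w \neq 0} \; \frac{1}{n}\, \frac{w^\top X^\top X\, w}{w^\top w}
\]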
The quantity being maximized is, up to the \(1/n\) factor, the Rayleigh quotient of \(X^\top X\) and \(w\). Therefore, if \(w\) is an eigenvector \(v_i\), the Rayleigh quotient is \(\lambda_i\).
Restricting \(w\) to be an eigenvector, the variance is maximized by choosing \(v_d\), the eigenvector associated with the largest eigenvalue (indexing the eigenvalues here so that \(\lambda_d\) is the largest), giving variance \(\lambda_d / n\).
Even without restricting \(w\) to be an eigenvector, \(w\) is a linear combination of the eigenvectors \(v_1, \ldots, v_d\), so the Rayleigh quotient is a convex combination of the eigenvalues and cannot exceed \(\lambda_d\).
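A short worked step (the coefficients \(c_i\) are introduced here for illustration, and the eigenvectors are taken orthonormal):

\[
w = \sum_{i=1}^{d} c_i v_i
\quad\Longrightarrow\quad
\frac{w^\top X^\top X\, w}{w^\top w}
= \frac{\sum_{i=1}^{d} c_i^2 \lambda_i}{\sum_{i=1}^{d} c_i^2}
\;\le\; \lambda_d
\]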
Deriving PCA Using Projection Error#
An equivalent derivation minimizes the projection error: the sum of squared distances between each sample point and its orthogonal projection onto the chosen subspace. Minimizing this error yields the same principal directions as maximizing the projected variance; a sketch follows.
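A sketch for a single unit direction \(w\), writing \(X_i\) for the \(i\)-th sample point (row of \(X\)); the second form follows from the Pythagorean theorem:

\[
\min_{\|w\| = 1} \sum_{i=1}^{n} \big\| X_i - (X_i^\top w)\, w \big\|^2
= \min_{\|w\| = 1} \sum_{i=1}^{n} \Big( \|X_i\|^2 - (X_i^\top w)^2 \Big),
\]

so minimizing the projection error is equivalent to maximizing \(\sum_i (X_i^\top w)^2\), the projected variance (up to the \(1/n\) factor).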
Eigenfaces#
Given \(X\) containing \(n\) face images of \(d\) grayscale pixels each, a query face is classified by finding its nearest neighbor in the training set. In the original pixel space this costs \(\Theta(nd)\) per query, which is slow. PCA reduces the cost by projecting from \(d\) dimensions down to \(d' \ll d\).
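A minimal NumPy sketch of this pipeline, assuming the rows of `X_train` are flattened grayscale faces; the function names, the random placeholder data, and the choice \(d' = 50\) are illustrative, not from the notes:

```python
import numpy as np

def pca_fit(X, d_prime):
    """Return the mean of X and its top d' principal directions (rows of X are samples)."""
    mean = X.mean(axis=0)
    Xc = X - mean                       # center the data
    # Right singular vectors of the centered data are the eigenvectors of X^T X.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:d_prime].T         # d x d' matrix of principal directions

def pca_project(X, mean, V):
    """Center the points and project them onto the principal subspace."""
    return (X - mean) @ V

# Hypothetical data: n faces of d pixels each (random placeholders).
rng = np.random.default_rng(0)
n, d, d_prime = 200, 4096, 50
X_train = rng.random((n, d))
query = rng.random(d)

mean, V = pca_fit(X_train, d_prime)
Z_train = pca_project(X_train, mean, V)          # n x d'
z_query = pca_project(query[None, :], mean, V)   # 1 x d'

# Nearest neighbor in the reduced space: Theta(n d') per query instead of Theta(n d).
nearest = int(np.argmin(np.linalg.norm(Z_train - z_query, axis=1)))
print("nearest training face index:", nearest)
```

Computing the principal directions from the SVD of the centered data avoids forming \(X^\top X\) explicitly, which tends to behave better numerically.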
Random Projections#
An alternative to PCA as a preprocessing step. Choosing a small \(\epsilon\) and a small \(\delta\), create a random subspace \(S \subseteq \mathbb R^d\) of dimension \(k\), where \(k\) depends only on \(\epsilon\) and \(\delta\) (one standard choice is sketched below).
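A common Johnson–Lindenstrauss-style choice of \(k\) (an assumption here, since the original bound is not shown):

\[
k = \left\lceil \frac{2 \ln(1/\delta)}{\epsilon^2/2 - \epsilon^3/3} \right\rceil
\]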
For any point \(q\), let \(\hat q\) be the orthogonal projection of \(q\) onto \(S\) multiplied by \(\sqrt{d/k}\).
For any two points \(q, w \in \mathbb R^{d}\), the squared distance \(\|\hat q - \hat w\|^2\) matches \(\|q - w\|^2\) up to a factor of \(1 \pm \epsilon\), with probability at least \(1 - 2\delta\); the bound is sketched below.
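A reconstruction of the guarantee referred to above (the exact inequality is not shown in the original):

\[
(1 - \epsilon)\, \|q - w\|^2 \;\le\; \|\hat q - \hat w\|^2 \;\le\; (1 + \epsilon)\, \|q - w\|^2
\]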
The typical hyperparameters are: