# Kernel Ridge Regression
Take some centered data \(X, y\) such that we can confidently write the normal equation for ridge regression as:

$$
\theta = (X^\top X + \lambda I)^{-1} X^\top y
$$
Now, to write \(\theta\) as a linear combination of the rows of \(X\), rearrange \((X^\top X + \lambda I)\,\theta = X^\top y\) into \(\lambda\theta = X^\top (y - X\theta)\), so that

$$
\theta = X^\top \alpha, \qquad \alpha = \frac{1}{\lambda}\,(y - X\theta)
$$
Now, to remove the \(\theta\) dependency, substitute \(\theta = X^\top \alpha\) into the definition of \(\alpha\), giving \(\lambda\alpha = y - X X^\top \alpha\) and hence

$$
\alpha = (X X^\top + \lambda I)^{-1} y
$$
Replacing \(\theta = X^\top\alpha\) in the ridge regression loss function,

$$
J(\alpha) = \lVert y - X X^\top \alpha \rVert^2 + \lambda\, \alpha^\top X X^\top \alpha
$$

Note that the data now enters only through the inner products \(X X^\top\).
Another useful property appears when predicting an out-of-sample point \(z\):

$$
f(z) = z^\top \theta = z^\top X^\top \alpha = \sum_{i=1}^{n} \alpha_i \langle x_i, z \rangle
$$
This motivates the kernel function,

$$
k(x, z) = \langle \phi(x), \phi(z) \rangle
$$

where \(\phi\) is the (possibly implicit) feature map,
and thus also the kernel matrix \(K\) with entries \(K_{ij} = k(x_i, x_j)\). Now we can summarize the optimization solution (i.e., the normal equation) and the model:

$$
\alpha = (K + \lambda I)^{-1} y, \qquad f(z) = \sum_{i=1}^{n} \alpha_i\, k(x_i, z)
$$
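The dual solution above can be sketched in a few lines of NumPy. Using the linear kernel \(k(x, z) = \langle x, z\rangle\) (so \(K = X X^\top\)) lets us check the result against the primal normal equation; the function names here are illustrative, not from any particular library.

```python
import numpy as np

def kernel_ridge_fit(K, y, lam):
    # Solve (K + lam * I) alpha = y for the dual coefficients alpha.
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def kernel_ridge_predict(k_z, alpha):
    # f(z) = sum_i alpha_i * k(x_i, z); k_z holds k(x_i, z) for each training x_i.
    return k_z @ alpha

# Toy data, centered as the derivation assumes.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
X -= X.mean(axis=0)
y = rng.standard_normal(50)
lam = 0.1

# Dual fit with the linear kernel K = X X^T, then recover theta = X^T alpha.
alpha = kernel_ridge_fit(X @ X.T, y, lam)
theta_dual = X.T @ alpha

# Primal normal equation: theta = (X^T X + lam I)^{-1} X^T y
theta_primal = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
print(np.allclose(theta_dual, theta_primal))  # → True
```

With a nonlinear kernel there is no explicit \(\theta\) to recover, but `kernel_ridge_fit` and `kernel_ridge_predict` apply unchanged given the kernel matrix and kernel evaluations.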
## Performance
The dual (or kernel) form of ridge regression has cost:

\(\mathcal O(n^3 + n^2 d)\)

compared with the primal form, which has cost:

\(\mathcal O(d^3 + d^2 n)\)
So whenever \(d > n\), the kernel form is preferred. Furthermore, any increase in \(d\) due to a design (feature) transformation is absorbed by the kernel: the linear system solved is always \(n \times n\), and for many kernels \(K\) can be computed in closed form without ever constructing the transformed features.
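A quick numerical check of this claim (the sizes here are illustrative): with \(d > n\), the dual form solves an \(n \times n\) system while the primal form solves a \(d \times d\) one, yet both give the same prediction for an out-of-sample point.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 500                 # d > n: the dual form is the cheaper one
X = rng.standard_normal((n, d))
X -= X.mean(axis=0)            # center, as the derivation assumes
y = rng.standard_normal(n)
lam = 1.0

# Dual: solve an n x n system, O(n^3 + n^2 d) overall.
alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)

# Primal: solve a d x d system, O(d^3 + d^2 n) overall.
theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

z = rng.standard_normal(d)     # out-of-sample point
pred_dual = (X @ z) @ alpha    # f(z) = sum_i alpha_i <x_i, z>
pred_primal = z @ theta
print(np.isclose(pred_dual, pred_primal))  # → True
```

The agreement follows from the identity \((X^\top X + \lambda I)^{-1} X^\top = X^\top (X X^\top + \lambda I)^{-1}\) used in the derivation above.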