k-Nearest Neighbors#
k-Nearest Neighbors (k-NN) is used for both classification and regression. Given a query point, the predictor aggregates the labels of the \(k\) nearest training points under some distance metric \(d(z,x)\).
Specifically, for classification the predicted label is usually the mode of the \(k\) nearest neighbors' labels, while for regression it is usually their mean.
Algorithm#
Given a query point \(z\):
Scan through all \(n\) sample points.
Maintain the \(k\) nearest neighbors found so far in a max-heap of size \(k\), keyed by distance to \(z\); when a closer point is found, replace the heap's root.
The computation time is \(\mathcal O(nd + n \log k)\).
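A minimal sketch of this exhaustive scan, assuming NumPy arrays `X` (an \(n \times d\) matrix of sample points) and `y` (labels), a query `z`, and Euclidean distance; `knn_predict` and its `classify` flag are illustrative names, not a fixed API.

```python
import heapq
from collections import Counter

import numpy as np

def knn_predict(X, y, z, k, classify=True):
    heap = []  # size-k max-heap via negated distances; root = farthest kept neighbor
    for x_i, y_i in zip(X, y):            # scan all n sample points: O(nd) distance work
        dist = np.linalg.norm(z - x_i)    # distance metric d(z, x); Euclidean here
        if len(heap) < k:
            heapq.heappush(heap, (-dist, y_i))
        elif dist < -heap[0][0]:          # closer than the current k-th nearest neighbor
            heapq.heapreplace(heap, (-dist, y_i))   # O(log k) heap update
    labels = [label for _, label in heap]
    if classify:
        return Counter(labels).most_common(1)[0][0]  # mode of the k nearest labels
    return float(np.mean(labels))                    # mean of the k nearest labels
```

Each of the \(n\) points costs \(\mathcal O(d)\) for the distance plus at most \(\mathcal O(\log k)\) heap work, matching the bound above.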
Low Dimension Algorithm#
In very low dimensions (\(d \approx 2\) to \(5\)), use Voronoi diagrams.
In medium dimensions (\(d \lesssim 30\)), use k-d trees.
Otherwise, use the exhaustive algorithm above, possibly after preprocessing.
Voronoi Diagrams#
Let \(X\) be a point set. The Voronoi cell of \(w \in X\) is \(\operatorname{Vor}(w) = \{\, p \in \mathbb{R}^d : |\overline{pw}| \le |\overline{pv}| \;\; \forall v \in X \,\}\).
For \(k=1\), the nearest neighbor of a query point \(z\) is the site whose Voronoi cell contains \(z\), so 1-NN reduces to point location.
In 2D, using the Voronoi diagram with a trapezoidal map for point location, a nearest-neighbor query can be answered in \(\mathcal O(\log n)\) time.
In dimension \(d\), use a binary space partition (BSP) tree for point location.
Beyond \(k = 1\), the Voronoi diagram is not practical for k-NN.
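A small sketch of the cell definition above: `z` lies in the Voronoi cell of a site `w` exactly when `w` is at least as close to `z` as every other site, which is why 1-NN reduces to point location. The function name is illustrative, and this is the brute-force membership test, not a point-location structure.

```python
import numpy as np

def in_voronoi_cell(z, w, X):
    """True if z lies in Vor(w), i.e. |zw| <= |zv| for every site v in X."""
    d_w = np.linalg.norm(z - w)
    return all(d_w <= np.linalg.norm(z - v) for v in X)

# Consequence used above: the site whose cell contains z is z's nearest neighbor.
```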
k-D Trees#
Very similar to decision trees, with the following differences:
The splitting feature is the one of greatest width, i.e., the feature \(i\) maximizing \(\max_{j,k}\,(X_{ji} - X_{ki})\).
Alternatively, rotate through the features.
Choose splitting value by either:
Median (gives the shortest tree; builds in \(\mathcal O(nd \log n)\))
Pairwise midpoint (builds in \(\mathcal O(n \log n)\))
Each internal node stores a sample point at the split.
Each subtree corresponds to an axis-aligned box region containing the sample points of its node and all of its descendants.
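A minimal construction sketch following these rules, splitting on the feature of greatest width at the median and storing the splitting sample point at each internal node; `Node` and `build_kdtree` are illustrative names, and `X` is assumed to be an \(n \times d\) NumPy array.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class Node:
    point: np.ndarray            # sample point stored at this split
    feature: int                 # splitting feature (greatest width)
    left: Optional["Node"]       # samples with feature value <= the split point's
    right: Optional["Node"]      # samples with feature value >= the split point's

def build_kdtree(X):
    if len(X) == 0:
        return None
    widths = X.max(axis=0) - X.min(axis=0)
    f = int(np.argmax(widths))               # feature of greatest width
    X = X[np.argsort(X[:, f])]               # sort along that feature
    m = len(X) // 2                          # split at the median -> balanced tree
    return Node(point=X[m], feature=f,
                left=build_kdtree(X[:m]),
                right=build_kdtree(X[m + 1:]))
```

Sorting is used here for brevity; a linear-time median selection gives the \(\mathcal O(nd \log n)\) build time quoted above.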
Given a query point \(z\), find a sample point \(w\) such that \(|\overline{zw}| \le (1 + \epsilon)|\overline{zs}|\), where \(s\) is the true nearest neighbor. The algorithm maintains:
Nearest neighbor found so far
Binary heap of unexplored subtrees, keyed by distance from the query.
Subtrees are popped from the heap in order of distance and explored, updating the nearest neighbor found so far; the search stops once the nearest unexplored subtree is farther than the current best distance (divided by \(1+\epsilon\) for approximate search).
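A query sketch matching this description, building on the `build_kdtree` sketch above: it keeps the best point found so far and a min-heap of unexplored subtrees keyed by the distance from \(z\) to each subtree's box, stopping once the nearest remaining box cannot beat the current best by more than the \(1+\epsilon\) factor. The function names, `epsilon` parameter, and box bookkeeping are illustrative.

```python
import heapq

import numpy as np

def box_distance(z, lo, hi):
    """Distance from the query z to the axis-aligned box [lo, hi]."""
    return float(np.linalg.norm(np.maximum(np.maximum(lo - z, z - hi), 0.0)))

def nearest_neighbor(root, z, lo, hi, epsilon=0.0):
    best_point, best_dist = None, np.inf
    tiebreak = 0  # unique counter so heapq never compares Node objects
    heap = [(box_distance(z, lo, hi), tiebreak, root, lo, hi)]
    while heap:
        d_box, _, node, lo, hi = heapq.heappop(heap)
        if d_box > best_dist / (1.0 + epsilon):
            break  # nearest unexplored box is already too far: stop searching
        d = float(np.linalg.norm(z - node.point))
        if d < best_dist:                     # update nearest neighbor found so far
            best_point, best_dist = node.point, d
        f, split = node.feature, node.point[node.feature]
        left_hi = hi.copy(); left_hi[f] = min(left_hi[f], split)    # child boxes
        right_lo = lo.copy(); right_lo[f] = max(right_lo[f], split)
        for child, c_lo, c_hi in ((node.left, lo, left_hi), (node.right, right_lo, hi)):
            if child is not None:
                tiebreak += 1
                heapq.heappush(heap, (box_distance(z, c_lo, c_hi), tiebreak, child, c_lo, c_hi))
    return best_point, best_dist
```

For example, starting from an unbounded box, `nearest_neighbor(build_kdtree(X), z, np.full(X.shape[1], -np.inf), np.full(X.shape[1], np.inf))` performs an exact search (\(\epsilon = 0\)).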
Facts and Heuristics#
Theorem (Cover & Hart, 1967): As \(n \rightarrow \infty\), the \(k=1\) error rate is less than \(B(2-B)\), where \(B\) is the Bayes risk.
If there are only 2 classes, the error rate is \(\le 2B(1-B)\).
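For example, if the Bayes risk is \(B = 0.1\), the asymptotic 1-NN error rate is below \(0.1(2 - 0.1) = 0.19\), less than twice the Bayes risk; the \(2B(1-B)\) bound gives \(2(0.1)(0.9) = 0.18\).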
Theorem (Fix & Hodges, 1951): As \(n \rightarrow \infty\), \(k \rightarrow \infty\), and \(\frac{k}{n} \rightarrow 0\), the k-NN error rate converges to \(B\), the Bayes risk.