Perceptron Algorithm#

The perceptron algorithm is the simplest form of what is now called a neural network. The algorithm performs binary classification: the goal is to find weights \(\theta\) such that, for every training point,

\[\begin{split} y_i = \begin{cases} 1, & X_i^\top \theta \ge 0\\ -1, & X_i^\top \theta < 0 \end{cases} \end{split}\]

This is equivalent to finding a hyperplane that linearly separates the data points. However, a strict constraint is that the hyperplane must pass through the origin[1].
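As a minimal sketch of this decision rule (assuming `X` is an \(n \times d\) NumPy array of features and `theta` a length-\(d\) weight vector; all names are illustrative):

```python
import numpy as np

def predict(X, theta):
    """Perceptron decision rule: +1 where X_i . theta >= 0, else -1."""
    scores = X @ theta
    return np.where(scores >= 0, 1, -1)
```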

We define a loss function as,

\[\begin{split} L(y_i, \theta) = \begin{cases} 0, & \text{sign}(X_i^\top \theta) = y_i\\ |X_i^\top \theta|, &\text{otherwise} \end{cases} \end{split}\]

The risk function (the expected value of \(L\) over \(X\), parameterized by \(\theta\)) is given by,

\[ R(X,\theta) = \sum_{i=1}^{n} L(y_i, \theta) = \sum_{i \in W} |X_i^\top \theta| \]
  • \(W\): The set of indices \(i\) for which the prediction \(\hat y_i\) is wrong.
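Minimizing this risk with stochastic (sub)gradient steps recovers the classic perceptron update \(\theta \leftarrow \theta + y_i X_i\) for each misclassified point, since \(\nabla_\theta |X_i^\top \theta| = -y_i X_i\) when the sign is wrong. A minimal training sketch under that assumption (the names `X`, `y`, `n_epochs`, and `lr` are illustrative):

```python
import numpy as np

def perceptron_train(X, y, n_epochs=100, lr=1.0):
    """Stochastic (sub)gradient descent on the perceptron risk.

    For a misclassified point, the gradient of |X_i . theta| with respect
    to theta is -y_i * X_i, so the descent step is theta += lr * y_i * X_i.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ theta) <= 0:   # wrong sign (boundary points count as mistakes)
                theta += lr * yi * xi
                mistakes += 1
        if mistakes == 0:                # zero risk: every point classified correctly
            break
    return theta
```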

Feature & Parameter Space Symmetry#

The perceptron algorithm’s constraint (a hyperplane through the origin) allows for a nice symmetry between the feature space of \(X_i\) and the parameter space of \(\theta\), which may be stated as:

  • In the \(x\) feature space, the features are points while the parameter vector \(\theta\) is the normal of a hyperplane through the origin.

  • In the \(\theta\) parameter space, the parameters are a point while each feature \(X_i\) is the normal of a hyperplane through the origin.

It is very useful to look at the parameter space because, when the training data \(X\) is linearly separable, there exists a region bounded by the feature hyperplanes that contains the values of \(\theta\) which correctly classify \(X\).
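A minimal 2D visualization of this duality, assuming matplotlib is available (the two toy points and the chosen \(\theta\) are illustrative): each training point \(X_i\) is a point in feature space but becomes the normal of a line \(X_i^\top \theta = 0\) through the origin in parameter space, and those lines bound the region of feasible \(\theta\).

```python
import numpy as np
import matplotlib.pyplot as plt

# Two linearly separable toy points with labels +1 and -1 (illustrative).
X = np.array([[2.0, 1.0], [1.0, -2.0]])
y = np.array([1, -1])
theta = np.array([0.5, 1.0])                       # a parameter vector that classifies both correctly

fig, (ax_feat, ax_param) = plt.subplots(1, 2, figsize=(10, 4))
t = np.linspace(-3, 3, 2)

# Feature space: X_i are points; theta is the normal of the decision hyperplane at the origin.
ax_feat.scatter(X[:, 0], X[:, 1], c=y, cmap="bwr")
ax_feat.plot(t * -theta[1], t * theta[0], "k--")   # line orthogonal to theta through the origin
ax_feat.set_title("Feature space")

# Parameter space: theta is a point; each X_i is the normal of a hyperplane X_i . theta = 0.
for xi in X:
    ax_param.plot(t * -xi[1], t * xi[0], "--")     # line orthogonal to X_i through the origin
ax_param.scatter(*theta, marker="*", s=150)        # one feasible theta inside the bounded region
ax_param.set_title("Parameter space")

plt.show()
```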

Perceptron Convergence Theorem#

For the modified perceptron classifier (which adds a bias term \(\theta_0\)),

\[ \hat y(x) = \text{sign}(x \cdot \theta + \theta_0) \]

If the data is linearly separable, then there always exists a solution that perfectly classifies the data, and the perceptron finds one in at most \(\mathcal O(R^2/M^2)\) iterations (a small numerical check is sketched after the definitions below), where

  • \(R\) : \(\max_i \lVert X_i \rVert\), the radius of the data.

  • \(M\) : Maximal margin.
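A small numerical check of the bound for the origin-centered case, under illustrative assumptions (a toy separable dataset labeled by a known unit-norm separator `w_true`; its margin \(\gamma\) lower-bounds the maximal margin \(M\), so \(R^2/\gamma^2\) still upper-bounds the number of updates):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy separable data labeled by a known unit-norm separator (illustrative).
w_true = np.array([1.0, -1.0]) / np.sqrt(2)
X = rng.normal(size=(200, 2))
scores = X @ w_true
keep = np.abs(scores) > 0.2               # enforce a margin of at least 0.2
X, y = X[keep], np.sign(scores[keep])

R = np.max(np.linalg.norm(X, axis=1))     # data radius, max ||X_i||
gamma = np.min(np.abs(X @ w_true))        # margin of w_true, a lower bound on M

# Origin-centered perceptron: count total updates until a full pass makes no mistake.
theta, updates, converged = np.zeros(2), 0, False
while not converged:
    converged = True
    for xi, yi in zip(X, y):
        if yi * (xi @ theta) <= 0:
            theta += yi * xi
            updates += 1
            converged = False

print(f"updates = {updates}, bound R^2 / gamma^2 = {(R / gamma) ** 2:.1f}")
```

Because \(\gamma \le M\), the printed bound is looser than \(R^2/M^2\) but should still exceed the observed update count.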