Feature Selection#
Forward Stepwise Selection#
An \(\mathcal O(d^2)\) algorithm that builds a model by adding features iteratively:

1. Start with no features (the null model).
2. Score every single-feature model.
3. Add the best-scoring feature to the current model.
4. Repeat, adding the best-scoring remaining feature, until validation error starts increasing.
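The steps above can be sketched in NumPy. The function name, the least-squares fit, and validation MSE as the scoring metric are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

def forward_stepwise(X, y, X_val, y_val):
    """Greedily add the feature that most reduces validation MSE (sketch)."""
    n, d = X.shape
    selected, remaining = [], list(range(d))

    def val_mse(feats):
        if not feats:  # null model: predict the training mean
            return np.mean((y_val - y.mean()) ** 2)
        theta, *_ = np.linalg.lstsq(X[:, feats], y, rcond=None)
        return np.mean((y_val - X_val[:, feats] @ theta) ** 2)

    best_err = val_mse(selected)
    while remaining:
        # score each candidate one-feature extension of the current model
        errs = {j: val_mse(selected + [j]) for j in remaining}
        j_best = min(errs, key=errs.get)
        if errs[j_best] >= best_err:  # stop once validation error starts increasing
            break
        best_err = errs[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected
```

Each pass fits at most \(d\) candidate models and there are at most \(d\) passes, which is where the \(\mathcal O(d^2)\) count of model fits comes from.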
Backward Stepwise Selection#
An \(\mathcal O(d^2)\) algorithm that builds a model by removing features iteratively:

1. Start with the model containing all \(d\) features.
2. Remove the feature whose removal yields the best \((d-1)\)-feature model.
3. Repeat until validation error starts increasing.
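A matching sketch for backward elimination, under the same illustrative assumptions (least-squares fits scored by validation MSE):

```python
import numpy as np

def backward_stepwise(X, y, X_val, y_val):
    """Greedily drop the feature whose removal most reduces validation MSE (sketch)."""
    selected = list(range(X.shape[1]))  # start from the all-features model

    def val_mse(feats):
        theta, *_ = np.linalg.lstsq(X[:, feats], y, rcond=None)
        return np.mean((y_val - X_val[:, feats] @ theta) ** 2)

    best_err = val_mse(selected)
    while len(selected) > 1:
        # score each candidate model with one feature removed
        errs = {j: val_mse([k for k in selected if k != j]) for j in selected}
        j_drop = min(errs, key=errs.get)
        if errs[j_drop] >= best_err:  # stop once removal no longer helps
            break
        best_err = errs[j_drop]
        selected.remove(j_drop)
    return selected
```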
Selection by Parameter#
Remove features whose fitted parameters are small in magnitude. This works best on normalized data, since otherwise the size of a parameter depends on the scale of its feature.

If the data is not normalized, the features' importance can instead be scored with the z-score of each parameter,

$$
z_j = \frac{\theta_j}{\hat\sigma \sqrt{(X^\top X)^{-1}_{jj}}}
$$

where \(\hat\sigma\) is the estimated standard deviation of the noise and \((X^\top X)^{-1}_{jj}\) is the \(j\)-th diagonal entry of \((X^\top X)^{-1}\).
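The z-scores can be computed directly in NumPy. This is a minimal sketch assuming least-squares parameters, the unbiased residual estimate of the noise variance, and a full-column-rank \(X\); the function name is illustrative:

```python
import numpy as np

def coef_z_scores(X, y):
    """z-score of each least-squares parameter (sketch; X assumed full column rank)."""
    n, d = X.shape
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta
    sigma_hat = np.sqrt(resid @ resid / (n - d))  # estimated noise std dev
    inv_diag = np.diag(np.linalg.inv(X.T @ X))    # (X^T X)^{-1}_{jj} terms
    return theta / (sigma_hat * np.sqrt(inv_diag))
```

Features with small \(|z_j|\) are candidates for removal, since their parameters are small relative to their estimation uncertainty.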
LASSO#
Adding LASSO (\(\ell_1\)) regularization naturally drives some of the parameters exactly to zero; the corresponding features may as well be removed from the dataset.
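A NumPy-only sketch of this effect, fitting the LASSO with proximal gradient descent (ISTA); the step count, the fixed step size from the spectral norm, and the \(\frac{1}{2n}\)-scaled loss are illustrative choices:

```python
import numpy as np

def lasso_ista(X, y, lam, steps=500):
    """Minimize (1/2n)||y - X theta||^2 + lam * ||theta||_1 via ISTA (sketch)."""
    n, d = X.shape
    theta = np.zeros(d)
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / n
        z = theta - grad / L
        # soft-thresholding sets small coordinates exactly to zero
        theta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return theta
```

The features whose parameters end up exactly zero, e.g. `np.flatnonzero(theta == 0)`, can then be dropped from the dataset.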