Prediction by Expectation
We can take advantage of the Law of Averages if what we would like to predict, \(Y\), is a random variable. However, if all we have is data in the form of another random variable \(X\), then the prediction for \(Y\) can be expressed as the conditional expectation given \(X\),

\[ \hat y(X) = E(Y \mid X) \]

Let’s define the error to be the residual from the true value,

\[ \varepsilon = Y - \hat y(X) \]
A few key facts (each is verified numerically in the sketch after this list):

- The expectation of the error is zero: \(E(\varepsilon) = 0\)
- The expectation of the best estimator equals that of the true value: \(E(\hat y(X)) = E(Y)\)
- The best estimator and the error are uncorrelated (though not necessarily independent): \(\text{Cov}\left[ \hat y(X), \varepsilon \right] = 0\)
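The first two facts follow from the tower property: \(E(\varepsilon \mid X) = E(Y \mid X) - \hat y(X) = 0\), so \(E(\varepsilon) = E(E(\varepsilon \mid X)) = 0\). The third follows similarly,

\[ E\left[ \hat y(X)\, \varepsilon \right] = E\left[ \hat y(X)\, E(\varepsilon \mid X) \right] = 0 \]

so that \(\text{Cov}\left[ \hat y(X), \varepsilon \right] = E\left[ \hat y(X)\, \varepsilon \right] - E(\hat y(X))\, E(\varepsilon) = 0\).

Here is a minimal simulation sketch of these facts. The model \(Y = 2X + Z\) with independent standard normal \(X\) and \(Z\) is an assumption chosen so that \(\hat y(X) = E(Y \mid X) = 2X\) is known in closed form; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed toy model (illustration only): Y = 2X + Z with X, Z
# independent standard normals, so E(Y | X) = 2X exactly.
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 2 * x + z

y_hat = 2 * x       # best estimator E(Y | X)
eps = y - y_hat     # residual error

print(eps.mean())                # ~0: E(eps) = 0
print(y_hat.mean(), y.mean())    # ~equal: E(y_hat) = E(Y)
print(np.cov(y_hat, eps)[0, 1])  # ~0: Cov(y_hat, eps) = 0
```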
Let’s take a look at the variance. We define the deviances of \(Y\) and \(\hat y\) from their (equal) means as,

\[ D_Y = Y - E(Y), \qquad D_{\hat y} = \hat y(X) - E(\hat y(X)) \]

They are related by,

\[ D_Y = D_{\hat y} + \varepsilon \]

which follows from \(E(\hat y(X)) = E(Y)\), since \(Y - E(Y) = \left( \hat y(X) - E(\hat y(X)) \right) + \left( Y - \hat y(X) \right)\). The two terms \(D_{\hat y}\) and \(\varepsilon\) are uncorrelated because \(D_{\hat y}\) is a function of \(X\) and \(E(\varepsilon \mid X) = 0\), by the same tower-property argument as above. The variance of \(D_Y\) is then,

\[ \text{Var}(D_Y) = \text{Var}(D_{\hat y}) + \text{Var}(\varepsilon) \]
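Continuing the same assumed toy model, this decomposition can be checked numerically (a sketch, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Same assumed toy model: Y = 2X + Z, with y_hat = E(Y | X) = 2X.
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)
y_hat = 2 * x
eps = y - y_hat

d_y = y - y.mean()             # deviance of Y
d_yhat = y_hat - y_hat.mean()  # deviance of y_hat

# Var(D_Y) should match Var(D_yhat) + Var(eps): here ~5 = ~4 + ~1.
print(d_y.var(), d_yhat.var() + eps.var())
```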
Some facts:

- The variance of the deviance of \(Y\) is the same as the variance of \(Y\),

\[ \text{Var}(D_Y) = \text{Var}(Y) \]

- The variance of the deviance of \(\hat y\) is the same as the variance of the best estimator,

\[ \text{Var}(D_{\hat y}) = \text{Var}(\hat y(X)) \]

- The variance of the residual error (estimated numerically in the sketch after this list) is, using \(E(\varepsilon) = 0\) in the first step,

\[\begin{split} \begin{align*} \text{Var}(\varepsilon) &= E(\varepsilon^2) \\ &= E\left( E\left[ (Y - \hat y(X))^2 \mid X \right] \right) \\ &= E(\text{Var}(Y \mid X)) \end{align*} \end{split}\]

where,

\[ \text{Var}(Y \mid X) \equiv E\left[ (Y - E(Y \mid X))^2 \mid X \right] \]

and the last step uses \(\hat y(X) = E(Y \mid X)\). So,

\[ \text{Var}(\varepsilon) = E(\text{Var}(Y \mid X)) \]
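The quantity \(E(\text{Var}(Y \mid X))\) can be approximated empirically by binning on \(X\) and averaging the within-bin variance of \(Y\). The sketch below does this for the same assumed toy model, where \(\text{Var}(Y \mid X) = \text{Var}(Z) = 1\); the bin count and variable names are assumptions for illustration, and the binned estimate carries a small bias from the spread of \(X\) within each bin.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Same assumed toy model: Y = 2X + Z, so Var(Y | X) = Var(Z) = 1.
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)
eps = y - 2 * x

# Approximate E(Var(Y | X)): split X into quantile bins, take the
# variance of Y within each bin, and average weighted by bin size.
n_bins = 200
edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
within_var = np.array([y[idx == k].var() for k in range(n_bins)])
counts = np.bincount(idx, minlength=n_bins)

print(np.average(within_var, weights=counts))  # ~1 (small binning bias)
print(eps.var())                               # ~1 = Var(eps)
```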
We can now fully express \(\text{Var}(D_Y)\) as,

\[ \text{Var}(Y) = \text{Var}(\hat y(X)) + E(\text{Var}(Y \mid X)) \]

which is the law of total variance: the variance of \(Y\) splits into the variance of the best estimator and the average variance that remains after conditioning on \(X\).
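As a concrete closed-form check (using the same assumed toy model, not anything from the derivation above): for \(Y = 2X + Z\) with \(X\) and \(Z\) independent standard normals, \(\hat y(X) = 2X\) gives \(\text{Var}(\hat y(X)) = 4\) and \(\text{Var}(Y \mid X) = \text{Var}(Z) = 1\), so the decomposition reads \(\text{Var}(Y) = 4 + 1 = 5\), matching \(\text{Var}(2X + Z)\) computed directly.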