Prediction by Expectation
We can take advantage of the Law of Averages if what we would like to predict, \(Y\), is a random variable. However, if all we have is data in the form of another random variable \(X\), then the prediction for \(Y\) can be expressed as the conditional expectation given \(X\),

\[ \hat y(X) = E(Y \mid X) \]

Let’s define the error to be the residual from the true value,

\[ \varepsilon = Y - \hat y(X) \]
A few key facts (each is verified numerically in the sketch after this list):

- The expectation of the error is zero: \(E(\varepsilon) = 0\)
- The expectation of the best estimator equals that of the true value: \(E(\hat y(X)) = E(Y)\)
- The best estimator and the error are uncorrelated (though not necessarily independent): \(\text{Cov}\left[ \hat y(X), \varepsilon \right] = 0\)
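The first two facts follow from the tower property: \(E(\varepsilon \mid X) = E(Y \mid X) - \hat y(X) = 0\), so \(E(\varepsilon) = E(E(\varepsilon \mid X)) = 0\). The third follows similarly,

\[ E\left[ \hat y(X)\, \varepsilon \right] = E\left[ \hat y(X)\, E(\varepsilon \mid X) \right] = 0 \]

so that \(\text{Cov}\left[ \hat y(X), \varepsilon \right] = E\left[ \hat y(X)\, \varepsilon \right] - E(\hat y(X))\, E(\varepsilon) = 0\).

Here is a minimal simulation sketch of these facts. The model \(Y = 2X + Z\) with independent standard normal \(X\) and \(Z\) is an assumption chosen so that \(\hat y(X) = E(Y \mid X) = 2X\) is known in closed form; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed toy model (illustration only): Y = 2X + Z with X, Z
# independent standard normals, so E(Y | X) = 2X exactly.
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 2 * x + z

y_hat = 2 * x       # best estimator E(Y | X)
eps = y - y_hat     # residual error

print(eps.mean())                # ~0: E(eps) = 0
print(y_hat.mean(), y.mean())    # ~equal: E(y_hat) = E(Y)
print(np.cov(y_hat, eps)[0, 1])  # ~0: Cov(y_hat, eps) = 0
```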
Let’s take a look at the variance. We define the deviances of \(Y\) and \(\hat y\) from their (equal) means as,

\[ D_Y = Y - E(Y), \qquad D_{\hat y} = \hat y(X) - E(\hat y(X)) \]

They are related by,

\[ D_Y = D_{\hat y} + \varepsilon \]

which follows from \(E(\hat y(X)) = E(Y)\), since \(Y - E(Y) = \left( \hat y(X) - E(\hat y(X)) \right) + \left( Y - \hat y(X) \right)\). The two terms \(D_{\hat y}\) and \(\varepsilon\) are uncorrelated because \(D_{\hat y}\) is a function of \(X\) and \(E(\varepsilon \mid X) = 0\), by the same tower-property argument as above. The variance of \(D_Y\) is then,

\[ \text{Var}(D_Y) = \text{Var}(D_{\hat y}) + \text{Var}(\varepsilon) \]
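Continuing the same assumed toy model, this decomposition can be checked numerically (a sketch, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Same assumed toy model: Y = 2X + Z, with y_hat = E(Y | X) = 2X.
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)
y_hat = 2 * x
eps = y - y_hat

d_y = y - y.mean()             # deviance of Y
d_yhat = y_hat - y_hat.mean()  # deviance of y_hat

# Var(D_Y) should match Var(D_yhat) + Var(eps): here ~5 = ~4 + ~1.
print(d_y.var(), d_yhat.var() + eps.var())
```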
Some facts:

- The variance of the deviance of \(Y\) is the same as the variance of \(Y\),

\[ \text{Var}(D_Y) = \text{Var}(Y) \]

- The variance of the deviance of \(\hat y\) is the same as the variance of the best estimator,

\[ \text{Var}(D_{\hat y}) = \text{Var}(\hat y(X)) \]

- The variance of the residual error (estimated numerically in the sketch after this list) is, using \(E(\varepsilon) = 0\) in the first step,

\[\begin{split} \begin{align*} \text{Var}(\varepsilon) &= E(\varepsilon^2) \\ &= E\left( E\left[ (Y - \hat y(X))^2 \mid X \right] \right) \\ &= E(\text{Var}(Y \mid X)) \end{align*} \end{split}\]

where,

\[ \text{Var}(Y \mid X) \equiv E\left[ (Y - E(Y \mid X))^2 \mid X \right] \]

and the last step uses \(\hat y(X) = E(Y \mid X)\). So,

\[ \text{Var}(\varepsilon) = E(\text{Var}(Y \mid X)) \]
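The quantity \(E(\text{Var}(Y \mid X))\) can be approximated empirically by binning on \(X\) and averaging the within-bin variance of \(Y\). The sketch below does this for the same assumed toy model, where \(\text{Var}(Y \mid X) = \text{Var}(Z) = 1\); the bin count and variable names are assumptions for illustration, and the binned estimate carries a small bias from the spread of \(X\) within each bin.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Same assumed toy model: Y = 2X + Z, so Var(Y | X) = Var(Z) = 1.
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)
eps = y - 2 * x

# Approximate E(Var(Y | X)): split X into quantile bins, take the
# variance of Y within each bin, and average weighted by bin size.
n_bins = 200
edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
within_var = np.array([y[idx == k].var() for k in range(n_bins)])
counts = np.bincount(idx, minlength=n_bins)

print(np.average(within_var, weights=counts))  # ~1 (small binning bias)
print(eps.var())                               # ~1 = Var(eps)
```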
We can now fully express \(\text{Var}(D_Y)\) as,

\[ \text{Var}(Y) = \text{Var}(\hat y(X)) + E(\text{Var}(Y \mid X)) \]

which is the law of total variance: the variance of \(Y\) splits into the variance of the best estimator and the average variance that remains after conditioning on \(X\).
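As a concrete closed-form check (using the same assumed toy model, not anything from the derivation above): for \(Y = 2X + Z\) with \(X\) and \(Z\) independent standard normals, \(\hat y(X) = 2X\) gives \(\text{Var}(\hat y(X)) = 4\) and \(\text{Var}(Y \mid X) = \text{Var}(Z) = 1\), so the decomposition reads \(\text{Var}(Y) = 4 + 1 = 5\), matching \(\text{Var}(2X + Z)\) computed directly.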