The ideal line for the population is:
\[\mu_i\,=\,\beta_o \,+\,\beta_1\,X_i\]
The estimated regression line is:
\[\hat \mu_i\,=\,\hat\beta_o\,+\,\hat\beta_1\,X_i\]
Preliminary definitions:
\(\text{cov} (X,Y) =
\frac{1}{n-1}\displaystyle \sum_{i=1}^n (X_i -\bar X) (Y_i - \bar Y)
=\frac{1}{n-1}\left (\displaystyle \sum_{i=1}^n X_i Y_i - n\bar X\bar Y\right)\)
\(\text{var} (X) =
\frac{1}{n-1}\displaystyle \sum_{i=1}^n (X_i -\bar X) (X_i - \bar X)
=\frac{1}{n-1}\left (\displaystyle \sum_{i=1}^n X_i^2 - n\bar X^2\right)\), with \(\text{SD}(X)=\sqrt{\text{var}(X)}\)
\(\text{cor}(X,Y)
= \frac{\text{cov}(X,Y)}{\text{SD}(X) \text{SD}(Y)}\)
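As a quick numeric sanity check of these definitions, a minimal sketch (the data and the use of NumPy are my own illustrative assumptions, not part of the original): the hand-computed quantities are compared with NumPy's built-ins, which also use the \(n-1\) denominator.

```python
import numpy as np

rng = np.random.default_rng(42)          # illustrative simulated data only
X = rng.normal(size=100)
Y = 2 + 3 * X + rng.normal(size=100)
n = len(X)

# cov(X, Y) = (1/(n-1)) * (sum X_i Y_i - n * Xbar * Ybar)
cov_xy = ((X * Y).sum() - n * X.mean() * Y.mean()) / (n - 1)

# var(X) = (1/(n-1)) * (sum X_i^2 - n * Xbar^2);  SD(X) = sqrt(var(X))
var_x = ((X ** 2).sum() - n * X.mean() ** 2) / (n - 1)

# cor(X, Y) = cov(X, Y) / (SD(X) * SD(Y))
cor_xy = cov_xy / (np.sqrt(var_x) * Y.std(ddof=1))

print(np.isclose(cov_xy, np.cov(X, Y)[0, 1]))         # True
print(np.isclose(np.sqrt(var_x), np.std(X, ddof=1)))  # True
print(np.isclose(cor_xy, np.corrcoef(X, Y)[0, 1]))    # True
```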
We want to minimize the squared distances between the observed
values \(Y_i\) and the fitted values
\(\mu_i\) that we would obtain if we had the “ideal” regression line computed
from the entire population. \(\mu_i\) carries the symbol of a mean because it is the mean
of each bell curve in the diagram above; \(\hat \mu_i\) denotes the fitted values for the
sample, based on our estimated regression line.
Just subtracting and adding \(\hat \mu_i\):
\[\begin{align}
\displaystyle \sum_{i=1}^n (Y_i-\mu_i)^2 &= \displaystyle
\sum_{i=1}^{n}(Y_i-\hat \mu_i + \hat \mu_i - \mu_i)^2\\[2ex]
&=
\displaystyle \sum_{i=1}^{n}(Y_i-\hat \mu_i)^2 + 2 \sum_{i=1}^{n} (Y_i -
\hat \mu_i)\,(\hat \mu_i - \mu_i) + \displaystyle \sum_{i=1}^n (\hat
\mu_i - \mu_i)^2
\end{align}\]
Showing that \(\hat \mu_i\) attains the minimum amounts to showing that the cross term vanishes:
\[\sum_{i=1}^{n} (Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i)\,=\,0\,\small \tag 1\]
leaving us with:
\[\displaystyle \sum_{i=1}^{n}(Y_i-\mu_i)^2 = \displaystyle \sum_{i=1}^{n}(Y_i-\hat \mu_i)^2 + \displaystyle \sum_{i=1}^n (\hat \mu_i - \mu_i)^2\]
It follows that:
\[\displaystyle \sum_{i=1}^{n}(Y_i-\mu_i)^2\,\geq \sum_{i=1}^{n}(Y_i-\hat \mu_i)^2\]
Considering only horizontal lines (so that \(\mu_i=\beta_o\) and \(\hat\mu_i=\hat\beta_o\)), equation \((1)\) becomes:
\[ \displaystyle\sum_{i=1}^{n} (Y_i - \hat \beta_o)\,(\hat \beta_o - \beta_o)\,=\,(\hat \beta_o - \beta_o)\,\displaystyle\sum_{i=1}^{n}(Y_i - \hat \beta_o)=\,0\]
This will happen if:
\[\displaystyle\sum_{i=1}^{n}(Y_i - \hat \beta_o)=\,0.\]
In other words if \[n\,\bar Y\,-\,n\,\hat \beta_o\,=\,0,\]
or
\[\bar Y = \hat\beta_o \small \tag 2.\]
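A small numeric illustration of equation \((2)\), a sketch under made-up data (my own assumption for illustration): among all horizontal lines \(Y = c\), the sample mean gives the smallest sum of squared errors.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(loc=5, scale=2, size=50)   # illustrative simulated data only


def sse(c):
    """Sum of squared errors for the horizontal line Y = c."""
    return np.sum((Y - c) ** 2)


# scan candidate constants around the mean; the minimizer should be Ybar
grid = np.linspace(Y.mean() - 3, Y.mean() + 3, 10001)
best = grid[np.argmin([sse(c) for c in grid])]
print(best, Y.mean())                     # essentially equal (up to grid resolution)
```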
If we do regression through the origin \[y_i=x_{i}\beta_1+\epsilon\]
\[\begin{align}\displaystyle\sum_{i=1}^{n}
(Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i)
&=\,\displaystyle\sum_{i=1}^{n} (Y_i - \hat \beta_1\, X_i)\,(\hat \beta_1 X_i - \beta_1\,X_i)\\[2ex]
&=\,\displaystyle\sum_{i=1}^{n} (Y_i - \hat \beta_1\, X_i)\,(\hat \beta_1 - \beta_1)\,X_i\\[2ex]
&=(\hat \beta_1 - \beta_1)\,\displaystyle\sum_{i=1}^{n}\left(Y_iX_i-\hat \beta_1 X_i^2\right)
\end{align}\]
And this will be zero if:
\[\displaystyle\sum_{i=1}^{n}\left(Y_iX_i-\hat \beta_1 X_i^2\right)=\displaystyle\sum_{i=1}^{n}Y_iX_i-\hat \beta_1\displaystyle\sum_{i=1}^{n}X_i^2=0.\]
Hence,
\[\hat \beta_1=\frac{\displaystyle\sum_{i=1}^{n}Y_iX_i}{\displaystyle\sum_{i=1}^{n}X_i^2} \small \tag 3.\]
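Equation \((3)\) can be checked against a no-intercept least-squares fit; a minimal sketch (simulated data and `np.linalg.lstsq` are my own illustrative choices), where a single-column design matrix plays the role of regression through the origin.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(1, 10, size=60)
Y = 1.7 * X + rng.normal(size=60)          # illustrative data, no intercept in the truth

beta1_hat = (Y * X).sum() / (X ** 2).sum()                # equation (3)
beta1_lstsq = np.linalg.lstsq(X[:, None], Y, rcond=None)[0][0]
print(np.isclose(beta1_hat, beta1_lstsq))                 # True
```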
Now doing both intercept and slope:
\[\begin{align}
0 &=\displaystyle\sum_{i=1}^{n} (Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i)\\[2ex]
&=\displaystyle\sum_{i=1}^{n}(Y_i -\hat\beta_o - \hat \beta_1 X_i)(\hat\beta_o+\hat\beta_1X_i-\beta_o-\beta_1X_i)\\[2ex]
&=\displaystyle\sum_{i=1}^{n} (Y_i -\hat\beta_o - \hat \beta_1 X_i)\left((\hat\beta_o-\beta_o)+(\hat\beta_1-\beta_1)X_i\right)\\[2ex]
&= (\hat \beta_o-\beta_o)\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)+(\hat\beta_1-\beta_1)\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)X_i \tag 4
\end{align}\]
For the first part of equation \((4),\)
\[0=\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)=n\bar Y-n\hat\beta_o-n\hat\beta_1\bar X\]
Hence,
\(\color{blue}{\hat\beta_o\,=\,\bar
Y\,-\,\hat\beta_1\,\bar X \small \tag 5}\)
And for the second part of equation \((4)\):
\[0=\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)X_i.\] Substituting \(\hat\beta_o=\bar Y- \hat \beta_1 \bar X\) for \(\hat\beta_o\):
\[0=\displaystyle\sum_{i=1}^{n}(Y_i-\bar Y + \hat \beta_1 \bar X - \hat \beta_1X_i)X_i =\displaystyle\sum_{i=1}^{n}(Y_i-\bar Y)X_i - \hat\beta_1 \sum_{i=1}^{n}(X_i-\bar X)X_i\]
Since \(\displaystyle\sum_{i=1}^{n}(Y_i-\bar Y)=0\) and \(\displaystyle\sum_{i=1}^{n}(X_i-\bar X)=0\), subtracting \(\displaystyle\bar X\sum_{i=1}^{n}(Y_i-\bar Y)=0\) from the numerator and \(\displaystyle\bar X\sum_{i=1}^{n}(X_i-\bar X)=0\) from the denominator changes neither:
\[\begin{align}\hat \beta_1&=\frac{\sum_{i=1}^{n}(Y_i-\bar Y)X_i}{\sum_{i=1}^{n}(X_i-\bar X)X_i}\\[2ex] &=\frac{\sum_{i=1}^{n}(Y_i-\bar Y)(X_i-\bar X)}{\sum_{i=1}^{n}(X_i-\bar X)(X_i-\bar X)}\\[2ex] &= \frac{\text{cov}(X,Y)}{\text{var}(X)}\\[2ex] &=\frac{\text{cor}(X,Y)\,\text{SD}(X)\text{SD}(Y)}{\text{SD}(X)\,\text{SD}(X)} \end{align}\]
Hence,
\(\color{blue}{\hat\beta_1\,=\,\text{cor}(Y,
X)\,\frac{\text{SD}(Y)}{\text{SD}(X)}\,=\,\frac{\text{cov}(Y,X)}{\text{var}(X)}\small
\tag 6}\)
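Equations \((5)\) and \((6)\) can be verified numerically; a sketch with simulated data (my own illustrative assumption), where the design matrix carries a column of ones for the intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=80)
Y = -1 + 0.5 * X + rng.normal(size=80)     # illustrative simulated data only

# equation (6): slope = cor(Y, X) * SD(Y)/SD(X)
b1 = np.corrcoef(Y, X)[0, 1] * Y.std(ddof=1) / X.std(ddof=1)
# equation (5): intercept = Ybar - b1 * Xbar
b0 = Y.mean() - b1 * X.mean()

A = np.column_stack([np.ones_like(X), X])  # intercept + slope design
print(np.allclose([b0, b1], np.linalg.lstsq(A, Y, rcond=None)[0]))   # True
```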
\[Y_i\,=\,\beta_1X_{1i}\,+\,\beta_2X_{2i}+\dots+\,\beta_pX_{pi}\,+\,\varepsilon_i\]
is the ideal model for the population; one of the regressors is identically equal to \(1\), and its term corresponds to the intercept.
For the sample:
\[\hat\mu_i=\hat\beta_1X_{1i}\,+\,\hat\beta_2X_{2i}\,+\dots\,+\,\hat\beta_pX_{pi}\]
As in the univariable case, \(\displaystyle \sum_{i=1}^n(Y_i-\hat\mu_i)(\hat\mu_i-\mu_i)=0\).
Let’s assume two regressors: \(Y_i = \beta_1X_{1i}+\beta_2X_{2i}+\varepsilon_i\):
\[\displaystyle \sum_{i=1}^n\color{brown}{(Y_i-\hat\mu_i)}\color{orange}{(\hat\mu_i-\mu_i)}=\displaystyle \sum_{i=1}^n \color{brown}{(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i})}\color{orange}{\left((\hat\beta_1-\beta_1)X_{1i}+(\hat\beta_2-\beta_2)X_{2i}\right)}.\]
For this to be zero whatever the values of \(\beta_1\) and \(\beta_2\), each bracketed sum must vanish separately, which gives a system of equations:
\[\begin{cases}\sum_{i=1}^n\color{brown}{(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i})}\color{orange}{X_{1i}}=0\\[2ex] \sum_{i=1}^n\color{brown}{(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i})}\color{orange}{X_{2i}}=0 \end{cases}\]
Solving for \(\hat\beta_2\) in the second equation:
\[\bbox[5px,border:2px solid red]{\hat\beta_2\,=\,\frac{\displaystyle\sum_{i=1}^n(Y_i\,-\,\hat\beta_1\,X_{1i})\,X_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}\tag 7\]
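The \(2\times 2\) system above, and hence equation \((7)\), can be solved directly and compared with a two-column, no-intercept least-squares fit; a minimal sketch under simulated data (the data and NumPy routines are my own illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2 * X1 - 1 * X2 + rng.normal(size=n)   # illustrative data, no intercept

# normal equations: sum (Y - b1 X1 - b2 X2) X1 = 0 and sum (Y - b1 X1 - b2 X2) X2 = 0
A = np.array([[np.sum(X1 * X1), np.sum(X1 * X2)],
              [np.sum(X1 * X2), np.sum(X2 * X2)]])
b = np.array([np.sum(Y * X1), np.sum(Y * X2)])
b1_hat, b2_hat = np.linalg.solve(A, b)

# equation (7): b2 expressed in terms of b1
b2_from_eq7 = np.sum((Y - b1_hat * X1) * X2) / np.sum(X2 ** 2)

X = np.column_stack([X1, X2])
print(np.allclose([b1_hat, b2_hat], np.linalg.lstsq(X, Y, rcond=None)[0]))  # True
print(np.isclose(b2_hat, b2_from_eq7))                                      # True
```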
Now plugging \(\hat \beta_2\) into the first of the equations in the system above:
\[\begin{align}
0&=\displaystyle\sum_{i=1}^n\left\{Y_i - \hat\beta_1
X_{1i}-\left(\frac{\displaystyle\sum_{i=1}^n(Y_i-\hat\beta_1X_{1i})X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}\right)X_{2i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i - \hat\beta_1
X_{1i}-\left(\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}-\frac{\displaystyle\sum_{i=1}^n
\hat\beta_1X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}\right)X_{2i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i - \hat\beta_1
X_{1i}-\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}+\frac{\displaystyle\sum_{i=1}^n
\hat\beta_1X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}X_{2i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i
-\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}+\hat\beta_1\frac{\displaystyle\sum_{i=1}^n
X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}\,- \hat\beta_1
X_{1i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i
-\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}-
\hat\beta_1 X_{1i}+\hat\beta_1\frac{\displaystyle\sum_{i=1}^n
X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}X_{2i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i
-\color{magenta}{\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}X_{2i}-
\hat\beta_1 \left(\color{red}{X_{1i}-\frac{\displaystyle\sum_{i=1}^n
X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}X_{2i}}\right)\right\}X_{1i} \small \tag 8
\end{align}\]
\[\color{magenta}{\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}\]
is the slope of the regression through the origin of \(y_i=x_{2i}\beta_2+\epsilon\). Why? Just see
equation \((3).\)
\[Y_i -\color{magenta}{\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}X_{2i}\]
is the residual of \(Y\) when only \(X_{2i}\) is included as a regressor, through the origin. Why? Just consider that with one single variable, regression through the origin is \(y_i=x_{i}\beta_1+\epsilon\) with the slope as in equation \((3)\): i.e. \(\small\hat\beta_1=\frac{\displaystyle\sum_{i=1}^{n}Y_iX_i}{\displaystyle\sum_{i=1}^{n}X_i^2},\) and therefore the fitted line is \(\frac{\displaystyle\sum_{i=1}^{n}Y_iX_i}{\displaystyle\sum_{i=1}^{n}X_i^2}X_i.\)
Likewise,
\[\color{red}{X_{1i}-\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}}\] is the residual of the regression through the origin \(x_{1i}=x_{2i}\gamma+\epsilon\), i.e. \(X_1\) regressed on \(X_2.\)
So, in general, the residual from a regression through the origin of \(Y\) on \(X\) (slope as in equation \((3)\)) can be written \[e_{i,\,Y|X}\,=\,Y_i\,-\,\frac{\displaystyle\sum_{j=1}^n Y_j X_j}{\displaystyle\sum_{j=1}^n X_j^2}\,X_i\tag 9\]
For instance:
\[\color{red}{e_{iX_1|X_2}=X_{1i}-\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}\,X_{2i}}\]
So, continuing where we left off in equation \((8)\):
\[\begin{align} 0&=\displaystyle\sum_{i=1}^n\left\{Y_i -\color{magenta}{\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}X_{2i}- \hat\beta_1 \left(\color{red}{X_{1i}-\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}}\right)\right\}X_{1i}\\[2ex] &=\displaystyle\sum_{i=1}^n\left\{e_{iY|X_2}\,-\,\hat\beta_1\,e_{iX_1|X_2}\right\}\,X_{1i} \small \tag {10}\\[2ex] &=\displaystyle\sum_{i=1}^n\left\{e_{iY|X_2}\frac{e_{iX_1|X_2}}{e_{iX_1|X_2}}\,-\,\hat\beta_1\,e_{iX_1|X_2}\,\frac{e_{iX_1|X_2}}{e_{iX_1|X_2}}\right\}\,X_{1i} \end{align} \]
Therefore,
\[\displaystyle\sum_{i=1}^n e_{Y|X_2}X_{1i}=\,\hat\beta_1\,\displaystyle\sum_{i=1}^n e_{X_1|X_2}\,X_{1i}\]
\[\hat\beta_1\,=\,\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}X_{1i}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}}\,=\,\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}} \small \tag {11}\]
What happened to the numerators? It turns out they are the same (proof below, using equation \((9)\)).
The estimator \(\hat\beta_1\) from the model \(y_i =\beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \epsilon_i\) estimates the association between \(X_1\) and \(Y\) after “partialling out” \(X_2\). \(e_{i, Y|X_2}\) is \(Y\) after “partialling out” the variation that is explained by \(X_2\). The expression \(e_{i, X_1|X_2}\) is \(X_1\) after “partialling out” the variation that is explained by \(X_2\).
\(\hat\beta_1\) is simply the analogue of \(\frac{\text{cov}(X, Y)}{\text{var}(X)}\), or \(\text{cor}(X, Y)\frac{\text{SD}(Y)}{\text{SD}(X)}\) (equation \((6)\)), computed on \(e_{X_1|X_2}\) and \(e_{Y|X_2}\), i.e. on \(X_1\) and \(Y\) after partialling out \(X_2\).
The equality of numerators in \((11)\), i.e. \(\displaystyle\sum_{i=1}^n X_{1i} e_{i,Y|X_2}=\displaystyle\sum_{i=1}^n e_{i,X_1|X_2} \, e_{i,Y|X_2},\) can be shown as follows:
by \((9),\)
\[\begin{align} \displaystyle\sum_{i=1}^n X_{1i} e_{i,Y|X_2} &= \color{blue}{\displaystyle\sum_{i=1}^n X_{1i}\left(y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i} \right)} \end{align}\]
(replacing \(e_{i,Y|X_2}\) by its expansion; see under equation \((8)\)). Let’s call this last equation “\(X_1\) times the residuals of \(Y\) on \(X_2\) through the origin”.
Now,
\[\begin{align}\small \displaystyle\sum_{i=1}^n e_{i,X_1|X_2} \, e_{i,Y|X_2} &=\displaystyle\sum_{i=1}^n \left(X_{1i} - \frac{\displaystyle\sum_{j=1}^n X_{2j} X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i}\right)\,\left(y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j}y_{j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i}\right)\\[2ex] &=\small\displaystyle\sum_{i=1}^n \left(X_{1i}y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{1i} X_{2i} - \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i}y_i + \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j} \displaystyle\sum_{j=1}^n X_{2j}y_j}{\left(\displaystyle\sum_{j=1}^n X_{2j}^2\right)^2} X_{2i}^2 \right)\\[2ex] &=\small\displaystyle\sum_{i=1}^n X_{1i}y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} \displaystyle\sum_{i=1}^n X_{1i} X_{2i} - \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} \displaystyle\sum_{i=1}^n X_{2i}y_i + \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j} \displaystyle\sum_{j=1}^n X_{2j}y_j}{\left(\displaystyle\sum_{j=1}^n X_{2j}^2\right)^2} \displaystyle\sum_{i=1}^nX_{2i}^2 \end{align}\]
Now, any summation indexed by \(i\) or by \(j\) is the same thing: both run over
the same \(n\) observations, and only the name of the index differs. So it follows that the above is
equal to:
\[\begin{align}
&...=\small\displaystyle\sum_{i=1}^n X_{1i}y_i -
\frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n
X_{2j}^2} \displaystyle\sum_{i=1}^n X_{1i} X_{2i} -
\frac{\displaystyle\sum_{i=1}^n X_{2i}y_i \displaystyle\sum_{j=1}^n
X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} +
\frac{\displaystyle\sum_{j=1}^n X_{2j}y_j \displaystyle\sum_{j=1}^n
X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2}\\[2ex]
&=\displaystyle\sum_{i=1}^n X_{1i}y_i -
\frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n
X_{2j}^2} \displaystyle\sum_{i=1}^n X_{1i} X_{2i}\\[2ex]
&=\color{blue}{\displaystyle\sum_{i=1}^n X_{1i}\left(y_i -
\frac{\displaystyle\sum_{j=1}^n X_{2j}y_j}{\displaystyle\sum_{j=1}^n
X_{2j}^2}\,X_{2i}\right)}\quad\square
\end{align}\]
Indeed, this is the same blue expression above (the equation “\(X_1\) times the residuals of \(Y\) on \(X_2\) through the origin”).
It can be shown that \(\displaystyle\sum_{i=1}^n\,e_{i,X_1|X_2}\,X_{1i}\) (denominator of \((11)\)) is equivalent to \(\displaystyle\sum_{i=1}^n \,e_{i,X_1|X_2}^2\).
\[\begin{align}\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2 &=\displaystyle\sum_{i=1}^n e_{X_1|X_2} \left( \color{red}{X_{1i} - \frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}} \right) \end{align}\]
\(\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{2i}=0\): the sum of the residuals times the regressor is zero. This is one of the normal equations of OLS: the least squares residuals must be orthogonal to all the regressors. Hence,
\[\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2 = \displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}\quad\square\]
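Both orthogonality facts are easy to confirm numerically; a minimal sketch (the simulated data are my own illustrative assumption), where the residuals come from a regression through the origin of \(X_1\) on \(X_2\).

```python
import numpy as np

rng = np.random.default_rng(4)
X1 = rng.normal(size=50)
X2 = rng.normal(size=50)                    # illustrative simulated data only

gamma = np.sum(X1 * X2) / np.sum(X2 ** 2)   # through-origin slope of X1 on X2
e_x1_x2 = X1 - gamma * X2                   # residuals of X1 | X2

print(np.isclose(np.sum(e_x1_x2 * X2), 0.0))                    # residuals orthogonal to X2
print(np.isclose(np.sum(e_x1_x2 ** 2), np.sum(e_x1_x2 * X1)))   # sum e^2 = sum e * X1
```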
And plugging into the \(\hat\beta_1\) equation \((11)\):
\[\bbox[5px,border:2px solid red]{\hat\beta_1=\,\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}X_{1i}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}}=\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2}=\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}}}\small\tag {12}\]
which leads to the intended insight, or point of the entire exercise:
Regression estimate for \(\beta_1\) is the regression through the origin estimate having regressed \(X_2\) out of both the response and the predictor.
Regression estimate for \(\beta_2\) is the regression through the origin estimate having regressed \(X_1\) out of both the response and the predictor.
Multivariate regression estimates are exactly those having removed the linear relationship of the other variables from both the regressor and the response.
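Equation \((12)\) and the statements above can be checked numerically; a minimal sketch (simulated data, no intercept, and NumPy are my own illustrative assumptions), with both residual vectors coming from regressions through the origin on \(X_2\).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)          # correlated regressors, illustrative data
Y = 1.5 * X1 - 2.0 * X2 + rng.normal(size=n)

# full (no-intercept) fit of Y on X1 and X2
b1_full, b2_full = np.linalg.lstsq(np.column_stack([X1, X2]), Y, rcond=None)[0]

# partial out X2 via regressions through the origin
e_y_x2 = Y - (np.sum(Y * X2) / np.sum(X2 ** 2)) * X2        # residual of Y | X2
e_x1_x2 = X1 - (np.sum(X1 * X2) / np.sum(X2 ** 2)) * X2     # residual of X1 | X2

# equation (12): b1 is the through-origin slope of e_y_x2 on e_x1_x2
b1_partial = np.sum(e_y_x2 * e_x1_x2) / np.sum(e_x1_x2 ** 2)
print(np.isclose(b1_full, b1_partial))      # True
```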
INTUITION:
The slope of the linear contribution to \(y\) from one of the regressors is built by summing products of residuals: the residuals of that regressor on the other variables, and the residuals of \(y\) on the other variables. In essence, it is a measure of 1. the part of the regressor not correlated with the other regressors \((e_{X_1|X_2})\); and 2. how far off the other variables are in predicting \(y\) \((e_{Y|X_2})\).
Solving will entail \(p\) equations with \(p\) unknowns of the same form as the
“system of equations” above:
\[\displaystyle\sum_{i=1}^n(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i}-\dots-\hat\beta_pX_{pi})X_{ki}=0,\qquad k=1,\dots,p\]
Holding \(\hat\beta_1\) through \(\hat\beta_{p-1}\) fixed…
\[\hat\beta_p=\frac{\displaystyle\sum_{i=1}^n (Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i}-\dots-\hat\beta_{p-1}X_{(p-1)i})X_{pi}}{\displaystyle\sum_{i=1}^n X_{pi}^2}\]
as in equation (7).
Plugging it into the equation above:
\[\displaystyle\sum_{i=1}^n(e_{i,Y|X_p}-e_{i,X_1|X_p}\hat\beta_1-\dots-e_{i,X_{p-1}|X_p}\hat\beta_{p-1})X_{ki}=0\] as in equation (8).
\[X_{ki} = e_{i,X_k|X_p}+\frac{\displaystyle\sum_{j=1}^n X_{kj}X_{pj}}{\displaystyle\sum_{j=1}^n X_{pj}^2}X_{pi}\] and \[\displaystyle\sum_{i=1}^n e_{i,X_k|X_p}X_{pi}=0\]
Therefore:
\[\displaystyle\sum_{i=1}^n(e_{i,Y|X_p}-e_{i,X_1|X_p}\hat\beta_1-\dots-e_{i,X_{p-1}|X_p}\hat\beta_{p-1})X_{ki}=0\] is equal to:
\[\displaystyle\sum_{i=1}^n(e_{i,Y|X_p}-e_{i,X_1|X_p}\hat\beta_1-\dots-e_{i,X_{p-1}|X_p}\hat\beta_{p-1})e_{i,X_k|X_p}=0\]
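This generalization can also be checked numerically; a sketch for \(p=3\) regressors without an intercept (the data and the choice of \(p\) are my own illustrative assumptions), where all residuals are from regressions through the origin on \(X_p\).

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 150, 3
X = rng.normal(size=(n, p))                 # columns X1, X2, X3; illustrative data, no intercept
Y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, Y, rcond=None)[0]             # full OLS fit

Xp = X[:, -1]                                            # the "last" regressor X_p


def resid(v):
    """Residual of v from a regression through the origin on X_p."""
    return v - (np.sum(v * Xp) / np.sum(Xp ** 2)) * Xp


combo = resid(Y) - sum(beta[j] * resid(X[:, j]) for j in range(p - 1))
for k in range(p - 1):
    # both forms of the equation vanish at the OLS solution
    print(np.isclose(np.sum(combo * X[:, k]), 0.0),
          np.isclose(np.sum(combo * resid(X[:, k])), 0.0))
```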
The same results can also be derived with calculus. Starting again from
\[y_i\,=\,\beta_1X_{1i}\,+\,\beta_2X_{2i}+\dots+\,\beta_pX_{pi}\,+\,\varepsilon_i\]
and restricting attention to an intercept and two regressors, we want
\[\min_{\hat\beta_o,\hat\beta_1,\hat\beta_2}\displaystyle\sum_{i=1}^n e_i^2,\] where
\[e_i= y_i - \hat\beta_o-\hat\beta_1X_{1i}-\hat\beta_2X_{2i}.\]
Then first order conditions are:
\(\frac{\partial}{\partial\hat\beta_o}\displaystyle\sum_{i=1}^n e_i^2=-2\displaystyle\sum_{i=1}^n(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})\)
\(\frac{\partial}{\partial\hat\beta_1}\displaystyle\sum_{i=1}^n e_i^2=-2\displaystyle\sum_{i=1}^nX_{1i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})\)
\(\frac{\partial}{\partial\hat\beta_2}\displaystyle\sum_{i=1}^n e_i^2=-2\displaystyle\sum_{i=1}^n X_{2i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})\)
Set to zero and solve:
\(-2\displaystyle\sum_{i=1}^n(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)
\(\displaystyle\sum_{i=1}^n(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)
\(\displaystyle\sum_{i=1}^n y_i-n\hat\beta_o-\hat\beta_1 \displaystyle\sum_{i=1}^n x_{1i}-\hat\beta_2 \displaystyle\sum_{i=1}^nx_{2i}=0\)
\(\displaystyle\sum_{i=1}^n y_i-\hat\beta_1 \displaystyle\sum_{i=1}^n x_{1i}-\hat\beta_2 \displaystyle\sum_{i=1}^nx_{2i}=n\hat\beta_o\)
\(\bar Y -\hat\beta_1 \bar X_1 -\hat\beta_2 \bar X_2 =\hat\beta_o\)
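The intercept formula just derived can be confirmed numerically; a minimal sketch (simulated data and a least-squares fit with an intercept column are my own illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(8)
n = 90
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 4 - 1.2 * x1 + 0.7 * x2 + rng.normal(size=n)   # illustrative simulated data only

A = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(A, y, rcond=None)[0]

# first order condition for the intercept: b0 = Ybar - b1*X1bar - b2*X2bar
print(np.isclose(b0, y.mean() - b1 * x1.mean() - b2 * x2.mean()))   # True
```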
\(-2\displaystyle\sum_{i=1}^nX_{1i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)
\(\displaystyle\sum_{i=1}^nX_{1i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)
\(\displaystyle\sum_{i=1}^nX_{1i}(y_i-\bar Y +\hat\beta_1 \bar X_1 +\hat\beta_2 \bar X_2 -\hat\beta_1 X_{1i} - \hat\beta_2 x_{2i})=0\)
\(\displaystyle\sum_{i=1}^nX_{1i}[(y_i-\bar Y) -\hat\beta_1 (x_{1i}-\bar X_1) -\hat\beta_2 (x_{2i}-\bar X_2)]=0\)
\(\displaystyle\sum_{i=1}^n X_{1i}(y_i-\bar Y) -\hat\beta_1 \displaystyle\sum_{i=1}^n x_{1i}(x_{1i}-\bar X_1) -\hat\beta_2 \displaystyle\sum_{i=1}^n x_{1i}(x_{2i}-\bar X_2)=0\)
\(\displaystyle\sum_{i=1}^n X_{1i}(y_i-\bar Y) -\hat\beta_2 \displaystyle\sum_{i=1}^n x_{1i}(x_{2i}-\bar X_2)=\hat\beta_1 \displaystyle\sum_{i=1}^n x_{1i}(x_{1i}-\bar X_1)\)
\(\hat\beta_1=\frac{\displaystyle\sum_{i=1}^nx_{1i}(y_i-\bar Y)}{\displaystyle\sum_{i=1}^nx_{1i}(x_{1i}-\bar X_1)}\,-\, \hat\beta_2\, \frac{\displaystyle\sum_{i=1}^n x_{1i}(x_{2i}-\bar X_2)}{\displaystyle\sum_{i=1}^n x_{1i}(x_{1i}-\bar X_1)}\)
Compare this to the formulas from the earlier approach through the origin (equations \((11)\)–\((12)\) and \((7)\)):
\(\hat\beta_1=\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2}\)
\(\hat\beta_2=\frac{\displaystyle\sum_{i=1}^n (Y_i-\hat\beta_1X_{1i})X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}\)
The first term in the expression for \(\hat\beta_1\) derived above (before the comparison) is the simple ordinary least squares (OLS) estimator of \(\beta_1\) from regressing \(Y\) on \(X_1\) alone. The second term is the product of the estimate \(\hat\beta_2\) and another familiar quantity: the simple OLS slope from regressing \(X_2\) on \(X_1\), which captures the part of the variation in \(X_2\) that is associated with variation in \(X_1\). Together, this product captures the association between \(X_1\) and \(Y\) that is “explained” by \(X_2\); its subtraction from the simple OLS estimator is the “partialling out” of the influence of \(X_2\). Hence we arrive at the familiar interpretation of \(\hat\beta_1\): “the association between \(X_1\) and \(Y\), adjusting for \(X_2\) (holding \(X_2\) fixed).”
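This decomposition of \(\hat\beta_1\) (simple slope of \(Y\) on \(X_1\), minus \(\hat\beta_2\) times the simple slope of \(X_2\) on \(X_1\)) can be verified numerically; a sketch with an intercept included in the fitted model (the simulated data are my own illustrative assumption).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 120
X1 = rng.normal(size=n)
X2 = 0.4 * X1 + rng.normal(size=n)          # correlated regressors, illustrative data
Y = 1.0 + 2.0 * X1 - 3.0 * X2 + rng.normal(size=n)

# full fit with intercept
A = np.column_stack([np.ones(n), X1, X2])
b0, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]

# simple OLS slope of Y on X1, and of X2 on X1 (both with intercepts)
slope_y_x1 = np.cov(X1, Y)[0, 1] / np.var(X1, ddof=1)
slope_x2_x1 = np.cov(X1, X2)[0, 1] / np.var(X1, ddof=1)

# the identity: b1 equals the simple slope minus b2 times the "X2 on X1" slope
print(np.isclose(b1, slope_y_x1 - b2 * slope_x2_x1))   # True
```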
We can solve for \(\hat\beta_2\) as a simple exercise; it is essentially the same as the case of \(\hat\beta_1\).