The ideal line for the population is:
\[\mu_i\,=\,\beta_o \,+\,\beta_1\,X_i\]
The estimated regression line is:
\[\hat \mu_i\,=\,\hat\beta_o\,+\,\hat\beta_1\,X_i\]
Preliminary definitions:
\(\text{cov} (X,Y) =
\frac{1}{n-1}\displaystyle \sum_{i=1}^n (X_i -\bar X) (Y_i - \bar Y)
=\frac{1}{n-1}\left (\displaystyle \sum_{i=1}^n X_i Y_i - n\bar X\bar Y\right)\)
\(\text{var} (X) =
\frac{1}{n-1}\displaystyle \sum_{i=1}^n (X_i -\bar X) (X_i - \bar X)
=\frac{1}{n-1}\left (\displaystyle \sum_{i=1}^n X_i^2 - n\bar X^2\right)\), with \(\text{SD}(X)=\sqrt{\text{var}(X)}\)
\(\text{cor}(X,Y)
= \frac{\text{cov}(X,Y)}{\text{SD}(X) \text{SD}(Y)}\)
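As a quick numeric sanity check of these definitions, a minimal sketch (the data and the use of NumPy are my own illustrative assumptions, not part of the original): the hand-computed quantities are compared with NumPy's built-ins, which also use the \(n-1\) denominator.

```python
import numpy as np

rng = np.random.default_rng(42)          # illustrative simulated data only
X = rng.normal(size=100)
Y = 2 + 3 * X + rng.normal(size=100)
n = len(X)

# cov(X, Y) = (1/(n-1)) * (sum X_i Y_i - n * Xbar * Ybar)
cov_xy = ((X * Y).sum() - n * X.mean() * Y.mean()) / (n - 1)

# var(X) = (1/(n-1)) * (sum X_i^2 - n * Xbar^2);  SD(X) = sqrt(var(X))
var_x = ((X ** 2).sum() - n * X.mean() ** 2) / (n - 1)

# cor(X, Y) = cov(X, Y) / (SD(X) * SD(Y))
cor_xy = cov_xy / (np.sqrt(var_x) * Y.std(ddof=1))

print(np.isclose(cov_xy, np.cov(X, Y)[0, 1]))         # True
print(np.isclose(np.sqrt(var_x), np.std(X, ddof=1)))  # True
print(np.isclose(cor_xy, np.corrcoef(X, Y)[0, 1]))    # True
```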
We want to minimize the squared distances between the observed
values \(Y_i\) and the fitted values
\(\mu_i\) that we would obtain if we had the “ideal” regression line computed
from the entire population. \(\mu_i\) carries the symbol of a mean because it is the mean
of each bell curve in the diagram above; \(\hat \mu_i\) denotes the fitted values for the
sample, based on our estimated regression line.
Just subtracting and adding \(\hat \mu_i\):
\[\begin{align}
\displaystyle \sum_{i=1}^n (Y_i-\mu_i)^2 &= \displaystyle
\sum_{i=1}^{n}(Y_i-\hat \mu_i + \hat \mu_i - \mu_i)^2\\[2ex]
&=
\displaystyle \sum_{i=1}^{n}(Y_i-\hat \mu_i)^2 + 2 \sum_{i=1}^{n} (Y_i -
\hat \mu_i)\,(\hat \mu_i - \mu_i) + \displaystyle \sum_{i=1}^n (\hat
\mu_i - \mu_i)^2
\end{align}\]
Showing that \(\hat \mu_i\) attains the minimum amounts to showing that the cross term vanishes:
\[\sum_{i=1}^{n} (Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i)\,=\,0\,\small \tag 1\]
leaving us with:
\[\displaystyle \sum_{i=1}^{n}(Y_i-\mu_i)^2 = \displaystyle \sum_{i=1}^{n}(Y_i-\hat \mu_i)^2 + \displaystyle \sum_{i=1}^n (\hat \mu_i - \mu_i)^2\]
It follows that:
\[\displaystyle \sum_{i=1}^{n}(Y_i-\mu_i)^2\,\geq \sum_{i=1}^{n}(Y_i-\hat \mu_i)^2\]
Considering only horizontal lines (so that \(\mu_i=\beta_o\) and \(\hat\mu_i=\hat\beta_o\)), equation \((1)\) becomes:
\[ \displaystyle\sum_{i=1}^{n} (Y_i - \hat \beta_o)\,(\hat \beta_o - \beta_o)\,=\,(\hat \beta_o - \beta_o)\,\displaystyle\sum_{i=1}^{n}(Y_i - \hat \beta_o)=\,0\]
This will happen if:
\[\displaystyle\sum_{i=1}^{n}(Y_i - \hat \beta_o)=\,0.\]
In other words if \[n\,\bar Y\,-\,n\,\hat \beta_o\,=\,0,\]
or
\[\bar Y = \hat\beta_o \small \tag 2.\]
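A small numeric illustration of equation \((2)\), a sketch under made-up data (my own assumption for illustration): among all horizontal lines \(Y = c\), the sample mean gives the smallest sum of squared errors.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(loc=5, scale=2, size=50)   # illustrative simulated data only


def sse(c):
    """Sum of squared errors for the horizontal line Y = c."""
    return np.sum((Y - c) ** 2)


# scan candidate constants around the mean; the minimizer should be Ybar
grid = np.linspace(Y.mean() - 3, Y.mean() + 3, 10001)
best = grid[np.argmin([sse(c) for c in grid])]
print(best, Y.mean())                     # essentially equal (up to grid resolution)
```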
If we do regression through the origin \[y_i=x_{i}\beta_1+\epsilon\]
\[\begin{align}\displaystyle\sum_{i=1}^{n}
(Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i)
&=\,\displaystyle\sum_{i=1}^{n} (Y_i - \hat \beta_1\, X_i)\,(\hat \beta_1 X_i - \beta_1\,X_i)\\[2ex]
&=\,\displaystyle\sum_{i=1}^{n} (Y_i - \hat \beta_1\, X_i)\,(\hat \beta_1 - \beta_1)\,X_i\\[2ex]
&=(\hat \beta_1 - \beta_1)\,\displaystyle\sum_{i=1}^{n}\left(Y_iX_i-\hat \beta_1 X_i^2\right)
\end{align}\]
And this will be zero if:
\[\displaystyle\sum_{i=1}^{n}\left(Y_iX_i-\hat \beta_1 X_i^2\right)=\displaystyle\sum_{i=1}^{n}Y_iX_i-\hat \beta_1\displaystyle\sum_{i=1}^{n}X_i^2=0.\]
Hence,
\[\hat \beta_1=\frac{\displaystyle\sum_{i=1}^{n}Y_iX_i}{\displaystyle\sum_{i=1}^{n}X_i^2} \small \tag 3.\]
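Equation \((3)\) can be checked against a no-intercept least-squares fit; a minimal sketch (simulated data and `np.linalg.lstsq` are my own illustrative choices), where a single-column design matrix plays the role of regression through the origin.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(1, 10, size=60)
Y = 1.7 * X + rng.normal(size=60)          # illustrative data, no intercept in the truth

beta1_hat = (Y * X).sum() / (X ** 2).sum()                # equation (3)
beta1_lstsq = np.linalg.lstsq(X[:, None], Y, rcond=None)[0][0]
print(np.isclose(beta1_hat, beta1_lstsq))                 # True
```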
Now doing both intercept and slope:
\[\begin{align}
0 &=\displaystyle\sum_{i=1}^{n} (Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i)\\[2ex]
&=\displaystyle\sum_{i=1}^{n}(Y_i -\hat\beta_o - \hat \beta_1 X_i)(\hat\beta_o+\hat\beta_1X_i-\beta_o-\beta_1X_i)\\[2ex]
&=\displaystyle\sum_{i=1}^{n} (Y_i -\hat\beta_o - \hat \beta_1 X_i)\left((\hat\beta_o-\beta_o)+(\hat\beta_1-\beta_1)X_i\right)\\[2ex]
&= (\hat \beta_o-\beta_o)\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)+(\hat\beta_1-\beta_1)\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)X_i \tag 4
\end{align}\]
For the first part of equation \((4),\)
\[0=\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)=n\bar Y-n\hat\beta_o-n\hat\beta_1\bar X\]
Hence,
\(\color{blue}{\hat\beta_o\,=\,\bar
Y\,-\,\hat\beta_1\,\bar X \small \tag 5}\)
And for the second part of equation \((4)\):
\[0=\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)X_i.\] Substituting \(\hat\beta_o=\bar Y- \hat \beta_1 \bar X\) for \(\hat\beta_o\):
\[0=\displaystyle\sum_{i=1}^{n}(Y_i-\bar Y + \hat \beta_1 \bar X - \hat \beta_1X_i)X_i =\displaystyle\sum_{i=1}^{n}(Y_i-\bar Y)X_i - \hat\beta_1 \sum_{i=1}^{n}(X_i-\bar X)X_i\]
Since \(\displaystyle\sum_{i=1}^{n}(Y_i-\bar Y)=0\) and \(\displaystyle\sum_{i=1}^{n}(X_i-\bar X)=0\), subtracting \(\displaystyle\bar X\sum_{i=1}^{n}(Y_i-\bar Y)=0\) from the numerator and \(\displaystyle\bar X\sum_{i=1}^{n}(X_i-\bar X)=0\) from the denominator changes neither:
\[\begin{align}\hat \beta_1&=\frac{\sum_{i=1}^{n}(Y_i-\bar Y)X_i}{\sum_{i=1}^{n}(X_i-\bar X)X_i}\\[2ex] &=\frac{\sum_{i=1}^{n}(Y_i-\bar Y)(X_i-\bar X)}{\sum_{i=1}^{n}(X_i-\bar X)(X_i-\bar X)}\\[2ex] &= \frac{\text{cov}(X,Y)}{\text{var}(X)}\\[2ex] &=\frac{\text{cor}(X,Y)\,\text{SD}(X)\text{SD}(Y)}{\text{SD}(X)\,\text{SD}(X)} \end{align}\]
Hence,
\(\color{blue}{\hat\beta_1\,=\,\text{cor}(Y,
X)\,\frac{\text{SD}(Y)}{\text{SD}(X)}\,=\,\frac{\text{cov}(Y,X)}{\text{var}(X)}\small
\tag 6}\)
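Equations \((5)\) and \((6)\) can be verified numerically; a sketch with simulated data (my own illustrative assumption), where the design matrix carries a column of ones for the intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=80)
Y = -1 + 0.5 * X + rng.normal(size=80)     # illustrative simulated data only

# equation (6): slope = cor(Y, X) * SD(Y)/SD(X)
b1 = np.corrcoef(Y, X)[0, 1] * Y.std(ddof=1) / X.std(ddof=1)
# equation (5): intercept = Ybar - b1 * Xbar
b0 = Y.mean() - b1 * X.mean()

A = np.column_stack([np.ones_like(X), X])  # intercept + slope design
print(np.allclose([b0, b1], np.linalg.lstsq(A, Y, rcond=None)[0]))   # True
```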
\[Y_i\,=\,\beta_1X_{1i}\,+\,\beta_2X_{2i}+\dots+\,\beta_pX_{pi}\,+\,\varepsilon_i\]
is the ideal model for the population; one of the regressors is identically equal to \(1\), and its term corresponds to the intercept.
For the sample:
\[\hat\mu_i=\hat\beta_1X_{1i}\,+\,\hat\beta_2X_{2i}\,+\dots\,+\,\hat\beta_pX_{pi}\]
As in the univariable case, \(\displaystyle \sum_{i=1}^n(Y_i-\hat\mu_i)(\hat\mu_i-\mu_i)=0\).
Let’s assume two regressors: \(Y_i = \beta_1X_{1i}+\beta_2X_{2i}+\varepsilon_i\):
\[\displaystyle \sum_{i=1}^n\color{brown}{(Y_i-\hat\mu_i)}\color{orange}{(\hat\mu_i-\mu_i)}=\displaystyle \sum_{i=1}^n \color{brown}{(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i})}\color{orange}{\left((\hat\beta_1-\beta_1)X_{1i}+(\hat\beta_2-\beta_2)X_{2i}\right)}.\]
For this to be zero whatever the values of \(\beta_1\) and \(\beta_2\), each bracketed sum must vanish separately, which gives a system of equations:
\[\begin{cases}\sum_{i=1}^n\color{brown}{(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i})}\color{orange}{X_{1i}}=0\\[2ex] \sum_{i=1}^n\color{brown}{(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i})}\color{orange}{X_{2i}}=0 \end{cases}\]
Solving for \(\hat\beta_2\) in the second equation:
\[\bbox[5px,border:2px solid red]{\hat\beta_2\,=\,\frac{\displaystyle\sum_{i=1}^n(Y_i\,-\,\hat\beta_1\,X_{1i})\,X_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}\tag 7\]
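The \(2\times 2\) system above, and hence equation \((7)\), can be solved directly and compared with a two-column, no-intercept least-squares fit; a minimal sketch under simulated data (the data and NumPy routines are my own illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2 * X1 - 1 * X2 + rng.normal(size=n)   # illustrative data, no intercept

# normal equations: sum (Y - b1 X1 - b2 X2) X1 = 0 and sum (Y - b1 X1 - b2 X2) X2 = 0
A = np.array([[np.sum(X1 * X1), np.sum(X1 * X2)],
              [np.sum(X1 * X2), np.sum(X2 * X2)]])
b = np.array([np.sum(Y * X1), np.sum(Y * X2)])
b1_hat, b2_hat = np.linalg.solve(A, b)

# equation (7): b2 expressed in terms of b1
b2_from_eq7 = np.sum((Y - b1_hat * X1) * X2) / np.sum(X2 ** 2)

X = np.column_stack([X1, X2])
print(np.allclose([b1_hat, b2_hat], np.linalg.lstsq(X, Y, rcond=None)[0]))  # True
print(np.isclose(b2_hat, b2_from_eq7))                                      # True
```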
Now plugging \(\hat \beta_2\) into the first of the equations in the system above:
\[\begin{align}
0&=\displaystyle\sum_{i=1}^n\left\{Y_i - \hat\beta_1
X_{1i}-\left(\frac{\displaystyle\sum_{i=1}^n(Y_i-\hat\beta_1X_{1i})X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}\right)X_{2i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i - \hat\beta_1
X_{1i}-\left(\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}-\frac{\displaystyle\sum_{i=1}^n
\hat\beta_1X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}\right)X_{2i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i - \hat\beta_1
X_{1i}-\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}+\frac{\displaystyle\sum_{i=1}^n
\hat\beta_1X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}X_{2i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i
-\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}+\hat\beta_1\frac{\displaystyle\sum_{i=1}^n
X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}\,- \hat\beta_1
X_{1i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i
-\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}-
\hat\beta_1 X_{1i}+\hat\beta_1\frac{\displaystyle\sum_{i=1}^n
X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}X_{2i}\right\}X_{1i}\\[2ex]
&=\displaystyle\sum_{i=1}^n\left\{Y_i
-\color{magenta}{\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}X_{2i}-
\hat\beta_1 \left(\color{red}{X_{1i}-\frac{\displaystyle\sum_{i=1}^n
X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n
X_{2i}^2}X_{2i}}\right)\right\}X_{1i} \small \tag 8
\end{align}\]
\[\color{magenta}{\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}\]
is the slope of the regression through the origin of \(y_i=x_{2i}\beta_2+\epsilon\). Why? Just see
equation \((3).\)
\[Y_i -\color{magenta}{\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}X_{2i}\]
is the residual of \(Y\) when only \(X_{2i}\) is included as a regressor, through the origin. Why? Just consider that with one single variable, regression through the origin is \(y_i=x_{i}\beta_1+\epsilon\) with the slope as in equation \((3)\): i.e. \(\small\hat\beta_1=\frac{\displaystyle\sum_{i=1}^{n}Y_iX_i}{\displaystyle\sum_{i=1}^{n}X_i^2},\) and therefore the fitted line is \(\frac{\displaystyle\sum_{i=1}^{n}Y_iX_i}{\displaystyle\sum_{i=1}^{n}X_i^2}X_i.\)
Likewise,
\[\color{red}{X_{1i}-\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}}\] is the residual of the regression through the origin \(x_{1i}=x_{2i}\gamma+\epsilon\), i.e. \(X_1\) regressed on \(X_2.\)
So, in general, the residual from a regression through the origin of \(Y\) on \(X\) (slope as in equation \((3)\)) can be written \[e_{i,\,Y|X}\,=\,Y_i\,-\,\frac{\displaystyle\sum_{j=1}^n Y_j X_j}{\displaystyle\sum_{j=1}^n X_j^2}\,X_i\tag 9\]
For instance:
\[\color{red}{e_{iX_1|X_2}=X_{1i}-\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}\,X_{2i}}\]
So, continuing where we left off in equation \((8)\):
\[\begin{align} 0&=\displaystyle\sum_{i=1}^n\left\{Y_i -\color{magenta}{\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}}X_{2i}- \hat\beta_1 \left(\color{red}{X_{1i}-\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}}\right)\right\}X_{1i}\\[2ex] &=\displaystyle\sum_{i=1}^n\left\{e_{iY|X_2}\,-\,\hat\beta_1\,e_{iX_1|X_2}\right\}\,X_{1i} \small \tag {10}\\[2ex] &=\displaystyle\sum_{i=1}^n\left\{e_{iY|X_2}\frac{e_{iX_1|X_2}}{e_{iX_1|X_2}}\,-\,\hat\beta_1\,e_{iX_1|X_2}\,\frac{e_{iX_1|X_2}}{e_{iX_1|X_2}}\right\}\,X_{1i} \end{align} \]
Therefore,
\[\displaystyle\sum_{i=1}^n e_{Y|X_2}X_{1i}=\,\hat\beta_1\,\displaystyle\sum_{i=1}^n e_{X_1|X_2}\,X_{1i}\]
\[\hat\beta_1\,=\,\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}X_{1i}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}}\,=\,\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}} \small \tag {11}\]
What happened to the numerators? It turns out they are the same (proof below, using equation \((9)\)).
The estimator \(\hat\beta_1\) from the model \(y_i =\beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \epsilon_i\) estimates the association between \(X_1\) and \(Y\) after “partialling out” \(X_2\). \(e_{i, Y|X_2}\) is \(Y\) after “partialling out” the variation that is explained by \(X_2\). The expression \(e_{i, X_1|X_2}\) is \(X_1\) after “partialling out” the variation that is explained by \(X_2\).
\(\hat\beta_1\) is simply the analogue of \(\frac{\text{cov}(X, Y)}{\text{var}(X)}\), or \(\text{cor}(X, Y)\frac{\text{SD}(Y)}{\text{SD}(X)}\) (equation \((6)\)), computed on \(e_{X_1|X_2}\) and \(e_{Y|X_2}\), i.e. on \(X_1\) and \(Y\) after partialling out \(X_2\).
The equality of numerators in \((11)\), i.e. \(\displaystyle\sum_{i=1}^n X_{1i} e_{i,Y|X_2}=\displaystyle\sum_{i=1}^n e_{i,X_1|X_2} \, e_{i,Y|X_2},\) can be shown as follows:
by \((9),\)
\[\begin{align} \displaystyle\sum_{i=1}^n X_{1i} e_{i,Y|X_2} &= \color{blue}{\displaystyle\sum_{i=1}^n X_{1i}\left(y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i} \right)} \end{align}\]
(replacing \(e_{i,Y|X_2}\) by its expansion; see under equation \((8)\)). Let’s call this last equation “\(X_1\) times the residuals of \(Y\) on \(X_2\) through the origin”.
Now,
\[\begin{align}\small \displaystyle\sum_{i=1}^n e_{i,X_1|X_2} \, e_{i,Y|X_2} &=\displaystyle\sum_{i=1}^n \left(X_{1i} - \frac{\displaystyle\sum_{j=1}^n X_{2j} X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i}\right)\,\left(y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j}y_{j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i}\right)\\[2ex] &=\small\displaystyle\sum_{i=1}^n \left(X_{1i}y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{1i} X_{2i} - \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i}y_i + \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j} \displaystyle\sum_{j=1}^n X_{2j}y_j}{\left(\displaystyle\sum_{j=1}^n X_{2j}^2\right)^2} X_{2i}^2 \right)\\[2ex] &=\small\displaystyle\sum_{i=1}^n X_{1i}y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} \displaystyle\sum_{i=1}^n X_{1i} X_{2i} - \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} \displaystyle\sum_{i=1}^n X_{2i}y_i + \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j} \displaystyle\sum_{j=1}^n X_{2j}y_j}{\left(\displaystyle\sum_{j=1}^n X_{2j}^2\right)^2} \displaystyle\sum_{i=1}^nX_{2i}^2 \end{align}\]
Now, any summation indexed by \(i\) or by \(j\) is the same thing: both run over
the same \(n\) observations, and only the name of the index differs. So it follows that the above is
equal to:
\[\begin{align}
&...=\small\displaystyle\sum_{i=1}^n X_{1i}y_i -
\frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n
X_{2j}^2} \displaystyle\sum_{i=1}^n X_{1i} X_{2i} -
\frac{\displaystyle\sum_{i=1}^n X_{2i}y_i \displaystyle\sum_{j=1}^n
X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} +
\frac{\displaystyle\sum_{j=1}^n X_{2j}y_j \displaystyle\sum_{j=1}^n
X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2}\\[2ex]
&=\displaystyle\sum_{i=1}^n X_{1i}y_i -
\frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n
X_{2j}^2} \displaystyle\sum_{i=1}^n X_{1i} X_{2i}\\[2ex]
&=\color{blue}{\displaystyle\sum_{i=1}^n X_{1i}\left(y_i -
\frac{\displaystyle\sum_{j=1}^n X_{2j}y_j}{\displaystyle\sum_{j=1}^n
X_{2j}^2}\,X_{2i}\right)}\quad\square
\end{align}\]
Indeed, this is the same blue expression above (the equation “\(X_1\) times the residuals of \(Y\) on \(X_2\) through the origin”).
It can be shown that \(\displaystyle\sum_{i=1}^n\,e_{i,X_1|X_2}\,X_{1i}\) (denominator of \((11)\)) is equivalent to \(\displaystyle\sum_{i=1}^n \,e_{i,X_1|X_2}^2\).
\[\begin{align}\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2 &=\displaystyle\sum_{i=1}^n e_{X_1|X_2} \left( \color{red}{X_{1i} - \frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}} \right) \end{align}\]
\(\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{2i}=0\): the sum of the residuals times the regressor is zero. This is one of the normal equations of OLS: the least squares residuals must be orthogonal to all the regressors. Hence,
\[\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2 = \displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}\quad\square\]
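Both orthogonality facts are easy to confirm numerically; a minimal sketch (the simulated data are my own illustrative assumption), where the residuals come from a regression through the origin of \(X_1\) on \(X_2\).

```python
import numpy as np

rng = np.random.default_rng(4)
X1 = rng.normal(size=50)
X2 = rng.normal(size=50)                    # illustrative simulated data only

gamma = np.sum(X1 * X2) / np.sum(X2 ** 2)   # through-origin slope of X1 on X2
e_x1_x2 = X1 - gamma * X2                   # residuals of X1 | X2

print(np.isclose(np.sum(e_x1_x2 * X2), 0.0))                    # residuals orthogonal to X2
print(np.isclose(np.sum(e_x1_x2 ** 2), np.sum(e_x1_x2 * X1)))   # sum e^2 = sum e * X1
```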
And plugging into the \(\hat\beta_1\) equation \((11)\):
\[\bbox[5px,border:2px solid red]{\hat\beta_1=\,\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}X_{1i}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}}=\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2}=\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}}}\small\tag {12}\]
which leads to the intended insight, or point of the entire exercise:
Regression estimate for \(\beta_1\) is the regression through the origin estimate having regressed \(X_2\) out of both the response and the predictor.
Regression estimate for \(\beta_2\) is the regression through the origin estimate having regressed \(X_1\) out of both the response and the predictor.
Multivariate regression estimates are exactly those having removed the linear relationship of the other variables from both the regressor and the response.
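Equation \((12)\) and the statements above can be checked numerically; a minimal sketch (simulated data, no intercept, and NumPy are my own illustrative assumptions), with both residual vectors coming from regressions through the origin on \(X_2\).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)          # correlated regressors, illustrative data
Y = 1.5 * X1 - 2.0 * X2 + rng.normal(size=n)

# full (no-intercept) fit of Y on X1 and X2
b1_full, b2_full = np.linalg.lstsq(np.column_stack([X1, X2]), Y, rcond=None)[0]

# partial out X2 via regressions through the origin
e_y_x2 = Y - (np.sum(Y * X2) / np.sum(X2 ** 2)) * X2        # residual of Y | X2
e_x1_x2 = X1 - (np.sum(X1 * X2) / np.sum(X2 ** 2)) * X2     # residual of X1 | X2

# equation (12): b1 is the through-origin slope of e_y_x2 on e_x1_x2
b1_partial = np.sum(e_y_x2 * e_x1_x2) / np.sum(e_x1_x2 ** 2)
print(np.isclose(b1_full, b1_partial))      # True
```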
INTUITION:
The slope of the linear contribution to \(y\) from one of the regressors is built by summing products of residuals: the residuals of that regressor on the other variables, and the residuals of \(y\) on the other variables. In essence, it is a measure of 1. the part of the regressor not correlated with the other regressors \((e_{X_1|X_2})\); and 2. how far off the other variables are in predicting \(y\) \((e_{Y|X_2})\).
Solving will entail \(p\) equations with \(p\) unknowns of the same form as the
“system of equations” above:
\[\displaystyle\sum_{i=1}^n(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i}-\dots-\hat\beta_pX_{pi})X_{ki}=0,\qquad k=1,\dots,p\]
Holding \(\hat\beta_1\) through \(\hat\beta_{p-1}\) fixed…
\[\hat\beta_p=\frac{\displaystyle\sum_{i=1}^n (Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i}-\dots-\hat\beta_{p-1}X_{(p-1)i})X_{pi}}{\displaystyle\sum_{i=1}^n X_{pi}^2}\]
as in equation (7).
Plugging it into the equation above:
\[\displaystyle\sum_{i=1}^n(e_{i,Y|X_p}-e_{i,X_1|X_p}\hat\beta_1-\dots-e_{i,X_{p-1}|X_p}\hat\beta_{p-1})X_{ki}=0\] as in equation (8).
\[X_{ki} = e_{i,X_k|X_p}+\frac{\displaystyle\sum_{j=1}^n X_{kj}X_{pj}}{\displaystyle\sum_{j=1}^n X_{pj}^2}X_{pi}\] and \[\displaystyle\sum_{i=1}^n e_{i,X_k|X_p}X_{pi}=0\]
Therefore:
\[\displaystyle\sum_{i=1}^n(e_{i,Y|X_p}-e_{i,X_1|X_p}\hat\beta_1-\dots-e_{i,X_{p-1}|X_p}\hat\beta_{p-1})X_{ki}=0\] is equal to:
\[\displaystyle\sum_{i=1}^n(e_{i,Y|X_p}-e_{i,X_1|X_p}\hat\beta_1-\dots-e_{i,X_{p-1}|X_p}\hat\beta_{p-1})e_{i,X_k|X_p}=0\]
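This generalization can also be checked numerically; a sketch for \(p=3\) regressors without an intercept (the data and the choice of \(p\) are my own illustrative assumptions), where all residuals are from regressions through the origin on \(X_p\).

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 150, 3
X = rng.normal(size=(n, p))                 # columns X1, X2, X3; illustrative data, no intercept
Y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, Y, rcond=None)[0]             # full OLS fit

Xp = X[:, -1]                                            # the "last" regressor X_p


def resid(v):
    """Residual of v from a regression through the origin on X_p."""
    return v - (np.sum(v * Xp) / np.sum(Xp ** 2)) * Xp


combo = resid(Y) - sum(beta[j] * resid(X[:, j]) for j in range(p - 1))
for k in range(p - 1):
    # both forms of the equation vanish at the OLS solution
    print(np.isclose(np.sum(combo * X[:, k]), 0.0),
          np.isclose(np.sum(combo * resid(X[:, k])), 0.0))
```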
The same results can also be derived with calculus. Starting again from
\[y_i\,=\,\beta_1X_{1i}\,+\,\beta_2X_{2i}+\dots+\,\beta_pX_{pi}\,+\,\varepsilon_i\]
and restricting attention to an intercept and two regressors, we want
\[\min_{\hat\beta_o,\hat\beta_1,\hat\beta_2}\displaystyle\sum_{i=1}^n e_i^2,\] where
\[e_i= y_i - \hat\beta_o-\hat\beta_1X_{1i}-\hat\beta_2X_{2i}.\]
Then first order conditions are:
\(\frac{\partial}{\partial\hat\beta_o}\displaystyle\sum_{i=1}^n e_i^2=-2\displaystyle\sum_{i=1}^n(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})\)
\(\frac{\partial}{\partial\hat\beta_1}\displaystyle\sum_{i=1}^n e_i^2=-2\displaystyle\sum_{i=1}^nX_{1i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})\)
\(\frac{\partial}{\partial\hat\beta_2}\displaystyle\sum_{i=1}^n e_i^2=-2\displaystyle\sum_{i=1}^n X_{2i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})\)
Set to zero and solve:
\(-2\displaystyle\sum_{i=1}^n(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)
\(\displaystyle\sum_{i=1}^n(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)
\(\displaystyle\sum_{i=1}^n y_i-n\hat\beta_o-\hat\beta_1 \displaystyle\sum_{i=1}^n x_{1i}-\hat\beta_2 \displaystyle\sum_{i=1}^nx_{2i}=0\)
\(\displaystyle\sum_{i=1}^n y_i-\hat\beta_1 \displaystyle\sum_{i=1}^n x_{1i}-\hat\beta_2 \displaystyle\sum_{i=1}^nx_{2i}=n\hat\beta_o\)
\(\bar Y -\hat\beta_1 \bar X_1 -\hat\beta_2 \bar X_2 =\hat\beta_o\)
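The intercept formula just derived can be confirmed numerically; a minimal sketch (simulated data and a least-squares fit with an intercept column are my own illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(8)
n = 90
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 4 - 1.2 * x1 + 0.7 * x2 + rng.normal(size=n)   # illustrative simulated data only

A = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(A, y, rcond=None)[0]

# first order condition for the intercept: b0 = Ybar - b1*X1bar - b2*X2bar
print(np.isclose(b0, y.mean() - b1 * x1.mean() - b2 * x2.mean()))   # True
```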
\(-2\displaystyle\sum_{i=1}^nX_{1i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)
\(\displaystyle\sum_{i=1}^nX_{1i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)
\(\displaystyle\sum_{i=1}^nX_{1i}(y_i-\bar Y +\hat\beta_1 \bar X_1 +\hat\beta_2 \bar X_2 -\hat\beta_1 X_{1i} - \hat\beta_2 x_{2i})=0\)
\(\displaystyle\sum_{i=1}^nX_{1i}[(y_i-\bar Y) -\hat\beta_1 (x_{1i}-\bar X_1) -\hat\beta_2 (x_{2i}-\bar X_2)]=0\)
\(\displaystyle\sum_{i=1}^n X_{1i}(y_i-\bar Y) -\hat\beta_1 \displaystyle\sum_{i=1}^n x_{1i}(x_{1i}-\bar X_1) -\hat\beta_2 \displaystyle\sum_{i=1}^n x_{1i}(x_{2i}-\bar X_2)=0\)
\(\displaystyle\sum_{i=1}^n X_{1i}(y_i-\bar Y) -\hat\beta_2 \displaystyle\sum_{i=1}^n x_{1i}(x_{2i}-\bar X_2)=\hat\beta_1 \displaystyle\sum_{i=1}^n x_{1i}(x_{1i}-\bar X_1)\)
\(\hat\beta_1=\frac{\displaystyle\sum_{i=1}^nx_{1i}(y_i-\bar Y)}{\displaystyle\sum_{i=1}^nx_{1i}(x_{1i}-\bar X_1)}\,-\, \hat\beta_2\, \frac{\displaystyle\sum_{i=1}^n x_{1i}(x_{2i}-\bar X_2)}{\displaystyle\sum_{i=1}^n x_{1i}(x_{1i}-\bar X_1)}\)
Compare this to the formulas from the earlier approach through the origin (equations \((11)\)–\((12)\) and \((7)\)):
\(\hat\beta_1=\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2}\)
\(\hat\beta_2=\frac{\displaystyle\sum_{i=1}^n (Y_i-\hat\beta_1X_{1i})X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}\)
The first term in the expression for \(\hat\beta_1\) derived above (before the comparison) is the simple ordinary least squares (OLS) estimator of \(\beta_1\) from regressing \(Y\) on \(X_1\) alone. The second term is the product of the estimate \(\hat\beta_2\) and another familiar quantity: the simple OLS slope from regressing \(X_2\) on \(X_1\), which captures the part of the variation in \(X_2\) that is associated with variation in \(X_1\). Together, this product captures the association between \(X_1\) and \(Y\) that is “explained” by \(X_2\); its subtraction from the simple OLS estimator is the “partialling out” of the influence of \(X_2\). Hence we arrive at the familiar interpretation of \(\hat\beta_1\): “the association between \(X_1\) and \(Y\), adjusting for \(X_2\) (holding \(X_2\) fixed).”
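This decomposition of \(\hat\beta_1\) (simple slope of \(Y\) on \(X_1\), minus \(\hat\beta_2\) times the simple slope of \(X_2\) on \(X_1\)) can be verified numerically; a sketch with an intercept included in the fitted model (the simulated data are my own illustrative assumption).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 120
X1 = rng.normal(size=n)
X2 = 0.4 * X1 + rng.normal(size=n)          # correlated regressors, illustrative data
Y = 1.0 + 2.0 * X1 - 3.0 * X2 + rng.normal(size=n)

# full fit with intercept
A = np.column_stack([np.ones(n), X1, X2])
b0, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]

# simple OLS slope of Y on X1, and of X2 on X1 (both with intercepts)
slope_y_x1 = np.cov(X1, Y)[0, 1] / np.var(X1, ddof=1)
slope_x2_x1 = np.cov(X1, X2)[0, 1] / np.var(X1, ddof=1)

# the identity: b1 equals the simple slope minus b2 times the "X2 on X1" slope
print(np.isclose(b1, slope_y_x1 - b2 * slope_x2_x1))   # True
```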
We can solve for \(\hat\beta_2\) as a simple exercise; it is essentially the same as the case of \(\hat\beta_1\).