OLS REGRESSION WITHOUT LINEAR ALGEBRA:


The ideal line for the population is:

\(\large \mu_i\,=\,\beta_o \,+\,\beta_1\,X_i\)





Preliminary definitions:

\(\large Cov (X,Y) = \frac{1}{n-1}\displaystyle \sum_{i=1}^n (X_i -\bar X) (Y_i - \bar Y) =\frac{1}{n-1}\big (\displaystyle \sum_{i=1}^n X_i Y_i - n\bar X\bar Y\big)\)

\(\large Var (X) = \frac{1}{n-1}\displaystyle \sum_{i=1}^n (X_i -\bar X) (X_i - \bar X) =\frac{1}{n-1}\big (\displaystyle \sum_{i=1}^n X_i^2 - n\bar X^2\big)\), with \(SD(X)=\sqrt{Var(X)}\)

\(\large Cor (X,Y) = \frac{Cov(X,Y)}{SD(X) SD(Y)}\)
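A quick numerical sketch of these definitions (a minimal check assuming NumPy is available; the simulated data and variable names are illustrative only, not part of the derivation):

```python
import numpy as np

# Simulated data (illustrative assumption)
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)
n = len(x)

# Covariance: definition and "shortcut" form should agree
cov_def = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
cov_short = (np.sum(x * y) - n * x.mean() * y.mean()) / (n - 1)

# Variance via the shortcut form, SD(X) = sqrt(Var(X)), then the correlation
var_x = (np.sum(x ** 2) - n * x.mean() ** 2) / (n - 1)
cor_xy = cov_def / (np.sqrt(var_x) * y.std(ddof=1))

print(np.isclose(cov_def, cov_short))               # True: both covariance forms agree
print(np.isclose(cor_xy, np.corrcoef(x, y)[0, 1]))  # True: matches NumPy's correlation
```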


The estimated regression line is:

\(\large \hat \mu_i\,=\,\hat\beta_o\,+\,\hat\beta_1\,X_i\)

We want to minimize the sum of squared distances between the observed values \(Y_i\) in the population and \(\mu_i\), the fitted values we would obtain from the “ideal” regression line computed on the whole population. \(\mu_i\) carries the symbol of a mean because it is the mean of the bell curve (the conditional distribution of \(Y\)) at each \(X_i\); \(\hat \mu_i\) denotes the fitted values for the sample, based on our estimated regression line.

Just subtracting and adding \(\hat \mu_i\):

\(\large \displaystyle \sum_{i=1}^n (Y_i-\mu_i)^2 = \displaystyle \sum_{i=1}^{n}(Y_i-\hat \mu_i + \hat \mu_i - \mu_i)^2= \displaystyle \sum_{i=1}^{n}(Y_i-\hat \mu_i)^2 + 2 \sum_{i=1}^{n} (Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i) + \displaystyle \sum_{i=1}^n (\hat \mu_i - \mu_i)^2\)

Finding the minimum amounts to making the cross term vanish:

\(\Large\sum_{i=1}^{n} (Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i)\,=\,0\,\small \tag 1\)


… leaving us with:


\(\displaystyle \sum_{i=1}^{n}(Y_i-\mu_i)^2 = \displaystyle \sum_{i=1}^{n}(Y_i-\hat \mu_i)^2 + \displaystyle \sum_{i=1}^n (\hat \mu_i - \mu_i)^2\)


It follows that:


\(\displaystyle \sum_{i=1}^{n}(Y_i-\mu_i)^2\,\geq \sum_{i=1}^{n}(Y_i-\hat \mu_i)^2\)

Considering only horizontal lines, and from equation (1):

\(\large \displaystyle\sum_{i=1}^{n} (Y_i - \hat \beta_o)\,(\hat \beta_o - \beta_o)\,=\,(\hat \beta_o - \beta_o)\,\displaystyle\sum_{i=1}^{n}(Y_i - \hat \beta_o)=\,0\)


This will happen if:


\(\displaystyle\sum_{i=1}^{n}(Y_i - \hat \beta_o)=\,0\). In other words, if \(n\,\bar Y\,-\,n\,\hat \beta_o\,=\,0\), or



\(\Large \hat \beta_o\,=\,\bar Y \small \tag 2\).
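A one-line numerical check of (2) (NumPy assumed; data simulated purely for illustration): fitting only a constant by least squares returns the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, size=40)          # illustrative data

# Least-squares fit of a constant: regress y on a column of ones (no slope)
const_fit = np.linalg.lstsq(np.ones((len(y), 1)), y, rcond=None)[0][0]
print(np.isclose(const_fit, y.mean()))    # True: the best horizontal line is the mean
```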

If we do regression through the origin:


\(\displaystyle\sum_{i=1}^{n} (Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i)\,=\,\displaystyle\sum_{i=1}^{n} (Y_i - \hat \beta_1\, X_i)\,(\hat \beta_1 X_i - \beta_1\,X_i)\)

\(=\,(\hat \beta_1 - \beta_1)\,\displaystyle\sum_{i=1}^{n}(Y_i X_i\,-\,\hat \beta_1\, X_i^2)\). And this will be zero if:

\(\displaystyle\sum_{i=1}^{n}(Y_i X_i\,-\,\hat \beta_1\, X_i^2)=\displaystyle\sum_{i=1}^{n}Y_iX_i-\hat \beta_1\displaystyle\sum_{i=1}^{n}X_i^2=0\). Hence,

\(\Large \hat \beta_1=\frac{\displaystyle\sum_{i=1}^{n}Y_iX_i}{\displaystyle\sum_{i=1}^{n}X_i^2} \small \tag 3\).
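A minimal sketch of equation (3), checked against NumPy's least-squares solver with a single column and no intercept (the simulated data and names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(scale=0.3, size=100)   # illustrative data

beta1_hat = np.sum(y * x) / np.sum(x ** 2)       # equation (3)
beta1_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]  # no intercept column

print(np.isclose(beta1_hat, beta1_lstsq))        # True
```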


Now doing both intercept and slope:

\(0 =\displaystyle\sum_{i=1}^{n} (Y_i - \hat \mu_i)\,(\hat \mu_i - \mu_i)\, =\displaystyle\sum_{i=1}^{n}(Y_i -\hat\beta_o - \hat \beta_1 X_i)(\hat\beta_o+\hat\beta_1X_i-\beta_o-\beta_1X_i)\)

\(= \displaystyle\sum_{i=1}^{n} (Y_i -\hat\beta_o - \hat \beta_1 X_i)\,\big((\hat\beta_o-\beta_o)+(\hat\beta_1-\beta_1)X_i\big)\)

\(=\large (\hat \beta_o-\beta_o)\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)+(\hat\beta_1-\beta_1)\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)X_i \small \tag 4\)



Equation (4) will hold if each of the two sums is zero. For the first one:

\(0 = \displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)=n\bar Y-n\hat\beta_o-n\hat\beta_1\bar X\). Hence,

\(\color{blue}{\Large \hat\beta_o\,=\,\bar Y\,-\,\hat\beta_1\,\bar X \small \tag 5}\)



And for the second part of equation (4):

\(0=\displaystyle\sum_{i=1}^{n}(Y_i-\hat\beta_o-\hat\beta_1X_i)X_i\). Substituting \(\hat\beta_o=\bar Y- \hat \beta_1 \bar X\) for \(\hat\beta_o\):



\(0=\displaystyle\sum_{i=1}^{n}(Y_i-\bar Y + \hat \beta_1 \bar X - \hat \beta_1X_i)X_i =\displaystyle\sum_{i=1}^{n}(Y_i-\bar Y)X_i - \hat\beta_1 \sum_{i=1}^{n}(X_i-\bar X)X_i\)



Since \(\displaystyle\sum_{i=1}^{n}(Y_i-\bar Y)=0\), also \(\displaystyle\bar X\sum_{i=1}^{n}(Y_i-\bar Y)=0\); likewise \(\displaystyle\bar X\sum_{i=1}^{n}(X_i-\bar X)=0\). Subtracting these zero terms lets us replace \(X_i\) with \(X_i-\bar X\) in both the numerator and the denominator below.



\(\large\hat \beta_1\,=\,\frac{\sum_{i=1}^{n}(Y_i-\bar Y)X_i}{\sum_{i=1}^{n}(X_i-\bar X)X_i} =\frac{\sum_{i=1}^{n}(Y_i-\bar Y)(X_i-\bar X)}{\sum_{i=1}^{n}(X_i-\bar X)(X_i-\bar X)}\)



Rewriting using the preliminary definitions:

\(\color{red}{\large \frac{\sum_{i=1}^{n}(Y_i-\bar Y)(X_i-\bar X)}{\sum_{i=1}^{n}(X_i-\bar X)(X_i-\bar X)} =\frac{cov(X,Y)}{var(X)}=\frac{cor(X,Y)\,SD(X)SD(Y)}{SD(X)\,SD(X)}}\). Hence,



\(\color{blue}{\Large \hat\beta_1\,=\,cor(Y, X)\,\frac{SD(Y)}{SD(X)}\,=\,\frac{cov(Y,X)}{var(X)}\small \tag 6}\)
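A short sketch checking equations (5) and (6) against np.polyfit (the simulated data and variable names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 3.0 + 0.7 * x + rng.normal(size=200)         # illustrative data

beta1 = np.corrcoef(x, y)[0, 1] * y.std(ddof=1) / x.std(ddof=1)  # equation (6)
beta0 = y.mean() - beta1 * x.mean()                              # equation (5)

slope, intercept = np.polyfit(x, y, 1)           # degree-1 least-squares fit
print(np.allclose([beta1, beta0], [slope, intercept]))           # True
```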

MULTIVARIABLE REGRESSION:



\(Y_i\,=\,\beta_1X_{1i}\,+\,\beta_2X_{2i}+\dots+\,\beta_pX_{pi}\,+\,\varepsilon_i\) is the ideal model for the population. One of the terms corresponds to the intercept.

For the sample:



\(\hat\mu_i=\hat\beta_1X_{1i}\,+\,\hat\beta_2X_{2i}\,+\dots\,+\,\hat\beta_pX_{pi}\)

As in the univariable case, \(\displaystyle \sum_{i=1}^n(Y_i-\hat\mu_i)(\hat\mu_i-\mu_i)=0\).



Let’s assume two regressors: \(Y_i = \beta_1X_{1i}+\beta_2X_{2i}+\varepsilon_i\):



\(\displaystyle \sum_{i=1}^n(Y_i-\hat\mu_i)(\hat\mu_i-\mu_i)=\displaystyle \sum_{i=1}^n(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i})\{(\hat\beta_1-\beta_1)X_{1i}+(\hat\beta_2-\beta_2)X_{2i}\}\).



We need a system of equations:



\(\displaystyle\sum_{i=1}^n(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i})X_{1i}=0\) and \(\displaystyle\sum_{i=1}^n(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i})X_{2i}=0\).



Solving for \(\hat\beta_2\) in the second equation:



\(\Large \hat\beta_2\,=\,\frac{\displaystyle\sum_{i=1}^n(Y_i\,-\,\hat\beta_1\,X_{1i})\,X_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2} \small \tag 7\)



Substituting (7) into the first equation:

\(0=\displaystyle\sum_{i=1}^n\{Y_i - \hat\beta_1 X_{1i}-\Big(\frac{\displaystyle\sum_{i=1}^n(Y_i-\hat\beta_1X_{1i})X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}\Big)X_{2i}\}X_{1i}\)

\(0=\displaystyle\sum_{i=1}^n\{Y_i - \hat\beta_1 X_{1i}-\Bigg(\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}-\frac{\displaystyle\sum_{i=1}^n \hat\beta_1X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}\Bigg)X_{2i}\}X_{1i}\)



\(0=\displaystyle\sum_{i=1}^n\{Y_i - \hat\beta_1 X_{1i}-\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}+\frac{\displaystyle\sum_{i=1}^n \hat\beta_1X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}\}X_{1i}\)



\(0=\displaystyle\sum_{i=1}^n\{Y_i -\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}+\hat\beta_1\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}\,- \hat\beta_1 X_{1i}\}X_{1i}\)



\(0=\displaystyle\sum_{i=1}^n\{Y_i -\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}\,- \hat\beta_1 X_{1i}+\hat\beta_1\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}\}X_{1i}\)



\(0=\displaystyle\sum_{i=1}^n\{Y_i -\Bigg(\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}\Bigg)\,- \hat\beta_1 \Bigg(X_{1i}-\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}\Bigg)\}X_{1i} \small \tag 8\)



\(\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}\) is the slope of the regression through the origin of \(y_i=x_{2i}\beta_2+\epsilon\).

\(Y_i -\frac{\displaystyle\sum_{i=1}^nY_iX_{2i}}{\displaystyle\sum_{i=1}^nX_{2i}^2}X_{2i}\) is the residual only including \(X_{2i}\) as a regressor through the origin.

\(X_{1i}-\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}X_{2i}\) is the residual of the regression through the origin \(x_{1i}=x_{2i}\gamma+\epsilon\).
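A small sketch (NumPy assumed; data simulated purely for illustration) confirming that the last expression matches the residuals of a no-intercept least-squares fit of \(X_1\) on \(X_2\):

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=80)
x2 = 0.4 * x1 + rng.normal(size=80)              # illustrative, correlated regressors

gamma_hat = np.sum(x1 * x2) / np.sum(x2 ** 2)    # slope through the origin, as in (3)
e_x1_x2 = x1 - gamma_hat * x2                    # the residual written out above

gamma_lstsq = np.linalg.lstsq(x2[:, None], x1, rcond=None)[0][0]
print(np.allclose(e_x1_x2, x1 - gamma_lstsq * x2))  # True: same residuals
```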



So, writing \(e\) for the residuals of a regression through the origin, \(e\,=\,Y\,-\,\hat\beta\,X\) with \(\hat\beta=\frac{\displaystyle\sum_{i=1}^n Y_i X_i}{\displaystyle\sum_{i=1}^n X_i^2}\) (see equation (3)). For instance: \(e_{i,X_1|X_2}=X_{1i}-\frac{\displaystyle\sum_{i=1}^n X_{1i}X_{2i}}{\displaystyle\sum_{i=1}^n X_{2i}^2}\,X_{2i}\).



\(\large 0=\large \displaystyle\sum_{i=1}^n\{e_{i,Y|X_2}\,-\,\hat\beta_1\,e_{i,X_1|X_2}\}\,X_{1i} \small \tag 9\)



Distributing \(X_{1i}\) over the terms of (9):

\(\large \displaystyle\sum_{i=1}^n e_{i,Y|X_2}\,X_{1i}\,=\,\hat\beta_1\,\displaystyle\sum_{i=1}^n e_{i,X_1|X_2}\,X_{1i}\)



\(\large \hat\beta_1\,=\,\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}X_{1i}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}}\,=\,\frac{\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}} \small \tag {10}\)



What happened to the numerator? It turns out that \(\displaystyle\sum_{i=1}^{n} e_{i,Y|X_2}\,X_{1i}\) and \(\displaystyle\sum_{i=1}^{n} e_{i,Y|X_2}\,e_{i,X_1|X_2}\) are the same (proof below, starting from equation (9)).

The estimator \(\hat\beta_1\) from the model \(y_i =\beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \epsilon_i\) estimates the association between \(X_1\) and \(Y\) after “partialling out” \(X_2\). \(e_{i, Y|X_2}\) is \(Y\) after “partialling out” the variation that is explained by \(X_2\); \(e_{i, X_1|X_2}\) is \(X_1\) after “partialling out” the variation that is explained by \(X_2\).

It can be shown that \(\displaystyle\sum_{i=1}^n\,e_{i,X_1|X_2}\,X_{1i}\) is equivalent to \(\displaystyle\sum_{i=1}^n \,e_{i,X_1|X_2}^2\).

\(\hat\beta_1\) is simply the sample analogue of \(\large \frac{cov(e_{X_1|X_2},\,e_{Y|X_2})}{var(e_{X_1|X_2})}\), or \(\large cor(e_{X_1|X_2}, e_{Y|X_2})\,\frac{SD(e_{Y|X_2})}{SD(e_{X_1|X_2})}\), where \(e_{X_1|X_2}\) and \(e_{Y|X_2}\) are \(X_1\) and \(Y\) with \(X_2\) partialled out.
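Before the algebraic proofs below, a quick numerical sketch of both claims (the numerator equality and \(\sum e_{i,X_1|X_2}X_{1i}=\sum e_{i,X_1|X_2}^2\)); NumPy is assumed, and the data and helper name are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 * x1 - 2.0 * x2 + rng.normal(size=n)     # illustrative data

def resid_through_origin(v, x):
    """Residual of a regression of v on x through the origin, as in equation (3)."""
    return v - (np.sum(v * x) / np.sum(x ** 2)) * x

e_y = resid_through_origin(y, x2)
e_x1 = resid_through_origin(x1, x2)

print(np.isclose(np.sum(x1 * e_y), np.sum(e_x1 * e_y)))   # numerators agree
print(np.isclose(np.sum(e_x1 * x1), np.sum(e_x1 ** 2)))   # denominators agree
```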

Starting off with equation (9):

\(\large\displaystyle\sum_{i=1}^n (e_{i,Y|X_2}-\hat\beta_1\,e_{i,X_1|X_2})X_{1i} = 0\)



\(\large\displaystyle\sum_{i=1}^n X_{1i} e_{i,Y|X_2}-\hat\beta_1\,\displaystyle\sum_{i=1}^n X_{1i} e_{i,X_1|X_2} = 0\)



\(\large\displaystyle\sum_{i=1}^n X_{1i} e_{i,Y|X_2}=\hat\beta_1\,\displaystyle\sum_{i=1}^n X_{1i} e_{i,X_1|X_2}\)

\(\large\hat\beta_1 =\frac{\displaystyle\sum_{i=1}^n X_{1i} e_{i,Y|X_2}}{\displaystyle\sum_{i=1}^n X_{1i} e_{i,X_1|X_2}}\) or \(\large\hat\beta_1 =\frac{\displaystyle\sum_{i=1}^n e_{i,X_1|X_2} e_{i,Y|X_2}}{\displaystyle\sum_{i=1}^n X_{1i} e_{i,X_1|X_2}}\).



\(\large\displaystyle\sum_{i=1}^n X_{1i} e_{i,Y|X_2}=\displaystyle\sum_{i=1}^n X_{1i} \Bigg( y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j}\,y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2}\,X_{2i} \Bigg)\) (replacing \(e_{i,Y|X_2}\) by its expansion - see under equation 8 - let’s call this last equation “\(X_1\) times residuals \(X_2\) through origin”).



This can be shown to be equivalent to:



\(\large \displaystyle\sum_{i=1}^n e_{i,X_1|X_2} \, e_{i,Y|X_2}\) \(=\displaystyle\sum_{i=1}^n \Bigg(X_{1i} - \frac{\displaystyle\sum_{j=1}^n X_{2j} X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i}\Bigg)\,\Bigg(y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j}y_{j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i}\Bigg)\)



\(\large\displaystyle\sum_{i=1}^n X_{1i} e_{i,Y|X_2} = \displaystyle\sum_{i=1}^n X_{1i}\Bigg(y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i} \Bigg) = \displaystyle\sum_{i=1}^n X_{1i}y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2}\, \displaystyle\sum_{i=1}^n X_{1i} X_{2i}\)



\(\large\displaystyle\sum_{i=1}^n e_{i,X_1|X_2}\, e_{i,Y|X_2} = \large\displaystyle\sum_{i=1}^n \Bigg(X_{1i} - \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i} \Bigg)\,\Bigg(y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i} \Bigg)\)



\(\large\displaystyle\sum_{i=1}^n \Bigg[X_{1i}y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{1i} X_{2i} - \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} X_{2i}y_i + \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j} \displaystyle\sum_{j=1}^n X_{2j}y_j}{(\displaystyle\sum_{j=1}^n X_{2j}^2)^2} X_{2i}^2 \Bigg]\)



\(\large\displaystyle\sum_{i=1}^n X_{1i}y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} \large\displaystyle\sum_{i=1}^n X_{1i} X_{2i} - \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} \large\displaystyle\sum_{i=1}^n X_{2i}y_i + \frac{\displaystyle\sum_{j=1}^n X_{2j}X_{1j} \displaystyle\sum_{j=1}^n X_{2j}y_j}{(\displaystyle\sum_{j=1}^n X_{2j}^2)^2} \large\displaystyle\sum_{i=1}^nX_{2i}^2\)

Now, summations indexed by \(i\) and summations indexed by \(j\) are equivalent, because they sum over the same \(n\) observations; only the index label differs. So the above is equal to:

\(=\large\displaystyle\sum_{i=1}^n X_{1i}y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} \large\displaystyle\sum_{i=1}^n X_{1i} X_{2i} - \frac{\large\displaystyle\sum_{i=1}^n X_{2i}y_i \displaystyle\sum_{j=1}^n X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2} + \frac{\displaystyle\sum_{j=1}^n X_{2j}y_j \displaystyle\sum_{j=1}^n X_{2j}X_{1j}}{\displaystyle\sum_{j=1}^n X_{2j}^2}\)

\(=\large\displaystyle\sum_{i=1}^n X_{1i}y_i - \frac{\displaystyle\sum_{j=1}^n X_{2j} y_j}{\displaystyle\sum_{j=1}^n X_{2j}^2} \large\displaystyle\sum_{i=1}^n X_{1i} X_{2i} =\large\displaystyle\sum_{i=1}^n X_{1i}\Bigg(y_i - \frac{\large\displaystyle\sum_{j=1}^n X_{2j}y_j}{\large\displaystyle\sum_{j=1}^n X_{2j}^2}\,X_{2i}\Bigg)\) (equation “\(X_1\) times residuals \(X_2\) through the origin”).

\(\Box\)

Now for the denominator of equation (10):

\(\large\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2 =\large\displaystyle\sum_{i=1}^n e_{X_1|X_2} \Bigg(X_{1i} - \frac{\large\displaystyle\sum_{j=1}^n X_{1j}X_{2j}}{\large\displaystyle\sum_{j=1}^n X_{2j}^2}X_{2i} \Bigg)\)



\(=\large\displaystyle\sum_{i=1}^n e_{X_1|X_2} X_{1i} - \frac{\large\displaystyle\sum_{j=1}^n X_{1j}X_{2j}}{\large\displaystyle\sum_{j=1}^n X_{2j}^2}\large\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{2i}\)

\(\large\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{2i}=0\) (sum of the residuals times the regressor is zero). So,



\(\large\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2 = \large\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}\). And plugging into the \(\hat\beta_1\) equation:



\(\large\hat\beta_1=\frac{\large\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\large\displaystyle\sum_{i=1}^n e_{X_1|X_2}X_{1i}}\)



\(\large \hat\beta_1=\frac{\large\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\large\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2}\small \tag {11}\)

Regression estimate for \(\beta_1\) is the regression through the origin estimate having regressed \(X_2\) out of both the response and the predictor.

Regression estimate for \(\beta_2\) is the regression through the origin estimate having regressed \(X_1\) out of both the response and the predictor.

Multivariate regression estimates are exactly those having removed the linear relationship of the other variables from both the regressor and the response.
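A minimal sketch of this recipe for two regressors (no intercept): residualize both \(Y\) and \(X_1\) on \(X_2\) through the origin, regress residual on residual through the origin, and compare with the joint least-squares fit. NumPy is assumed; the data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)               # correlated regressors (illustrative)
y = 1.0 * x1 - 2.0 * x2 + rng.normal(size=n)

def slope_through_origin(v, x):
    return np.sum(v * x) / np.sum(x ** 2)        # equation (3)

e_y = y - slope_through_origin(y, x2) * x2       # Y with X2 regressed out
e_x1 = x1 - slope_through_origin(x1, x2) * x2    # X1 with X2 regressed out

beta1_resid = slope_through_origin(e_y, e_x1)    # equation (11)
beta_joint = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)[0]

print(np.isclose(beta1_resid, beta_joint[0]))    # True: same estimate of beta_1
```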

INTUITION:

The slope contributed to \(y\) by one of the regressors is built from residuals: we multiply and add up the residuals of that regressor on the other variables together with the residuals of \(y\) on the other variables. In essence, it measures 1. the part of the regressor not explained by the other regressors (\(e_{X_1|X_2}\)); and 2. the part of \(y\) not explained by the other regressors (\(e_{Y|X_2}\)).

More than two regressors:



Solving will entail \(p\) equations in \(p\) unknowns, one of the form below for each \(k=1,\dots,p\):

\(\large\displaystyle\sum_{i=1}^n(Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i}-\dots-\hat\beta_pX_{pi})X_{ki}=0\)

Holding \(\hat\beta_1\) through \(\hat\beta_{p-1}\) fixed…

\(\hat\beta_p=\frac{\large\displaystyle\sum_{i=1}^n (Y_i-\hat\beta_1X_{1i}-\hat\beta_2X_{2i}-\dots-\hat\beta_{p-1}X_{i(p-1)})X_{ip}}{\large\displaystyle\sum_{i=1}^n X_{ip}^2}\) as in equation (7).

Plugging it into the equation above:

\(\large\displaystyle\sum_{i=1}^n(e_{i,Y|X_p}-e_{i,X_1|X_p}\hat\beta_1-\dots-e_{i,X_{p-1}|X_p}\hat\beta_{p-1})X_{ki}=0\) as in equation (8).

\(X_{ki} = e_{i,X_k|X_p}+\frac{\displaystyle\sum_{j=1}^n X_{kj}X_{pj}}{\displaystyle\sum_{j=1}^n X_{pj}^2}X_{pi}\) and \(\displaystyle\sum_{i=1}^n e_{i,X_k|X_p}X_{pi}=0\).



Therefore:


\(\large\displaystyle\sum_{i=1}^n(e_{i,Y|X_p}-e_{i,X_1|X_p}\hat\beta_1-\dots-e_{i,X_{p-1}|X_p}\hat\beta_{p-1})X_{ki}=0\) is equal to:

\(\large\displaystyle\sum_{i=1}^n(e_{i,Y|X_p}-e_{i,X_1|X_p}\hat\beta_1-\dots-e_{i,X_{p-1}|X_p}\hat\beta_{p-1})e_{i,X_k|X_p}=0\)
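A sketch of this reduction for \(p=3\) (no intercept): residualize \(Y\) and the first \(p-1\) regressors on \(X_p\) through the origin; the coefficients of the reduced problem match the full fit. NumPy is assumed; the data are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 500, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(size=n)   # illustrative data

xp = X[:, -1]
proj = lambda v: (v @ xp) / (xp @ xp) * xp       # fit of v on X_p through the origin

e_y = y - proj(y)                                                      # e_{i,Y|X_p}
E = np.column_stack([X[:, k] - proj(X[:, k]) for k in range(p - 1)])   # e_{i,X_k|X_p}

beta_reduced = np.linalg.lstsq(E, e_y, rcond=None)[0]   # (p-1)-regressor problem
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.allclose(beta_reduced, beta_full[:p - 1]))     # True
```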



Different Approach with Multiple Regression Model and Intercept:



\(y_i\,=\,\beta_o\,+\,\beta_1X_{1i}\,+\,\beta_2X_{2i}+\dots+\,\beta_pX_{pi}\,+\,\varepsilon_i\); below we work with two regressors plus the intercept.

\(\min_{\hat\beta_o,\hat\beta_1,\hat\beta_2}\large\displaystyle\sum_{i=1}^n e_i^2\) where \(e_i= y_i - \hat\beta_o-\hat\beta_1X_{1i}-\hat\beta_2X_{2i}\)



Then first order conditions are:



\(\frac{\partial}{\partial\hat\beta_o}\large\displaystyle\sum_{i=1}^n e_i^2=-2\large\displaystyle\sum_{i=1}^n(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})\)

\(\frac{\partial}{\partial\hat\beta_1}\large\displaystyle\sum_{i=1}^n e_i^2=-2\large\displaystyle\sum_{i=1}^nX_{1i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})\)

\(\frac{\partial}{\partial\hat\beta_2}\large\displaystyle\sum_{i=1}^n e_i^2=-2\large\displaystyle\sum_{i=1}^n X_{2i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})\)



Set to zero and solve:



\(-2\large\displaystyle\sum_{i=1}^n(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)


\(\large\displaystyle\sum_{i=1}^n(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)


\(\large\displaystyle\sum_{i=1}^n y_i-n\hat\beta_o-\hat\beta_1 \large\displaystyle\sum_{i=1}^n x_{1i}-\hat\beta_2 \large\displaystyle\sum_{i=1}^nx_{2i}=0\)


\(\large\displaystyle\sum_{i=1}^n y_i-\hat\beta_1 \large\displaystyle\sum_{i=1}^n x_{1i}-\hat\beta_2 \large\displaystyle\sum_{i=1}^nx_{2i}=n\hat\beta_o\)


\(\large\bar Y -\hat\beta_1 \bar X_1 -\hat\beta_2 \bar X_2 =\hat\beta_o\)


\(-2\large\displaystyle\sum_{i=1}^nX_{1i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)


\(\large\displaystyle\sum_{i=1}^nX_{1i}(y_i-\hat\beta_o-\hat\beta_1x_{1i}-\hat\beta_2x_{2i})=0\)


\(\large\displaystyle\sum_{i=1}^nX_{1i}(y_i-\bar Y +\hat\beta_1 \bar X_1 +\hat\beta_2 \bar X_2 -\hat\beta_1 X_{1i} - \hat\beta_2 x_{2i})=0\)


\(\large\displaystyle\sum_{i=1}^nX_{1i}[(y_i-\bar Y) -\hat\beta_1 (x_{1i}-\bar X_1) -\hat\beta_2 (x_{2i}-\bar X_2)]=0\)


\(\large\displaystyle\sum_{i=1}^n X_{1i}(y_i-\bar Y) -\hat\beta_1 \large\displaystyle\sum_{i=1}^n x_{1i}(x_{1i}-\bar X_1) -\hat\beta_2 \large\displaystyle\sum_{i=1}^n x_{1i}(x_{2i}-\bar X_2)=0\)


\(\large\displaystyle\sum_{i=1}^n X_{1i}(y_i-\bar Y) -\hat\beta_2 \large\displaystyle\sum_{i=1}^n x_{1i}(x_{2i}-\bar X_2)=\hat\beta_1 \large\displaystyle\sum_{i=1}^n x_{1i}(x_{1i}-\bar X_1)\)


\(\large\hat\beta_1=\large\frac{\displaystyle\sum_{i=1}^nx_{1i}(y_i-\bar Y)}{\displaystyle\sum_{i=1}^nx_{1i}(x_{1i}-\bar X_1)}\,-\, \hat\beta_2\, \frac{\displaystyle\sum_{i=1}^n x_{1i}(x_{2i}-\bar X_2)}{\displaystyle\sum_{i=1}^n x_{1i}(x_{1i}-\bar X_1)}\) compare to the formula with prior method through the origin (equation (11)):

\(\large \hat\beta_1=\frac{\large\displaystyle\sum_{i=1}^n e_{Y|X_2}e_{X_1|X_2}}{\large\displaystyle\sum_{i=1}^n e_{X_1|X_2}^2}\)

\(\large \hat\beta_2=\frac{\large\displaystyle\sum_{i=1}^n (Y_i-\hat\beta_1X_{1i})X_{2i}}{\large\displaystyle\sum_{i=1}^n X_{2i}^2}\)



The first term of the equation above is the simple ordinary least squares (OLS) estimator for \(\beta_1\). The second term is the product of the estimate \(\hat\beta_2\) and another factor. That factor should also be familiar: it is the simple OLS estimator from the regression of \(X_2\) on \(X_1\); it captures the part of the variation in \(X_2\) that is associated with variation in \(X_1\). Together, the product that makes up the second term captures the association between \(X_1\) and \(Y\) that is “explained” by \(X_2\); its subtraction from the simple OLS estimator is the “partialling out” of the influence of \(X_2\). Hence we arrive at the familiar interpretation of \(\hat\beta_1\): “the association between \(X_1\) and \(Y\), adjusting for \(X_2\) (holding \(X_2\) fixed).”

We can solve for \(\hat\beta_2\) as a simple exercise; it is essentially the same as the case of \(\hat\beta_1\).
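Finally, a sketch tying the two approaches together for the model with an intercept: the first-order conditions are (numerically) zero at the least-squares solution, and partialling the intercept and \(X_2\) out of both \(Y\) and \(X_1\) reproduces \(\hat\beta_1\). NumPy is assumed; the data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)       # illustrative data

X = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - b0 - b1 * x1 - b2 * x2

# First-order conditions: each sum should vanish at the solution
print(np.allclose([e.sum(), (x1 * e).sum(), (x2 * e).sum()], 0.0, atol=1e-8))

# Partial the intercept and x2 out of y and x1, then regress residual on residual
Z = np.column_stack([np.ones(n), x2])
resid = lambda v: v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
beta1_fwl = np.sum(resid(y) * resid(x1)) / np.sum(resid(x1) ** 2)
print(np.isclose(beta1_fwl, b1))                          # True
```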

