NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Ordinary Least Squares (OLS):


[First written as a post in CrossValidated.]

I want to walk through the process of proving to oneself that the results R produces in a simple example can be reached by following the linear algebra of projections onto the column space, together with the concept of perpendicular (dot-product) errors, illustrated in different posts and nicely explained by Dr. Strang in Linear Algebra and Its Applications.

In order to estimate the coefficients \(\small \beta\) in the regression,

\[\small mpg = \text{intercept}\,(cyl{=}4) + \beta_1 \cdot weight + D_1 \cdot \text{intercept}\,(cyl{=}6) + D_2 \cdot \text{intercept}\,(cyl{=}8)\,\,\,\,[*]\]

with \(\small D_1\) and \(\small D_2\) representing dummy variables taking values in \(\small \{0,1\}\), we first need to include in the design matrix (\(\small X\)) the dummy coding for the number of cylinders, as follows:

    attach(mtcars)
    x1 <- wt
    # recode cyl into one dummy column per cylinder count:
    # first mark the target level with 1, then zero out everything else
    x2 <- cyl; x2[x2 == 4] <- 1; x2[!x2 == 1] <- 0
    x3 <- cyl; x3[x3 == 6] <- 1; x3[!x3 == 1] <- 0
    x4 <- cyl; x4[x4 == 8] <- 1; x4[!x4 == 1] <- 0
    X <- cbind(x1, x2, x3, x4)
    colnames(X) <- c('wt', '4cyl', '6cyl', '8cyl')
    head(X)
    ##         wt 4cyl 6cyl 8cyl
    ## [1,] 2.620    0    1    0
    ## [2,] 2.875    0    1    0
    ## [3,] 2.320    1    0    0
    ## [4,] 3.215    0    1    0
    ## [5,] 3.440    0    0    1
    ## [6,] 3.460    0    1    0

If the design matrix were to strictly parallel equation \(\small [*]\) (above), where the first intercept corresponds to cars of four cylinders (as in an `lm` call without a `-1`), it would require a first column of just ones; instead, we'll derive the same results without this intercept column, using one dummy column per cylinder count.
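
To see the correspondence, here is a minimal sketch (the matrix and column names `X_int`, `D1_6cyl`, and `D2_8cyl` are illustrative, not from the original post) of a design matrix that does strictly parallel \(\small [*]\): a column of ones plus dummies for six and eight cylinders only. The four-cylinder intercept is absorbed into the ones column, and the other two coefficients become offsets from it:

    # Treatment-coded design matrix paralleling [*] (illustrative names):
    X_int <- cbind(1, wt, as.numeric(cyl == 6), as.numeric(cyl == 8))
    colnames(X_int) <- c('(Intercept)', 'wt', 'D1_6cyl', 'D2_8cyl')
    # Same normal-equations solution as derived below; compare with
    # coef(lm(mpg ~ wt + as.factor(cyl))):
    solve(t(X_int) %*% X_int) %*% t(X_int) %*% mpg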

Continuing then, to calculate the coefficients (\(\small\beta\)) we project the vector of dependent-variable values (here, `mpg`) onto the column space of the vectors constituting the design matrix. The linear algebra comes from the normal equations, \(\small X^{T}X\,\beta = X^{T}y\): multiplying the vector of dependent-variable values by \(\small (X^{T}X)^{-1}X^{T}\) (the matrix `Proj_M` in the code below) yields the regression coefficients, \(\large (X^{T}X)^{-1}X^{T}\,y = \beta\):

    X_tr_X_inv <- solve(t(X) %*% X)   # (X'X)^{-1}
    Proj_M <- X_tr_X_inv %*% t(X)     # (X'X)^{-1} X'
    Proj_M %*% mpg                    # beta = (X'X)^{-1} X' y
    ##           [,1]
    ## wt   -3.205613
    ## 4cyl 33.990794
    ## 6cyl 29.735212
    ## 8cyl 27.919934

Identical to `coef(lm(mpg ~ wt + as.factor(cyl) - 1))`.
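
As an aside, explicitly inverting \(\small X^{T}X\) is fine at this scale, but it is not how least squares is solved in practice; a QR decomposition is the numerically stabler route, and base R exposes it directly. A quick cross-check (a sketch using only base R):

    qr.solve(X, mpg)             # QR-based least-squares solution
    lm.fit(X, mpg)$coefficients  # the fitting routine underneath lm()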

Finally, to calculate the predicted values, we will need the hat matrix, which is defined as \(\small H = X(X^{T}X)^{-1}X^{T}\). This is readily calculated as:

    HAT <- X %*% X_tr_X_inv %*% t(X)
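
Since the hat matrix is the projection matrix onto the column space of \(\small X\), it should be symmetric and idempotent (projecting twice is the same as projecting once). A quick numerical check of both properties:

    all.equal(HAT, t(HAT))        # symmetric: H = H'
    all.equal(HAT %*% HAT, HAT)   # idempotent: HH = H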

And the estimated values (\(\hat{y}\)) as \(\small \hat{y} = X(X^{T}X)^{-1}X^{T}\,y = H\,y\), in this case:

    y_hat <- HAT %*% mpg

which gives identical values to:

    cyl <- as.factor(cyl)
    OLS <- lm(mpg ~ wt + cyl)
    predict(OLS)
    ##        1        2        3        4        5        6        7        8 
    ## 21.33650 20.51907 26.55377 19.42916 16.89262 18.64379 16.47590 23.76489 
    ##        9       10       11       12       13       14       15       16 
    ## 23.89311 18.70790 18.70790 14.87309 15.96300 15.80272 11.09046 10.53269 
    ##       17       18       19       20       21       22       23       24 
    ## 10.78593 26.93844 28.81373 28.10849 26.08896 16.63618 16.90865 15.61038 
    ##       25       26       27       28       29       30       31       32 
    ## 15.59435 27.78793 27.13078 29.14070 17.75814 20.85566 16.47590 25.07919
    y_hat <- as.numeric(y_hat)
    predicted <- as.numeric(predict(OLS))
    all.equal(y_hat, predicted)
    ## [1] TRUE
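
To close the loop with the perpendicular-errors idea from the introduction, the residual vector \(\small e = y - \hat{y}\) is orthogonal to the column space, so its dot product with every column of \(\small X\) is (numerically) zero:

    e <- mpg - y_hat    # residuals
    t(X) %*% e          # dot product with each column: all ~ 0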


NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.