NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Dummy Coding in Regression:


Dummy coding lets you include a categorical variable (a factor) with several levels in a regression model, for instance Temperature with the levels hot, moderate and cold. A factor with three levels is coded as two 0/1 indicator columns, and the remaining level acts as the reference. This is very straightforward in matrix notation. Let’s say you are regressing the variable Food production on the continuous variable Ground Nitrogen (GN) and the categorical variable Temperature, dummy-coded into the model matrix. The equation of the regression model is going to be:

\(\Tiny \begin {bmatrix} \hat F_1\\ \hat F_2\\ \hat F_3\\\vdots\\ \\\vdots\\ \\\vdots\\ \\\vdots\\ \\\vdots\\ \hat F_{59} \end {bmatrix} = \Tiny \begin {bmatrix} 1&GN_1&0&0\\ 1&GN_2 & 0 & \color{orange}{\text{MOD}}\\ 1&GN_3 & \color{red}{\text{HOT}} & 0\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{35}& 0 & \color{orange}{\text{MOD}}\\ 1&GN_{36}& 0 & \color{orange}{\text{MOD}}\\ 1&GN_{37}& \color{red}{\text{HOT}} & 0\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{42}& 0 &\color{orange}{\text{MOD}}\\ 1&GN_{43}&\color{red}{\text{HOT}} & 0\\ 1&GN_{44}&\color{red}{\text{HOT}} & 0\\ 1&GN_{45}& 0 &\color{orange}{\text{MOD}}\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{59}&\color{red}{\text{HOT}} & 0\\ \end {bmatrix} \small \begin {bmatrix} \hat\beta_0\\\hat\beta_1\\\hat\beta_{2\color{red}{H}}\\\hat\beta_{2\color{orange}{M}} \end {bmatrix} \Tiny = \begin {bmatrix} 1&GN_1&0&0\\ 1&GN_2 & 0 & \color{orange}{1}\\ 1&GN_3 & \color{red}{1} & 0\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{35}& 0 & \color{orange}{1}\\ 1&GN_{36}& 0 & \color{orange}{1}\\ 1&GN_{37}& \color{red}{1} & 0\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{42}& 0 &\color{orange}{1}\\ 1&GN_{43}&\color{red}{1} & 0\\ 1&GN_{44}&\color{red}{1} & 0\\ 1&GN_{45}& 0 &\color{orange}{1}\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{59}&\color{red}{1} & 0\\ \end {bmatrix} \small \begin {bmatrix} \hat\beta_0\\\hat\beta_1\\\hat\beta_{2\color{red}{H}}\\\hat\beta_{2\color{orange}{M}} \end {bmatrix}\)
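As a rough illustration of how the rows of such a model matrix come about, here is a minimal Python sketch. The toy observations and the names observations, model_row and model_matrix are made up for this example; only the column layout (intercept, GN, HOT indicator, MODERATE indicator) follows the equation above.

```python
# Hypothetical toy data: (ground nitrogen, temperature level).
observations = [
    (2.0, "COLD"),
    (3.5, "MODERATE"),
    (1.0, "HOT"),
    (4.0, "MODERATE"),
]


def model_row(ground_nitrogen, temperature):
    """Return one row [1, GN, HOT indicator, MODERATE indicator]."""
    is_hot = 1.0 if temperature == "HOT" else 0.0
    is_moderate = 1.0 if temperature == "MODERATE" else 0.0
    # COLD is the reference level: both indicators stay at 0.
    return [1.0, ground_nitrogen, is_hot, is_moderate]


model_matrix = [model_row(gn, temp) for gn, temp in observations]

for row in model_matrix:
    print(row)
```

Each observation gets a 1 in at most one of the two indicator columns; a COLD observation has zeros in both.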

The model matrix has the dummy-coded values arranged in such a way that the level COLD is the reference: its effect is completely “absorbed” into the intercept \(\hat\beta_0\). The intercepts for the other two levels are calculated as \(\hat\beta_0+\hat\beta_{2\color{red}{H}}\) for HOT, and \(\hat\beta_0+\hat\beta_{2\color{orange}{M}}\) for MODERATE. The slope \(\hat\beta_1\) for Ground Nitrogen is shared by all three levels.
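Multiplying out a single row of the model matrix for each temperature level makes this explicit (the row index \(i\) is just notation for an arbitrary observation):

\(\begin{aligned} \text{COLD:}\quad & \hat F_i = \hat\beta_0 + \hat\beta_1\, GN_i \\ \text{HOT:}\quad & \hat F_i = (\hat\beta_0 + \hat\beta_{2\color{red}{H}}) + \hat\beta_1\, GN_i \\ \text{MODERATE:}\quad & \hat F_i = (\hat\beta_0 + \hat\beta_{2\color{orange}{M}}) + \hat\beta_1\, GN_i \end{aligned}\)

So the model describes three parallel lines that differ only in their intercepts.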

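If you are using R, the intercept for COLD is simply the default intercept, since R's default treatment contrasts take the first level of the factor (alphabetically, unless you set the levels yourself) as the reference. To get the other intercepts you have to add the coefficients for HOT and MODERATE, respectively, to the intercept. If your model is called fit and its coefficients are ordered as the columns of the model matrix above (intercept, Ground Nitrogen, HOT, MODERATE), the intercept for COLD will be fit$coef[1]; for HOT, fit$coef[1] + fit$coef[3]; and for MODERATE, fit$coef[1] + fit$coef[4]. The second coefficient, fit$coef[2], is the slope for Ground Nitrogen.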
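The same bookkeeping can be checked outside R. The following is a minimal Python sketch using only the standard library, not a replacement for lm: it solves the normal equations on hypothetical, noise-free data generated from known coefficients (intercept 10, slope 2, HOT effect 5, MODERATE effect 3), so the recovered intercepts can be compared against the truth. All names and numbers here are assumptions for illustration. Python indices are zero-based, so beta[2] and beta[3] are the HOT and MODERATE columns.

```python
# Hypothetical noise-free data generated from
# F = 10 + 2*GN + 5*[HOT] + 3*[MODERATE].
data = [
    (1.0, "COLD"), (2.0, "COLD"), (3.0, "HOT"), (4.0, "HOT"),
    (5.0, "MODERATE"), (6.0, "MODERATE"), (7.0, "COLD"), (8.0, "HOT"),
]
effect = {"COLD": 0.0, "HOT": 5.0, "MODERATE": 3.0}

# Columns: intercept, GN, HOT indicator, MODERATE indicator.
X = [[1.0, gn, float(t == "HOT"), float(t == "MODERATE")] for gn, t in data]
y = [10.0 + 2.0 * gn + effect[t] for gn, t in data]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Normal equations: (X'X) beta = X'y
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
beta = solve(xtx, xty)

intercepts = {
    "COLD": beta[0],                # reference level: the plain intercept
    "HOT": beta[0] + beta[2],       # intercept plus HOT coefficient
    "MODERATE": beta[0] + beta[3],  # intercept plus MODERATE coefficient
}
print(intercepts)
```

Up to floating-point rounding, the three intercepts should come out as 10, 15 and 13, matching the values used to generate the data.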



NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.