Dummy coding allows introducing different
levels within the variable that you want to
code - for instance temperature with factors hot,
moderate and cold: one single dummy-coded variable
with three levels. This is very straightforward in matrix notation.
Let’s say you are regressing the variable Food production
on a continuous variable Ground Nitrogen
and the
categorical variable Temperature
dummy-coded into the
model matrix. The equation of the regression model is
going to be:
\(\Tiny \begin {bmatrix} \hat F_1\\ \hat F_2\\ \hat F_3\\\vdots\\ \\\vdots\\ \\\vdots\\ \\\vdots\\ \\\vdots\\ \hat F_{59} \end {bmatrix} = \Tiny \begin {bmatrix} 1&GN_1&0&0\\ 1&GN_2 & 0 & \color{orange}{\text{MOD}}\\ 1&GN_3 & \color{red}{\text{HOT}} & 0\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{35}& 0 & \color{orange}{\text{MOD}}\\ 1&GN_{36}& 0 & \color{orange}{\text{MOD}}\\ 1&GN_{37}& \color{red}{\text{HOT}} & 0\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{42}& 0 &\color{orange}{\text{MOD}}\\ 1&GN_{43}&\color{red}{\text{HOT}} & 0\\ 1&GN_{44}&\color{red}{\text{HOT}} & 0\\ 1&GN_{45}& 0 &\color{orange}{\text{MOD}}\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{59}&\color{red}{\text{HOT}} & 0\\ \end {bmatrix} \small \begin {bmatrix} \hat\beta_0\\\hat\beta_1\\\hat\beta_{2\color{red}{H}}\\\hat\beta_{2\color{orange}{M}} \end {bmatrix} \Tiny = \begin {bmatrix} 1&GN_1&0&0\\ 1&GN_2 & 0 & \color{orange}{1}\\ 1&GN_3 & \color{red}{1} & 0\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{35}& 0 & \color{orange}{1}\\ 1&GN_{36}& 0 & \color{orange}{1}\\ 1&GN_{37}& \color{red}{1} & 0\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{42}& 0 &\color{orange}{1}\\ 1&GN_{43}&\color{red}{1} & 0\\ 1&GN_{44}&\color{red}{1} & 0\\ 1&GN_{45}& 0 &\color{orange}{1}\\ \vdots&\vdots&\vdots&\vdots\\ 1&GN_{59}&\color{red}{1} & 0\\ \end {bmatrix} \small \begin {bmatrix} \hat\beta_0\\\hat\beta_1\\\hat\beta_{2\color{red}{H}}\\\hat\beta_{2\color{orange}{M}} \end {bmatrix}\)
The model matrix has the dummy-coded values arranged in such a way
that the coefficient for the factor COLD
will be completely
“absorbed” into the intercept, while the coefficients for the other two
factors will be calculated as \(\hat\beta_0+\hat\beta_{2\color{red}{H}}\)
for the factor HOT
, and \(\hat\beta_0+\hat\beta_{2\color{orange}{M}}\)
for MODERATE
.
If you are using R, the intercept for COLD
will be
simply the default intercept. To get the other intercepts you will have
to do add to the intercept the coefficients for HOT
and
MODERATE
respectively. If your model is called
fit
the coefficients (intercepts) for COLD
will be fit$coef[1]
; for HOT
,
fit$coef[1] + fit$coef[2]
; and for MODERATE
,
fit$coef[1] + fit$coef[3]
.
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.