This entry appeared initially as a CV post here.

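If the interaction is between a continuous and a discrete variable, it is (if I’m not mistaken) relatively straightforward. With a continuous regressor \(x\) and a dummy variable \(d\), the mathematical expression is:

\(\hat Y=\hat\beta_0\,+\,\hat\beta_1\,x\,+\,\hat\beta_2\,d\,+\,\hat\beta_3\,x\,d\)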

So if we take my favorite dataset mtcars{datasets} in R, and we carry out the following regression:

(fit <- lm(mpg ~ wt * am, mtcars))

lm(formula = mpg ~ wt * am, data = mtcars)

(Intercept)           wt           am        wt:am  
     31.416       -3.786       14.878       -5.298  

am dummy-codes the type of transmission in the car (am Transmission: 0 = automatic, 1 = manual), so the model gives an intercept of 31.416 for automatic cars (am = 0), and 31.416 + 14.878 = 46.294 for manual cars (am = 1). The slope for weight is -3.786 when am is 0. When am is 1 (manual), the interaction adds the term \(-5.298*1*\text {weight}\) to \(-3.786*\text {weight}\), giving \(-9.084*\text {weight}\), that is, a slope of -9.084. So the interaction changes the slope.
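As a rough check of this reading, the sketch below rebuilds the two group-specific lines from the coefficients and compares them with separate regressions on each transmission group. The object names (`fit_am`, `fit_auto`, `fit_manual` and so on) are made up for illustration; only `lm`, `coef`, `subset` and `all.equal` from base R are used.

```r
# Refit the model from the text (mtcars ships with base R)
fit_am <- lm(mpg ~ wt * am, data = mtcars)
b <- coef(fit_am)

# Lines implied by the coefficients (am: 0 = automatic, 1 = manual)
intercept_auto   <- b[["(Intercept)"]]
slope_auto       <- b[["wt"]]
intercept_manual <- b[["(Intercept)"]] + b[["am"]]
slope_manual     <- b[["wt"]] + b[["wt:am"]]

# Separate simple regressions within each transmission group
fit_auto   <- lm(mpg ~ wt, data = subset(mtcars, am == 0))
fit_manual <- lm(mpg ~ wt, data = subset(mtcars, am == 1))

# Both comparisons should agree up to floating-point tolerance
all.equal(unname(coef(fit_auto)),   c(intercept_auto, slope_auto))
all.equal(unname(coef(fit_manual)), c(intercept_manual, slope_manual))
```

The agreement is expected here because a model with a dummy, a continuous variable and their interaction has enough freedom to fit one line per group.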

But when it is two continuous variables that are interacting, are we really creating an infinite number of slopes? For example, take the explanatory variables wt (weight) and hp (horsepower) and the regressand mpg (miles per gallon):

(fit <- lm(mpg ~ wt * hp, mtcars))

lm(formula = mpg ~ wt * hp, data = mtcars)

(Intercept)           wt           hp        wt:hp  
   49.80842     -8.21662     -0.12010      0.02785

How do we read the output?


We can “prove” how these coefficients “work” by simply taking the first values of mpg, wt and hp, which happen to be for the glamorous Mazda RX4:

These are:

          mpg cyl disp  hp drat   wt  qsec vs am gear carb
Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

And simply run predict(fit)[1], which for the Mazda RX4 returns a \(\hat y\) value of \(23.09547\). No matter what, I have to rearrange the coefficients to get this number, trying all possible permutations if necessary! No, just kidding. Here it is:

coef(fit)[1] + (coef(fit)[2] * mtcars$wt[1]) + (coef(fit)[3] * mtcars$hp[1]) + (coef(fit)[4] * mtcars$wt[1] * mtcars$hp[1]) \(= 23.09547\).

The math expression is:

\(\small \hat Y=\hat\beta_0\,(=1^{st}\,\text{coef})\,+\,\hat\beta_1\,(=2^{nd}\,\text{coef})\,*wt \,+\, \hat\beta_2\,(=3^{rd}\,\text{coef})\,*hp \,+\, [\hat\beta_3\,(=4^{th}\,\text{coef})\, *wt\,*\,hp]\)

So, as pointed out in the answers, there is only one intercept (the first coefficient), but there are two “private” slopes, one for each explanatory variable, plus one “shared” slope. This shared slope yields uncountably many slopes if we “zip” through \(\mathbb{R}\) for all the theoretically possible values of one of the variables: at any point we add (\(+\)) the “shared” coefficient times that value times the remaining variable (e.g. for hp = 100, it would be 0.02785 * 100 * wt) to the remaining variable's “private” slope term (-8.21662 * wt). In other words, for a fixed hp the slope for wt is -8.21662 + 0.02785 * hp. I wonder if I can call it a convolution…
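To make the “infinitely many slopes” idea concrete, here is a minimal sketch that evaluates the slope for wt at a handful of horsepower values. The helper `wt_slope_at` and the chosen hp values are hypothetical, introduced only for illustration; the model is the same `lm(mpg ~ wt * hp, mtcars)` fit as above.

```r
fit <- lm(mpg ~ wt * hp, data = mtcars)
b <- coef(fit)

# Hypothetical helper: slope of predicted mpg with respect to wt at a given hp,
# i.e. the "private" wt slope plus the "shared" coefficient times hp
wt_slope_at <- function(hp) b[["wt"]] + b[["wt:hp"]] * hp

# Arbitrary horsepower values, chosen only for illustration
hp_values <- c(50, 100, 150, 200, 250, 300)
slopes <- wt_slope_at(hp_values)
data.frame(hp = hp_values, wt_slope = slopes)

# Cross-check with the model itself: the change in predicted mpg for a
# one-unit increase in wt, holding hp fixed at 100
p <- predict(fit, newdata = data.frame(wt = c(2, 3), hp = c(100, 100)))
unname(p[2] - p[1])
wt_slope_at(100)
```

Because the fitted surface is linear in wt once hp is fixed, the one-unit difference in predictions should match the slope returned by the helper up to rounding.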

We can also see that this is the right interpretation running:

y <- coef(fit)[1] + (coef(fit)[2] * mtcars$wt[1]) + (coef(fit)[3] * mtcars$hp[1]) + (coef(fit)[4] * mtcars$wt[1] * mtcars$hp[1])
identical(as.numeric(predict(fit)[1]), as.numeric(y))  # TRUE
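The same check can be extended from the first car to every row. The sketch below assumes nothing beyond base R: `model.matrix` returns the design matrix used by `lm`, whose last column is the product wt * hp, so multiplying it by the coefficient vector should reproduce all fitted values. The names `X` and `manual` are illustrative, and `all.equal` is used in place of `identical` to allow for floating-point differences.

```r
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Design matrix with columns (Intercept), wt, hp, wt:hp
X <- model.matrix(fit)

# The interaction column is simply the element-wise product of wt and hp
all.equal(unname(X[, "wt:hp"]), mtcars$wt * mtcars$hp)

# Rebuild every fitted value as intercept + b1*wt + b2*hp + b3*wt*hp
manual <- drop(X %*% coef(fit))
all.equal(as.numeric(manual), as.numeric(predict(fit)))
```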

Having reinvented the wheel, we see that the “shared” coefficient is positive (0.02785). That leaves one loose end: explaining why the weight of the vehicle as a predictor of “gas-guzzliness” is buffered for cars with higher horsepower… We can see this effect (thank you @Glen_b for the tip) in the 3D plot of the predicted values of this regression model, which form the following hyperbolic paraboloid:

[Figure: 3D plot of the predicted mpg values over wt and hp]
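A surface of this kind can be drawn with base graphics. The following is only a sketch, not the code behind the original figure: the grid size, viewing angles and object names (`wt_grid`, `hp_grid`, `pred_surface`) are assumptions made for illustration.

```r
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Grid over the observed ranges of both predictors (30 points is arbitrary)
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 30)
hp_grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 30)

# Predicted mpg at every (wt, hp) combination; outer() passes vectors,
# and predict() returns one value per row of newdata
pred_surface <- outer(
  wt_grid, hp_grid,
  function(w, h) predict(fit, newdata = data.frame(wt = w, hp = h))
)

# Perspective plot of the fitted surface
persp(wt_grid, hp_grid, pred_surface,
      theta = 40, phi = 20, ticktype = "detailed",
      xlab = "wt", ylab = "hp", zlab = "predicted mpg")
```

Each slice of the surface at a fixed hp is a straight line in wt, and the tilt of that line changes from slice to slice, which is the twisted shape that the interaction term produces.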
