Initially posted in CV.

Simpson’s effect is key to understanding potential pitfalls in medical research, particularly in the area of meta-analysis, where multiple studies with dissimilar methodologies are brought together to provide a more definite answer to a problem. The cases reported tend to deal with count tables and ratios that change because of a “lurking” variable or confounder that is ignored (usually different group sizes). However, it is interesting to look at the continuous case to understand it conceptually.

To look into the paradox different studies composing a meta-analysis are regarded as random effects in a mixed model.

In Simpson’s paradox there is a reversal of the sign of the correlation between two variables, or equivalently, a flip in the sign of regression coefficients (slopes) due to an unaccounted for confounding variable.

In the following illustration, we are looking at the relationship between a fictitious biochemical marker (“Marker X”) in blood, and a second level in a blood test (“Marker Y”). We are pooling together the data from four different studies. Initially we look at a “successful” meta-analysis to eventually contrast it with a scenario where Simpson’s paradox rears its head.

**Scenario 1:**

As the data is aggregated we end up with a situation best modeled as a mixed-model with random effects explaining the variability between different studies (unit effects) of the form \(\boldsymbol{y} = X \boldsymbol{\beta} + Z \boldsymbol{u} + \boldsymbol{\epsilon}\) with \(\boldsymbol u\) corresponding to the intecepts of the fitted lines through the \(y\sim x\) data cloud for each individual study. This model accounts for the possible presence of a substantial amount of dispersion in the data. Here’s what it would look like (code):

with the data from each study color coded on the plot to the left, and aggregated on the right. The situation is not ideal, because there is quite a bit of spread in the values from different datasets. The study would probably be much more likely to be published and quoted with something like this:

So perfect an overlap between studies that an ordinary least square (OLS) fit would probably be better than a mixed model with random effects.

However, in either instance, the presence of over-dispersion doesn’t raise the possibility of a confounder variable.

On the other hand…

**Scenario 2:**

… we can encounter an often plotted distribution in the data cloud such that fitting a linear regression with mixed-effects ends up with negative (or positive) slopes for each one of the individual studies, only to reverse sign when the data is aggregated:

This is the Simpson’s effect.

If we look into the results in the data behind these plots, the `cor(y,x)`

for each individual subgroup ranged between `-0.238`

and `-0.302`

; yet, when the data was aggregated, the combined correlation was `0.473`

. Naturally the sign of the regression slopes was also opposite. Of note, the mixed-effect regression with random intersects was a better model than the OLS on the aggregate data.

Already on the plot, one can have an intuition that the data is not just disperse, but it is stretched along the \(x\) axis, precisely by an unknown “lurking” variable, which may not be immediately apparent. This can be objectivized by looking at the correlation between the \(x\) variable and the \(y\,\,intercept\) for each subgroup, and contrasting it with scenario 1. Graphically,

in the case with Simpson’s effect (scenario 2), the higher the \(x\) values, the higher the \(y\,\,\text{intercepts}\) (left plot), as opposed to the lack of correlation in scenario 1 (right plot). The correlation in the first case between \(x\) and the \(y\,\,\text{intercepts}\) was `0.7876`

with a statistically significant (`p -> 0`

) slope when doing a regression. In contradistinction, the correlation in scenario 1 between \(x\) and the \(y\,\,\text{intercepts}\) was `0`

.

A follow-up question is what type of clinical scenario could possibly follow this pattern (scenario 2)?