Wilcoxon Rank Sum

Wilcoxon Rank Sum or Mann-Whitney U Test:

From this question in CV about the difference with the rank sum test:

You should use the rank-sum test when the data are not paired. You’ll find many definitions of pairing, but at heart the criterion is something that makes pairs of values at least somewhat positively dependent, while unpaired values are not dependent. Often the dependence-pairing occurs because they’re observations on the same unit (repeated measures), but it doesn’t have to be on the same unit, just in some way tending to be associated (while measuring the same kind of thing), to be considered as ‘paired’.

The procedure is outlined in Wikipedia.

Example 1:

a = c(110, 115, 128, 142, 123, 129, 130, 128 ,134, 133, 128, 147, 137, 112, 138, 128, 132, 139, 133, 135, 133, 125, 134, 139, 138, 142, 152, 140, 144, 147 ,153 ,141)
b = c(122, 118, 120 ,131 ,124 ,118 ,120 ,140 ,124, 120, 134, 127, 127 ,134, 133, 137 ,137 ,135 ,129 ,138 ,143, 128 ,121 ,129, 133, 138, 142, 131, 135, 132, 146, 135)

In statistics, the Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test, or Wilcoxon–Mann–Whitney test) is a nonparametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample.

Under the null hypothesis \(\text{H}_0\), the probability of an observation from the population \(X\) exceeding an observation from the second population \(Y\) equals the probability of an observation from \(Y\) exceeding an observation from \(X\).

wilcox.test(a,b)

## Warning in wilcox.test.default(a, b): cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 632.5, p-value = 0.1067
## alternative hypothesis: true location shift is not equal to 0

\(632.5\) can be reproduced manually as

n1 = length(a)
n2 = length(b)
d = data.frame(a,b)
d = stack(d)
pooled = d[order(d$values),]
pooled$ranks = rank(pooled$values)
(R1 = sum(pooled[pooled$ind == "a", "ranks"]))

## [1] 1160.5

(R2 = sum(pooled[pooled$ind == "b", "ranks"]))

## [1] 919.5

(U1 = R1 - n1 * (n1 + 1) / 2)

## [1] 632.5

(U2 = R2 - n2 * (n2 + 1) / 2)

## [1] 391.5

Example 2:

From the mtcars R dataset:

a = mtcars[mtcars$am == 0, "mpg"]
b = mtcars[mtcars$am == 1, "mpg"]

n1 = length(a)
n2 = length(b)

lab.a = rep("a", n1)
lab.b = rep("b", n2)

values = c(a,b)
labs = c(lab.a, lab.b)
d = data.frame(labs, values)

pooled = d[order(d$values),]
pooled$ranks = rank(pooled$values)

(R1 = sum(pooled[pooled$labs == "a", "ranks"]))

## [1] 232

(R2 = sum(pooled[pooled$labs == "b", "ranks"]))

## [1] 296

(U1 = R1 - n1 * (n1 + 1) / 2)

## [1] 42

(U2 = R2 - n2 * (n2 + 1) / 2)

## [1] 205

wilcox.test(a, b)

## Warning in wilcox.test.default(a, b): cannot compute exact p-value with ties

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  a and b
## W = 42, p-value = 0.001871
## alternative hypothesis: true location shift is not equal to 0

Example 3:

A topical rash treatment was applied to a portion of a rash on \(n\) patients. A quantitative measure of redness was calculated for the treated and untreated regions of the rash. A sign of + was given when the treated area was less red (more improved) than the untreated area and a - sign when it was not. It is desired to know whether the treatment improves the rash.

Surprisingly R seems to pick the maximum value of the adjusted rank sums for both groups, whereas the minimum value is typically chosen. This is likely anecdotal since the sum of all unadjusted ranks \(R1+R2=\frac{N(N+1)}{2}\) with \(N\) being the total number of observations.

How many possible values can the p value for the sign test take?

Answer: There are \(n+1\) possible p values. (CORRECT)

If there was one person, the test statistic could be 0 or 1 positive signs, thus the p value has two possible values. For two people it could be 0, 1 and 2 for three possible values and so on.

For what size of \(n\) for a \(5\%\) one sided sign test is there a non empty rejection region?

Answers:

You need 5 patients and all have to be + (CORRECT)
You need 20 patients, and all +
You need 5 patients, most +
You need 20 patients, most +

The most extreme case is all of the patients have a +. In this case, the p value is:

\(1/2^n\) because \(2^n\) is the “doubling function” that gives us all the possible “subsets” of positive results in \(n\): The first patient can be \(+\) or \(-\) and in each case, the second patient can in turn be \(+\) or \(-\) and so on. On the other hand, there is only one way of having all patients with a \(+\) result.

Trying with the lowest number of patients:

1/(2^(1:5))

## [1] 0.50000 0.25000 0.12500 0.06250 0.03125

We see that we can reject with \(5\%\) risk alpha with five positive cases.

Consider a \(5\%\) one sided sign test for \(5\) subjects. What would be the power of the test if the probability that the treatment works is actually \(90\%\) instead of the \(50\%\) assumed under the null?

By the previous question, we will only reject for 5 subjects when all of them are +. The probability of such an occurrence under the alternative (where \(p=.9\) rather than \(.5\)) is:

0.9^5

Example 4:

The Biostatistics and Epidemilogy departments are running a 10K road race. There are three pairs of runners. In each case, a runner from Biostat was matched to a runner from Epi of the same age, gender and degree of running experience. The difference in each pairs times was taken and a signed rank test performed.

What is the smallest value that the two sided exact signed rank p value could take?

There are \(2^3\) possible collections of winners, labelled \(1\) if biostat wins and \(-1\) if epi wins:

(winners <- t(expand.grid(c(1,-1), c(1,-1), c(1,-1))))

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## Var1    1   -1    1   -1    1   -1    1   -1
## Var2    1    1   -1   -1    1    1   -1   -1
## Var3    1    1    1    1   -1   -1   -1   -1

So the signs are giving us 8 possible outcomes.

Each possible outcome will be ranked with the possible ranks being the permutations of, in this case, \(3\): \(P(3,3)=3!\cdot 2!\cdot 1!=6\).

t <- c(1,2,3)
library(Deducer)
(abs_rank <- t(perm(t)))

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    2    1    3    3
## [2,]    2    1    3    3    1    2
## [3,]    3    3    1    2    2    1

So there are 48 outcomes:

(outcomes <- lapply(as.data.frame(winners),`*`,abs_rank))

## $V1
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    2    1    3    3
## [2,]    2    1    3    3    1    2
## [3,]    3    3    1    2    2    1
## 
## $V2
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]   -1   -2   -2   -1   -3   -3
## [2,]    2    1    3    3    1    2
## [3,]    3    3    1    2    2    1
## 
## $V3
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    2    1    3    3
## [2,]   -2   -1   -3   -3   -1   -2
## [3,]    3    3    1    2    2    1
## 
## $V4
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]   -1   -2   -2   -1   -3   -3
## [2,]   -2   -1   -3   -3   -1   -2
## [3,]    3    3    1    2    2    1
## 
## $V5
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    2    1    3    3
## [2,]    2    1    3    3    1    2
## [3,]   -3   -3   -1   -2   -2   -1
## 
## $V6
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]   -1   -2   -2   -1   -3   -3
## [2,]    2    1    3    3    1    2
## [3,]   -3   -3   -1   -2   -2   -1
## 
## $V7
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    2    1    3    3
## [2,]   -2   -1   -3   -3   -1   -2
## [3,]   -3   -3   -1   -2   -2   -1
## 
## $V8
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]   -1   -2   -2   -1   -3   -3
## [2,]   -2   -1   -3   -3   -1   -2
## [3,]   -3   -3   -1   -2   -2   -1

Summing the columns:

(W <- lapply(as.data.frame(winners), function(x) colSums(x * abs_rank)))

## $V1
## [1] 6 6 6 6 6 6
## 
## $V2
## [1] 4 2 2 4 0 0
## 
## $V3
## [1] 2 4 0 0 4 2
## 
## $V4
## [1]  0  0 -4 -2 -2 -4
## 
## $V5
## [1] 0 0 4 2 2 4
## 
## $V6
## [1] -2 -4  0  0 -4 -2
## 
## $V7
## [1] -4 -2 -2 -4  0  0
## 
## $V8
## [1] -6 -6 -6 -6 -6 -6

# Tabulating the results:

table(stack(W)$values)

## 
## -6 -4 -2  0  2  4  6 
##  6  6  6 12  6  6  6

(p_value <- (sum(stack(W)$values==6) + sum(stack(W)$values==-6)) /length(stack(W)$values))

## [1] 0.25

Permutations of {1, 2, 3} = 6 and for each one, \(2^3\) ways of allocating the +’s and -’s with only two of them have all + or all - signs. So it’s \(12/48 = 1/4\).

As a Monte Carlo calculation:

v <- c(1,2,3) 
nsim <- 1e5 

W <- 0 
for (i in 1:nsim){ 
rank <- sample(v) 
sign <- sample(c(1, -1), 3, replace = T) 
W[i] <- sum(rank * sign) 
} 
(p_value <- (sum(W==6) + sum(W==-6)) / nsim)

## [1] 0.24884

Or simply: There are \(2^3\) cases, of which 2 are the most extreme, hence the minimum p value is \(1/4\). For \(n\) pairs the minimum p value is \((1/2)^{(n-1)}\). The signed rank test effectively just counts all the ways to allocate \(\pm 1\) to the observations. The extreme cases are “all -1” and “all +1”. since those will lead to the largest and smallest values of the statistic.

Home Page

NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.

Wilcoxon Rank Sum

NOTES ON STATISTICS, PROBABILITY and MATHEMATICS