From this question in CV about the difference with the rank sum test:
You should use the signed rank test when the data are paired. You’ll find many definitions of pairing, but at heart the criterion is something that makes pairs of values at least somewhat positively dependent, while unpaired values are not dependent. Often the dependence-pairing occurs because they’re observations on the same unit (repeated measures), but it doesn’t have to be on the same unit, just in some way tending to be associated (while measuring the same kind of thing), to be considered as ‘paired’.
Here is the procedure in Wikipedia
From this question in CV:
I want to test whether the difference between the pairs of students's scores follows a symmetric distribution around zero.
Students Before After difference
1 ** ** 10
2 ** ** -9
3 ** ** 8
4 ** ** 5
5 ** ** -6
6 ** ** 0
7 ** ** 2
Since the data is not normally distributed, I assume i can use wilcoxon signed-rank test here, but in this case we only know the difference of the pairs, so if perform wilcoxon signed-rank test in r here, normally the code should be:
wilcox.test(before, after, paired = T, exact = T)
but the problem is that, I don’t have the data of ‘before’ and ‘after’, instead I only know the difference between them, so it seems wilcoxon signed-rank test can not be applied here. Is there other alternative method to this case?
Answer:
I have been playing with this question, and here is what I got so far…
wilcox.test()
:I tried following on the side what R was doing in the back. Here is some code:
d = data.frame(difference = c(10,-9,8,5,-6,0,2)) # Differences in OP
d$sgn = sign(d$difference) # The sign function of these diff's
d$abs = abs(d$difference) # The absolute value of these diff's
d$abs = replace(d$abs, d$abs==0, NA) # Getting the zeros out of the way
n = length(d$rank) - length(d$rank[is.na(d$rank)]) # Rows without zero differences (6)
d$rank = rank(d$abs, na='keep') # Ranking the abs values
d$multi = d$sgn * d$rank # Multiplying sign times rank.
yielding the following data.frame:
difference sgn abs rank multi
1 10 1 10 6 6
2 -9 -1 9 5 -5
3 8 1 8 4 4
4 5 1 5 2 2
5 -6 -1 6 3 -3
6 0 0 NA NA NA
7 2 1 2 1 1
Now we can get the test statistic (V
) by summing all the
positive values in multi
:
sum(d[d$multi > 0, ]$multi, na.rm=T)
, and the output is
\(13\).
Now let’s pretend its 1985, and we go to the back of the book to the
tables to retrieve the \(p\)-value… OK,
let’s do it with R:
psignrank(q = 13, n = n, lower.tail = T)
. The result: \(0.71875\).
Is this concordant with the results of the wilcox.test
function… No… and yes. Initially, we may be disappointed:
wilcox.test(difference, correct = F, alternative = 'less', exact = T)
Wilcoxon signed rank test
data: difference
V = 13, p-value = 0.6999
alternative hypothesis: true location is less than 0
Warning message:
In wilcox.test.default(difference, correct = F, alternative = "less", :
cannot compute exact p-value with zeroes
but it’s not like we are not warned… Now, let’s just get rid of the annoying row with the tie, and recalculate:
wilcox.test(difference[-6], correct = F, alternative = 'less', exact = T)
Wilcoxon signed rank test
data: difference[-6]
V = 13, p-value = 0.7187
alternative hypothesis: true location is less than 0
Bingo! Here is the coveted \(0.7187\), again.
set.seed(11) # Today's date in the US - no cherry-picking!
r = 1:6 # The possible ranks of our non-zero differences
nsim = 1e5
V = 0
for (i in 1:nsim){
rank = sample(r) # Sampling the ranks...
sign = sample(c(1, -1), 6, replace = T) #... and the signs for each rank.
V[i] = sum(rank[sign > 0]) # Doing the sum to get the V.
}
(p_value <- sum(V <= 13) / nsim) # Fraction with a sum equal or lower than 13.
[1] 0.72204
Really cool! \(0.72204\)… so close…
From this question:
Question:
I am trying to duplicate a Wilcoxon signed-rank test example in this Wikipedia post using R.
The data is as follows:
after = c(125, 115, 130, 140, 140, 115, 140, 125, 140, 135)
before = c(110, 122, 125, 120, 140, 124, 123, 137, 135, 145)
sgn = sign(after-before)
abs = abs(after - before)
d = data.frame(after,before,sgn,abs)
d$rank = rank(replace(abs,abs==0,NA), na='keep')
d$multi = d$sgn * d$rank
(W=abs(sum(d$multi, na.rm = T)))
W = 9
However, the test statistic value that R produces (undoubtedly due to some mistake that I am making in setting up the function or interpreting the output) is:
wilcox.test(d$before,d$after, paired = T, alternative = "two.sided", correct=F)
Wilcoxon signed rank test
data: d$before and d$after
V = 18, p-value = 0.5936
alternative hypothesis: true location shift is not equal to 0
This (V=18
) is different from the 9
value
in the Wikipedia post.
What am I missing?
Answer:
R reports the V-statistic, which is the sum of the positive ranks. The Wikipedia example computes it slightly differently, as the sum of all ranks, regardless of sign. In other words, both versions are correct (and equivalent). This CrossValidated post might be helpful.
For contrast, here is the Wilcoxon rank sum test:
From this question on CV:
Comparing two vectors of data without assuming a normal distribution:
x <- scan(text="8 8 4 1 2 2 0 2 5 2 3 3 3 1 5 4 4 1 4 2")
y <- scan(text="4 0 4 1 1 2 0 1 2 2 1 2 1 0 0 0 1 1 1 1")
can be done with a permutation test, which will give us the one-sided \(p\)-value:
set.seed(0)
xy <- c(x, y) # Combo vec. with x and y combined. 40 entries.
sample <- replicate(1e5, sum(sample(xy, length(x)))) # 1e5 samples of length(x) w/o replac. from combo xy, returning the sum of their elements.
mean_x_samples <- sample / length(x) # 1e5 means from length(x) samples from xy
diff <- sum(xy) - sample # Difference between combo vec and samples
mean_y_samples <- diff / length(y) # 1e5 means from length(y) samples from xy
delta_samples <- mean_x_samples - mean_y_samples
stat <- mean(x) - mean(y)
(p_value = mean(delta_samples > stat))
## [1] 0.00013
In R we can get a two-tailed \(p\) value with the Wilcoxon rank-sum test:
wilcox.test(x, y, paired=F)
##
## Wilcoxon rank sum test with continuity correction
##
## data: x and y
## W = 321, p-value = 0.0008505
## alternative hypothesis: true location shift is not equal to 0
# If both x and y are given and paired is FALSE, a Wilcoxon rank sum test
# (equivalent to the Mann-Whitney test: see the Note) is carried out.
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.