NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


ROC Curves and Area Under the Curve (AUC):


There are reference answers on-line here, here, here and here.

DEFINITIONS:

\(\text{Sensitivity or TP Rate} = \frac{TP}{TP\, +\, FN}\)


\(\text{Specificity} = \frac{TN}{TN\, +\, FP}\)


\(\text{FP Rate or Fall-out} = \frac{FP}{TN\, +\, FP} = 1 - \text{Specificity}\)


\(\text{FN Rate or Miss rate} = \frac{FN}{TP\, +\, FN}\)


\(\text{PPV or Precision} = \frac{TP}{TP\, +\, FP}\)


\(\text{NPV or Negative predictive value} = \frac{TN}{FN\, +\, TN}\)


\(\text{FOR or False omission rate} = \frac{FN}{FN\, +\, TN}\)


\(\text{FDR or False discovery rate} = \frac{FP}{TP\, +\, FP}\)


In addition we can get the following rates using the entire table:

\(\text{Prevalence} = \frac{TP+FN}{TP\, +\, FP +\, TN +\, FN}\)


\(\text{Accuracy or fraction correctly classified} = \frac{TP+TN}{TP\, +\, FP +\, TN +\, FN}\)


\(\text{Fraction incorrectly classified} = \frac{FP+FN}{TP\, +\, FP +\, TN +\, FN}\)
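
As a quick check on these definitions, here is a small helper that computes all of the rates above from the four cells of a confusion matrix (the counts passed in at the end are hypothetical, for illustration only):

rates <- function(TP, FP, TN, FN) {
  N <- TP + FP + TN + FN
  c(sensitivity = TP / (TP + FN),   # TP rate
    specificity = TN / (TN + FP),
    FPR         = FP / (TN + FP),   # fall-out = 1 - specificity
    FNR         = FN / (TP + FN),   # miss rate
    PPV         = TP / (TP + FP),   # precision
    NPV         = TN / (FN + TN),
    FOR         = FN / (FN + TN),   # false omission rate
    FDR         = FP / (TP + FP),   # false discovery rate
    prevalence  = (TP + FN) / N,
    accuracy    = (TP + TN) / N)
}
rates(TP = 40, FP = 10, TN = 45, FN = 5)  # hypothetical counts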


We use the PSA antigen data from here, where fpsa stands for “free PSA” and tpsa stands for “total PSA”. There are repeated measurements over time t at different ages. The outcome is d, coded as 0 or 1:

The dataset comes from an age-matched case-control study: each diagnosed case was assigned a control matched by date of birth. There are 70 non-diagnosed (controls) and 71 diagnosed (cases):

library(gsheet)  # for gsheet2text()
library(sqldf)   # for the SQL-style aggregation below
data = read.csv(text = gsheet2text('https://docs.google.com/spreadsheets/d/1ba1IYjX79EEmgBmsEzZgNs6Joe4v-SECWZUhbfhewhc/edit?usp=sharing', format ='csv'))
## No encoding supplied: defaulting to UTF-8.
data = sqldf("select id, d, min(t), fpsa, tpsa, age from 'data'\n group by id") # keep one row per id: the earliest measurement (min(t))
data$age_group <- cut(data$age, breaks = c(45,55,65,75), labels = c("(45,55]","(55,65]","(65,75]"))
colnames(data)[2] = "outcome"
head(data)
##   id outcome min(t)  fpsa  tpsa    age age_group
## 1  1       1 -4.482 3.525 14.82 67.581   (65,75]
## 2  2       1 -4.498 1.104  5.54 70.166   (65,75]
## 3  3       0 -3.381 0.226  0.94 55.028   (55,65]
## 4  4       1 -4.619 0.583  9.30 57.424   (55,65]
## 5  5       0 -4.005 0.377  1.71 62.732   (55,65]
## 6  6       1 -0.337 0.150  0.58 58.360   (55,65]
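
A quick sanity check on the case/control split described above:

table(data$outcome)  # expect 70 controls (0) and 71 cases (1)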

Generating ROC curves with pROC:

# With total PSA:
library(pROC)
par(pty="s")  # square plotting region
r = roc(data$outcome, data$tpsa)
plot.roc(r, print.auc=TRUE, auc.polygon=TRUE, auc.polygon.col=rgb(.3,0,.8,0.2), print.thres=TRUE)
plot.roc(smooth(r), add=TRUE, col=2)

Notice that the thresholds used to construct the ROC plot, together with the calculated sensitivities and specificities, are stored in the roc object:

rock = pROC::roc(data$outcome, data$tpsa)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
str(rock)
## List of 15
##  $ percent           : logi FALSE
##  $ sensitivities     : num [1:120] 1 1 1 0.986 0.986 ...
##  $ specificities     : num [1:120] 0 0.0286 0.0429 0.0429 0.0571 ...
##  $ thresholds        : num [1:120] -Inf 0.3 0.385 0.445 0.455 ...
##  $ direction         : chr "<"
##  $ cases             : num [1:71] 14.82 5.54 9.3 0.58 13.61 ...
##  $ controls          : num [1:70] 0.94 1.71 0.96 0.27 1.04 1.93 1.5 0.92 4.47 0.45 ...
##  $ fun.sesp          :function (thresholds, controls, cases, direction)  
##  $ auc               : 'auc' num 0.849
##   ..- attr(*, "partial.auc")= logi FALSE
##   ..- attr(*, "percent")= logi FALSE
##   ..- attr(*, "roc")=List of 15
##   .. ..$ percent           : logi FALSE
##   .. ..$ sensitivities     : num [1:120] 1 1 1 0.986 0.986 ...
##   .. ..$ specificities     : num [1:120] 0 0.0286 0.0429 0.0429 0.0571 ...
##   .. ..$ thresholds        : num [1:120] -Inf 0.3 0.385 0.445 0.455 ...
##   .. ..$ direction         : chr "<"
##   .. ..$ cases             : num [1:71] 14.82 5.54 9.3 0.58 13.61 ...
##   .. ..$ controls          : num [1:70] 0.94 1.71 0.96 0.27 1.04 1.93 1.5 0.92 4.47 0.45 ...
##   .. ..$ fun.sesp          :function (thresholds, controls, cases, direction)  
##   .. ..$ auc               : 'auc' num 0.849
##   .. .. ..- attr(*, "partial.auc")= logi FALSE
##   .. .. ..- attr(*, "percent")= logi FALSE
##   .. .. ..- attr(*, "roc")=List of 8
##   .. .. .. ..$ percent      : logi FALSE
##   .. .. .. ..$ sensitivities: num [1:120] 1 1 1 0.986 0.986 ...
##   .. .. .. ..$ specificities: num [1:120] 0 0.0286 0.0429 0.0429 0.0571 ...
##   .. .. .. ..$ thresholds   : num [1:120] -Inf 0.3 0.385 0.445 0.455 ...
##   .. .. .. ..$ direction    : chr "<"
##   .. .. .. ..$ cases        : num [1:71] 14.82 5.54 9.3 0.58 13.61 ...
##   .. .. .. ..$ controls     : num [1:70] 0.94 1.71 0.96 0.27 1.04 1.93 1.5 0.92 4.47 0.45 ...
##   .. .. .. ..$ fun.sesp     :function (thresholds, controls, cases, direction)  
##   .. .. .. ..- attr(*, "class")= chr "roc"
##   .. ..$ call              : language roc.default(response = data$outcome, predictor = data$tpsa)
##   .. ..$ original.predictor: num [1:141] 14.82 5.54 0.94 9.3 1.71 ...
##   .. ..$ original.response : int [1:141] 1 1 0 1 0 1 0 1 0 1 ...
##   .. ..$ predictor         : num [1:141] 14.82 5.54 0.94 9.3 1.71 ...
##   .. ..$ response          : int [1:141] 1 1 0 1 0 1 0 1 0 1 ...
##   .. ..$ levels            : chr [1:2] "0" "1"
##   .. ..- attr(*, "class")= chr "roc"
##  $ call              : language roc.default(response = data$outcome, predictor = data$tpsa)
##  $ original.predictor: num [1:141] 14.82 5.54 0.94 9.3 1.71 ...
##  $ original.response : int [1:141] 1 1 0 1 0 1 0 1 0 1 ...
##  $ predictor         : num [1:141] 14.82 5.54 0.94 9.3 1.71 ...
##  $ response          : int [1:141] 1 1 0 1 0 1 0 1 0 1 ...
##  $ levels            : chr [1:2] "0" "1"
##  - attr(*, "class")= chr "roc"
# Let's take a peek under the hood:

roc_scaffolding = data.frame(rock$thresholds, rock$sensitivities, rock$specificities)
head(roc_scaffolding)
##   rock.thresholds rock.sensitivities rock.specificities
## 1            -Inf          1.0000000         0.00000000
## 2           0.300          1.0000000         0.02857143
## 3           0.385          1.0000000         0.04285714
## 4           0.445          0.9859155         0.04285714
## 5           0.455          0.9859155         0.05714286
## 6           0.480          0.9859155         0.07142857
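
Each (sensitivity, specificity) pair can be reproduced by hand from the cases and controls stored in the object. A minimal sketch of what pROC does internally for direction "<" (since the thresholds are midpoints, no observation ever falls exactly on one):

# sensitivity = fraction of cases above the threshold;
# specificity = fraction of controls at or below it:
manual <- t(sapply(rock$thresholds, function(th)
  c(sens = mean(rock$cases > th), spec = mean(rock$controls <= th))))
head(manual)  # should match rock$sensitivities and rock$specificities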

In addition to using the PSA alone, we can stratify by the patient’s age group:

# With age groups:

plot.roc(x = data$outcome[data$age_group == "(45,55]"], 
      predictor = data$tpsa[data$age_group == "(45,55]"], 
      print.auc = TRUE, col = "green", print.auc.col = "green", 
      print.auc.y = 0.97, print.auc.x = 0.5)

plot.roc(x = data$outcome[data$age_group == "(55,65]"], 
      predictor = data$tpsa[data$age_group == "(55,65]"], 
      print.auc = TRUE, col = "blue", print.auc.col = "blue", add = TRUE, 
      print.auc.y = 0.82, print.auc.x = 0.7)

plot.roc(x = data$outcome[data$age_group == "(65,75]"], 
      predictor = data$tpsa[data$age_group == "(65,75]"], 
      print.auc = TRUE, col = "red", add = TRUE, print.auc.col = "red", 
      print.auc.y = 0.7, print.auc.x = 0.8)
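
The three stratum-specific AUCs can also be collected in one shot; a compact sketch using sapply() over the age groups:

# AUC of tpsa within each age stratum:
sapply(levels(data$age_group), function(g) {
  idx <- data$age_group == g
  as.numeric(auc(roc(data$outcome[idx], data$tpsa[idx])))
})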

And with ROCR:

# With ROCR:
library(ROCR)
pred <- prediction(data$tpsa, data$outcome)
perf <- performance(pred, "tpr", "fpr")
# performance metrics - TPR: true positive rate; FPR: false positive rate
plot(perf, col = "green")
abline(0, 1, col = "grey")
auc <- performance(pred, "auc")
legend("bottomright", paste(round(as.numeric(auc@y.values), digits = 2)), col = c("green"), pch = c(3))


Alternatively, the ROC() function in the Epi package can be used. In this case we are dealing with predicting the clinical outcome in patients with subarachnoid hemorrhage.

library(Epi)
data(aSAH)  # the aSAH dataset ships with pROC
head(aSAH)
##    gos6 outcome gender age wfns s100b  ndka
## 29    5    Good Female  42    1  0.13  3.01
## 30    5    Good Female  37    1  0.14  8.54
## 31    5    Good Female  42    1  0.10  8.09
## 32    5    Good Female  27    1  0.04 10.42
## 33    1    Poor Female  42    3  0.13 17.40
## 34    1    Poor   Male  48    2  0.10 12.75
rock = ROC(form = outcome ~ s100b, data=aSAH, plot = "ROC", MX = T)

Notice the values in the \(18\)th row of the rock$res printout: they are the values printed on the plot above as lr.eta values:

rock$res[18,]
##                       sens      spec       pvp  pvn   lr.eta
## 0.30426295405785 0.6341463 0.8055556 0.2054795 0.35 0.304263

The threshold that maximizes the sum of sensitivity and specificity is reported as lr.eta, the predicted probability of the outcome at that cutoff; in other words, it comes from the sigmoid function:


\[\large \mathbb{E}[Y_i|X_i]=\frac{1}{1 + e^{-(\beta_0+\beta_1\times\text{s100b})}}\] The lr.eta value is the predicted probability at the cutoff that maximizes Youden’s index (sensitivity \(+\) specificity \(-\, 1\)).


rc <- ROC(form = outcome ~ s100b, data=aSAH, plot="sp") # Saving the ROC for later
ROC(form = outcome ~ s100b, data=aSAH, plot="sp" ) # Getting the ROC plot
# The following lines re-draw, in color, the black curves just plotted:
lines(rc$res$lr, rc$res$spec, lwd = 2, type="s", col="red") #specificity
lines(rc$res$lr, rc$res$sens, lwd = 2, type="s", col="green") # sensitivity
lines(rc$res$lr, rc$res$pvp,  lwd = 2, type="s", col="orange") # pvp
lines(rc$res$lr, rc$res$pvn,  lwd = 2, type="s", col="magenta") # pvn

The maximal combination of sensitivity and specificity is given by:

## optimal combination of sensitivity and specificity:

which.max(rowSums(rc$res[, c("sens", "spec")])) # Again the same value from row 18!
## 0.30426295405785 
##               18

This is not the way a cut-off value is selected in clinical practice to report the result of a test as positive: utilities (loss/cost) are needed to solve for the optimal cutoff on predicted risk. See the comments here.

Given that the parameters of the logistic regression are…

rc$lr
## 
## Call:  glm(formula = form, family = binomial, data = data)
## 
## Coefficients:
## (Intercept)        s100b  
##      -1.759        4.904  
## 
## Degrees of Freedom: 112 Total (i.e. Null);  111 Residual
## Null Deviance:       148 
## Residual Deviance: 123.6     AIC: 127.6

this “optimal sensitivity/specificity” value in row 18 corresponds to:

\[0.30426295405785 =\frac{1}{1+e^{-(-1.759+4.904\times \text{s100b})}}\]

and

\[\text{s100b}= 0.1900327\]
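
Both numbers can be verified from the fitted coefficients, using qlogis() (the logit, i.e. the inverse of the sigmoid) and plogis() (the sigmoid itself):

b <- coef(rc$lr)
(qlogis(0.30426295405785) - b[1]) / b[2]  # solve the sigmoid for s100b: ~0.19
plogis(b[1] + b[2] * 0.1900327)           # and back to the probability: ~0.3043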

A plausible value of s100b:

summary(aSAH$s100b) # Remember that not all values are used as cutoff points.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.030   0.090   0.140   0.247   0.330   2.070

and it is the value that optimizes the sensitivity/specificity combination.

If we wanted to get the cutoff points, we’d have to use sort(unique(aSAH$s100b)). Let’s see the value in position \(18\):

(points = c("Inf",sort(unique(aSAH$s100b))))[18]
## [1] "0.19"



DETERMINING A THRESHOLD VALUE:

From here:

For the two-state setting, we used a dataset from Kapaki, Paraskevas, Zalonis, and Zournas (2003), which contains measurements of tau protein levels in the cerebrospinal fluid of \(49\) control subjects and \(49\) patients with Alzheimer’s disease (AD). The authors reported that the cut-off of 317 led to an optimal combination of sensitivity \((0.88)\) and specificity \((0.96),\) producing \(R = 1.\) We calculated an alternative threshold for this dataset using the functions in ThresholdROC. This approach accounts for specific characteristics of the problem by choosing a reasonable combination of costs based on clinical criteria. We set the costs corresponding to correct classification at zero (that is, \(\text{CTP = CTN = 0}\)) given that there are no consequences when correct decisions are made. The costs corresponding to false classifications were set at \(1\) (i.e., \(\text{CFP = CFN = 1}\)), placing the same weight on false positives and false negatives. The value for AD prevalence was set at \(0.2\) based on the literature (Tsolaki, Fountoulakis, Pavlopoulos, Chatzi, and Kazis 1999; Ferri et al. 2005). To determine if the measurements from both the control and diseased groups could be assumed to follow a normal distribution in deciding the method to use for threshold estimation, we applied Shapiro-Wilk’s test to the measurements, obtaining \(p < 0.01\) in both cases. Since the data failed the normality tests, the empirical method was used. Thus, using thres2(), the threshold that minimizes the cost function for these cost and prevalence values was estimated to be \(384.45,\) corresponding to a sensitivity of \(0.76\) and a specificity of \(1.\) Using a confidence level of \(95\%\) and applying the bootstrap methodology with \(1000\) resamples, the confidence intervals of the threshold were \((304.40, 464.51)\) for the bootstrap method based on the normal distribution and \((295.37, 444.03)\) for the percentile technique.

require(ThresholdROC)
## Loading required package: ThresholdROC
k1 = AD[AD$Status==0,'Tau']
k2 = AD[AD$Status==1,'Tau']
rho = 0.2   # disease prevalence
FPC = 1     # false positive cost
FNC = 1     # false negative cost

costs = matrix(c(0,0,FPC,FNC),2,2,byrow=T)
thresh <- thres2(k1, k2, rho, R=NULL, costs, "empirical", ci.method='boot', extra.info=T)

par(mfrow=c(1,3))
plot(thresh)
plotCostROC(thresh)

If we can’t tolerate any false positives:

par(mfrow=c(1,3))
FPC = 1000  # the cost of a false positive is huge
FNC = 1
costs = matrix(c(0,0,FPC,FNC),2,2,byrow=T)
thresh <- thres2(k1, k2, rho, R=NULL, costs, "empirical", ci.method='boot', extra.info=T)
plot(thresh)
plotCostROC(thresh)

If the cost of missing any cases is huge:

par(mfrow=c(1,3))
FPC = 1  
FNC = 1000
costs = matrix(c(0,0,FPC,FNC),2,2,byrow=T)
thresh <- thres2(k1, k2, rho, R=NULL, costs, "empirical", ci.method='boot', extra.info=T)
plot(thresh)
plotCostROC(thresh)


MANUAL REPRODUCTION:


data("aSAH")
(data <- aSAH[1:5,]) # We take only the first few rows - here, 5
##    gos6 outcome gender age wfns s100b  ndka
## 29    5    Good Female  42    1  0.13  3.01
## 30    5    Good Female  37    1  0.14  8.54
## 31    5    Good Female  42    1  0.10  8.09
## 32    5    Good Female  27    1  0.04 10.42
## 33    1    Poor Female  42    3  0.13 17.40
contrasts(data$outcome)
##      Poor
## Good    0
## Poor    1
fit <- glm(outcome ~ s100b, data, family = "binomial") # This is the logistic regression that we will use as a predictor.

ODDS RATIOS:

\[\color{blue}{\text{odds(Y=1)}} = \frac{\Pr\,(Y=1)}{1\,-\,\Pr\,(Y=1)} = e^{\,\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p} = e^{\,\beta_0}\,e^{\,\beta_1x_1}\,e^{\,\beta_2x_2}\,\cdots\,e^{\,\beta_px_p}\]

Therefore a unit increase in \(x_1\) multiplies the odds by \(\large e^{\,\beta_1}\). The factors \(\large e^{\,\beta_i}\) are the ODDS RATIOS, while the coefficients \(\beta_i\) are the LOG ODDS RATIOS.

exp(coef(fit))
##  (Intercept)        s100b 
## 1.571381e-03 1.690125e+18
exp(confint(fit))
##                    2.5 %        97.5 %
## (Intercept) 1.499772e-22  1.435188e+01
## s100b       1.165035e-16 1.174063e+160

PROBABILITIES:

The probabilities are given, again, by the sigmoid function:

\[\color{green}{\text{p(Y = 1)}} = \frac{\color{blue}{\text{odds(Y=1)}}}{1\,+\,\color{blue}{\text{odds(Y=1)}}}=\frac{e^{\,\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 \,+\,e^{\,\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}=\frac{1}{1 \,+\,e^{-(\,\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}}\]

In a glm() (logistic regression) model, predict() by default (type = "link") simply gives you the LINEAR PREDICTOR (\(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p\)), i.e. the log-odds.

probabilities = exp(predict(fit)) / (  1 + exp(predict(fit))  ) 

These are the predicted probabilities (not the odds, and not the raw linear predictor) for each value of s100b in the dataset.
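
Equivalently, R can apply the sigmoid for us: plogis() is the logistic CDF, and predict() with type = "response" does the conversion internally. A quick check that all three routes agree:

lp <- predict(fit)                     # linear predictor (log-odds); type = "link" is the default
p1 <- exp(lp) / (1 + exp(lp))          # the manual sigmoid used above
p2 <- plogis(lp)                       # built-in logistic CDF
p3 <- predict(fit, type = "response")  # glm's own conversion
all.equal(p1, p2)  # TRUE
all.equal(p2, p3)  # TRUE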

Let’s get the thresholds from the pROC package:

(threshold = roc(data$outcome, data$s100b)$thresholds) 
## [1]  -Inf 0.070 0.115 0.135   Inf

Here we have them… Compare them to the sorted s100b values. Parallel, but not quite:

sort(data$s100b)
## [1] 0.04 0.10 0.13 0.13 0.14
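
In fact, pROC builds its thresholds as the midpoints between consecutive unique predictor values, padded with \(\pm\infty\); a quick check:

s <- sort(unique(data$s100b))
c(-Inf, (head(s, -1) + tail(s, -1)) / 2, Inf)  # identical to the thresholds above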

Let’s add the predicted probabilities:

data$probabilities <- probabilities

# and order the dataset just to take a peek at what it looks like...

(data[with(data, order(s100b)), ])
##    gos6 outcome gender age wfns s100b  ndka probabilities
## 32    5    Good Female  27    1  0.04 10.42   0.008351337
## 31    5    Good Female  42    1  0.10  8.09   0.094604465
## 29    5    Good Female  42    1  0.13  3.01   0.269034388
## 33    1    Poor Female  42    3  0.13 17.40   0.269034388
## 30    5    Good Female  37    1  0.14  8.54   0.358975423
# Now let's see what happens with different cutoff points...

cut <- rep(0, length(threshold)) # empty vector to hold the cutoff probabilities
for (i in 1:length(threshold)){
  cut[i] <- exp(predict(fit, data.frame(s100b = threshold[i]))) /
            (1 + exp(predict(fit, data.frame(s100b = threshold[i]))))
}
cut
## [1] 0.00000000 0.02880979 0.16395407 0.31223942        NaN
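
The NaN for the last threshold comes from evaluating exp(Inf) / (1 + exp(Inf)), which is Inf/Inf; plogis() would take the limit correctly:

exp(Inf) / (1 + exp(Inf))  # NaN: Inf/Inf is undefined
plogis(Inf)                # 1, the sigmoid's limit at +Inf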

The cuts are the predicted probabilities of a poor outcome at the different s100b thresholds. We have to decide which probability is “high enough” to classify all values with a higher probability as a “Poor” outcome. Sometimes we’ll be right; other times wrong: it is a “supervised” learning process.

These are going to be the cutoff points:

cut[is.na(cut)] <- 1.01 * max(data$probabilities) # Replace the NaN (from the Inf threshold) with a value just above the largest predicted probability.
cut
## [1] 0.00000000 0.02880979 0.16395407 0.31223942 0.36256518
  • For cut[1] = 0, all \(5\) are predicted “Poor” (1): \(\text{sens} = \frac{1\text{ TP}}{1\text{ "Poor"}}=1/1\), \(\text{spec} = \frac{0\text{ TN}}{4\text{ "Good"}}=0/4\)

  • For cut[2] = 0.02880979, there will be \(4\) predicted “Poor” (1): \(\text{sens} = \frac{1\text{ TP}}{1\text{ "Poor"}}=1/1\), \(\text{spec} = \frac{1\text{ TN}}{4\text{ "Good"}}=1/4\)

  • For cut[3] = 0.1639541, there will be \(3\) predicted “Poor” (1): \(\text{sens} = \frac{1\text{ TP}}{1\text{ "Poor"}}=1/1\), \(\text{spec} = \frac{2\text{ TN}}{4\text{ "Good"}}=2/4\)

  • For cut[4] = 0.3122394, there will be \(1\) predicted “Poor” (1): \(\text{sens} = \frac{0\text{ TP}}{1\text{ "Poor"}}=0/1\), \(\text{spec} = \frac{3\text{ TN}}{4\text{ "Good"}}=3/4\)

  • For cut[5] = 0.36256518, there will be \(0\) predicted “Poor” (1): \(\text{sens} = \frac{0\text{ TP}}{1\text{ "Poor"}}=0/1\), \(\text{spec} = \frac{4\text{ TN}}{4\text{ "Good"}}=4/4\)

The (sensitivity, specificity) coordinates then are:

\((1,0)\), \((1, 1/4)\), \((1, 1/2)\), \((0,3/4)\), \((0,1)\)


In this particular example, then, the third cutoff or threshold point \(\color{red}{0.115}\), associated with a probability of \(0.16395\), would result in the highest combination of sensitivity and specificity.



If the probabilities of “success” (outcome 1 or, most commonly, “disease”) predicted by the logistic regression ordered the values perfectly along the actual presence of “success”, the ROC curve would hug the upper-left corner, reaching an AUC of 1.



In contradistinction, in a really bad test the predicted probabilities would bear no relation to the actual presence of the outcome, and the ROC curve would form a perfect diagonal (AUC = 0.5).




In general, though, the shape of an ROC curve will be something in between: bowed above the diagonal.




# Let's automate:

data$outcome <- as.numeric(data$outcome) - 1
data[with(data, order(probabilities)), ]
##    gos6 outcome gender age wfns s100b  ndka probabilities
## 32    5       0 Female  27    1  0.04 10.42   0.008351337
## 31    5       0 Female  42    1  0.10  8.09   0.094604465
## 29    5       0 Female  42    1  0.13  3.01   0.269034388
## 33    1       1 Female  42    3  0.13 17.40   0.269034388
## 30    5       0 Female  37    1  0.14  8.54   0.358975423
positive <- 0
negative <- 0
tru_pos <- 0
tru_neg <- 0
for (i in 1:length(cut)){
  positive[i] <- nrow(data[data$probabilities > cut[i], ])
  negative[i] <- nrow(data) - positive[i]

  # tru_pos is the count of actual "1"s ("successes") among the rows with a
  # predicted probability above the cutoff point:

  tru_pos[i]  <- sum(data[data$probabilities > cut[i], ]$outcome)

  # tru_neg is the number of predicted negatives minus the false negatives
  # (actual "1"s with a predicted probability below the cutoff point):

  tru_neg[i]  <- negative[i] - sum(data[data$probabilities < cut[i], ]$outcome)
}

sens   <- tru_pos / sum(data$outcome)
specif <- tru_neg / (length(data$outcome) - sum(data$outcome))
(rock = cbind(cut, positive, negative,tru_pos,tru_neg, sens, specif))
##             cut positive negative tru_pos tru_neg sens specif
## [1,] 0.00000000        5        0       1       0    1   0.00
## [2,] 0.02880979        4        1       1       1    1   0.25
## [3,] 0.16395407        3        2       1       2    1   0.50
## [4,] 0.31223942        1        4       0       3    0   0.75
## [5,] 0.36256518        0        5       0       4    0   1.00
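
The same table can be built without the loop; a vectorized sketch (equivalent up to how ties at a cutoff are handled):

sens_v <- sapply(cut, function(k) mean(data$probabilities[data$outcome == 1] > k))
spec_v <- sapply(cut, function(k) mean(data$probabilities[data$outcome == 0] <= k))
cbind(cut, sens = sens_v, spec = spec_v)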
par(mfrow=c(1,2))

# What we want:

plot.roc(data$outcome, data$s100b, 
    auc.polygon = TRUE, 
    auc.polygon.col=rgb(.35,0.31,0.61, alpha = 0.4), 
    auc.polygon.border=rgb(.35,0.31,0.61, 0.4))


# what we got will be superimposed as a red line on top of the ROC generated by R:

plot.roc(data$outcome, data$s100b,lwd=15,col="gray75")
rock <- rbind(rep(0, 7), rock) # add a (spec = 0, sens = 0) corner so the polygon closes under the curve
lines(rock[,6] ~ rock[,7], lwd = 2, col = "red")
polygon(rock[,6] ~ rock[,7], col = rgb(.9, 0.1, 0.1, alpha = 0.4))
segments(1, 0, 0, 1) # chance diagonal

Now, re-running the same steps on the entire aSAH dataset (113 patients), we can calculate the AUC manually:

AUC = function(sens, specif){
    # Crude approximation: one equal-width slice per finite threshold, each
    # contributing a rectangle plus a triangle (specif is not actually used;
    # the slice widths are assumed equal):
    partitions <- 1 / length(threshold[2:(length(threshold) - 1)])
    rect <- partitions * sens[2:(length(sens) - 1)]
    triang <- partitions * sens[2:(length(sens) - 1)] / 2
    sum(rect) + sum(triang)
}
AUC(sens,specif)
## [1] 0.7458935
roc(data$outcome, data$s100b, print.auc = T)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = data$outcome, predictor = data$s100b,     print.auc = T)
## 
## Data: data$s100b in 72 controls (data$outcome 0) < 41 cases (data$outcome 1).
## Area under the curve: 0.7314
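
The small discrepancy (0.7459 vs. 0.7314) comes from the equal-width assumption in our helper. A sketch of the standard trapezoidal rule on the actual (1 - specificity, sensitivity) coordinates, which should agree with pROC:

trap_auc <- function(sens, specif) {
  fpr <- 1 - specif
  o <- order(fpr)            # sort the points by increasing FPR
  x <- fpr[o]; y <- sens[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)  # trapezoid areas
}
trap_auc(sens, specif)  # should match pROC's 0.7314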


NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.