Example

Out of 350 patients that adhered to a new drug treatment, 273 recovered.The response proportion was 273/350 or 0.78 (78%). We can use the rbinom function to simulate this binomial process. Let’s simulate this 10 times (i.e., 10 replicates).

replicates <- 10
num_of_events <- 273
num_at_risk <- 350
risk <- num_of_events/num_at_risk
sim_events <- rbinom(n = replicates, size = num_at_risk, prob = risk)
sim_events
##  [1] 288 271 266 272 262 262 262 277 268 270

Exercise 5.1 (Bootstrap a confidence interval). In the same study above, 350 patients did not take the new drug treatment, and 289 recovered. Calculate the bootstrap confidence interval for this proportion (289/350).

replicates <- 5000
num_of_events <- 289
num_at_risk <- 350
risk <- num_of_events/num_at_risk
sim_events <- rbinom(n = replicates, size = num_at_risk, prob = risk)
conf.level <- 0.95
alpha <- 1 - conf.level
## 0.95 CI for binomial counts
quantile(sim_events, c(alpha/2, 1-alpha/2))
##  2.5% 97.5% 
##   275   303
## 0.95 CI for binomial proportions
quantile(sim_events, c(alpha/2, 1-alpha/2))/num_at_risk
##      2.5%     97.5% 
## 0.7857143 0.8657143

Exercise 5.2 (Function for bootstrapping a confidence interval). Adapt your code from the previous problem to write and test a function for calculating the bootstrap confidence interval for a binomial proportion.

boots <- function(x1, x0, conf.level = 0.95, replicates = 5000){
#### x1 = c(number cases among exposed, number of exposed)
#### x0 = c(number cases among nonexposed, number of nonexposed)
n1 <- x1[2] 
n0 <- x0[2]
p1 <- x1/n1 # risk among exposed
p0 <- x0/n0 # risk among unexposed
risk <- num_of_events/num_at_risk
sim_events <- rbinom(n = replicates, size = num_at_risk, prob = risk)
alpha <- 1 - conf.level
## 0.95 CI for binomial counts
quantile(sim_events, c(alpha/2, 1-alpha/2))
ci<-quantile(sim_events, c(alpha/2, 1-alpha/2))/num_at_risk
list(
conf.int = unname(ci),
conf.level = conf.level,
replicates = replicates)
}
## test
x1<-289
x0<-350
boots(x1, x0, conf.level = 0.95, replicates = 5000)
## $conf.int
## [1] 0.7857143 0.8657143
## 
## $conf.level
## [1] 0.95
## 
## $replicates
## [1] 5000

Exercise 5.3 (Calculating risks and risk ratio). Consider an observational study where 700 patients were given access to a new drug for an ailment. A total of 350 patients chose to take the drug and 350 patients did not. Table 5.4 displays the results of this study.

Using the results from Table 5.4 create and test a function that:

Calculates the risks 𝑅1 and 𝑅0

Calculates the risk ratio (𝑅𝑅)

Calculates 95% confidence intervals

Uses the fisher.test function (Fisher’s Exact Test) to calculate the twosided 𝑝-value of the null hypothesis of no association. (Hint: see epitable function in Exercise 5.5)

Collects and returns all the results as a ‘list’.

Tab5.4 <- matrix(c(273,289,77, 61), 2, 2)
rownames(Tab5.4) <- c('Yes', 'No')
colnames(Tab5.4) <- c('Yes', 'No')
dimnames(Tab5.4)<- list( Treatment= c('Yes','No'), Recovery= c('Yes','No') )
epitable <- function(a, b, c, d, digits = 3, ...){
N1=a+c
N0=b+d
R0 <- b/N0
R1 <- a/N1
RR <- R1/R0
OR <- (R1/(1-R1)) / (R0/(1-R0))
ft <- fisher.test(Tab5.4,simulate.p.value=TRUE,B=2000,...)
pv <- ft$p.value
SE.RR<-sqrt((1/a)-(1/N1)+(1/b)-(1/N0))
conf.int<-c(RR-1.96*SE.RR, RR+1.96*SE.RR)
results <- signif(c(R0, R1, RR, OR, pv, conf.int), digits = digits)
names(results) <- c('R_0', 'R_1', 'RR', 'OR', 'p.value', '95 lower ci', '95 upper ci')
list(data = Tab5.4, results = results)
}
epitable(273,289, 77, 61)
## $data
##          Recovery
## Treatment Yes No
##       Yes 273 77
##       No  289 61
## 
## $results
##         R_0         R_1          RR          OR     p.value 95 lower ci 
##       0.826       0.780       0.945       0.748       0.154       0.871 
## 95 upper ci 
##       1.020
fisher.test(Tab5.4)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  Tab5.4
## p-value = 0.154
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.504886 1.106634
## sample estimates:
## odds ratio 
##  0.7486641

Exercise 5.4 (Evaluating drug treatment). Consider an observational study where 700 patients were given access to a new drug for an ailment. A total of 350 patients chose to take the drug and 350 patients did not. Table 5.4 displays the results of this study.Using the read.csv function import the data frame for Table 5.4. Use the code below.

Analyze the drugrx.df data frame using the xtab function, your function from the previous problem, and any R functions you wish. Is the drug treatment effective? Is there a difference in drug treatment effectiveness comparing men and women?

drugrx.df<- read.csv("/Users/dr.auwal/Desktop/Personals/UC Berkeley/Classes/PB HLTH 251D Applied Epidemiology Using R/Practice/data/drugrx-pearl2.csv")
xTab5.4<- xtabs(~ Drug + Recovered, data = drugrx.df)
fisher.test(xTab5.4)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  xTab5.4
## p-value = 0.154
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.504886 1.106634
## sample estimates:
## odds ratio 
##  0.7486641

Statistically, it seems the drug is not effective. Since the drug is not effective, its effectiveness comparing both men and women will not be appreciated

Exercise 5.5 (Oswego outbreak investigation). Study this clean version of the Oswego data set and import it as a data frame: ~/data/oswego/oswego2.txt. Remember this data set is available from https://github.com/taragonmd/data. Locate the data set and select the “Raw” tab. Copy the URL and use with the read.csv function. Import this clean version.

oswego2 <- read.csv("https://raw.githubusercontent.com/taragonmd/data/master/oswego/oswego2.txt", strip.white = TRUE)
food.item <- c("Baked.ham", "Spinach", "Mashed.potato", "Cabbage.salad",
"Jello", "Rolls", "Brown.bread", "Milk", "Coffee", "Water", "Cakes",
"Vanilla.ice.cream", "Chocolate.ice.cream", "Fruit.salad")
names(oswego2)[9:22] <- food.item
str(oswego2)
## 'data.frame':    75 obs. of  22 variables:
##  $ ID                 : int  2 3 4 6 7 8 9 10 14 16 ...
##  $ Age                : int  52 65 59 63 70 40 15 33 10 32 ...
##  $ Sex                : Factor w/ 2 levels "F","M": 1 2 1 1 2 1 1 1 2 1 ...
##  $ MealDate           : Factor w/ 1 level "04/18/1940": 1 1 1 1 1 1 1 1 1 1 ...
##  $ MealTime           : Factor w/ 6 levels "11:00:00","18:30:00",..: 5 2 2 4 4 4 6 3 4 NA ...
##  $ Ill                : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
##  $ OnsetDate          : Factor w/ 2 levels "4/18/1940","4/19/1940": 2 2 2 1 1 2 2 1 2 2 ...
##  $ OnsetTime          : Factor w/ 17 levels "00:00:00","00:30:00",..: 2 2 2 15 15 4 3 16 4 7 ...
##  $ Baked.ham          : Factor w/ 2 levels "N","Y": 2 2 2 2 2 1 1 2 1 2 ...
##  $ Spinach            : Factor w/ 2 levels "N","Y": 2 2 2 2 2 1 1 2 1 2 ...
##  $ Mashed.potato      : Factor w/ 2 levels "N","Y": 2 2 1 1 2 1 1 2 1 1 ...
##  $ Cabbage.salad      : Factor w/ 2 levels "N","Y": 1 2 1 2 1 1 1 1 1 1 ...
##  $ Jello              : Factor w/ 2 levels "N","Y": 1 1 1 2 2 1 1 1 1 1 ...
##  $ Rolls              : Factor w/ 2 levels "N","Y": 2 1 1 1 2 1 1 2 1 2 ...
##  $ Brown.bread        : Factor w/ 2 levels "N","Y": 1 1 1 1 2 1 1 2 1 1 ...
##  $ Milk               : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Coffee             : Factor w/ 2 levels "N","Y": 2 2 2 1 2 1 1 1 1 2 ...
##  $ Water              : Factor w/ 2 levels "N","Y": 1 1 1 2 2 1 1 2 1 1 ...
##  $ Cakes              : Factor w/ 2 levels "N","Y": 1 1 2 1 1 1 2 1 1 2 ...
##  $ Vanilla.ice.cream  : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 1 2 2 2 ...
##  $ Chocolate.ice.cream: Factor w/ 2 levels "N","Y": 1 2 2 1 1 2 2 2 2 2 ...
##  $ Fruit.salad        : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...

Study the following epitable function, especially the use of the … argument to past options to the fisher.test function.

epitable <- function(exposure, outcome, digits = 3, ...){
# 2x2 table from 'xtabs' function
# row names are exposure status
# col names are illness outcome
# digits = significant digits (default = 3)
# ... = options to fisher.test function
x <- xtabs(~ exposure + outcome, na.action=na.pass)
rowtot <- apply(x, 1, sum)
rowdist <- sweep(x, 1, rowtot, '/')
risk0 <- rowdist['N','Y']
risk1 <- rowdist['Y','Y']
rr <- risk1/risk0
or <- (risk1/(1-risk1)) / (risk0/(1-risk0))
ft <- fisher.test(x, ...)
pv <- ft$p.value
results <- signif(c(risk0, risk1, rr, or, pv), digits = digits)
names(results) <- c('R_0', 'R_1', 'RR', 'OR', 'p.value')
list(data = x, results = results)
}

Use the epitable function to test each food item to see if it is associated with developing illness. Construct a table with the following column headings

epitable(oswego2$Baked.ham, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 12 17
##        Y 17 29
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.586   0.630   1.080   1.200   0.809
epitable(oswego2$Spinach, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 12 20
##        Y 17 26
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.625   0.605   0.967   0.918   1.000
epitable(oswego2$Mashed.potato, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 14 23
##        Y 14 23
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.622   0.622   1.000   1.000   1.000
epitable(oswego2$Cabbage.salad, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 19 28
##        Y 10 18
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.596   0.643   1.080   1.220   0.808
epitable(oswego2$Jello, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 22 30
##        Y  7 16
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.577   0.696   1.210   1.680   0.442
epitable(oswego2$Rolls, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 13 25
##        Y 16 21
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.658   0.568   0.863   0.682   0.482
epitable(oswego2$Brown.bread, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 20 28
##        Y  9 18
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.583   0.667   1.140   1.430   0.622
epitable(oswego2$Milk, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 27 44
##        Y  2  2
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.620   0.500   0.807   0.614   0.638
epitable(oswego2$Coffee, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 17 27
##        Y 12 19
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.614   0.613   0.999   0.997   1.000
epitable(oswego2$Water, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 18 33
##        Y 11 13
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.647   0.542   0.837   0.645   0.450
epitable(oswego2$Cakes, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 16 19
##        Y 13 27
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.543   0.675   1.240   1.750   0.342
epitable(oswego2$Vanilla.ice.cream, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 18  3
##        Y 11 43
## 
## $results
##      R_0      R_1       RR       OR  p.value 
## 1.43e-01 7.96e-01 5.57e+00 2.35e+01 2.60e-07
epitable(oswego2$Chocolate.ice.cream, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N  7 20
##        Y 22 25
## 
## $results
##     R_0     R_1      RR      OR p.value 
##  0.7410  0.5320  0.7180  0.3980  0.0891
epitable(oswego2$Fruit.salad, oswego2$Ill)
## $data
##         outcome
## exposure  N  Y
##        N 27 42
##        Y  2  4
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.609   0.667   1.100   1.290   1.000

The Table

Table <- function(x) {
epitable(x, oswego2$Ill)$results
}
t(sapply(oswego2[, food.item], Table))
##                       R_0   R_1    RR     OR  p.value
## Baked.ham           0.586 0.630 1.080  1.200 8.09e-01
## Spinach             0.625 0.605 0.967  0.918 1.00e+00
## Mashed.potato       0.622 0.622 1.000  1.000 1.00e+00
## Cabbage.salad       0.596 0.643 1.080  1.220 8.08e-01
## Jello               0.577 0.696 1.210  1.680 4.42e-01
## Rolls               0.658 0.568 0.863  0.682 4.82e-01
## Brown.bread         0.583 0.667 1.140  1.430 6.22e-01
## Milk                0.620 0.500 0.807  0.614 6.38e-01
## Coffee              0.614 0.613 0.999  0.997 1.00e+00
## Water               0.647 0.542 0.837  0.645 4.50e-01
## Cakes               0.543 0.675 1.240  1.750 3.42e-01
## Vanilla.ice.cream   0.143 0.796 5.570 23.500 2.60e-07
## Chocolate.ice.cream 0.741 0.532 0.718  0.398 8.91e-02
## Fruit.salad         0.609 0.667 1.100  1.290 1.00e+00

From the list of results, only Vanilla ice cream’s p-value is significant. Therefore Vanilla ice cream may be associated with illness

Exercise 5.6 (Generate outbreak investigation results table). Write a for loop to automate the creation of the outbreak table from previous exercise. Hint provided:

outbreak <- matrix(NA, length(food.item), 5)
rownames(outbreak) <- food.item
colnames(outbreak) <- c('R_0','R_1','RR','OR','P value')
for(i in 1:length(food.item)) {
outbreak[i,] <- epitable(oswego2[,food.item[i]], oswego2$Ill)$results
}
outbreak
##                       R_0   R_1    RR     OR  P value
## Baked.ham           0.586 0.630 1.080  1.200 8.09e-01
## Spinach             0.625 0.605 0.967  0.918 1.00e+00
## Mashed.potato       0.622 0.622 1.000  1.000 1.00e+00
## Cabbage.salad       0.596 0.643 1.080  1.220 8.08e-01
## Jello               0.577 0.696 1.210  1.680 4.42e-01
## Rolls               0.658 0.568 0.863  0.682 4.82e-01
## Brown.bread         0.583 0.667 1.140  1.430 6.22e-01
## Milk                0.620 0.500 0.807  0.614 6.38e-01
## Coffee              0.614 0.613 0.999  0.997 1.00e+00
## Water               0.647 0.542 0.837  0.645 4.50e-01
## Cakes               0.543 0.675 1.240  1.750 3.42e-01
## Vanilla.ice.cream   0.143 0.796 5.570 23.500 2.60e-07
## Chocolate.ice.cream 0.741 0.532 0.718  0.398 8.91e-02
## Fruit.salad         0.609 0.667 1.100  1.290 1.00e+00

Exercise 5.7 (Comparing lapply and sapply). Study the following use of lapply and sapply. Explain how both functions work, and what are their differences.

myepitab <- function(x) {
epitable(x, oswego2$Ill)$results
}
lapply(oswego2[, food.item], myepitab)
## $Baked.ham
##     R_0     R_1      RR      OR p.value 
##   0.586   0.630   1.080   1.200   0.809 
## 
## $Spinach
##     R_0     R_1      RR      OR p.value 
##   0.625   0.605   0.967   0.918   1.000 
## 
## $Mashed.potato
##     R_0     R_1      RR      OR p.value 
##   0.622   0.622   1.000   1.000   1.000 
## 
## $Cabbage.salad
##     R_0     R_1      RR      OR p.value 
##   0.596   0.643   1.080   1.220   0.808 
## 
## $Jello
##     R_0     R_1      RR      OR p.value 
##   0.577   0.696   1.210   1.680   0.442 
## 
## $Rolls
##     R_0     R_1      RR      OR p.value 
##   0.658   0.568   0.863   0.682   0.482 
## 
## $Brown.bread
##     R_0     R_1      RR      OR p.value 
##   0.583   0.667   1.140   1.430   0.622 
## 
## $Milk
##     R_0     R_1      RR      OR p.value 
##   0.620   0.500   0.807   0.614   0.638 
## 
## $Coffee
##     R_0     R_1      RR      OR p.value 
##   0.614   0.613   0.999   0.997   1.000 
## 
## $Water
##     R_0     R_1      RR      OR p.value 
##   0.647   0.542   0.837   0.645   0.450 
## 
## $Cakes
##     R_0     R_1      RR      OR p.value 
##   0.543   0.675   1.240   1.750   0.342 
## 
## $Vanilla.ice.cream
##      R_0      R_1       RR       OR  p.value 
## 1.43e-01 7.96e-01 5.57e+00 2.35e+01 2.60e-07 
## 
## $Chocolate.ice.cream
##     R_0     R_1      RR      OR p.value 
##  0.7410  0.5320  0.7180  0.3980  0.0891 
## 
## $Fruit.salad
##     R_0     R_1      RR      OR p.value 
##   0.609   0.667   1.100   1.290   1.000
t(sapply(oswego2[, food.item], myepitab)) # transpose output
##                       R_0   R_1    RR     OR  p.value
## Baked.ham           0.586 0.630 1.080  1.200 8.09e-01
## Spinach             0.625 0.605 0.967  0.918 1.00e+00
## Mashed.potato       0.622 0.622 1.000  1.000 1.00e+00
## Cabbage.salad       0.596 0.643 1.080  1.220 8.08e-01
## Jello               0.577 0.696 1.210  1.680 4.42e-01
## Rolls               0.658 0.568 0.863  0.682 4.82e-01
## Brown.bread         0.583 0.667 1.140  1.430 6.22e-01
## Milk                0.620 0.500 0.807  0.614 6.38e-01
## Coffee              0.614 0.613 0.999  0.997 1.00e+00
## Water               0.647 0.542 0.837  0.645 4.50e-01
## Cakes               0.543 0.675 1.240  1.750 3.42e-01
## Vanilla.ice.cream   0.143 0.796 5.570 23.500 2.60e-07
## Chocolate.ice.cream 0.741 0.532 0.718  0.398 8.91e-02
## Fruit.salad         0.609 0.667 1.100  1.290 1.00e+00

Both lapply and sapply are used to apply function to each component (bin) of a list. They are identical except that sapply “simplifies” the final result and lapply returns list

Exercise 5.8 (Lexical scoping). Study the following R code and explain the final answer for g(2):

a <- 1
b <- 2
f <- function(x) {
a * x + b
}
g <- function(x) {
a <- 2
b <- 1
f(x)
}
g(2)
## [1] 4

The values executed by g(x) is predefined by the function of ‘f’ and R executes the function defined before ‘f’. Although the functions are defined before ‘g’, R doesn’t consider the position relative to ‘g’. Since a <- 1, b <- 2 was defined before ‘f’, hence R works with this code. Therefore a * x + b= 1*2+2=4.

Exercise 5.9 (More lexical scoping). Study the following R code and explain the final answer for g(2):

g <- function(x) {
a <- 2
b <- 1
f <- function(x) {
a * x + b
}
f(x)
}
g(2)
## [1] 5