Assignment 5

#Exercise 5.1 (Bootstrap a confidence interval)

In the same study above, 350 patients did not take the new drug treatment, and 289 recovered. Calculate the bootstrap confidence interval for this proportion (289/350).

replicates<-5000
num_at_risk<-350
num_of_events<-289
risk <-num_of_events/num_at_risk
sim_events <- rbinom(n = replicates, size = num_at_risk, prob = risk)
conf.level <- 0.95
alpha <- 1 - conf.level
## 0.95 CI for binomial counts
quantile(sim_events, c(alpha/2, 1-alpha/2))

##  2.5% 97.5% 
##   275   303

#Exercise 5.2 (Function for bootstrapping a confidence interval)

Adapt your code from the previous problem to write and test a function for calculating the bootstrap confidence interval for a binomial proportion.

For example, if your previous code was

boots<-function(r,e,t,c){
  sim<-rbinom(n=r, size =t, p=e/t)
  a<-1-c
  quantile(sim, c(a/2, 1-a/2))
}

boots(500, 289, 350, .95)

##    2.5%   97.5% 
## 274.475 303.000

#Exercise 5.3

Using the results from Table 5.4 create and test a function that: a. Calculates the risks R0 and R1 b. Calculates the risk ratio c. Calculates 95% confidence intervals d. Uses the fisher.test function (Fisher’s Exact Test) to calculate the two-sided p-value of the null hypothesis of no association. (Hint: see epitable function in Exercise 5.5) Collects and returns all the results as a ‘list’

epistats<-function(a,b,c,d){
  r0<-a/a+b
  r1<-c/c+d
  rr<-r0/r1
  matr<-cbind(c(a,b),c(c,d))
  ci<-quantile(rr, c(.05/2, .95/2))
  fisher<-fisher.test(x=matr, alternative="two.sided", conf.level = 0.95)
  list(risks = c(r0 = r0, r1 = r1),
       risk.ratio = rr,
       conf.level = ci,
       fisher=fisher)
}

epistats(273, 289, 77, 61)

## $risks
##  r0  r1 
## 290  62 
## 
## $risk.ratio
## [1] 4.677419
## 
## $conf.level
##     2.5%    47.5% 
## 4.677419 4.677419 
## 
## $fisher
## 
##  Fisher's Exact Test for Count Data
## 
## data:  matr
## p-value = 0.154
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.504886 1.106634
## sample estimates:
## odds ratio 
##  0.7486641

#Exercise 5.4 (Evaluating drug treatment) Consider an observational study where 700 patients were given access to a new drug for an ailment. A total of 350 patients chose to take the drug and 350 patients did not. Table 5.4 displays the results of this study. Using the read.csv function import the data frame for Table 5.4. Use the code below.

Analyze the drugrx.df data frame using the xtab function, your function from the previous problem, and any R functions you wish. Is the drug treatment effective? Is there a difference in drug treatment effectiveness comparing men and women?

drugrx.df <- read.csv("https://raw.githubusercontent.com/taragonmd/data/master/exported/jp-drugrx-p2.txt")
str(drugrx.df)

## 'data.frame':    700 obs. of  4 variables:
##  $ X        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Recovered: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Drug     : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Gender   : Factor w/ 2 levels "Men","Women": 1 1 1 1 1 1 1 1 1 1 ...

tab.test <- xtabs(~ Recovered + Drug + Gender, data = drugrx.df)
tab.test

## , , Gender = Men
## 
##          Drug
## Recovered  No Yes
##       No   36   6
##       Yes 234  81
## 
## , , Gender = Women
## 
##          Drug
## Recovered  No Yes
##       No   25  71
##       Yes  55 192

male<-epistats(tab.test[4], tab.test[3], tab.test[2], tab.test[1]); male

## $risks
## r0 r1 
##  7 37 
## 
## $risk.ratio
## [1] 0.1891892
## 
## $conf.level
##      2.5%     47.5% 
## 0.1891892 0.1891892 
## 
## $fisher
## 
##  Fisher's Exact Test for Count Data
## 
## data:  matr
## p-value = 0.1266
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.8238548 6.2439802
## sample estimates:
## odds ratio 
##   2.073293

fem<-epistats(tab.test[8], tab.test[7], tab.test[6], tab.test[5]); fem

## $risks
## r0 r1 
## 72 26 
## 
## $risk.ratio
## [1] 2.769231
## 
## $conf.level
##     2.5%    47.5% 
## 2.769231 2.769231 
## 
## $fisher
## 
##  Fisher's Exact Test for Count Data
## 
## data:  matr
## p-value = 0.4786
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.6795617 2.1820561
## sample estimates:
## odds ratio 
##   1.228405

Since for both men and women the CI include zero, we cannot reject the null hypothesis that the treatment is effective.

#Exercise 5.5 (Oswego outbreak investigation) Study this clean version of the Oswego data set and import it as a data frame: ~/data/oswego/oswego2.txt. Remember this data set is available from https://github.com/taragonmd/data. Locate the data set and select the “Raw” tab. Copy the URL and use with the read.csv function. Import this clean version.

oswego2 <- read.csv("https://raw.githubusercontent.com/taragonmd/data/master/oswego/oswego2.txt", strip.white = TRUE)
food.item <- c("Baked.ham", "Spinach", "Mashed.potato", "Cabbage.salad", 
    "Jello", "Rolls", "Brown.bread", "Milk", "Coffee", "Water", "Cakes", 
    "Vanilla.ice.cream", "Chocolate.ice.cream", "Fruit.salad")
names(oswego2)[9:22] <- food.item
str(oswego2)

## 'data.frame':    75 obs. of  22 variables:
##  $ ID                 : int  2 3 4 6 7 8 9 10 14 16 ...
##  $ Age                : int  52 65 59 63 70 40 15 33 10 32 ...
##  $ Sex                : Factor w/ 2 levels "F","M": 1 2 1 1 2 1 1 1 2 1 ...
##  $ MealDate           : Factor w/ 1 level "04/18/1940": 1 1 1 1 1 1 1 1 1 1 ...
##  $ MealTime           : Factor w/ 6 levels "11:00:00","18:30:00",..: 5 2 2 4 4 4 6 3 4 NA ...
##  $ Ill                : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
##  $ OnsetDate          : Factor w/ 2 levels "4/18/1940","4/19/1940": 2 2 2 1 1 2 2 1 2 2 ...
##  $ OnsetTime          : Factor w/ 17 levels "00:00:00","00:30:00",..: 2 2 2 15 15 4 3 16 4 7 ...
##  $ Baked.ham          : Factor w/ 2 levels "N","Y": 2 2 2 2 2 1 1 2 1 2 ...
##  $ Spinach            : Factor w/ 2 levels "N","Y": 2 2 2 2 2 1 1 2 1 2 ...
##  $ Mashed.potato      : Factor w/ 2 levels "N","Y": 2 2 1 1 2 1 1 2 1 1 ...
##  $ Cabbage.salad      : Factor w/ 2 levels "N","Y": 1 2 1 2 1 1 1 1 1 1 ...
##  $ Jello              : Factor w/ 2 levels "N","Y": 1 1 1 2 2 1 1 1 1 1 ...
##  $ Rolls              : Factor w/ 2 levels "N","Y": 2 1 1 1 2 1 1 2 1 2 ...
##  $ Brown.bread        : Factor w/ 2 levels "N","Y": 1 1 1 1 2 1 1 2 1 1 ...
##  $ Milk               : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Coffee             : Factor w/ 2 levels "N","Y": 2 2 2 1 2 1 1 1 1 2 ...
##  $ Water              : Factor w/ 2 levels "N","Y": 1 1 1 2 2 1 1 2 1 1 ...
##  $ Cakes              : Factor w/ 2 levels "N","Y": 1 1 2 1 1 1 2 1 1 2 ...
##  $ Vanilla.ice.cream  : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 1 2 2 2 ...
##  $ Chocolate.ice.cream: Factor w/ 2 levels "N","Y": 1 2 2 1 1 2 2 2 2 2 ...
##  $ Fruit.salad        : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...

Study the following epitable function, especially the use of the … argument to past options to the fisher.test function. Use the epitable function to test each food item to see if it is associated with developing illness. Construct a table with the following column headings:

Food item R0 R1 Risk ratio Odds ratio Fisher’s p-value

Discuss your findings.

epitable <- function(exposure, outcome, digits = 3, ...){
    # Updated 2016-10-16
    # 2x2 table from 'xtabs' function
    # row names are exposure status
    # col names are illness outcome
    # digits = significant digits (default = 3) 
    # ... = options to fisher.test function
    x <- xtabs(~ exposure + outcome, na.action=na.pass)
    rowtot <- apply(x, 1, sum)
    rowdist <- sweep(x, 1, rowtot, '/')
    risk0 <- rowdist['N','Y']
    risk1 <- rowdist['Y','Y']
    rr <- risk1/risk0
    or <- (risk1/(1-risk1)) / (risk0/(1-risk0))
    ft <- fisher.test(x, ...)
    pv <- ft$p.value
    results <- signif(c(risk0, risk1, rr, or, pv), digits = digits)
    names(results) <- c('R_0', 'R_1', 'RR', 'OR', 'p.value')
    list(data = x, results = results)
}

#Here is an example of testing one food item:

epitable(oswego2$Baked.ham, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 12 17
##        Y 17 29
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.586   0.630   1.080   1.200   0.809

epitable(oswego2$Spinach, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 12 20
##        Y 17 26
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.625   0.605   0.967   0.918   1.000

epitable(oswego2$Mashed.potato, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 14 23
##        Y 14 23
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.622   0.622   1.000   1.000   1.000

epitable(oswego2$Cabbage.salad, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 19 28
##        Y 10 18
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.596   0.643   1.080   1.220   0.808

epitable(oswego2$Jello, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 22 30
##        Y  7 16
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.577   0.696   1.210   1.680   0.442

epitable(oswego2$Rolls, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 13 25
##        Y 16 21
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.658   0.568   0.863   0.682   0.482

epitable(oswego2$Brown.bread, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 20 28
##        Y  9 18
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.583   0.667   1.140   1.430   0.622

epitable(oswego2$Milk, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 27 44
##        Y  2  2
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.620   0.500   0.807   0.614   0.638

epitable(oswego2$Coffee, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 17 27
##        Y 12 19
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.614   0.613   0.999   0.997   1.000

epitable(oswego2$Water, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 18 33
##        Y 11 13
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.647   0.542   0.837   0.645   0.450

epitable(oswego2$Cakes, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 16 19
##        Y 13 27
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.543   0.675   1.240   1.750   0.342

epitable(oswego2$Vanilla.ice.cream, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 18  3
##        Y 11 43
## 
## $results
##      R_0      R_1       RR       OR  p.value 
## 1.43e-01 7.96e-01 5.57e+00 2.35e+01 2.60e-07

epitable(oswego2$Fruit.salad, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 27 42
##        Y  2  4
## 
## $results
##     R_0     R_1      RR      OR p.value 
##   0.609   0.667   1.100   1.290   1.000

myepitab <- function(x) {
    epitable(x, oswego2$Ill)$results
}

t(sapply(oswego2[, food.item], myepitab))

##                       R_0   R_1    RR     OR  p.value
## Baked.ham           0.586 0.630 1.080  1.200 8.09e-01
## Spinach             0.625 0.605 0.967  0.918 1.00e+00
## Mashed.potato       0.622 0.622 1.000  1.000 1.00e+00
## Cabbage.salad       0.596 0.643 1.080  1.220 8.08e-01
## Jello               0.577 0.696 1.210  1.680 4.42e-01
## Rolls               0.658 0.568 0.863  0.682 4.82e-01
## Brown.bread         0.583 0.667 1.140  1.430 6.22e-01
## Milk                0.620 0.500 0.807  0.614 6.38e-01
## Coffee              0.614 0.613 0.999  0.997 1.00e+00
## Water               0.647 0.542 0.837  0.645 4.50e-01
## Cakes               0.543 0.675 1.240  1.750 3.42e-01
## Vanilla.ice.cream   0.143 0.796 5.570 23.500 2.60e-07
## Chocolate.ice.cream 0.741 0.532 0.718  0.398 8.91e-02
## Fruit.salad         0.609 0.667 1.100  1.290 1.00e+00

When we explore each item, we observe that the items associated with higher risks include (according to their RR).

#Exercise 5.6 (Generate outbreak investigation results table)

Write a for loop to automate the creation of the outbreak table from previous exercise. Hint provided:

outbreak <- matrix(NA, length(food.item), 5)
rownames(outbreak) <- food.item
colnames(outbreak) <- c('R_0','R_1','RR','OR','P value')
for(i in 1:length(food.item)) {
  outbreak[i,] <- epitable(oswego2[,food.item[i]], oswego2$Ill)$results
}
outbreak

##                       R_0   R_1    RR     OR  P value
## Baked.ham           0.586 0.630 1.080  1.200 8.09e-01
## Spinach             0.625 0.605 0.967  0.918 1.00e+00
## Mashed.potato       0.622 0.622 1.000  1.000 1.00e+00
## Cabbage.salad       0.596 0.643 1.080  1.220 8.08e-01
## Jello               0.577 0.696 1.210  1.680 4.42e-01
## Rolls               0.658 0.568 0.863  0.682 4.82e-01
## Brown.bread         0.583 0.667 1.140  1.430 6.22e-01
## Milk                0.620 0.500 0.807  0.614 6.38e-01
## Coffee              0.614 0.613 0.999  0.997 1.00e+00
## Water               0.647 0.542 0.837  0.645 4.50e-01
## Cakes               0.543 0.675 1.240  1.750 3.42e-01
## Vanilla.ice.cream   0.143 0.796 5.570 23.500 2.60e-07
## Chocolate.ice.cream 0.741 0.532 0.718  0.398 8.91e-02
## Fruit.salad         0.609 0.667 1.100  1.290 1.00e+00

#Exercise 5.7 (Comparing lapply and sapply)

Study the following use of lapply and sapply. Explain how both functions work, and what are their differences.

myepitab <- function(x) {
    epitable(x, oswego2$Ill)$results
}

lapply(oswego2[, food.item], myepitab)

## $Baked.ham
##     R_0     R_1      RR      OR p.value 
##   0.586   0.630   1.080   1.200   0.809 
## 
## $Spinach
##     R_0     R_1      RR      OR p.value 
##   0.625   0.605   0.967   0.918   1.000 
## 
## $Mashed.potato
##     R_0     R_1      RR      OR p.value 
##   0.622   0.622   1.000   1.000   1.000 
## 
## $Cabbage.salad
##     R_0     R_1      RR      OR p.value 
##   0.596   0.643   1.080   1.220   0.808 
## 
## $Jello
##     R_0     R_1      RR      OR p.value 
##   0.577   0.696   1.210   1.680   0.442 
## 
## $Rolls
##     R_0     R_1      RR      OR p.value 
##   0.658   0.568   0.863   0.682   0.482 
## 
## $Brown.bread
##     R_0     R_1      RR      OR p.value 
##   0.583   0.667   1.140   1.430   0.622 
## 
## $Milk
##     R_0     R_1      RR      OR p.value 
##   0.620   0.500   0.807   0.614   0.638 
## 
## $Coffee
##     R_0     R_1      RR      OR p.value 
##   0.614   0.613   0.999   0.997   1.000 
## 
## $Water
##     R_0     R_1      RR      OR p.value 
##   0.647   0.542   0.837   0.645   0.450 
## 
## $Cakes
##     R_0     R_1      RR      OR p.value 
##   0.543   0.675   1.240   1.750   0.342 
## 
## $Vanilla.ice.cream
##      R_0      R_1       RR       OR  p.value 
## 1.43e-01 7.96e-01 5.57e+00 2.35e+01 2.60e-07 
## 
## $Chocolate.ice.cream
##     R_0     R_1      RR      OR p.value 
##  0.7410  0.5320  0.7180  0.3980  0.0891 
## 
## $Fruit.salad
##     R_0     R_1      RR      OR p.value 
##   0.609   0.667   1.100   1.290   1.000

t(sapply(oswego2[, food.item], myepitab))  # transpose output

##                       R_0   R_1    RR     OR  p.value
## Baked.ham           0.586 0.630 1.080  1.200 8.09e-01
## Spinach             0.625 0.605 0.967  0.918 1.00e+00
## Mashed.potato       0.622 0.622 1.000  1.000 1.00e+00
## Cabbage.salad       0.596 0.643 1.080  1.220 8.08e-01
## Jello               0.577 0.696 1.210  1.680 4.42e-01
## Rolls               0.658 0.568 0.863  0.682 4.82e-01
## Brown.bread         0.583 0.667 1.140  1.430 6.22e-01
## Milk                0.620 0.500 0.807  0.614 6.38e-01
## Coffee              0.614 0.613 0.999  0.997 1.00e+00
## Water               0.647 0.542 0.837  0.645 4.50e-01
## Cakes               0.543 0.675 1.240  1.750 3.42e-01
## Vanilla.ice.cream   0.143 0.796 5.570 23.500 2.60e-07
## Chocolate.ice.cream 0.741 0.532 0.718  0.398 8.91e-02
## Fruit.salad         0.609 0.667 1.100  1.290 1.00e+00

The difference is that lapply does not give only one table, but several small ones and sapply gives one table with altogether all the results.

#Exercise 5.8 (Lexical scoping) Study the following R code and explain the final answer for g(2):

a<-1
b<-2
f<-function(x) {
    a * x + b
}

g <-function(x) {
  a<-2
  b<-1
  f(x)
}

g(2)

## [1] 4

I think it ignores the second values and just executes the first function.

#Exercise 5.9 (More lexical scoping) Study the following R code and explain the final answer for g(2):

g <- function(x) {
    a <- 2
    b <- 1
    f <- function(x) {
        a * x + b
    }
    f(x)
}
g(2)

## [1] 5

Here I think it is using the new values because it is separating the functions and just uses the first arguments given.

Assignment 5

Alma Juarez

11/12/2019