Homework 5

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

Table 5.3 displays the results of a study where 700 patients were given access to a new drug treatment.

Using the Results from Table 5.3 create and test a function that:

Calculates the risks R1 and R0. Calculates the Risk Ratio (RR) Calculates 95 percent confidence intervals Uses the fisher.test function to calculate two sided p values and null hypothesis of no association Collects and returns all the results as a list.

#Creation of Table 5.3
nd_table <- matrix(c(273, 289, 562, 77, 61, 138, 350, 350, 700), byrow = T, nrow = 3)
dimnames(nd_table) <- list(Recovered = c('Yes', 'No', 'Total'), Treatment = c('Yes', 'No', 'Total'))

#Proof that Table has been created satisfactorily
nd_table

##          Treatment
## Recovered Yes  No Total
##     Yes   273 289   562
##     No     77  61   138
##     Total 350 350   700

#Creation of Function that allows Dimnames to Function
EpiToolBox <- function(x) {
  #Calculating the risk of R1 and R0
  R1 = x[1 , 1] / x[3, 1]
  R0 = x[2, 1] / x[2, 3]
  
  #Calculating the Risk Ratio
  RR = R1 / R0
  
  #Calulating 95 Percent Confidence Intervals
  se_list <- c(1/x[2,1], 1/x[3,1], 1/x[2,1], 1/x[2,3])
  SE = sqrt(se_list[1] - se_list[2] + se_list[3] - se_list[4])
  CI = c(RR - (1.96*SE), RR + (1.96 *SE))
  
  #Run Fisher's Exact Test to calculate 2 sided p-value
  x_matrix <- matrix(c(x[1,1], x[2,1], x[1,2], x[2,2]), nrow = 2, byrow = T)
  FT <- fisher.test(x_matrix, conf.int = T)
  
  #Compiling Data as a list, and then seting it as output of the function.
  output <- list(R0, R1, RR, CI, FT)
  return(output)
}

EpiToolBox(nd_table)

## [[1]]
## [1] 0.557971
## 
## [[2]]
## [1] 0.78
## 
## [[3]]
## [1] 1.397922
## 
## [[4]]
## [1] 1.151005 1.644839
## 
## [[5]]
## 
##  Fisher's Exact Test for Count Data
## 
## data:  x_matrix
## p-value = 0.154
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.504886 1.106634
## sample estimates:
## odds ratio 
##  0.7486641

5.8.2 Using the read.csv function import the data frame for Table 5.3

Analyze the jp-drugrx-p2.df data using the xtab function, your previous function from 5.8.1 and any R functions that you wish. Is the drug treatment effective? Is there a difference in drug treatment effectiveness comparing men and women?

#Import File and check integrity of import
drugrx.df <- read.csv('jpdrugrxp2.txt', header = T)
head(drugrx.df)

##   X Recovered Drug Gender
## 1 1       Yes  Yes    Men
## 2 2       Yes  Yes    Men
## 3 3       Yes  Yes    Men
## 4 4       Yes  Yes    Men
## 5 5       Yes  Yes    Men
## 6 6       Yes  Yes    Men

#Create Contingency Table Based on Dataset
dr <- xtabs(~Drug + Recovered,  data = drugrx.df)
dr

##      Recovered
## Drug   No Yes
##   No   61 289
##   Yes  77 273

dimnames(dr)

## $Drug
## [1] "No"  "Yes"
## 
## $Recovered
## [1] "No"  "Yes"

#Function Constructing a standard epi contingency table
Con_Table <- function(y) {
  c_table <- matrix(c(y[2,2], y[2,1], y[2,2]+y[2,1], y[1,2],   y[1,1], y[1,2]+y[1,1]), nrow = 3)
  c_table <- cbind(c_table, rowSums(c_table))
  dimnames(c_table) <- list(Recovered = c('Yes', 'No', 'Total'), Treatment = c('Yes', 'No', 'Total'))
  
  return(c_table)
}
dr <- Con_Table(dr)
EpiToolBox(dr)

## [[1]]
## [1] 0.557971
## 
## [[2]]
## [1] 0.78
## 
## [[3]]
## [1] 1.397922
## 
## [[4]]
## [1] 1.151005 1.644839
## 
## [[5]]
## 
##  Fisher's Exact Test for Count Data
## 
## data:  x_matrix
## p-value = 0.154
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.504886 1.106634
## sample estimates:
## odds ratio 
##  0.7486641

#Take Data from Above and Split into M/F categories
dr2 <- xtabs(~Drug + Recovered + Gender, data = drugrx.df)

drmen <-Con_Table(dr2[,,1])
drmen

##          Treatment
## Recovered Yes  No Total
##     Yes    81 234   315
##     No      6  36    42
##     Total  87 270   357

EpiToolBox(drmen)

## [[1]]
## [1] 0.1428571
## 
## [[2]]
## [1] 0.9310345
## 
## [[3]]
## [1] 6.517241
## 
## [[4]]
## [1] 5.447237 7.587246
## 
## [[5]]
## 
##  Fisher's Exact Test for Count Data
## 
## data:  x_matrix
## p-value = 0.1266
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.8238548 6.2439802
## sample estimates:
## odds ratio 
##   2.073293

drwomen <-Con_Table(dr2[,,2])
drwomen

##          Treatment
## Recovered Yes No Total
##     Yes   192 55   247
##     No     71 25    96
##     Total 263 80   343

EpiToolBox(drwomen)

## [[1]]
## [1] 0.7395833
## 
## [[2]]
## [1] 0.730038
## 
## [[3]]
## [1] 0.9870937
## 
## [[4]]
## [1] 0.7555973 1.2185900
## 
## [[5]]
## 
##  Fisher's Exact Test for Count Data
## 
## data:  x_matrix
## p-value = 0.4786
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.6795617 2.1820561
## sample estimates:
## odds ratio 
##   1.228405

Baseed on the data extracted from categorizing treatment receivers by gender it appears that the treatment was much more effective on men than on women. Men who received the treatment were between 5.45 to 7.59 more times likely to recover (95 % CI) versus women hwo received the treatment, where the RR was approximately 1 (.98), showing there was statistically no difference between receiving the treatment or not and recovering.

Fisher exact tests also showed a lower p value for the men versus the women, indicating that there is a greater chance that the odds ratio is not likely to be one in the men’s case vs the women’s case, although the p value is still rather high (.126)

#Intentionally left blank to preserve formatting

Study this clean version of the Oswego data set and import it as a data frame. Follow the commands below.

oswego2 <- read.csv("oswego2.txt", strip.white = T, header = T)

food.item <- c("Baked.Ham", "Spinach", "Mashed.Potato", "Cabbage.Salad", "Jello", "Rolls", "Brown.Bread", "Milk", "Coffee", "Water", "Cakes", "Vanilla.Ice.Cream", "Chocolate.Ice.Cream", "Fruit.Salad")

names(oswego2)[9:22] <- food.item
str(oswego2)

## 'data.frame':    75 obs. of  22 variables:
##  $ ID                 : int  2 3 4 6 7 8 9 10 14 16 ...
##  $ Age                : int  52 65 59 63 70 40 15 33 10 32 ...
##  $ Sex                : Factor w/ 2 levels "F","M": 1 2 1 1 2 1 1 1 2 1 ...
##  $ MealDate           : Factor w/ 1 level "04/18/1940": 1 1 1 1 1 1 1 1 1 1 ...
##  $ MealTime           : Factor w/ 6 levels "11:00:00","18:30:00",..: 5 2 2 4 4 4 6 3 4 NA ...
##  $ Ill                : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
##  $ OnsetDate          : Factor w/ 2 levels "4/18/1940","4/19/1940": 2 2 2 1 1 2 2 1 2 2 ...
##  $ OnsetTime          : Factor w/ 17 levels "00:00:00","00:30:00",..: 2 2 2 15 15 4 3 16 4 7 ...
##  $ Baked.Ham          : Factor w/ 2 levels "N","Y": 2 2 2 2 2 1 1 2 1 2 ...
##  $ Spinach            : Factor w/ 2 levels "N","Y": 2 2 2 2 2 1 1 2 1 2 ...
##  $ Mashed.Potato      : Factor w/ 2 levels "N","Y": 2 2 1 1 2 1 1 2 1 1 ...
##  $ Cabbage.Salad      : Factor w/ 2 levels "N","Y": 1 2 1 2 1 1 1 1 1 1 ...
##  $ Jello              : Factor w/ 2 levels "N","Y": 1 1 1 2 2 1 1 1 1 1 ...
##  $ Rolls              : Factor w/ 2 levels "N","Y": 2 1 1 1 2 1 1 2 1 2 ...
##  $ Brown.Bread        : Factor w/ 2 levels "N","Y": 1 1 1 1 2 1 1 2 1 1 ...
##  $ Milk               : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Coffee             : Factor w/ 2 levels "N","Y": 2 2 2 1 2 1 1 1 1 2 ...
##  $ Water              : Factor w/ 2 levels "N","Y": 1 1 1 2 2 1 1 2 1 1 ...
##  $ Cakes              : Factor w/ 2 levels "N","Y": 1 1 2 1 1 1 2 1 1 2 ...
##  $ Vanilla.Ice.Cream  : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 1 2 2 2 ...
##  $ Chocolate.Ice.Cream: Factor w/ 2 levels "N","Y": 1 2 2 1 1 2 2 2 2 2 ...
##  $ Fruit.Salad        : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...

Study the following epitable function

epitable <- function(exposure, outcome, ...){
  
  x <- xtabs(~ exposure + outcome, na.action = na.pass)
  rowtot <- apply(x, 1, sum)
  rowdist <- sweep(x, 1, rowtot, '/')
  risk0 <- rowdist['N', 'Y']
  risk1 <- rowdist['Y', 'Y']
  rr <- risk1/risk0
  or <- (risk1 / (1 - risk1)) /(risk0 / (1-risk0))
  ft <- fisher.test(x, ...)
  pv <- ft$p.value
  results <- unname(round(c(risk0, risk1, rr, or, pv), 2))
  names(results) <- c('R0', 'R1', 'RR', 'OR', 'p.value')
  list(data = x, results = results)
}

epitable(oswego2$`Baked.Ham`, oswego2$Ill)

## $data
##         outcome
## exposure  N  Y
##        N 12 17
##        Y 17 29
## 
## $results
##      R0      R1      RR      OR p.value 
##    0.59    0.63    1.08    1.20    0.81

Use the Epitable Function to best each food item to see if it associated with developing illness. Construct a table with the following column headings:

Food Item, Ro, R1, RR, OR, Fisher p value and then discuss your findings.

construct.table <- function(z, u) {
  table.row <- epitable(z, u)
  
  return(table.row$results)

}

oswego2.table <- construct.table(oswego2$Baked.Ham, oswego2$Ill)
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Spinach, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Mashed.Potato, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Cabbage.Salad, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Jello, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Rolls, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Brown.Bread, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Milk, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Coffee, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Water, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Cakes, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Vanilla.Ice.Cream, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Chocolate.Ice.Cream, oswego2$Ill))
oswego2.table <- rbind(oswego2.table, construct.table(oswego2$Fruit.Salad, oswego2$Ill))

oswego2.table.df <- data.frame(oswego2.table)
oswego2.table.df <- cbind(oswego2.table.df, food.item) 
oswego2.table.df

##                 R0   R1   RR    OR p.value           food.item
## oswego2.table 0.59 0.63 1.08  1.20    0.81           Baked.Ham
## X             0.62 0.60 0.97  0.92    1.00             Spinach
## X.1           0.62 0.62 1.00  1.00    1.00       Mashed.Potato
## X.2           0.60 0.64 1.08  1.22    0.81       Cabbage.Salad
## X.3           0.58 0.70 1.21  1.68    0.44               Jello
## X.4           0.66 0.57 0.86  0.68    0.48               Rolls
## X.5           0.58 0.67 1.14  1.43    0.62         Brown.Bread
## X.6           0.62 0.50 0.81  0.61    0.64                Milk
## X.7           0.61 0.61 1.00  1.00    1.00              Coffee
## X.8           0.65 0.54 0.84  0.64    0.45               Water
## X.9           0.54 0.68 1.24  1.75    0.34               Cakes
## X.10          0.14 0.80 5.57 23.45    0.00   Vanilla.Ice.Cream
## X.11          0.74 0.53 0.72  0.40    0.09 Chocolate.Ice.Cream
## X.12          0.61 0.67 1.10  1.29    1.00         Fruit.Salad

Based on the results of this table, Vanilla Ice Cream seems to be the culprit of the food bourne illness. The RR and OR are much higher for those that consumed Vanilla Ice ream than for any other category.

5.8.4

Write a for loop to automate the creation of the outbreak table from the previous exercise.

oswego.table.automated <- matrix(ncol = 5)
length(food.item)

## [1] 14

k = 0

for (j in 1:length(food.item)) {
  
  oswego.table.automated <- rbind(oswego.table.automated, construct.table(oswego2[,food.item[1 + k]], oswego2$Ill))
  k <- k + 1
}
oswego.table.automated <- oswego.table.automated[-1,]
rownames(oswego.table.automated) <- food.item   

oswego.table.automated

##                       R0   R1   RR    OR p.value
## Baked.Ham           0.59 0.63 1.08  1.20    0.81
## Spinach             0.62 0.60 0.97  0.92    1.00
## Mashed.Potato       0.62 0.62 1.00  1.00    1.00
## Cabbage.Salad       0.60 0.64 1.08  1.22    0.81
## Jello               0.58 0.70 1.21  1.68    0.44
## Rolls               0.66 0.57 0.86  0.68    0.48
## Brown.Bread         0.58 0.67 1.14  1.43    0.62
## Milk                0.62 0.50 0.81  0.61    0.64
## Coffee              0.61 0.61 1.00  1.00    1.00
## Water               0.65 0.54 0.84  0.64    0.45
## Cakes               0.54 0.68 1.24  1.75    0.34
## Vanilla.Ice.Cream   0.14 0.80 5.57 23.45    0.00
## Chocolate.Ice.Cream 0.74 0.53 0.72  0.40    0.09
## Fruit.Salad         0.61 0.67 1.10  1.29    1.00

5.8.5

Study the following use of lapply and sapply. Explain how both functions work and what are their differences.

lapply(oswego2[, food.item], function(x) epitable(x, oswego2$Ill)$results)

## $Baked.Ham
##      R0      R1      RR      OR p.value 
##    0.59    0.63    1.08    1.20    0.81 
## 
## $Spinach
##      R0      R1      RR      OR p.value 
##    0.62    0.60    0.97    0.92    1.00 
## 
## $Mashed.Potato
##      R0      R1      RR      OR p.value 
##    0.62    0.62    1.00    1.00    1.00 
## 
## $Cabbage.Salad
##      R0      R1      RR      OR p.value 
##    0.60    0.64    1.08    1.22    0.81 
## 
## $Jello
##      R0      R1      RR      OR p.value 
##    0.58    0.70    1.21    1.68    0.44 
## 
## $Rolls
##      R0      R1      RR      OR p.value 
##    0.66    0.57    0.86    0.68    0.48 
## 
## $Brown.Bread
##      R0      R1      RR      OR p.value 
##    0.58    0.67    1.14    1.43    0.62 
## 
## $Milk
##      R0      R1      RR      OR p.value 
##    0.62    0.50    0.81    0.61    0.64 
## 
## $Coffee
##      R0      R1      RR      OR p.value 
##    0.61    0.61    1.00    1.00    1.00 
## 
## $Water
##      R0      R1      RR      OR p.value 
##    0.65    0.54    0.84    0.64    0.45 
## 
## $Cakes
##      R0      R1      RR      OR p.value 
##    0.54    0.68    1.24    1.75    0.34 
## 
## $Vanilla.Ice.Cream
##      R0      R1      RR      OR p.value 
##    0.14    0.80    5.57   23.45    0.00 
## 
## $Chocolate.Ice.Cream
##      R0      R1      RR      OR p.value 
##    0.74    0.53    0.72    0.40    0.09 
## 
## $Fruit.Salad
##      R0      R1      RR      OR p.value 
##    0.61    0.67    1.10    1.29    1.00

t(sapply(oswego2[,food.item], function(x) epitable(x, oswego2$Ill)$results))

##                       R0   R1   RR    OR p.value
## Baked.Ham           0.59 0.63 1.08  1.20    0.81
## Spinach             0.62 0.60 0.97  0.92    1.00
## Mashed.Potato       0.62 0.62 1.00  1.00    1.00
## Cabbage.Salad       0.60 0.64 1.08  1.22    0.81
## Jello               0.58 0.70 1.21  1.68    0.44
## Rolls               0.66 0.57 0.86  0.68    0.48
## Brown.Bread         0.58 0.67 1.14  1.43    0.62
## Milk                0.62 0.50 0.81  0.61    0.64
## Coffee              0.61 0.61 1.00  1.00    1.00
## Water               0.65 0.54 0.84  0.64    0.45
## Cakes               0.54 0.68 1.24  1.75    0.34
## Vanilla.Ice.Cream   0.14 0.80 5.57 23.45    0.00
## Chocolate.Ice.Cream 0.74 0.53 0.72  0.40    0.09
## Fruit.Salad         0.61 0.67 1.10  1.29    1.00

Both of the apply funtions takes each variable in the oswego2$food.item and applies it to function epitable and produces the contents of the $results back.

The L apply function creates a list while the sapply function creates a table. It depends on your goal which one makes more sense to use but in this case it really does seem like the compact format of the sapply makes it the superior choice.

5.8.6

a = 1
b = 2
f <- function(x) {
  a * x + b
}

g <- function(x) {
  a = 2
  b = 1
  f(x)
}
g(2)

## [1] 4

This is a concept in programming known as scope. The global variables, defined at the top of the page a = 1, b = 2 exist as dominant to a and b within the function g. By running g you change a and b to 2 and 1, respectively, but only within the context of function g, not outside of it. So, when g calls f a comes a and b revert to their hierarchercal values of 1 and 2, netting you 4 as your final answer.

5.8.7

Study the following R coe and explain the final answer for g(2)

g <- function(x) {
  a = 2
  b = 1
  f <- function(x) {
    a * x + b
  }
  f(x)
}
g(2)

## [1] 5

This is a matter of scope. Since f exists within function g then the values for a and b changed by g into a = 2, b = 1 are inherited by function f in this case, unlike the previous case. With a = 2 and b = 1, then 2 *2 + 1 = 5, following correct order of operations.

Homework 5

Brian Aldana

October 26, 2018

R Markdown