Use the results from Table 5.3 create and test a function.
Firstly, I remake table 5.3.
tab <- matrix(c(273, 77, 350, 289, 61, 350, 562, 138, 700), nrow = 3, ncol = 3)
dimnames(tab) <- list("Recovered" = c("Yes", "No", "Total"),
"Treatment" = c("Yes", "No", "Total"))
tab
## Treatment
## Recovered Yes No Total
## Yes 273 289 562
## No 77 61 138
## Total 350 350 700
Then I created functions and applied them to table 5.3 to calculate the risks R1 and R0, and the risk ratio R1/R0.
Firstly, I prepared the inputs. Then, I did the calculations. Finally, I collected the data.
#Making the function
myOR <- function(x){
#Prepare input
# x = 3x3 table
a = x[1,1]
N1 = x[3,1]
b = x[1,2]
N0 = x[3,2]
#doing the calculations
R1 = (a/N1)
R0 = (b/N0)
RR = R1/R0
#collecting results
list(data = x, risk1 = R1, risk0 = R0, odds.ratio = RR)
}
#Applying/testing the function
myOR(tab)
## $data
## Treatment
## Recovered Yes No Total
## Yes 273 289 562
## No 77 61 138
## Total 350 350 700
##
## $risk1
## [1] 0.78
##
## $risk0
## [1] 0.8257143
##
## $odds.ratio
## [1] 0.9446367
Now I make a function to calculate the 95% confidence interval.
myOR2 <- function(x, conf.level){
#### Prepare input
#### x = 3x3 table amenable to cross-product
if(missing(conf.level)) stop("Must specify confidence level")
a = x[1,1]
N1 = x[3,1]
b = x[1,2]
N0 = x[3,2]
Z <- qnorm((1 + conf.level)/2)
#### Do calculations
logOR <- log((a*N0)/(N1*b))
SE.logOR <- sqrt(1/a + 1/N1 + 1/b + 1/N0)
OR <- exp(logOR)
CI <- exp(logOR + c(-1, 1)*Z*SE.logOR)
#### Collect results
list(data = x, odds.ratio = OR, conf.int = CI)
}
#Applying/testing the function to get 95% confidence interval
myOR2(tab, 0.95)
## $data
## Treatment
## Recovered Yes No Total
## Yes 273 289 562
## No 77 61 138
## Total 350 350 700
##
## $odds.ratio
## [1] 0.9446367
##
## $conf.int
## [1] 0.7565217 1.1795279
Using the fisher.test function (Fisher’s Exact Test), I calculate the two-sided p-value of the null hypothesis of no association.
fisher.test(tab, y = NULL, workspace = 200000000, hybrid = FALSE, control = list(), or = 1, alternative = "two.sided", conf.int = TRUE, conf.level = 0.95,
simulate.p.value = FALSE, B = 2000)
##
## Fisher's Exact Test for Count Data
##
## data: tab
## p-value = 0.6799
## alternative hypothesis: two.sided
Using the read.csv function import the data frame for Table 5.3. Use the code below.
drugrx.df <- read.csv('~/Desktop/FALL 2017/CLASSES/PH 251D/Course Data/exported/jp-drugrx-p2.txt')
Analyze the jp-drugrx-p2.df data frame using the xtab function, your function from the previous problem, and any R functions you wish. Is the drug treatment effective? Is there a difference in drug treatment effectiveness comparing men and women?
xtabs(drugrx.df)
## , , Gender = Men
##
## Drug
## Recovered No Yes
## No 23670 3393
## Yes 91377 3321
##
## , , Gender = Women
##
## Drug
## Recovered No Yes
## No 17200 42884
## Yes 29425 34080
You don’t know if the drug treatment is effective from using these analyses. This is because we don’t have enough information on how the study was conducted. We need to take into account causality and common sense.
Study this clean version of the Oswego data set and import it as a data frame: ~/git/phds/data/oswego/oswego2.txt
data.frame('~/Desktop/FALL 2017/CLASSES/PH 251D/Course Data/exported/oswego/oswego2.txt')
## X...Desktop.FALL.2017.CLASSES.PH.251D.Course.Data.exported.oswego.oswego2.txt.
## 1 ~/Desktop/FALL 2017/CLASSES/PH 251D/Course Data/exported/oswego/oswego2.txt
Import this clean version.
oswego2 <- read.csv('~/Desktop/FALL 2017/CLASSES/PH 251D/Course Data/oswego/oswego2.txt', strip.white = TRUE)
food.item <- c("Baked.ham", "Spinach", "Mashed.potato", "Cabbage.salad", "Jello",
"Rolls", "Brown.bread", "Milk", "Coffee", "Water", "Cakes", "Vanilla.ice.cream",
"Chocolate.ice.cream", "Fruit.salad")
names(oswego2)[9:22] <- food.item
str(oswego2)
## 'data.frame': 75 obs. of 22 variables:
## $ ID : int 2 3 4 6 7 8 9 10 14 16 ...
## $ Age : int 52 65 59 63 70 40 15 33 10 32 ...
## $ Sex : Factor w/ 2 levels "F","M": 1 2 1 1 2 1 1 1 2 1 ...
## $ MealDate : Factor w/ 1 level "04/18/1940": 1 1 1 1 1 1 1 1 1 1 ...
## $ MealTime : Factor w/ 6 levels "11:00:00","18:30:00",..: 5 2 2 4 4 4 6 3 4 NA ...
## $ Ill : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
## $ OnsetDate : Factor w/ 2 levels "4/18/1940","4/19/1940": 2 2 2 1 1 2 2 1 2 2 ...
## $ OnsetTime : Factor w/ 17 levels "00:00:00","00:30:00",..: 2 2 2 15 15 4 3 16 4 7 ...
## $ Baked.ham : Factor w/ 2 levels "N","Y": 2 2 2 2 2 1 1 2 1 2 ...
## $ Spinach : Factor w/ 2 levels "N","Y": 2 2 2 2 2 1 1 2 1 2 ...
## $ Mashed.potato : Factor w/ 2 levels "N","Y": 2 2 1 1 2 1 1 2 1 1 ...
## $ Cabbage.salad : Factor w/ 2 levels "N","Y": 1 2 1 2 1 1 1 1 1 1 ...
## $ Jello : Factor w/ 2 levels "N","Y": 1 1 1 2 2 1 1 1 1 1 ...
## $ Rolls : Factor w/ 2 levels "N","Y": 2 1 1 1 2 1 1 2 1 2 ...
## $ Brown.bread : Factor w/ 2 levels "N","Y": 1 1 1 1 2 1 1 2 1 1 ...
## $ Milk : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
## $ Coffee : Factor w/ 2 levels "N","Y": 2 2 2 1 2 1 1 1 1 2 ...
## $ Water : Factor w/ 2 levels "N","Y": 1 1 1 2 2 1 1 2 1 1 ...
## $ Cakes : Factor w/ 2 levels "N","Y": 1 1 2 1 1 1 2 1 1 2 ...
## $ Vanilla.ice.cream : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 1 2 2 2 ...
## $ Chocolate.ice.cream: Factor w/ 2 levels "N","Y": 1 2 2 1 1 2 2 2 2 2 ...
## $ Fruit.salad : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
Study the following epitable function.
epitable <- function(exposure, outcome, ...){
#### Updated 2016-10-16
#### 2x2 table from 'xtabs' function
#### row names are exposure status
#### col names are illness outcome
#### ... = options to fisher.test functions
x <- xtabs(~ exposure + outcome, na.action=na.pass)
rowtot <- apply(x, 1, sum)
rowdist <- sweep(x, 1, rowtot, '/')
risk0 <- rowdist['N','Y']
risk1 <- rowdist['Y','Y']
rr <- risk1/risk0
or <- (risk1/(1-risk1)) / (risk0/(1-risk0))
ft <- fisher.test(x, ...)
pv <- ft$p.value
results <- unname(round(c(risk0, risk1, rr, or, pv), 2))
names(results) <- c('R0', 'R1', 'RR', 'OR', 'p.value')
list(data = x, results = results)
}
Here is an example of testing one food item.
epitable(oswego2$Baked.ham, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 12 17
## Y 17 29
##
## $results
## R0 R1 RR OR p.value
## 0.59 0.63 1.08 1.20 0.81
Use the epitable function to test each food item to see if it is associated with developing illness. Construct a table with the following column headings: Food item, R0, R1, RR, OR, Fisher p value.
epitable(oswego2$Spinach, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 12 20
## Y 17 26
##
## $results
## R0 R1 RR OR p.value
## 0.62 0.60 0.97 0.92 1.00
epitable(oswego2$Mashed.potato, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 14 23
## Y 14 23
##
## $results
## R0 R1 RR OR p.value
## 0.62 0.62 1.00 1.00 1.00
epitable(oswego2$Cabbage.salad, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 19 28
## Y 10 18
##
## $results
## R0 R1 RR OR p.value
## 0.60 0.64 1.08 1.22 0.81
epitable(oswego2$Jello, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 22 30
## Y 7 16
##
## $results
## R0 R1 RR OR p.value
## 0.58 0.70 1.21 1.68 0.44
epitable(oswego2$Rolls, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 13 25
## Y 16 21
##
## $results
## R0 R1 RR OR p.value
## 0.66 0.57 0.86 0.68 0.48
epitable(oswego2$Brown.bread, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 20 28
## Y 9 18
##
## $results
## R0 R1 RR OR p.value
## 0.58 0.67 1.14 1.43 0.62
epitable(oswego2$Milk, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 27 44
## Y 2 2
##
## $results
## R0 R1 RR OR p.value
## 0.62 0.50 0.81 0.61 0.64
epitable(oswego2$Coffee, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 17 27
## Y 12 19
##
## $results
## R0 R1 RR OR p.value
## 0.61 0.61 1.00 1.00 1.00
epitable(oswego2$Water, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 18 33
## Y 11 13
##
## $results
## R0 R1 RR OR p.value
## 0.65 0.54 0.84 0.64 0.45
epitable(oswego2$Cakes, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 16 19
## Y 13 27
##
## $results
## R0 R1 RR OR p.value
## 0.54 0.68 1.24 1.75 0.34
epitable(oswego2$Vanilla.ice.cream, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 18 3
## Y 11 43
##
## $results
## R0 R1 RR OR p.value
## 0.14 0.80 5.57 23.45 0.00
epitable(oswego2$Chocolate.ice.cream, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 7 20
## Y 22 25
##
## $results
## R0 R1 RR OR p.value
## 0.74 0.53 0.72 0.40 0.09
epitable(oswego2$Fruit.salad, oswego2$Ill)
## $data
## outcome
## exposure N Y
## N 27 42
## Y 2 4
##
## $results
## R0 R1 RR OR p.value
## 0.61 0.67 1.10 1.29 1.00