Odds Ratio: This is perhaps the most commonly used measure of association. It is the ratio of the odds of an outcome in one group to the odds of the outcome in another group. The odds ratio also appears in many log-linear and logistic models.
Relative Risk: Relative risk, or the risk ratio, is the ratio of the probability of an outcome in an exposed group to the probability of the outcome in an unexposed group. It is most useful for comparing small probabilities.
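As a minimal sketch of the two definitions above, the helper below computes both measures from a 2x2 table of counts. The function name `rr_and_or` and the toy counts are illustrative assumptions, not part of the original analysis; the rows are assumed to be exposed/unexposed and the columns outcome yes/no.

```r
# Hypothetical helper: relative risk and odds ratio from a 2x2 count table.
# Assumed layout: rows = exposed, unexposed; columns = outcome yes, outcome no.
rr_and_or <- function(tab) {
  stopifnot(all(dim(tab) == c(2, 2)))
  risk <- tab[, 1] / rowSums(tab)   # P(outcome | row)
  odds <- tab[, 1] / tab[, 2]       # odds of the outcome within each row
  c(RR = unname(risk[1] / risk[2]),
    OR = unname(odds[1] / odds[2]))
}

# Toy counts (made up): 10 of 100 exposed and 5 of 100 unexposed have the outcome.
toy <- matrix(c(10, 5, 90, 95), nrow = 2,
              dimnames = list(Exposed = c("Yes", "No"),
                              Outcome = c("Yes", "No")))
measures <- rr_and_or(toy)
print(measures)
```

When the outcome is rare, the two values are close, which is why the odds ratio is often read as an approximation to the relative risk.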
Data:
Data 1 Tea data set: Tea and its constituents have shown anticarcinogenic activities in in vitro and animal studies. Epidemiologic studies, however, have been inconsistent. We prospectively evaluated the association between green tea consumption and colorectal cancer (CRC) risk among 6971 Chinese women aged 40 to 70 years. Information on tea consumption was assessed through in-person interviews at baseline. During 6 years of follow-up, 26 incident cases of CRC were identified, including 5 cases of CRC among 2063 regular green tea drinkers and 21 cases among 4908 non-regular green tea drinkers.
Data 2 Drug data set: The use of the asthma drug fenoterol, which is self-administered via an inhaler, has been linked to asthma deaths. The following is a slightly edited abstract of a paper originally published in The Lancet (1989), describing a study in New Zealand. A study was conducted to examine the hypothesis that fenoterol by inhaler increases the risk of death in patients with asthma. Records were obtained for a group of 117 patients aged 5 to 45 who died of asthma between August 1981 and July 1983. For each asthma death, four additional records, matched for age and ethnic group, were selected from asthma admissions to hospitals to which the deceased asthmatics would have been admitted, had they survived. Among the 117 asthma deaths, 60 had been prescribed fenoterol, while among the 468 other asthma admissions, 189 had been prescribed fenoterol.
Data 3 In Germany, children are subjected to an obligatory health examination before school entry. These examination data were used in a 1997 study of 9205 children aged 5 and 6 who had German nationality. At the examination, the parents of the children were asked to complete a questionnaire about risk factors for diseases. Included on the questionnaire were questions on parental education and on breast feeding. Data collected by this questionnaire were then linked with the data from the school health examination. The children’s height and weight were measured as part of the routine examination, and body mass index (BMI) was calculated as weight in kg divided by the square of height in m (kg/\(m^2\)).
Children were then classified into one of three categories: not obese (BMI <= 90th percentile); sub-obese (90th percentile < BMI <= 97th percentile); or obese (BMI > 97th percentile).
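The BMI formula and the three-way classification can be sketched as follows. All heights, weights, and the two percentile cutoffs (`p90`, `p97`) below are made-up values for illustration only; the study's actual reference percentiles are not given in the description above.

```r
# Illustrative values only: these measurements and cutoffs are NOT from the study.
weight_kg <- c(18.5, 23.5, 27.5, 20.0)
height_m  <- c(1.12, 1.15, 1.18, 1.20)
bmi <- weight_kg / height_m^2          # BMI = weight (kg) / height (m)^2

# Hypothetical 90th and 97th percentile BMI cutoffs for this age group.
p90 <- 17.0
p97 <- 18.5

# right = TRUE closes each interval on the right, matching the "<=" rules.
category <- cut(bmi,
                breaks = c(-Inf, p90, p97, Inf),
                labels = c("Not obese", "Sub-obese", "Obese"),
                right = TRUE)
data.frame(weight_kg, height_m, bmi = round(bmi, 2), category)
```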
Objective:
The goal of this study is to find conditional probabilities and to compute and interpret the odds ratio and the relative risk.
Load Libraries
library(gmodels)
library(RColorBrewer)
# Preparing the workspace
# Clear the workspace:
rm(list = ls())
Creating a Contingency Table for Data 1
# Using matrix function to create 2x2 contingency table
df1<-matrix(c(5,21,2058,4887),2,2)
dimnames(df1)= list(Regular_Tea_Drinker=c('Yes','No'), Colorectal_Cancer=c('Yes', 'No'))
df1
##                     Colorectal_Cancer
## Regular_Tea_Drinker Yes   No
##   Yes                 5 2058
##   No                 21 4887
#Converting into a table
df1 <- as.table(df1)
str(df1)
## 'table' num [1:2, 1:2] 5 21 2058 4887
## - attr(*, "dimnames")=List of 2
## ..$ Regular_Tea_Drinker: chr [1:2] "Yes" "No"
## ..$ Colorectal_Cancer : chr [1:2] "Yes" "No"
1a. What kind of study is this? Explain briefly.
This is a prospective cohort study in which women self-selected as regular green tea drinkers or not. A weakness of the study is this self-selection, since many factors other than tea drinking might explain any differences between the groups.
1b. Estimate an appropriate parameter and interpret your point estimate, noting any appropriate cautions due to the study design. (You do not need to estimate a standard error for purposes of this problem.)
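# Risk of CRC among regular tea drinkers: 5 / 2063
p_tea <- df1["Yes", "Yes"] / rowSums(df1)["Yes"]
p_tea
## Yes
## 0.002423655
# Risk of CRC among non-regular tea drinkers: 21 / 4908
p_no_tea <- df1["No", "Yes"] / rowSums(df1)["No"]
p_no_tea
## No
## 0.004278729
# Relative risk: regular tea drinkers versus non-regular tea drinkers
RR <- p_tea / p_no_tea
RR
## Yes
## 0.5664428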
Relative risk is the most appropriate parameter: the risks are quite low, so a difference of proportions is less informative, and the cohort design allows the relative risk, which is of most interest, to be estimated directly.
The estimated relative risk is 0.57, meaning that regular green tea drinkers have a little over half the risk of developing colorectal cancer that non-regular drinkers have. The caution is that, because of self-selection, many factors other than tea drinking could explain the difference in risk between the groups. If this result were in fact statistically significant, it would indicate that green tea drinking might be protective.
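The problem does not require a standard error, but a rough check of the "if significant" condition can be sketched with a large-sample (Wald) interval on the log relative risk. This is an approximation that is shaky with only 5 cases among tea drinkers, and the variable names are introduced here for illustration.

```r
# Counts from the tea data: CRC cases and group sizes.
cases_tea    <- 5;  n_tea    <- 2063
cases_no_tea <- 21; n_no_tea <- 4908

rr_tea <- (cases_tea / n_tea) / (cases_no_tea / n_no_tea)

# Large-sample standard error of log(RR); an approximation, not an exact method.
se_log_rr <- sqrt(1 / cases_tea - 1 / n_tea + 1 / cases_no_tea - 1 / n_no_tea)

# Approximate 95% Wald interval, back-transformed to the relative risk scale.
ci_rr <- exp(log(rr_tea) + c(-1, 1) * qnorm(0.975) * se_log_rr)
round(c(RR = rr_tea, lower = ci_rr[1], upper = ci_rr[2]), 3)
```

Under this approximation the interval includes 1, so these data alone do not rule out the absence of an association.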
Creating a Contingency Table for Data 2
# Using matrix function to create 2x2 contingency table
df2 <- matrix(c(60,57,189,279),2,2)
dimnames(df2) <- list(fenoterol_prescription =c('Yes', 'No'),
Asthma_Death=c('Yes', 'No') )
df2
##                        Asthma_Death
## fenoterol_prescription Yes  No
##   Yes                   60 189
##   No                    57 279
#Converting into a table
df2 <- as.table(df2)
str(df2)
## 'table' num [1:2, 1:2] 60 57 189 279
## - attr(*, "dimnames")=List of 2
## ..$ fenoterol_prescription: chr [1:2] "Yes" "No"
## ..$ Asthma_Death : chr [1:2] "Yes" "No"
2.a What kind of study is this? Explain briefly.
This is a case-control study with 117 asthma deaths as cases and four matched controls per case.
2.b Estimate and interpret the most relevant parameter of this study.
OR <- (60*279)/(57*189)
OR
## [1] 1.553885
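Since this is a case-control study, the numbers of cases and controls were fixed by the design, so the risk of death in each exposure group cannot be estimated directly and the odds ratio is the appropriate measure. The estimated odds ratio is 1.55, which is considerably larger than one: the odds of asthma death are about 55% higher among patients who had been prescribed fenoterol. Assuming this difference is statistically significant, we would reject the hypothesis of independence and conclude that fenoterol prescription is associated with increased odds of asthma death, consistent with the hypothesis that fenoterol increases the risk.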
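The reason the odds ratio works in a case-control design can be sketched numerically: the odds ratio computed from the odds of exposure (which the design does estimate) equals the one computed from the odds of death. The sketch also adds an approximate Wald interval; it ignores the matching of controls to cases, so it is only a rough guide. Variable names are illustrative.

```r
# Counts from the fenoterol data.
exposed_cases   <- 60; exposed_controls   <- 189   # prescribed fenoterol
unexposed_cases <- 57; unexposed_controls <- 279   # not prescribed

# Ratio of the odds of death, comparing prescribed with not prescribed.
or_by_exposure <- (exposed_cases / exposed_controls) /
                  (unexposed_cases / unexposed_controls)

# Ratio of the odds of exposure, comparing cases with controls.
or_by_outcome <- (exposed_cases / unexposed_cases) /
                 (exposed_controls / unexposed_controls)

# Both reduce to the same cross-product ratio.
all.equal(or_by_exposure, or_by_outcome)

# Approximate 95% Wald interval on the log odds ratio scale (matching ignored).
se_log_or <- sqrt(1 / exposed_cases + 1 / exposed_controls +
                  1 / unexposed_cases + 1 / unexposed_controls)
ci_or <- exp(log(or_by_outcome) + c(-1, 1) * qnorm(0.975) * se_log_or)
round(c(OR = or_by_outcome, lower = ci_or[1], upper = ci_or[2]), 3)
```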
Creating a Contingency Table for Data 3
# Using the array function to create a 2 x 4 x 2 contingency table
# (stored as df3 so that the Data 2 table in df2 is not overwritten)
df3 <- array(c(1889, 1380, 197, 210, 60, 75, 2146, 1665,
               2673, 1954, 279, 297, 85, 106, 3037, 2357),
             dim = c(2, 4, 2),
             dimnames = list(Breast_Feeding = c("Ever", "Never"),
                             Categories = c("Not Obese", "Sub-Obese", "Obese", "Totals"),
                             Parental_Education = c("High", "Low")))
df3
## , , Parental_Education = High
##
##                Categories
## Breast_Feeding Not Obese Sub-Obese Obese Totals
##   Ever              1889       197    60   2146
##   Never             1380       210    75   1665
##
## , , Parental_Education = Low
##
##                Categories
## Breast_Feeding Not Obese Sub-Obese Obese Totals
##   Ever              2673       279    85   3037
##   Never             1954       297   106   2357
# Converting into a table
df3 <- as.table(df3)
str(df3)
## 'table' num [1:2, 1:4, 1:2] 1889 1380 197 210 60 ...
## - attr(*, "dimnames")=List of 3
## ..$ Breast_Feeding    : chr [1:2] "Ever" "Never"
## ..$ Categories        : chr [1:4] "Not Obese" "Sub-Obese" "Obese" "Totals"
## ..$ Parental_Education: chr [1:2] "High" "Low"
3.a What kind of study is this? Explain briefly.
This is a cross-sectional study, with all children selected at one point in time and then cross-classified on weight, breastfeeding, and parental education.
3.b The data are repeated here for convenience. Use the table to estimate the following quantities:
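# P(Obese): all obese children over all 9205 children
prob1 <- (60 + 75 + 85 + 106) / (2146 + 1665 + 3037 + 2357)
prob1
## [1] 0.03541554
# P(Obese and High parental education)
prob2 <- (60 + 75) / (2146 + 1665 + 3037 + 2357)
prob2
## [1] 0.01466594
# P(Obese | High parental education)
prob3 <- (60 + 75) / (2146 + 1665)
prob3
## [1] 0.03542377
# P(Obese | Ever breastfed)
prob4 <- (60 + 85) / (2146 + 3037)
prob4
## [1] 0.02797608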
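The same four quantities can also be read off the three-way table by indexing, which avoids retyping counts. The sketch below rebuilds the table without the Totals column so that R computes the margins; the object name `obesity` and the `p_*` names are introduced here for illustration.

```r
# Counts from the Data 3 table, without "Totals" so margins are computed by R.
obesity <- array(c(1889, 1380, 197, 210, 60, 75,
                   2673, 1954, 279, 297, 85, 106),
                 dim = c(2, 3, 2),
                 dimnames = list(Breast_Feeding = c("Ever", "Never"),
                                 Category = c("Not Obese", "Sub-Obese", "Obese"),
                                 Parental_Education = c("High", "Low")))
n_total <- sum(obesity)

# Marginal: P(Obese)
p_obese <- sum(obesity[, "Obese", ]) / n_total
# Joint: P(Obese and High education)
p_obese_and_high <- sum(obesity[, "Obese", "High"]) / n_total
# Conditional: P(Obese | High education)
p_obese_given_high <- sum(obesity[, "Obese", "High"]) / sum(obesity[, , "High"])
# Conditional: P(Obese | Ever breastfed)
p_obese_given_ever <- sum(obesity["Ever", "Obese", ]) / sum(obesity["Ever", , ])

round(c(p_obese, p_obese_and_high, p_obese_given_high, p_obese_given_ever), 5)
```

A conditional probability differs from the joint one only in its denominator: the count is divided by the size of the conditioning group instead of the whole sample.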
3.c Consider collapsing Sub-Obese and Obese into a single category, Overweight. Given that a child has parents with low education, estimate the conditional odds ratio for Overweight between children who were ever breastfed versus those who were never breastfed. Interpret your answer.
Creating a New Contingency Table for Data 3
# Using the array function to create a 2 x 3 x 2 contingency table
# (Overweight = Sub-Obese + Obese)
df3_overweight <- array(c(257, 285, 1889, 1380, 2146, 1665,
                          364, 403, 2673, 1954, 3037, 2357),
                        dim = c(2, 3, 2),
                        dimnames = list(Breast_Feeding = c("Ever", "Never"),
                                        Categories = c("Overweight", "Not Obese", "Totals"),
                                        Parental_Education = c("High", "Low")))
df3_overweight
## , , Parental_Education = High
##
##                Categories
## Breast_Feeding Overweight Not Obese Totals
##   Ever                257      1889   2146
##   Never               285      1380   1665
##
## , , Parental_Education = Low
##
##                Categories
## Breast_Feeding Overweight Not Obese Totals
##   Ever                364      2673   3037
##   Never               403      1954   2357
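# Conditional odds ratio (low parental education): ever versus never breastfed
OR <- (364 * 1954) / (403 * 2673)
OR
## [1] 0.6602706
Among children whose parents have low education, the estimated odds of being overweight for children who were ever breastfed are about 0.66 times the odds for children who were never breastfed, that is, about 34% lower. Because the study is cross-sectional, this shows an association and does not by itself establish that breastfeeding reduces overweight.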
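The conditional odds ratio can also be computed for each education stratum at once, which shows whether the association differs by parental education. This is a minimal sketch; the object name `overweight` and the use of `apply` over the third dimension are choices made here, not part of the original solution.

```r
# Overweight = Sub-Obese + Obese; counts from the collapsed Data 3 table.
overweight <- array(c(257, 285, 1889, 1380,
                      364, 403, 2673, 1954),
                    dim = c(2, 2, 2),
                    dimnames = list(Breast_Feeding = c("Ever", "Never"),
                                    Category = c("Overweight", "Not Obese"),
                                    Parental_Education = c("High", "Low")))

# Cross-product ratio within each education stratum (MARGIN = 3).
conditional_or <- apply(overweight, 3, function(tab) {
  (tab["Ever", "Overweight"] * tab["Never", "Not Obese"]) /
    (tab["Never", "Overweight"] * tab["Ever", "Not Obese"])
})
round(conditional_or, 4)
```

The two strata give similar values, both close to 0.66, which suggests that the association between breastfeeding and overweight does not differ much by parental education.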
3.d Estimate the relative risk of Overweight (Sub-Obese or Obese) between children who were ever breastfed versus those who were never breastfed, conditional on having parents with high education. Interpret your answer.
RR <- (257/2146)/(285/1665)
RR
## [1] 0.699637
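Among children of highly educated parents, the estimated risk of being overweight for children who were ever breastfed is about 0.70 times the risk for children who were never breastfed (about 12.0% versus 17.1%). Because the design is cross-sectional, this is an association rather than proof of a causal effect.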
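For the same high-education stratum, the relative risk and the odds ratio can be put side by side. The sketch below uses only the counts already shown; the variable names are illustrative.

```r
# High-education stratum: overweight counts and row totals.
ow_ever  <- 257; n_ever  <- 2146
ow_never <- 285; n_never <- 1665

risk_ever  <- ow_ever / n_ever
risk_never <- ow_never / n_never

rr_high <- risk_ever / risk_never
or_high <- (risk_ever / (1 - risk_ever)) / (risk_never / (1 - risk_never))

round(c(risk_ever = risk_ever, risk_never = risk_never,
        RR = rr_high, OR = or_high), 4)
```

Because overweight is not a rare outcome here (risks of roughly 12% and 17%), the odds ratio lies somewhat further from 1 than the relative risk, so the two should not be used interchangeably for these data.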
References:
1. Colorado State lesson notes (Generalized Linear Models).
2. https://online.stat.psu.edu/stat504/lesson/3/3.1/3.1.1