Introduction

Odds Ratio: This is perhaps the most commonly used measure of association. We also use the odds ratio in many log-linear and logistic models.

Relative Risk: The relative risk, or risk ratio, is the ratio of the probability of an outcome in an exposed group to the probability of the same outcome in an unexposed group. We mostly use it for comparing small probabilities.
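
As a minimal sketch of the two definitions above, assuming a 2x2 table with exposure in the rows and outcome in the columns. The helper names `relative_risk` and `odds_ratio` and the counts in `toy` are illustrative only and are not part of the study data.

```r
# Hypothetical helpers: rows = exposure (exposed, unexposed),
# columns = outcome (yes, no).
relative_risk <- function(tab) {
  risk <- tab[, 1] / rowSums(tab)   # P(outcome = yes | row)
  unname(risk[1] / risk[2])
}

odds_ratio <- function(tab) {
  # Cross-product ratio: (a * d) / (b * c)
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# Made-up counts for illustration only (matrix() fills column by column)
toy <- matrix(c(10, 20, 90, 80), nrow = 2,
              dimnames = list(Exposed = c("Yes", "No"),
                              Outcome = c("Yes", "No")))

relative_risk(toy)  # (10/100) / (20/100)
odds_ratio(toy)     # (10 * 80) / (90 * 20)
```

When the outcome is rare in both groups, the two measures are numerically close, which is why the odds ratio is often read as an approximate relative risk in that setting.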

Data:

Children were classified into one of three categories: not obese (BMI <= 90th percentile); sub-obese (90th percentile < BMI <= 97th percentile); or obese (BMI > 97th percentile).

Objective:

The goal of this study is to estimate conditional probabilities and to use and interpret the odds ratio and the relative risk.



Load Libraries

library(gmodels)
library(RColorBrewer)
# Prepare the workspace
# Clear the workspace: 
rm(list = ls())

Creating a Contingency Table for Data 1

# Using matrix function to create 2x2 contingency table (values fill column by column)
df1 <- matrix(c(5, 21, 2058, 4887), 2, 2)
dimnames(df1) <- list(Regular_Tea_Drinker = c('Yes', 'No'), Colorectal_Cancer = c('Yes', 'No'))

df1
##                    Colorectal_Cancer
## Regular_Tea_Drinker Yes   No
##                 Yes   5 2058
##                 No   21 4887
#Converting into a table
df1 <- as.table(df1)
str(df1)
##  'table' num [1:2, 1:2] 5 21 2058 4887
##  - attr(*, "dimnames")=List of 2
##   ..$ Regular_Tea_Drinker: chr [1:2] "Yes" "No"
##   ..$ Colorectal_Cancer  : chr [1:2] "Yes" "No"

1a. What kind of study is this? Explain briefly.

This is a cohort study, with women self-selecting as regular green tea drinkers or not. A weakness of the study is this self-selection, since factors other than tea drinking might explain any differences between the groups.

1b. Estimate an appropriate parameter and interpret your point estimate, noting any appropriate cautions due to the study design. (You do not need to estimate a standard error for purposes of this problem.)

p_hat3 <- (df1)[1] / rowSums(df1)[1]  
p_hat3
##         Yes 
## 0.002423655
p_hat4 <- (df1)[2] / rowSums(df1)[2]  
p_hat4
##          No 
## 0.004278729
RR <- p_hat3/ p_hat4
RR  
##       Yes 
## 0.5664428

The most appropriate parameter is the relative risk: the risk is quite low in both groups, so the difference of proportions is not as informative, and the cohort study design allows estimation of the relative risk, which is of most interest.

The estimated relative risk is 0.57, meaning regular tea drinkers have a little over half the estimated risk of developing colorectal cancer that non-regular tea drinkers have. The caution is that, because of self-selection, factors other than tea drinking could explain the difference in risk between the groups. If this result were in fact statistically significant, it would suggest that green tea drinking might be protective.
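
The problem does not require a standard error, but since the interpretation above is conditional on significance, here is a rough sketch of how one might check it. It assumes the usual large-sample (Wald) interval on the log scale; the variable names are illustrative, and with only 5 cases among tea drinkers the approximation is crude.

```r
# Counts taken from the tea-drinking table above
a  <- 5;  n1 <- 5 + 2058    # cancers and group size, regular tea drinkers
c0 <- 21; n2 <- 21 + 4887   # cancers and group size, non-regular drinkers

rr     <- (a / n1) / (c0 / n2)
# Large-sample standard error of log(RR)
se_log <- sqrt(1 / a - 1 / n1 + 1 / c0 - 1 / n2)
# Approximate 95% confidence interval, back-transformed from the log scale
ci     <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log)

rr
ci
```

Under this approximation the interval is wide and includes 1, so these data alone would not establish a protective effect.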



Creating a Contingency Table for Data 2

# Using matrix function to create 2x2 contingency table
df2 <- matrix(c(60,57,189,279),2,2)
dimnames(df2) <- list(fenoterol_prescription = c('Yes', 'No'),
                      Asthma_Death = c('Yes', 'No'))

df2
##                       Asthma_Death
## fenoterol_prescription Yes  No
##                    Yes  60 189
##                    No   57 279
#Converting into a table
df2 <- as.table(df2)
str(df2)
##  'table' num [1:2, 1:2] 60 57 189 279
##  - attr(*, "dimnames")=List of 2
##   ..$ fenoterol_prescription: chr [1:2] "Yes" "No"
##   ..$ Asthma_Death          : chr [1:2] "Yes" "No"

2.a What kind of study is this? Explain briefly.

This is a case-control study with 117 asthma deaths as cases and four matched controls per case.

2.b Estimate and interpret the most relevant parameter of this study.

OR <- (60*279)/(57*189)
OR
## [1] 1.553885

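Since this is a case-control study, the appropriate parameter is the odds ratio, estimated as 1.553885, which is larger than one: the estimated odds of asthma death are about 55% higher for patients prescribed fenoterol than for those who were not. Assuming this difference is statistically significant, we would reject the hypothesis of independence and conclude that fenoterol prescription is associated with an increased risk of asthma death.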
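
The significance assumption above can be examined with a rough sketch like the following. It assumes a large-sample (Wald) interval for the log odds ratio and treats the table as unmatched, which ignores the matching of controls to cases; the names `tab`, `or_hat`, `se_log` and `ci` are illustrative.

```r
# Same counts as the fenoterol table above
tab <- matrix(c(60, 57, 189, 279), nrow = 2,
              dimnames = list(fenoterol_prescription = c("Yes", "No"),
                              Asthma_Death = c("Yes", "No")))

# Cross-product odds ratio
or_hat <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
# Large-sample standard error of log(OR): sqrt of the sum of reciprocal counts
se_log <- sqrt(sum(1 / tab))
# Approximate 95% confidence interval, back-transformed from the log scale
ci     <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log)

or_hat
ci
```

An analysis that respects the matched design would need the case-control sets themselves, which the collapsed 2x2 table does not provide.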

Creating a Contingency Table for Data 3

# Using array function to create a 2x4x2 contingency table (feeding x weight category x parental education)
df2 <- array(c(1889,1380,197,210, 60,75,2146, 1665,2673,1954,279,297, 85,106, 3037,2357),
             dim=c(2,4,2),
             list(Brest_Feeding=c("Ever", "Never"),
                      Categories=c('Not Obese', 'Sub-Obese', 'Obese',"Totals"),
                      Parental_Education= c("High", "Low")
                      ))

df2
## , , Parental_Education = High
## 
##              Categories
## Brest_Feeding Not Obese Sub-Obese Obese Totals
##         Ever       1889       197    60   2146
##         Never      1380       210    75   1665
## 
## , , Parental_Education = Low
## 
##              Categories
## Brest_Feeding Not Obese Sub-Obese Obese Totals
##         Ever       2673       279    85   3037
##         Never      1954       297   106   2357
#Converting into a table
df2 <- as.table(df2)
str(df2)
##  'table' num [1:2, 1:4, 1:2] 1889 1380 197 210 60 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ Brest_Feeding     : chr [1:2] "Ever" "Never"
##   ..$ Categories        : chr [1:4] "Not Obese" "Sub-Obese" "Obese" "Totals"
##   ..$ Parental_Education: chr [1:2] "High" "Low"

3.a What kind of study is this? Explain briefly.

This is a cross-sectional study, with all children selected simultaneously and then cross-classified on weight, breastfeeding, and parental education.

3.b The data are repeated here for convenience. Use the table to estimate the following quantities:

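# P(Obese): all obese children over all children
prob1 <- (60 + 75 + 85 + 106) / (2146 + 1665 + 3037 + 2357)

prob1
## [1] 0.03541554
# P(Obese and High parental education)
prob2 <- (60 + 75) / (2146 + 1665 + 3037 + 2357)

prob2
## [1] 0.01466594
# P(Obese | High parental education)
prob3 <- (60 + 75) / (2146 + 1665)

prob3
## [1] 0.03542377
# P(Obese | Ever breastfed)
prob4 <- (60 + 85) / (2146 + 3037)

prob4
## [1] 0.02797608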
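
The same four quantities can also be read off the table by indexing rather than retyping counts. This is only a sketch: it rebuilds the counts without the "Totals" column so that sums over the array are valid, and the object and variable names (`obesity`, `p_obese`, and so on) are illustrative.

```r
# Counts as above, without the "Totals" column
obesity <- array(c(1889, 1380, 197, 210, 60, 75,
                   2673, 1954, 279, 297, 85, 106),
                 dim = c(2, 3, 2),
                 dimnames = list(Breast_Feeding = c("Ever", "Never"),
                                 Category = c("Not Obese", "Sub-Obese", "Obese"),
                                 Parental_Education = c("High", "Low")))

n <- sum(obesity)  # total number of children

# Marginal and joint probabilities divide by the grand total
p_obese          <- sum(obesity[, "Obese", ]) / n
p_obese_and_high <- sum(obesity[, "Obese", "High"]) / n

# Conditional probabilities divide by the size of the conditioning group
p_obese_given_high <- sum(obesity[, "Obese", "High"]) / sum(obesity[, , "High"])
p_obese_given_ever <- sum(obesity["Ever", "Obese", ]) / sum(obesity["Ever", , ])
```

Indexing by dimension name makes the conditioning event explicit: the denominator changes from the grand total to the relevant slice of the table.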

3.c Consider collapsing Sub-Obese and Obese into a single category, Overweight. Given that a child has parents with low education, estimate the conditional odds ratio for Overweight between children who were ever breastfed versus those who were never breastfed. Interpret your answer.

Creating a New Contingency Table for Data 3

# Using array function to create a 2x3x2 contingency table (Overweight = Sub-Obese + Obese)
df2 <- array(c(257, 285,1889,1380,2146, 1665,364,403,2673,1954, 3037,2357),
             dim=c(2,3,2),
             list(Brest_Feeding=c("Ever", "Never"),
                      Categories=c('Overweight','Not Obese',"Totals"),
                      Parental_Education= c("High", "Low")
                      ))

df2
## , , Parental_Education = High
## 
##              Categories
## Brest_Feeding Overweight Not Obese Totals
##         Ever         257      1889   2146
##         Never        285      1380   1665
## 
## , , Parental_Education = Low
## 
##              Categories
## Brest_Feeding Overweight Not Obese Totals
##         Ever         364      2673   3037
##         Never        403      1954   2357
OR<- 364*1954 /(403*2673)
  
OR
## [1] 0.6602706

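Among children whose parents have low education, the estimated odds of being overweight for children who were ever breastfed are about 0.66 times the odds for children who were never breastfed, so breastfeeding is associated with lower odds of overweight. Because the study is cross-sectional, this is an association rather than evidence of a causal effect.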
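
As a small extension of the calculation above, the cross-product ratio can be applied to each parental-education stratum at once. This is a sketch with illustrative names (`overweight`, `cross_product`, `conditional_or`); the counts are the collapsed ones from the table, without the "Totals" column.

```r
# Collapsed counts per stratum, without the "Totals" column
overweight <- array(c(257, 285, 1889, 1380,
                      364, 403, 2673, 1954),
                    dim = c(2, 2, 2),
                    dimnames = list(Breast_Feeding = c("Ever", "Never"),
                                    Category = c("Overweight", "Not Obese"),
                                    Parental_Education = c("High", "Low")))

# Odds ratio of a single 2x2 table
cross_product <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# apply() over the third dimension passes each 2x2 stratum to cross_product
conditional_or <- apply(overweight, 3, cross_product)
conditional_or
```

Comparing the two conditional odds ratios side by side shows whether the breastfeeding association looks similar across education levels.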

3.d The relative risk of Overweight (Sub-Obese or Obese) between children who were ever breastfed versus those who were never breastfed, conditional on having parents with high education. Interpret your answer.

RR <- (257/2146)/(285/1665)
  
RR
## [1] 0.699637

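Among children of highly educated parents, the estimated risk of overweight for children who were ever breastfed is about 0.70 times the risk for children who were never breastfed, so breastfeeding is associated with a lower risk of overweight in this group.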



References:
1. Colorado State lesson notes (Generalized Linear Models).
2. https://online.stat.psu.edu/stat504/lesson/3/3.1/3.1.1



