Introduction

Odds Ratio: This is perhaps the most commonly used measure of association. Odds ratios also appear in many log-linear and logistic models.

Relative Risk: Relative risk, or the risk ratio, is the ratio of the probability of an outcome in an exposed group to the probability of the same outcome in an unexposed group. It is mostly used for comparing small probabilities.
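The two definitions can be compared side by side on a small table. The sketch below uses hypothetical counts (not taken from either data set in this report) and illustrative variable names; it shows that when the outcome is rare, the odds ratio and the relative risk are close to each other.

```r
# Hypothetical 2x2 counts, for illustration only (not study data)
tab <- matrix(c(10, 5, 990, 995), nrow = 2,
              dimnames = list(Exposure = c("Exposed", "Unexposed"),
                              Outcome  = c("Yes", "No")))

# Risk of the outcome within each exposure group
risk <- tab[, "Yes"] / rowSums(tab)

# Relative risk: ratio of the two risks
rel_risk <- risk[["Exposed"]] / risk[["Unexposed"]]

# Odds ratio: ratio of the two odds (the cross-product ratio)
odds <- tab[, "Yes"] / tab[, "No"]
odds_ratio <- odds[["Exposed"]] / odds[["Unexposed"]]

c(RR = rel_risk, OR = odds_ratio)
```

With risks of 1% and 0.5%, the relative risk is exactly 2 and the odds ratio is only slightly larger.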

Data:

Objective:

The goal of this study is to compute conditional probabilities and to determine when to use the odds ratio and when to use relative risk.



Load Libraries

library(gmodels)
library(RColorBrewer)
# PREPARING WORKSPACE
# Clear the workspace: 
rm(list = ls())

Creating a Contingency Table for Data 1

# Using matrix function to create 3x2 contingency table
df1<-matrix(c(67,12,2,9057,1198,30),3,2)
dimnames(df1)= list(Supplementation=c('Regular','Irregular', 'None'), Diabetes=c('Type_I', 'No'))

df1
##                Diabetes
## Supplementation Type_I   No
##       Regular       67 9057
##       Irregular     12 1198
##       None           2   30
#Converting into a table
df1 <- as.table(df1)
str(df1)
##  'table' num [1:3, 1:2] 67 12 2 9057 1198 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ Supplementation: chr [1:3] "Regular" "Irregular" "None"
##   ..$ Diabetes       : chr [1:2] "Type_I" "No"

1a. What kind of study is this?

This is a cohort study. Cohort studies are typically prospective: groups defined by exposure are followed over time and their outcome rates are compared. The most relevant measure here is relative risk, because the probability of the outcome is very small; a difference of proportions is not appropriate at this scale. Because the study is observational, it carries a high risk of confounding.

1b. What is the probability of Type I diabetes?

p_hat1 <- colSums(df1)[1] / sum(df1)
p_hat1
##      Type_I 
## 0.007814007

1c. The probability of Type I diabetes given that the infant received regular Vitamin D supplementation.

p_hat2 <- (df1)[1] / (rowSums(df1)[1])  
p_hat2
##    Regular 
## 0.00734327
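The same conditional probability can be read off for every supplementation level at once. The sketch below rebuilds the table so that it runs on its own and uses base R's `prop.table()` with `margin = 1`; the object name `cond` is illustrative.

```r
# Rebuild the 3x2 table from the counts shown above
df1 <- as.table(matrix(c(67, 12, 2, 9057, 1198, 30), nrow = 3,
  dimnames = list(Supplementation = c("Regular", "Irregular", "None"),
                  Diabetes = c("Type_I", "No"))))

# Row-wise proportions: P(Diabetes status | Supplementation level)
cond <- prop.table(df1, margin = 1)

# Conditional probability of Type I diabetes in each group
cond[, "Type_I"]
```

The entry for `Regular` should agree with `p_hat2`, and each row of `cond` sums to 1.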

1d. The probability of Type I diabetes given that the infant received any Vitamin D supplementation (regular or irregular).

p_hat3 <- (colSums(df1)[1]-(df1)[3]) / (sum(df1)-(rowSums(df1)[3]))  
p_hat3
##      Type_I 
## 0.007644668

1e. The Relative Risk of Type I diabetes between infants who received any Vitamin D supplementation (regular or irregular) versus those who received no Vitamin D supplementation.

p_hat4 <- (df1)[3] / (rowSums(df1)[3])  
p_hat4
##   None 
## 0.0625
RR <- p_hat3/ p_hat4
RR
##    Type_I 
## 0.1223147

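The relative risk is about 0.12: the estimated risk of Type I diabetes among infants who received any Vitamin D supplementation is roughly 12% of the risk among infants who received none. A relative risk this far below 1 is consistent with a protective association between supplementation and Type I diabetes, although the unsupplemented group is small (32 infants, 2 cases), so the estimate is imprecise.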
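To gauge how uncertain this relative risk is, one option is a large-sample confidence interval on the log scale. The sketch below is not part of the original analysis: it collapses the table to "Any" versus "None" and assumes the usual approximation SE(log RR) = sqrt(1/a - 1/n1 + 1/c - 1/n2), which is rough here because one group has only 2 cases. All object names are illustrative.

```r
# Rebuild the 3x2 table from the counts shown above
df1 <- matrix(c(67, 12, 2, 9057, 1198, 30), nrow = 3,
  dimnames = list(Supplementation = c("Regular", "Irregular", "None"),
                  Diabetes = c("Type_I", "No")))

# Collapse Regular + Irregular into a single "Any" row
collapsed <- rbind(Any  = colSums(df1[c("Regular", "Irregular"), ]),
                   None = df1["None", ])

# Risk of Type I diabetes in each group and the relative risk
risk <- collapsed[, "Type_I"] / rowSums(collapsed)
rr <- risk[["Any"]] / risk[["None"]]

# Assumed large-sample standard error of log(RR)
se_log_rr <- sqrt(sum(1 / collapsed[, "Type_I"]) - sum(1 / rowSums(collapsed)))

# Approximate 95% interval, back-transformed to the RR scale
ci_rr <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

rr
ci_rr
```

The point estimate reproduces `RR` from above; the interval is wide because of the small unsupplemented group.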

1f. Find the chi-squared test statistic

chisq.test(df1, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  df1
## X-squared = 13.295, df = 2, p-value = 0.001297

1g. The researcher was concerned about the small cell counts in some cells of the table, and so used a simulation-based approach to compute a p-value for the test statistic. Use the chi-squared result to conduct the relevant hypothesis test. State your conclusion in the context of the problem, noting any appropriate cautions due to the study design.

H0: X (Vitamin D supplementation) and Y (Type I Diabetes) are independent.

HA: X (Vitamin D supplementation) and Y (Type I Diabetes) are NOT independent.

p = 0.01142 < 0.05, so we reject H0 (the asymptotic test in 1f, with p = 0.001297, leads to the same decision). We conclude that Vitamin D supplementation and Type I diabetes are associated. However, this is an observational study, so the association may be due to a confounder and should not be read as causal.
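A simulation-based p-value of the kind described in 1g can be obtained in base R with the `simulate.p.value` argument of `chisq.test()`. The sketch below is one way to do it; the seed and the number of replicates `B` are arbitrary choices, and the source does not say which settings the researcher used, so the result need not match the p-value quoted above.

```r
# Rebuild the 3x2 table from the counts shown above
df1 <- as.table(matrix(c(67, 12, 2, 9057, 1198, 30), nrow = 3,
  dimnames = list(Supplementation = c("Regular", "Irregular", "None"),
                  Diabetes = c("Type_I", "No"))))

# Arbitrary seed, used only to make the simulation reproducible
set.seed(1)

# Monte Carlo p-value for the Pearson statistic (B simulated tables)
sim <- chisq.test(df1, simulate.p.value = TRUE, B = 10000)

sim$statistic  # same Pearson statistic as in 1f
sim$p.value    # simulated p-value; varies with the seed and B
```

Only the p-value changes under simulation; the test statistic is the same as in the asymptotic test.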



Creating a Contingency Table for Data 2

# Using matrix function to create 2x2 contingency table
df2 <- matrix(c(6,22,60,38),2,2)
dimnames(df2) <- list(Drug_Therapy_wsurgery =c('Yes', 'No'), 
                    Hospitalized=c('Yes', 'No')          )

df2
##                      Hospitalized
## Drug_Therapy_wsurgery Yes No
##                   Yes   6 60
##                   No   22 38
#Converting into a table
df2 <- as.table(df2)
str(df2)
##  'table' num [1:2, 1:2] 6 22 60 38
##  - attr(*, "dimnames")=List of 2
##   ..$ Drug_Therapy_wsurgery: chr [1:2] "Yes" "No"
##   ..$ Hospitalized         : chr [1:2] "Yes" "No"

2.a: Estimate and interpret the most relevant parameter of this study.

# Odds ratio (cross-product ratio): odds of hospitalization without
# drug therapy relative to the odds with drug therapy
OR <- (22*60)/(6*38)
OR
## [1] 5.789474

2.b: The standard error of the estimated parameter on the logarithmic scale is 0.42.

Use this information to compute an approximate 95% confidence interval for the parameter. Use the results to conduct the relevant hypothesis test. State your conclusion in the context of the problem, noting any appropriate cautions due to the study.

SE <- 0.42

Confint95 <- 1.96

CI_Lower <- log(OR) - Confint95 * SE

CI_Upper <- log(OR) + Confint95 * SE

# Confidence interval lower limit (log scale)
CI_Lower
## [1] 0.9328414
# Confidence interval upper limit (log scale)
CI_Upper
## [1] 2.579241

H0: X(drug+surgery) and Y(hospitalized) are independent.

HA: X(drug+surgery) and Y(hospitalized) are NOT independent.

Since the 95% confidence interval for the log odds ratio does not include zero, we reject the null hypothesis. Hospitalization depends on treatment: patients assigned drug therapy with surgery were less likely to be hospitalized, which suggests that the combined treatment has a beneficial effect.
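As a cross-check, the odds ratio, the relative risk, and their log-scale standard errors can all be computed directly from the cell counts. This is a sketch under the usual large-sample formulas, which are an assumption here (the source only states SE = 0.42); both ratios compare patients without drug therapy to patients with it, and all object names are illustrative.

```r
# Rebuild the 2x2 table from the counts shown above
df2 <- matrix(c(6, 22, 60, 38), nrow = 2,
              dimnames = list(Drug_Therapy_wsurgery = c("Yes", "No"),
                              Hospitalized = c("Yes", "No")))

# Odds ratio of hospitalization: no drug therapy vs. drug therapy
odds_ratio <- (df2["No", "Yes"] * df2["Yes", "No"]) /
              (df2["Yes", "Yes"] * df2["No", "No"])

# Assumed SE of log(OR): square root of the sum of reciprocal counts
se_log_or <- sqrt(sum(1 / df2))

# Relative risk of hospitalization: no drug therapy vs. drug therapy
risk <- df2[, "Yes"] / rowSums(df2)
rel_risk <- risk[["No"]] / risk[["Yes"]]

# Assumed SE of log(RR)
se_log_rr <- sqrt(sum(1 / df2[, "Yes"]) - sum(1 / rowSums(df2)))

# Approximate 95% intervals, back-transformed with exp()
z <- qnorm(0.975)
ci_or <- exp(log(odds_ratio) + c(-1, 1) * z * se_log_or)
ci_rr <- exp(log(rel_risk) + c(-1, 1) * z * se_log_rr)

round(c(OR = odds_ratio, SE_log_OR = se_log_or,
        RR = rel_risk, SE_log_RR = se_log_rr), 3)
```

Under these formulas the standard error of the log relative risk comes out near 0.42 and that of the log odds ratio near 0.51, so the stated value of 0.42 sits closer to the relative-risk scale. Either interval excludes 1 on the ratio scale, which matches the decision to reject the null hypothesis.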

This is a randomized clinical study: the therapies were randomly assigned to patients, so confounding is not a major concern and no particular caution is needed.




