Categorical Data Analysis 2

Introduction

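Odds Ratio: This is perhaps the most commonly used measure of association. The odds ratio also appears in many log-linear and logistic models. The odds of “success” are the probability of success divided by the probability of failure, and the odds ratio is the ratio of the odds in two groups.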

If odds = 1, then “success” and “failure” are equally likely.
If odds > 1, then “success” is more likely than “failure”.
If odds < 1, then “success” is less likely than “failure”.
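
The three cases above can be illustrated with a small sketch. The helper name `odds` and the probabilities used here are illustrative assumptions, not values from the data sets in this report.

```r
# Illustrative helper (not part of the original analysis):
# convert a probability of "success" into odds.
odds <- function(p) p / (1 - p)

odds(0.5)   # equal to 1: success and failure are equally likely
odds(0.8)   # greater than 1: success is more likely than failure
odds(0.2)   # less than 1: success is less likely than failure

# An odds ratio compares the odds in two groups.
odds_ratio <- odds(0.8) / odds(0.2)
odds_ratio
```

An odds ratio of 1 would mean the odds are the same in both groups.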

Relative Risk: The relative risk, or risk ratio, is the ratio of the probability of an outcome in an exposed group to the probability of the same outcome in an unexposed group. It is mostly used for comparing small probabilities.

If RR = 1, the exposure has no effect on the outcome.
If RR > 1, the risk of the outcome is increased by the exposure.
If RR < 1, the risk of the outcome is decreased by the exposure.
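
As a minimal sketch of the definition, the relative risk can be computed from the counts in an exposed and an unexposed group. All counts below are hypothetical and chosen only for illustration.

```r
# Hypothetical counts, chosen only for illustration.
exposed_cases   <- 10
exposed_total   <- 100
unexposed_cases <- 20
unexposed_total <- 100

# Risk (probability of the outcome) in each group.
risk_exposed   <- exposed_cases / exposed_total
risk_unexposed <- unexposed_cases / unexposed_total

# Relative risk: risk in the exposed group over risk in the unexposed group.
relative_risk <- risk_exposed / risk_unexposed
relative_risk
```

Here the relative risk is below 1, so in this made-up example the exposure goes with a lower risk of the outcome.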

Data:

Diabetes data set: A study was conducted in northern Finland to assess the relationship between Type I diabetes and dietary vitamin D supplementation in humans. All pregnant women living in northern Finland whose due date fell in 1966 were enrolled. Their infants were followed up at one year of age by interviewing the mothers to see whether the infants had received any dietary vitamin D supplementation. Levels of vitamin D supplementation were Regular, Irregular and None. The children were then tracked over time. Ultimately, 10366 children were tracked through 1997, and 81 of them had been diagnosed with Type I diabetes.
Drug data set: Myasthenia gravis is a chronic autoimmune disease that causes weakness in skeletal muscles. In a study described in the New England Journal of Medicine (2016), 126 patients were randomly assigned to receive either drug therapy only or drug therapy plus surgery. One variable of interest was hospitalization for worsening of the disease symptoms. Of the 60 patients randomly assigned to drug therapy only, 22 were hospitalized for worsened symptoms. Of the 66 patients randomly assigned to drug therapy plus surgery, 6 were hospitalized for worsened symptoms.

Objective:

The goal of this study is to compute conditional probabilities and to decide when to use the odds ratio and when to use the relative risk.

Load Libraries

library(gmodels)
library(RColorBrewer)

# Preparing the workspace
# Clear the workspace:
rm(list = ls())

Creating a Contingency Table for Data 1

# Using the matrix function to create a 3x2 contingency table
# (rows: supplementation level, columns: diabetes status; filled column by column)
df1 <- matrix(c(67,12,2,9057,1198,30),3,2)
dimnames(df1) <- list(Supplementation=c('Regular','Irregular','None'), Diabetes=c('Type_I','No'))

df1

##                Diabetes
## Supplementation Type_I   No
##       Regular       67 9057
##       Irregular     12 1198
##       None           2   30

#Converting into a table
df1 <- as.table(df1)
str(df1)

##  'table' num [1:3, 1:2] 67 12 2 9057 1198 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ Supplementation: chr [1:3] "Regular" "Irregular" "None"
##   ..$ Diabetes       : chr [1:2] "Type_I" "No"
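
As a quick check that the table matches the study description (10366 children, 81 diagnosed), row and column totals can be appended with `addmargins`. This is a small sketch that rebuilds the table so it runs on its own; the name `totals` is an assumption introduced here.

```r
# Rebuild the 3x2 table from the counts above so this block is self-contained.
df1 <- as.table(matrix(c(67, 12, 2, 9057, 1198, 30), nrow = 3, ncol = 2,
                       dimnames = list(Supplementation = c("Regular", "Irregular", "None"),
                                       Diabetes = c("Type_I", "No"))))

# Append row and column totals (labelled "Sum").
totals <- addmargins(df1)
totals
```

The bottom-right cell gives the total number of children, and the "Sum" row under Type_I gives the total number of diagnosed cases.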

1a. What kind of study is this?

This is a cohort study. The children were enrolled first and then followed forward in time (prospectively) to see who developed the outcome. The measure of most interest here is the relative risk, because the risks being compared are very small; a difference of proportions is not appropriate in that situation. The study is observational, so it carries a high risk of confounding.

1b. What is the probability of Type I diabetes?

p_hat1 <- colSums(df1)[1] / sum(df1)
p_hat1

##      Type_I 
## 0.007814007

1c. The probability of Type I diabetes given that the infant received regular Vitamin D supplementation.

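# Type I cases among regularly supplemented infants, divided by the Regular row total
p_hat2 <- df1["Regular", "Type_I"] / rowSums(df1)["Regular"]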
p_hat2

##    Regular 
## 0.00734327

1d. The probability of Type I diabetes given that the infant received any Vitamin D supplementation (regular or irregular).

# Remove the None row from both the Type I cases and the overall total
p_hat3 <- (colSums(df1)["Type_I"] - df1["None", "Type_I"]) / (sum(df1) - rowSums(df1)["None"])
p_hat3

##      Type_I 
## 0.007644668

1e. The Relative Risk of Type I diabetes between infants who received any Vitamin D supplementation (regular or irregular) versus those who received no vitamin D supplementation.

# Probability of Type I diabetes given no supplementation
p_hat4 <- df1["None", "Type_I"] / rowSums(df1)["None"]
p_hat4

##   None 
## 0.0625

RR <- p_hat3/ p_hat4
RR

##    Type_I 
## 0.1223147

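The relative risk is well below 1: the estimated risk of Type I diabetes among infants who received any vitamin D supplementation is about 12% of the risk among infants who received none. This suggests that vitamin D supplementation is associated with a lower risk of Type I diabetes.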
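
To give a sense of the uncertainty around this estimate, an approximate 95% confidence interval can be sketched with the usual large-sample standard error of log(RR). This interval is not part of the original analysis, the variable names are introduced here, and with only 2 cases in the unsupplemented group the normal approximation is rough, so the result should be read with caution.

```r
# Counts taken from the contingency table above.
a  <- 67 + 12            # Type I cases with any supplementation
n1 <- a + 9057 + 1198    # all children with any supplementation
c0 <- 2                  # Type I cases with no supplementation
n0 <- c0 + 30            # all children with no supplementation

# Relative risk: any supplementation vs. none.
rr <- (a / n1) / (c0 / n0)

# Large-sample standard error of log(RR).
se_log_rr <- sqrt(1 / a - 1 / n1 + 1 / c0 - 1 / n0)

# Approximate 95% interval, built on the log scale and transformed back.
ci_rr <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)
rr
ci_rr
```

If the whole interval lies below 1, the data are consistent with a lower risk in the supplemented group.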

1f. Find the chi-squared test statistic.

chisq.test(df1, correct = FALSE)

## 
##  Pearson's Chi-squared test
## 
## data:  df1
## X-squared = 13.295, df = 2, p-value = 0.001297
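
The statistic reported by `chisq.test` can be reproduced by hand from the observed and expected counts, which makes clear what is being tested. This is a sketch that rebuilds the table; the names `expected`, `x2`, `dof` and `p_value` are introduced here.

```r
# Rebuild the 3x2 table so this block runs on its own.
df1 <- as.table(matrix(c(67, 12, 2, 9057, 1198, 30), nrow = 3, ncol = 2,
                       dimnames = list(Supplementation = c("Regular", "Irregular", "None"),
                                       Diabetes = c("Type_I", "No"))))

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(df1), colSums(df1)) / sum(df1)

# Pearson chi-squared statistic: sum over cells of (observed - expected)^2 / expected.
x2 <- sum((df1 - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof <- (nrow(df1) - 1) * (ncol(df1) - 1)

# Upper-tail probability of the chi-squared distribution.
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)
c(X_squared = x2, df = dof, p_value = p_value)
```

Most of the statistic comes from the None / Type_I cell, where the observed count of 2 is far above its very small expected count.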

1g. The researcher was concerned about the small cell counts in some cells of the table, and so used a simulation-based approach to compute a p-value for the test statistic. Use the chi-squared result to conduct the relevant hypothesis test. State your conclusion in the context of the problem, noting any appropriate cautions due to the study design.

H0: X (Vitamin D supplementation) and Y (Type I Diabetes) are independent.

HA: X (Vitamin D supplementation) and Y (Type I Diabetes) are NOT independent.

p = 0.01142 < 0.05, so we REJECT H0. We conclude that vitamin D supplementation and Type I diabetes are associated. However, this is an observational study, so the association may be due to a confounder and should not be read as causal.
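
The source does not show how the simulation-based p-value was obtained. One common way to do this in R is the `simulate.p.value` argument of `chisq.test`, sketched below; the seed and the number of replicates `B` are arbitrary choices made here, so the simulated p-value will not necessarily match the one quoted above.

```r
# Rebuild the 3x2 table so this block runs on its own.
df1 <- as.table(matrix(c(67, 12, 2, 9057, 1198, 30), nrow = 3, ncol = 2,
                       dimnames = list(Supplementation = c("Regular", "Irregular", "None"),
                                       Diabetes = c("Type_I", "No"))))

# Expected counts under independence; the None / Type_I cell is far below 5,
# which is why the chi-squared approximation is questionable here.
expected <- suppressWarnings(chisq.test(df1, correct = FALSE))$expected
round(expected, 2)

# Monte Carlo p-value: tables are simulated with the observed margins held fixed.
set.seed(2021)  # arbitrary seed, only for reproducibility
sim_test <- chisq.test(df1, simulate.p.value = TRUE, B = 10000)
sim_test$p.value
```

The test statistic is the same as in the ordinary chi-squared test; only the reference distribution used for the p-value changes.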

Creating a Contingency Table for Data 2

# Using matrix function to create 2x2 contingency table
df2 <- matrix(c(6,22,60,38),2,2)
dimnames(df2) <- list(Drug_Therapy_wsurgery=c('Yes', 'No'),
                      Hospitalized=c('Yes', 'No'))

df2

##                      Hospitalized
## Drug_Therapy_wsurgery Yes No
##                   Yes   6 60
##                   No   22 38

#Converting into a table
df2 <- as.table(df2)
str(df2)

##  'table' num [1:2, 1:2] 6 22 60 38
##  - attr(*, "dimnames")=List of 2
##   ..$ Drug_Therapy_wsurgery: chr [1:2] "Yes" "No"
##   ..$ Hospitalized         : chr [1:2] "Yes" "No"
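
Before estimating a measure of association, it can help to look at the hospitalization proportion in each treatment arm. The sketch below rebuilds the table and uses `prop.table` with `margin = 1`; the name `hosp_rate` is introduced here.

```r
# Rebuild the 2x2 table so this block runs on its own.
df2 <- as.table(matrix(c(6, 22, 60, 38), nrow = 2, ncol = 2,
                       dimnames = list(Drug_Therapy_wsurgery = c("Yes", "No"),
                                       Hospitalized = c("Yes", "No"))))

# P(Hospitalized = Yes | treatment arm): each cell divided by its row total.
hosp_rate <- prop.table(df2, margin = 1)[, "Yes"]
round(hosp_rate, 4)
```

The first entry is the proportion hospitalized with drug therapy plus surgery, the second with drug therapy only.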

2.a: Estimate and interpret the most relevant parameter of this study.

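# Odds ratio of hospitalization: drug therapy only vs. drug therapy plus surgery
# (cross-product ratio of the 2x2 table)
OR <- (22*60)/(6*38)
OR

## [1] 5.789474

The odds of hospitalization for worsened symptoms under drug therapy only are estimated to be about 5.8 times the odds under drug therapy plus surgery.

2.b: The standard error of the estimated parameter on the logarithmic scale is 0.42.

Use this information to compute an approximate 95% confidence interval for the parameter. Use the results to conduct the relevant hypothesis test. State your conclusion in the context of the problem, noting any appropriate cautions due to the study.

SE <- 0.42

# Standard normal critical value for a 95% confidence interval
Confint95 <- 1.96

# Interval limits on the log scale
CI_Lower <- log(OR) - Confint95 * SE

CI_Upper <- log(OR) + Confint95 * SE

#Confidence Interval Lower Level
CI_Lower

## [1] 0.9328414

#Confidence Interval Upper Level
CI_Upper

## [1] 2.579241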
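
As a cross-check, the odds ratio, the relative risk and their large-sample standard errors on the log scale can be computed directly from the four cell counts. This sketch is not part of the original analysis and the variable names are introduced here. Under these standard formulas the stated value of 0.42 is close to the standard error of the log relative risk, while the log odds ratio has a somewhat larger standard error, so it may be worth confirming which parameter the stated standard error refers to.

```r
# Cell counts from the 2x2 table.
a  <- 22; b <- 38    # drug therapy only: hospitalized, not hospitalized
c0 <- 6;  d <- 60    # drug therapy plus surgery: hospitalized, not hospitalized

# Point estimates, comparing drug therapy only with drug therapy plus surgery.
or_hat <- (a * d) / (b * c0)
rr_hat <- (a / (a + b)) / (c0 / (c0 + d))

# Large-sample standard errors on the log scale.
se_log_or <- sqrt(1 / a + 1 / b + 1 / c0 + 1 / d)
se_log_rr <- sqrt(1 / a - 1 / (a + b) + 1 / c0 - 1 / (c0 + d))

# Approximate 95% intervals, transformed back to the ratio scale.
z <- qnorm(0.975)
ci_or <- exp(log(or_hat) + c(-1, 1) * z * se_log_or)
ci_rr <- exp(log(rr_hat) + c(-1, 1) * z * se_log_rr)

round(c(OR = or_hat, SE_log_OR = se_log_or, RR = rr_hat, SE_log_RR = se_log_rr), 3)
ci_or
ci_rr
```

On the ratio scale the null value is 1 rather than 0, so an interval that excludes 1 leads to the same decision as a log-scale interval that excludes 0.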

H0: X(drug+surgery) and Y(hospitalized) are independent.

HA: X(drug+surgery) and Y(hospitalized) are NOT independent.

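Since the confidence interval on the log scale does not cover zero, we REJECT the null hypothesis. Treatment and hospitalization are therefore not independent: patients assigned to drug therapy plus surgery were less likely to be hospitalized for worsened symptoms than patients assigned to drug therapy only, which suggests that adding surgery has a beneficial effect.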

This is a clinical trial, because the patients were randomly assigned to the two therapies. Random assignment guards against confounding, so no particular caution about the study design is needed.

Mustafa Arslan

10/22/2021