Jessica McPhaul - 6372 _Unit_10_PreLive

Prelive Data Set #1

In a 1962 social experiment, 123 three- to four-year-old children from poverty-level families in Ypsilanti, Michigan, were randomly assigned either to a treatment group receiving 2 years of preschool instruction or to a control group receiving no preschool. The participants were followed into their adult years. The following table shows how many in each group had been arrested for some crime by the time they were 19 years old.

Data<-matrix(c(19,42,32,30),2,2,byrow=T)
dimnames(Data)<-list(Exposure=c("Pre","Control"),"Arrested"=c("Yes","No"))
Data
##          Arrested
## Exposure  Yes No
##   Pre      19 42
##   Control  32 30

QUESTIONS

  1. Determine if the study is a prospective, retrospective, or observational design. Why is this important?
  2. Consider the following R script that performs Fisher’s exact test and computes a relative risk CI. Determine how the relative risk (risk ratio) is calculated and interpret it as is.
library(epitools)
epitab(Data,method="riskratio",riskratio="boot",pvalue="fisher.exact")
## $tab
##          Arrested
## Exposure  Yes        p0 No        p1 riskratio     lower     upper   p.value
##   Pre      19 0.3114754 42 0.6885246  1.000000        NA        NA        NA
##   Control  32 0.5161290 30 0.4838710  0.702765 0.5024022 0.9557604 0.0280677
## 
## $measure
## [1] "boot"
## 
## $conf.level
## [1] 0.95
## 
## $pvalue
## [1] "fisher.exact"
  3. Produce a hand calculation of the relative risk that allows you to say that “the chances of getting arrested for the control group are RR times higher than that of the children who went to the preschool”.

  4. Use the “rev” option to change the row and columns so that the relative risk in the table matches your hand calculation in the previous question.

  5. Using your “rev” setting option from question 4, change the method to odds ratio, performing a Fisher’s exact test and a Fisher confidence interval. Use the help page (?epitab) for the options to get that done. Interpret the confidence interval for the odds ratio.

  6. As mentioned, chi-square and Wald procedures are meant for large sample sizes. When sample sizes are low, Fisher’s test and small-sample confidence intervals such as bootstrap, Fisher, or sample-size-adjusted Wald intervals should be used. Below is code for analyzing the data set assuming small sample sizes versus large sample sizes. Note how the limits and p-values are quite similar though not exactly the same.

epitab(Data,method="riskratio",riskratio="boot",pvalue="fisher.exact")
## $tab
##          Arrested
## Exposure  Yes        p0 No        p1 riskratio     lower     upper   p.value
##   Pre      19 0.3114754 42 0.6885246  1.000000        NA        NA        NA
##   Control  32 0.5161290 30 0.4838710  0.702765 0.5028674 0.9531483 0.0280677
## 
## $measure
## [1] "boot"
## 
## $conf.level
## [1] 0.95
## 
## $pvalue
## [1] "fisher.exact"
epitab(Data,method="riskratio",riskratio="wald",pvalue="chi2")
## $tab
##          Arrested
## Exposure  Yes        p0 No        p1 riskratio     lower     upper    p.value
##   Pre      19 0.3114754 42 0.6885246  1.000000        NA        NA         NA
##   Control  32 0.5161290 30 0.4838710  0.702765 0.5167126 0.9558091 0.02125275
## 
## $measure
## [1] "wald"
## 
## $conf.level
## [1] 0.95
## 
## $pvalue
## [1] "chi2"

Run the same two lines of code again but with the following two data sets. Compare the agreement or disagreement between the small sample size p-values and confidence intervals versus the large sample size p-value and confidence intervals. Summarize your findings.

##          Arrested
## Exposure   Yes   No
##   Pre     1900 4200
##   Control 3200 3000
##          Arrested
## Exposure  Yes No
##   Pre       1  4
##   Control   3  3

Answer 1:

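Study Design: The study is a prospective randomized experiment: the children were randomly assigned to a treatment or control group and then followed forward in time to measure outcomes (arrest records). Understanding the study design is crucial because it determines the strength of the evidence and which measures can be estimated. Random assignment supports causal inference, and because the outcome was observed after the groups were formed, the risk in each group, the relative risk, and the odds ratio can all be estimated. In a retrospective (case-control) design the risks and the relative risk cannot be estimated directly, so only the odds ratio would be appropriate.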

Answer 2:

Data Structure: The data is arranged in a 2x2 contingency table:

Pre: 19 children were arrested, 42 were not.

Control: 32 children were arrested, 30 were not.

Calculation of Probabilities:

1. epitab treats the first row (‘Pre’) as the reference group and the second column (‘No’) as the outcome, so the p1 column holds the probability of not being arrested.
2. Probability of no arrest in the ‘Pre’ group: 42/(19+42) = 0.6885246. The probability of being arrested (p0) is 19/(19+42) = 0.3114754.
3. Probability of no arrest in the ‘Control’ group: 30/(32+30) = 0.4838710. The probability of being arrested (p0) is 32/(32+30) = 0.5161290.
4. Relative Risk Calculation: The reported relative risk is 0.4838710 / 0.6885246 = 0.702765. The ratio of the arrest probabilities, 0.3114754 / 0.5161290 = 0.603, is a different quantity and is not what the table reports.

Interpretation:

Relative Risk: Taken as is, the RR of 0.702765 says that children in the ‘Control’ group were about 0.70 times as likely to have no arrest by age 19 as children in the ‘Pre’ group, that is, about 30% less likely to stay arrest-free. Confidence Interval: The 95% bootstrap confidence interval in the output above is 0.5024022 to 0.9557604; bootstrap limits change slightly from run to run. The interval does not include 1, indicating that the observed association is statistically significant. P-Value: The Fisher’s exact test p-value is 0.0280677, below the conventional alpha level of 0.05, signifying that the difference in arrest rates between the two groups is statistically significant.

Summary: The analysis shows that children in the ‘Control’ group were significantly less likely to avoid arrest than children in the ‘Pre’ group; equivalently, the preschool group had the lower arrest rate (31.1% versus 51.6%). This is evidenced by a confidence interval for the relative risk that excludes 1 and a statistically significant Fisher’s exact test result.
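The relative risk printed by epitab can be reproduced with a few lines of base R. This is a minimal sketch, not part of the assigned script; the names `p_no`, `rr_epitab`, `se_log_rr`, and `wald_ci` are made up for illustration. It also computes the large-sample Wald limits on the log scale, which should agree with the riskratio="wald" output shown in the questions (0.5167126 to 0.9558091).

```r
# Original table: rows are exposure groups, columns are arrest status.
Data <- matrix(c(19, 42, 32, 30), 2, 2, byrow = TRUE,
               dimnames = list(Exposure = c("Pre", "Control"),
                               Arrested = c("Yes", "No")))

# Row-wise proportion in the second column ("No"); epitab labels this p1.
p_no <- Data[, "No"] / rowSums(Data)

# Risk ratio as reported: second row relative to the first (reference) row.
rr_epitab <- unname(p_no["Control"] / p_no["Pre"])

# Wald 95% limits: exp(log(RR) +/- z * SE), with the usual SE of log(RR).
se_log_rr <- sqrt(1 / Data["Control", "No"] - 1 / sum(Data["Control", ]) +
                  1 / Data["Pre", "No"]     - 1 / sum(Data["Pre", ]))
wald_ci <- exp(log(rr_epitab) + c(-1, 1) * qnorm(0.975) * se_log_rr)

rr_epitab
wald_ci
```

The bootstrap interval is not reproduced here because it depends on random resampling.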

Answer 3:

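Hand Calculation of Relative Risk (RR): \[ RR = \frac{\text{Risk in the Control Group}}{\text{Risk in the Treatment Group (Pre)}} \] \[ RR = \frac{\frac{32}{32 + 30}}{\frac{19}{19 + 42}} = \frac{32/62}{19/61} = \frac{32 \times 61}{62 \times 19} = \frac{1952}{1178} \approx 1.657 \]

The chances of getting arrested for the control group are about 1.66 times those of the children who went to the preschool.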

# Data matrix
Data <- matrix(c(19,42,32,30), 2, 2, byrow=TRUE)
dimnames(Data) <- list(Exposure=c("Pre","Control"), "Arrested"=c("Yes","No"))

# Calculating the risk for Pre and Control groups
risk_pre <- Data[1,1] / sum(Data[1,])
risk_control <- Data[2,1] / sum(Data[2,])

# Calculating the Relative Risk (RR)
relative_risk <- risk_control / risk_pre

# Output the Relative Risk
relative_risk


Answer 4:

# Original Data
Data <- matrix(c(19,42,32,30), 2, 2, byrow=T)
dimnames(Data) <- list(Exposure=c("Pre","Control"), "Arrested"=c("Yes","No"))

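# Reversed Data for RR calculation
# Note: without byrow=TRUE, matrix() fills column by column, so the rows
# become Control = (32, 19) and Pre = (30, 42), the transpose of the
# original counts rather than a row swap
Data_rev <- matrix(c(32,30,19,42), 2, 2)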
dimnames(Data_rev) <- list(Exposure=c("Control","Pre"), "Arrested"=c("Yes","No"))

# Using 'epitab' with revised data
library(epitools)
epitab(Data_rev, method="riskratio", riskratio="boot", pvalue="fisher.exact")
## $tab
##          Arrested
## Exposure  Yes        p0 No        p1 riskratio  lower   upper   p.value
##   Control  32 0.6274510 19 0.3725490  1.000000     NA      NA        NA
##   Pre      30 0.4166667 42 0.5833333  1.565789 1.0625 2.50641 0.0280677
## 
## $measure
## [1] "boot"
## 
## $conf.level
## [1] 0.95
## 
## $pvalue
## [1] "fisher.exact"

Interpretation:

1. Data Structure: Data_rev was built without byrow=TRUE, so matrix() filled it column by column. With “Control” as the first row and “Pre” as the second, the table passed to epitab is:

             Arrested
    Exposure  Yes No
      Control  32 19
      Pre      30 42

   This is the transpose of the original counts (Control: 32 arrested, 30 not; Pre: 19 arrested, 42 not), not a row swap.

2. Relative Risk Calculation: A. For the “Control” row, p1 = 19/(32+19) = 0.3725490. B. For the “Pre” row, p1 = 42/(30+42) = 0.5833333. C. The reported risk ratio is 0.5833333 / 0.3725490 = 1.565789, the ratio of the second-column proportions with the first row as the reference.

3. Interpretation: A. Because the rows of Data_rev no longer correspond to the real groups, 1.565789 is not the risk ratio of arrest for either group, and it does not match the hand calculation of 1.657. B. To reproduce the hand calculation, keep the original Data and set rev="columns" in epitab. That makes “Yes” the outcome column while “Pre” stays the reference row, so the risk ratio is (32/62)/(19/61) = 1.657: the control group was about 1.66 times as likely to be arrested as the preschool group.

4. Confidence Interval: A. The bootstrap interval reported for Data_rev is [1.0625, 2.50641]. B. It describes the transposed table, so it should not be read as the interval for the control-versus-preschool risk ratio; that interval has to come from a run with rev="columns".

5. P-value: The p-value of 0.0280677 remains valid, because Fisher’s exact test gives the same result for a 2x2 table and its transpose. The difference in arrest rates between the two groups is statistically significant at the 0.05 level.

6. Summary: The hand calculation is matched by reversing the columns of the original table (rev="columns"), which gives a risk ratio of 1.657 and indicates a higher risk of arrest for the “Control” group compared to the “Pre” group.
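A sketch of the “rev” approach that question 4 asks for, applied to the original table. It assumes the epitools package is installed; `res`, `rr_rev`, and `rr_hand` are illustrative names. With rev="columns", epitab swaps the two columns so that “Yes” becomes the outcome column while “Pre” remains the reference row. The bootstrap limits will differ a little on every run, so none are quoted here.

```r
Data <- matrix(c(19, 42, 32, 30), 2, 2, byrow = TRUE,
               dimnames = list(Exposure = c("Pre", "Control"),
                               Arrested = c("Yes", "No")))

# Hand calculation: risk of arrest in Control divided by risk in Pre.
rr_hand <- (32 / 62) / (19 / 61)
rr_hand

# Guarded so the script still runs where epitools is not available.
if (requireNamespace("epitools", quietly = TRUE)) {
  # rev = "columns" reverses the column order before the risk ratio is computed.
  res <- epitools::epitab(Data, method = "riskratio", riskratio = "boot",
                          pvalue = "fisher.exact", rev = "columns")
  # Point estimate for the Control row, to compare with rr_hand.
  rr_rev <- res$tab["Control", "riskratio"]
  print(res$tab)
}
```

Leaving the matrix untouched and letting rev do the rearranging avoids re-typing the counts, which is where a row or column mix-up is most likely to creep in.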

Answer 5:

# Using 'epitab' with revised data for odds ratio calculation
library(epitools)
epitab(Data_rev, method="oddsratio", oddsratio="fisher", pvalue="fisher.exact")
## $tab
##          Arrested
## Exposure  Yes       p0 No        p1 oddsratio    lower    upper   p.value
##   Control  32 0.516129 19 0.3114754  1.000000       NA       NA        NA
##   Pre      30 0.483871 42 0.6885246  2.341066 1.062386 5.271282 0.0280677
## 
## $measure
## [1] "fisher"
## 
## $conf.level
## [1] 0.95
## 
## $pvalue
## [1] "fisher.exact"

Interpretation:

Odds Ratio Calculation: A. The odds ratio reported in the second row of the table is 2.341066. B. The odds are the ratio of the probability of an event occurring to the probability of it not occurring. C. In the original data, the odds of being arrested in the “Control” group are 32/30 = 1.0667. D. In the “Pre” group, the odds of being arrested are 19/42 = 0.4524. E. The sample odds ratio for “Control” relative to “Pre” is 1.0667 / 0.4524 = (32 × 42) / (30 × 19) = 2.358. The oddsratio="fisher" option reports the conditional maximum likelihood estimate, 2.341066, which differs slightly. F. Data_rev holds the transpose of the original table (it was filled column by column), so its row labels do not line up with the real groups. The odds ratio and Fisher’s exact test are unchanged when a 2x2 table is transposed, so the estimate, interval, and p-value still describe the comparison of “Control” with “Pre”.

Confidence Interval: A. The confidence interval for the odds ratio is [1.062386, 5.271282]. B. We are 95% confident that the odds of arrest in the “Control” group are between about 1.06 and 5.27 times the odds in the “Pre” group. C. Since the interval is entirely above 1, it indicates a statistically significant difference in the odds of being arrested between the two groups.

Interpretation of the Odds Ratio: A. An odds ratio of 2.341066 suggests that the odds of being arrested by age 19 in the “Control” group are approximately 2.34 times the odds in the “Pre” group. B. Equivalently, the odds of arrest in the “Pre” group are about 1/2.341 = 0.43 times those in the “Control” group, a significant association between attending preschool and lower odds of arrest.

P-value: The p-value is 0.0280677, indicating that the difference in odds of being arrested between the two groups is statistically significant at the 0.05 level.

Summary: The odds ratio analysis suggests statistically significantly higher odds of being arrested for individuals in the “Control” group compared to those in the “Pre” group.
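The odds ratio can be cross-checked in base R without epitools. The sketch below is illustrative only (`ctrl_vs_pre`, `or_sample`, and `ft` are made-up names). It puts “Control” in the first row of the original counts so that the estimate is the odds of arrest for “Control” relative to “Pre”, then compares the cross-product ratio with the conditional estimate and exact interval from fisher.test.

```r
Data <- matrix(c(19, 42, 32, 30), 2, 2, byrow = TRUE,
               dimnames = list(Exposure = c("Pre", "Control"),
                               Arrested = c("Yes", "No")))

# Reorder the rows so Control comes first; the counts themselves are unchanged.
ctrl_vs_pre <- Data[c("Control", "Pre"), ]

# Sample (cross-product) odds ratio: (32 * 42) / (30 * 19).
or_sample <- (ctrl_vs_pre[1, 1] * ctrl_vs_pre[2, 2]) /
             (ctrl_vs_pre[1, 2] * ctrl_vs_pre[2, 1])

# fisher.test returns the conditional MLE of the odds ratio, an exact
# confidence interval, and the two-sided exact p-value.
ft <- fisher.test(ctrl_vs_pre)

or_sample
ft$estimate
ft$conf.int
ft$p.value
```

The cross-product ratio and the conditional estimate differ slightly because they are different estimators of the same odds ratio.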

Answer 6:

The analysis of the datasets using both small sample size methods (Fisher’s exact test) and large sample size methods (Chi-square test) yields the following results:

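Data1 (Original Data):

Odds Ratio (Fisher’s Exact): 0.424

P-Value (Fisher’s Exact): 0.028

P-Value (Chi-Square): 0.034 (with the Yates continuity correction; the uncorrected value from pvalue="chi2" in the epitab output above is 0.021)

Relative Risk (Risk Ratio): 0.603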

Data2 (Large Sample Size):

Odds Ratio (Fisher’s Exact): 0.424

P-Value (Fisher’s Exact): ~2.99e-118

P-Value (Chi-Square): ~3.20e-117

Relative Risk (Risk Ratio): 0.603

Data3 (Very Small Sample Size):

Odds Ratio (Fisher’s Exact): 0.25

P-Value (Fisher’s Exact): 0.545

P-Value (Chi-Square): 0.689

Relative Risk (Risk Ratio): 0.4
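The p-values listed above can be recomputed for all three tables with base R. This is a rough sketch under stated assumptions: `make_table` and `compare_tests` are hypothetical helpers, and the chi-square test is run both with and without the Yates continuity correction, because the chi-square values listed above (0.034 and 0.689) appear to be the corrected ones, while epitab's pvalue="chi2" output earlier is uncorrected.

```r
# Build a labelled 2x2 table from counts given row by row.
make_table <- function(counts) {
  matrix(counts, 2, 2, byrow = TRUE,
         dimnames = list(Exposure = c("Pre", "Control"),
                         Arrested = c("Yes", "No")))
}

tables <- list(
  Data1 = make_table(c(19, 42, 32, 30)),
  Data2 = make_table(c(1900, 4200, 3200, 3000)),
  Data3 = make_table(c(1, 4, 3, 3))
)

# Small-sample (Fisher) and large-sample (chi-square) p-values for one table.
# suppressWarnings hides the "approximation may be incorrect" warning that
# chisq.test raises when expected counts are small, as in Data3.
compare_tests <- function(tab) {
  c(fisher      = fisher.test(tab)$p.value,
    chisq       = suppressWarnings(chisq.test(tab, correct = FALSE)$p.value),
    chisq_yates = suppressWarnings(chisq.test(tab, correct = TRUE)$p.value))
}

# One row per data set, one column per test.
p_values <- t(sapply(tables, compare_tests))
p_values
```

Laying the three tests side by side makes it easier to see where the large-sample approximation and the exact test part ways as the counts shrink.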

Comparison and Summary:

Data1 and Data2:

Data2 is Data1 with every count multiplied by 100, so the relative risks and odds ratios are identical across the two data sets despite the vast difference in sample sizes. However, there is a substantial difference in the p-values. For Data2 (large sample size), the p-values are extremely small, indicating overwhelming evidence of an association, which is expected when the same effect is observed in a much larger sample. The p-values for Data1, while also significant, are much larger, reflecting the smaller sample size.

Data3 (Very Small Sample Size):

For Data3, with a very small sample size, the p-values from both Fisher’s exact test and the Chi-square test are much higher, indicating a non-significant result. This demonstrates the limitation of both tests in very small sample sizes, where the power to detect a significant effect is low.

Method Appropriateness:

The comparison highlights the appropriateness of using Fisher’s exact test for small sample sizes. The Chi-square test, while providing similar results in large samples, may not be reliable for very small sample sizes as seen in Data3.

General Observations:

In small samples (Data1), the results of the Fisher’s exact test and Chi-square test are similar but not identical, which is typical as sample sizes decrease. In large samples (Data2), both tests agree closely, which is expected as the Chi-square test becomes more accurate with larger samples.

Conclusion:

While both Fisher’s exact test and the Chi-square test can yield similar results in large sample sizes, Fisher’s exact test is more reliable in small sample sizes, especially when the sample is very small, as seen with Data3.