Jessica McPhaul - 6372 _Unit_10_PreLive
In a 1962 social experiment, 123 three to four-year-old children from poverty level families in Ypsilanti, Michigan were randomly assigned either to a treatment group receiving 2 years of preschool instruction or to a control group receiving no preschool. The participants were followed into their adult years. The following table shows how many in each group were arrested for some crime by the time they were 19 years old.
Data<-matrix(c(19,42,32,30),2,2,byrow=T)
dimnames(Data)<-list(Exposure=c("Pre","Control"),"Arrested"=c("Yes","No"))
Data
## Arrested
## Exposure Yes No
## Pre 19 42
## Control 32 30
QUESTIONS
library(epitools)
epitab(Data,method="riskratio",riskratio="boot",pvalue="fisher.exact")
## $tab
## Arrested
## Exposure Yes p0 No p1 riskratio lower upper p.value
## Pre 19 0.3114754 42 0.6885246 1.000000 NA NA NA
## Control 32 0.5161290 30 0.4838710 0.702765 0.5024022 0.9557604 0.0280677
##
## $measure
## [1] "boot"
##
## $conf.level
## [1] 0.95
##
## $pvalue
## [1] "fisher.exact"
Produce a hand calculation of the relative risk that allows you to say that “the chances of getting arrested for the control group are RR times higher than that of the children who went to the preschool”.
Use the “rev” option to change the row and columns so that the relative risk in the table matches your hand calculation in the previous question.
Using your “rev” setting option from question 4, change the method to odds ratio, performing a fishers.exact test and a fisher confidence interval. Use the help page (?epitab) for the options to get that done. Interpret the confidence interval for the odds ratio.
As mentioned, Chi-square and Wald procedures are meant for large sample sizes. When sample sizes are small, Fisher’s exact test and small-sample confidence intervals such as bootstrap, Fisher, or sample-size-adjusted Wald intervals should be used. Below is the code for analyzing the data set assuming a small sample size versus a large sample size. Note how the limits and p-values are quite similar, though not exactly the same.
epitab(Data,method="riskratio",riskratio="boot",pvalue="fisher.exact")
## $tab
## Arrested
## Exposure Yes p0 No p1 riskratio lower upper p.value
## Pre 19 0.3114754 42 0.6885246 1.000000 NA NA NA
## Control 32 0.5161290 30 0.4838710 0.702765 0.5028674 0.9531483 0.0280677
##
## $measure
## [1] "boot"
##
## $conf.level
## [1] 0.95
##
## $pvalue
## [1] "fisher.exact"
epitab(Data,method="riskratio",riskratio="wald",pvalue="chi2")
## $tab
## Arrested
## Exposure Yes p0 No p1 riskratio lower upper p.value
## Pre 19 0.3114754 42 0.6885246 1.000000 NA NA NA
## Control 32 0.5161290 30 0.4838710 0.702765 0.5167126 0.9558091 0.02125275
##
## $measure
## [1] "wald"
##
## $conf.level
## [1] 0.95
##
## $pvalue
## [1] "chi2"
Run the same two lines of code again but with the following two data sets. Compare the agreement or disagreement between the small sample size p-values and confidence intervals versus the large sample size p-value and confidence intervals. Summarize your findings.
## Arrested
## Exposure Yes No
## Pre 1900 4200
## Control 3200 3000
## Arrested
## Exposure Yes No
## Pre 1 4
## Control 3 3
Study Design: The study is a randomized experiment with prospective follow-up: children were randomly assigned to a treatment (preschool) group or a control group and then followed over time to measure outcomes (arrest records). Understanding the study design is crucial because random assignment strengthens the case for causal inference, and because the design determines which measures of association and statistical tests are appropriate for analysis.
Data Structure: The data is arranged in a 2x2 contingency table:
Pre: 19 children were arrested, 42 were not.
Control: 32 children were arrested, 30 were not.
Calculation of Probabilities:
1. Probability in the ‘Pre’ Group (P1): The probability of being arrested in the ‘Pre’ group is 19/(19+42) = 0.3114754.
2. Probability in the ‘Control’ Group (P2): The probability of being arrested in the ‘Control’ group is 32/(32+30) = 0.5161290.
3. Relative Risk Calculation: The relative risk of arrest for the ‘Pre’ group relative to the ‘Control’ group is P1/P2 = 0.3114754 / 0.5161290 = 0.603484. The value 0.702765 in the epitab output is a different ratio: epitab treats the second column (“No”) as the outcome, so it reports 0.4838710 / 0.6885246, the proportion not arrested in the ‘Control’ group relative to the ‘Pre’ group.
Relative Risk < 1: The RR of arrest, 0.603484, means that children in the ‘Pre’ group were about 40% less likely to be arrested than those in the ‘Control’ group. The epitab value of 0.702765 describes the same association from the other side: children in the ‘Control’ group were about 30% less likely to avoid arrest than those in the ‘Pre’ group. Confidence Interval: The 95% bootstrap confidence interval reported by epitab for its ratio is 0.5024022 to 0.9557604 (bootstrap limits vary slightly from run to run). This does not include 1, indicating that the observed association is statistically significant. P-Value: The Fisher’s exact test p-value is 0.0280677, below the conventional alpha level of 0.05, signifying that the difference in arrest rates between the two groups is statistically significant.
Summary: The analysis reveals a significant reduction in the likelihood of being arrested for children in the ‘Pre’ group compared to the ‘Control’ group. This is evidenced by a relative risk less than 1 and a statistically significant Fisher’s exact test result.
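The two ratios discussed above can be checked directly from the counts. This is a minimal base-R sketch; the variable names are illustrative, and it assumes (per the layout of the epitab output shown earlier) that epitab uses the second column as the outcome.

```r
# Counts from the study table (rows: exposure, columns: arrested)
Data <- matrix(c(19, 42, 32, 30), 2, 2, byrow = TRUE,
               dimnames = list(Exposure = c("Pre", "Control"),
                               Arrested = c("Yes", "No")))

# Row proportions: P(Yes | group) and P(No | group)
row_prop <- prop.table(Data, margin = 1)

# Risk ratio of arrest, Pre relative to Control
rr_arrest <- row_prop["Pre", "Yes"] / row_prop["Control", "Yes"]

# Ratio of non-arrest proportions, Control relative to Pre
# (the quantity that appears in the "riskratio" column of the epitab output)
rr_no_arrest <- row_prop["Control", "No"] / row_prop["Pre", "No"]

round(c(rr_arrest = rr_arrest, rr_no_arrest = rr_no_arrest), 6)
```

Both ratios fall below 1 and point in the same direction, but they answer different questions, so it matters which one is quoted in the write-up.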
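Hand Calculation of Relative Risk (RR): The question asks for the risk in the control group relative to the preschool group, so the control risk goes in the numerator. \[ RR = \frac{\text{Risk in the Control Group}}{\text{Risk in the Treatment Group (Pre)}} \] \[ RR = \frac{\frac{32}{32 + 30}}{\frac{19}{19 + 42}} = \frac{32/62}{19/61} = \frac{32 \times 61}{62 \times 19} \approx 1.657 \]
The chances of getting arrested for the control group are about 1.657 times those of the children who went to the preschool.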
```r
# Data matrix
Data <- matrix(c(19,42,32,30), 2, 2, byrow=TRUE)
dimnames(Data) <- list(Exposure=c("Pre","Control"), "Arrested"=c("Yes","No"))

# Risk of arrest in each group (arrested / group total)
risk_pre <- Data[1,1] / sum(Data[1,])
risk_control <- Data[2,1] / sum(Data[2,])

# Relative risk of arrest: Control relative to Pre
relative_risk <- risk_control / risk_pre
relative_risk
```
### Answer 4:
```r
# Original Data
Data <- matrix(c(19,42,32,30), 2, 2, byrow=TRUE)
dimnames(Data) <- list(Exposure=c("Pre","Control"), "Arrested"=c("Yes","No"))
# Reversed Data for RR calculation
# Intended to swap the rows so that Control comes first.
# NOTE: matrix() fills column-wise unless byrow=TRUE, so this call yields
# Control = (32, 19) and Pre = (30, 42), not the study counts
# Control = (32, 30) and Pre = (19, 42).
Data_rev <- matrix(c(32,30,19,42), 2, 2)
dimnames(Data_rev) <- list(Exposure=c("Control","Pre"), "Arrested"=c("Yes","No"))
# Using 'epitab' with revised data
library(epitools)
epitab(Data_rev, method="riskratio", riskratio="boot", pvalue="fisher.exact")
```
## $tab
## Arrested
## Exposure Yes p0 No p1 riskratio lower upper p.value
## Control 32 0.6274510 19 0.3725490 1.000000 NA NA NA
## Pre 30 0.4166667 42 0.5833333 1.565789 1.0625 2.50641 0.0280677
##
## $measure
## [1] "boot"
##
## $conf.level
## [1] 0.95
##
## $pvalue
## [1] "fisher.exact"
1. Data Structure: Data_rev was intended to place the “Control” group in the first row and the “Pre” group in the second. Because matrix() fills column by column unless byrow=TRUE is set, the counts were not simply swapped. The arrangement passed to epitab is:
Arrested
Exposure Yes No
Control 32 19
Pre 30 42
The study counts are Control: 32 arrested, 30 not, and Pre: 19 arrested, 42 not, so this table is the transpose of the intended one.
2. Relative Risk Calculation: A. epitab computes row proportions and treats the second column as the outcome. From this table the “Control” row gives 19/(32+19) = 0.3725490 and the “Pre” row gives 42/(30+42) = 0.5833333. B. The reported ratio is 0.5833333 / 0.3725490 = 1.565789. C. Neither proportion is the arrest risk of a study group, so 1.565789 is not the relative risk of arrest.
3. Interpretation: A. From the study counts, the risk of arrest is 32/62 = 0.5161290 in the “Control” group and 19/61 = 0.3114754 in the “Pre” group, so RR = 0.5161290 / 0.3114754 ≈ 1.657. B. The risk of being arrested in the “Control” group is about 1.66 times that in the “Pre” group.
4. Confidence Interval: A. The bootstrap interval printed above, [1.0625, 2.50641], belongs to the ratio 1.565789 from the transposed table. B. It should not be reported as the interval for the relative risk of arrest. Running epitab on the original Data with rev="columns" gives the interval for the intended comparison.
5. P-value: Fisher’s exact test gives the same result when a 2x2 table is transposed or its rows are reordered, so the p-value of 0.0280677 still applies: the difference in arrest rates between the two groups is statistically significant at the 0.05 level.
6. Summary: The “rev” option of epitab, rather than a hand-built matrix, is the safer way to reorder the table. With “Pre” as the reference row and “Yes” as the outcome column, the relative risk is about 1.657, indicating a higher risk of arrest for the “Control” group than for the “Pre” group.
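One way to get the comparison the question asks for, without rebuilding the matrix by hand, is the rev argument of epitab. The sketch below is an illustration under assumptions rather than the graded solution: the variable names are made up, and the epitab call is skipped if the epitools package is not installed. Bootstrap limits will differ slightly between runs.

```r
# Study counts (rows: exposure, columns: arrested)
Data <- matrix(c(19, 42, 32, 30), 2, 2, byrow = TRUE,
               dimnames = list(Exposure = c("Pre", "Control"),
                               Arrested = c("Yes", "No")))

# Hand value: risk of arrest, Control relative to Pre
risk <- Data[, "Yes"] / rowSums(Data)
rr_control_vs_pre <- unname(risk["Control"] / risk["Pre"])
rr_control_vs_pre

# Same comparison through epitab: rev = "columns" moves "Yes" into the
# second (outcome) column while keeping "Pre" as the reference row.
if (requireNamespace("epitools", quietly = TRUE)) {
  res <- epitools::epitab(Data, method = "riskratio", rev = "columns",
                          riskratio = "boot", pvalue = "fisher.exact")
  print(res$tab)
}
```

The riskratio entry in the “Control” row of the printed table should agree with rr_control_vs_pre.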
# Using 'epitab' with revised data for odds ratio calculation
library(epitools)
epitab(Data_rev, method="oddsratio", oddsratio="fisher", pvalue="fisher.exact")
## $tab
## Arrested
## Exposure Yes p0 No p1 oddsratio lower upper p.value
## Control 32 0.516129 19 0.3114754 1.000000 NA NA NA
## Pre 30 0.483871 42 0.6885246 2.341066 1.062386 5.271282 0.0280677
##
## $measure
## [1] "fisher"
##
## $conf.level
## [1] 0.95
##
## $pvalue
## [1] "fisher.exact"
Odds Ratio Calculation: A. The odds ratio reported by epitab is 2.341066. B. The odds are the ratio of the probability of an event occurring to the probability of it not occurring. C. For the “Control” group, the odds of being arrested are 32/30 = 1.0667. D. For the “Pre” group, the odds of being arrested are 19/42 = 0.4524. E. The sample odds ratio for “Control” relative to “Pre” is 1.0667 / 0.4524 ≈ 2.358; its inverse, 0.424, is the odds ratio for “Pre” relative to “Control”. The table value of 2.341066 differs slightly from 2.358 because the fisher method reports a conditional maximum likelihood estimate rather than the sample odds ratio. F. Unlike the relative risk, the odds ratio is unchanged when a 2x2 table is transposed, so the estimate is valid even though Data_rev holds transposed counts. The row labels in the output are misleading, however: 2.341066 compares the odds of arrest in the “Control” group to those in the “Pre” group.
Confidence Interval: A. The confidence interval for the odds ratio is [1.062386, 5.271282]. B. We are 95% confident that the odds of arrest in the “Control” group are between about 1.06 and 5.27 times the odds in the “Pre” group. C. Since the interval is entirely above 1, it indicates a statistically significant difference in the odds of being arrested between the two groups.
Interpretation of the Odds Ratio: A. An odds ratio of 2.341066 suggests that the odds of being arrested in the “Control” group are approximately 2.34 times the odds in the “Pre” group. B. Equivalently, attending preschool is associated with lower odds of being arrested by age 19.
P-value: The p-value is 0.0280677, indicating that the difference in odds of being arrested between the two groups is statistically significant at the 0.05 level.
Summary: The odds ratio analysis suggests that there are statistically significantly higher odds of being arrested for individuals in the “Control” group compared to those in the “Pre” group.
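The odds calculation above can be reproduced in base R. This is a sketch under assumptions: it uses fisher.test from the stats package (which reports a conditional maximum likelihood estimate of the odds ratio), and the variable names are illustrative.

```r
# Study counts (rows: exposure, columns: arrested)
Data <- matrix(c(19, 42, 32, 30), 2, 2, byrow = TRUE,
               dimnames = list(Exposure = c("Pre", "Control"),
                               Arrested = c("Yes", "No")))

# Odds of arrest in each group: arrested / not arrested
odds <- Data[, "Yes"] / Data[, "No"]

# Sample odds ratio, Control relative to Pre: (32/30) / (19/42)
or_sample <- unname(odds["Control"] / odds["Pre"])

# Fisher's exact test with Control as the first row
control_first <- Data[c("Control", "Pre"), ]
ft <- fisher.test(control_first)

# Transposing the table leaves the odds ratio and the p-value unchanged
ft_t <- fisher.test(t(control_first))

c(or_sample = or_sample,
  or_fisher = unname(ft$estimate),
  or_fisher_transposed = unname(ft_t$estimate),
  p_value = ft$p.value)
ft$conf.int
```

The transposed fit is included only to illustrate why the odds ratio survives a mis-arranged table while the relative risk does not.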
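The analysis of the three data sets using both small sample size methods (Fisher’s exact test) and large sample size methods (Chi-square test) yields the following results. Odds ratios and relative risks are sample values for the “Pre” group relative to the “Control” group. The Chi-square p-values include the continuity correction, which is why the value for Data1 differs from the uncorrected 0.02125275 shown earlier.
Data1 (original counts: 19, 42, 32, 30)
Odds Ratio: 0.424
P-Value (Fisher’s Exact): 0.028
P-Value (Chi-Square): 0.034
Relative Risk (Risk Ratio): 0.603
Data2 (counts: 1900, 4200, 3200, 3000)
Odds Ratio: 0.424
P-Value (Fisher’s Exact): ~2.99e-118
P-Value (Chi-Square): ~3.20e-117
Relative Risk (Risk Ratio): 0.603
Data3 (counts: 1, 4, 3, 3)
Odds Ratio: 0.25
P-Value (Fisher’s Exact): 0.545
P-Value (Chi-Square): 0.689
Relative Risk (Risk Ratio): 0.4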
Despite the vast difference in sample sizes, the relative risks and odds ratios remain consistent across Data1 and Data2. However, there is a substantial difference in the p-values. For Data2 (large sample size), the p-values are extremely small, indicating a very significant result, which is expected in large samples. The p-values for Data1, while also significant, are much larger, reflecting the smaller sample size.
For Data3, with a very small sample size, the p-values from both Fisher’s exact test and the Chi-square test are much higher, indicating a non-significant result. This demonstrates the limitation of both tests in very small sample sizes, where the power to detect a significant effect is low.
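The figures above can be recomputed with a few lines of base R. This is a minimal sketch with assumed helper names (make_table, summarise_table); it uses fisher.test and chisq.test from the stats package, and chisq.test applies the continuity correction to 2x2 tables by default.

```r
# Build a labelled 2x2 table from counts given row by row
make_table <- function(counts) {
  matrix(counts, 2, 2, byrow = TRUE,
         dimnames = list(Exposure = c("Pre", "Control"),
                         Arrested = c("Yes", "No")))
}

datasets <- list(Data1 = make_table(c(19, 42, 32, 30)),
                 Data2 = make_table(c(1900, 4200, 3200, 3000)),
                 Data3 = make_table(c(1, 4, 3, 3)))

# Sample risk ratio and odds ratio (Pre relative to Control) plus both p-values
summarise_table <- function(tab) {
  risk <- tab[, "Yes"] / rowSums(tab)
  odds <- tab[, "Yes"] / tab[, "No"]
  c(risk_ratio = unname(risk["Pre"] / risk["Control"]),
    odds_ratio = unname(odds["Pre"] / odds["Control"]),
    p_fisher   = fisher.test(tab)$p.value,
    # chisq.test warns about small expected counts for Data3
    p_chisq    = suppressWarnings(chisq.test(tab)$p.value))
}

results <- t(sapply(datasets, summarise_table))
signif(results, 3)
```

Each row of results corresponds to one data set, which makes the agreement or disagreement between the two p-value columns easy to read across sample sizes.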
Method Appropriateness:
The comparison highlights the appropriateness of using Fisher’s exact test for small sample sizes. The Chi-square test gives similar results in large samples but relies on a large-sample approximation. A common rule of thumb is that every expected cell count should be at least 5, which Data3 (11 children in total) does not satisfy.
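A quick way to judge whether the Chi-square approximation is reasonable is to look at the expected cell counts. The sketch below does this for Data3 in base R; the threshold of 5 is the usual rule of thumb, not something stated in the assignment.

```r
# Smallest data set from the question (rows: exposure, columns: arrested)
Data3 <- matrix(c(1, 4, 3, 3), 2, 2, byrow = TRUE,
                dimnames = list(Exposure = c("Pre", "Control"),
                                Arrested = c("Yes", "No")))

# Expected counts under independence: row total * column total / grand total
expected <- suppressWarnings(chisq.test(Data3)$expected)
expected

# TRUE here means the large-sample approximation is doubtful
any(expected < 5)
```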
General Observations:
In moderately sized samples (Data1, 123 children), the results of the Fisher’s exact test and Chi-square test are similar but not identical, and the gap widens as the sample size decreases (Data3). In large samples (Data2), both tests lead to the same conclusion, which is expected as the Chi-square test becomes more accurate with larger samples.
Conclusion:
While both Fisher’s exact test and the Chi-square test can yield similar results in large sample sizes, Fisher’s exact test is more reliable in small sample sizes, especially when the sample is very small, as seen with Data3.