Unit 10 HW

Jessica McPhaul - 6372 _Unit_10_Homework

Exercise 1: Conceptual Questions

State under what circumstance a difference in proportion confidence interval should not be used in favor of an odds ratio metric.
Under what sampling schemes can a hypothesis test be generally worded as a “test for association”, rather than a test for difference in proportion, odds, or relative risk?
Give an example of study that could not be conducted in a prospective manner due to either logistical or ethical considerations.
What metric allows for the following interpretation: “The chances of a person getting the flu who has not taken the flu vaccine is 30 percent (1.3 times) higher than someone who has taken the flu vaccine”?

ANSWER Exercise 1:

Difference in Proportion Confidence Interval vs. Odds Ratio Metric:
In my understanding, a difference in proportion confidence interval may not be appropriate when the outcome of interest is rare or when the sample sizes in the groups being compared are very unbalanced. It is also not appropriate in a retrospective (case-control) study, where the researchers fix the numbers of cases and controls, so the outcome proportions cannot be estimated from the sample. In such situations, the odds ratio is preferred because it remains stable and interpretable even with rare outcomes and unequal sample sizes. It provides a measure of the strength of the association between exposure and outcome.
Sampling Schemes for “Test for Association”: A hypothesis test can be generally worded as a “test for association” under sampling schemes that involve collecting data in a way that allows for examining the relationship between variables without necessarily comparing proportions, odds, or relative risks directly. This includes cross-sectional studies, cohort studies, and case-control studies. In these designs, the primary interest often lies in understanding whether there is a statistical relationship between two or more variables (e.g., exposure and outcome) across different groups within the population.
Example of a Study Not Conducted Prospectively: In a study investigating the effects of smoking during pregnancy on the development of asthma in children, I could not ethically design a prospective experiment, because it would be unethical to assign pregnant women to smoke during their pregnancy to observe the effects on their unborn children. Furthermore, logistically, a prospective study design requiring long-term follow-up from pre-pregnancy through birth could be challenging due to participant retention and the need to control for numerous confounding variables over time.
Metric for Interpretation: The metric suitable for the interpretation provided is the relative risk (RR), also known as the risk ratio. This statement is about comparing the likelihood of getting the flu between two groups: those who have not received the flu vaccine and those who have. A relative risk value of 1.3 means that the group not vaccinated against the flu has a 30% higher chance of catching the flu compared to the vaccinated group. Essentially, RR quantifies how much more likely an event (like getting the flu) is to happen in the exposed group (unvaccinated) relative to the unexposed group (vaccinated), with an RR of 1.3 corresponding to a 30% higher risk for the unvaccinated group.
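
As a minimal sketch of how a relative risk of 1.3 arises, the R snippet below uses hypothetical counts (invented for illustration, not taken from any study): 130 flu cases among 1,000 unvaccinated people and 100 among 1,000 vaccinated people.

```r
# Hypothetical counts, chosen only so that the relative risk equals 1.3
flu <- matrix(c(130, 870, 100, 900), 2, 2, byrow = TRUE)
dimnames(flu) <- list(Group = c("Unvaccinated", "Vaccinated"),
                      Outcome = c("Flu", "No Flu"))

# Risk of flu in each group = cases / group total
risk <- flu[, "Flu"] / rowSums(flu)

# Relative risk: unvaccinated risk divided by vaccinated risk
rr <- unname(risk["Unvaccinated"] / risk["Vaccinated"])
rr
```

The ratio is taken of risks (proportions), not odds, which is what permits the "30 percent higher chance" wording.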

Exercise 2: Vitamin C Study

Consider the vitamin C study previously described. Use the workflow discussed in class to write a final report that includes what the study design is, the report and conclusion of a test, and report and conclusion of a confidence interval.

VitC<-matrix(c(335,76,302,105),2,2,byrow=T)
dimnames(VitC)<-list(Exposure=c("Placebo","Vit C"),"Arrested"=c("Cold","No Cold"))
VitC

##          Arrested
## Exposure  Cold No Cold
##   Placebo  335      76
##   Vit C    302     105

ANSWER EXERCISE 2:

STUDY DESIGN: This was a prospective study in which participants were given either a placebo or Vitamin C, and researchers then recorded whether each participant developed a cold. Because the researchers assigned the exposure, it is an experiment rather than a purely observational study. The goal was to assess whether taking Vitamin C reduced the incidence of the common cold compared to a placebo. The results were compiled into the following table:

| Exposure | Cold | No Cold |
|----------|------|---------|
| Placebo  | 335  | 76      |
| Vit C    | 302  | 105     |

Analysis and Results: Test for Association (Fisher's Exact Test):

# Create the matrix
VitC <- matrix(c(335, 76, 302, 105), 2, 2, byrow = TRUE)
dimnames(VitC) <- list(Exposure = c("Placebo", "Vit C"), "Arrested" = c("Cold", "No Cold"))

# Perform Fisher's Exact Test
fisher_result <- fisher.test(VitC)

# Print the results
fisher_result

## 
##  Fisher's Exact Test for Count Data
## 
## data:  VitC
## p-value = 0.01444
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.083492 2.172490
## sample estimates:
## odds ratio 
##   1.531722

Reflecting on the results from Fisher’s Exact Test on the Vitamin C study data I analyzed:

P-value: The test yielded a p-value of 0.01444. This value is below the commonly used significance threshold of 0.05, indicating that the difference observed in cold incidence between the Vitamin C and placebo groups is statistically significant. In simpler terms, if there were truly no association, a difference at least as extreme as the one observed would be unlikely to arise by chance alone.

Confidence Interval for the Odds Ratio: The 95% confidence interval for the odds ratio ranges from 1.083492 to 2.172490. This interval does not include 1, suggesting a significant association between Vitamin C intake and cold incidence. More specifically, with 95% confidence, the odds of catching a cold in the placebo group are between approximately 1.08 and 2.17 times the odds in the Vitamin C group. Equivalently, the odds of not catching a cold are 1.08 to 2.17 times higher for those taking Vitamin C.

Interpretation: Based on the p-value and the confidence interval, my analysis suggests that Vitamin C intake is associated with a lower incidence of colds. This implies that participants who took Vitamin C were significantly less likely to catch a cold compared to those who took a placebo.
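
To make the direction of the odds ratio explicit, the following sketch computes the sample (cross-product) odds ratio by hand from the same table. The names `odds` and `or_sample` are introduced here for illustration only. Note that `fisher.test()` reports a conditional maximum likelihood estimate, so its value (1.531722) differs slightly from the simple cross-product ratio.

```r
# Same table as above
VitC <- matrix(c(335, 76, 302, 105), 2, 2, byrow = TRUE)
dimnames(VitC) <- list(Exposure = c("Placebo", "Vit C"),
                       Arrested = c("Cold", "No Cold"))

# Odds of catching a cold within each exposure group
odds <- VitC[, "Cold"] / VitC[, "No Cold"]

# Sample odds ratio: placebo odds divided by Vitamin C odds
or_sample <- unname(odds["Placebo"] / odds["Vit C"])
or_sample
```

Because the ratio puts the placebo odds in the numerator, a value above 1 means higher odds of a cold on placebo.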

It’s important for me to note in my report that, although the results suggest a beneficial effect of Vitamin C on preventing colds, the study design—whether it was randomized or not—plays a crucial role in interpreting these findings. If it wasn’t a randomized controlled trial, other unmeasured factors might influence the incidence of colds, and the observed association might not purely be due to the effect of Vitamin C.

Therefore, while the statistical analysis indicates a significant association, I should conclude cautiously, highlighting the need for further research, preferably through randomized controlled trials, to confirm these findings and establish a causal relationship between Vitamin C intake and cold prevention.
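
If the group sizes (411 on placebo, 407 on Vitamin C) were fixed by the design, which is an assumption here, a difference in proportions can also be estimated. The sketch below uses base R's `prop.test()`. The vectors `cold` and `total` are illustrative names built from the row totals of the table.

```r
# Participants who caught a cold, and group sizes (row totals of the table)
cold  <- c(Placebo = 335, VitC = 302)
total <- c(Placebo = 411, VitC = 407)

# Observed proportion with a cold in each group, and their difference
p_hat <- cold / total
diff_hat <- unname(p_hat["Placebo"] - p_hat["VitC"])

# prop.test() returns a confidence interval for (placebo - Vitamin C)
pt <- prop.test(cold, total)
diff_hat
pt$conf.int
```

An interval for the difference that excludes 0 would tell the same story as an odds ratio interval that excludes 1.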

Exercise 3: Breast Cancer and Alcohol

Researchers took random samples of 534 women who had breast cancer and 1044 women who did not have cancer. They reached out individually and asked the question, “Do you have fewer than four drinks per week?” of which they could answer yes or no. The table of the results follow.

The goal of the study was to assess the risk of drinking on breast cancer. Use the workflow discussed in class to write a final report that includes what the study design is, the test being used, the conclusion of the test, and interpretation of a confidence interval.

Drinks<-matrix(c(330,658,204,386),2,2,byrow=T)
dimnames(Drinks)<-list(Drinks=c("Fewer than 4","4 or more"),"Status"=c("Cancer","Control"))
Drinks

##               Status
## Drinks         Cancer Control
##   Fewer than 4    330     658
##   4 or more       204     386

ANSWER EXERCISE 3:

Study Design: This is a retrospective (case-control) observational study: participants were sampled based on their health status (cancer or control) and then asked about their drinking habits. It was designed to explore the relationship between alcohol consumption (namely having fewer than 4 drinks per week) and the incidence of breast cancer.

Data Description The subjects comprised two groups: 534 women with a breast cancer diagnosis and 1044 women without cancer, serving as the control group. Each participant was questioned individually regarding their alcohol consumption, specifically whether they had fewer than four drinks per week, allowing for binary responses of ‘yes’ or ‘no.’

Data Collection The responses were organized into a 2x2 contingency table based on the alcohol consumption frequency and cancer status, as follows:

|            | Cancer | Control |
|------------|--------|---------|
| < 4 drinks | 330    | 658     |
| ≥ 4 drinks | 204    | 386     |

Statistical Method A Chi-square test for independence was utilized to analyze the association between reported alcohol consumption and breast cancer status. This statistical test is appropriate for determining if there’s a significant association between two categorical variables.

# Create the matrix of observed frequencies
# (rows: drinking level; columns: cancer status, matching the exercise table)
Drinks <- matrix(c(330, 658, 204, 386), nrow = 2, byrow = TRUE)
dimnames(Drinks) <- list(Drinks = c("Fewer than 4", "4 or more"),
                         Status = c("Cancer", "Control"))

# View the table
Drinks

##               Status
## Drinks         Cancer Control
##   Fewer than 4    330     658
##   4 or more       204     386

# Perform the Chi-square test of independence
chi_test_result <- chisq.test(Drinks)

# Print the test result
chi_test_result

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  Drinks
## X-squared = 0.1785, df = 1, p-value = 0.6727

# Display the p-value and the expected frequencies 
chi_test_result$p.value

## [1] 0.6726679

chi_test_result$expected

##               Status
## Drinks           Cancer  Control
##   Fewer than 4 334.3422 653.6578
##   4 or more    199.6578 390.3422

Analysis and Results: Test for Association (Chi-Square Test): To analyze the data, I used the Chi-square test to see if there was a significant difference in drinking habits between the two groups. My Chi-square test showed a statistic of about 0.178 and a p-value of around 0.673. These numbers suggest there’s no strong evidence of a link between drinking frequency and breast cancer in my study sample. The expected numbers (around 334 women with cancer drinking fewer than four drinks a week, and about 200 women with cancer drinking four or more) were close to the observed counts of 330 and 204. This outcome indicates that there is not a statistically significant relationship between the frequency of alcohol consumption (specifically, having fewer than four drinks per week versus four or more) and breast cancer status in this study sample.

Expected Frequencies The expected frequencies, calculated under the assumption of no association between alcohol consumption frequency and breast cancer status, are:

For “Fewer than 4 drinks per week”: 334.34 for cancer and 653.66 for control. For “4 or more drinks per week”: 199.66 for cancer and 390.34 for control.
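
As a check on these numbers, the sketch below recomputes the expected counts and the Yates-corrected statistic by hand, using the table as given in the exercise. The names `expected` and `x2_yates` are illustrative. The subtraction of 0.5 follows the continuity correction named in the `chisq.test()` output.

```r
# Observed counts as given in the exercise
Drinks <- matrix(c(330, 658, 204, 386), 2, 2, byrow = TRUE)
dimnames(Drinks) <- list(Drinks = c("Fewer than 4", "4 or more"),
                         Status = c("Cancer", "Control"))

# Expected count under no association = (row total * column total) / grand total
expected <- outer(rowSums(Drinks), colSums(Drinks)) / sum(Drinks)

# Pearson statistic with Yates' correction: subtract 0.5 from each |O - E|
x2_yates <- sum((abs(Drinks - expected) - 0.5)^2 / expected)

round(expected, 2)
x2_yates
```

The hand-computed statistic should agree with the `X-squared` value reported by `chisq.test(Drinks)`.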

Confidence Interval for Odds Ratio: To obtain a confidence interval for the odds ratio, I ran Fisher’s Exact Test. The results:

# Perform Fisher's Exact Test for Count Data
fisher_result <- fisher.test(Drinks)

# Print the results 
fisher_result

## 
##  Fisher's Exact Test for Count Data
## 
## data:  Drinks
## p-value = 0.6601
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.7611771 1.1842188
## sample estimates:
## odds ratio 
##  0.9489964

# Extract and print just the confidence interval for the odds ratio
fisher_result$conf.int

## [1] 0.7611771 1.1842188
## attr(,"conf.level")
## [1] 0.95

Final Report: After running Fisher’s Exact Test on my data, I found that the p-value is approximately 0.6601, which again indicates that there’s no statistically significant association between alcohol consumption (fewer than four versus four or more drinks per week) and breast cancer, given that the p-value is much higher than the conventional alpha level of 0.05.

The 95 percent confidence interval for the odds ratio ranges from about 0.761 to 1.184. This means that with 95% confidence, the true odds ratio of having breast cancer associated with drinking fewer than four versus four or more drinks per week falls within this range. Since the interval includes 1, this further supports the conclusion that there is no significant association between alcohol consumption at the specified levels and breast cancer risk in this sample.

The sample estimate of the odds ratio is approximately 0.949, meaning that in this sample the odds of breast cancer for women drinking fewer than four drinks per week are about 5% lower than for women drinking four or more, a difference that is not statistically significant. Because this is a case-control study, the odds ratio (rather than a relative risk or difference in proportions) is the appropriate metric. This interpretation aligns with the conclusion drawn from both the chi-squared and Fisher’s exact test results, underlining the lack of evidence for a significant link between the specified levels of alcohol consumption and breast cancer based on the data I analyzed.
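
For comparison with the interval from Fisher's Exact Test, here is a minimal sketch of the large-sample (Wald) interval for the odds ratio, computed on the log scale. This is a swapped-in approximation, not the method `fisher.test()` uses, and the names `or_hat`, `se_log_or`, and `wald_ci` are illustrative.

```r
# Observed counts as given in the exercise
Drinks <- matrix(c(330, 658, 204, 386), 2, 2, byrow = TRUE)
dimnames(Drinks) <- list(Drinks = c("Fewer than 4", "4 or more"),
                         Status = c("Cancer", "Control"))

# Sample odds ratio: odds of cancer for "Fewer than 4" over odds for "4 or more"
or_hat <- (Drinks[1, 1] * Drinks[2, 2]) / (Drinks[1, 2] * Drinks[2, 1])

# Standard error of log(OR): square root of the summed reciprocal cell counts
se_log_or <- sqrt(sum(1 / Drinks))

# 95% Wald interval, built on the log scale and back-transformed
wald_ci <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log_or)
or_hat
wald_ci
```

With counts this large, the Wald interval should land close to the exact interval and, like it, should contain 1.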