Problem 1: Chi-Square Goodness-of-Fit Test: ACTN3 Alleles

Research Question: Does the sample provide evidence that the R and X alleles of the ACTN3 gene are not equally likely in the population?

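Hypotheses:

\(H_0: p_R = p_X = 0.5\) (the R and X alleles are equally likely)

\(H_a: p_R \neq p_X\) (the R and X alleles are not equally likely)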

# Observed counts
observed <- c(R = 244, X = 192)
n <- sum(observed)

# Expected counts under equal probability
expected <- c(R = n * 0.5, X = n * 0.5)

# Chi-square goodness-of-fit test
result1 <- chisq.test(observed, p = c(0.5, 0.5))
result1
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276
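
library(ggplot2)

# Observed and expected counts side by side, reusing the vectors defined above
allele_df <- data.frame(
  Allele = names(observed),
  Observed = as.numeric(observed),
  Expected = as.numeric(expected)
)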

allele_long <- reshape(allele_df, varying = c("Observed", "Expected"),
                       v.names = "Count", timevar = "Type",
                       times = c("Observed", "Expected"), direction = "long")

ggplot(allele_long, aes(x = Allele, y = Count, fill = Type)) +
  geom_col(position = "dodge", width = 0.6) +
  scale_fill_manual(values = c("Observed" = "#4E79A7", "Expected" = "#F28E2B")) +
  labs(title = "ACTN3 Allele Counts: Observed vs. Expected",
       x = "Allele", y = "Count", fill = "") +
  theme_minimal(base_size = 13) +
  geom_hline(yintercept = 218, linetype = "dashed", color = "gray50")

P-value: 0.0128

Conclusion: The p-value of 0.0128 is less than the significance level of \(\alpha = 0.05\), so we reject the null hypothesis. The sample provides statistically significant evidence that the R and X alleles of the ACTN3 gene are not equally likely in the population (\(\chi^2 = 6.20\), df = 1). The R allele makes up 244 of the 436 sampled alleles (about 56%), more than the 50% expected under equal probability.
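
As a check on the reported statistic, the chi-square value can be reproduced by hand from the observed counts and the expected count of 218 per allele (half of 436). Here \(O\) and \(E\) are the usual shorthand for observed and expected counts; this notation is not used elsewhere in the write-up.

\[
\chi^2 = \sum \frac{(O - E)^2}{E}
       = \frac{(244 - 218)^2}{218} + \frac{(192 - 218)^2}{218}
       = \frac{676}{218} + \frac{676}{218}
       \approx 6.2018
\]

With two categories there is \(2 - 1 = 1\) degree of freedom, which matches the df reported by chisq.test().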


Problem 2: Chi-Square Test of Association: Vitamin Use and Sex

Research Question: Is there a significant association between vitamin use and sex in the population from which this sample was drawn?

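Hypotheses:

\(H_0\): Vitamin use and sex are independent (there is no association).

\(H_a\): Vitamin use and sex are associated.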

# Load the dataset
nutrition <- read.csv("C:/Users/noahb/Downloads/NutritionStudy.csv")

# Contingency table of vitamin use (rows) by sex (columns)
vitamin_table <- table(nutrition$VitaminUse, nutrition$Sex)
vitamin_table
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13
# Chi-square test of association
result2 <- chisq.test(vitamin_table)
result2
## 
##  Pearson's Chi-squared test
## 
## data:  vitamin_table
## X-squared = 11.071, df = 2, p-value = 0.003944

# Fix the order of the vitamin-use levels so the plot legend reads No, Occasional, Regular
nutrition$VitaminUse <- factor(nutrition$VitaminUse, levels = c("No", "Occasional", "Regular"))

ggplot(nutrition, aes(x = Sex, fill = VitaminUse)) +
  geom_bar(position = "fill", width = 0.6) +
  scale_fill_manual(values = c("No" = "#E15759", "Occasional" = "#F28E2B", "Regular" = "#4E79A7")) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Vitamin Use by Sex",
       x = "Sex", y = "Proportion", fill = "Vitamin Use") +
  theme_minimal(base_size = 13)

P-value: 0.0039

Conclusion: The p-value of 0.0039 is less than \(\alpha = 0.05\), so we reject the null hypothesis. The data provide statistically significant evidence of an association between vitamin use and sex (\(\chi^2 = 11.07\), df = 2). In this sample, about 57% of males (24 of 42) report no vitamin use, compared with about 32% of females (87 of 273), so females appear more likely to use vitamins than males.
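
The chi-square test relies on expected counts that are not too small (a common rule of thumb is at least 5 per cell), and the direction of the association is easiest to read from column proportions. The sketch below rebuilds both from the counts printed in the contingency table above, so it does not need the CSV file. The names `vitamin_counts`, `expected_counts`, and `chi_sq` are illustrative and do not appear in the original analysis.

```r
# Counts copied from the contingency table printed above
vitamin_counts <- matrix(
  c(87, 77, 109,   # Female: No, Occasional, Regular
    24,  5,  13),  # Male:   No, Occasional, Regular
  nrow = 3,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# Expected counts under independence: (row total * column total) / grand total
expected_counts <- outer(rowSums(vitamin_counts), colSums(vitamin_counts)) /
  sum(vitamin_counts)
round(expected_counts, 1)

# Pearson chi-square statistic rebuilt from observed and expected counts
chi_sq <- sum((vitamin_counts - expected_counts)^2 / expected_counts)
round(chi_sq, 3)

# Column proportions: distribution of vitamin use within each sex
round(prop.table(vitamin_counts, margin = 2), 3)
```

If the original `result2` object is available, `result2$expected` should give the same expected counts directly.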


Problem 3: One-Way ANOVA: Fish Gill Rate by Calcium Level

Research Question: Does the mean gill beat rate of fish differ significantly across tanks with low, medium, and high calcium levels?

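Hypotheses:

\(H_0: \mu_{\text{Low}} = \mu_{\text{Medium}} = \mu_{\text{High}}\) (mean gill beat rate is the same at all three calcium levels)

\(H_a\): At least one calcium level has a mean gill beat rate that differs from the others.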

# Load the dataset
gills <- read.csv("C:/Users/noahb/Downloads/FishGills3.csv")

# Ensure Calcium is treated as a factor
gills$Calcium <- factor(gills$Calcium, levels = c("Low", "Medium", "High"))

# Summary statistics by group
tapply(gills$GillRate, gills$Calcium, summary)
## $Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   44.00   55.50   65.00   68.50   84.75   98.00 
## 
## $Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   33.00   46.00   59.50   58.67   68.75   83.00 
## 
## $High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   37.00   45.75   58.50   58.17   68.00   85.00
# One-way ANOVA
result3 <- aov(GillRate ~ Calcium, data = gills)
summary(result3)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Extract p-value
anova_summary <- summary(result3)
p_value3 <- anova_summary[[1]]$`Pr(>F)`[1]
p_value3
## [1] 0.01207706
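
# Boxplots with the individual observations jittered on top
ggplot(gills, aes(x = Calcium, y = GillRate, fill = Calcium)) +
  # Hide boxplot outlier markers; the jittered points already show every fish
  geom_boxplot(width = 0.5, outlier.shape = NA) +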
  geom_jitter(width = 0.15, alpha = 0.4, size = 1.5) +
  scale_fill_manual(values = c("Low" = "#76B7B2", "Medium" = "#4E79A7", "High" = "#F28E2B")) +
  labs(title = "Fish Gill Beat Rate by Water Calcium Level",
       x = "Calcium Level", y = "Gill Rate (beats per minute)") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none")

P-value: 0.0121

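Conclusion: The p-value of 0.0121 is less than \(\alpha = 0.05\), so we reject the null hypothesis. The data provide statistically significant evidence that at least one calcium level has a mean gill beat rate different from the others (\(F = 4.648\), df = 2 and 87). The low-calcium group has the highest sample mean (68.50 beats per minute, versus 58.67 for medium and 58.17 for high), although the ANOVA by itself does not identify which pairs of groups differ. Water calcium level appears to affect fish gill beat rate.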
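
The F statistic and p-value can be cross-checked from the mean squares printed in the ANOVA table. The sketch below assumes the rounded table values, so its results may differ from R's output in the last digits. The variable names are illustrative and not part of the original analysis.

```r
# Values copied from the ANOVA table above (rounded as printed)
ms_calcium  <- 1018.6  # Mean Sq for Calcium (between groups)
ms_residual <- 219.1   # Mean Sq for Residuals (within groups)
df_calcium  <- 2       # 3 calcium levels - 1
df_residual <- 87      # residual df from the table

# F is the ratio of between-group to within-group mean squares
f_stat <- ms_calcium / ms_residual

# Upper-tail probability of the F distribution gives the p-value
p_from_f <- pf(f_stat, df1 = df_calcium, df2 = df_residual, lower.tail = FALSE)

round(f_stat, 3)
round(p_from_f, 4)
```

Both values should agree with the table to within rounding error.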