nutrition_study <- read.csv("NutritionStudy.csv")
Fish_Gills <- read.csv("FishGills3.csv")

Problem 1: ACTN3 Alleles (Chi-Square Goodness-of-Fit)

Of the 436 people in the sample, 244 have allele R and 192 have allele X. Test whether the two alleles are equally likely in the population at α = 0.05.

Problem 1 Hypotheses

\(H_0\): \(p_r = p_x = 0.5\)

\(H_A\): \(p_r \neq 0.5\) (equivalently, \(p_x \neq 0.5\))

# Observed counts and expected values

observed <- c(244, 192)
theoretical_prop <- c(0.5, 0.5)

expected_values <- theoretical_prop * sum(observed)
expected_values
## [1] 218 218
# Chi-square test and p-value

chisq_actn3 <- chisq.test(observed)
chisq_actn3
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276
# Results

cat("Chi-square Test for ACTN3 Alleles\n")
## Chi-square Test for ACTN3 Alleles
cat("Test statistic:", round(chisq_actn3$statistic, 3), "\n")
## Test statistic: 6.202
cat("Degrees of freedom:", chisq_actn3$parameter, "\n")
## Degrees of freedom: 1
cat("P-value:", signif(chisq_actn3$p.value, 4), "\n")
## P-value: 0.01276

Problem 1 Conclusion

The chi-square goodness-of-fit test yielded a test statistic of 6.202 with 1 degree of freedom and a p-value of 0.0128, which is below α = 0.05. Therefore, we reject the null hypothesis and conclude that the R and X alleles are not equally likely in the population.
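
As a cross-check, the test statistic can be reproduced by hand from the observed counts, using the sum of (observed - expected)^2 / expected with 1 degree of freedom. This is only an illustrative sketch: the names `obs`, `expd`, `chi_sq_manual`, and `p_manual` are introduced here and are not part of the original analysis.

```r
# Observed allele counts (R, X) from the problem statement
obs  <- c(R = 244, X = 192)
# Expected counts under H0: equal proportions, so half the total in each category
expd <- rep(sum(obs) / 2, 2)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq_manual <- sum((obs - expd)^2 / expd)

# Upper-tail p-value with (number of categories - 1) degrees of freedom
p_manual <- pchisq(chi_sq_manual, df = length(obs) - 1, lower.tail = FALSE)

cat("Manual chi-square:", round(chi_sq_manual, 4), "\n")
cat("Manual p-value:", signif(p_manual, 4), "\n")
```

Both values should agree with the `chisq.test()` output reported above (6.2018 and 0.01276), since no continuity correction is applied in a goodness-of-fit test.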

Problem 2: Vitamin Use and Gender

We want to determine whether vitamin use is associated with gender (Sex) in the NutritionStudy dataset.

Problem 2 Hypotheses and Contingency Table

\(H_0\): Vitamin use and gender are independent (no association).

\(H_A\): Vitamin use and gender are associated (there is a relationship).

# Contingency table of VitaminUse by Sex

tab_vit_sex <- table(nutrition_study$VitaminUse, nutrition_study$Sex)
tab_vit_sex
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13
# Chi-square test and p-value

chisq_vit_sex <- chisq.test(tab_vit_sex)
chisq_vit_sex
## 
##  Pearson's Chi-squared test
## 
## data:  tab_vit_sex
## X-squared = 11.071, df = 2, p-value = 0.003944
# Results

cat("Chi-square Test for Association: Vitamin Use vs Sex\n")
## Chi-square Test for Association: Vitamin Use vs Sex
cat("--------------------------------------------------\n")
## --------------------------------------------------
cat("Test statistic:", round(chisq_vit_sex$statistic, 3), "\n")
## Test statistic: 11.071
cat("Degrees of freedom:", chisq_vit_sex$parameter, "\n")
## Degrees of freedom: 2
cat("P-value:", signif(chisq_vit_sex$p.value, 6), "\n")
## P-value: 0.00394428

Problem 2 Conclusion

The chi-square test gives a test statistic of approximately 11.07 with 2 degrees of freedom and a p-value of about 0.00394. Because this p-value is less than α = 0.05, we reject the null hypothesis and conclude that there is statistically significant evidence of an association between vitamin use and gender.
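
The chi-square approximation is usually considered reliable when every expected cell count is at least 5. A minimal sketch of that check follows. It rebuilds the table from the counts printed above so that it runs without the CSV file; the names `counts`, `expected_counts`, and `chi_sq_assoc` are introduced here for illustration only.

```r
# Rebuild the contingency table from the counts printed above,
# so this check runs without NutritionStudy.csv
counts <- matrix(
  c(87, 24,
    77,  5,
    109, 13),
  nrow = 3, byrow = TRUE,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# Expected counts under independence: (row total * column total) / grand total
expected_counts <- outer(rowSums(counts), colSums(counts)) / sum(counts)
print(round(expected_counts, 2))

# Common rule of thumb: every expected count should be at least 5
print(all(expected_counts >= 5))

# Pearson statistic computed directly from observed and expected counts
chi_sq_assoc <- sum((counts - expected_counts)^2 / expected_counts)
print(round(chi_sq_assoc, 3))
```

With the fitted test object, the same expected counts are also available as `chisq_vit_sex$expected`.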

Problem 3: Calcium Level and Fish Gill Rate

We want to determine whether mean gill rate differs among three calcium levels (Low, Medium, High) in the FishGills3 dataset.

Problem 3 Hypotheses and ANOVA Test

\(H_0\): \(\mu_{Low} = \mu_{Medium} = \mu_{High}\)
(the mean gill rate is the same for all calcium levels)

\(H_A\): At least one mean gill rate differs from the others.

# Make sure Calcium is treated as a factor, then check the structure and group sizes
Fish_Gills$Calcium <- as.factor(Fish_Gills$Calcium)

str(Fish_Gills)
## 'data.frame':    90 obs. of  2 variables:
##  $ Calcium : Factor w/ 3 levels "High","Low","Medium": 2 2 2 2 2 2 2 2 2 2 ...
##  $ GillRate: int  55 63 78 85 65 98 68 84 44 87 ...
table(Fish_Gills$Calcium)
## 
##   High    Low Medium 
##     30     30     30
# ANOVA test

anova_gill <- aov(GillRate ~ Calcium, data = Fish_Gills)
summary(anova_gill)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Results

anova_summary <- summary(anova_gill)

F_val       <- anova_summary[[1]]$`F value`[1]
p_val       <- anova_summary[[1]]$`Pr(>F)`[1]
df_between  <- anova_summary[[1]]$Df[1]
df_within   <- anova_summary[[1]]$Df[2]

cat("One-way ANOVA: GillRate ~ Calcium\n")
## One-way ANOVA: GillRate ~ Calcium
cat("---------------------------------\n")
## ---------------------------------
cat("F-statistic:", round(F_val, 3), "\n")
## F-statistic: 4.648
cat("Degrees of freedom (between, within):", df_between, ",", df_within, "\n")
## Degrees of freedom (between, within): 2 , 87
cat("P-value:", signif(p_val, 6), "\n")
## P-value: 0.0120771

Problem 3 Conclusion

The one-way ANOVA gives an F-statistic of approximately 4.65 with degrees of freedom (2, 87) and a p-value of about 0.0121. Because this p-value is less than α = 0.05, we reject the null hypothesis and conclude that there is statistically significant evidence that at least one calcium level has a mean gill rate that differs from the others.
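
The F-statistic is the ratio of the between-group mean square to the within-group mean square. The sketch below reconstructs it from the sums of squares in the printed ANOVA table. Those printed values are rounded, so the result should agree with the `aov()` output only approximately. The variable names are introduced here for illustration and are not part of the original analysis.

```r
# Values copied from the printed ANOVA table (rounded by summary())
ss_between <- 2037
ss_within  <- 19064
df_b <- 2    # number of groups - 1
df_w <- 87   # total observations - number of groups

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_b
ms_within  <- ss_within / df_w

# F is the ratio of between-group to within-group mean square
f_manual <- ms_between / ms_within

# Upper-tail p-value from the F distribution
p_f_manual <- pf(f_manual, df_b, df_w, lower.tail = FALSE)

cat("Manual F:", round(f_manual, 2), "\n")
cat("Manual p-value:", round(p_f_manual, 3), "\n")
```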