Problem 1: Chi-Square Goodness-of-Fit Test

ACTN3 has two alleles: R and X. In a sample of 436 people,

244 were classified as R

192 were classified as X

We want to test whether the two alleles are equally likely.

Hypotheses

\(H_0\): \(p_R = p_X = 0.5\)

\(H_a\): \(p_R \neq p_X\) (the two alleles are not equally likely)

# Observed counts

observed_actn3 <- c(R = 244, X = 192)

# Expected proportions under H0

p_null_actn3 <- c(0.5, 0.5)

# Total sample size

n_total <- sum(observed_actn3)

# Expected counts

expected_actn3 <- p_null_actn3 * n_total
observed_actn3
##   R   X 
## 244 192
expected_actn3
## [1] 218 218

Both expected counts (218 each) are greater than 5, so the chi-square goodness-of-fit test is appropriate.

chisq_actn3 <- chisq.test(observed_actn3, p = p_null_actn3)
chisq_actn3
## 
##  Chi-squared test for given probabilities
## 
## data:  observed_actn3
## X-squared = 6.2018, df = 1, p-value = 0.01276
chisq_actn3$p.value
## [1] 0.01276179
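As a cross-check on the `chisq.test()` output above, the statistic can be computed by hand from the Pearson formula \(\chi^2 = \sum (O - E)^2 / E\). This is a minimal sketch using only the counts given in the problem; the variable names (`observed`, `expected`, `chi_sq`, `p_value`) are illustrative and not part of the original analysis.

```r
# Observed allele counts from the sample
observed <- c(R = 244, X = 192)

# Expected counts under H0 (equal proportions): 436 / 2 = 218 each
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson chi-square statistic: sum of (O - E)^2 / E over the categories
chi_sq <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with df = (number of categories - 1) = 1
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

round(chi_sq, 4)
round(p_value, 5)
```

Each category deviates from its expected count by 26, so the statistic is \(2 \times 26^2 / 218\), which agrees with the reported X-squared value.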

Interpretation

Test statistic: \(\chi^2 \approx 6.20\)

P-value: approximately 0.0128

Since the p-value is less than 0.05, we reject the null hypothesis.

Conclusion: There is evidence that the two alleles R and X are not equally likely in this population. The observed distribution of ACTN3 alleles differs significantly from a 50/50 split.

Problem 2: Chi-Square Test of Association

Who Is More Likely to Take Vitamins: Males or Females?

We will use the dataset NutritionStudy and the variables:

VitaminUse (No, Occasional, Regular)

Sex (Male, Female)

Note: The problem text said “Gender,” but in this dataset the column is named Sex.

Hypotheses

\(H_0\): Vitamin use is not associated with sex.

\(H_a\): Vitamin use is associated with sex.

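# Load the tidyverse, which provides read_csv(), glimpse(), and the dplyr verbs used below
library(tidyverse)
# Load the NutritionStudy dataset (make sure NutritionStudy.csv is in your working directory)
#setwd("~/Data 101/HW8")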
NutritionStudy <- read_csv("NutritionStudy.csv")
## Rows: 315 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): Smoke, Sex, VitaminUse
## dbl (14): ID, Age, Quetelet, Vitamin, Calories, Fat, Fiber, Alcohol, Cholest...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(NutritionStudy)
## Rows: 315
## Columns: 17
## $ ID            <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ Age           <dbl> 64, 76, 38, 40, 72, 40, 65, 58, 35, 55, 66, 40, 57, 66, …
## $ Smoke         <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "N…
## $ Quetelet      <dbl> 21.4838, 23.8763, 20.0108, 25.1406, 20.9850, 27.5214, 22…
## $ Vitamin       <dbl> 1, 1, 2, 3, 1, 3, 2, 1, 3, 3, 1, 2, 3, 1, 3, 3, 1, 1, 3,…
## $ Calories      <dbl> 1298.8, 1032.5, 2372.3, 2449.5, 1952.1, 1366.9, 2213.9, …
## $ Fat           <dbl> 57.0, 50.1, 83.6, 97.5, 82.6, 56.0, 52.0, 63.4, 57.8, 39…
## $ Fiber         <dbl> 6.3, 15.8, 19.1, 26.5, 16.2, 9.6, 28.7, 10.9, 20.3, 15.5…
## $ Alcohol       <dbl> 0.0, 0.0, 14.1, 0.5, 0.0, 1.3, 0.0, 0.0, 0.6, 0.0, 1.0, …
## $ Cholesterol   <dbl> 170.3, 75.8, 257.9, 332.6, 170.8, 154.6, 255.1, 214.1, 2…
## $ BetaDiet      <dbl> 1945, 2653, 6321, 1061, 2863, 1729, 5371, 823, 2895, 330…
## $ RetinolDiet   <dbl> 890, 451, 660, 864, 1209, 1439, 802, 2571, 944, 493, 535…
## $ BetaPlasma    <dbl> 200, 124, 328, 153, 92, 148, 258, 64, 218, 81, 184, 91, …
## $ RetinolPlasma <dbl> 915, 727, 721, 615, 799, 654, 834, 825, 517, 562, 935, 7…
## $ Sex           <chr> "Female", "Female", "Female", "Female", "Female", "Femal…
## $ VitaminUse    <chr> "Regular", "Regular", "Occasional", "No", "Regular", "No…
## $ PriorSmoke    <dbl> 2, 1, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1,…

Create the contingency table:

vit_sex_table <- table(NutritionStudy$VitaminUse, NutritionStudy$Sex)
vit_sex_table
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13

Perform the chi-square test of association:

chi_vit_sex <- chisq.test(vit_sex_table)
chi_vit_sex
## 
##  Pearson's Chi-squared test
## 
## data:  vit_sex_table
## X-squared = 11.071, df = 2, p-value = 0.003944
chi_vit_sex$p.value
## [1] 0.003944277

Check the expected counts to confirm that the test's assumptions are met (every expected count should be at least 5):

chi_vit_sex$expected
##             
##                 Female     Male
##   No          96.20000 14.80000
##   Occasional  71.06667 10.93333
##   Regular    105.73333 16.26667
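The expected counts and the test statistic can also be reproduced directly from the observed table, which makes the calculation behind `chisq.test()` explicit. The sketch below re-enters the observed counts by hand, so it does not need the CSV file; the object names are illustrative.

```r
# Observed counts, entered column by column (Female first, then Male)
observed <- matrix(
  c(87, 77, 109,   # Female: No, Occasional, Regular
    24,  5,  13),  # Male:   No, Occasional, Regular
  nrow = 3,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# Expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic and its degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(expected, 2)
round(chi_sq, 3)
```

The smallest expected count is the Occasional/Male cell (about 10.93), so every cell is above 5.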

Interpretation

Contingency table (observed counts, VitaminUse by Sex):

VitaminUse   Female   Male
No               87     24
Occasional       77      5
Regular         109     13

Chi-square statistic: \(\chi^2 \approx 11.07\)

P-value: approximately 0.00394

Since p ≈ 0.0039 < 0.05, we reject the null hypothesis.

Conclusion: There is a significant association between vitamin use and sex. In this sample, females are more likely to take vitamins: about 68% of females (186 of 273) use them at least occasionally, compared with about 43% of males (18 of 42).
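The chi-square test shows that an association exists but not its direction. One way to see the direction is to compare the distribution of vitamin use within each sex. This is a minimal sketch that re-enters the observed counts by hand; `col_props` and `any_use` are illustrative names.

```r
# Observed counts (rows: VitaminUse, columns: Sex)
observed <- matrix(
  c(87, 77, 109,   # Female: No, Occasional, Regular
    24,  5,  13),  # Male:   No, Occasional, Regular
  nrow = 3,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# Column proportions: distribution of vitamin use within each sex
col_props <- prop.table(observed, margin = 2)

# Proportion of each sex taking vitamins at least occasionally
any_use <- 1 - col_props["No", ]

round(col_props, 3)
round(any_use, 3)
```

Because the male group is small (42 of 315 participants), the male proportions are estimated less precisely than the female ones.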

Problem 3: ANOVA — Fish Gill Beat Rates and Calcium Level

Professor Baldwin studied how water calcium level might affect fish gill beat rate (beats per minute).

Factor: Calcium level in water

Low (0.71 mg/L)

Medium (5.24 mg/L)

High (18.24 mg/L)

Response: GillRate (beats per minute)

Each group: 30 fish per calcium level (90 fish in total)

Dataset: FishGills3

Hypotheses

\(H_0\): \(\mu_{Low} = \mu_{Medium} = \mu_{High}\)

\(H_a\): Not all mean gill rates are equal

# Load the FishGills3 dataset

FishGills3 <- read_csv("FishGills3.csv")
## Rows: 90 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Calcium
## dbl (1): GillRate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
FishGills3$Calcium <- as.factor(FishGills3$Calcium)

head(FishGills3)
## # A tibble: 6 × 2
##   Calcium GillRate
##   <fct>      <dbl>
## 1 Low           55
## 2 Low           63
## 3 Low           78
## 4 Low           85
## 5 Low           65
## 6 Low           98
summary(FishGills3)
##    Calcium      GillRate    
##  High  :30   Min.   :33.00  
##  Low   :30   1st Qu.:48.00  
##  Medium:30   Median :62.50  
##              Mean   :61.78  
##              3rd Qu.:72.00  
##              Max.   :98.00

Compute group means:

FishGills3 %>%
  group_by(Calcium) %>%
  summarise(
    mean_gill = mean(GillRate),
    sd_gill   = sd(GillRate),
    n         = n()
  )
## # A tibble: 3 × 4
##   Calcium mean_gill sd_gill     n
##   <fct>       <dbl>   <dbl> <int>
## 1 High         58.2    13.8    30
## 2 Low          68.5    16.2    30
## 3 Medium       58.7    14.3    30

Run the ANOVA:

anova_model <- aov(GillRate ~ Calcium, data = FishGills3)
summary(anova_model)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

F-statistic: F ≈ 4.65

P-value: approximately 0.0121

Since p ≈ 0.012 < 0.05, we reject the null hypothesis.

Conclusion: There is evidence that mean gill beat rate differs among at least some of the calcium levels.
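The F-statistic and p-value reported above follow from the sums of squares in the ANOVA table. The sketch below recomputes them from the rounded table values (2037 and 19064), so the result may differ from R's output in the last digit; the variable names are illustrative.

```r
# Sums of squares as printed (rounded) in the ANOVA table
ss_between <- 2037    # Calcium
ss_within  <- 19064   # Residuals

k <- 3    # number of calcium groups
n <- 90   # total number of fish

# Mean squares = sum of squares / degrees of freedom
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)

# F-statistic and its upper-tail p-value on (k - 1, n - k) degrees of freedom
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

round(f_stat, 2)
round(p_value, 3)
```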

Post-hoc Comparison: Tukey’s HSD

Since the ANOVA is significant, we use Tukey’s HSD to see which pairs of calcium levels differ.

TukeyHSD(anova_model)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = GillRate ~ Calcium, data = FishGills3)
## 
## $Calcium
##                  diff        lwr        upr     p adj
## Low-High    10.333333   1.219540 19.4471264 0.0222533
## Medium-High  0.500000  -8.613793  9.6137931 0.9906108
## Medium-Low  -9.833333 -18.947126 -0.7195402 0.0313247
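To see where the Tukey intervals come from, the half-width can be rebuilt from the residual mean square and the studentized range distribution. With equal group sizes it is \(q_{0.95;\,3,\,87}\,\sqrt{MS_{within}/n}\). This is a sketch that uses the rounded residual sum of squares (19064) and the differences printed above; the names are illustrative.

```r
# Residual mean square from the ANOVA table (rounded sum of squares / df)
ms_within <- 19064 / 87
n_per_group <- 30

# Critical value of the studentized range for 3 means and 87 residual df
q_crit <- qtukey(0.95, nmeans = 3, df = 87)

# Half-width of each 95% family-wise interval (equal group sizes)
half_width <- q_crit * sqrt(ms_within / n_per_group)

# Pairwise differences in means, copied from the TukeyHSD() output
diffs <- c("Low-High" = 10.333333, "Medium-High" = 0.5, "Medium-Low" = -9.833333)

intervals <- cbind(diff = diffs,
                   lwr  = diffs - half_width,
                   upr  = diffs + half_width)

# A comparison is significant at the 0.05 level when its interval excludes 0
significant <- intervals[, "lwr"] > 0 | intervals[, "upr"] < 0

round(intervals, 2)
significant
```

The intervals for Low-High and Medium-Low exclude zero, while the Medium-High interval contains it.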

Interpretation of Tukey’s HSD:

Low vs High is significant (difference ≈ 10.33 beats per minute, adjusted p ≈ 0.022).

Low vs Medium is significant (difference ≈ 9.83 beats per minute, adjusted p ≈ 0.031).

Medium vs High is not statistically significant at the 0.05 level (difference = 0.50 beats per minute, adjusted p ≈ 0.991).

Overall Conclusion

Calcium level in the water has a significant effect on mean gill beat rate. Based on the Tukey post-hoc results, fish in low-calcium water have a significantly higher mean gill rate than fish in either medium- or high-calcium water, while the medium and high groups do not differ significantly from each other.