CHI SQUARED AND ANOVA TEST ASSIGNMENT

In your markdown, answer the following problems. Include the following:

Problem 1: ACTN3 is a gene that encodes alpha-actinin-3, a protein in fast-twitch muscle fibers, important for activities like sprinting and weightlifting. The gene has two main alleles: R (functional) and X (non-functional). The R allele is linked to better performance in strength, speed, and power sports, while the X allele is associated with endurance due to a greater reliance on slow-twitch fibers. However, athletic performance is influenced by various factors, including training, environment, and other genes, making the ACTN3 genotype just one contributing factor.

A study examines the ACTN3 genetic alleles R and X, also associated with fast-twitch muscles. Of the 436 people in this sample, 244 were classified as R, and 192 were classified as X. Does the sample provide evidence that the two options are not equally likely? Conduct the test using a chi-square goodness-of-fit test.

# Observed data
observed <- c(244, 192)

# Hypothesized proportions under the null (R and X equally likely)
theoritical_prop <- rep(1/2, 2)

Hypotheses:

\(H_0\): \(p_1 = p_2 = 1/2\)

\(H_a\): at least one \(p_i \neq 1/2\)

# Check the expected counts: the chi-square test requires every expected count to be at least 5

# Expected counts under the null hypothesis
expected_values <- theoritical_prop*sum(observed)
expected_values
## [1] 218 218
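Under \(H_0\), each expected count is the sample size times the hypothesized proportion, and both comfortably exceed the usual threshold of 5:

\[
E_i = n\,p_i = 436 \times \tfrac{1}{2} = 218 \ge 5, \qquad i = 1, 2
\]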

P-value:

chisq.test(observed, p = theoritical_prop)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276

p-value = 0.01276
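As a check on the `chisq.test()` output, the statistic and p-value can be reproduced by hand. This is a minimal sketch that assumes the counts above; the names `null_prop`, `expected`, `x2`, `dof`, and `p_value` are illustrative and not part of the original analysis.

```r
# Reproduce the chi-square goodness-of-fit test by hand (illustrative names)
observed  <- c(R = 244, X = 192)
null_prop <- c(1/2, 1/2)                 # H0: both alleles equally likely
expected  <- null_prop * sum(observed)   # 218 and 218

# Pearson statistic: sum over categories of (O - E)^2 / E
x2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom = number of categories - 1
dof <- length(observed) - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

c(statistic = x2, df = dof, p_value = p_value)
```

The statistic is \(2 \times 26^2 / 218 \approx 6.20\), matching the `X-squared` value reported above.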

Conclusion:

Since the p-value of 0.01276 is less than the significance level (assuming the usual \(\alpha = 0.05\)), we reject the null hypothesis. The sample provides evidence that the R and X classifications are not equally likely: R was observed more often (244 of 436, about 56%) than X (192 of 436, about 44%).

Problem 2: Who Is More Likely to Take Vitamins: Males or Females? The dataset NutritionStudy contains, among other things, information about vitamin use and the gender of the participants. Is there a significant association between these two variables? Use the variables VitaminUse and Gender (recorded as Sex in the dataset) to conduct a chi-square analysis and give the results. (Test for Association)

library(tidyverse)

setwd("C:/Users/Joanne G/OneDrive/Data101(Fall 2025)/Datasets")

nutrition_df <- read.csv("NutritionStudy.csv")
summary(nutrition_df)
##        ID             Age           Smoke              Quetelet    
##  Min.   :  1.0   Min.   :19.00   Length:315         Min.   :16.33  
##  1st Qu.: 79.5   1st Qu.:39.00   Class :character   1st Qu.:21.80  
##  Median :158.0   Median :48.00   Mode  :character   Median :24.74  
##  Mean   :158.0   Mean   :50.15                      Mean   :26.16  
##  3rd Qu.:236.5   3rd Qu.:62.50                      3rd Qu.:28.85  
##  Max.   :315.0   Max.   :83.00                      Max.   :50.40  
##     Vitamin         Calories           Fat             Fiber      
##  Min.   :1.000   Min.   : 445.2   Min.   : 14.40   Min.   : 3.10  
##  1st Qu.:1.000   1st Qu.:1338.0   1st Qu.: 53.95   1st Qu.: 9.15  
##  Median :2.000   Median :1666.8   Median : 72.90   Median :12.10  
##  Mean   :1.965   Mean   :1796.7   Mean   : 77.03   Mean   :12.79  
##  3rd Qu.:3.000   3rd Qu.:2100.4   3rd Qu.: 95.25   3rd Qu.:15.60  
##  Max.   :3.000   Max.   :6662.2   Max.   :235.90   Max.   :36.80  
##     Alcohol         Cholesterol       BetaDiet     RetinolDiet    
##  Min.   :  0.000   Min.   : 37.7   Min.   : 214   Min.   :  30.0  
##  1st Qu.:  0.000   1st Qu.:155.0   1st Qu.:1116   1st Qu.: 480.0  
##  Median :  0.300   Median :206.3   Median :1802   Median : 707.0  
##  Mean   :  3.279   Mean   :242.5   Mean   :2186   Mean   : 832.7  
##  3rd Qu.:  3.200   3rd Qu.:308.9   3rd Qu.:2836   3rd Qu.:1037.0  
##  Max.   :203.000   Max.   :900.7   Max.   :9642   Max.   :6901.0  
##    BetaPlasma     RetinolPlasma        Sex             VitaminUse       
##  Min.   :   0.0   Min.   : 179.0   Length:315         Length:315        
##  1st Qu.:  90.0   1st Qu.: 466.0   Class :character   Class :character  
##  Median : 140.0   Median : 566.0   Mode  :character   Mode  :character  
##  Mean   : 189.9   Mean   : 602.8                                        
##  3rd Qu.: 230.0   3rd Qu.: 716.0                                        
##  Max.   :1415.0   Max.   :1727.0                                        
##    PriorSmoke   
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :2.000  
##  Mean   :1.638  
##  3rd Qu.:2.000  
##  Max.   :3.000
# Two-way contingency table of VitaminUse (rows) by Sex (columns)
chi_sqr_nutrition_study <- table(nutrition_df$VitaminUse, nutrition_df$Sex)
chi_sqr_nutrition_study
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13
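Before relying on the chi-square approximation, it helps to confirm the expected counts. The sketch below rebuilds the table from the printed counts (the object name `vitamin_sex` is hypothetical) and computes each cell's expected count as row total × column total / grand total; `chisq.test(chi_sqr_nutrition_study)$expected` should return the same values.

```r
# Rebuild the printed contingency table (counts copied from the output above)
vitamin_sex <- as.table(matrix(
  c(87, 77, 109,   # Female column: No, Occasional, Regular
    24,  5,  13),  # Male column:   No, Occasional, Regular
  nrow = 3,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
))

# Expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(vitamin_sex), colSums(vitamin_sex)) / sum(vitamin_sex)
round(expected, 2)

# The chi-square approximation is usually trusted when every expected count is at least 5
all(expected >= 5)
```

The smallest expected count (Male, Occasional) is \(82 \times 42 / 315 \approx 10.9\), so the condition holds.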

Hypotheses:

\(H_0\): There is no association between VitaminUse and Gender

\(H_a\): There is an association between VitaminUse and Gender

P-value:

chisq.test(chi_sqr_nutrition_study)
## 
##  Pearson's Chi-squared test
## 
## data:  chi_sqr_nutrition_study
## X-squared = 11.071, df = 2, p-value = 0.003944

P-value = 0.003944
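For reference, the statistic and degrees of freedom reported by `chisq.test()` follow the standard Pearson formulas, with \(O\) the observed and \(E\) the expected count in each of the \(r \times c = 3 \times 2\) cells:

\[
X^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E} = 11.071, \qquad df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2
\]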

Conclusion:

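The p-value (about 0.0039) is below the significance level (assuming \(\alpha = 0.05\)), so we reject the null hypothesis: there is significant evidence of an association between VitaminUse and Gender. Because the sample contains far more females (273) than males (42), raw counts are misleading; comparing proportions instead, about 68% of females (186 of 273) report occasional or regular vitamin use versus about 43% of males (18 of 42), so females in this sample appear more likely to take vitamins.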
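Because the sample has far more females than males, proportions within each sex are a fairer comparison than raw counts. A minimal sketch, assuming the counts from the table above (`vitamin_sex`, `col_props`, and `any_use` are hypothetical names):

```r
# Counts copied from the VitaminUse-by-Sex table above
vitamin_sex <- as.table(matrix(
  c(87, 77, 109,   # Female: No, Occasional, Regular
    24,  5,  13),  # Male:   No, Occasional, Regular
  nrow = 3,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
))

# Proportions within each sex (each column sums to 1)
col_props <- prop.table(vitamin_sex, margin = 2)
round(col_props, 3)

# Share of each sex reporting any vitamin use (Occasional or Regular)
any_use <- colSums(col_props[c("Occasional", "Regular"), ])
round(any_use, 3)
```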

Problem 3: Most fish use gills for respiration in water, and researchers can observe how fast a fish’s gill cover beats to study ventilation, much like we might observe a person’s breathing rate. Professor Brad Baldwin is interested in how water chemistry might affect gill beat rates. In one experiment, he randomly assigned fish to tanks with different calcium levels. One tank was low in calcium (0.71 mg/L), the second tank had a medium amount (5.24 mg/L), and the third tank had water with a high calcium level (18.24 mg/L). His research team counted gill rates (beats per minute) for samples of 30 fish in each tank. The results are stored in FishGills3. Perform an ANOVA test to see if the mean gill rate differs depending on the calcium level of the water.

library(tidyverse)

setwd("C:/Users/Joanne G/OneDrive/Data101(Fall 2025)/Datasets")

fish_gill_df <- read.csv("FishGills3.csv")
summary(fish_gill_df)
##    Calcium             GillRate    
##  Length:90          Min.   :33.00  
##  Class :character   1st Qu.:48.00  
##  Mode  :character   Median :62.50  
##                     Mean   :61.78  
##                     3rd Qu.:72.00  
##                     Max.   :98.00

Hypotheses:

\(H_0\): \(\mu_L = \mu_M = \mu_H\)

\(H_a\): not all \(\mu_i\) are equal

fish_gill_anova <- aov(GillRate ~ Calcium, data = fish_gill_df)

fish_gill_anova
## Call:
##    aov(formula = GillRate ~ Calcium, data = fish_gill_df)
## 
## Terms:
##                   Calcium Residuals
## Sum of Squares   2037.222 19064.333
## Deg. of Freedom         2        87
## 
## Residual standard error: 14.80305
## Estimated effects may be unbalanced

P-value:

summary(fish_gill_anova)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

P-value = 0.0121
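The F statistic in the ANOVA table is the ratio of the two mean squares, each a sum of squares divided by its degrees of freedom. This sketch recomputes it from the sums of squares printed by `aov()` (so results match only up to that rounding); the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the aov() output above
ss_between <- 2037.222;  df_between <- 2
ss_within  <- 19064.333; df_within  <- 87

ms_between <- ss_between / df_between   # mean square for Calcium
ms_within  <- ss_within / df_within     # mean square error (residuals)

# F = MS(Calcium) / MS(Residuals); p-value is the upper tail of F(2, 87)
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# The residual standard error reported by aov() is sqrt(MSE)
resid_se <- sqrt(ms_within)

c(F = f_stat, p_value = p_value, resid_se = resid_se)
```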

TukeyHSD(fish_gill_anova)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = GillRate ~ Calcium, data = fish_gill_df)
## 
## $Calcium
##                  diff        lwr        upr     p adj
## Low-High    10.333333   1.219540 19.4471264 0.0222533
## Medium-High  0.500000  -8.613793  9.6137931 0.9906108
## Medium-Low  -9.833333 -18.947126 -0.7195402 0.0313247
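One way to read the Tukey output: a pairwise difference is significant at the 95% family-wise level when its interval excludes 0, which here agrees with `p adj < 0.05`. The sketch copies the printed values into a hypothetical data frame `tukey_calcium` and flags each comparison.

```r
# Values copied from the TukeyHSD() output above
tukey_calcium <- data.frame(
  comparison = c("Low-High", "Medium-High", "Medium-Low"),
  diff  = c(10.333333, 0.500000, -9.833333),
  lwr   = c(1.219540, -8.613793, -18.947126),
  upr   = c(19.4471264, 9.6137931, -0.7195402),
  p_adj = c(0.0222533, 0.9906108, 0.0313247)
)

# Significant when the 95% family-wise interval does not contain 0
tukey_calcium$excludes_zero <- tukey_calcium$lwr > 0 | tukey_calcium$upr < 0

# Equivalent check using the adjusted p-values
tukey_calcium$p_below_05 <- tukey_calcium$p_adj < 0.05

tukey_calcium
```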

Conclusion:

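Since the p-value of 0.0121 is below the significance level (assuming \(\alpha = 0.05\)), we reject the null hypothesis: there is evidence that the mean gill rate differs depending on the calcium level of the water. Tukey's HSD comparisons indicate that fish in the low-calcium tank had a higher mean gill rate than fish in the high-calcium tank (about 10.3 beats per minute higher, adjusted p = 0.022) and the medium-calcium tank (about 9.8 beats per minute higher, adjusted p = 0.031), while the medium- and high-calcium tanks did not differ significantly (adjusted p = 0.99).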