#Problem 1

ACTN3 is a gene that encodes alpha-actinin-3, a protein in fast-twitch muscle fibers that is important for activities like sprinting and weightlifting. The gene has two main alleles: R (functional) and X (non-functional). The R allele is linked to better performance in strength, speed, and power sports, while the X allele is associated with endurance due to a greater reliance on slow-twitch fibers. However, athletic performance is influenced by various factors, including training, environment, and other genes, making the ACTN3 genotype just one contributing factor.

A study examines the ACTN3 genetic alleles R and X, also associated with fast-twitch muscles. Of the 436 people in this sample, 244 were classified as R and 192 were classified as X. Does the sample provide evidence that the two options are not equally likely? Conduct the test using a chi-square goodness-of-fit test.

#Hypotheses

H₀: R and X alleles are equally likely (expected 50%–50%).

H₁: R and X alleles are not equally likely.

#Calculations

# Observed values
O <- c(244, 192)

# Expected values (equal likelihood)
E <- rep(sum(O) / length(O), length(O))  # 436 / 2 = 218 for each allele

# Chi-square test statistic
chi_square <- sum((O - E)^2 / E)
chi_square
## [1] 6.201835
# Degrees of freedom
df <- length(O) - 1

# Calculate p-value manually
p_value <- pchisq(chi_square, df = df, lower.tail = FALSE)
p_value
## [1] 0.01276179
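
As a cross-check on the manual calculation, base R's `chisq.test()` can run the same goodness-of-fit test directly. This is a minimal sketch, assuming the same observed counts; the names `observed` and `gof` are illustrative and not part of the original analysis.

```r
# Observed allele counts from the sample (R, X)
observed <- c(R = 244, X = 192)

# Goodness-of-fit test against equal probabilities (H0: 50%-50%)
gof <- chisq.test(observed, p = c(0.5, 0.5))

gof$expected   # expected counts under H0 (218 for each allele)
gof$statistic  # should agree with the manual chi_square value
gof$p.value    # should agree with the manual p_value
```

For a one-way table like this one, `chisq.test()` applies no continuity correction, so its statistic should agree with the hand-computed value.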

#Conclusion

Based on the chi-square goodness-of-fit test, χ²(1) = 6.20, the p-value is approximately 0.013, which is less than the significance level of 0.05. Therefore, we reject the null hypothesis. There is significant evidence that the R and X alleles are not equally likely in the population.

#Problem 2

Who Is More Likely to Take Vitamins: Males or Females? The dataset NutritionStudy contains, among other things, information about vitamin use and the gender of the participants. Is there a significant association between these two variables? Use the variables VitaminUse and Gender (named Sex in the copy of the dataset loaded here) to conduct a chi-square analysis and give the results. (Test for Association)

#Hypotheses

Null hypothesis (H₀): There is no association between gender and vitamin use (they are independent).

Alternative hypothesis (H₁): There is an association between gender and vitamin use (they are not independent).

#Import dataset

NutritionStudy <- read.csv("NutritionStudy.csv", stringsAsFactors = TRUE)

# Check the first few rows of the dataset
head(NutritionStudy)
##   ID Age Smoke Quetelet Vitamin Calories  Fat Fiber Alcohol Cholesterol
## 1  1  64    No  21.4838       1   1298.8 57.0   6.3     0.0       170.3
## 2  2  76    No  23.8763       1   1032.5 50.1  15.8     0.0        75.8
## 3  3  38    No  20.0108       2   2372.3 83.6  19.1    14.1       257.9
## 4  4  40    No  25.1406       3   2449.5 97.5  26.5     0.5       332.6
## 5  5  72    No  20.9850       1   1952.1 82.6  16.2     0.0       170.8
## 6  6  40    No  27.5214       3   1366.9 56.0   9.6     1.3       154.6
##   BetaDiet RetinolDiet BetaPlasma RetinolPlasma    Sex VitaminUse PriorSmoke
## 1     1945         890        200           915 Female    Regular          2
## 2     2653         451        124           727 Female    Regular          1
## 3     6321         660        328           721 Female Occasional          2
## 4     1061         864        153           615 Female         No          2
## 5     2863        1209         92           799 Female    Regular          1
## 6     1729        1439        148           654 Female         No          2
# Check the structure of the dataset to confirm variables
str(NutritionStudy)
## 'data.frame':    315 obs. of  17 variables:
##  $ ID           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Age          : int  64 76 38 40 72 40 65 58 35 55 ...
##  $ Smoke        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Quetelet     : num  21.5 23.9 20 25.1 21 ...
##  $ Vitamin      : int  1 1 2 3 1 3 2 1 3 3 ...
##  $ Calories     : num  1299 1032 2372 2450 1952 ...
##  $ Fat          : num  57 50.1 83.6 97.5 82.6 56 52 63.4 57.8 39.6 ...
##  $ Fiber        : num  6.3 15.8 19.1 26.5 16.2 9.6 28.7 10.9 20.3 15.5 ...
##  $ Alcohol      : num  0 0 14.1 0.5 0 1.3 0 0 0.6 0 ...
##  $ Cholesterol  : num  170.3 75.8 257.9 332.6 170.8 ...
##  $ BetaDiet     : int  1945 2653 6321 1061 2863 1729 5371 823 2895 3307 ...
##  $ RetinolDiet  : int  890 451 660 864 1209 1439 802 2571 944 493 ...
##  $ BetaPlasma   : int  200 124 328 153 92 148 258 64 218 81 ...
##  $ RetinolPlasma: int  915 727 721 615 799 654 834 825 517 562 ...
##  $ Sex          : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
##  $ VitaminUse   : Factor w/ 3 levels "No","Occasional",..: 3 3 2 1 3 1 2 3 1 1 ...
##  $ PriorSmoke   : int  2 1 2 2 1 2 1 1 1 2 ...

#Calculations

# Create contingency table (rows: VitaminUse, columns: Sex)

vitamin_gender_table <- table(NutritionStudy$VitaminUse, NutritionStudy$Sex)
# Display it
vitamin_gender_table
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13
chisq_test <- chisq.test(vitamin_gender_table)

# Show expected frequencies
chisq_test$expected
##             
##                 Female     Male
##   No          96.20000 14.80000
##   Occasional  71.06667 10.93333
##   Regular    105.73333 16.26667
# Chi-square components: (O − E)² / E
(O_minus_E_sq_over_E <- (vitamin_gender_table - chisq_test$expected)^2 / chisq_test$expected)
##             
##                 Female      Male
##   No         0.8798337 5.7189189
##   Occasional 0.4953721 3.2199187
##   Regular    0.1009248 0.6560109
# Chi-square statistic, df, and p-value
chisq_test$statistic   # Chi-square value
## X-squared 
##  11.07098
chisq_test$parameter   # Degrees of freedom
## df 
##  2
chisq_test$p.value     # p-value
## [1] 0.003944277
# Full chi-square test output
chisq_test
## 
##  Pearson's Chi-squared test
## 
## data:  vitamin_gender_table
## X-squared = 11.071, df = 2, p-value = 0.003944
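
The expected frequencies reported by `chisq.test()` follow from the row and column totals under independence: E = (row total × column total) / n. The sketch below rebuilds them from the counts printed in the contingency table, so it runs without the CSV file. The object names `counts`, `expected`, `stat`, and `df_assoc` are illustrative and not part of the original analysis.

```r
# Observed counts typed in from the contingency table (filled column by column)
counts <- matrix(
  c(87, 77, 109,   # Female: No, Occasional, Regular
    24,  5,  13),  # Male:   No, Occasional, Regular
  ncol = 2,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# Expected count for each cell: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
expected

# Pearson statistic: sum of (O - E)^2 / E over all cells
stat <- sum((counts - expected)^2 / expected)
stat

# p-value with (rows - 1) * (columns - 1) degrees of freedom
df_assoc <- (nrow(counts) - 1) * (ncol(counts) - 1)
pchisq(stat, df = df_assoc, lower.tail = FALSE)

# Common rule of thumb: every expected count should be at least 5
all(expected >= 5)
```

The smallest expected count is about 10.9 (Occasional, Male), so the usual large-sample condition for the chi-square approximation appears to be met.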

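#Interpretation

There is a statistically significant association between gender and vitamin use, χ²(2) = 11.07, p = 0.0039, so we reject the null hypothesis of independence.

Females are more likely than males to report taking vitamins: 186 of 273 females (68%) report occasional or regular use, compared with 18 of 42 males (43%).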
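
The direction of the association can be read from the column proportions of the contingency table. This is a minimal sketch using base R's `prop.table()`, with the counts typed in from the table output rather than read from the file; `counts`, `col_props`, and `any_use` are illustrative names.

```r
# Observed counts typed in from the contingency table (filled column by column)
counts <- matrix(
  c(87, 77, 109, 24, 5, 13),
  ncol = 2,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# Share of each vitamin-use category within each sex (each column sums to 1)
col_props <- prop.table(counts, margin = 2)
round(col_props, 3)

# Proportion reporting any vitamin use (Occasional or Regular), by sex
any_use <- 1 - col_props["No", ]
round(any_use, 3)
```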

#Problem 3

Most fish use gills for respiration in water, and researchers can observe how fast a fish’s gill cover beats to study ventilation, much like we might observe a person’s breathing rate. Professor Brad Baldwin is interested in how water chemistry might affect gill beat rates. In one experiment, he randomly assigned fish to tanks with different calcium levels. One tank was low in calcium (0.71 mg/L), the second tank had a medium amount (5.24 mg/L), and the third tank had water with a high calcium level (18.24 mg/L). His research team counted gill rates (beats per minute) for samples of 30 fish in each tank. The results are stored in FishGills3. Perform an ANOVA test to see whether the mean gill rate differs depending on the calcium level of the water.

#Hypotheses

Null hypothesis (H₀): The mean gill rate is the same for all calcium levels.

μ_Low = μ_Medium = μ_High

Alternative hypothesis (H₁): At least one mean gill rate is different.

#Import Dataset

# Load the dataset
FishGills3 <- read.csv("FishGills3.csv", stringsAsFactors = TRUE)
# Look at the first few rows
head(FishGills3)
##   Calcium GillRate
## 1     Low       55
## 2     Low       63
## 3     Low       78
## 4     Low       85
## 5     Low       65
## 6     Low       98
# Run a simple one-way ANOVA
anova_result <- aov(GillRate ~ Calcium, data = FishGills3)

# Show the ANOVA table
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
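
The F statistic in the ANOVA table is the ratio of the between-group mean square to the residual mean square. The sketch below recomputes it from the rounded sums of squares printed in the table, so the result may differ slightly in the last digit. The variable names are illustrative.

```r
# Values copied from the ANOVA table (sums of squares are rounded)
ss_between <- 2037;  df_between <- 2    # Calcium
ss_within  <- 19064; df_within  <- 87   # Residuals

# Mean squares: sum of squares divided by degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic and its upper-tail p-value
f_stat <- ms_between / ms_within
f_stat

p_anova <- pf(f_stat, df1 = df_between, df2 = df_within, lower.tail = FALSE)
p_anova
```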

#Interpretation

Because the p-value (0.0121) is less than 0.05, we reject the null hypothesis, F(2, 87) = 4.648. There is significant evidence that the mean gill rate differs among the three calcium levels: at least one calcium group (Low, Medium, High) has a different mean gill beat rate.
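
A significant F test does not say which calcium levels differ. One common follow-up, not part of the original analysis, is Tukey's HSD via base R's `TukeyHSD()`. The sketch below uses simulated placeholder data so that it runs on its own: the group means and standard deviation are made up and are not the FishGills3 measurements, and the names `sim`, `sim_fit`, and `tukey` are illustrative. With the real data, the fitted `aov` object from the analysis would be passed to `TukeyHSD()` instead.

```r
set.seed(42)

# Simulated placeholder data: 30 fish per calcium level (values are hypothetical)
sim <- data.frame(
  Calcium  = factor(rep(c("Low", "Medium", "High"), each = 30),
                    levels = c("Low", "Medium", "High")),
  GillRate = rnorm(90, mean = rep(c(70, 60, 60), each = 30), sd = 15)
)

# One-way ANOVA on the simulated data, then all pairwise comparisons
sim_fit <- aov(GillRate ~ Calcium, data = sim)
tukey   <- TukeyHSD(sim_fit)

# One row per pair of levels: difference, confidence bounds, adjusted p-value
tukey$Calcium
```

Each row of the result compares two calcium levels; a pair whose adjusted p-value falls below 0.05 would be flagged as differing at the 5% family-wise level.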