setwd("~/Downloads/Data 101 Course materials/Data Sets")
df3 <- read.csv("FishGills3.csv")
df2 <- read.csv("NutritionStudy.csv")

Problem 1: A study examines the ACTN3 genetic alleles R and X, also associated with fast-twitch muscles. Of the 436 people in this sample, 244 were classified as R, and 192 were classified as X. Does the sample provide evidence that the two options are not equally likely? Conduct the test using a chi-square goodness-of-fit test.

# chisq.test() needs the observed counts, not the sample proportions
observed <- c(244, 192)
# Under the null hypothesis both classifications have probability 0.5
theoretical_prop <- c(0.5, 0.5)

Hypothesis

\(H_0\): \(p_R\) = \(p_X\) = 0.5

\(H_a\): at least one \(p_i\) \(\neq\) 0.5

expected_values <- theoretical_prop*sum(observed)
expected_values
## [1] 218 218
chisq.test(observed, p = theoretical_prop)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276

Because the p-value (0.01276) is below 0.05, we reject the null hypothesis: the sample provides evidence that the R and X classifications are not equally likely.
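As a cross-check, the goodness-of-fit statistic can be computed by hand from the raw counts given in the problem (244 and 192). Note that the statistic must be built from counts; feeding proportions into the formula (or into chisq.test) shrinks it by a factor of the sample size. The variable names below are illustrative and not part of the original analysis.

```r
# Observed counts from the problem statement
obs_counts <- c(R = 244, X = 192)
n <- sum(obs_counts)

# Expected counts if R and X are equally likely
exp_counts <- n * c(0.5, 0.5)

# Pearson chi-square statistic: sum of (O - E)^2 / E
chi_sq <- sum((obs_counts - exp_counts)^2 / exp_counts)

# Degrees of freedom = number of categories - 1
deg_free <- length(obs_counts) - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

chi_sq
p_value
```

The hand-computed values should agree with what chisq.test() reports when it is given the counts.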

Problem 2: Who Is More Likely to Take Vitamins: Males or Females? The dataset NutritionStudy contains, among other things, information about vitamin use and the gender of the participants. Is there a significant association between these two variables? Use the variables VitaminUse and Gender to conduct a chi-square analysis and give the results. (Test for Association)

observed_dataset <- table(df2$VitaminUse, df2$Sex)
observed_dataset
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13

\(H_0\): VitaminUse is not associated with Sex

\(H_a\): VitaminUse is associated with Sex

chisq.test(observed_dataset)
## 
##  Pearson's Chi-squared test
## 
## data:  observed_dataset
## X-squared = 11.071, df = 2, p-value = 0.003944

Because the p-value (0.003944) is below 0.05, we reject the null hypothesis: there is evidence that vitamin use is associated with sex.
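The chi-square approximation is usually considered reasonable when every expected cell count is at least about 5. A minimal sketch of that check follows, rebuilding the contingency table from the counts printed above so it runs without the CSV file; the object names are illustrative.

```r
# Contingency table re-entered from the printed output above
vitamin_by_sex <- matrix(
  c( 87, 24,
     77,  5,
    109, 13),
  nrow = 3, byrow = TRUE,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# Expected counts under independence: (row total * column total) / grand total
expected_counts <- outer(rowSums(vitamin_by_sex),
                         colSums(vitamin_by_sex)) / sum(vitamin_by_sex)
expected_counts

# Smallest expected count, to compare against the rule of thumb of 5
min(expected_counts)

# Pearson statistic computed by hand, for comparison with chisq.test()
chi_sq <- sum((vitamin_by_sex - expected_counts)^2 / expected_counts)
chi_sq
```

The same expected counts are also available directly as chisq.test(observed_dataset)$expected.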

Problem 3: Most fish use gills for respiration in water, and researchers can observe how fast a fish’s gill cover beats to study ventilation, much like we might observe a person’s breathing rate. Professor Brad Baldwin is interested in how water chemistry might affect gill beat rates. In one experiment, he randomly assigned fish to tanks with different calcium levels. One tank was low in calcium (0.71 mg/L), the second tank had a medium amount (5.24 mg/L), and the third tank had water with a high calcium level (18.24 mg/L). His research team counted gill rates (beats per minute) for samples of 30 fish in each tank. The results are stored in FishGills3. Perform an ANOVA test to see if the mean gill rate differs depending on the calcium level of the water.

Hypothesis

\(H_0\): \(\mu_{Low}\) = \(\mu_{Medium}\) = \(\mu_{High}\)

\(H_a\): at least one \(\mu_i\) differs from the others

data_calcium <- data.frame(
  Calcium = c(rep("Low", 30), rep("Medium", 30), rep("High", 30)),
  GillRate = c(
    # Low calcium tank
    55, 63, 78, 85, 65, 98, 68, 84, 44, 87, 48, 86, 93, 64, 83,
    79, 85, 65, 88, 47, 68, 86, 57, 53, 58, 47, 62, 64, 50, 45,
    # Medium calcium tank
    38, 42, 63, 46, 55, 63, 36, 58, 73, 69, 55, 68, 63, 73, 45,
    79, 41, 83, 60, 48, 59, 33, 67, 43, 57, 72, 46, 74, 68, 83,
    # High calcium tank
    59, 45, 63, 52, 59, 78, 72, 53, 69, 68, 57, 63, 68, 83, 38,
    85, 68, 63, 58, 48, 42, 42, 80, 42, 52, 37, 57, 62, 40, 42
  )
)

head(data_calcium)
##   Calcium GillRate
## 1     Low       55
## 2     Low       63
## 3     Low       78
## 4     Low       85
## 5     Low       65
## 6     Low       98
anova_result <- aov(GillRate ~ Calcium, data = data_calcium)

anova_result
## Call:
##    aov(formula = GillRate ~ Calcium, data = data_calcium)
## 
## Terms:
##                   Calcium Residuals
## Sum of Squares   2037.222 19064.333
## Deg. of Freedom         2        87
## 
## Residual standard error: 14.80305
## Estimated effects may be unbalanced
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because the p-value (0.0121) is below 0.05, we reject the null hypothesis: there is evidence that the mean gill rate differs depending on the calcium level of the water.
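To show where the F value in the summary table comes from, it can be rebuilt from the sums of squares and degrees of freedom printed in the aov() output above. This is only a sketch that re-enters those printed numbers by hand; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the aov() output
ss_between <- 2037.222   # Calcium
df_between <- 2
ss_within  <- 19064.333  # Residuals
df_within  <- 87

# Mean squares = sum of squares / degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic = between-group mean square / within-group mean square
f_stat <- ms_between / ms_within

# Upper-tail probability of the F distribution with (2, 87) degrees of freedom
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

f_stat
p_value

# The residual standard error is the square root of the within-group mean square
sqrt(ms_within)
```

These values should match the F value, Pr(>F), and residual standard error reported by aov() and summary().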