setwd("~/Documents/Data 101")
nutrition_study <- read.csv("NutritionStudy.csv")
fish_gills_3 <- read.csv("FishGills3.csv")

In your markdown, answer the following problems. For each one, include your hypotheses, the p-value, and your conclusion.

Problem 1:

ACTN3 is a gene that encodes alpha-actinin-3, a protein in fast-twitch muscle fibers, important for activities like sprinting and weightlifting. The gene has two main alleles: R (functional) and X (non-functional). The R allele is linked to better performance in strength, speed, and power sports, while the X allele is associated with endurance due to a greater reliance on slow-twitch fibers. However, athletic performance is influenced by various factors, including training, environment, and other genes, making the ACTN3 genotype just one contributing factor. A study examines the ACTN3 genetic alleles R and X, also associated with fast-twitch muscles. Of the 436 people in this sample, 244 were classified as R, and 192 were classified as X. Does the sample provide evidence that the two options are not equally likely? Conduct the test using a chi-square goodness-of-fit test.

\(H_0\): \(P_1 = P_2 = 1/2\)

\(H_a\): \(P_1 \neq 1/2\) (equivalently, \(P_2 \neq 1/2\)); the two alleles are not equally likely.

\(P_1\): The population proportion of people classified as having the R allele.

\(P_2\): The population proportion of people classified as having the X allele.

## Create a vector of observed counts for the chi-squared test.

observed <- c(244, 192)

theoretical_prop <- rep(1/2, 2)

## Check the expected counts to confirm the basic assumption of the chi-squared test (every expected count is at least 5).

expected_values <- theoretical_prop * sum(observed)
expected_values
## [1] 218 218

Both of our expected counts are above 5, which means we are able to do our chi-squared test.

chisq.test(observed)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276

Our p-value is 0.01276, which is less than our alpha of 0.05, so our results are statistically significant and we reject the null hypothesis. In the context of our problem, the sample provides evidence that the R and X alleles are not equally likely.
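The statistic and p-value reported by `chisq.test()` can be reproduced by hand, which makes the calculation behind the test explicit. The following is a minimal sketch using only the counts from the problem statement; the variable names (`gof_expected`, `gof_stat`, `gof_df`, `gof_p_value`) are illustrative and not part of the original analysis.

```r
# Observed counts from the problem statement (R allele, X allele).
gof_observed <- c(R = 244, X = 192)

# Expected counts under H0: each allele has probability 1/2.
gof_expected <- rep(1/2, 2) * sum(gof_observed)

# Pearson chi-squared statistic: sum of (O - E)^2 / E over the categories.
gof_stat <- sum((gof_observed - gof_expected)^2 / gof_expected)

# Degrees of freedom: number of categories minus 1.
gof_df <- length(gof_observed) - 1

# Upper-tail p-value from the chi-squared distribution.
gof_p_value <- pchisq(gof_stat, df = gof_df, lower.tail = FALSE)

round(c(statistic = gof_stat, df = gof_df, p_value = gof_p_value), 5)
```

The rounded values should agree with the `X-squared`, `df`, and `p-value` entries in the `chisq.test()` output above.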

Problem 2:

Who Is More Likely to Take Vitamins: Males or Females? The dataset NutritionStudy contains, among other things, information about vitamin use and the gender of the participants. Is there a significant association between these two variables? Use the variables VitaminUse and Gender to conduct a chi-square analysis and give the results. (Test for Association)

\(H_0\): One’s gender is not associated with their vitamin use.

\(H_a\): One’s gender is associated with their vitamin use.

Creating a table to check counts for our chi-squared test.

observed_nutrition_study <- table(nutrition_study$Sex, nutrition_study$VitaminUse)
observed_nutrition_study
##         
##           No Occasional Regular
##   Female  87         77     109
##   Male    24          5      13
chisq.test(observed_nutrition_study)
## 
##  Pearson's Chi-squared test
## 
## data:  observed_nutrition_study
## X-squared = 11.071, df = 2, p-value = 0.003944

Our p-value is 0.003944, which is less than our alpha of 0.05. This means that our results are statistically significant and we reject the null hypothesis. In the context of our problem, this means there is evidence of an association between one’s gender and their vitamin use. Note that a small p-value indicates that an association exists, not how strong it is.
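As in Problem 1, the test of association assumes every expected cell count is at least 5. The sketch below checks that condition and recomputes the statistic directly from the table printed above. The counts are typed in by hand so the block runs without the CSV file, and the names `observed_counts`, `expected_counts`, `assoc_stat`, `assoc_df`, and `assoc_p_value` are illustrative.

```r
# Observed counts copied from the table above (rows: sex, columns: vitamin use).
observed_counts <- matrix(
  c(87, 77, 109,
    24,  5,  13),
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("Female", "Male"),
                  VitaminUse = c("No", "Occasional", "Regular"))
)

# Expected count for each cell under independence:
# (row total * column total) / grand total.
expected_counts <- outer(rowSums(observed_counts),
                         colSums(observed_counts)) / sum(observed_counts)
round(expected_counts, 2)

# Condition check: every expected count should be at least 5.
all(expected_counts >= 5)

# Pearson chi-squared statistic and its degrees of freedom: (rows - 1) * (cols - 1).
assoc_stat <- sum((observed_counts - expected_counts)^2 / expected_counts)
assoc_df <- (nrow(observed_counts) - 1) * (ncol(observed_counts) - 1)

# Upper-tail p-value from the chi-squared distribution.
assoc_p_value <- pchisq(assoc_stat, df = assoc_df, lower.tail = FALSE)
```

With the data loaded, the same expected counts should also be available as `chisq.test(observed_nutrition_study)$expected`.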

Problem 3:

Most fish use gills for respiration in water, and researchers can observe how fast a fish’s gill cover beats to study ventilation, much like we might observe a person’s breathing rate. Professor Brad Baldwin is interested in how water chemistry might affect gill beat rates. In one experiment, he randomly assigned fish to tanks with different calcium levels. One tank was low in calcium (0.71 mg/L), the second tank had a medium amount (5.24 mg/L), and the third tank had water with a high calcium level (18.24 mg/L). His research team counted gill rates (beats per minute) for samples of 30 fish in each tank. The results are stored in FishGills3. Perform an ANOVA test to see if the mean gill rate differs depending on the calcium level of the water.

\(H_0\): \(\mu_1 = \mu_2 = \mu_3\)

\(H_a\): At least one \(\mu_i\) differs from the others.

\(\mu_1\): The mean gill rate (beats per minute) for fish in the low calcium tank.

\(\mu_2\): The mean gill rate (beats per minute) for fish in the medium calcium tank.

\(\mu_3\): The mean gill rate (beats per minute) for fish in the high calcium tank.

anova_fish_gills_3 <- aov(GillRate ~ Calcium, data = fish_gills_3)

summary(anova_fish_gills_3)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Our p-value is 0.0121, which is less than our alpha of 0.05. This means that our results are statistically significant and we reject the null hypothesis. In the context of our problem, this means that at least one of the mean gill rates (low, medium, or high calcium) differs from the others. The ANOVA test alone does not tell us which one.
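The F value and p-value in the ANOVA table follow directly from the mean squares and degrees of freedom in the same table. The sketch below recomputes them from the rounded values printed above, so the results can differ slightly in the last digit; the variable names are illustrative.

```r
# Values copied from the ANOVA summary table above (rounded as printed).
ms_calcium  <- 1018.6  # mean square between calcium groups
ms_residual <- 219.1   # mean square within groups (residuals)
df_calcium  <- 2       # number of groups - 1
df_residual <- 87      # total observations - number of groups

# F statistic: ratio of between-group to within-group variability.
f_stat <- ms_calcium / ms_residual

# Upper-tail p-value from the F distribution.
anova_p_value <- pf(f_stat, df_calcium, df_residual, lower.tail = FALSE)

round(c(F = f_stat, p_value = anova_p_value), 4)
```

To see which tanks differ, one common follow-up is a post-hoc comparison such as `TukeyHSD(anova_fish_gills_3)`, assuming `Calcium` is stored as a categorical variable, as the 2 degrees of freedom in the table suggest.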