Loading libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Setting working directory and loading files

setwd("~/Desktop/Data 101")
fish <- read_csv("fish.csv")
nutrition <- read_csv("nutrition_study.csv")

Problem 1:

ACTN3 is a gene that encodes alpha-actinin-3, a protein in fast-twitch muscle fibers that is important for activities like sprinting and weightlifting. The gene has two main alleles: R (functional) and X (non-functional). The R allele is linked to better performance in strength, speed, and power sports, while the X allele is associated with endurance due to a greater reliance on slow-twitch fibers. However, athletic performance is influenced by various factors, including training, environment, and other genes, making the ACTN3 genotype just one contributing factor. A study examines the ACTN3 alleles R and X. Of the 436 people in this sample, 244 were classified as R, and 192 were classified as X. Does the sample provide evidence that the two options are not equally likely? Conduct the test using a chi-square goodness-of-fit test.

Hypotheses

\(H_0\): \(p_1 = p_2 = 1/2\) (alleles are equally likely)

\(H_a\): \(p_1 \neq p_2\) (alleles are not equally likely)

# Observed counts
observed <- c(244, 192)

# Null proportions (equal under H0)
theoretical_prop <- rep(1/2, 2)

Checking that all expected counts are at least 5

# Expected counts under H0
expected_values <- theoretical_prop * sum(observed)
expected_values
## [1] 218 218

All expected counts are greater than 5.

Test

chisq.test(observed)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276

P-value and conclusion

The p-value is 0.01276, which is less than alpha = 0.05, so we reject the null hypothesis. The sample provides significant evidence that the R and X alleles are not equally likely.
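As a check on the result above, the test statistic can be reproduced by hand from the observed counts. This is a minimal sketch using only the counts given in the problem; the variable names (`expected`, `chi_sq`, `deg_free`, `p_value`) are illustrative and do not appear in the original analysis.

```r
# Observed counts from the problem statement (R, X)
observed <- c(R = 244, X = 192)

# Expected counts under H0: both alleles equally likely
expected <- rep(sum(observed) / length(observed), length(observed))

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
deg_free <- length(observed) - 1

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

round(chi_sq, 4)
signif(p_value, 4)
```

The two values should agree with the X-squared and p-value reported by chisq.test(observed).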

Problem 2:

Using the nutrition study data (nutrition_study.csv), test whether vitamin use (VitaminUse) is associated with gender (Sex). Conduct the test using a chi-square test of independence.

Hypotheses

\(H_0\): Vitamin use is not associated with gender

\(H_a\): Vitamin use is associated with gender

Creating a table

table <- table(nutrition$VitaminUse, nutrition$Sex) # rows: vitamin use, columns: sex
table
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13

Test

chisq.test(table)
## 
##  Pearson's Chi-squared test
## 
## data:  table
## X-squared = 11.071, df = 2, p-value = 0.003944

P-value and conclusion

With a p-value of 0.003944, which is less than the significance level of 0.05, there is sufficient evidence to reject the null hypothesis.

Therefore, we conclude that there is a significant association between vitamin use and gender.
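The chi-square test of independence also assumes that every expected cell count is at least 5, which matters here because one observed cell (Occasional, Male) is exactly 5. The sketch below rebuilds the contingency table from the counts printed above, so it runs without the CSV file, and inspects the expected counts. The names `counts` and `result` are illustrative.

```r
# Rebuild the contingency table from the counts shown in the output above
counts <- matrix(
  c( 87, 24,
     77,  5,
    109, 13),
  nrow = 3, byrow = TRUE,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# Pearson chi-square test of independence
result <- chisq.test(counts)

# Expected counts under independence: (row total * column total) / grand total
result$expected

# TRUE if the "at least 5" condition holds for every cell
all(result$expected >= 5)
```

In the full analysis, the same check could be run as chisq.test(table)$expected on the table built from the data.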

Problem 3:

Most fish use gills for respiration in water, and researchers can observe how fast a fish’s gill cover beats to study ventilation, much like we might observe a person’s breathing rate. Professor Brad Baldwin is interested in how water chemistry might affect gill beat rates. In one experiment, he randomly assigned fish to tanks with different calcium levels. One tank was low in calcium (0.71 mg/L), the second tank had a medium amount (5.24 mg/L), and the third tank had water with a high calcium level (18.24 mg/L). His research team counted gill rates (beats per minute) for samples of 30 fish in each tank. The results are stored in FishGills3. Perform an ANOVA test to see whether the mean gill rate differs depending on the calcium level of the water.

\(H_0\): \(\mu_A\) = \(\mu_B\) = \(\mu_C\)

\(H_a\): not all \(\mu_i\) are equal

ANOVA test

anova_result <- aov(GillRate ~ Calcium, data = fish) # quantitative response first, categorical factor (calcium level) second

anova_result
## Call:
##    aov(formula = GillRate ~ Calcium, data = fish)
## 
## Terms:
##                   Calcium Residuals
## Sum of Squares   2037.222 19064.333
## Deg. of Freedom         2        87
## 
## Residual standard error: 14.80305
## Estimated effects may be unbalanced

Summary

summary(anova_result) #to get the p-value
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

P-value and conclusion

The p-value is 0.0121, which is less than 0.05, so we reject the null hypothesis. The test provides evidence that the mean gill rate differs for at least one of the calcium levels.
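The F statistic and p-value in the summary table can be reproduced from the sums of squares and degrees of freedom printed by aov(). This is a minimal sketch that uses only those printed values; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom from the aov() output above
ss_between <- 2037.222   # Calcium (between groups)
df_between <- 2          # number of groups minus 1
ss_within  <- 19064.333  # Residuals (within groups)
df_within  <- 87         # total observations minus number of groups

# Mean squares: sum of squares divided by its degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic: ratio of between-group to within-group mean square
f_stat <- ms_between / ms_within

# Upper-tail p-value from the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(f_stat, 3)
round(p_value, 4)
```

These should match the F value and Pr(>F) columns of summary(anova_result).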