library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Problem 1: ACTN3 is a gene that encodes alpha-actinin-3, a protein in fast-twitch muscle fibers, important for activities like sprinting and weightlifting. The gene has two main alleles: R (functional) and X (non-functional). The R allele is linked to better performance in strength, speed, and power sports, while the X allele is associated with endurance due to a greater reliance on slow-twitch fibers. However, athletic performance is influenced by various factors, including training, environment, and other genes, making the ACTN3 genotype just one contributing factor.
A study examines the ACTN3 alleles R and X, which are associated with fast-twitch muscle function. Of the 436 people in this sample, 244 were classified as R and 192 were classified as X. Does the sample provide evidence that the two options are not equally likely? Conduct the test using a chi-square goodness-of-fit test.
\(p_1\) = Proportion classified as R
\(p_2\) = Proportion classified as X
\(H_0\): \(p_1 = p_2 = 1/2\)
\(H_a\): at least one \(p_i \neq 1/2\)
observed <- c(244,192)
theoretical_prop <- c(0.5,0.5)
expected_values <- sum(observed) * theoretical_prop
expected_values
## [1] 218 218
chisq.test(observed, p = theoretical_prop)
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 6.2018, df = 1, p-value = 0.01276
The p-value is 0.01276, which is less than α = 0.05, so the result is statistically significant and we reject the null hypothesis. We conclude that the sample provides evidence that the R and X classifications are not equally likely.
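As a cross-check on the `chisq.test()` output above, the statistic can be recomputed by hand from the observed and expected counts. This is a minimal sketch; the names `expected`, `chi_sq`, and `p_value` are introduced here for illustration and are not part of the original analysis.

```r
# Observed counts from the problem statement and expected counts under H0
observed <- c(R = 244, X = 192)
expected <- sum(observed) * c(0.5, 0.5)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with df = (number of categories - 1) = 1
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

round(chi_sq, 4)
round(p_value, 5)
```

Both values should agree with the statistic and p-value reported by `chisq.test()`.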
Problem 2: Who Is More Likely to Take Vitamins: Males or Females? The dataset NutritionStudy contains, among other things, information about vitamin use and the gender of the participants. Is there a significant association between these two variables? Use the variables VitaminUse and Gender to conduct a chi-square analysis and give the results. (Test for Association)
# Read the dataset (in this file, gender is stored in the Sex column)
nutrition <- read_csv("C:/DATA101/NutritionStudy.csv")
## Rows: 315 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Smoke, Sex, VitaminUse
## dbl (14): ID, Age, Quetelet, Vitamin, Calories, Fat, Fiber, Alcohol, Cholest...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
\(H_0\) : Gender is not associated with vitamin use
\(H_a\) : Gender is associated with vitamin use
observed <- table(nutrition$Sex, nutrition$VitaminUse)
observed
##
##          No Occasional Regular
##   Female 87         77     109
##   Male   24          5      13
chisq.test(observed)
##
## Pearson's Chi-squared test
##
## data: observed
## X-squared = 11.071, df = 2, p-value = 0.003944
The p-value is 0.003944, which is less than α = 0.05, so the result is statistically significant and we reject the null hypothesis. We conclude that there is an association between gender and vitamin use.
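The chi-square approximation is usually considered reliable when every expected cell count is at least 5. The sketch below rebuilds the contingency table from the counts printed above, so it runs without the CSV file, and inspects the expected counts. The names `test_result` and `expected_counts` are illustrative and not part of the original analysis.

```r
# Counts copied from the contingency table above
observed <- matrix(
  c(87, 77, 109,
    24,  5,  13),
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("Female", "Male"),
                  VitaminUse = c("No", "Occasional", "Regular"))
)

test_result <- chisq.test(observed)

# Expected counts under independence: (row total * column total) / grand total
expected_counts <- test_result$expected
round(expected_counts, 2)

# Rule of thumb: every expected count should be at least 5
all(expected_counts >= 5)
```

Although the observed count for males who take vitamins occasionally is only 5, the condition applies to the expected counts, not the observed ones.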
Problem 3: Most fish use gills for respiration in water, and researchers can observe how fast a fish’s gill cover beats to study ventilation, much like we might observe a person’s breathing rate. Professor Brad Baldwin is interested in how water chemistry might affect gill beat rates. In one experiment, he randomly assigned fish to tanks with different calcium levels. One tank was low in calcium (0.71 mg/L), the second tank had a medium amount (5.24 mg/L), and the third tank had water with a high calcium level (18.24 mg/L). His research team counted gill rates (beats per minute) for samples of 30 fish in each tank. The results are stored in FishGills3. Perform an ANOVA test to see whether the mean gill rate differs depending on the calcium level of the water.
#reading the dataset
fish_gill <- read_csv("C:/DATA101/FishGills3.csv")
## Rows: 90 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Calcium
## dbl (1): GillRate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
\(H_0\): \(\mu_L\) = \(\mu_M\) = \(\mu_H\)
\(H_a\): not all \(\mu_i\) are equal
anova_result <- aov(GillRate ~ Calcium, data = fish_gill)
anova_result
## Call:
## aov(formula = GillRate ~ Calcium, data = fish_gill)
##
## Terms:
##                  Calcium Residuals
## Sum of Squares  2037.222 19064.333
## Deg. of Freedom        2        87
##
## Residual standard error: 14.80305
## Estimated effects may be unbalanced
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is 0.0121, which is less than α = 0.05, so there is evidence against the null hypothesis and we reject it. The test suggests that the mean gill rate differs among the calcium levels of the water. The ANOVA by itself does not identify which calcium levels differ from each other.
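To see where the F value comes from, it can be recomputed from the sums of squares and degrees of freedom reported by `aov()`. This is a minimal sketch that uses only the numbers printed above, so it does not need the CSV file; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the aov() output
ss_between <- 2037.222   # Calcium
ss_within  <- 19064.333  # Residuals
df_between <- 2          # number of groups - 1
df_within  <- 87         # total observations - number of groups

# Mean squares are the sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic and its upper-tail p-value
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(f_stat, 3)
round(p_value, 4)
```

The results should match the `F value` and `Pr(>F)` columns of the `summary()` table.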