In your markdown answer the following problems. Include the following:
Your hypotheses
p-value
conclusion
Problem 1
ACTN3 is a gene that encodes alpha-actinin-3, a protein in fast-twitch muscle fibers, important for activities like sprinting and weightlifting. The gene has two main alleles: R (functional) and X (non-functional). The R allele is linked to better performance in strength, speed, and power sports, while the X allele is associated with endurance due to a greater reliance on slow-twitch fibers. However, athletic performance is influenced by various factors, including training, environment, and other genes, making the ACTN3 genotype just one contributing factor. A study examines the ACTN3 genetic alleles R and X, also associated with fast-twitch muscles. Of the 436 people in this sample, 244 were classified as R, and 192 were classified as X. Does the sample provide evidence that the two options are not equally likely? Conduct the test using a chi-square goodness-of-fit test.
Answer
This is a goodness-of-fit test, with one variable (presence of a specific allele of ACTN3 related to athletic performance) and two categories (presence of R allele and presence of X allele). Of 436 cases, 244 possess R and 192 possess X alleles. We want to test whether the observed difference is statistically significant.
Hypotheses
The null hypothesis is that the two alleles are equally likely, while the alternative hypothesis is that their proportions differ.
\(H_0\): \(p_1\) = \(p_2\) = 1/2
\(H_a\): \(p_1\) \(\neq\) \(p_2\) (equivalently, \(p_1\) \(\neq\) 1/2)
# Observed counts
observed <- c(244, 192)
# Null values
null_proportions <- c(1/2, 1/2)
Check the expected values to make sure that we can perform the chi-square test.
# Expected values
expected_values <- null_proportions*sum(observed)
expected_values
## [1] 218 218
All are greater than 5 and we can perform the chi-square test.
# Perform chi-squared goodness-of-fit test
chisq.test(observed)
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 6.2018, df = 1, p-value = 0.01276
Based on the p-value of 0.01276 (less than 0.05), we reject the null hypothesis that the two alleles are equally likely and conclude, at the 5% significance level, that the proportions of the R and X alleles differ.
Note that the null proportions were not passed to chisq.test() because, when the p argument is omitted, the function assumes equal probabilities for all categories.
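As a cross-check, the test statistic can be reproduced by hand. The sketch below assumes the same observed counts, passes the null proportions explicitly through the p argument of chisq.test(), and recomputes the statistic as the sum of (observed - expected)^2 / expected. The names explicit_test, chi_sq, and p_value are illustrative and not part of the original analysis.

```r
# Observed counts and null proportions from Problem 1
observed <- c(R = 244, X = 192)
null_proportions <- c(1/2, 1/2)

# Same test as above, with the null proportions passed explicitly
explicit_test <- chisq.test(observed, p = null_proportions)

# Manual computation: sum of (observed - expected)^2 / expected
expected_values <- null_proportions * sum(observed)
chi_sq <- sum((observed - expected_values)^2 / expected_values)

# Upper-tail p-value from the chi-square distribution with k - 1 = 1 degree of freedom
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

round(c(chi_sq = chi_sq, p_value = p_value), 5)
```

Both values should agree with the chisq.test() output above (X-squared = 6.2018, p-value = 0.01276).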
Problem 1 Conclusion
p-value = 0.01276. Because it is below 0.05, there is significant evidence that the R and X alleles are not equally likely.
Problem 2
Who Is More Likely to Take Vitamins: Males or Females? The dataset NutritionStudy contains, among other things, information about vitamin use and the gender of the participants. Is there a significant association between these two variables? Use the variables VitaminUse and Gender to conduct a chi-square analysis and give the results. (Test for Association)
Answer
This is a test of association between sex and vitamin use, using data from the NutritionStudy dataset. The dataset stores the participants' gender in the variable Sex.
Load the dataset, assumed to be located in the same directory as the Rmd file, and summarize its variables.
library(tidyverse) # provides read_csv(), group_by(), and summarize()
df <- read_csv("NutritionStudy.csv") # path is relative to the Rmd file's directory
summary(df)
## ID Age Smoke Quetelet
## Min. : 1.0 Min. :19.00 Length:315 Min. :16.33
## 1st Qu.: 79.5 1st Qu.:39.00 Class :character 1st Qu.:21.80
## Median :158.0 Median :48.00 Mode :character Median :24.74
## Mean :158.0 Mean :50.15 Mean :26.16
## 3rd Qu.:236.5 3rd Qu.:62.50 3rd Qu.:28.85
## Max. :315.0 Max. :83.00 Max. :50.40
## Vitamin Calories Fat Fiber
## Min. :1.000 Min. : 445.2 Min. : 14.40 Min. : 3.10
## 1st Qu.:1.000 1st Qu.:1338.0 1st Qu.: 53.95 1st Qu.: 9.15
## Median :2.000 Median :1666.8 Median : 72.90 Median :12.10
## Mean :1.965 Mean :1796.7 Mean : 77.03 Mean :12.79
## 3rd Qu.:3.000 3rd Qu.:2100.4 3rd Qu.: 95.25 3rd Qu.:15.60
## Max. :3.000 Max. :6662.2 Max. :235.90 Max. :36.80
## Alcohol Cholesterol BetaDiet RetinolDiet
## Min. : 0.000 Min. : 37.7 Min. : 214 Min. : 30.0
## 1st Qu.: 0.000 1st Qu.:155.0 1st Qu.:1116 1st Qu.: 480.0
## Median : 0.300 Median :206.3 Median :1802 Median : 707.0
## Mean : 3.279 Mean :242.5 Mean :2186 Mean : 832.7
## 3rd Qu.: 3.200 3rd Qu.:308.9 3rd Qu.:2836 3rd Qu.:1037.0
## Max. :203.000 Max. :900.7 Max. :9642 Max. :6901.0
## BetaPlasma RetinolPlasma Sex VitaminUse
## Min. : 0.0 Min. : 179.0 Length:315 Length:315
## 1st Qu.: 90.0 1st Qu.: 466.0 Class :character Class :character
## Median : 140.0 Median : 566.0 Mode :character Mode :character
## Mean : 189.9 Mean : 602.8
## 3rd Qu.: 230.0 3rd Qu.: 716.0
## Max. :1415.0 Max. :1727.0
## PriorSmoke
## Min. :1.000
## 1st Qu.:1.000
## Median :2.000
## Mean :1.638
## 3rd Qu.:2.000
## Max. :3.000
# Cross-tabulate Sex and VitaminUse and display the table
observed_dataset <- table(df$Sex, df$VitaminUse)
observed_dataset
##
##          No Occasional Regular
##   Female 87         77     109
##   Male   24          5      13
Hypotheses
The null hypothesis is that VitaminUse is not associated with Sex, while the alternative hypothesis is that both variables are associated.
\(H_0\): VitaminUse is not associated with Sex.
\(H_a\): VitaminUse is associated with Sex.
# Perform the Chi square test:
chisq.test(observed_dataset)
##
## Pearson's Chi-squared test
##
## data: observed_dataset
## X-squared = 11.071, df = 2, p-value = 0.003944
Because the p-value (0.003944) is smaller than 0.05, we reject the null hypothesis that VitaminUse is not associated with Sex and conclude, at the 5% significance level, that the association between VitaminUse and Sex is statistically significant.
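The chi-square approximation also relies on the expected counts being large enough (greater than 5, the rule used in Problem 1), which was not checked here. The sketch below rebuilds the contingency table from the counts printed above, so it does not need the CSV file. It then inspects the expected counts and prints row proportions to show the direction of the association. The names vitamin_table and test_result are illustrative.

```r
# Rebuild the observed table from the counts printed above
vitamin_table <- matrix(
  c(87, 77, 109,
    24,  5,  13),
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("Female", "Male"),
                  VitaminUse = c("No", "Occasional", "Regular"))
)

test_result <- chisq.test(vitamin_table)

# Expected counts under independence: (row total * column total) / n
test_result$expected

# Proportion of each sex in each vitamin-use category (each row sums to 1)
round(prop.table(vitamin_table, margin = 1), 3)
```

In this sample, 24 of 42 males (about 57%) report no vitamin use, compared with 87 of 273 females (about 32%), so the females in the sample are more likely to take vitamins.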
Problem 2 Conclusion
p-value = 0.003944. Because it is below 0.05, the association between VitaminUse and Sex is statistically significant.
Problem 3
Most fish use gills for respiration in water, and researchers can observe how fast a fish’s gill cover beats to study ventilation, much like we might observe a person’s breathing rate. Professor Brad Baldwin is interested in how water chemistry might affect gill beat rates. In one experiment, he randomly assigned fish to tanks with different calcium levels. One tank was low in calcium (0.71 mg/L), the second tank had a medium amount (5.24 mg/L), and the third tank had water with a high calcium level (18.24 mg/L). His research team counted gill rates (beats per minute) for samples of 30 fish in each tank. The results are stored in FishGills3. Perform ANOVA test to see if the mean gill rate differs depending on the calcium level of the water.
Answer
We will run an ANOVA test to examine whether the fish breathing rate (mean gill rate) depends on the water calcium level, using 3 samples of fish in 3 water tanks containing different levels of calcium.
Load the FishGills3 dataset, assumed to be located in the same directory as the Rmd file, and summarize its variables.
# knitr evaluates chunks in the Rmd file's directory by default, so no setwd() call is needed
df <- read_csv("FishGills3.csv")
## Rows: 90 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Calcium
## dbl (1): GillRate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Summarize the basic stats of df, then calculate the mean gill rate for each calcium level
summary(df)
## Calcium GillRate
## Length:90 Min. :33.00
## Class :character 1st Qu.:48.00
## Mode :character Median :62.50
## Mean :61.78
## 3rd Qu.:72.00
## Max. :98.00
df |> group_by(Calcium) |> summarize(mean_gillrate = mean(GillRate))
## # A tibble: 3 × 2
## Calcium mean_gillrate
## <chr> <dbl>
## 1 High 58.2
## 2 Low 68.5
## 3 Medium 58.7
One can observe that the mean gill rate is higher at the low calcium level (68.5) than at the medium (58.7) or high (58.2) levels, which are nearly equal. The largest drop therefore appears between the low and medium calcium levels.
Hypotheses
The null hypothesis is that the mean gill rates for the 3 calcium levels are the same. The alternative hypothesis is that at least one mean gill rate is different from the others.
\(H_0\): \(\mu_L\) = \(\mu_M\) = \(\mu_H\)
\(H_a\): not all \(\mu_i\) are equal
# Perform ANOVA
anova_result <- aov(GillRate ~ Calcium, data = df)
anova_result
## Call:
## aov(formula = GillRate ~ Calcium, data = df)
##
## Terms:
## Calcium Residuals
## Sum of Squares 2037.222 19064.333
## Deg. of Freedom 2 87
##
## Residual standard error: 14.80305
## Estimated effects may be unbalanced
# Use summary() to get the degrees of freedom, F value, and p-value
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value (0.0121) is below 0.05, which is evidence against the null hypothesis. The ANOVA test suggests that the mean gill rate differs for at least one calcium level.
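The F statistic and p-value in the summary can be reproduced from the sums of squares printed by aov(). The sketch below plugs in those printed values rather than rereading the data. It also computes eta squared, the share of total variation explained by calcium level, which is an addition not reported in the original analysis. All variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the aov() output above
ss_calcium   <- 2037.222
ss_residuals <- 19064.333
df_calcium   <- 2   # number of groups - 1
df_residuals <- 87  # number of observations - number of groups

# Mean squares: sum of squares divided by degrees of freedom
ms_calcium   <- ss_calcium / df_calcium
ms_residuals <- ss_residuals / df_residuals

# F statistic and its upper-tail p-value
f_value <- ms_calcium / ms_residuals
p_value <- pf(f_value, df_calcium, df_residuals, lower.tail = FALSE)

# Eta squared: between-group share of the total sum of squares
eta_squared <- ss_calcium / (ss_calcium + ss_residuals)

round(c(F = f_value, p = p_value, eta_squared = eta_squared), 4)
```

The F value and p-value should match the summary() output (4.648 and 0.0121). Eta squared comes out just under 0.10, so calcium level accounts for roughly a tenth of the variation in gill rate in this sample.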
Following the class activity, run Tukey’s Honestly Significant Difference (HSD) test on the ANOVA model to see which pairs of calcium levels differ.
# Tukey's Honestly Significant Difference (HSD) test on the ANOVA model
library(tidyverse) # already loaded for read_csv(); TukeyHSD() itself comes from base R's stats package
TukeyHSD(anova_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = GillRate ~ Calcium, data = df)
##
## $Calcium
##                  diff        lwr        upr     p adj
## Low-High    10.333333   1.219540 19.4471264 0.0222533
## Medium-High  0.500000  -8.613793  9.6137931 0.9906108
## Medium-Low  -9.833333 -18.947126 -0.7195402 0.0313247
The HSD test shows that the largest difference between mean gill rates is between the Low and High calcium levels (10.33 beats per minute, adjusted p = 0.022). The difference between the Low and Medium levels is almost as large (9.83 beats per minute, adjusted p = 0.031). The difference between the Medium and High levels (0.50 beats per minute) is not statistically significant (adjusted p = 0.991).
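To see where the Tukey intervals come from, the sketch below rebuilds the Low-High interval and its adjusted p-value from quantities printed earlier: the residual sum of squares, the residual degrees of freedom, and the 30 fish per tank. It uses the studentized range functions qtukey() and ptukey() from base R. The variable names are illustrative, and the calculation assumes equal group sizes.

```r
# Quantities taken from the ANOVA output and the problem statement
ms_residuals <- 19064.333 / 87  # residual mean square
df_residuals <- 87
n_groups     <- 3
n_per_group  <- 30

# Standard error of a difference between two group means
se_diff <- sqrt(ms_residuals * (1 / n_per_group + 1 / n_per_group))

# Half-width of the 95% family-wise interval, based on the studentized range
q_crit <- qtukey(0.95, nmeans = n_groups, df = df_residuals)
half_width <- q_crit / sqrt(2) * se_diff

# Low-High difference copied from the TukeyHSD() output
diff_low_high <- 10.333333
interval <- c(lwr = diff_low_high - half_width, upr = diff_low_high + half_width)

# Adjusted p-value for the same comparison
p_adj <- ptukey(abs(diff_low_high) / se_diff * sqrt(2),
                nmeans = n_groups, df = df_residuals, lower.tail = FALSE)

round(c(interval, p_adj = p_adj), 4)
```

Because all three tanks have 30 fish, every pairwise interval has the same half-width. An observed difference is significant at the 5% family-wise level exactly when its absolute value exceeds that half-width.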
Problem 3 Conclusion
p-value = 0.0121. The mean gill rate differs significantly depending on the calcium level of the water: it is higher at the low calcium level than at the medium or high levels. This is consistent with a physiological explanation that I looked up, although the present data do not test it. High calcium levels make gill membranes “harder”, that is, less permeable to vital ions (such as Na+ and K+), while low calcium levels make membranes more permeable and speed up the loss of those ions. To restore ionic balance, fish need more energy and therefore pump more water (breathe harder) to get more oxygen. The difference in mean_gillrate is largest and statistically significant when the calcium level increases roughly sevenfold (from 0.71 to 5.24 mg/L), and it is not statistically significant when the level increases further from 5.24 to 18.24 mg/L, possibly because the membranes are already saturated with calcium at around 3 mg/L.
Published
Published at https://rpubs.com/rmiranda/1361375