Homework 8 - Data 110

Author

Kalina Peterson

Loading the Data

library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.5.2
Warning: package 'ggplot2' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/kpeter81/OneDrive - montgomerycollege.edu/Datasets")
fish_gills <- read_csv("FishGills3.csv")
Rows: 90 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Calcium
dbl (1): GillRate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nutrition_study <- read_csv("NutritionStudy.csv")
Rows: 315 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): Smoke, Sex, VitaminUse
dbl (14): ID, Age, Quetelet, Vitamin, Calories, Fat, Fiber, Alcohol, Cholest...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

In your markdown, answer the following problems. Include the following:

  • Your hypotheses.

  • P-value

  • Conclusion

Problem 1:

ACTN3 is a gene that encodes alpha-actinin-3, a protein in fast-twitch muscle fibers, important
for activities like sprinting and weightlifting. The gene has two main alleles: R (functional) and
X (non-functional). The R allele is linked to better performance in strength, speed, and power
sports, while the X allele is associated with endurance due to a greater reliance on slow-twitch
fibers. However, athletic performance is influenced by various factors, including training,
environment, and other genes, making the ACTN3 genotype just one contributing factor.
A study examines the ACTN3 genetic alleles R and X, also associated with fast-twitch muscles.
Of the 436 people in this sample, 244 were classified as R, and 192 were classified as X. Does
the sample provide evidence that the two options are not equally likely? Conduct the test using a
chi-square goodness-of-fit test.

\(H_0: p_1 = p_2 = 1/2\)

\(H_a\): at least one \(p_i \neq 1/2\)

where \(p_1\) is the proportion of R alleles and \(p_2\) is the proportion of X alleles.

observed <- c(244, 192)

# Null values
theoretical_proportions <- c(1/2, 1/2)
expected <- theoretical_proportions * sum(observed)
expected
[1] 218 218

All expected counts are greater than 5, so the chi-square approximation is appropriate.

chisq.test(observed, p = theoretical_proportions)

    Chi-squared test for given probabilities

data:  observed
X-squared = 6.2018, df = 1, p-value = 0.01276

P-value: 0.01276

Reject the null hypothesis. Because the p-value is below 0.05, there is significant evidence that the R and X alleles are not equally likely.
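As a check on the statistic reported by chisq.test, the goodness-of-fit value can be worked out by hand from the observed counts (244 and 192) and the expected counts (218 each). Here \(O_i\) and \(E_i\) are just shorthand for the observed and expected counts in category \(i\):

\[
\chi^2 = \sum_{i=1}^{2} \frac{(O_i - E_i)^2}{E_i}
= \frac{(244 - 218)^2}{218} + \frac{(192 - 218)^2}{218}
= \frac{676}{218} + \frac{676}{218}
\approx 6.2018
\]

With two categories, the degrees of freedom are \(2 - 1 = 1\), which matches the output.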

Problem 2:


Who Is More Likely to Take Vitamins: Males or Females? The dataset NutritionStudy contains,
among other things, information about vitamin use and the gender of the participants. Is there a
significant association between these two variables? Use the variables VitaminUse and Gender to
conduct a chi-square analysis and give the results. (Test for Association)

\(H_0\) : Vitamin use is not associated with gender.

\(H_a\) : Vitamin use is associated with gender.

observed_dataset <- table(nutrition_study$Sex, nutrition_study$VitaminUse)
observed_dataset
        
          No Occasional Regular
  Female  87         77     109
  Male    24          5      13
chisq.test(observed_dataset)

    Pearson's Chi-squared test

data:  observed_dataset
X-squared = 11.071, df = 2, p-value = 0.003944

P-value: 0.003944

Reject the null hypothesis. Because the p-value is below 0.05, there is sufficient evidence to conclude that there is a significant association between gender and vitamin use.
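Problem 1 checked that every expected count was above 5; the same condition can be checked for this test. The sketch below is one possible way to do it, not part of the original analysis. It re-enters the printed contingency table by hand so that it runs without NutritionStudy.csv, and the names `vitamin_counts` and `expected_counts` are made up for illustration.

```r
# Counts copied from the printed table above (rows: Sex, columns: VitaminUse).
# matrix() fills column by column, so each pair is (Female, Male).
vitamin_counts <- matrix(
  c(87, 24,    # No
    77, 5,     # Occasional
    109, 13),  # Regular
  nrow = 2,
  dimnames = list(
    Sex = c("Female", "Male"),
    VitaminUse = c("No", "Occasional", "Regular")
  )
)

# chisq.test() stores the expected counts under the null hypothesis
# of no association in the $expected component of its result.
expected_counts <- chisq.test(vitamin_counts)$expected
expected_counts

# TRUE when every cell meets the usual "expected count of at least 5" rule.
all(expected_counts >= 5)
```

With the full dataset loaded, `chisq.test(observed_dataset)$expected` should give the same table.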

Problem 3:

Most fish use gills for respiration in water, and researchers can observe how fast a fish’s gill
cover beats to study ventilation, much like we might observe a person’s breathing rate. Professor
Brad Baldwin is interested in how water chemistry might affect gill beat rates. In one
experiment, he randomly assigned fish to tanks with different calcium levels. One tank was low
in calcium (0.71 mg/L), the second tank had a medium amount (5.24 mg/L), and the third tank
had water with a high calcium level (18.24 mg/L). His research team counted gill rates (beats per
minute) for samples of 30 fish in each tank. The results are stored in FishGills3. Perform an
ANOVA test to see if the mean gill rate differs depending on the calcium level of the water.

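\(H_0: \mu_1 = \mu_2 = \mu_3\)

\(H_a\): not all \(\mu_i\) are equal

where \(\mu_1\), \(\mu_2\), and \(\mu_3\) are the population mean gill rates (beats per minute) at the three calcium levels.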

anova_result <- aov(GillRate ~ Calcium, data = fish_gills)

anova_result
Call:
   aov(formula = GillRate ~ Calcium, data = fish_gills)

Terms:
                  Calcium Residuals
Sum of Squares   2037.222 19064.333
Deg. of Freedom         2        87

Residual standard error: 14.80305
Estimated effects may be unbalanced
summary(anova_result)
            Df Sum Sq Mean Sq F value Pr(>F)  
Calcium      2   2037  1018.6   4.648 0.0121 *
Residuals   87  19064   219.1                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

P-value: 0.0121

Reject the null hypothesis. Because the p-value is below 0.05, there is evidence that the mean gill rate differs among the calcium levels.
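The F value in the summary table can be reproduced from the sums of squares and degrees of freedom printed by aov. Each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The labels \(MS_{\text{Calcium}}\) and \(MS_{\text{Residuals}}\) are shorthand chosen here to match the row names in the output:

\[
F = \frac{MS_{\text{Calcium}}}{MS_{\text{Residuals}}}
= \frac{2037.222 / 2}{19064.333 / 87}
\approx \frac{1018.611}{219.130}
\approx 4.648
\]

The degrees of freedom are \(3 - 1 = 2\) for the three calcium groups and \(90 - 3 = 87\) for the residuals, as shown in the table.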