HW 8 DATA 101

In your markdown answer the following problems. Include the following:

Your hypotheses.
P-value
Conclusion

Problem 1:

ACTN3 is a gene that encodes alpha-actinin-3, a protein in fast-twitch muscle fibers, important for activities like sprinting and weightlifting. The gene has two main alleles: R (functional) and X (non-functional). The R allele is linked to better performance in strength, speed, and power sports, while the X allele is associated with endurance due to a greater reliance on slow-twitch fibers. However, athletic performance is influenced by various factors, including training, environment, and other genes, making the ACTN3 genotype just one contributing factor. A study examines the ACTN3 genetic alleles R and X, also associated with fast-twitch muscles. Of the 436 people in this sample, 244 were classified as R, and 192 were classified as X. Does the sample provide evidence that the two options are not equally likely? Conduct the test using a chi-square goodness-of-fit test.

# I enter the observed counts from the ACTN3 sample described in the problem.
# 244 people were classified as R and 192 as X, for 436 people in total.
observed <- c(R = 244, X = 192)

# Under the null hypothesis the two alleles are equally likely,
# so each category has probability 1/2.
expected <- c(1/2, 1/2)

# I run a chi-square goodness-of-fit test to compare the observed counts
# with the counts expected under the null hypothesis.
# This helps me see if the difference is due to random chance or a real deviation.
chisq.test(x = observed, p = expected)

## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276

Conclusion

The null hypothesis is that R and X are equally likely (each with probability 0.5), and the alternative is that they are not equally likely. The p-value I got is 0.01276, which is less than 0.05, so I reject the null hypothesis. The sample provides evidence that the two alleles are not equally likely: 244 people were classified as R, compared with the 218 expected under equal probabilities. I also learned that a goodness-of-fit test compares observed counts with the counts a model predicts, and that a small p-value means the gap is larger than random variation alone would usually produce.
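As a cross-check on Problem 1, the goodness-of-fit statistic for the counts in the problem statement (244 R and 192 X) can be computed by hand. This is a minimal sketch, assuming the null hypothesis of equal probabilities; the variable names are my own and are not part of the assignment.

```r
# Observed counts taken from the problem statement.
observed <- c(R = 244, X = 192)
n <- sum(observed)

# Expected counts under H0: each allele has probability 1/2.
expected_counts <- n * c(0.5, 0.5)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq <- sum((observed - expected_counts)^2 / expected_counts)

# Degrees of freedom: number of categories minus 1.
df <- length(observed) - 1

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(c(statistic = chi_sq, df = df, p_value = p_value), 4)
```

The statistic here equals 676/109, about 6.2 on 1 degree of freedom, and it should agree with what `chisq.test(x = observed, p = c(0.5, 0.5))` reports for the same counts.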

Problem 2:

Who Is More Likely to Take Vitamins: Males or Females? The dataset NutritionStudy contains, among other things, information about vitamin use and the gender of the participants. Is there a significant association between these two variables? Use the variables VitaminUse and Gender to conduct a chi-square analysis and give the results. (Test for Association)

library(readr)

# I loaded the dataset so I can work with the NutritionStudy data in R.
nutrition <- read_csv("NutritionStudy.csv")

## Rows: 315 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): Smoke, Sex, VitaminUse
## dbl (14): ID, Age, Quetelet, Vitamin, Calories, Fat, Fiber, Alcohol, Cholest...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# I check the column names first because I noticed earlier that "Gender" did not exist.
# This helps me avoid errors and makes sure I use the correct variable names.
colnames(nutrition)

##  [1] "ID"            "Age"           "Smoke"         "Quetelet"     
##  [5] "Vitamin"       "Calories"      "Fat"           "Fiber"        
##  [9] "Alcohol"       "Cholesterol"   "BetaDiet"      "RetinolDiet"  
## [13] "BetaPlasma"    "RetinolPlasma" "Sex"           "VitaminUse"   
## [17] "PriorSmoke"

# I create a contingency table to compare VitaminUse and Sex.
# This organizes the data into counts so I can run a chi-square test.
table_data <- table(nutrition$VitaminUse, nutrition$Sex)

# I run a chi-square test of independence on the contingency table.
# This tells me whether VitaminUse and Sex are associated or independent.
chisq.test(table_data)

## 
##  Pearson's Chi-squared test
## 
## data:  table_data
## X-squared = 11.071, df = 2, p-value = 0.003944

Conclusion

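The null hypothesis is that vitamin use and sex are independent, and the alternative is that they are associated. The p-value I got is 0.003944, which is less than 0.05, so I reject the null hypothesis. This shows a statistically significant association between sex and vitamin use in the dataset. The test by itself does not say which sex is more likely to take vitamins; that requires comparing the proportions in the contingency table. I learned that chi-square tests are effective for identifying relationships between categorical variables that are unlikely to be explained by chance alone.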
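To illustrate what a chi-square test of independence computes on a 3 x 2 table, here is a small sketch. The counts are made up for illustration and are NOT the NutritionStudy data, and the level labels are assumptions. It shows how the expected counts, the statistic, and the degrees of freedom come from the row and column totals.

```r
# Hypothetical counts for illustration only (not the NutritionStudy data).
# The level labels are assumed; check the real levels with table() first.
toy <- matrix(c(40, 30, 50,
                10, 15,  5),
              nrow = 3,
              dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                              Sex = c("Female", "Male")))

# Expected count in each cell under independence:
# (row total * column total) / grand total.
expected <- outer(rowSums(toy), colSums(toy)) / sum(toy)

# Chi-square statistic and degrees of freedom, (rows - 1) * (columns - 1).
stat <- sum((toy - expected)^2 / expected)
df <- (nrow(toy) - 1) * (ncol(toy) - 1)
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# The built-in test should give the same statistic for this table.
res <- chisq.test(toy)

# Column proportions show the direction of any association.
prop.table(toy, margin = 2)
```

With three vitamin-use levels and two sexes the degrees of freedom are (3 - 1) * (2 - 1) = 2, which matches the `df = 2` in the output for the real data. Running `prop.table(table_data, margin = 2)` on the actual contingency table would show which sex has the higher share of vitamin users.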

Problem 3:

Most fish use gills for respiration in water, and researchers can observe how fast a fish’s gill cover beats to study ventilation, much like we might observe a person’s breathing rate. Professor Brad Baldwin is interested in how water chemistry might affect gill beat rates. In one experiment, he randomly assigned fish to tanks with different calcium levels. One tank was low in calcium (0.71 mg/L), the second tank had a medium amount (5.24 mg/L), and the third tank had water with a high calcium level (18.24 mg/L). His research team counted gill rates (beats per minute) for samples of 30 fish in each tank. The results are stored in FishGills3. Perform ANOVA test to see if the mean gill rate differs depending on the calcium level of the water.

library(readr)

# I load the fish dataset so I can analyze gill rate differences.
fish <- read_csv("FishGills3.csv")

## Rows: 90 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Calcium
## dbl (1): GillRate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# I run an ANOVA test to compare mean gill rates across calcium levels.
# I use ANOVA because I am comparing more than two groups.
anova_model <- aov(GillRate ~ Calcium, data = fish)

# I display the summary of the model to see the F-statistic and the p-value.
# This helps me decide whether calcium level has an effect on mean gill rate.
summary(anova_model)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion

The null hypothesis is that the mean gill rate is the same at all three calcium levels, and the alternative is that at least one mean differs. The p-value I got is 0.0121, which is less than 0.05, so I reject the null hypothesis. This suggests that the mean gill rate for at least one calcium level differs from the others. I also learned that ANOVA is appropriate for comparing the means of more than two groups, and that calcium level appears to have a statistically significant effect on gill rate in this dataset.
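The F-statistic and p-value in the ANOVA table can be reproduced from the mean squares and degrees of freedom printed in the output. This is a minimal sketch using the rounded values from the table, so the result may differ slightly in the last digit from what `summary()` prints; the variable names are my own.

```r
# Values copied from the ANOVA table above (rounded as printed).
ms_between <- 1018.6   # Mean Sq for Calcium
ms_within  <- 219.1    # Mean Sq for Residuals
df_between <- 2        # number of groups minus 1
df_within  <- 87       # total observations minus number of groups

# F is the ratio of between-group to within-group variability.
f_stat <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_stat, df1 = df_between, df2 = df_within, lower.tail = FALSE)

round(c(F = f_stat, p_value = p_value), 4)
```

A significant F only says that at least one group mean differs. A follow-up such as `TukeyHSD(anova_model)` on the fitted model would be one way to see which calcium levels differ from each other.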

2026-04-02
