Homework 8 Data 101

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(corrplot)

## corrplot 0.95 loaded

# Load the dataset
fishGillDataSet <- read_csv("FishGills3.csv")

## Rows: 90 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Calcium
## dbl (1): GillRate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

nutritionDataSet <- read_csv("NutritionStudy.csv")

## Rows: 315 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): Smoke, Sex, VitaminUse
## dbl (14): ID, Age, Quetelet, Vitamin, Calories, Fat, Fiber, Alcohol, Cholest...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Question 1

Hypothesis

\(H_0\): \(p_1 = p_2 = 0.5\)

\(H_a\): at least one of the proportions \(\neq 0.5\)

observedData <- c(244, 192)

chisq.test(observedData)

## 
##  Chi-squared test for given probabilities
## 
## data:  observedData
## X-squared = 6.2018, df = 1, p-value = 0.01276

Conclusion: Because the p-value (0.01276) is less than 0.05, the result is significant and we reject the null hypothesis. The sample evidence shows that the two alleles, R and X, are not equally likely.
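
As a check on the output above, the statistic can be recomputed by hand. This is a minimal sketch, assuming equal expected proportions of 0.5 as stated in \(H_0\); the names `observed`, `expected`, `chi_sq`, and `p_value` are illustrative and not part of the original analysis.

```r
# Observed allele counts, copied from the analysis above
observed <- c(244, 192)

# Under H0 each category is expected to hold half of the total
expected <- rep(sum(observed) / 2, 2)

# Pearson chi-squared statistic: sum of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with df = number of categories - 1 = 1
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

chi_sq
p_value
```

Both values should agree with the `chisq.test(observedData)` output up to rounding.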

Question 2

Hypothesis

\(H_0\): There is no association between vitamin use and gender.

\(H_a\): There is an association between vitamin use and gender.

observed_dataset <- table(nutritionDataSet$VitaminUse, nutritionDataSet$Sex)
observed_dataset

##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13

chisq.test(observed_dataset)

## 
##  Pearson's Chi-squared test
## 
## data:  observed_dataset
## X-squared = 11.071, df = 2, p-value = 0.003944

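Conclusion: Because the p-value (0.003944) is less than 0.05, the result is significant and we reject the null hypothesis. The sample evidence shows an association between vitamin use and gender. In the table above, 24 of 42 males (57%) report no vitamin use, compared with 87 of 273 females (32%), so females in this sample are more likely to use vitamins.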
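
The chi-squared test only establishes that an association exists. The direction can be read from the column proportions, and the expected counts can be compared with the common rule of thumb that each should be at least 5. A minimal sketch, re-entering the table above by hand; the names `vitamin_by_sex`, `col_props`, `test_result`, and `expected_counts` are illustrative.

```r
# Contingency table copied from the output above (filled column by column)
vitamin_by_sex <- matrix(
  c(87, 77, 109,   # Female: No, Occasional, Regular
    24,  5,  13),  # Male:   No, Occasional, Regular
  nrow = 3,
  dimnames = list(
    VitaminUse = c("No", "Occasional", "Regular"),
    Sex = c("Female", "Male")
  )
)

# Proportion of each vitamin-use category within each sex (columns sum to 1)
col_props <- prop.table(vitamin_by_sex, margin = 2)

# Expected counts under independence, taken from the test object
test_result <- chisq.test(vitamin_by_sex)
expected_counts <- test_result$expected

round(col_props, 3)
round(expected_counts, 1)
```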

Question 3

Hypothesis

\(H_0\): \(\mu_1 = \mu_2 = \mu_3\)

\(H_a\): some \(\mu_i \neq \mu_j\)

anova_test_result <- aov(GillRate ~ Calcium, data = fishGillDataSet)
anova_test_result

## Call:
##    aov(formula = GillRate ~ Calcium, data = fishGillDataSet)
## 
## Terms:
##                   Calcium Residuals
## Sum of Squares   2037.222 19064.333
## Deg. of Freedom         2        87
## 
## Residual standard error: 14.80305
## Estimated effects may be unbalanced

summary(anova_test_result)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: Because the p-value (0.0121) is less than 0.05, the result is significant and we reject the null hypothesis. The sample evidence shows that the mean gill rate differs for at least one pair of calcium levels (low, medium, high). The ANOVA alone does not identify which levels differ.
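
The F value and p-value in the summary table follow directly from the sums of squares and degrees of freedom printed above. A minimal sketch of that arithmetic; the variable names are illustrative, and the inputs are the rounded values from the output, so the results match only up to rounding.

```r
# Sums of squares and degrees of freedom, copied from the aov output
ss_calcium   <- 2037.222
ss_residuals <- 19064.333
df_calcium   <- 2
df_residuals <- 87

# Mean squares are sums of squares divided by their degrees of freedom
ms_calcium   <- ss_calcium / df_calcium
ms_residuals <- ss_residuals / df_residuals

# F statistic and its upper-tail p-value
f_stat  <- ms_calcium / ms_residuals
p_value <- pf(f_stat, df_calcium, df_residuals, lower.tail = FALSE)

f_stat
p_value
```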

Note: The ANOVA output above reports "Estimated effects may be unbalanced". To check whether the groups are actually unbalanced, the data set is re-entered by hand below with 30 observations per calcium level and the ANOVA is rerun.

data_anova <- data.frame(
  Calcium = c(rep("Low", 30), rep("Medium", 30), rep("High", 30)),  # Three calcium levels, 30 observations each
  GillRate = c(
    # Measurements for low calcium
    55, 63, 78, 85, 65, 98, 68, 84, 44, 87, 48, 86, 93, 64, 83, 79, 85, 65, 88, 47, 68, 86, 57, 53, 58, 47, 62, 64, 50, 45,
    # Measurements for medium calcium
    38, 42, 63, 46, 55, 63, 36, 58, 73, 69, 55, 68, 63, 73, 45, 79, 41, 83, 60, 48, 59, 33, 67, 43, 57, 72, 46, 74, 68, 83,
    # Measurements for high calcium
    59, 45, 63, 52, 59, 78, 72, 53, 69, 68, 57, 63, 68, 83, 38, 85, 68, 63, 58, 48, 42, 42, 80, 42, 52, 37, 57, 62, 40, 42
  )
)

anova_test_results <- aov(GillRate ~ Calcium, data = data_anova)
anova_test_results

## Call:
##    aov(formula = GillRate ~ Calcium, data = data_anova)
## 
## Terms:
##                   Calcium Residuals
## Sum of Squares   2037.222 19064.333
## Deg. of Freedom         2        87
## 
## Residual standard error: 14.80305
## Estimated effects may be unbalanced
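
The replica reproduces the sums of squares of the original fit and still prints the same message, even though each level has 30 observations, so the message by itself does not indicate unequal group sizes. Group sizes can be checked directly instead. A minimal sketch, using a hypothetical vector `calcium` built the same way as the `Calcium` column above:

```r
# Group labels with the same layout as the Calcium column: 30 per level
calcium <- rep(c("Low", "Medium", "High"), each = 30)

# Count observations per level
group_sizes <- table(calcium)

# The design is balanced when every level has the same count
is_balanced <- length(unique(as.vector(group_sizes))) == 1

group_sizes
is_balanced
```

On the real data, the same check would be `table(fishGillDataSet$Calcium)`.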

Vincent Le

2025-10-28
