Loading Data

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
FishGills3 <- read_csv("FishGills3.csv")
## Rows: 90 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Calcium
## dbl (1): GillRate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
NutritionStudy <- read_csv("NutritionStudy.csv")
## Rows: 315 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): Smoke, Sex, VitaminUse
## dbl (14): ID, Age, Quetelet, Vitamin, Calories, Fat, Fiber, Alcohol, Cholest...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Problem 1: Chi-Square (Goodness-of-Fit Test)

Hypothesis

\(H_0\): \(p_1 = p_2 = 1/2\)

\(H_a\): at least one \(p_i \neq 1/2\)

Test

# Observed counts
observed <- c(244, 192)

# Null values
theoretical_prop <- rep(1/2, 2)
expected_values <- theoretical_prop * sum(observed)
expected_values
## [1] 218 218
chisq.test(observed)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276

The p-value is 0.01276, which is below the typical significance level of 0.05, so we reject the null hypothesis. We conclude that the two outcomes are not equally likely: the probabilities of the genetic alleles R and X differ.
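As a check on the output above, the test statistic can be reproduced by hand. This is a minimal sketch that uses only the observed counts reported in this problem; the names `expected`, `chi_sq_gof`, and `p_gof` are illustrative and not part of the original analysis.

```r
# Observed counts from Problem 1
observed <- c(244, 192)

# Expected counts under H0: both outcomes equally likely
expected <- rep(1/2, 2) * sum(observed)

# Chi-square statistic: sum of (O - E)^2 / E over the categories
chi_sq_gof <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with df = number of categories - 1
p_gof <- pchisq(chi_sq_gof, df = length(observed) - 1, lower.tail = FALSE)

chi_sq_gof
p_gof
```

These two values should agree with the `X-squared` and `p-value` lines printed by `chisq.test(observed)`.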

Problem 2: Chi-Square (Test for Association)

Hypothesis

\(H_0\) : Vitamin use is not associated with gender

\(H_a\) : Vitamin use is associated with gender

Test

Condition check: all expected cell counts are greater than 5, so the chi-square approximation is appropriate.

observed_dataset <- table(NutritionStudy$VitaminUse, NutritionStudy$Sex)
observed_dataset
##             
##              Female Male
##   No             87   24
##   Occasional     77    5
##   Regular       109   13
chisq.test(observed_dataset)
## 
##  Pearson's Chi-squared test
## 
## data:  observed_dataset
## X-squared = 11.071, df = 2, p-value = 0.003944

The p-value is 0.003944, which is less than the typical significance level of 0.05, so there is sufficient evidence to reject the null hypothesis.

Therefore, we conclude that there is a significant association between vitamin use and gender.
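The expected-count condition and the test statistic for this problem can also be verified directly from the table. The sketch below rebuilds the table from the counts printed above, so it runs without the CSV file; the names `observed_counts`, `expected_counts`, `chi_sq_assoc`, `df_assoc`, and `p_assoc` are illustrative assumptions, not objects from the original analysis.

```r
# Observed counts copied from the VitaminUse-by-Sex table above
observed_counts <- matrix(
  c( 87, 24,
     77,  5,
    109, 13),
  nrow = 3, byrow = TRUE,
  dimnames = list(VitaminUse = c("No", "Occasional", "Regular"),
                  Sex = c("Female", "Male"))
)

# Expected count per cell: (row total * column total) / grand total
expected_counts <- outer(rowSums(observed_counts), colSums(observed_counts)) /
  sum(observed_counts)
expected_counts

# Condition check: every expected count should exceed 5
all(expected_counts > 5)

# Pearson chi-square statistic with df = (rows - 1) * (columns - 1)
chi_sq_assoc <- sum((observed_counts - expected_counts)^2 / expected_counts)
df_assoc <- (nrow(observed_counts) - 1) * (ncol(observed_counts) - 1)
p_assoc <- pchisq(chi_sq_assoc, df = df_assoc, lower.tail = FALSE)

chi_sq_assoc
p_assoc
```

The same expected counts are available from the fitted test object as `chisq.test(observed_dataset)$expected`.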

Problem 3: ANOVA

Hypothesis

\(H_0\): \(\mu_A\) = \(\mu_B\) = \(\mu_C\)

\(H_a\): not all \(\mu_i\) are equal

Test

anova_result <- aov(GillRate ~ Calcium, data = FishGills3)

anova_result
## Call:
##    aov(formula = GillRate ~ Calcium, data = FishGills3)
## 
## Terms:
##                   Calcium Residuals
## Sum of Squares   2037.222 19064.333
## Deg. of Freedom         2        87
## 
## Residual standard error: 14.80305
## Estimated effects may be unbalanced
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is 0.0121, which is below 0.05 and indicates moderate evidence against the null hypothesis. We conclude that the mean gill rate is not the same across all calcium levels.
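To show where the F value in the ANOVA table comes from, the sketch below recomputes it from the sums of squares and degrees of freedom printed by `aov()` above. The variable names are illustrative, and the inputs are the rounded values from that output, so the result may differ slightly in the last digits.

```r
# Sums of squares and degrees of freedom copied from the aov() output above
ss_calcium   <- 2037.222
ss_residuals <- 19064.333
df_calcium   <- 2
df_residuals <- 87

# Mean square = sum of squares / degrees of freedom
ms_calcium   <- ss_calcium / df_calcium
ms_residuals <- ss_residuals / df_residuals

# F statistic: between-group mean square over residual mean square
f_stat <- ms_calcium / ms_residuals

# Upper-tail p-value from the F distribution
p_anova <- pf(f_stat, df_calcium, df_residuals, lower.tail = FALSE)

f_stat
p_anova
```

These values should match the `F value` and `Pr(>F)` columns of `summary(anova_result)`.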