CHI SQUARED AND ANOVA TEST ASSIGNMENT
*H(0): The alleles R & X are equally likely (P(R) = 0.5, P(X) = 0.5)
*H(A): The alleles R & X are not equally likely (P(R) ≠ 0.5, and therefore P(X) ≠ 0.5).
# Data
observed <- c(R = 244, X = 192)
expected_probs <- c(0.5, 0.5)
# Chi-square Goodness-of-Fit Test
p1_test <- chisq.test(observed, p = expected_probs)
p1_test
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 6.2018, df = 1, p-value = 0.01276
Conclusion for Problem 1
Since the p-value (0.01276) is less than 0.05, we reject the null hypothesis. There is sufficient evidence that the alleles R and X are not equally likely in the population from which this sample was drawn.
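As a check on the output above, the test statistic can be reproduced by hand from the goodness-of-fit formula, the sum of (observed - expected)^2 / expected. This is only a sketch: the names `expected_counts`, `chi_sq_stat` and `p_value_manual` are introduced here for illustration and are not part of the original analysis.

```r
# Observed allele counts from Problem 1
observed <- c(R = 244, X = 192)

# Expected counts under H(0): each allele has probability 0.5
expected_counts <- sum(observed) * c(0.5, 0.5)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq_stat <- sum((observed - expected_counts)^2 / expected_counts)

# Upper-tail p-value with df = number of categories - 1 = 1
p_value_manual <- pchisq(chi_sq_stat, df = length(observed) - 1,
                         lower.tail = FALSE)

chi_sq_stat      # about 6.2018, matching chisq.test()
p_value_manual   # about 0.01276
```

Both values should agree with the chisq.test() output, because no continuity correction is applied in the goodness-of-fit case.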
Hypotheses for Problem 2
*H(0): There is no association between Vitamin Use and Gender (they are independent).
*H(A): There is a significant association between Vitamin Use and Gender.
library(readr)
read_csv("NutritionStudy.csv")
## Rows: 315 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Smoke, Sex, VitaminUse
## dbl (14): ID, Age, Quetelet, Vitamin, Calories, Fat, Fiber, Alcohol, Cholest...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 315 × 17
##       ID   Age Smoke Quetelet Vitamin Calories   Fat Fiber Alcohol Cholesterol
##    <dbl> <dbl> <chr>    <dbl>   <dbl>    <dbl> <dbl> <dbl>   <dbl>       <dbl>
##  1     1    64 No        21.5       1    1299.    57   6.3       0        170.
##  2     2    76 No        23.9       1    1032.  50.1  15.8       0        75.8
##  3     3    38 No        20.0       2    2372.  83.6  19.1    14.1        258.
##  4     4    40 No        25.1       3    2450.  97.5  26.5     0.5        333.
##  5     5    72 No        21.0       1    1952.  82.6  16.2       0        171.
##  6     6    40 No        27.5       3    1367.    56   9.6     1.3        155.
##  7     7    65 No        22.0       2    2214.    52  28.7       0        255.
##  8     8    58 No        28.8       1    1596.  63.4  10.9       0        214.
##  9     9    35 No        23.1       3    1800.  57.8  20.3     0.6        234.
## 10    10    55 No        35.0       3    1264.  39.6  15.5       0        172.
## # ℹ 305 more rows
## # ℹ 7 more variables: BetaDiet <dbl>, RetinolDiet <dbl>, BetaPlasma <dbl>,
## # RetinolPlasma <dbl>, Sex <chr>, VitaminUse <chr>, PriorSmoke <dbl>
# R was initially reading each row as a single string, so I set the field separator explicitly; after that the data were read correctly.
NutritionStudy <- read.csv("NutritionStudy.csv", sep = ",")
# Checking that the columns were read in correctly
names(NutritionStudy)
## [1] "ID" "Age" "Smoke" "Quetelet"
## [5] "Vitamin" "Calories" "Fat" "Fiber"
## [9] "Alcohol" "Cholesterol" "BetaDiet" "RetinolDiet"
## [13] "BetaPlasma" "RetinolPlasma" "Sex" "VitaminUse"
## [17] "PriorSmoke"
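The separator problem mentioned in the comment above can be reproduced on a small file. The sketch below is only an illustration: the three-row file and the names `tmp`, `wrong` and `right` are hypothetical and have nothing to do with the contents of NutritionStudy.csv.

```r
# Hypothetical file whose fields are separated by ";" instead of ","
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID;Sex;VitaminUse",
             "1;Female;Regular",
             "2;Male;No"), tmp)

# Wrong separator: every row is read as one long string in a single column
wrong <- read.csv(tmp, sep = ",")

# Matching separator: the fields are split into separate columns
right <- read.csv(tmp, sep = ";")

ncol(wrong)    # 1
ncol(right)    # 3
names(right)   # "ID" "Sex" "VitaminUse"

unlink(tmp)    # remove the temporary file
```

A quick ncol() or names() call after reading, as done above with names(NutritionStudy), is usually enough to confirm that the separator matches the file.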
table(NutritionStudy$Sex, NutritionStudy$VitaminUse)
##         
##          No Occasional Regular
##   Female 87         77     109
##   Male   24          5      13
# Creating the contingency table and running the chi-squared test of independence
contingency_table <- table(NutritionStudy$Sex, NutritionStudy$VitaminUse)
print(contingency_table)
##         
##          No Occasional Regular
##   Female 87         77     109
##   Male   24          5      13
p2_test <- chisq.test(contingency_table)
p2_test
##
## Pearson's Chi-squared test
##
## data: contingency_table
## X-squared = 11.071, df = 2, p-value = 0.003944
Since the p-value (0.003944) is less than 0.05, we reject the null hypothesis. There is a significant association between gender and vitamin use.
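To see where the association comes from, and to check that the chi-squared approximation is reasonable, the expected counts can be rebuilt from the table margins. This is a sketch that re-enters the counts printed above; the names `vitamin_by_sex`, `expected_counts` and `pearson_resid` are assumptions made for this illustration.

```r
# Counts copied from the contingency table above
vitamin_by_sex <- matrix(
  c(87, 77, 109,
    24,  5,  13),
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("Female", "Male"),
                  VitaminUse = c("No", "Occasional", "Regular"))
)

# Expected counts under independence: row total * column total / grand total
expected_counts <- outer(rowSums(vitamin_by_sex),
                         colSums(vitamin_by_sex)) / sum(vitamin_by_sex)
round(expected_counts, 2)

# Common rule of thumb: every expected count should be at least 5
all(expected_counts >= 5)

# Pearson residuals: (observed - expected) / sqrt(expected)
# Cells with large absolute residuals contribute most to the statistic
pearson_resid <- (vitamin_by_sex - expected_counts) / sqrt(expected_counts)
round(pearson_resid, 2)

# The squared residuals add up to the test statistic
sum(pearson_resid^2)   # about 11.071
```

The same quantities are also stored in the test object as p2_test$expected and p2_test$residuals.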
*H(0): Mu(low) = Mu(medium) = Mu(high) (mean gill rates are the same across all calcium levels).
*H(A): At least one mean gill rate is different.
# Data from FishGills3
low <- c(55, 63, 78, 85, 65, 98, 68, 84, 44, 87, 48, 86, 93, 64, 83, 79, 85, 65, 88, 47, 68, 86, 57, 53, 58, 47, 62, 64, 50, 45)
medium <- c(38, 42, 63, 46, 55, 63, 36, 58, 73, 69, 55, 68, 63, 73, 45, 79, 41, 83, 60, 48, 59, 33, 67, 43, 57, 72, 46, 74, 68, 83)
high <- c(59, 45, 63, 52, 59, 78, 72, 53, 69, 68, 57, 63, 68, 83, 38, 85, 68, 63, 58, 48, 42, 42, 80, 42, 52, 37, 57, 62, 40, 42)
# Creating data frame
fish_data <- data.frame(
  GillRate = c(low, medium, high),
  Calcium = factor(rep(c("Low", "Medium", "High"), each = 30))
)
# Performing ANOVA
anova_results <- aov(GillRate ~ Calcium, data = fish_data)
summary(anova_results)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value from the ANOVA table is 0.0121. Since this is less than 0.05, we reject the null hypothesis. There is significant evidence that the mean gill rate differs for at least one calcium level.
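The F value and p-value can be recovered from the sums of squares and degrees of freedom in the ANOVA table. The sketch below uses the rounded values as printed, so small rounding differences are possible; the variable names are introduced here for illustration only.

```r
# Values read from the ANOVA table above (rounded as printed)
ss_between <- 2037    # Sum Sq for Calcium
df_between <- 2       # number of groups - 1
ss_within  <- 19064   # Sum Sq for Residuals
df_within  <- 87      # total observations - number of groups

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic: between-group variation relative to within-group variation
f_stat <- ms_between / ms_within

# Upper-tail p-value from the F distribution
p_value_f <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

f_stat      # about 4.648
p_value_f   # about 0.0121
```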