CHI SQUARED AND ANOVA TEST ASSIGNMENT

Problem 1: ACTN3 Genotype - Goodness of Fit

Hypotheses

H(0): The alleles R and X are equally likely (P(R) = 0.5, P(X) = 0.5).

H(A): The alleles R and X are not equally likely (P(R) ≠ 0.5, and therefore P(X) ≠ 0.5).

# Data
observed <- c(R = 244, X = 192)
expected_probs <- c(0.5, 0.5)

# Chi-square Goodness-of-Fit Test
p1_test <- chisq.test(observed, p = expected_probs)
p1_test
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 6.2018, df = 1, p-value = 0.01276

Conclusion for Problem 1

Since the p-value (0.01276) is less than 0.05, we reject the null hypothesis. There is sufficient evidence that the R and X alleles are not equally likely.
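
As a check on the chisq.test() output above, the statistic can be reproduced by hand. This is only a sketch using base R; the names expected, chi_sq, df and p_value are illustrative and are not part of the original assignment code.

```r
# Observed allele counts from Problem 1
observed <- c(R = 244, X = 192)

# Expected counts under H(0): each allele has probability 0.5
expected <- sum(observed) * c(R = 0.5, X = 0.5)

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with df = number of categories - 1
df <- length(observed) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(chi_sq, 4)
round(p_value, 5)
```

Each allele has an expected count of 218, so each category contributes 26^2 / 218 to the statistic, which should agree with the X-squared value reported by chisq.test().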

Problem 2: Nutrition Study - Association (Vitamin Use and Gender)

Hypotheses

H(0): There is no association between Vitamin Use and Gender (they are independent).

H(A): There is an association between Vitamin Use and Gender (they are not independent).

library(readr)
read_csv("NutritionStudy.csv")
## Rows: 315 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): Smoke, Sex, VitaminUse
## dbl (14): ID, Age, Quetelet, Vitamin, Calories, Fat, Fiber, Alcohol, Cholest...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 315 × 17
##       ID   Age Smoke Quetelet Vitamin Calories   Fat Fiber Alcohol Cholesterol
##    <dbl> <dbl> <chr>    <dbl>   <dbl>    <dbl> <dbl> <dbl>   <dbl>       <dbl>
##  1     1    64 No        21.5       1    1299.  57     6.3     0         170. 
##  2     2    76 No        23.9       1    1032.  50.1  15.8     0          75.8
##  3     3    38 No        20.0       2    2372.  83.6  19.1    14.1       258. 
##  4     4    40 No        25.1       3    2450.  97.5  26.5     0.5       333. 
##  5     5    72 No        21.0       1    1952.  82.6  16.2     0         171. 
##  6     6    40 No        27.5       3    1367.  56     9.6     1.3       155. 
##  7     7    65 No        22.0       2    2214.  52    28.7     0         255. 
##  8     8    58 No        28.8       1    1596.  63.4  10.9     0         214. 
##  9     9    35 No        23.1       3    1800.  57.8  20.3     0.6       234. 
## 10    10    55 No        35.0       3    1264.  39.6  15.5     0         172. 
## # ℹ 305 more rows
## # ℹ 7 more variables: BetaDiet <dbl>, RetinolDiet <dbl>, BetaPlasma <dbl>,
## #   RetinolPlasma <dbl>, Sex <chr>, VitaminUse <chr>, PriorSmoke <dbl>
# R was initially reading each whole row as a single string, so I set the separator explicitly in read.csv() in order to read the columns correctly.
NutritionStudy <- read.csv("NutritionStudy.csv", sep = ",")

# Check that the columns were read in correctly
names(NutritionStudy)
##  [1] "ID"            "Age"           "Smoke"         "Quetelet"     
##  [5] "Vitamin"       "Calories"      "Fat"           "Fiber"        
##  [9] "Alcohol"       "Cholesterol"   "BetaDiet"      "RetinolDiet"  
## [13] "BetaPlasma"    "RetinolPlasma" "Sex"           "VitaminUse"   
## [17] "PriorSmoke"
table(NutritionStudy$Sex, NutritionStudy$VitaminUse)
##         
##           No Occasional Regular
##   Female  87         77     109
##   Male    24          5      13
# Create the contingency table and run the chi-squared test of independence
contingency_table <- table(NutritionStudy$Sex, NutritionStudy$VitaminUse)
print(contingency_table)
##         
##           No Occasional Regular
##   Female  87         77     109
##   Male    24          5      13
chisq.test(NutritionStudy$Sex, NutritionStudy$VitaminUse)
## 
##  Pearson's Chi-squared test
## 
## data:  NutritionStudy$Sex and NutritionStudy$VitaminUse
## X-squared = 11.071, df = 2, p-value = 0.003944
p2_test <- chisq.test(contingency_table)
p2_test
## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 11.071, df = 2, p-value = 0.003944

Conclusion for Problem 2

Since the p-value (0.003944) is less than 0.05, we reject the null hypothesis. There is sufficient evidence of an association between gender and vitamin use.
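
The chi-squared approximation is usually considered reliable when every expected cell count is at least 5. The sketch below rebuilds the contingency table from the counts printed above (typed in by hand, so it does not need the CSV file) and computes the expected counts and the statistic directly. The object names counts, expected, chi_sq, df and p_value are illustrative assumptions, not part of the original code.

```r
# Contingency table copied from the output above
counts <- matrix(
  c(87, 77, 109,
    24,  5,  13),
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("Female", "Male"),
                  VitaminUse = c("No", "Occasional", "Regular"))
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
round(expected, 2)

# Chi-squared statistic, degrees of freedom and upper-tail p-value by hand
chi_sq <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(c(X_squared = chi_sq, df = df, p = p_value), 4)
```

The smallest expected count is for males who use vitamins occasionally (42 * 82 / 315, about 10.9), so although only 5 were observed in that cell, the expected-count condition appears to be met.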

Problem 3: Fish Gill Rates - ANOVA

Hypotheses

H(0): Mu(low) = Mu(medium) = Mu(high) (Mean gill rates are the same across all calcium levels).

H(A): At least one mean gill rate is different.

# Data from FishGills3 
low <- c(55, 63, 78, 85, 65, 98, 68, 84, 44, 87, 48, 86, 93, 64, 83, 79, 85, 65, 88, 47, 68, 86, 57, 53, 58, 47, 62, 64, 50, 45)
medium <- c(38, 42, 63, 46, 55, 63, 36, 58, 73, 69, 55, 68, 63, 73, 45, 79, 41, 83, 60, 48, 59, 33, 67, 43, 57, 72, 46, 74, 68, 83)
high <- c(59, 45, 63, 52, 59, 78, 72, 53, 69, 68, 57, 63, 68, 83, 38, 85, 68, 63, 58, 48, 42, 42, 80, 42, 52, 37, 57, 62, 40, 42)

# Creating data frame
fish_data <- data.frame(
  GillRate = c(low, medium, high),
  Calcium = factor(rep(c("Low", "Medium", "High"), each = 30))
)

# Performing ANOVA
anova_results <- aov(GillRate ~ Calcium, data = fish_data)
summary(anova_results)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Calcium      2   2037  1018.6   4.648 0.0121 *
## Residuals   87  19064   219.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion for Problem 3

The p-value from the ANOVA table is 0.0121. Since this is less than 0.05, we reject the null hypothesis. There is sufficient evidence that the mean gill rate differs for at least one calcium level of the water.
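
The ANOVA only says that at least one mean differs, so it helps to look at the group means and at how the F statistic is built. The following is a rough sketch in base R that recomputes the sums of squares from the same three vectors; the names groups, group_means, ss_between, ss_within, f_value and p_value are illustrative and not from the original code.

```r
# Gill rate data copied from the FishGills3 vectors above
low <- c(55, 63, 78, 85, 65, 98, 68, 84, 44, 87, 48, 86, 93, 64, 83, 79, 85, 65, 88, 47, 68, 86, 57, 53, 58, 47, 62, 64, 50, 45)
medium <- c(38, 42, 63, 46, 55, 63, 36, 58, 73, 69, 55, 68, 63, 73, 45, 79, 41, 83, 60, 48, 59, 33, 67, 43, 57, 72, 46, 74, 68, 83)
high <- c(59, 45, 63, 52, 59, 78, 72, 53, 69, 68, 57, 63, 68, 83, 38, 85, 68, 63, 58, 48, 42, 42, 80, 42, 52, 37, 57, 62, 40, 42)

groups <- list(Low = low, Medium = medium, High = high)

# Group sizes, group means and the grand mean
n <- sapply(groups, length)
group_means <- sapply(groups, mean)
grand_mean <- mean(unlist(groups))

# Between-group sum of squares: n_i * (group mean - grand mean)^2
ss_between <- sum(n * (group_means - grand_mean)^2)

# Within-group (residual) sum of squares
ss_within <- sum(sapply(groups, function(x) sum((x - mean(x))^2)))

# Degrees of freedom, F statistic and upper-tail p-value
df_between <- length(groups) - 1
df_within <- sum(n) - length(groups)
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(group_means, 2)
round(c(F = f_value, p = p_value), 4)
```

The group means are 68.5 (Low), about 58.67 (Medium) and about 58.17 (High), which suggests the low-calcium group is the one that stands apart. A formal pairwise comparison, for example TukeyHSD() on the fitted aov object, would be needed to confirm which pairs differ.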