We first test whether there is a statistically significant association between Gender and Admission Status in the UC Berkeley admissions data, using a chi-square test of independence.
# Load CSV file (update path as necessary)
ucb_df <- read.csv("C:/Users/User/Documents/Kaggle/UCBAdmissions.csv")
# Display first few rows
head(ucb_df)
## X Admit Gender Dept Freq
## 1 1 Admitted Male A 512
## 2 2 Rejected Male A 313
## 3 3 Admitted Female A 89
## 4 4 Rejected Female A 19
## 5 5 Admitted Male B 353
## 6 6 Rejected Male B 207
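The `head()` output above matches the first rows of R's built-in `UCBAdmissions` table (package datasets) converted to a data frame. The CSV therefore appears to be an export of that table, and `X` is the row index that `write.csv()` adds. The sketch below assumes this is the case and rebuilds `ucb_df` without the local file path. The name `ucb_builtin` is illustrative only.

```r
# Assumption: the Kaggle CSV is an export of R's built-in UCBAdmissions table.
# as.data.frame() on a table gives one row per cell, with the count in 'Freq'.
ucb_builtin <- as.data.frame(UCBAdmissions)
head(ucb_builtin)

# Same Gender x Admit contingency table as the xtabs() call on the CSV;
# xtabs() sums Freq over the six departments.
ucb_table_builtin <- xtabs(Freq ~ Gender + Admit, data = ucb_builtin)
ucb_table_builtin
```

The counts are the same as in the CSV version. The rows may print in a different order (Male before Female) because the built-in factor levels are not alphabetical. Separately, passing `row.names = 1` to `read.csv()` would turn the `X` column back into row names instead of a data column.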
# Create contingency table (Gender vs Admit), summing Freq over all departments
table_ucb <- xtabs(Freq ~ Gender + Admit, data = ucb_df)
table_ucb
## Admit
## Gender Admitted Rejected
## Female 557 1278
## Male 1198 1493
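More men than women applied, so the raw counts are hard to compare directly. Row proportions turn each gender's counts into admission and rejection rates. The sketch below copies the counts printed above into a hypothetical `counts_ucb` table so it runs on its own. In the main workflow, `table_ucb` could be passed to `prop.table()` directly.

```r
# Gender x Admit counts copied from the xtabs() output above
counts_ucb <- as.table(matrix(
  c(557, 1278,    # Female: Admitted, Rejected
    1198, 1493),  # Male:   Admitted, Rejected
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Admit  = c("Admitted", "Rejected"))
))

# margin = 1: divide each cell by its row (Gender) total,
# so each row holds that gender's admission and rejection rates
admit_rates <- prop.table(counts_ucb, margin = 1)
round(admit_rates, 3)
```

About 30% of female applicants were admitted, compared with about 45% of male applicants. This is the direction of the association that the test below evaluates.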
# Perform Chi-Square Test for Independence
chi_test_ucb <- chisq.test(table_ucb)
# Display results
chi_test_ucb
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table_ucb
## X-squared = 91.61, df = 1, p-value < 2.2e-16
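Two details of this output are worth checking.

* The test relies on the expected counts under independence, (row total × column total) / grand total, and these are commonly required to be at least 5.
* For 2×2 tables, `chisq.test()` applies Yates' continuity correction by default (`correct = TRUE`), which is why the header mentions it.

The sketch below rebuilds the counts shown above as a hypothetical `counts_ucb` table. It verifies the expected counts by hand and compares the corrected and uncorrected statistics.

```r
counts_ucb <- as.table(matrix(
  c(557, 1278, 1198, 1493), nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Admit  = c("Admitted", "Rejected"))
))

chi_yates <- chisq.test(counts_ucb)                   # default for 2x2: correct = TRUE
chi_plain <- chisq.test(counts_ucb, correct = FALSE)  # without Yates' correction

# Expected counts under independence: (row total * column total) / grand total
expected_manual <- outer(rowSums(counts_ucb), colSums(counts_ucb)) / sum(counts_ucb)
all.equal(unname(expected_manual), unname(chi_yates$expected))

# Common rule of thumb: every expected count should be at least 5
min(chi_yates$expected)

# The correction shrinks the statistic slightly; with counts this large it barely matters
c(yates = unname(chi_yates$statistic), uncorrected = unname(chi_plain$statistic))
```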
if (chi_test_ucb$p.value < 0.05) {
  cat("There is a statistically significant association between Gender and Admission status.")
} else {
  cat("There is no statistically significant association between Gender and Admission status.")
}
## There is a statistically significant association between Gender and Admission status.
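With 4,526 applicants, even a weak association produces a tiny p-value, and the p-value alone does not measure how strong the association is. As an addition that is not part of the original analysis, Cramér's V, sqrt(X² / (n · (k − 1))) with k = min(rows, columns), gives a scale-free measure between 0 and 1. For a 2×2 table it equals the magnitude of the phi coefficient. This sketch uses the uncorrected statistic and rebuilds the counts shown above as a hypothetical `counts_ucb` table.

```r
counts_ucb <- as.table(matrix(
  c(557, 1278, 1198, 1493), nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Admit  = c("Admitted", "Rejected"))
))

chi_plain <- chisq.test(counts_ucb, correct = FALSE)
n <- sum(counts_ucb)        # total number of applicants
k <- min(dim(counts_ucb))   # smaller table dimension (2 here)

# Cramer's V; for a 2x2 table this equals |phi|
cramers_v <- sqrt(unname(chi_plain$statistic) / (n * (k - 1)))
round(cramers_v, 3)
```

The result is about 0.14, which common rules of thumb would read as a small association. The association is statistically clear but not large.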
Part 2: Chi-Square Goodness-of-Fit Test
Dataset: Hair and Eye Color (CSV file)
We now test whether eye colors in this dataset are uniformly distributed, that is, whether all four eye colors are equally common.
# Load CSV file (update path as necessary)
hair_df <- read.csv("C:/Users/User/Documents/Kaggle/HairEyeColor.csv")
# Display first few rows
head(hair_df)
## X Hair Eye Sex Freq
## 1 1 Black Brown Male 32
## 2 2 Brown Brown Male 53
## 3 3 Red Brown Male 10
## 4 4 Blond Brown Male 3
## 5 5 Black Blue Male 11
## 6 6 Brown Blue Male 50
# Create frequency table for Eye Color
eye_freq <- aggregate(Freq ~ Eye, data = hair_df, sum)
eye_freq
## Eye Freq
## 1 Blue 215
## 2 Brown 220
## 3 Green 64
## 4 Hazel 93
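As with the admissions data, the rows shown by `head()` match R's built-in `HairEyeColor` table, whose dimensions are Hair, Eye and Sex. Assuming the CSV is an export of that table, the eye-color totals can be cross-checked by summing it over Hair and Sex. `aggregate()` sorts the colors alphabetically, while the built-in table keeps its own level order (Brown, Blue, Hazel, Green). The order makes no difference under a uniform null. Any non-uniform `p` vector, however, must be listed in the same order as the counts.

```r
# Assumption: the CSV is an export of R's built-in HairEyeColor table (Hair x Eye x Sex).
# margin.table() with margin = 2 sums over Hair and Sex, keeping the Eye dimension.
eye_builtin <- margin.table(HairEyeColor, 2)
eye_builtin

# Reorder alphabetically to line up with aggregate()'s output (Blue, Brown, Green, Hazel)
eye_builtin[sort(names(eye_builtin))]
```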
# Perform Chi-Square Goodness-of-Fit Test
# p = equal probability (1/4) for each eye color; this is also chisq.test()'s default
chi_test_eye <- chisq.test(eye_freq$Freq, p = rep(1/nrow(eye_freq), nrow(eye_freq)))
# Display results
chi_test_eye
##
## Chi-squared test for given probabilities
##
## data: eye_freq$Freq
## X-squared = 133.47, df = 3, p-value < 2.2e-16
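Under the uniform null each color has probability 1/4. With 592 people, every expected count is therefore 592 / 4 = 148. The statistic is the sum of (observed − expected)² / expected over the four colors, with 4 − 1 = 3 degrees of freedom. The sketch below reproduces the output above by hand from the `eye_freq` counts. The variable names are illustrative.

```r
observed <- c(Blue = 215, Brown = 220, Green = 64, Hazel = 93)       # from eye_freq above
expected <- rep(sum(observed) / length(observed), length(observed))  # 148 each

# Pearson goodness-of-fit statistic
x_sq <- sum((observed - expected)^2 / expected)
dof  <- length(observed) - 1

# Upper-tail probability of the chi-squared distribution with dof degrees of freedom
p_value <- pchisq(x_sq, df = dof, lower.tail = FALSE)
c(x_sq = x_sq, dof = dof, p_value = p_value)
```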
if (chi_test_eye$p.value < 0.05) {
  cat("The distribution of eye color is significantly different from a uniform distribution.")
} else {
  cat("There is no significant evidence that the distribution of eye color differs from a uniform distribution.")
}
## The distribution of eye color is significantly different from a uniform distribution.
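The test only says that the distribution is not uniform. To see which colors drive the result, look at the `residuals` component that `chisq.test()` returns. It holds the Pearson residuals, (observed − expected) / sqrt(expected). Positive values mark colors seen more often than a uniform split predicts. Here is a short sketch using the counts shown above (`gof` is an illustrative name):

```r
observed <- c(Blue = 215, Brown = 220, Green = 64, Hazel = 93)
gof <- chisq.test(observed)   # default p: equal probabilities

# Pearson residuals; their squares sum to the X-squared statistic
round(gof$residuals, 2)
```

Blue and Brown are over-represented relative to the 148 expected for each color, and Green and Hazel are under-represented.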