Part 1: Chi-Square Test for Independence

Objective

To test, at the 5% significance level, whether Gender and Admission Status are associated in the UC Berkeley admissions data. The null hypothesis is that the two variables are independent.

# Load CSV file (update path as necessary)

ucb_df <- read.csv("C:/Users/User/Documents/Kaggle/UCBAdmissions.csv")

# Display first few rows

head(ucb_df)
##   X    Admit Gender Dept Freq
## 1 1 Admitted   Male    A  512
## 2 2 Rejected   Male    A  313
## 3 3 Admitted Female    A   89
## 4 4 Rejected Female    A   19
## 5 5 Admitted   Male    B  353
## 6 6 Rejected   Male    B  207
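The file path above is specific to one machine. As a hedged alternative, the same counts appear to be available in R's built-in `UCBAdmissions` table (from the `datasets` package), which converts to a data frame with the same `Admit`, `Gender`, `Dept`, and `Freq` columns. The name `ucb_builtin` is introduced here for illustration only, and the sketch assumes the CSV is an export of that dataset.

```r
# Sketch: build an equivalent data frame from R's built-in UCBAdmissions table.
# Assumption: the CSV was exported from this dataset; the X row-index column
# written by the export is not present here.
ucb_builtin <- as.data.frame(UCBAdmissions)

# Inspect the first rows; the layout matches the CSV minus the X column
head(ucb_builtin)

# Total number of applicants across all departments
sum(ucb_builtin$Freq)
```

If the first rows and the total match the CSV, `ucb_builtin` can stand in for `ucb_df` in the steps that follow without relying on a local file.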
# Create contingency table (Gender vs Admit)

table_ucb <- xtabs(Freq ~ Gender + Admit, data = ucb_df)
table_ucb
##         Admit
## Gender   Admitted Rejected
##   Female      557     1278
##   Male       1198     1493
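Raw counts are hard to compare because the two groups differ in size, so it can help to convert each row to proportions before testing. The sketch below re-enters the counts printed above by hand; `observed` and `admit_rates` are illustrative names, not objects from the original script.

```r
# Counts copied from the contingency table printed above
# (matrix() fills column by column: Admitted first, then Rejected)
observed <- matrix(
  c(557, 1198, 1278, 1493),
  nrow = 2,
  dimnames = list(Gender = c("Female", "Male"),
                  Admit  = c("Admitted", "Rejected"))
)

# Row-wise proportions: share admitted and rejected within each gender
admit_rates <- prop.table(observed, margin = 1)
round(admit_rates, 3)
```

With these counts, roughly 30% of female applicants and 45% of male applicants were admitted, which is the difference the test evaluates.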
# Perform Chi-Square Test for Independence

chi_test_ucb <- chisq.test(table_ucb)

# Display results

chi_test_ucb
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table_ucb
## X-squared = 91.61, df = 1, p-value < 2.2e-16
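To make the reported statistic less of a black box, the following sketch recomputes it by hand. Under independence, each expected count is (row total × column total) / grand total, and for a 2×2 table `chisq.test()` applies Yates' continuity correction by default, subtracting 0.5 from each absolute deviation. The counts are re-entered from the table above; `observed`, `expected`, and `yates_stat` are illustrative names.

```r
# Counts copied from the Gender x Admit contingency table
observed <- matrix(
  c(557, 1198, 1278, 1493),
  nrow = 2,
  dimnames = list(Gender = c("Female", "Male"),
                  Admit  = c("Admitted", "Rejected"))
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Yates-corrected Pearson statistic: shrink each |O - E| by 0.5 before squaring
yates_stat <- sum((abs(observed - expected) - 0.5)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(yates_stat, df = df, lower.tail = FALSE)

round(yates_stat, 2)
```

The rounded value should agree with the `X-squared = 91.61` reported by `chisq.test()`. Passing `correct = FALSE` to `chisq.test()` would give the uncorrected statistic instead.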
if (chi_test_ucb$p.value < 0.05) {
  cat("There is a statistically significant association between Gender and Admission status.")
} else {
  cat("There is no statistically significant association between Gender and Admission status.")
}
## There is a statistically significant association between Gender and Admission status.
Part 2: Chi-Square Goodness-of-Fit Test
Dataset: Hair and Eye Color (CSV file)

We now test whether eye color follows a uniform distribution, that is, whether every eye color category is equally likely, using counts loaded from a CSV file.
# Load CSV file (update path as necessary)

hair_df <- read.csv("C:/Users/User/Documents/Kaggle/HairEyeColor.csv")

# Display first few rows

head(hair_df)
##   X  Hair   Eye  Sex Freq
## 1 1 Black Brown Male   32
## 2 2 Brown Brown Male   53
## 3 3   Red Brown Male   10
## 4 4 Blond Brown Male    3
## 5 5 Black  Blue Male   11
## 6 6 Brown  Blue Male   50
# Create frequency table for Eye Color

eye_freq <- aggregate(Freq ~ Eye, data = hair_df, sum)
eye_freq
##     Eye Freq
## 1  Blue  215
## 2 Brown  220
## 3 Green   64
## 4 Hazel   93
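As a cross-check that does not depend on a local file, the same eye color totals can apparently be obtained from R's built-in `HairEyeColor` table by summing over hair color and sex. This is a sketch that assumes the CSV is an export of that dataset; `eye_counts` is an illustrative name.

```r
# HairEyeColor is a 3-way table with dimensions Hair x Eye x Sex.
# Summing over Hair and Sex (keeping margin 2) leaves the eye color totals.
eye_counts <- margin.table(HairEyeColor, margin = 2)
eye_counts

# A one-dimensional table can be passed to chisq.test() directly;
# with no p argument, equal probabilities are assumed for every category.
chisq.test(eye_counts)
```

The category order differs from the `aggregate()` output (which sorts alphabetically), but the goodness-of-fit statistic does not depend on the order.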
# Perform Chi-Square Goodness-of-Fit Test

chi_test_eye <- chisq.test(eye_freq$Freq, p = rep(1/nrow(eye_freq), nrow(eye_freq)))

# Display results

chi_test_eye
## 
##  Chi-squared test for given probabilities
## 
## data:  eye_freq$Freq
## X-squared = 133.47, df = 3, p-value < 2.2e-16
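The goodness-of-fit statistic can also be reproduced by hand, which shows where the value comes from. Under a uniform null, every category has the same expected count (total / number of categories), and the statistic is the sum of (observed − expected)² / expected. The counts are re-entered from the frequency table above; the variable names are illustrative.

```r
# Eye color counts copied from the frequency table above
observed <- c(Blue = 215, Brown = 220, Green = 64, Hazel = 93)

# Uniform null: every category gets the same expected count (592 / 4 = 148)
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson goodness-of-fit statistic
gof_stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus one
gof_df <- length(observed) - 1

# Upper-tail p-value from the chi-squared distribution
gof_p <- pchisq(gof_stat, df = gof_df, lower.tail = FALSE)

round(gof_stat, 2)
```

The result should match the `X-squared = 133.47` with `df = 3` reported by `chisq.test()`.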
if (chi_test_eye$p.value < 0.05) {
  cat("The distribution of eye color is significantly different from a uniform distribution.")
} else {
  cat("There is no significant evidence that eye color departs from a uniform distribution.")
}
## The distribution of eye color is significantly different from a uniform distribution.