# Published link: https://rpubs.com/Haileab/1397612

# Step 1: Install the Required Packages
# The following packages support this Chi-Square Test of Independence workflow in R. The test itself, chisq.test(), is part of base R (the stats package) and needs no installation.
# readxl: Import Excel datasets
# ggplot2: Create bar charts
# rcompanion: Calculate effect size (Cramér's V)

# Install once per computer: remove the leading # from each line below and run it.
#install.packages("readxl")
#install.packages("ggplot2")
#install.packages("rcompanion")

# Step 2: Open the Required Packages
# Packages must be loaded every time you open a new R session.

library(readxl)
library(ggplot2)
library(rcompanion)

# Step 3: Import & Name Dataset
# This code imports the Excel dataset and stores it in R as a data frame (tibble) named Student.

# Change the file path to wherever DatasetB2.xlsx is saved on your computer.
Student <- read_excel("/Users/ha113ab/Desktop/datasets/DatasetB2.xlsx")
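# Optional check after importing (a sketch, not part of the original steps).
# read_excel() returns text columns as character vectors. Converting them to
# factors fixes the category order that table() and ggplot() use. Note that
# table() silently drops missing values, so count NAs first. student_demo is a
# hypothetical stand-in with the two columns this script uses. With the real
# data, apply the same factor() calls to Student.

student_demo <- data.frame(
  StudentType  = c("International", "Domestic", "Domestic"),
  PetOwnership = c("Yes", "No", NA)
)

# Stop early if a column the later steps rely on is missing
stopifnot(all(c("StudentType", "PetOwnership") %in% names(student_demo)))

# Number of missing values per column (these rows are excluded by table())
colSums(is.na(student_demo[, c("StudentType", "PetOwnership")]))

# Set the category order explicitly instead of relying on alphabetical order
student_demo$StudentType  <- factor(student_demo$StudentType,
                                    levels = c("Domestic", "International"))
student_demo$PetOwnership <- factor(student_demo$PetOwnership,
                                    levels = c("No", "Yes"))
levels(student_demo$StudentType)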

# Step 4: Create a Contingency Table
# A contingency table cross-tabulates the two categorical variables. It shows how many students fall into each combination of student type and pet ownership.

tab <- table(Student$StudentType, Student$PetOwnership)
tab
##                
##                 No Yes
##   Domestic      27  25
##   International 23  25
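# The groups differ in size (52 domestic vs 48 international students), so
# proportions within each student type can be easier to compare than raw
# counts. This is a minimal base-R sketch. tab_demo is rebuilt by hand from the
# counts printed above. When running this script in order, tab can be used
# instead.

tab_demo <- as.table(matrix(
  c(27, 23, 25, 25), nrow = 2,
  dimnames = list(StudentType  = c("Domestic", "International"),
                  PetOwnership = c("No", "Yes"))
))

addmargins(tab_demo)                        # counts with row and column totals
round(prop.table(tab_demo, margin = 1), 3)  # proportions within each student type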

# Step 5: Create Bar Charts
# Bar charts help visualize the distribution of pet ownership across student types.

ggplot(Student, aes(x = StudentType, fill = PetOwnership)) +
  geom_bar(position = "dodge") +
  labs(
    x = "Student Type",
    y = "Frequency",
    title = "Pet Ownership by Student Type"
  ) +
  theme(
    text = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 14),
    plot.title = element_text(size = 14),
    legend.position = "right"  # keep the legend: bar colour is the only cue for No vs Yes
  )

# [Clustered bar chart: Domestic students have 27 No and 25 Yes; International students have 23 No and 25 Yes]
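# An alternative view (a sketch, not the author's chart): position = "fill"
# scales every bar to a height of 1. The y-axis then shows the share of pet
# owners within each student type, which removes the effect of unequal group
# sizes. student_long is a hypothetical data frame rebuilt from the counts in
# the contingency table. With the real data, pass Student instead.

library(ggplot2)

counts <- c(27, 25, 23, 25)  # Domestic No, Domestic Yes, International No, International Yes
student_long <- data.frame(
  StudentType  = rep(c("Domestic", "Domestic", "International", "International"), times = counts),
  PetOwnership = rep(c("No", "Yes", "No", "Yes"), times = counts)
)

p_fill <- ggplot(student_long, aes(x = StudentType, fill = PetOwnership)) +
  geom_bar(position = "fill") +
  labs(x = "Student Type", y = "Proportion", fill = "Pet Ownership",
       title = "Pet Ownership by Student Type (proportions)")
p_fill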

# Step 6: Conduct the Chi-Square Test of Independence
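# This test evaluates whether student type and pet ownership are associated, that is, whether we can reject the null hypothesis that the two variables are independent. For a 2x2 table, chisq.test() applies Yates' continuity correction by default.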

chisq.test(tab)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tab
## X-squared = 0.040064, df = 1, p-value = 0.8414
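# What chisq.test() computes for a 2x2 table (a sketch to show the formula,
# not a replacement for the function):
#   - Expected counts are E = row total x column total / n.
#   - With Yates' correction, X-squared = sum((|O - E| - 0.5)^2 / E).
#   - Degrees of freedom are df = (rows - 1)(cols - 1).
# R caps the 0.5 at the smallest |O - E|, which makes no difference here. A
# common rule of thumb is that expected counts should be at least 5 for the
# chi-square approximation. tab_demo is rebuilt from the Step 4 counts.

tab_demo <- as.table(matrix(
  c(27, 23, 25, 25), nrow = 2,
  dimnames = list(StudentType  = c("Domestic", "International"),
                  PetOwnership = c("No", "Yes"))
))

n <- sum(tab_demo)
expected <- outer(rowSums(tab_demo), colSums(tab_demo)) / n
expected  # check that no cell falls below 5

yates <- min(0.5, abs(tab_demo - expected))  # continuity correction, as in R
x_sq  <- sum((abs(tab_demo - expected) - yates)^2 / expected)
dof   <- (nrow(tab_demo) - 1) * (ncol(tab_demo) - 1)
p_val <- pchisq(x_sq, df = dof, lower.tail = FALSE)  # upper-tail probability

c(X_squared = x_sq, df = dof, p_value = p_val)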

# Step 7: Cramér's V (Effect Size)
# Report the effect size only if the p-value was statistically significant.

cramerV(tab)
## Cramer V 
##  0.04003
# Note: p-value = 0.841 > 0.05, so the effect size is not reported because the result is not significant.
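# Cramér's V rescales the chi-square statistic to the 0-1 range:
#   V = sqrt(X-squared / (n * (min(rows, cols) - 1)))
# This sketch uses the uncorrected statistic (correct = FALSE). With the Step 4
# counts it reproduces the 0.04003 printed by cramerV() above, which suggests
# rcompanion also omits the Yates correction by default. tab_demo is rebuilt
# from the Step 4 counts.

tab_demo <- as.table(matrix(
  c(27, 23, 25, 25), nrow = 2,
  dimnames = list(StudentType  = c("Domestic", "International"),
                  PetOwnership = c("No", "Yes"))
))

x_sq_raw <- unname(chisq.test(tab_demo, correct = FALSE)$statistic)  # no Yates correction
k        <- min(dim(tab_demo)) - 1                                    # 1 for a 2x2 table
cramer_v <- sqrt(x_sq_raw / (sum(tab_demo) * k))
round(cramer_v, 5)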

# Step 8: Interpret and Report the Results
# The Chi-Square Test of Independence indicated there was/was not a significant association between student type and pet ownership, χ²(df) = xx.xx, p = .xxx. The association between the two variables was weak/moderate/strong (Cramer's V = .xx).

# The Chi-Square Test of Independence indicated there was not a significant association between student type and pet ownership, χ²(1) = 0.04, p = .841. Since the result was not statistically significant (p > .05), effect size is not reported.
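# Filling in the template by hand invites copy errors, such as numbers taken
# from a different run. A minimal sketch that formats the statistics straight
# from the chisq.test() result. The formatting choices are assumptions:
#   - chi-square to two decimals;
#   - p to three decimals with no leading zero.
# Values of p < .001 are not handled. tab_demo is rebuilt from the Step 4
# counts.

tab_demo <- as.table(matrix(
  c(27, 23, 25, 25), nrow = 2,
  dimnames = list(StudentType  = c("Domestic", "International"),
                  PetOwnership = c("No", "Yes"))
))

res    <- chisq.test(tab_demo)
p_txt  <- sub("^0", "", sprintf("%.3f", res$p.value))  # "0.841" -> ".841"
report <- sprintf("\u03c7\u00b2(%d) = %.2f, p = %s",
                  as.integer(res$parameter), unname(res$statistic), p_txt)
report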