Chi-Square Test of Independence
The chi-square test of independence is used to analyze the frequency table (i.e., contingency table) formed by two categorical variables. It evaluates whether there is a statistically significant association between the categories of the two variables.
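In symbols, the test compares each observed count O_{ij} with the count E_{ij} expected if the two variables were independent. The notation here is chosen for this summary: R_i and C_j are the row and column totals, N is the grand total, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

Under H0 the statistic approximately follows a chi-square distribution with df degrees of freedom, so large values count as evidence against independence.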
Graphical display of contingency tables
A contingency table can be visualized using the function balloonplot() [in the gplots package]. This function draws a graphical matrix in which each cell contains a dot whose size reflects the relative magnitude of the corresponding count.
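A minimal sketch of such a plot for the transport-by-hobby counts used in this section, assuming the gplots package is installed; the requireNamespace() guard simply skips the plot if it is not. The dimension names Transport and Hobby and the plot title are illustrative choices, not part of the original example.

```r
# Observed counts: mode of transport (rows) by hobby (columns)
observed <- matrix(c(30, 50, 20, 40, 20, 10, 30, 30, 10), nrow = 3, byrow = TRUE,
                   dimnames = list(Transport = c("Car", "Bicycle", "Walk"),
                                   Hobby = c("Reading", "Gaming", "Socializing")))

# balloonplot() has a method for "table" objects, so convert the matrix first
dt <- as.table(observed)

# gplots is optional here: draw the balloon plot only when it is available
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::balloonplot(dt, main = "Hobby by mode of transport",
                      xlab = "", ylab = "", label = FALSE, show.margins = FALSE)
}
```

Larger dots mark cells with larger counts, which makes the Car/Gaming cell (50) stand out at a glance.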
# Create the observed data matrix: mode of transport (rows) by hobby (columns)
observed <- matrix(c(30, 50, 20, 40, 20, 10, 30, 30, 10), nrow = 3, byrow = TRUE)
rownames(observed) <- c("Car", "Bicycle", "Walk")
colnames(observed) <- c("Reading", "Gaming", "Socializing")
contigency_table <- observed
contigency_table
##         Reading Gaming Socializing
## Car          30     50          20
## Bicycle      40     20          10
## Walk         30     30          10
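In practice a contingency table is usually built from raw data with one row per observation. The sketch below is illustrative only: it expands the counts above into a hypothetical per-respondent data frame (survey) and cross-tabulates the two categorical variables with table(), recovering the same matrix.

```r
observed <- matrix(c(30, 50, 20, 40, 20, 10, 30, 30, 10), nrow = 3, byrow = TRUE)
rownames(observed) <- c("Car", "Bicycle", "Walk")
colnames(observed) <- c("Reading", "Gaming", "Socializing")

# All (hobby, transport) combinations; hobby varies fastest, which matches
# the row-major (byrow = TRUE) order of the counts
cells <- expand.grid(hobby = colnames(observed), transport = rownames(observed),
                     stringsAsFactors = FALSE)
counts <- as.vector(t(observed))  # 30 50 20 40 20 10 30 30 10

# Hypothetical raw data: repeat each combination as many times as its count
survey <- data.frame(
  transport = factor(rep(cells$transport, counts), levels = rownames(observed)),
  hobby     = factor(rep(cells$hobby, counts), levels = colnames(observed))
)

# Cross-tabulate the two categorical variables into a contingency table
tab <- table(survey$transport, survey$hobby)
print(tab)
```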
Hypothesis formulation
H0: hobbies and mode of transport are independent
H1: hobbies and mode of transport are dependent (associated)
Level of significance
alpha <- 0.05
# Calculate row totals and column totals
row_totals <- rowSums(observed)
col_totals <- colSums(observed)
total_obs <- sum(observed)
# Calculate expected frequencies
expected <- outer(row_totals, col_totals) / total_obs
expected
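As a sanity check, here is a short sketch that compares the hand-computed expected frequencies with the $expected component returned by chisq.test() and tests the common rule of thumb that every expected count should be at least 5 for the chi-square approximation to be reliable. That threshold is a guideline, not a strict requirement; chisq.test() warns that the approximation may be incorrect when it is violated.

```r
observed <- matrix(c(30, 50, 20, 40, 20, 10, 30, 30, 10), nrow = 3, byrow = TRUE)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# chisq.test() computes the same matrix and stores it in $expected
expected_builtin <- chisq.test(observed)$expected
print(all.equal(expected, expected_builtin, check.attributes = FALSE))

# Rule of thumb: all expected counts >= 5
print(min(expected))
print(all(expected >= 5))
```

Here the smallest expected count is about 11.67 (Bicycle or Walk by Socializing), so the rule of thumb is comfortably met.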
##          Reading   Gaming Socializing
## Car     41.66667 41.66667    16.66667
## Bicycle 29.16667 29.16667    11.66667
## Walk    29.16667 29.16667    11.66667
# Compute the chi-square statistic
chi2_stat <- sum((observed - expected)^2 / expected)
chi2_stat
## [1] 13.02857
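Looking at the individual terms of the sum shows which cells drive the statistic. A minimal sketch; the name contrib is illustrative:

```r
observed <- matrix(c(30, 50, 20, 40, 20, 10, 30, 30, 10), nrow = 3, byrow = TRUE,
                   dimnames = list(c("Car", "Bicycle", "Walk"),
                                   c("Reading", "Gaming", "Socializing")))
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Each cell's share of the statistic: (O - E)^2 / E
contrib <- (observed - expected)^2 / expected
print(round(contrib, 3))

# Summing the cell contributions gives back the chi-square statistic
print(sum(contrib))
```

The Reading cells for Bicycle (about 4.02) and Car (about 3.27) contribute the most, so the largest departures from independence are in who reads.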
# Degrees of freedom
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
dof
## [1] 4
# Print results
cat("Chi2 Statistic:", chi2_stat, "\n")
## Chi2 Statistic: 13.02857
cat("Degrees of Freedom:", dof, "\n")
## Degrees of Freedom: 4
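# Critical value: upper-tail quantile of the chi-square distribution
qchisq(1 - alpha, dof)
## [1] 9.487729
Decision rule: reject H0 if the chi-square statistic is greater than or equal to the critical value. Since 13.03 > 9.49, we reject H0 and conclude that hobbies and mode of transport are dependent (associated).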
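The same decision can be reached through the p-value. A minimal sketch that recomputes the statistic so the block runs on its own, then uses pchisq() with lower.tail = FALSE for the upper-tail probability and qchisq() for the critical value:

```r
observed <- matrix(c(30, 50, 20, 40, 20, 10, 30, 30, 10), nrow = 3, byrow = TRUE)
alpha <- 0.05

expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
chi2_stat <- sum((observed - expected)^2 / expected)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail critical value: the point leaving probability alpha to its right.
# Note that qchisq(alpha, dof) alone returns the LOWER-tail quantile instead.
critical <- qchisq(alpha, df = dof, lower.tail = FALSE)  # same as qchisq(1 - alpha, dof)

# p-value: probability of a statistic at least this large if H0 were true
p_value <- pchisq(chi2_stat, df = dof, lower.tail = FALSE)

cat("Critical value:", round(critical, 4), "\n")
cat("p-value:", round(p_value, 5), "\n")

# The two rules always agree: stat >= critical exactly when p-value <= alpha
cat("Reject H0 (critical value rule):", chi2_stat >= critical, "\n")
cat("Reject H0 (p-value rule):", p_value <= alpha, "\n")
```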
Using the built-in chisq.test() function
chi.sq <- chisq.test(contigency_table)
chi.sq
##
## Pearson's Chi-squared test
##
## data: contigency_table
## X-squared = 13.029, df = 4, p-value = 0.01114
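The object returned by chisq.test() is a list of class "htest", so its parts can be extracted directly. A short sketch that pulls out the statistic, degrees of freedom, p-value and Pearson residuals, and cross-checks them against the manual formula. Because the table is 3-by-3, chisq.test() applies no continuity correction (it only does so for 2-by-2 tables), which is why its statistic matches the hand calculation.

```r
observed <- matrix(c(30, 50, 20, 40, 20, 10, 30, 30, 10), nrow = 3, byrow = TRUE,
                   dimnames = list(c("Car", "Bicycle", "Walk"),
                                   c("Reading", "Gaming", "Socializing")))
chi.sq <- chisq.test(observed)

print(chi.sq$statistic)  # X-squared
print(chi.sq$parameter)  # df
print(chi.sq$p.value)

# Pearson residuals (O - E) / sqrt(E): positive means more counts than expected
print(round(chi.sq$residuals, 3))

# Cross-check against the manual computation of the statistic
manual <- sum((observed - chi.sq$expected)^2 / chi.sq$expected)
print(all.equal(unname(chi.sq$statistic), manual))
```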
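Decision rule: reject H0 if the p-value is less than alpha.
if (chi.sq$p.value < alpha) {
  cat("Reject H0\n")
} else {
  cat("Fail to reject H0\n")
}
## Reject H0
chisq.test() reproduces the manual result (X-squared = 13.029, df = 4), and since p = 0.01114 < 0.05 we reject H0: hobbies and mode of transport are associated.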