We have a chi-squared value of 5.55. Since the p-value is less than the significance level of 0.05, we reject the null hypothesis and conclude that there is evidence the two variables are dependent.
Find out whether the \(cyl\) and \(carb\) variables in the \(mtcars\) dataset are dependent. Let's look at the contingency table of mtcars$carb (rows) against mtcars$cyl (columns).
##
##      4  6  8
##   1  5  2  0
##   2  6  0  4
##   3  0  0  3
##   4  0  4  6
##   6  0  1  0
##   8  0  0  1
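The counts above form a contingency table with \(carb\) (number of carburetors) in the rows and \(cyl\) (number of cylinders) in the columns. A minimal sketch of how to build it in base R; the `dnn` labels, the name `tab`, and the `addmargins()` totals are our own additions for readability:

```r
# mtcars ships with base R (package 'datasets'), so no extra packages are needed
tab <- table(mtcars$carb, mtcars$cyl, dnn = c("carb", "cyl"))
tab

# Row and column totals make the sparse cells easier to spot
addmargins(tab)
```

With only 32 cars spread over 18 cells, many cells are empty or nearly so, which matters for the test below.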
With this many levels, it is much harder to tell by eye whether the two variables are related, so let's use the chi-squared test instead.
## Warning in chisq.test(mtcars$carb, mtcars$cyl, correct = FALSE): Chi-squared
## approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mtcars$carb and mtcars$cyl
## X-squared = 24.389, df = 10, p-value = 0.006632
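The output above comes from the call shown in the warning. As a sketch of what `chisq.test()` computes here, the expected count for each cell under independence is (row total × column total) / n, and the statistic sums \((O - E)^2 / E\) over all 18 cells with \(df = (6 - 1)(3 - 1) = 10\). Note that `correct = FALSE` makes no difference for this table, since R applies Yates' continuity correction only to 2 × 2 tables. The names `tab`, `res`, and `x2` are our own:

```r
tab <- table(mtcars$carb, mtcars$cyl)

# Built-in test (same call as in the output above); warning silenced here
res <- suppressWarnings(chisq.test(mtcars$carb, mtcars$cyl, correct = FALSE))

# Manual calculation: expected counts under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
x2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)    # (6 - 1) * (3 - 1) = 10
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(statistic = x2, df = df, p_value = p_value)
```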
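We get a large chi-squared value (24.389 on 10 degrees of freedom) and a p-value of 0.0066, which is less than the 0.05 significance level. So we reject the null hypothesis and conclude that \(carb\) and \(cyl\) have a significant relationship, bearing in mind the warning above: with expected counts this small, the p-value is only approximate.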
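The warning printed with the test is worth taking seriously. A common rule of thumb trusts the chi-squared approximation only when expected counts are at least about 5, and in this table every expected count is below 5 (the largest is 10 × 14 / 32 = 4.375). One option, sketched below, is a Monte Carlo p-value via `simulate.p.value = TRUE`; the seed and the number of replicates `B` are arbitrary choices of ours, so the simulated p-value will vary slightly with them:

```r
res <- suppressWarnings(chisq.test(mtcars$carb, mtcars$cyl))

# Expected counts under independence: all of them fall below 5
round(res$expected, 2)
max(res$expected)

# Monte Carlo p-value: does not rely on the asymptotic chi-squared distribution
set.seed(42)
sim <- chisq.test(mtcars$carb, mtcars$cyl, simulate.p.value = TRUE, B = 10000)
sim$p.value
```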
256 visual artists were surveyed to find out their zodiac sign. The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23). Test the hypothesis that zodiac signs are evenly distributed across visual artists.
### Hypothesis
\(H_0\): Births are uniformly distributed over zodiac signs.
\(H_A\): Births are not uniformly distributed over zodiac signs.
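observed <- c(29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23)  # counts per sign, Aries to Pisces
n <- 256
expected <- rep(1/12, 12) * n  # about 21.33 artists per sign under H0
alpha <- 0.05
r <- 1:12  # index of the 12 zodiac signs
Under \(H_0\), the statistic \(\chi^2 = \sum_{i=1}^{12} (O_i - E_i)^2 / E_i\), where \(O_i\) and \(E_i\) are the observed and expected counts, approximately follows a \(\chi^2\) distribution with \(df = 12 - 1 = 11\). In R, we can calculate \(\chi^2\) and its p-value as follows: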
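A minimal sketch of the calculation, applying \(\chi^2 = \sum (O_i - E_i)^2 / E_i\) directly. The data are redefined so the block runs on its own; the names `chisq`, `df`, and `p_value` are chosen to match the plotting code further down, and the `chisq.test()` cross-check is our addition. The p-value is the upper-tail probability; calling `pchisq()` without `lower.tail = FALSE` would instead return the lower tail, about 0.0735.

```r
observed <- c(29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23)
n <- sum(observed)                  # 256 artists
expected <- rep(n / 12, 12)         # about 21.33 per sign under H0

# Goodness-of-fit statistic and degrees of freedom
chisq <- sum((observed - expected)^2 / expected)   # 5.09375
df <- length(observed) - 1                         # 12 - 1 = 11

# p-value: probability of a statistic at least this large under H0 (upper tail)
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)   # about 0.927
chisq
p_value

# Cross-check: with no p argument, chisq.test() assumes equal proportions
chisq.test(observed)$statistic
```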
## [1] 5.09375
## [1] 0.9265414
The p-value of 0.9265 says that if the zodiac signs of visual artists were in fact uniformly distributed, we would observe a chi-squared value of 5.09 or higher about 93% of the time. This is not at all unusual, so we fail to reject the null hypothesis: there is no evidence that the artists' births are not uniformly distributed among the zodiac signs. We can also visualize the result, as shown below:
library(ggplot2)
library(dplyr)

# Assumes alpha, n, chisq, df and p_value were computed in the steps above
# Rejection region: upper tail only, so the lower bound is -Inf
lrr <- -Inf
urr <- qchisq(p = alpha, df = df, lower.tail = FALSE)

data.frame(chi2 = 0:2500 / 100) %>%
  mutate(density = dchisq(x = chi2, df = df)) %>%
  mutate(rr = ifelse(chi2 < lrr | chi2 > urr, density, 0)) %>%
  ggplot() +
  geom_line(aes(x = chi2, y = density)) +
  geom_area(aes(x = chi2, y = rr), fill = "red", alpha = 0.3) +
  # Observed statistic as a vertical line (a constant, so it goes outside aes())
  geom_vline(xintercept = chisq, color = "red") +
  labs(title = "Chi-Squared Goodness-of-Fit Test",
       subtitle = bquote("Chisq ="~.(round(chisq,2))~", n ="~.(n)~", alpha ="~.(alpha)~", chisq_crit ="~.(round(urr,2))~", p-value ="~.(round(p_value,3))),
       x = "chisq",
       y = "Density") +
  theme(legend.position = "none")
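The shaded area in the plot is the rejection region, which starts at the critical value `urr`. A short sketch of the same decision made numerically rather than graphically (the names `gof` and `crit` are our own): the statistic of about 5.09 is far below the critical value of roughly 19.68, consistent with failing to reject \(H_0\).

```r
alpha <- 0.05
observed <- c(29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23)
gof <- chisq.test(observed)          # equal expected proportions by default
chisq <- unname(gof$statistic)       # 5.09375
df <- unname(gof$parameter)          # 11

# Critical value: the point with probability alpha above it under chi-squared(df)
crit <- qchisq(alpha, df = df, lower.tail = FALSE)
crit

# Reject H0 only if the statistic falls in the rejection region
chisq > crit
```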