Let’s work it out in R by doing a chi-squared test on the treatment (X) and improvement (Y) columns in treatment.csv. First, read in the treatment.csv data and tabulate the two columns.
##
##               improved not-improved
##   not-treated       26           29
##   treated           35           15
Let’s do the chi-squared test using the chisq.test() function. It takes the two vectors as the input. We also set correct=FALSE to turn off Yates’ continuity correction.
##
## Pearson's Chi-squared test
##
## data: df$treatment and df$improvement
## X-squared = 5.5569, df = 1, p-value = 0.01841
We have a chi-squared value of 5.56 on 1 degree of freedom. Since the p-value (0.018) is less than the significance level of 0.05, we reject the null hypothesis of independence and conclude that treatment and improvement are in fact dependent.
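The result above can be checked without the CSV file by rebuilding the contingency table from the printed counts. This is only a minimal sketch: the object names `tab`, `res`, and `res_yates` are chosen here for illustration, and a matrix is passed to chisq.test() instead of the two column vectors used in the original call.

```r
# Rebuild the 2x2 table from the counts printed above
# (assumption: the CSV is not at hand, so the counts are typed in directly).
tab <- matrix(c(26, 29,
                35, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(treatment   = c("not-treated", "treated"),
                              improvement = c("improved", "not-improved")))

# Pearson's chi-squared test without Yates' continuity correction.
res <- chisq.test(tab, correct = FALSE)
res$statistic   # X-squared
res$parameter   # degrees of freedom
res$p.value

# Expected counts under independence: row total * column total / grand total.
res$expected

# For comparison: the default (correct = TRUE) applies Yates' correction
# to 2x2 tables, which shrinks the statistic.
res_yates <- chisq.test(tab)
res_yates$statistic
```

Comparing `res$statistic` with `res_yates$statistic` shows how much the continuity correction changes the result for a table of this size.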
Find out if the \(cyl\) and \(carb\) variables in the \(mtcars\) dataset are dependent or not. Let’s have a look at the table of mtcars$carb vs mtcars$cyl (rows are carb, columns are cyl).
##
##      4  6  8
##   1  5  2  0
##   2  6  0  4
##   3  0  0  3
##   4  0  4  6
##   6  0  1  0
##   8  0  0  1
Since there are more levels, it’s much harder to make out if they are related. Let’s use the chi-squared test instead.
## Warning in chisq.test(mtcars$carb, mtcars$cyl, correct = FALSE): Chi-squared
## approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mtcars$carb and mtcars$cyl
## X-squared = 24.389, df = 10, p-value = 0.006632
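We have a high chi-squared value (24.39 on 10 degrees of freedom) and a p-value (0.0066) below the 0.05 significance level. So we reject the null hypothesis and conclude that \(carb\) and \(cyl\) have a significant relationship. Note the warning, though: many cells of this table have small expected counts, so the chi-squared approximation behind the p-value may not be reliable.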
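The warning in the output can be looked at more closely. The sketch below, which uses the built-in mtcars data, reruns the test, counts the cells whose expected count is below 5, and then tries a Monte Carlo p-value as one possible alternative. The simulation settings (seed and `B = 10000`) are arbitrary choices made here and are not part of the original analysis.

```r
# mtcars ships with base R (datasets package), so no file is needed.
tab <- table(carb = mtcars$carb, cyl = mtcars$cyl)

# suppressWarnings() hides the "approximation may be incorrect" warning
# for this demonstration only.
res <- suppressWarnings(chisq.test(tab, correct = FALSE))
res$statistic
res$p.value

# The warning is raised when expected counts are small;
# count how many cells have an expected count below 5.
sum(res$expected < 5)

# One alternative: a simulated (Monte Carlo) p-value, which does not rely
# on the chi-squared approximation.
set.seed(1)  # arbitrary seed, only for reproducibility
res_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)
res_sim$p.value
```

If the simulated p-value leads to the same decision as the asymptotic one, the conclusion is less likely to be an artifact of the sparse table.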
256 visual artists were surveyed to find out their zodiac sign. The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23). Test the hypothesis that zodiac signs are evenly distributed across visual artists.

### Hypothesis

\(H_0\): Births are uniformly distributed over zodiac signs.

\(H_A\): Births are not uniformly distributed over zodiac signs.
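observed <- c(29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23)
n <- 256
expected <- rep(1/12, 12) * n   # equal expected count for each of the 12 signs
alpha <- 0.05
r <- 1:12
df <- length(r) - 1             # 12 categories - 1 = 11

The test statistic is \(\chi^2 = \sum_{i=1}^{12} (O_i - E_i)^2 / E_i\), with \(df = 12 - 1 = 11\) degrees of freedom. In R, we can calculate \(\chi^2\) and its p-value as follows: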
## [1] 5.09375
## [1] 0.9265414
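The two values printed above can be reproduced with a few lines of R. This is a hedged reconstruction, not necessarily the code that produced the output: the names `chisq`, `df`, and `p_value` are taken from the plotting code further down, and the cross-check with chisq.test() is added here for illustration.

```r
observed <- c(29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23)
n        <- sum(observed)             # 256 artists in total
expected <- rep(n / 12, 12)           # uniform distribution over 12 signs
df       <- length(observed) - 1      # 12 - 1 = 11

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chisq <- sum((observed - expected)^2 / expected)

# p-value: upper-tail probability of the chi-squared distribution.
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

chisq
p_value

# Cross-check with the built-in goodness-of-fit test.
gof <- chisq.test(observed, p = rep(1 / 12, 12))
gof$statistic
gof$p.value
```

Note that the p-value is the upper-tail area (`lower.tail = FALSE`); the lower-tail area would be 1 minus this value.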
The p-value of 0.9265 says that if the zodiac signs of the visual artists were in fact distributed uniformly, an observed chi-squared value of 5.09 or higher would occur about 93% of the time. This certainly isn’t unusual, so we fail to reject the null hypothesis. There is no evidence that the births of the visual artists are not uniformly distributed among the zodiac signs. We can also visualize the result, as shown below:
library(dplyr)
library(ggplot2)
library(tidyr)
library(ggthemes)

# chisq and p_value are the test statistic and p-value computed above;
# df = 11, n and alpha are defined with the data.
lrr <- -Inf                                             # no lower rejection region
urr <- qchisq(p = alpha, df = df, lower.tail = FALSE)   # upper critical value

data.frame(chi2 = 0:2500 / 100) %>%
  mutate(density = dchisq(x = chi2, df = df)) %>%
  # keep the density only inside the rejection region so it can be shaded
  mutate(rr = ifelse(chi2 < lrr | chi2 > urr, density, 0)) %>%
  ggplot() +
  geom_line(aes(x = chi2, y = density)) +
  geom_area(aes(x = chi2, y = rr), fill = "red", alpha = 0.3) +
  # vertical line at the observed test statistic
  geom_vline(xintercept = chisq, color = "red") +
  labs(title = "Chi-Squared Goodness-of-Fit Test",
       subtitle = bquote("Chisq ="~.(round(chisq,2))~", n ="~.(n)~", alpha ="~.(alpha)~", chisq_crit ="~.(round(urr,2))~", p-value ="~.(round(p_value,3))),
       x = "chisq",
       y = "Density") +
  theme(legend.position = "none")