Task 5 ~ Goodness of Fit

Lab6 ~ Goodness of Fit


Contact: \(\downarrow\)
Email
Instagram https://www.instagram.com/nbrigittag/
RPubs https://rpubs.com/naftalibrigitta/
Name Naftali Brigitta Gunawan
NIM (student ID) 20214920002

Pearson’s χ2 test

The chi-square goodness-of-fit test, also known as Pearson’s chi-square test, evaluates whether the frequency counts of the R levels of a categorical variable follow a hypothesized distribution.
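For reference, the statistic used throughout this lab can be written out as below. The symbols O_i and E_i (observed and expected counts in level i) are notation introduced here for illustration; n is the sample size and π_i the hypothesized proportion of level i.

```latex
\[
\chi^2 = \sum_{i=1}^{R} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,\pi_i,
\qquad \mathrm{df} = R - 1
\]
```

Under H0 the statistic approximately follows a chi-square distribution with R − 1 degrees of freedom, which is why the examples compute an upper-tail probability.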

Example 1

Prior studies have shown that four possible responses to a therapy occur with frequencies π1 = .50, π2 = .25, π3 = .10, and π4 = .15. A random sample of n = 200 yields n1 = 120, n2 = 60, n3 = 10, and n4 = 10. At the α = .05 level of significance, is the random sample consistent with the expected frequencies?

library(ggplot2)
library(dplyr)
library(tidyr)
library(ggthemes)

observed <- c(120, 60, 10, 10)
n <- 200  
expected <- c(.50, .25, .10, .15) * n
alpha <- .05
r <- c(1, 2, 3, 4)
data.frame(r, observed, expected) %>%
gather(key = "response", value = "freq", c(-r)) %>%
ggplot() +
  geom_col(aes(x = r, y = freq, fill = response), position = "dodge") +
  labs(title = bquote("Frequency Counts"),
       x = "Response to therapy",
       y = "Frequency") +
  theme_tufte()

df <- 4 - 1
(chisq <- sum((observed - expected)^2 / expected))
## [1] 24.33333
(p_value <- pchisq(q = chisq, df = df, lower.tail = FALSE))
## [1] 2.128057e-05
(chisq.test.result <- chisq.test(x = observed, p = expected / n))
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 24.333, df = 3, p-value = 2.128e-05

The result shows P(χ2 > 24.33) < .001, so we reject H0 that the observed and hypothesized distributions are identical. The result can also be visualized, as shown below:

lrr = -Inf
urr = qchisq(p = alpha, df = df, lower.tail = FALSE)
data.frame(chi2 = 0:2500 / 100) %>%
  mutate(density = dchisq(x = chi2, df = df)) %>%
  mutate(rr = ifelse(chi2 < lrr | chi2 > urr, density, 0)) %>%
ggplot() +
  geom_line(aes(x = chi2, y = density)) +
  geom_area(aes(x = chi2, y = rr), fill = "red", alpha = 0.3) +
#  geom_vline(aes(xintercept = pi_0), color = "black") +
  geom_vline(aes(xintercept = chisq), color = "red") +
  labs(title = bquote("Chi-Squared Goodness-of-Fit Test"),
       subtitle = bquote("Chisq ="~.(round(chisq,2))~", n ="~.(n)~", alpha ="~.(alpha)~", chisq_crit ="~.(round(urr,2))~", p-value ="~.(round(p_value,3))),
       x = "chisq",
       y = "Density") +
  theme(legend.position="none")
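The same decision can be reached by comparing the statistic with the critical value that bounds the rejection region. The following is a minimal sketch that repeats the Example 1 inputs so it runs on its own; the names `chisq_crit` and `reject_h0` are introduced here for illustration.

```r
# Inputs repeated from Example 1
observed <- c(120, 60, 10, 10)
expected <- c(.50, .25, .10, .15) * 200
alpha <- .05
df <- length(observed) - 1

# Test statistic
chisq <- sum((observed - expected)^2 / expected)

# Upper-tail critical value: the lower boundary of the rejection region
chisq_crit <- qchisq(p = alpha, df = df, lower.tail = FALSE)

# TRUE means the statistic falls inside the rejection region
reject_h0 <- chisq > chisq_crit

round(c(chisq = chisq, chisq_crit = chisq_crit), 2)
reject_h0
```

Comparing against the critical value and comparing the p-value against α are equivalent rules, so both lead to rejecting H0 here.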

Example 2

A student population is hypothesized to be 60 percent female (πF = 0.60). A random sample of n = 100 students is 53 percent female (pF = 0.53). Is the sample representative of the population at the α = 0.05 level of significance?

observed <- c(53, 47)
n <- 100  
expected <- c(.60, .40) * n
alpha <- .05
r <- c("female", "male")
data.frame(r, observed, expected) %>%
gather(key = "response", value = "freq", c(-r)) %>%
ggplot() +
  geom_col(aes(x = r, y = freq, fill = response), position = "dodge") +
  labs(title = bquote("Frequency Counts"), 
       x = "Gender",
       y = "Frequency") +
  theme_tufte()
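Example 2 stops at the plot, so here is one way the test itself might be carried out, following the same steps as Example 1. This is a sketch under the stated hypothesis (60 percent female); the names `hypothesized` and `ex2_result` are introduced here for illustration.

```r
# Observed counts and hypothesized proportions from Example 2
observed <- c(female = 53, male = 47)
hypothesized <- c(female = .60, male = .40)
alpha <- .05

# Goodness-of-fit test against the hypothesized proportions (df = 2 - 1 = 1)
(ex2_result <- chisq.test(x = observed, p = hypothesized))

# TRUE would mean H0 is rejected at the chosen alpha
ex2_result$p.value < alpha
```

With these inputs the statistic is 49/60 + 49/40 ≈ 2.04 and the p-value is well above 0.05, so H0 is not rejected: the sample is consistent with a population that is 60 percent female.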

Exercise

Exercise 1

Work this out in R by running a chi-squared test on the treatment (X) and improvement (Y) columns in treatment.csv.

# Read the data; treatment.csv is expected in the current working directory
treatment <- read.csv("treatment.csv")

treatment
table(treatment$treatment, treatment$improvement)
##              
##               improved not-improved
##   not-treated       26           29
##   treated           35           15
chisq.test(treatment$treatment, treatment$improvement)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  treatment$treatment and treatment$improvement
## X-squared = 4.6626, df = 1, p-value = 0.03083

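Explanation: H0 states that treatment and improvement are independent. Because the p-value (0.03083) is less than 0.05, H0 is rejected: the data indicate that improvement is associated with treatment.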
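Because treatment.csv may not be at hand, the test can also be reproduced from the contingency table printed above. The sketch below re-enters those four counts by hand and, as an illustration, compares the default Yates-corrected test with the uncorrected one; the object names are introduced here and are not part of the original exercise.

```r
# Counts copied from the printed contingency table
tab <- matrix(c(26, 29,
                35, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(treatment = c("not-treated", "treated"),
                              improvement = c("improved", "not-improved")))

# For 2x2 tables chisq.test() applies Yates' continuity correction by default
with_yates <- chisq.test(tab)
without_yates <- chisq.test(tab, correct = FALSE)

# Compare the two p-values side by side
c(with_yates = with_yates$p.value, without_yates = without_yates$p.value)
```

The corrected test matches the output shown above (X-squared = 4.6626, p-value = 0.03083). The uncorrected p-value is smaller still, so the conclusion does not depend on the correction.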

Exercise 2

Find out if the cyl and carb variables in mtcars dataset are dependent or not.

table(mtcars$cyl, mtcars$carb)
##    
##     1 2 3 4 6 8
##   4 5 6 0 0 0 0
##   6 2 0 0 4 1 0
##   8 0 4 3 6 0 1
df2 = (3-1)*(6-1)  # (rows - 1) * (columns - 1) = 10
alpha = 0.05
chisq.test(mtcars$cyl, mtcars$carb)
## 
##  Pearson's Chi-squared test
## 
## data:  mtcars$cyl and mtcars$carb
## X-squared = 24.389, df = 10, p-value = 0.006632

Explanation: H0 states that the variables are independent. Because the p-value (0.006632) is less than α = 0.05, H0 is rejected: cyl and carb are dependent.
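The cyl by carb table is sparse, so it may be worth checking the expected counts before relying on the chi-square approximation. A common rule of thumb, which is an assumption brought in here and not part of the exercise, is that expected counts should be about 5 or more. The sketch below inspects them and, as one possible alternative, uses the Monte Carlo p-value that `chisq.test()` offers through `simulate.p.value`; the seed and the number of replicates `B` are arbitrary choices.

```r
tab <- table(mtcars$cyl, mtcars$carb)

# Expected counts under independence; suppressWarnings() hides the
# approximation warning that chisq.test() raises for this sparse table
expected_counts <- suppressWarnings(chisq.test(tab))$expected
round(expected_counts, 2)

# Number of cells whose expected count is below 5
sum(expected_counts < 5)

# Monte Carlo p-value as a check that does not rely on the approximation
set.seed(1)  # arbitrary seed, only for reproducibility
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

Every cell in this table has an expected count below 5, so the simulated p-value is a useful cross-check on the conclusion drawn from the asymptotic test.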

Exercise 3

256 visual artists were surveyed to find out their zodiac sign. The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23). Test the hypothesis that zodiac signs are evenly distributed across visual artists. (Reference)

zodiac = c(29,24,22,19,21,18,19,20,23,18,20,23) 
pert.zodiac = zodiac/256 
expected = c(rep(256/12, 12))

zodiak = data.frame(
  "zodiak" = c('Aries', 'Taurus' , 'Gemini' , 'Cancer' , 'Leo' , 'Virgo' , 'Libra' , 'Scorpio', 'Sagittarius', 'Capricorn', 'Aquarius', 'Pisces'),
  "observed" = zodiac,
  'expected' = expected
)

# H0: zodiac signs are evenly distributed across visual artists
chisq = sum((zodiac-expected)^2 / expected)
pchisq(q = chisq, df = 12-1, lower.tail = FALSE)
## [1] 0.9265414

H0 is not rejected (p-value ≈ 0.93 > 0.05): the data are consistent with zodiac signs being evenly distributed across visual artists.
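The manual calculation above can be cross-checked with `chisq.test()`, which tests against equal proportions when no `p` argument is given. This is a minimal sketch; the named vector and the name `zodiac_result` are introduced here for illustration.

```r
# Observed counts from the survey of 256 visual artists
zodiac <- c(Aries = 29, Taurus = 24, Gemini = 22, Cancer = 19,
            Leo = 21, Virgo = 18, Libra = 19, Scorpio = 20,
            Sagittarius = 23, Capricorn = 18, Aquarius = 20, Pisces = 23)

# With p omitted, chisq.test() assumes all 12 categories are equally likely
zodiac_result <- chisq.test(zodiac)
zodiac_result

# Expected count per sign under H0: 256 / 12
zodiac_result$expected[1]
```

The p-value agrees with the value computed by hand (0.9265414), with 12 − 1 = 11 degrees of freedom.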