Chi-square

Read this: https://www.r-bloggers.com/chi-square-test-of-independence-in-r/

Should pubs be open 24/7?

This data is fictional!

Table <- matrix(c(120, 80, 90, 110), nrow = 2)  # observed counts, filled column-wise
row.names(Table) <- c("Male","Female")
colnames(Table) <- c("Yes","No")
Table
##        Yes  No
## Male   120  90
## Female  80 110
chisq.test(Table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  Table
## X-squared = 8.4311, df = 1, p-value = 0.003689
firstchi <- chisq.test(Table)

The result of the chisq.test() function is a list containing the following components:

str(firstchi)
## List of 9
##  $ statistic: Named num 8.43
##   ..- attr(*, "names")= chr "X-squared"
##  $ parameter: Named int 1
##   ..- attr(*, "names")= chr "df"
##  $ p.value  : num 0.00369
##  $ method   : chr "Pearson's Chi-squared test with Yates' continuity correction"
##  $ data.name: chr "Table"
##  $ observed : num [1:2, 1:2] 120 80 90 110
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "Male" "Female"
##   .. ..$ : chr [1:2] "Yes" "No"
##  $ expected : num [1:2, 1:2] 105 95 105 95
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "Male" "Female"
##   .. ..$ : chr [1:2] "Yes" "No"
##  $ residuals: num [1:2, 1:2] 1.46 -1.54 -1.46 1.54
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "Male" "Female"
##   .. ..$ : chr [1:2] "Yes" "No"
##  $ stdres   : num [1:2, 1:2] 3 -3 -3 3
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "Male" "Female"
##   .. ..$ : chr [1:2] "Yes" "No"
##  - attr(*, "class")= chr "htest"

These components can be extracted with the $ operator. For example, to print the standardized residuals:

firstchi$stdres
##              Yes        No
## Male    3.003757 -3.003757
## Female -3.003757  3.003757

If the chi-squared test is significant, we need to take a look at the residuals.

If a standardized residual is lower than -2, the cell contains fewer observations than would be expected under independence. If it is higher than 2, the cell contains more observations than expected.
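As a quick check (a minimal sketch reusing the firstchi object from above), we can compare the observed and expected counts and flag the cells whose standardized residuals exceed 2 in absolute value:

firstchi$observed                # observed counts
firstchi$expected                # counts expected under independence
abs(firstchi$stdres) > 2         # TRUE where a cell departs noticeably

Here all four cells are flagged, matching the standardized residuals of about ±3 printed above.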

We can plot the results using the assocplot() function:

assocplot(t(Table), main="Residuals and number of observations")

library(corrplot)
## corrplot 0.84 loaded
corrplot(firstchi$stdres, is.cor = FALSE)

Positive residuals are shown in blue. Positive values indicate an attraction (positive association) between the corresponding row and column variables.

Negative residuals are shown in red. They imply a repulsion (negative association) between the corresponding row and column variables.
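For reference, these standardized residuals can be reproduced by hand: each one is (observed - expected) divided by sqrt(expected * (1 - row proportion) * (1 - column proportion)). A small sketch, assuming the Table and firstchi objects from above:

n  <- sum(Table)                  # total sample size
rp <- rowSums(Table) / n          # row proportions
cp <- colSums(Table) / n          # column proportions
(Table - firstchi$expected) / sqrt(firstchi$expected * outer(1 - rp, 1 - cp))
firstchi$stdres                   # should match (about ±3)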

library(foreign)
library(sjPlot)
alcohol <- read.spss("C:/Users/lssi5/Dropbox/Data Analysis II (2018)/Lab 3 One-way ANOVA/alcohol.sav", use.value.labels = T, to.data.frame = T)

A basic version of the plot:

library(sjPlot)
sjp.xtab(alcohol$marital, alcohol$sex)

A customized version of the plot:

sjp.xtab(alcohol$marital, alcohol$sex, margin = "row", bar.pos = "stack",
         show.summary = TRUE, coord.flip = TRUE)

chi <- chisq.test(alcohol$sex, alcohol$marital)
chi
## 
##  Pearson's Chi-squared test
## 
## data:  alcohol$sex and alcohol$marital
## X-squared = 1151.3, df = 5, p-value < 2.2e-16
Table <- table(alcohol$sex, alcohol$marital)
assocplot(t(Table))

library(corrplot)
corrplot(chi$stdres, is.cor = FALSE)

Argentina example

data <- read.csv("C:/Users/lssi5/YandexDisk/teaching/DA MA 2018/dontcryformeargentina.csv")

Earlier today we already tried to plot the relationship between subjective class and level of education.

Now let’s try a formal test (i.e., the chi-squared test).

chisq.test(data$educf, data$V238)
## Warning in chisq.test(data$educf, data$V238): Chi-squared approximation may
## be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data$educf and data$V238
## X-squared = 92.078, df = 6, p-value < 2.2e-16
argchi <- chisq.test(data$educf, data$V238)
## Warning in chisq.test(data$educf, data$V238): Chi-squared approximation may
## be incorrect
argchi$observed
##            data$V238
## data$educf  Lower class Lower middle class Upper middle class
##   primary            57                 94                  2
##   secondary          44                320                  8
##   tertiary            0                 38                  7
##            data$V238
## data$educf  Working class
##   primary              84
##   secondary           177
##   tertiary             11
counts <- table(data$educf, data$V238)
barplot(counts, 
        legend = rownames(counts), las = 2)

data1 <- data
table(data1$V238)
## 
##        Lower class Lower middle class Upper middle class 
##                105                562                 34 
##      Working class 
##                306
data1$V238[data1$V238 == "Upper middle class"] <- NA  # drop the sparse category (only 34 cases)
argchi1 <- chisq.test(data1$educf, data1$V238)

No warning! Yay! (Instead of dropping the sparse category, we could also merge two categories to avoid the warning; see the sketch after the residuals below.)

argchi1
## 
##  Pearson's Chi-squared test
## 
## data:  data1$educf and data1$V238
## X-squared = 59.259, df = 4, p-value = 4.151e-12
argchi1$stdres
##            data1$V238
## data1$educf Lower class Lower middle class Working class
##   primary     6.6436471         -5.3860589     1.0700456
##   secondary  -4.9700666          3.4742965    -0.2129345
##   tertiary   -2.6957869          3.3010847    -1.6152749
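As mentioned above, an alternative to dropping the sparse category is to merge it into a neighbouring one. A minimal sketch, assuming V238 is a factor (category labels taken from the output above); the rest of the example continues with data1, where the category was set to NA:

data2 <- data
# recode the sparse level into its neighbour, then rerun the test
levels(data2$V238)[levels(data2$V238) == "Upper middle class"] <- "Lower middle class"
chisq.test(data2$educf, data2$V238)

With the merged category, all expected counts should be above 5, so the approximation warning should disappear.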

Let’s remake the plot

library(RColorBrewer)
counts <- table(data1$educf, data1$V238)
barplot(counts, 
        col=brewer.pal(n = 3, name = "Set2"), 
        legend = rownames(counts), las = 2)

Hmmm… We still have “Upper middle class” as an empty category. Let’s drop that level.

data1$V238 <- droplevels(data1$V238)  # remove the now-empty factor level
pal <- brewer.pal(n = 3, name = "Set2")
counts <- table(data1$educf, data1$V238)
barplot(counts, 
        col = pal, 
        legend = rownames(counts), las = 2)

Let’s make all the bars the same height (100%) to show the difference in proportions:

par(mar = c(10, 4, 5, 1), xpd = T)  # large bottom margin for rotated labels; xpd allows the legend outside the plot
barplot(prop.table(counts, 2)*100, col = pal, 
        las = 2)
legend(1, 130, legend = rownames(counts), 
       fill = pal, horiz = T, bty = "n")

A similar plot using the sjPlot package:

sjp.xtab(data1$V238, data1$educf, margin = "row", bar.pos = "stack",
         show.summary = TRUE)

A really bad chi-squared example. Don’t do that!

Let’s see how gender is associated with the level of satisfaction with life (V23, 11 categories).

barplot(table(data$gndr))

barplot(table(data$V23))

chisq.test(data$gndr, data$V23)
## Warning in chisq.test(data$gndr, data$V23): Chi-squared approximation may
## be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data$gndr and data$V23
## X-squared = 6.203, df = 9, p-value = 0.7194
chisq.test(data$gndr, data$V23)$observed
## Warning in chisq.test(data$gndr, data$V23): Chi-squared approximation may
## be incorrect
##          data$V23
## data$gndr   2   3   4   5   6   7   8   9 Completely dissatisfied
##    Female   2   6  14  41  56 124 162  77                       3
##    Male     2   5  16  20  54 114 145  72                       4
##          data$V23
## data$gndr Completely satisfied
##    Female                   56
##    Male                     47
chisq.test(data$gndr, data$V23)$expected
## Warning in chisq.test(data$gndr, data$V23): Chi-squared approximation may
## be incorrect
##          data$V23
## data$gndr        2        3        4        5        6        7        8
##    Female 2.121569 5.834314 15.91176 32.35392 58.34314 126.2333 162.8304
##    Male   1.878431 5.165686 14.08824 28.64608 51.65686 111.7667 144.1696
##          data$V23
## data$gndr        9 Completely dissatisfied Completely satisfied
##    Female 79.02843                3.712745             54.63039
##    Male   69.97157                3.287255             48.36961
corrplot(chisq.test(data$gndr, data$V23)$stdres, is.cor = FALSE)
## Warning in chisq.test(data$gndr, data$V23): Chi-squared approximation may
## be incorrect
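The repeated warning comes from the small expected counts visible above: the cells in the “2” and “Completely dissatisfied” columns are all below 5, which is what triggers the approximation warning. A quick check, reusing the variables above and storing the result under a new name (badchi):

badchi <- chisq.test(data$gndr, data$V23)   # same warning as above
badchi$expected < 5                         # TRUE for the sparse cells driving the warning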