In 2019, university and high school students at HSE voted for the headliner of the big graduation show. The results were published as an infographic: https://www.hse.ru/news/life/264225010.html
We are going to turn these results into a table and find out which artists were closer to which group of students.
As part of the exercise, we add the age of each act's lead singer as a third row, treating the ages as if they were counts. Age is almost certainly unrelated to the votes, so it should contribute a second dimension to the map. Without it there would be no map, only a line: correspondence analysis yields at most min(rows, columns) - 1 dimensions, and with two rows that is min(2, 10) - 1 = 1.
# Vote counts (rows 1-2) and lead singers' ages (row 3) for each act
vd <- matrix(c(1272, 894, 797, 748, 781, 731, 693, 433, 461, 392,
                136, 176, 169, 163, 117, 146, 169, 151,  87, 126,
                 57,  37,  34,  37,  50,  46,  24,  33,  32,  24),
             nrow = 3, byrow = TRUE)
coln <- c('VM', 'LB', 'Max_K', 'Ox', 'Bi2', 'Zemf', 'Mone', 'LSP', 'Skr', 'TB')
rown <- c('university', 'high_school', 'age')
colnames(vd) <- coln
rownames(vd) <- rown
vd <- as.table(vd)  # store as a contingency table
vd
## VM LB Max_K Ox Bi2 Zemf Mone LSP Skr TB
## university 1272 894 797 748 781 731 693 433 461 392
## high_school 136 176 169 163 117 146 169 151 87 126
## age 57 37 34 37 50 46 24 33 32 24
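The min(rows, columns) - 1 rule can be checked directly. The sketch below uses base R only. It builds the matrix of standardized residuals that CA decomposes and counts its non-zero singular values, first for the two vote rows alone and then with the age row added. The helpers `std_resid()` and `ca_dims()` are hypothetical names introduced for illustration. The relative tolerance of 1e-10, used to treat floating-point noise as zero, is an arbitrary choice.

```r
# Vote counts (rows 1-2) and lead singers' ages (row 3)
counts <- matrix(c(1272, 894, 797, 748, 781, 731, 693, 433, 461, 392,
                    136, 176, 169, 163, 117, 146, 169, 151,  87, 126,
                     57,  37,  34,  37,  50,  46,  24,  33,  32,  24),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("university", "high_school", "age"),
                                 c("VM", "LB", "Max_K", "Ox", "Bi2", "Zemf",
                                   "Mone", "LSP", "Skr", "TB")))

# Hypothetical helper: standardized residuals (p_ij - r_i c_j) / sqrt(r_i c_j)
std_resid <- function(x) {
  P <- x / sum(x)                       # correspondence matrix
  E <- outer(rowSums(P), colSums(P))    # expected proportions under independence
  (P - E) / sqrt(E)
}

# Hypothetical helper: count singular values that are not floating-point noise
ca_dims <- function(x, tol = 1e-10) {
  d <- svd(std_resid(x))$d
  sum(d > tol * max(d))
}

ca_dims(counts[1:2, ])  # two rows: min(2, 10) - 1 = 1 dimension
ca_dims(counts)         # three rows: min(3, 10) - 1 = 2 dimensions
```

The rank drops because the mass-weighted rows of the residual matrix always sum to zero. This removes one dimension, so two rows can never span more than a line.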
chisq.test(vd)
##
## Pearson's Chi-squared test
##
## data: vd
## X-squared = 140.39, df = 18, p-value < 2.2e-16
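The statistic above can be reproduced by hand. Under independence, the expected count of each cell is (row total × column total) / grand total. X² is the sum of (observed - expected)² / expected over all 30 cells, and the degrees of freedom are (rows - 1)(columns - 1). The following is a minimal base-R sketch; the object names are illustrative and not part of any package.

```r
counts <- matrix(c(1272, 894, 797, 748, 781, 731, 693, 433, 461, 392,
                    136, 176, 169, 163, 117, 146, 169, 151,  87, 126,
                     57,  37,  34,  37,  50,  46,  24,  33,  32,  24),
                 nrow = 3, byrow = TRUE)

n  <- sum(counts)                                   # grand total
E  <- outer(rowSums(counts), colSums(counts)) / n   # expected counts under independence
X2 <- sum((counts - E)^2 / E)                       # Pearson's chi-squared statistic
df <- (nrow(counts) - 1) * (ncol(counts) - 1)       # (3 - 1) * (10 - 1) = 18
p  <- pchisq(X2, df = df, lower.tail = FALSE)       # upper-tail p-value

c(X2 = X2, df = df, p = p)
# Same statistic as the built-in test (no continuity correction beyond 2x2 tables)
all.equal(X2, unname(chisq.test(counts)$statistic))
```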
The residuals show which cells deviate most from the counts expected under independence.
chisq.test(vd)$residuals # Pearson residuals: (observed - expected) / sqrt(expected)
## VM LB Max_K Ox Bi2
## university 2.97451955 0.32707154 -0.06376266 -0.33666224 0.86253374
## high_school -6.40563493 -0.06059180 0.73461199 0.94183332 -2.79651159
## age -0.48371751 -1.31637585 -1.16165678 -0.37071858 1.70233408
## Zemf Mone LSP Skr TB
## university -0.23180965 -0.55401602 -2.69638035 -0.10709807 -1.96807958
## high_school -0.11678249 2.31104497 5.28410689 -0.58550405 4.23833950
## age 1.24638901 -2.10359772 1.46384820 1.61885423 0.31990181
chisq.test(vd)$stdres # adjusted standardized residuals: also account for row and column totals
## VM LB Max_K Ox Bi2
## university 7.24619229 0.77853335 -0.15075885 -0.79342654 2.03277079
## high_school -7.63578811 -0.07057436 0.84991028 1.08613804 -3.22498426
## age -0.53987888 -1.43557501 -1.25836245 -0.40028396 1.83809782
## Zemf Mone LSP Skr TB
## university -0.54547142 -1.30068553 -6.22820108 -0.24683606 -4.52578258
## high_school -0.13446736 2.65495426 5.97243901 -0.66032165 4.76919886
## age 1.34371019 -2.26268548 1.54913413 1.70940997 0.33703829
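Both kinds of residuals start from the same expected counts E. Pearson residuals divide (O - E) by sqrt(E). The adjusted standardized residuals (`stdres`) divide it by sqrt(E (1 - row total / n)(1 - column total / n)), so under independence they are approximately standard normal. Cells beyond roughly ±2 are therefore worth a closer look. The sketch below recomputes both residual types and can be checked against `chisq.test()`; the variable names are illustrative.

```r
counts <- matrix(c(1272, 894, 797, 748, 781, 731, 693, 433, 461, 392,
                    136, 176, 169, 163, 117, 146, 169, 151,  87, 126,
                     57,  37,  34,  37,  50,  46,  24,  33,  32,  24),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("university", "high_school", "age"),
                                 c("VM", "LB", "Max_K", "Ox", "Bi2", "Zemf",
                                   "Mone", "LSP", "Skr", "TB")))

n  <- sum(counts)
sr <- rowSums(counts)                          # row totals
sc <- colSums(counts)                          # column totals
E  <- outer(sr, sc) / n                        # expected counts

pearson  <- (counts - E) / sqrt(E)             # matches chisq.test()$residuals
V        <- E * outer(1 - sr / n, 1 - sc / n)  # estimated variance of (O - E) under independence
adjusted <- (counts - E) / sqrt(V)             # matches chisq.test()$stdres

round(adjusted["high_school", c("VM", "LSP", "TB")], 2)  # VM -7.64, LSP 5.97, TB 4.77
```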
library(corrplot)
corrplot(chisq.test(vd)$stdres, is.corr=FALSE)
library(FactoMineR)
my.ca <- CA(vd, graph = T)
summary(my.ca)
##
## Call:
## CA(X = vd, graph = T)
##
## The chi square of independence between the two variables is equal to 140.3939 (p-value = 5.377263e-21 ).
##
## Eigenvalues
## Dim.1 Dim.2
## Variance 0.014 0.002
## % of var. 87.249 12.751
## Cumulative % of var. 87.249 100.000
##
## Rows
## Iner*1000 Dim.1 ctr cos2 Dim.2 ctr cos2
## university | 2.366 | -0.054 16.952 0.973 | -0.009 3.168 0.027 |
## high_school | 11.298 | 0.266 83.012 0.998 | -0.011 1.016 0.002 |
## age | 1.907 | 0.011 0.036 0.003 | 0.214 95.816 0.997 |
##
## Columns
## Iner*1000 Dim.1 ctr cos2 Dim.2 ctr cos2
## VM | 5.558 | -0.185 40.808 0.997 | -0.009 0.712 0.003 |
## LB | 0.204 | -0.006 0.038 0.025 | -0.040 10.040 0.975 |
## Max_K | 0.210 | 0.021 0.370 0.240 | -0.038 8.042 0.760 |
## Ox | 0.126 | 0.032 0.800 0.861 | -0.013 0.884 0.139 |
## Bi2 | 1.271 | -0.093 6.728 0.719 | 0.058 17.994 0.281 |
## Zemf | 0.180 | 0.000 0.000 0.000 | 0.042 9.054 1.000 |
## Mone | 1.117 | 0.077 4.295 0.522 | -0.074 26.878 0.478 |
## LSP | 4.141 | 0.240 28.925 0.949 | 0.056 10.640 0.051 |
## Skr | 0.330 | -0.019 0.172 0.071 | 0.069 15.444 0.929 |
## TB | 2.433 | 0.201 17.865 0.997 | 0.010 0.311 0.003 |
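The summary above can be traced back to a singular value decomposition of the standardized residual matrix S = (P - r c^T) / sqrt(r c^T). Here P is the table divided by its grand total, and r and c are the row and column masses. The squared singular values are the eigenvalues ("Variance"). Their sum, the total inertia, equals X² / n. A row's contribution to an axis is the squared entry of the corresponding left singular vector. This is a minimal sketch assuming the textbook formulation of CA. FactoMineR's internal code may be organised differently, and the sign of each axis is arbitrary, so coordinates may come out mirrored.

```r
counts <- matrix(c(1272, 894, 797, 748, 781, 731, 693, 433, 461, 392,
                    136, 176, 169, 163, 117, 146, 169, 151,  87, 126,
                     57,  37,  34,  37,  50,  46,  24,  33,  32,  24),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("university", "high_school", "age"),
                                 c("VM", "LB", "Max_K", "Ox", "Bi2", "Zemf",
                                   "Mone", "LSP", "Skr", "TB")))

P   <- counts / sum(counts)                      # correspondence matrix
r   <- rowSums(P)                                # row masses
cm  <- colSums(P)                                # column masses
S   <- (P - outer(r, cm)) / sqrt(outer(r, cm))   # standardized residuals
dec <- svd(S)
k   <- 2                                         # min(3, 10) - 1 non-trivial dimensions

eig <- dec$d[1:k]^2                              # principal inertias ("Variance" row)
pct <- 100 * eig / sum(eig)                      # "% of var." row
# Total inertia equals X^2 / n
all.equal(sum(eig), unname(chisq.test(counts)$statistic) / sum(counts))

# Row principal coordinates and contributions (%), as in the "Rows" table
F_row <- sweep(dec$u[, 1:k], 1, sqrt(r), "/") %*% diag(dec$d[1:k])
ctr   <- 100 * dec$u[, 1:k]^2
dimnames(F_row) <- list(rownames(counts), c("Dim.1", "Dim.2"))
dimnames(ctr)   <- list(rownames(counts), c("Dim.1", "Dim.2"))
round(F_row, 3)
round(ctr, 3)
```

Compared with `summary(my.ca)`, the eigenvalues, percentages, and contributions should agree up to rounding. The coordinates should agree up to a possible sign flip on each axis.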
library(factoextra)
fviz_screeplot(my.ca, addlabels = TRUE)
fviz_ca_biplot(my.ca, repel = TRUE)
Asymmetric plots
“If the angle between two arrows is acute, then there is a strong association between the corresponding row and column.
To interpret the distance between rows and a column, you should perpendicularly project row points on the column arrow.” http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/113-ca-correspondence-analysis-in-r-essentials/
fviz_ca_biplot(my.ca, repel = TRUE,
map = "rowprincipal",
arrow = c(TRUE, TRUE))
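The projection rule quoted above has an exact algebraic counterpart. In the `rowprincipal` map, rows are plotted in principal coordinates and columns in standard coordinates (the arrow tips). When all dimensions are kept, the scalar product of row i with column j equals p_ij / (r_i c_j) - 1, that is, observed divided by expected, minus one. A positive projection therefore means the group voted for that act more often than independence would predict. This base-R sketch uses the standard CA definitions and checks that identity. It does not reproduce factoextra's plotting code, and the object names are illustrative.

```r
counts <- matrix(c(1272, 894, 797, 748, 781, 731, 693, 433, 461, 392,
                    136, 176, 169, 163, 117, 146, 169, 151,  87, 126,
                     57,  37,  34,  37,  50,  46,  24,  33,  32,  24),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("university", "high_school", "age"),
                                 c("VM", "LB", "Max_K", "Ox", "Bi2", "Zemf",
                                   "Mone", "LSP", "Skr", "TB")))

P   <- counts / sum(counts)
r   <- rowSums(P)                                # row masses
cm  <- colSums(P)                                # column masses
S   <- (P - outer(r, cm)) / sqrt(outer(r, cm))
dec <- svd(S)
k   <- 2                                         # keep all non-trivial dimensions

# Rows in principal coordinates, columns in standard coordinates (arrow tips)
F_row <- sweep(dec$u[, 1:k], 1, sqrt(r), "/") %*% diag(dec$d[1:k])
Gamma <- sweep(dec$v[, 1:k], 1, sqrt(cm), "/")

# Scalar product of every row point with every column arrow
recon <- F_row %*% t(Gamma)
dimnames(recon) <- dimnames(counts)

# Equals observed / expected - 1 in every cell
target <- P / outer(r, cm) - 1
all.equal(recon, target)

# High school students: below expectation for VM, above for LSP and TB
round(recon["high_school", c("VM", "LSP", "TB")], 2)
```

If fewer dimensions are kept than exist, the same scalar product gives only an approximation of observed / expected - 1. This is why the projection reading is safest when the plotted axes carry most of the inertia.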
Which artists contribute the most to the observed differences?
What is the meaning of the two dimensions here?
Compare the results of the chi-squared test and the correspondence map.