Summary of:

Levshina, N. (2015). 19. Exemplars, categories, prototypes: Simple and multiple correspondence analysis. In How to do Linguistics with R: Data exploration and statistical analysis (pp. 367-385). John Benjamins Publishing Company.


19.1 Register variation of Basic Colour Terms: Simple Correspondence Analysis

19.1.1 The data and hypothesis

Add-on packages

library(Rling);library(vcd);library(ca);library(rgl)

The dataset colreg

data(colreg)
colreg
##        spoken fiction academic press
## black   20335   41118    26892 73080
## blue     4693   22093     3605 21210
## brown    1185   10914     1201 11539
## gray     1168   12140     1289  6559
## green    3860   14398     4477 26837
## orange    931    3496      474  5766
## pink      962    7312      584  6356
## purple    613    3366      429  3403
## red      7230   25111     5621 34596
## white   14474   40745    26336 54883
## yellow   1349   10553     1855 10382

mosaic plot

mosaicplot(colreg, las=2, shade=TRUE, main="Register variation of BCT")

blue-shaded rectangles: overrepresented in a given register
pink- and red-shaded rectangles: underrepresented in a given register
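The shading in the mosaic plot follows the sign and size of the Pearson residuals of the table. As a minimal sketch in base R (the matrix `colreg_counts` simply retypes the counts printed above, so the Rling package is not needed; the object names are illustrative), the residuals can be inspected directly:

```r
# Counts retyped from the colreg table printed above
# (rows: colour terms, columns: registers)
colreg_counts <- matrix(
  c(20335, 41118, 26892, 73080,
     4693, 22093,  3605, 21210,
     1185, 10914,  1201, 11539,
     1168, 12140,  1289,  6559,
     3860, 14398,  4477, 26837,
      931,  3496,   474,  5766,
      962,  7312,   584,  6356,
      613,  3366,   429,  3403,
     7230, 25111,  5621, 34596,
    14474, 40745, 26336, 54883,
     1349, 10553,  1855, 10382),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("black", "blue", "brown", "gray", "green", "orange",
      "pink", "purple", "red", "white", "yellow"),
    c("spoken", "fiction", "academic", "press")
  )
)

# Chi-squared test of independence between colour term and register
chi <- chisq.test(colreg_counts)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- chi$residuals

# Positive values: the term is more frequent in that register than expected
# Negative values: the term is less frequent than expected
print(round(pearson, 1))
```

Cells with large positive residuals correspond to the blue rectangles, cells with large negative residuals to the pink and red ones.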


19.1.2 Simple Correspondence Analysis

Correspondence Analysis (CA)

ca.bc<-ca(colreg)
summary(ca.bc,rows=FALSE,columns=FALSE)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.043730  77.9  77.9  *******************      
##  2      0.010787  19.2  97.1  *****                    
##  3      0.001650   2.9 100.0  *                        
##         -------- -----                                 
##  Total: 0.056167 100.0
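To see where these principal inertias come from, here is a minimal sketch in base R, assuming the standard CA algorithm (a singular value decomposition of the standardized residuals of the correspondence matrix). The counts are retyped from the colreg table above and the object names are illustrative; the result should agree with the values reported by summary() up to rounding.

```r
# Counts retyped from the colreg table printed above
colreg_counts <- matrix(
  c(20335, 41118, 26892, 73080,
     4693, 22093,  3605, 21210,
     1185, 10914,  1201, 11539,
     1168, 12140,  1289,  6559,
     3860, 14398,  4477, 26837,
      931,  3496,   474,  5766,
      962,  7312,   584,  6356,
      613,  3366,   429,  3403,
     7230, 25111,  5621, 34596,
    14474, 40745, 26336, 54883,
     1349, 10553,  1855, 10382),
  ncol = 4, byrow = TRUE
)

P <- colreg_counts / sum(colreg_counts)   # correspondence matrix
row_mass <- rowSums(P)                    # row masses
col_mass <- colSums(P)                    # column masses
expected <- outer(row_mass, col_mass)     # proportions expected under independence

# Standardized residuals; their squared singular values are the principal inertias
S <- (P - expected) / sqrt(expected)
inertia <- svd(S)$d^2

# A table with 4 columns has at most 3 non-trivial dimensions
inertia <- inertia[seq_len(min(dim(S)) - 1)]

total_inertia <- sum(inertia)
share <- 100 * inertia / total_inertia    # percentage of inertia per dimension

print(round(inertia, 6))
print(round(share, 1))
```

The total inertia equals the chi-squared statistic of the table divided by the grand total, which is why CA is often described as a decomposition of the chi-squared statistic.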

Two-dimensional CA map

plot(ca.bc,col.lab ="black")


Interpretation of the CA map


Interpretation of symmetric (default) CA maps

It is easy to misinterpret a CA map. To be on the safe side, follow these rules:

  • Row-to-row distances on the CA map represent the approximate \(\chi^2\)-distances between the row profiles
  • Column-to-column distances on the CA map represent the approximate \(\chi^2\)-distances between the column profiles
  • There is no direct interpretation of row-to-column or column-to-row distances
    • Interpret the dimensions first
    • and then examine how the profiles are located with regard to the dimensions of variation (Greenacre 2007: 72)
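For reference, the \(\chi^2\)-distance between two row profiles \(i\) and \(i'\) can be written as below. The notation is introduced here for illustration and is not taken from the chapter summary: \(p_{ij}\) is the proportion of the grand total in cell \((i, j)\), \(r_i\) is the mass of row \(i\), and \(c_j\) is the mass of column \(j\).

```latex
d_{\chi^2}^2(i, i') = \sum_{j} \frac{1}{c_j}
  \left( \frac{p_{ij}}{r_i} - \frac{p_{i'j}}{r_{i'}} \right)^2
```

Dividing by \(c_j\) keeps frequent columns from dominating the distance. The formula for two column profiles is the same with the roles of rows and columns swapped.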

Plot all three dimensions

  • plot3d() in package rgl
  • labels=c(1,1) : both row and column profiles should be shown as text labels
plot3d(ca.bc,labels=c(1,1))



To summarize

  • Most secondary BCTs cluster together in the same part of the plot where one finds fiction
    • blue and yellow are the closest to the secondary terms
      • as in Berlin and Kay's (1969) hierarchy

  • The location of green and red is relatively high on Dimension 2
    • close to the position of press (newspaper and magazine texts)

19.2 Visualization of exemplars and prototypes of lexical categories: Multiple Correspondence Analysis of Stuhl and Sessel

Add-on packages

library(Rling);library(FactoMineR);library(ca);library(rms)

Prototype Theory (e.g. Rosch 1975, Rosch & Mervis 1975)


Data Structure

data(chairs)
str(chairs)
## 'data.frame':    188 obs. of  19 variables:
##  $ Shop        : Factor w/ 3 levels "ikea.de","Moebel-Profi.de",..: 2 1 1 2 1 3 1 3 1 1 ...
##  $ WordDE      : Factor w/ 44 levels "3-in-1-Sessel",..: 2 17 38 41 23 13 25 15 40 40 ...
##  $ Category    : Factor w/ 2 levels "Sessel","Stuhl": 2 2 1 2 2 2 2 1 2 2 ...
##  $ Function    : Factor w/ 5 levels "Eat","NotSpec",..: 1 1 2 1 1 5 2 4 1 1 ...
##  $ Age         : Factor w/ 2 levels "Adult","Children": 1 2 1 1 2 1 1 1 1 1 ...
##  $ Back        : Factor w/ 4 levels "Adjust","High",..: 3 4 4 2 2 2 4 2 4 4 ...
##  $ Soft        : Factor w/ 3 levels "No","Pad","Yes": 1 1 1 3 1 3 1 3 1 1 ...
##  $ Arms        : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 2 1 1 ...
##  $ Upholst     : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 2 1 2 1 2 ...
##  $ MaterialSeat: Factor w/ 10 levels "Fabric","Leather",..: 6 10 8 1 6 1 10 2 10 1 ...
##  $ SeatHeight  : Factor w/ 3 levels "Adjust","High",..: 3 2 3 3 2 1 3 3 3 3 ...
##  $ SeatDepth   : Factor w/ 3 levels "Adjust","Deep",..: 3 3 3 3 3 2 3 2 3 3 ...
##  $ Swivel      : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
##  $ Roll        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
##  $ Rock        : Factor w/ 2 levels "No","Rock": 1 1 1 1 1 1 1 1 1 1 ...
##  $ AddFunctions: Factor w/ 3 levels "Bed","No","Table": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Recline     : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ReclineBack : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 2 1 1 ...
##  $ SaveSpace   : Factor w/ 3 levels "collapse","No",..: 2 2 3 2 2 2 1 2 2 2 ...
swivelRoll<-xtabs(~ chairs$Swivel+chairs$Roll)
swivelRoll
##              chairs$Roll
## chairs$Swivel  No Yes
##           No  133   1
##           Yes  14  40
chisq.test(swivelRoll)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  swivelRoll
## X-squared = 117.1, df = 1, p-value < 2.2e-16
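The test shows that Swivel and Roll are far from independent, which is the kind of redundancy between features that MCA exploits. As a hedged sketch, the strength of the association can be summarized with the phi coefficient (an addition for illustration, not part of the original analysis). The 2x2 table is retyped from the output above.

```r
# Cross-tabulation retyped from the swivelRoll output above
swivel_roll <- matrix(
  c(133,  1,
     14, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(Swivel = c("No", "Yes"), Roll = c("No", "Yes"))
)

n <- sum(swivel_roll)   # number of chairs in the table

# Same test as above: Yates' continuity correction is the default for 2x2 tables
corrected <- chisq.test(swivel_roll)

# The effect size is computed from the uncorrected statistic
uncorrected <- chisq.test(swivel_roll, correct = FALSE)
phi <- sqrt(unname(uncorrected$statistic) / n)

print(corrected$statistic)
print(round(phi, 2))
```

For a 2x2 table phi coincides with Cramér's V and ranges from 0 (no association) to 1 (perfect association).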

19.2.2 Multiple Correspondence Analysis

Create a map

chairs.ca<-MCA(chairs[,-c(1:3)],graph=FALSE)
plot(chairs.ca, cex=0.7, col.var="black", col.ind="grey")
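Conceptually, MCA can be seen as a simple CA of an indicator (dummy) matrix with one column per category. The following base R sketch illustrates this on a hypothetical toy data set (the six rows of `toy` are invented for illustration and are not taken from `chairs`). It is not a reimplementation of MCA() from FactoMineR.

```r
# Hypothetical toy data: six chairs described by three binary features
toy <- data.frame(
  Swivel = factor(c("No", "No", "Yes", "Yes", "No", "Yes")),
  Roll   = factor(c("No", "No", "Yes", "Yes", "No", "No")),
  Arms   = factor(c("No", "Yes", "Yes", "Yes", "No", "Yes"))
)

# Indicator matrix: one column per category, 1 if the chair has that category
indicator <- do.call(cbind, lapply(names(toy), function(v) {
  f <- toy[[v]]
  m <- sapply(levels(f), function(l) as.integer(f == l))
  colnames(m) <- paste(v, levels(f), sep = "_")
  m
}))

# Simple CA of the indicator matrix: SVD of the standardized residuals
P <- indicator / sum(indicator)
expected <- outer(rowSums(P), colSums(P))
eigenvalues <- svd((P - expected) / sqrt(expected))$d^2

n_var <- ncol(toy)         # Q: number of variables
n_cat <- ncol(indicator)   # J: number of categories
total_inertia <- sum(eigenvalues)

print(indicator)
print(round(total_inertia, 6))   # equals (J - Q) / Q when every category occurs
```

Every row of the indicator matrix sums to the number of variables, so the total inertia depends only on the numbers of variables and categories, not on how strongly the features are associated.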


The contributions of different variables: Dimension 1

dimdesc(chairs.ca,axes=1)
## $`Dim 1`
## $`Dim 1`$quali
##                      R2      p.value
## Upholst      0.72940952 1.094774e-54
## MaterialSeat 0.74518860 3.215782e-48
## Function     0.69158437 1.158923e-45
## Soft         0.66568141 9.657154e-45
## Swivel       0.40875670 5.393205e-23
## Roll         0.38348403 2.728416e-21
## SeatHeight   0.39565748 5.870717e-21
## Back         0.36654364 3.802707e-18
## Arms         0.21473392 2.133731e-11
## SeatDepth    0.20909906 3.769585e-10
## SaveSpace    0.19444992 2.058545e-09
## Age          0.06521465 4.047690e-04
## ReclineBack  0.06368029 4.764098e-04
## Recline      0.04908474 2.246446e-03
## 
## $`Dim 1`$category
##                        Estimate      p.value
## Upholst_No          0.508362663 1.094774e-54
## Soft_No             0.388494836 5.840206e-41
## Swivel_No           0.402810868 5.393205e-23
## Roll_No             0.427504866 2.728416e-21
## Wood                0.293104780 4.464862e-16
## NotSpec             0.643951013 4.899545e-13
## Eat                 0.234142965 5.684581e-13
## Back_Mid            0.358373579 1.779517e-12
## SeatHeight_Norm     0.143680221 2.417614e-12
## Arms_No             0.264219623 2.133731e-11
## SeatDepth_Norm      0.447005585 6.603795e-10
## Plastic             0.346541022 1.681653e-08
## SaveSpace_stack     0.267052677 4.199531e-07
## Children            0.236113583 4.047690e-04
## ReclineBack_No      0.163875900 4.764098e-04
## SaveSpace_collapse  0.252603082 1.181098e-03
## Rattan              0.124400133 1.696248e-03
## Recline_No          0.182956176 2.246446e-03
## SeatHeight_High     0.538230447 4.514663e-03
## Back_Low            0.737133741 6.215202e-03
## Recline_Yes        -0.182956176 2.246446e-03
## ReclineBack_Yes    -0.163875900 4.764098e-04
## Adult              -0.236113583 4.047690e-04
## SeatDepth_Adjust   -0.437445200 1.207452e-04
## Back_Adjust        -0.875345607 1.096450e-05
## Relax              -0.463574619 8.503520e-06
## SeatDepth_Deep     -0.009560385 8.078739e-06
## Fabric             -0.704604631 6.260860e-10
## SaveSpace_No       -0.519655759 2.401817e-10
## Leather            -0.835048584 8.166132e-11
## Back_High          -0.220161713 5.595166e-11
## Arms_Yes           -0.264219623 2.133731e-11
## Work               -0.714944787 5.944443e-16
## SeatHeight_Adjust  -0.681910668 7.980545e-21
## Roll_Yes           -0.427504866 2.728416e-21
## Swivel_Yes         -0.402810868 5.393205e-23
## Soft_Yes           -0.575087143 9.455970e-46
## Upholst_Yes        -0.508362663 1.094774e-54

The contributions of different variables: Dimension 2

dimdesc(chairs.ca,axes=2)
## $`Dim 2`
## $`Dim 2`$quali
##                      R2      p.value
## Function     0.78584122 4.217810e-60
## SeatDepth    0.64716353 1.414314e-42
## ReclineBack  0.57438079 2.422581e-36
## SeatHeight   0.52515707 1.204684e-30
## Roll         0.46545580 4.290655e-27
## Recline      0.27597436 9.940716e-15
## Swivel       0.26129650 6.598620e-14
## Back         0.14372484 2.682914e-06
## AddFunctions 0.11015232 2.049744e-05
## Upholst      0.05391569 1.343894e-03
## Arms         0.05093462 1.845045e-03
## Rock         0.05020680 1.993581e-03
## MaterialSeat 0.10623070 1.571677e-02
## Age          0.02829025 2.103903e-02
## Soft         0.03925557 2.461661e-02
## 
## $`Dim 2`$category
##                      Estimate      p.value
## ReclineBack_No     0.43818645 2.422581e-36
## SeatHeight_Adjust  0.42064264 1.263155e-28
## Roll_Yes           0.41932722 4.290655e-27
## Work               0.52198563 1.502005e-24
## SeatDepth_Norm     0.04767135 6.407920e-17
## Recline_No         0.38623766 9.940716e-15
## Swivel_Yes         0.28673582 6.598620e-14
## SeatDepth_Adjust   0.69693231 8.045886e-08
## Back_Adjust        0.78717617 1.046328e-07
## AddFunctions_No    0.17691154 4.792253e-04
## Upholst_No         0.12305293 1.343894e-03
## Arms_No            0.11456915 1.845045e-03
## Rock_No            0.39410033 1.993581e-03
## Wood               0.19020718 3.350552e-03
## Children           0.13845645 2.103903e-02
## Soft_No            0.01136842 2.286251e-02
## Plastic            0.20449930 3.735216e-02
## Fabric            -0.15465893 2.219771e-02
## Adult             -0.13845645 2.103903e-02
## Soft_Yes          -0.17784951 8.232290e-03
## Rock_Rock         -0.39410033 1.993581e-03
## Arms_Yes          -0.11456915 1.845045e-03
## Upholst_Yes       -0.12305293 1.343894e-03
## AddFunctions_Bed  -0.69031884 5.405031e-06
## Swivel_No         -0.28673582 6.598620e-14
## Recline_Yes       -0.38623766 9.940716e-15
## Roll_No           -0.41932722 4.290655e-27
## SeatHeight_Norm   -0.46315824 3.007332e-30
## SeatDepth_Deep    -0.74460367 2.505087e-36
## ReclineBack_Yes   -0.43818645 2.422581e-36
## Relax             -0.63329716 2.835996e-45

How are these differences related to the lexical categories under investigation, Stuhl and Sessel?

chairs.ca1<-MCA(chairs[,-c(1:2)],quali.sup=1,graph=FALSE)
plot(chairs.ca1, invis="ind", col.var="darkgrey",col.quali.sup="black")


95% confidence ellipses

plotellipses(chairs.ca1, keepvar=1, label="quali")

plotellipses(chairs.ca1, means=FALSE, keepvar=1, label="quali")


Eigenvalues

head(chairs.ca$eig,11)
##        eigenvalue percentage of variance cumulative percentage of variance
## dim 1  0.32507257              15.297533                          15.29753
## dim 2  0.25767552              12.125907                          27.42344
## dim 3  0.13519020               6.361892                          33.78533
## dim 4  0.12293223               5.785046                          39.57038
## dim 5  0.10891028               5.125190                          44.69557
## dim 6  0.09618531               4.526367                          49.22193
## dim 7  0.09019392               4.244420                          53.46635
## dim 8  0.08619851               4.056401                          57.52275
## dim 9  0.08165427               3.842554                          61.36531
## dim 10 0.07264654               3.418661                          64.78397
## dim 11 0.07066398               3.325364                          68.10933
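The percentages look low because, in an MCA of the indicator matrix, the total inertia is fixed by the numbers of variables and categories. A short sketch of the arithmetic, assuming that every factor level listed by str(chairs) actually occurs in the data (the level counts are read off that output; `n_levels`, `Q`, and `J` are illustrative names):

```r
# Number of levels per variable, read off the str(chairs) output (columns 4 to 19)
n_levels <- c(Function = 5, Age = 2, Back = 4, Soft = 3, Arms = 2, Upholst = 2,
              MaterialSeat = 10, SeatHeight = 3, SeatDepth = 3, Swivel = 2,
              Roll = 2, Rock = 2, AddFunctions = 3, Recline = 2,
              ReclineBack = 2, SaveSpace = 3)

Q <- length(n_levels)   # number of variables
J <- sum(n_levels)      # number of categories

# Total inertia of the indicator-matrix solution
total_inertia <- (J - Q) / Q

# First two eigenvalues, copied from the table above
eig <- c(dim1 = 0.32507257, dim2 = 0.25767552)
pct <- 100 * eig / total_inertia

print(total_inertia)
print(round(pct, 4))
```

Under that assumption the computed shares reproduce the "percentage of variance" column, which shows that these percentages understate how much structure the first two dimensions capture.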

Adjusted MCA

chairs.ca2<-mjca(chairs[,-c(1:3)])
summary(chairs.ca2,rows=FALSE,columns=FALSE)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.078443  47.1  47.1  **************           
##  2      0.043342  26.0  73.2  ********                 
##  3      0.006012   3.6  76.8  *                        
##  4      0.004155   2.5  79.3  *                        
##  5      0.002451   1.5  80.8                           
##  6      0.001291   0.8  81.5                           
##  7      0.000873   0.5  82.1                           
##  8      0.000639   0.4  82.4                           
##  9      0.000417   0.3  82.7                           
##  10     0.000117   0.1  82.8                           
##  11     0.000076   0.0  82.8                           
##  12     0.000010   0.0  82.8                           
##         -------- -----                                 
##  Total: 0.166428
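The adjusted inertias can be related to the eigenvalues reported by MCA() above. The sketch below assumes Greenacre's adjustment, in which each indicator-matrix eigenvalue larger than 1/Q is shrunk towards zero (Q is the number of variables, here 16). The eigenvalues and the total are copied from the two outputs above, and the object names are illustrative.

```r
Q <- 16   # number of variables entered into the MCA (columns 4 to 19 of chairs)

# First three eigenvalues from head(chairs.ca$eig, 11) above
eig_indicator <- c(0.32507257, 0.25767552, 0.13519020)

# Adjustment, applied only to eigenvalues greater than 1/Q
keep <- eig_indicator > 1 / Q
adjusted <- (Q / (Q - 1))^2 * (eig_indicator[keep] - 1 / Q)^2

# Share of the adjusted total inertia reported by summary(chairs.ca2)
pct <- 100 * adjusted / 0.166428

print(round(adjusted, 6))
print(round(pct, 1))
```

If the assumption holds, the results agree with the first three rows of the mjca() summary, which suggests that the two analyses describe the same solution on different scales.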

A correlation analysis of the coordinates of features on the first two dimensions

cor(chairs.ca$var$coord[,1],chairs.ca2$colcoord[,1])
## [1] 1
cor(chairs.ca$var$coord[,2],chairs.ca2$colcoord[,2])
## [1] -1
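A correlation of -1 on the second dimension does not indicate a disagreement between the two solutions: the orientation of each axis is arbitrary. A small sketch with random data (the matrix `X` is made up for illustration) shows that flipping the sign of one dimension leaves a singular value decomposition intact:

```r
set.seed(1)
X <- matrix(rnorm(12), nrow = 4)   # arbitrary 4 x 3 matrix
dec <- svd(X)

# Flip the sign of the second dimension in both sets of singular vectors
u_flip <- dec$u
v_flip <- dec$v
u_flip[, 2] <- -u_flip[, 2]
v_flip[, 2] <- -v_flip[, 2]

# Both versions reconstruct the same matrix
original <- dec$u %*% diag(dec$d) %*% t(dec$v)
flipped  <- u_flip %*% diag(dec$d) %*% t(v_flip)
same_fit <- isTRUE(all.equal(original, flipped))

# The flipped coordinates correlate perfectly negatively with the original ones
r <- cor(dec$u[, 2], u_flip[, 2])

print(same_fit)
print(r)
```

Because correlation also ignores rescaling, any difference in how the two packages scale their coordinates would not show up in the values above either.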

Reducing the correlated variables to a smaller set of underlying dimensions

dim1<- chairs.ca$ind$coord[,1] #coordinates of individual exemplars on the horizontal axis
dim2<- chairs.ca$ind$coord[,2] #the same for the vertical axis
m<-lrm(chairs$Category ~dim1+dim2)
m
## Logistic Regression Model
##  
##  lrm(formula = chairs$Category ~ dim1 + dim2)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           188    LR chi2     118.82    R2       0.643    C       0.921    
##   Sessel        67    d.f.             2    g        2.667    Dxy     0.842    
##   Stuhl        121    Pr(> chi2) <0.0001    gr      14.394    gamma   0.844    
##  max |deriv| 2e-06                          gp       0.386    tau-a   0.388    
##                                             Brier    0.094                     
##  
##            Coef   S.E.   Wald Z Pr(>|Z|)
##  Intercept 0.9833 0.2448 4.02   <0.0001 
##  dim1      2.1780 0.5319 4.09   <0.0001 
##  dim2      3.9151 0.5377 7.28   <0.0001 
## 
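To make the coefficients easier to read, they can be turned into odds ratios and predicted probabilities. This is a sketch that assumes the model predicts the second level of Category, i.e. Stuhl (the usual behaviour of lrm() for a two-level outcome). The coefficients are copied from the output above, and `p_stuhl` is a hypothetical helper, not a function from rms.

```r
# Coefficients copied from the lrm() output above
coefs <- c(Intercept = 0.9833, dim1 = 2.1780, dim2 = 3.9151)

# Predicted probability of Stuhl for given coordinates on the two MCA dimensions
p_stuhl <- function(dim1, dim2) {
  plogis(coefs[["Intercept"]] + coefs[["dim1"]] * dim1 + coefs[["dim2"]] * dim2)
}

# Multiplicative change in the odds of Stuhl per one-unit increase in a dimension
odds_ratios <- exp(coefs[c("dim1", "dim2")])

print(round(odds_ratios, 1))
print(round(p_stuhl(0, 0), 2))        # an exemplar at the origin of the map
print(round(p_stuhl(0.5, 0.5), 2))    # upper right quadrant
print(round(p_stuhl(-0.5, -0.5), 2))  # lower left quadrant
```

Both coefficients are positive, so under this assumption exemplars with higher scores on either dimension are more likely to be called Stuhl, and exemplars with lower scores are more likely to be called Sessel.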

How to report the results of Correspondence Analysis