Load all the libraries or functions that you will use for the rest of the assignment. It is helpful to define your libraries and functions at the top of a report, so that others know what they need for the report to compile correctly.
library(ca)
## Warning: package 'ca' was built under R version 3.5.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(FactoMineR)
## Warning: package 'FactoMineR' was built under R version 3.5.3
Women and metonymy in Ancient Chinese: the data concern metonymic patterns that were used to refer to women in texts of the Ming dynasty in China (1368–1644). The rows are different types of female referents, namely, imperial woman (queen or emperor’s concubine), servant girl, beautiful woman, mother or grandmother, unchaste woman (prostitute or mistress), young girl, wife (or concubine). The columns are six metonymic patterns, abbreviated in the output below as Actn, Bdyp, Lctn, Clth, Chrc, and Psss.
Import the data and create a mosaic plot to visualize the differences in usage across women references.
# Let's load the data
chi_names <- read.csv("chinese_names.csv")
rownames(chi_names) <- chi_names[,1]
chi_names <- chi_names[,-1]
# Next, let's draw the mosaic plot
mosaicplot(chi_names,
           las = 2,       # axis labels perpendicular to the axes
           shade = TRUE,  # shade cells by their residuals
           main = "Metonymic patterns")
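To make the shading easier to read, here is a minimal sketch of the quantity it is based on. With shade = TRUE, mosaicplot colors cells by their Pearson residuals under independence, cutting absolute values at 2 and 4. The table below is hypothetical (the labels A, B, p1 to p3 and the counts are made up for illustration and are not the assignment data).

```r
# Hypothetical 2 x 3 table of counts (NOT the chinese_names data)
toy <- matrix(c(30, 10, 20,
                10, 30, 20),
              nrow = 2, byrow = TRUE,
              dimnames = list(referent = c("A", "B"),
                              pattern  = c("p1", "p2", "p3")))

# Pearson residuals: (observed - expected) / sqrt(expected),
# where expected counts assume rows and columns are independent
res <- chisq.test(toy)$residuals
round(res, 2)
```

Cells whose residual exceeds 2 in absolute value (here, the p1 and p2 columns) are the ones a shaded mosaic plot would highlight; a residual near 0 (the p3 column) stays unshaded.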
Run a simple correspondence analysis on the data.
mod_scorranalysis <- ca(chi_names)
summary(mod_scorranalysis)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.762754 79.1 79.1 ********************
## 2 0.166546 17.3 96.4 ****
## 3 0.021321 2.2 98.6 *
## 4 0.012843 1.3 100.0
## 5 0.000249 0.0 100.0
## -------- -----
## Total: 0.963714 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | Impr | 133 982 57 | -638 981 71 | 20 1 0 |
## 2 | Btfl | 250 999 499 | 1363 965 608 | 254 34 97 |
## 3 | Mthr | 30 80 18 | -207 75 2 | -53 5 1 |
## 4 | Unch | 64 998 194 | 824 231 57 | -1503 768 862 |
## 5 | Yong | 26 407 27 | 422 176 6 | 483 230 36 |
## 6 | Wife | 498 995 205 | -627 991 257 | 37 4 4 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | Actn | 32 65 18 | -186 65 1 | 20 1 0 |
## 2 | Bdyp | 106 1000 282 | 1221 580 206 | -1039 420 684 |
## 3 | Lctn | 628 998 262 | -634 998 331 | -2 0 0 |
## 4 | Clth | 79 795 65 | 687 589 49 | 406 206 78 |
## 5 | Chrc | 77 999 174 | 1397 900 198 | 462 98 99 |
## 6 | Psss | 78 976 199 | 1450 856 215 | 544 120 139 |
What do the inertia values tell you about the dimensionality of the data?
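One way to approach this question is to recompute the percentage columns from the eigenvalues printed in the summary above. The sketch below copies the five principal inertias by hand into a vector (the names eig, pct, and cum_pct are illustrative) rather than extracting them from the fitted object.

```r
# Principal inertias copied from the summary(mod_scorranalysis) output
eig <- c(0.762754, 0.166546, 0.021321, 0.012843, 0.000249)

# Share of total inertia per dimension, and the running total
pct     <- 100 * eig / sum(eig)
cum_pct <- cumsum(pct)

data.frame(dim     = seq_along(eig),
           pct     = round(pct, 1),
           cum_pct = round(cum_pct, 1))
```

The first two dimensions together account for about 96.4% of the inertia, which is the figure to weigh when deciding whether a two-dimensional map is an adequate summary.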
Create a 2D plot of the data.
plot(mod_scorranalysis)
What can you tell about the word usage from examining this plot?
The data come from a large project examining the definitions of words and, through them, their category requirements. The following columns are included: cue, pos_cue, pos_feature, pos_translated, a1, and a2.
Run a multiple correspondence analysis on the data, excluding the cue column.
dat_mcanalysis <- read.csv("mca_data.csv")
mod_mcanalysis <- MCA(dat_mcanalysis[,-1], # drop the first column (cue)
                      graph = FALSE)
summary(mod_mcanalysis)
##
## Call:
## MCA(X = dat_mcanalysis[, -1], graph = FALSE)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## Variance 0.410 0.377 0.350 0.286 0.258 0.234
## % of var. 5.855 5.381 4.999 4.082 3.680 3.340
## Cumulative % of var. 5.855 11.237 16.236 20.318 23.999 27.338
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## Variance 0.225 0.219 0.213 0.208 0.206 0.203
## % of var. 3.215 3.126 3.036 2.968 2.943 2.903
## Cumulative % of var. 30.554 33.679 36.715 39.683 42.626 45.529
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## Variance 0.201 0.201 0.200 0.200 0.200 0.200
## % of var. 2.875 2.872 2.864 2.862 2.858 2.855
## Cumulative % of var. 48.404 51.276 54.140 57.002 59.861 62.716
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## Variance 0.200 0.198 0.196 0.196 0.193 0.191
## % of var. 2.850 2.832 2.806 2.799 2.764 2.725
## Cumulative % of var. 65.566 68.399 71.205 74.003 76.767 79.492
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## Variance 0.188 0.183 0.173 0.169 0.152 0.138
## % of var. 2.681 2.615 2.478 2.416 2.165 1.966
## Cumulative % of var. 82.173 84.787 87.266 89.682 91.847 93.813
## Dim.31 Dim.32 Dim.33 Dim.34 Dim.35
## Variance 0.122 0.110 0.089 0.070 0.043
## % of var. 1.738 1.575 1.269 0.994 0.611
## Cumulative % of var. 95.551 97.126 98.395 99.389 100.000
##
## Individuals (the 10 first)
## Dim.1 ctr cos2 Dim.2 ctr cos2
## 1 | 0.140 0.000 0.008 | -0.168 0.000 0.012 |
## 2 | -0.406 0.001 0.057 | 0.785 0.005 0.214 |
## 3 | -0.914 0.006 0.210 | 0.845 0.005 0.180 |
## 4 | -0.307 0.001 0.017 | 0.709 0.004 0.091 |
## 5 | 1.750 0.021 0.267 | 1.803 0.024 0.284 |
## 6 | 0.090 0.000 0.008 | -0.331 0.001 0.108 |
## 7 | 0.736 0.004 0.256 | 0.093 0.000 0.004 |
## 8 | -0.130 0.000 0.009 | -0.794 0.005 0.321 |
## 9 | 0.108 0.000 0.007 | -0.562 0.002 0.197 |
## 10 | 1.750 0.021 0.267 | 1.803 0.024 0.284 |
## Dim.3 ctr cos2
## 1 -0.339 0.001 0.049 |
## 2 0.031 0.000 0.000 |
## 3 0.124 0.000 0.004 |
## 4 0.622 0.003 0.070 |
## 5 -1.882 0.029 0.309 |
## 6 -0.383 0.001 0.144 |
## 7 0.524 0.002 0.129 |
## 8 -0.525 0.002 0.140 |
## 9 -0.111 0.000 0.008 |
## 10 -1.882 0.029 0.309 |
##
## Categories (the 10 first)
## Dim.1 ctr cos2 v.test Dim.2
## pos_cue_adjective | 0.436 1.281 0.030 32.859 | 0.104
## pos_cue_noun | -0.111 0.436 0.032 -33.510 | -0.107
## pos_cue_other | 0.677 0.458 0.010 18.432 | 0.697
## pos_cue_verb | 0.049 0.014 0.000 3.442 | 0.393
## pos_feature_adjective | 0.928 9.803 0.262 96.383 | -0.038
## pos_feature_noun | -0.097 0.208 0.008 -16.577 | -0.780
## pos_feature_other | 2.499 13.357 0.286 100.736 | 2.203
## pos_feature_verb | -1.039 14.316 0.403 -119.490 | 0.972
## pos_translated_adjective | 1.013 8.972 0.224 89.111 | -0.018
## pos_translated_noun | -0.032 0.024 0.001 -5.727 | -0.579
## ctr cos2 v.test Dim.3 ctr
## pos_cue_adjective 0.079 0.002 7.810 | 0.627 3.098
## pos_cue_noun 0.439 0.029 -32.253 | -0.108 0.480
## pos_cue_other 0.528 0.010 18.971 | -0.561 0.368
## pos_cue_verb 1.012 0.022 27.764 | 0.022 0.003
## pos_feature_adjective 0.018 0.000 -3.946 | 1.123 16.834
## pos_feature_noun 14.558 0.500 -133.062 | -0.484 6.033
## pos_feature_other 11.299 0.223 88.821 | -2.379 14.184
## pos_feature_verb 13.617 0.352 111.722 | 0.222 0.764
## pos_translated_adjective 0.003 0.000 -1.573 | 0.860 7.579
## pos_translated_noun 8.407 0.300 -103.194 | -0.215 1.246
## cos2 v.test
## pos_cue_adjective 0.063 47.210 |
## pos_cue_noun 0.030 -32.503 |
## pos_cue_other 0.007 -15.274 |
## pos_cue_verb 0.000 1.531 |
## pos_feature_adjective 0.384 116.710 |
## pos_feature_noun 0.192 -82.559 |
## pos_feature_other 0.260 -95.919 |
## pos_feature_verb 0.018 25.509 |
## pos_translated_adjective 0.162 75.680 |
## pos_translated_noun 0.041 -38.283 |
##
## Categorical variables (eta2)
## Dim.1 Dim.2 Dim.3
## pos_cue | 0.045 0.039 0.069 |
## pos_feature | 0.772 0.744 0.662 |
## pos_translated | 0.638 0.497 0.471 |
## a1 | 0.542 0.491 0.372 |
## a2 | 0.053 0.112 0.176 |
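The low percentages in the eigenvalue table are easier to interpret with a little arithmetic. The sketch below assumes a standard indicator-matrix MCA, in which total inertia equals the number of non-trivial dimensions divided by the number of active variables, so the average eigenvalue is 1 / Q. The values of Q and n_dim are read off the output above (five variables in the eta2 table, 35 dimensions in the eigenvalue table); the variable names are illustrative.

```r
Q     <- 5    # active variables: pos_cue, pos_feature, pos_translated, a1, a2
n_dim <- 35   # dimensions listed in the eigenvalue table

# Under the stated assumption: total inertia = (J - Q) / Q, with J - Q = n_dim
total_inertia <- n_dim / Q
avg_eig       <- total_inertia / n_dim   # equals 1 / Q

# First three eigenvalues copied from the summary output
eig <- c(Dim.1 = 0.410, Dim.2 = 0.377, Dim.3 = 0.350)

round(100 * eig / total_inertia, 2)  # compare with the "% of var." row
eig / avg_eig                        # how far each exceeds the average
```

With 35 dimensions sharing the inertia, an "average" dimension explains under 3%, so a first dimension at roughly 5.9% carries about twice the average rather than being negligible.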
Plot the variables in a 2D graph. Use invis = "ind" rather than col.ind = "gray" so you can read the plot better.
plot(mod_mcanalysis, cex = 0.7,
     col.var = "black", # color of the category labels
     invis = "ind")     # hide the individuals
Use the dimdesc function to show the usefulness of the variables and to help you understand the results. Remember that the markdown preview doesn’t show you the whole output; use the console or knit to see the complete results.
dimdesc(mod_mcanalysis)
## $`Dim 1`
## $`Dim 1`$quali
## R2 p.value
## pos_cue 0.04486592 0
## pos_feature 0.77226714 0
## pos_translated 0.63752821 0
## a1 0.54220339 0
## a2 0.05251250 0
##
## $`Dim 1`$category
## Estimate p.value
## a1=a1_not 0.90844500 0.000000e+00
## a1=a1_none 0.40599205 0.000000e+00
## a1=a1_characteristic 0.44374893 0.000000e+00
## pos_translated=pos_translated_other 1.18985308 0.000000e+00
## pos_translated=pos_translated_adjective 0.22085451 0.000000e+00
## pos_feature=pos_feature_other 1.23323912 0.000000e+00
## pos_feature=pos_feature_adjective 0.22733731 0.000000e+00
## a2=a2_characteristic 0.75884682 7.624568e-275
## pos_cue=pos_cue_adjective 0.11113680 1.957209e-240
## a1=a1_magnitude 1.08403991 2.555276e-229
## pos_cue=pos_cue_other 0.26531878 3.192972e-76
## a1=a1_location 0.62809919 3.253969e-28
## a2=a2_actions_process 0.49732949 2.455727e-21
## a2=a2_past_tense 0.31004858 4.340918e-15
## a2=a2_magnitude 0.79477524 1.346548e-06
## a2=a2_not 0.30411117 2.336047e-03
## pos_cue=pos_cue_verb -0.13693653 5.767695e-04
## a1=a1_opposites_wrong -0.43767077 1.663935e-07
## pos_translated=pos_translated_noun -0.44810770 1.016708e-08
## a2=a2_third_person -0.79493852 4.336728e-21
## a1=a1_actions_process -0.07588789 3.209269e-27
## a1=a1_numbers -0.04384170 1.620015e-28
## a2=a2_none -0.01593500 8.436091e-36
## a1=a1_time -0.59287143 5.241091e-44
## a1=a1_person_object -0.15561204 2.577109e-55
## pos_feature=pos_feature_noun -0.42871100 6.045533e-62
## a2=a2_numbers -0.39197800 6.024356e-85
## pos_cue=pos_cue_noun -0.23951905 3.982554e-250
## a1=a1_third_person -0.77024497 0.000000e+00
## a1=a1_present_participle -0.63592049 0.000000e+00
## a1=a1_past_tense -0.65100137 0.000000e+00
## pos_translated=pos_translated_verb -0.96259989 0.000000e+00
## pos_feature=pos_feature_verb -1.03186543 0.000000e+00
##
##
## $`Dim 2`
## $`Dim 2`$quali
## R2 p.value
## pos_feature 0.74384040 0.00000e+00
## pos_translated 0.49728416 0.00000e+00
## a1 0.49145783 0.00000e+00
## a2 0.11215962 0.00000e+00
## pos_cue 0.03876337 1.57991e-303
##
## $`Dim 2`$category
## Estimate p.value
## a2=a2_none 0.46089226 0.000000e+00
## a1=a1_third_person 0.51146419 1.618226e-310
## a1=a1_present_participle 0.33965787 0.000000e+00
## a1=a1_past_tense 0.70251606 0.000000e+00
## a1=a1_none 0.22524679 0.000000e+00
## pos_translated=pos_translated_other 1.14565556 0.000000e+00
## pos_feature=pos_feature_verb 0.23468643 0.000000e+00
## pos_feature=pos_feature_other 0.99056909 0.000000e+00
## pos_cue=pos_cue_verb 0.07463552 1.685349e-171
## pos_cue=pos_cue_other 0.26098378 1.195717e-80
## a2=a2_characteristic 0.05931649 6.502988e-67
## a1=a1_time 0.47867566 1.532936e-27
## a2=a2_past_tense 0.82693211 1.668765e-26
## a1=a1_not 0.13369637 3.630435e-12
## a2=a2_third_person 0.76640629 1.546967e-05
## a1=a1_opposites_wrong 0.37650503 2.149196e-05
## a2=a2_actions_process 0.22706357 1.598349e-04
## a2=a2_present_participle 0.51464745 3.204520e-02
## a2=a2_location -0.46664326 4.141076e-02
## a2=a2_magnitude -0.05803628 2.635872e-03
## pos_feature=pos_feature_adjective -0.38500354 7.950096e-05
## a2=a2_time -1.06594529 1.312428e-06
## a2=a2_slang -1.28436035 5.451363e-10
## a1=a1_slang -0.75728119 2.233746e-13
## pos_cue=pos_cue_adjective -0.10306155 5.565805e-15
## a2=a2_person_object -0.44031429 1.315954e-48
## a1=a1_characteristic -0.20956663 5.073298e-164
## pos_cue=pos_cue_noun -0.23255775 1.362376e-231
## a1=a1_actions_process -0.43518917 1.178695e-297
## a2=a2_numbers -0.65840627 0.000000e+00
## a1=a1_person_object -0.78517764 0.000000e+00
## a1=a1_numbers -0.64618373 0.000000e+00
## pos_translated=pos_translated_verb -0.01655096 0.000000e+00
## pos_translated=pos_translated_noun -0.73664796 0.000000e+00
## pos_feature=pos_feature_noun -0.84025199 0.000000e+00
##
##
## $`Dim 3`
## $`Dim 3`$quali
## R2 p.value
## pos_cue 0.06910587 0
## pos_feature 0.66167384 0
## pos_translated 0.47105205 0
## a1 0.37233929 0
## a2 0.17557189 0
##
## $`Dim 3`$category
## Estimate p.value
## a2=a2_past_tense 1.37281862 0.000000e+00
## a1=a1_not 1.24512339 0.000000e+00
## a1=a1_magnitude 0.96468972 0.000000e+00
## a1=a1_characteristic 0.11159328 0.000000e+00
## pos_translated=pos_translated_noun 0.17816523 0.000000e+00
## pos_translated=pos_translated_adjective 0.81398871 0.000000e+00
## pos_feature=pos_feature_adjective 0.88895630 0.000000e+00
## pos_cue=pos_cue_adjective 0.37383401 0.000000e+00
## a2=a2_characteristic 0.29589869 8.324527e-291
## a2=a2_actions_process 0.97440704 1.682151e-183
## a1=a1_past_tense 0.13687947 7.273683e-176
## pos_feature=pos_feature_verb 0.35574385 7.674028e-145
## a2=a2_present_participle 0.65166458 7.341818e-139
## pos_translated=pos_translated_verb 0.40781191 1.994576e-107
## a2=a2_not 0.62677356 1.293838e-29
## a1=a1_opposites_wrong 0.75988965 1.924203e-28
## a2=a2_magnitude 0.90747258 2.046611e-18
## a1=a1_time 0.12088008 5.429633e-14
## a2=a2_third_person 0.04277301 1.224576e-09
## a2=a2_opposites_wrong 0.50126199 2.618666e-02
## a1=a1_location -0.33129161 7.810088e-03
## a2=a2_slang -1.58052637 1.335070e-05
## a1=a1_slang -0.82940872 2.110012e-10
## pos_cue=pos_cue_other -0.32882152 7.824926e-53
## a2=a2_numbers -0.95317928 1.870750e-180
## a1=a1_person_object -0.57595086 1.443391e-197
## pos_cue=pos_cue_noun -0.06090470 3.165924e-235
## a1=a1_none -0.36446676 7.539803e-258
## a2=a2_none -0.46049026 1.108574e-278
## a1=a1_numbers -0.61255479 0.000000e+00
## pos_translated=pos_translated_other -1.39996585 0.000000e+00
## pos_feature=pos_feature_other -1.18301264 0.000000e+00
## pos_feature=pos_feature_noun -0.06168751 0.000000e+00
What are the largest predictors (i.e., R^2 over .25) of the first dimension?
The predictors of the first dimension with R^2 above .25 are: 1) pos_feature, R^2 = 0.772; 2) pos_translated, R^2 = 0.638; 3) a1, R^2 = 0.542.
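The R^2 threshold in this question can be checked mechanically. The sketch below copies the $`Dim 1`$quali values from the dimdesc output by hand into a named vector (r2_dim1 and strong are illustrative names), then keeps those above .25.

```r
# R2 values copied from the $`Dim 1`$quali table printed by dimdesc
r2_dim1 <- c(pos_cue        = 0.04486592,
             pos_feature    = 0.77226714,
             pos_translated = 0.63752821,
             a1             = 0.54220339,
             a2             = 0.05251250)

# Keep predictors with R2 above .25, largest first
strong <- sort(r2_dim1[r2_dim1 > 0.25], decreasing = TRUE)
round(strong, 3)
```

Three variables clear the threshold; pos_cue and a2 fall well below it.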
Looking at the category output for dimension one, what types of features does this appear to represent? (Try looking at the largest positive estimates to help distinguish what is represented by this dimension).
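The largest positive estimates on dimension one are pos_feature=pos_feature_other (1.23), pos_translated=pos_translated_other (1.19), a1=a1_magnitude (1.08), a1=a1_not (0.91), and a2=a2_magnitude (0.79), so the positive end of this dimension appears to represent features tagged as "other" parts of speech and features expressing magnitude or negation.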
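Ranking the category estimates makes the two poles of the dimension easier to see. The sketch below uses only a hand-copied subset of the $`Dim 1`$category table (the largest positive and negative estimates, with shortened labels), so it illustrates the sorting step rather than reproducing the full output.

```r
# Subset of estimates copied from the $`Dim 1`$category table
est_dim1 <- c(pos_feature_other    =  1.23323912,
              pos_translated_other =  1.18985308,
              a1_magnitude         =  1.08403991,
              a1_not               =  0.90844500,
              a2_magnitude         =  0.79477524,
              a2_characteristic    =  0.75884682,
              a1_third_person      = -0.77024497,
              a2_third_person      = -0.79493852,
              pos_translated_verb  = -0.96259989,
              pos_feature_verb     = -1.03186543)

ranked <- sort(est_dim1, decreasing = TRUE)
head(ranked, 3)   # positive pole
tail(ranked, 3)   # negative pole
```

In this subset the positive pole is led by the "other" and magnitude categories, while the negative pole is led by the verb categories.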
To view simple categories like we did in the lecture, try picking a few words out of the dataset that might be considered similar. I’ve shown how to do this below with three words, but feel free to pick your own. Change the words and the DF to your dataframe name. We will overlay those as supplemental variables.
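# The first column arrived as "ï..cue", most likely because of a byte-order mark in the CSV; rename it
dat_mcanalysis <- dat_mcanalysis %>% rename("cue" = "ï..cue")
# pick several interesting words
words = c("mom", "family", "relative")
mod2_mca = MCA(dat_mcanalysis[dat_mcanalysis$cue %in% words , ],
               quali.sup = 1, # treat column 1 (cue) as a supplementary variable
               graph = FALSE)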
Create a 2D plot of your category analysis.
plot(mod2_mca, invis = "ind", col.var = "darkgray", col.quali.sup = "black")
Add the prototype ellipses to the plot.
plotellipses(mod2_mca, keepvar = 1, label = "quali")
Create a 95% CI type plot for the category.
plotellipses(mod2_mca, means = FALSE, keepvar = 1, label = "quali")
What can you tell about the categories from these plots? Are they distinct or overlapping?
The confidence ellipses for mom, relative, and family overlap, which indicates fuzzy category boundaries, so the prototypes cannot be considered distinct.
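For readers who want to see what an ellipse of this kind encodes, here is a generic sketch of a 95% ellipse built from a mean and covariance matrix. It is not necessarily how plotellipses computes its ellipses, and the points are simulated stand-ins (hypothetical values, not coordinates from mod2_mca); all object names are illustrative.

```r
set.seed(1)

# Simulated 2D coordinates standing in for one word's points
# on Dim 1 and Dim 2 (hypothetical, NOT taken from mod2_mca)
pts <- cbind(dim1 = rnorm(50, mean = 0.5,  sd = 0.4),
             dim2 = rnorm(50, mean = -0.2, sd = 0.6))

center <- colMeans(pts)
S      <- cov(pts)

# Radius at which a bivariate normal holds 95% of its mass
r <- sqrt(qchisq(0.95, df = 2))

# Stretch a unit circle through the Cholesky factor of S,
# then shift it to the centroid
theta   <- seq(0, 2 * pi, length.out = 200)
circle  <- rbind(cos(theta), sin(theta))            # 2 x 200
ellipse <- t(center + r * t(chol(S)) %*% circle)    # 200 x 2
```

Calling plot(pts, asp = 1) followed by lines(ellipse) draws the result. An ellipse for the category mean, rather than for the individual points, would use S / nrow(pts) in place of S and so be much tighter; two categories whose ellipses overlap cannot be cleanly separated on these two dimensions.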