The purpose of this script is to analyze the text of the pilot data
of the American’s Dream project.
For context, this is what we did:
Participants listed at least five guiding values for each of the
following three perspectives: US on paper (constitution); US in practice
(government); and personal ideal (if participants could design a country
from scratch).
Then, they defined each of the values.
And then, they rank-ordered them in terms of importance for each of the
perspectives.
Ok, before we start, let’s look at some demographics and political
ideology:
race | N | Perc |
---|---|---|
asian | 10 | 10 |
black | 12 | 12 |
hispanic | 7 | 7 |
multiracial | 3 | 3 |
white | 67 | 67 |
NA | 1 | 1 |
gender | N | Perc |
---|---|---|
man | 61 | 61 |
woman | 39 | 39 |
edu | N | Perc |
---|---|---|
GED | 21 | 21 |
2yearColl | 16 | 16 |
4yearColl | 41 | 41 |
MA | 19 | 19 |
PHD | 3 | 3 |
We asked for political ideology in the following way: Participants selected all that apply from a list of six ideologies or wrote in their own ideology. Each ideology they selected was then rated on a scale of 0 (do not subscribe to it at all) to 100 (subscribe to it to a great extent). In the table below, N is the number of participants who selected the ideology and Score is their mean rating.
ideo | N | Score |
---|---|---|
Conservatism | 40 | 70.88 |
Liberalism | 33 | 69.48 |
Democratic Socialism | 17 | 74.00 |
Progressivism | 15 | 69.00 |
Libertarianism | 8 | 73.25 |
Right-Wing Nationalism | 2 | 92.50 |
Centrist | 1 | 100.00 |
Christ follower | 1 | 92.00 |
I’m very moderate | 1 | 100.00 |
Independent | 1 | 88.00 |
Independent. | 1 | 90.00 |
None. They all have their faults or sell an unrealistic dream of some sort. | 1 | 100.00 |
none | 1 | 3.00 |
pragmatic libertarianism | 1 | 61.00 |
party_id | N | Perc |
---|---|---|
Democrat | 41 | 41 |
Independent | 26 | 26 |
Republican | 33 | 33 |
Before we get into word embedding and clustering, let’s just take a brief look at the values people wrote in for each of the three perspectives. Does it pass the smell test?
The prompt:
First, we want you to think of the United States. Since its
independence and onwards, the formation of the US as a sovereign country
was based on a number of values, all of which were inscribed in the
constitution. This document, importantly, has evolved since its
inception.
ON PAPER, what are the values that the US stands for?
List at least FIVE values.
Top 30 words:
value | N |
---|---|
freedom | 56 |
equality | 45 |
liberty | 37 |
justice | 29 |
democracy | 22 |
independence | 16 |
diversity | 14 |
individualism | 14 |
freedom of speech | 13 |
unity | 12 |
free speech | 8 |
freedom of religion | 8 |
opportunity | 8 |
equality | 7 |
democracy | 5 |
life | 5 |
patriotism | 5 |
fairness | 4 |
hard work | 4 |
liberty | 4 |
progress | 4 |
pursuit of happiness | 4 |
diversity | 3 |
education | 3 |
equal opportunity | 3 |
integrity | 3 |
religion | 3 |
representative government | 3 |
self-government | 3 |
strength | 3 |
Alright, this makes sense. All pretty consistent with what we’d imagine, I think.
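Some values show up twice in the table above (for example, equality at 45 and again at 7). One possible explanation, which is an assumption here rather than something the raw data confirms, is that the duplicated entries differ only in casing or stray whitespace. A minimal base-R sketch of normalizing before counting, using a made-up `toy_values` vector and a hypothetical helper `normalize_value`:

```r
# Hypothetical raw responses; note the trailing/leading spaces and mixed case.
toy_values <- c("freedom", "Freedom", "equality ", "equality", "liberty", " freedom")

# Normalize: trim surrounding whitespace and lowercase everything.
normalize_value <- function(x) tolower(trimws(x))

# Count after normalizing, most frequent first.
value_counts <- sort(table(normalize_value(toy_values)), decreasing = TRUE)
print(value_counts)
```

If the duplicates collapse after a step like this, the counts in the top-30 tables would simply add up (e.g. the two equality rows would merge into one).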
Now, let’s see if people write different values for the US in
practice. The prompt:
Now, we want you to think of the values that the United States
stands for in reality. Regardless of what is written in the
constitution, the US (across party lines) stands for certain values and
does not stand for others.
IN PRACTICE, what are the values that the US stands for?
List at least FIVE values.
Top 30 words:
value | N |
---|---|
freedom | 23 |
equality | 18 |
democracy | 15 |
individualism | 15 |
diversity | 12 |
liberty | 12 |
greed | 9 |
freedom of speech | 8 |
justice | 8 |
power | 8 |
unity | 7 |
capitalism | 6 |
nationalism | 6 |
opportunity | 6 |
free speech | 5 |
independence | 5 |
money | 5 |
success | 5 |
achievement | 4 |
competition | 4 |
individuality | 4 |
patriotism | 4 |
progress | 4 |
right to bear arms | 4 |
democracy | 3 |
division | 3 |
dominance | 3 |
education | 3 |
freedom of religion | 3 |
hard work | 3 |
Ok, some similarity, but a little more variance in this one. And we get some new words up here (greed, power). It’ll be interesting to see who wrote similar words here and in the constitution question.
What did people write for values in their own ideal dream country?
Prompt:
And now, we want you to imagine your ideal state. Importantly,
imagine this ideal state as if you are randomly born into its
population. You can end up in any level of its citizenry.
So, if you could design a state completely from scratch, what would be
its guiding values?
List at least FIVE values.
Top 30 words:
value | N |
---|---|
freedom | 38 |
equality | 35 |
justice | 19 |
freedom of speech | 12 |
liberty | 12 |
unity | 12 |
democracy | 11 |
individualism | 11 |
diversity | 10 |
opportunity | 7 |
education | 6 |
equality | 6 |
compassion | 5 |
hard work | 5 |
honesty | 5 |
respect | 5 |
democracy | 4 |
empathy | 4 |
freedom of religion | 4 |
kindness | 4 |
nationalism | 4 |
patriotism | 4 |
equal | 3 |
free speech | 3 |
individualism | 3 |
integrity | 3 |
love | 3 |
morality | 3 |
peace | 3 |
privacy | 3 |
Freedom still rules. Americans… But hey, equality and justice hold up much better here than in the in-practice list (35 vs. 18 and 19 vs. 8 mentions). And there are some nice ones in the middle third of this list.
Let’s get a sense of the definitions people wrote for these values.
Instead of taking all the values, though, we’ll just look at the top ten
most mentioned values across perspectives.
This was their prompt:
Thank you for listing the values guiding the US on paper, the US in
practice, and your ideal state.
Now, we ask you to define these values for us.
For each value, please write 1-2 sentences about what you meant when you
listed that value. If you listed the same value in two or three
different perspectives, there is no need to define it more than once.
Simply write “See above” for the second or third time it
appears.
value | mentions | def_word | def_mentions |
---|---|---|---|
freedom | 117 | freedom | 35 |
freedom | 117 | free | 33 |
freedom | 117 | ability | 21 |
freedom | 117 | live | 17 |
freedom | 117 | act | 16 |
freedom | 117 | life | 14 |
freedom | 117 | speak | 13 |
freedom | 117 | government | 11 |
equality | 98 | equal | 59 |
equality | 98 | opportunities | 22 |
equality | 98 | people | 22 |
equality | 98 | rights | 19 |
equality | 98 | treated | 19 |
equality | 98 | equally | 11 |
liberty | 61 | freedom | 27 |
liberty | 61 | free | 13 |
justice | 56 | justice | 14 |
justice | 56 | people | 14 |
justice | 56 | law | 12 |
democracy | 48 | government | 23 |
democracy | 48 | people | 18 |
democracy | 48 | citizens | 11 |
diversity | 36 | people | 13 |
freedom of speech | 33 | speech | 12 |
freedom of speech | 33 | ability | 11 |
Cool. All of this is coming together pretty nicely. Alright, are we ready to start with the real stuff?
We’ll start by using GloVe word embedding (https://cran.r-project.org/web/packages/text2vec/vignettes/glove.html). I’ll keep the code visible for this part.
library(text2vec)
text8_file = "~/text8"
if (!file.exists(text8_file)) {
download.file("http://mattmahoney.net/dc/text8.zip", "~/text8.zip")
unzip ("~/text8.zip", files = "text8", exdir = "~/")
}
wiki = readLines(text8_file, n = 1, warn = FALSE)
# Create iterator over tokens
tokens <- space_tokenizer(wiki)
# Create vocabulary. Terms will be ngrams (1 to 4 tokens).
it = itoken(tokens, progressbar = FALSE)
vocab <- create_vocabulary(it,ngram = c(ngram_min = 1,ngram_max = 4))
vocab <- prune_vocabulary(vocab, term_count_min = 3L)
# Use our filtered vocabulary
vectorizer <- vocab_vectorizer(vocab)
# use window of 5 for context words
tcm <- create_tcm(it, vectorizer, skip_grams_window = 5L)
glove = GlobalVectors$new(rank = 100, x_max = 10)
wv_main = glove$fit_transform(tcm, n_iter = 10, convergence_tol = 0.01, n_threads = 8)
## INFO [12:53:48.368] epoch 1, loss 0.4197
## INFO [12:55:46.425] epoch 2, loss 0.2677
## INFO [12:57:19.234] epoch 3, loss 0.1594
## INFO [12:58:44.773] epoch 4, loss 0.1102
## INFO [12:59:53.151] epoch 5, loss 0.0835
## INFO [13:01:18.132] epoch 6, loss 0.0657
## INFO [13:02:20.827] epoch 7, loss 0.0536
## INFO [13:03:17.902] epoch 8, loss 0.0451
## INFO [13:04:19.144] epoch 9, loss 0.0388
## INFO [13:05:16.104] epoch 10, loss 0.0341
wv_context = glove$components
word_vectors = wv_main + t(wv_context)
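With `word_vectors` in hand, the similarity between two words is usually measured as the cosine of the angle between their vectors. The sketch below uses a tiny made-up matrix (`toy_vectors`) instead of the trained embeddings, and the helper name `cosine_sim` is hypothetical:

```r
# Toy 3-dimensional "embeddings"; rows are words, as in word_vectors.
toy_vectors <- rbind(
  freedom = c(1, 2, 0),
  liberty = c(2, 4, 0),  # same direction as freedom
  greed   = c(0, 0, 3)   # orthogonal to both
)

# Cosine similarity: dot product divided by the product of the vector norms.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

sim_freedom_liberty <- cosine_sim(toy_vectors["freedom", ], toy_vectors["liberty", ])
sim_freedom_greed   <- cosine_sim(toy_vectors["freedom", ], toy_vectors["greed", ])
```

Vectors pointing the same way score close to 1 regardless of their length, and unrelated (orthogonal) vectors score 0. For whole matrices, text2vec's `sim2()` with `method = "cosine"` does the same job without a hand-written helper.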
Now, with our words. These are the vectors for our top ten values:
word | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | X13 | X14 | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | X24 | X25 | X26 | X27 | X28 | X29 | X30 | X31 | X32 | X33 | X34 | X35 | X36 | X37 | X38 | X39 | X40 | X41 | X42 | X43 | X44 | X45 | X46 | X47 | X48 | X49 | X50 | X51 | X52 | X53 | X54 | X55 | X56 | X57 | X58 | X59 | X60 | X61 | X62 | X63 | X64 | X65 | X66 | X67 | X68 | X69 | X70 | X71 | X72 | X73 | X74 | X75 | X76 | X77 | X78 | X79 | X80 | X81 | X82 | X83 | X84 | X85 | X86 | X87 | X88 | X89 | X90 | X91 | X92 | X93 | X94 | X95 | X96 | X97 | X98 | X99 | X100 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
individualism | -0.5706531 | -0.1126608 | 0.4573229 | -0.1748016 | -0.3033205 | 0.4709786 | -0.1742736 | -0.0765885 | 0.8068037 | 0.0654578 | 0.6177974 | 0.6801190 | -0.2108597 | -0.3455212 | -0.3215271 | -0.0975455 | -0.1981827 | -0.5241472 | 0.3229958 | -1.1306757 | 0.6358075 | -0.3059242 | 0.0492382 | 0.6905779 | 0.6204865 | -0.6705427 | -0.1807378 | 0.0214706 | -0.3845205 | 0.1555910 | 0.2793875 | -0.7739798 | 0.2864166 | -0.1966459 | -0.4110313 | 0.3090826 | 0.1772605 | 0.4547164 | 0.2122230 | 0.0212808 | 0.2628981 | -0.7249787 | 0.0818724 | 0.0433294 | 0.1339398 | -0.4809950 | -0.4479637 | -0.5902564 | 0.2526257 | 0.5358725 | -0.1896026 | -0.0222402 | -0.0566561 | 0.1649339 | -0.6993240 | -0.2175410 | 0.0429257 | -0.2348120 | -0.0806193 | -0.2840777 | -0.2957826 | 0.0590548 | 0.3887997 | -0.3307897 | 0.0764245 | 0.2954556 | -1.0379694 | -0.3518625 | 0.3666290 | 0.6449180 | 0.1350715 | 0.2532982 | -0.0734219 | -0.5956234 | -0.4529497 | 0.0242262 | 0.2632688 | 0.2654312 | 0.1684302 | -0.2778670 | -0.0713707 | -0.2567578 | -0.1512891 | 0.4958537 | 0.0183076 | 0.2986921 | 0.0035480 | 0.0096460 | -0.6274090 | 0.0661499 | 0.0582183 | 0.3157753 | -0.7879794 | -0.0059821 | -0.7711160 | 0.2735928 | -0.4636370 | -0.2566196 | 0.2017884 | 0.0350333 |
equality | 0.0017765 | -0.2757384 | 0.1380378 | 0.1115941 | -0.3846866 | -0.1867720 | -0.3296255 | 0.0164803 | -0.0105116 | 0.3699454 | 0.2803337 | -0.2414358 | -0.0757693 | 0.6476464 | -0.0250542 | -0.5222559 | -0.1953305 | -0.5506221 | 0.2688169 | -0.0147637 | 0.2946952 | -0.4280716 | 0.3224988 | 0.7129194 | 0.5991113 | -0.0151125 | -0.0589920 | 0.2227196 | -0.8394155 | 0.2518007 | -0.0072719 | 0.0724599 | -0.2895042 | -0.5537905 | -0.4293071 | -0.3396828 | 0.3617906 | 0.9901985 | -0.4436743 | 0.5156335 | 0.0420188 | -0.3018648 | 0.0897391 | -0.1571725 | 0.0423092 | -0.4390616 | 0.4547808 | -0.1080651 | -0.6224648 | -0.4486903 | 0.5340221 | -0.0290245 | 0.3772279 | 0.6542494 | -0.3789293 | 0.4393395 | -0.0694559 | -0.5179666 | -0.5633874 | 0.0518472 | -0.9817229 | 0.0698676 | 0.6482765 | 0.4054193 | 0.5840553 | 0.3225159 | 0.1916285 | -0.0710628 | -0.3673116 | 0.1853093 | 0.1857984 | 0.2164594 | -0.1051258 | -0.6007488 | 0.3100362 | 0.4470880 | 0.2910041 | -0.1982457 | 0.1036974 | -0.1021365 | -0.0555308 | -0.2373071 | -0.3498558 | -0.1867557 | -0.2871992 | -0.0571508 | 0.3586378 | -0.1056368 | 0.1503214 | -0.2101359 | 0.2535754 | 0.8551450 | 0.0139293 | -0.1587807 | -0.2438933 | 0.1077239 | -0.3602018 | -0.6220697 | 0.2590905 | -0.1899552 |
diversity | -0.0543874 | 0.0907506 | -0.2158128 | -0.0894759 | 0.3463087 | -0.2947234 | -0.4554147 | -0.3980008 | -0.1632898 | 0.1637886 | -0.8599857 | 0.0258172 | -0.4292283 | 0.6537609 | -0.0373631 | -0.8016178 | 0.2868961 | 0.0973240 | 0.2242796 | 0.3415994 | -0.2060550 | -0.1768636 | -0.6758251 | 0.1643870 | -0.1685744 | -0.0459732 | 0.4936555 | -0.1099546 | 0.5283486 | -0.2210025 | -1.2558102 | -0.0302971 | 0.7401356 | -0.6730625 | -0.1484026 | -0.2459032 | 0.0347580 | 0.6086943 | -0.2112537 | 0.2689932 | -0.1995739 | 0.5846537 | 0.1961937 | 0.0547890 | 0.0610349 | -0.2533994 | 0.2208386 | 0.0369523 | 0.0746591 | 0.5525951 | -0.9102708 | -0.0792612 | -0.6983373 | -0.4610428 | -0.0456132 | -0.2445136 | 0.2684186 | 0.2038335 | 0.0035407 | -0.1248608 | 0.0986256 | 0.3431945 | 0.4859838 | -0.4890740 | -0.0278704 | -0.6386770 | -0.0500980 | 0.0719799 | 0.4942881 | -0.4049571 | 0.2833160 | 0.7122544 | -0.2760131 | 0.3350321 | -0.8227359 | 0.0168181 | 0.4498476 | 0.2278223 | 0.4521201 | 0.3622040 | 0.6045631 | 0.5357237 | 0.3514683 | 0.0383356 | 0.1127135 | -0.2619211 | 0.3481093 | 0.6108972 | -0.3352669 | 0.4594275 | 0.0975350 | 0.2207475 | 0.3213755 | 0.3712006 | 0.4007671 | 0.7085598 | -0.4457898 | -0.2428139 | -0.1033081 | -0.5335293 |
unity | -0.0834816 | 0.4491344 | -0.8389599 | -0.8980731 | 0.2041093 | -0.3043530 | 0.0556579 | -0.5018289 | -0.6602815 | -0.3047201 | 0.4859250 | -0.4721624 | 0.0422945 | 0.4558312 | -0.0560778 | -0.2663017 | -0.6386721 | 0.3278354 | 0.4835667 | 0.5883605 | -0.4797593 | -0.0668836 | 0.5838603 | -0.0456437 | -0.4627429 | -0.9044603 | 0.6313694 | -0.9404632 | 0.6859468 | 0.0544157 | 0.2886156 | 0.4134014 | 0.2526044 | -0.0334707 | -0.6004569 | -0.3950155 | 0.2671329 | 0.5151973 | -0.4986752 | 0.7023148 | 0.1238084 | 0.5210217 | 0.3507144 | -1.1982318 | 0.5916083 | -0.0495996 | -0.1134710 | -0.0297949 | -0.1936190 | 0.1567637 | 0.0451554 | 0.1622532 | 0.4393766 | 0.3385275 | -0.2438365 | 0.2643870 | 0.2968137 | -0.4262645 | 0.2901089 | -0.0333452 | -0.2823046 | -0.2145625 | -0.1664019 | 0.1075338 | -0.1084785 | -0.0598009 | 0.2181054 | 0.2158593 | -0.5503917 | -0.0866416 | 0.2876534 | 0.1570109 | 0.1902234 | 0.4909739 | -0.1261626 | 0.8535546 | 0.3338455 | -0.1590653 | 0.6324685 | -0.1156309 | 1.1517201 | 0.2981082 | 0.7840473 | 0.9985799 | -0.1398852 | -0.1487450 | -0.2559384 | -0.3719143 | -0.2603666 | -0.2424770 | 0.4323443 | -0.3279959 | 0.6170279 | -0.4661679 | 0.7269990 | 0.1858877 | -0.2301561 | 0.7706122 | -0.0994340 | 0.3192974 |
liberty | -0.3594864 | -0.5026147 | -0.2081281 | -0.3685180 | 0.5255886 | -0.4280984 | -0.2733975 | 0.8367721 | 0.1696940 | -0.0902922 | -0.3381813 | 0.5422031 | -0.7208381 | 0.0672403 | -0.9753591 | -0.1349160 | -0.2284926 | 0.1977168 | 0.5585061 | 0.4658009 | 0.1284381 | 0.5046493 | -0.2164724 | 0.6892621 | -0.2698011 | -0.2174263 | 0.0185442 | -0.4715038 | 0.1055669 | -0.0078694 | 0.0333601 | 0.3191605 | 0.4104726 | -0.4290830 | -0.0344310 | 0.2331574 | 0.2489767 | 0.6369151 | 0.0890752 | 0.5674957 | 0.5221212 | 0.4046228 | -0.4912863 | -0.0194108 | 0.0493918 | -0.0608819 | 0.1264356 | -0.3659190 | -0.3325358 | -0.1000579 | -0.1496046 | 0.6260188 | 0.2415619 | -0.0841586 | 0.1649768 | 0.8104288 | -0.2432779 | 0.3300447 | -0.2008761 | 0.5125108 | -0.4867673 | -1.0143752 | 0.0288914 | -0.0623315 | 0.6270692 | 0.4268869 | -0.3429166 | 0.1172073 | -0.7184248 | 0.5238178 | -0.5019073 | -0.1210488 | 0.2108400 | 0.0683919 | 0.3839224 | -0.6006378 | 0.2774419 | -0.1497170 | 0.4100104 | -0.3908681 | -0.2047120 | 0.4311659 | 0.9726577 | -0.1953387 | -0.5186177 | 0.0126568 | 0.0664331 | -1.0553703 | -0.1063223 | 0.0540204 | 1.3651621 | 0.3348835 | -0.5150288 | -0.0673262 | 0.3323468 | 0.6750739 | -0.2216213 | 0.5112707 | 0.8171864 | -0.4797633 |
democracy | -0.1139310 | -0.3491421 | -0.0302168 | 0.0581126 | -0.5130326 | -0.3618891 | 0.2366838 | 0.0612760 | -0.0664646 | -0.6326874 | 0.4105462 | 0.5310218 | 0.2480130 | 0.7982512 | 0.5223246 | -0.2394029 | -0.4604727 | 0.1140813 | -0.1784561 | 0.2173720 | 0.7119586 | 0.4361755 | -0.0084432 | -1.1038426 | -0.5087239 | 0.0981672 | 0.6296581 | 0.1140319 | 0.1820886 | 0.1653642 | -0.0158409 | -0.5698087 | -0.6532357 | -0.1247795 | 0.3182479 | -0.1431068 | 0.2692345 | 0.4700592 | -0.1118507 | -0.0588129 | -0.2848716 | -0.4341126 | -0.1630735 | -0.3011917 | 0.2812844 | 0.0344683 | -0.0609784 | 0.8127626 | 0.1111385 | -0.1403709 | 0.1411283 | -0.2212206 | -0.0926750 | 0.0001322 | 1.1118620 | -0.0890517 | 0.2174105 | 0.5390585 | -0.2850934 | 0.0146192 | 0.3395877 | -0.6936893 | 0.1120022 | -0.8893632 | 0.1875660 | 0.1727779 | 0.9256687 | 0.1470129 | 0.1020919 | 0.1850827 | 0.3310835 | 0.6416769 | -0.2618445 | -0.9010756 | -0.3920506 | -0.6481385 | 0.9278553 | 0.2837365 | -0.0238192 | 0.1856862 | 0.4384185 | -0.1426451 | -0.1403193 | -0.0621569 | -0.0959625 | -0.4237133 | 0.3650434 | 0.0758955 | -0.5443707 | 0.7385178 | -0.0981297 | 0.4504162 | 0.1562996 | -0.5648318 | 0.3658088 | -0.1896669 | -0.2539289 | 0.0587047 | 0.0925498 | 0.8850234 |
justice | 1.3205603 | 0.4782971 | -0.2945293 | -0.1582906 | -0.3027340 | 0.3250320 | 0.4134305 | 0.0563676 | -0.1706448 | 0.0847796 | -0.6025830 | -0.1120429 | 0.8825506 | -0.3314126 | 0.3060865 | 0.2588186 | 0.1090525 | 0.2746653 | 0.1148561 | -0.3173074 | 0.2398353 | 0.2472418 | 0.1980879 | -0.1754475 | -0.3045942 | -0.2779845 | -0.1550678 | -0.0462797 | 0.2977303 | 0.0119498 | -0.0404475 | 0.2181525 | -0.5735036 | 0.2930813 | -0.2164784 | -0.0002422 | -0.7442827 | -1.1164809 | 0.4076308 | -0.6360653 | 0.2850729 | 1.4206514 | -0.6165990 | -0.2972571 | 1.0985000 | -0.2259105 | -0.2153638 | 0.6105820 | -0.4218787 | 0.5137426 | -0.5127449 | -0.3253621 | 0.2545246 | 0.6647953 | -0.8559284 | 0.2234756 | 0.2255488 | 0.2058525 | -0.2573327 | 0.7802072 | -0.3983821 | 0.1206354 | 0.4078982 | -0.0435164 | 1.0833050 | 0.9459490 | -0.7741399 | 0.0506848 | -0.3715400 | 1.5446490 | 0.2420095 | -0.6337935 | -0.1363183 | 0.0474556 | 0.2131962 | 0.0072489 | 0.2557982 | -0.4845407 | -1.1207439 | 0.1718878 | 0.9994394 | 0.3859515 | -0.3079891 | -0.6769921 | -0.7631164 | 0.4015137 | 0.2904308 | -0.3051606 | -0.5609063 | 0.3212380 | 0.5495811 | -0.3923484 | -0.9684743 | -0.2597839 | 0.4456711 | -0.0017392 | 0.0577903 | 0.3166950 | -0.0890987 | 0.3606351 |
freedom | -0.9758872 | -0.0483266 | 0.5723640 | -0.2906176 | 0.0309650 | 0.0210083 | -0.0531846 | -0.1829408 | 0.1280572 | -0.4474409 | -0.2491306 | 0.0101692 | 0.2632806 | -0.4737616 | 0.4049551 | 0.6044812 | 0.0447891 | -0.4044132 | 0.3174958 | 0.9714692 | 0.1844219 | 0.1304695 | 0.2516911 | -0.2192965 | -0.7567427 | -0.1621473 | 0.7687256 | 0.1118473 | 0.4346112 | 0.3893186 | -0.7614122 | -0.3057235 | -0.6198806 | 0.2974233 | -0.7395588 | -1.3266038 | 0.2295608 | 0.3590204 | -0.0293640 | 0.3850306 | 0.1476187 | -0.0248723 | -0.4552377 | -0.1549678 | 0.4255162 | -0.0361901 | -0.6686361 | 0.3994660 | 0.1989528 | -0.2814545 | -0.0974256 | -0.2685177 | -0.4628228 | -1.4965588 | 0.5921433 | 0.2735399 | 0.8204858 | -0.7702816 | 0.0700059 | 0.1117551 | 0.1903634 | 0.8904612 | -0.6708447 | -0.1990541 | 0.1025740 | -0.4836443 | 0.3604871 | 0.6905748 | -0.2610917 | 0.7692295 | 0.7514672 | -0.7845639 | -0.7135853 | 0.2549577 | 0.3096264 | 0.1981537 | -0.2940688 | -0.3303469 | 0.3190798 | 0.6323002 | 0.4649938 | -0.3265512 | 0.7423137 | 0.1736426 | 0.1679425 | 0.5389774 | 0.2993193 | 0.7665718 | -0.6256936 | 0.4890951 | -0.6993064 | 0.4530404 | -0.4507098 | 0.0981348 | 0.0674675 | -0.2983423 | 0.2501080 | -0.2745000 | 0.3139165 | -0.2723636 |
independence | 0.2256531 | 1.0150124 | -0.3070521 | -0.1539596 | 0.4921132 | 0.0099673 | 0.6999894 | 0.5137481 | 0.7720840 | -0.3073462 | -0.4588363 | -0.0492247 | -0.0881307 | 1.0326173 | 0.1713335 | -0.4216821 | 0.5931731 | -0.1176083 | 0.4697768 | 0.1935990 | 0.3649809 | -0.0529658 | 0.3100830 | 0.0429115 | -0.3644657 | 0.8660627 | -0.7539011 | 0.0777666 | -0.1445115 | 0.6596922 | -0.2167226 | -0.4489493 | 0.1793150 | 0.6353878 | 0.4680334 | -0.2557805 | 0.2829012 | -0.9582570 | 0.1420597 | -0.4017733 | 0.5045108 | 0.3136754 | -0.2834175 | -0.2978825 | 0.1924060 | -0.4647861 | -0.5146586 | -0.8607032 | -0.2949883 | -0.1089756 | -0.5974052 | 0.3850957 | -0.2536941 | 0.6269122 | 0.0388140 | -0.2391959 | 0.5349864 | 0.9856966 | 0.3284889 | -0.1479923 | 0.2271516 | -1.0836656 | -0.6377266 | -0.2889705 | 0.6887094 | -0.1041094 | 0.6100079 | -0.3813571 | -0.2832932 | -0.6331540 | 0.8097350 | 0.1776102 | 1.3003847 | -0.0204378 | 0.0611318 | -0.8648971 | -0.4942458 | -0.1591714 | -0.8316659 | -0.1391127 | -0.2808507 | 0.2986309 | -0.4739119 | 0.3802865 | -0.9212319 | 0.4912192 | -0.0578424 | 0.7464743 | -0.2355124 | 0.3479696 | 0.2826111 | 0.1943578 | -0.1462496 | 0.5105316 | -0.1203944 | -0.4463011 | -0.2325612 | 1.0852533 | 0.4308586 | -0.5890007 |
freedom of speech | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
library(dplyr)
library(stringi)
# Build the list of unique values that participants wrote, one row per value.
df_valuelist <- df_amd %>%
  select(value) %>%
  rename(word = value) %>%
  # Strip a single trailing space first, so that "equality " and "equality"
  # collapse into one entry when we de-duplicate.
  mutate(word = stri_replace_last_regex(word, " $", "")) %>%
  distinct()
Ok, we have a list of values. It’s not perfect, but with enough data, I think the noise will wash away. Now, let’s add vectors to the words on the list.
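library(dplyr)
library(stringr)
# Turn the embedding matrix into a data frame with one row per term.
# n-gram terms are joined with "_" in the vocabulary, so swap in spaces
# to match how participants wrote them.
df_valuevectors <- data.frame(word_vectors) %>%
  mutate(word = rownames(word_vectors),
         word = str_replace_all(word, "_", " ")) %>%
  group_by(word) %>%
  slice(1) %>%
  ungroup() %>%
  select(word, everything()) %>%
  # Keep every value on our list, even those without a vector (all-NA rows).
  right_join(df_valuelist, by = "word") %>%
  # Drop entries shorter than three characters.
  mutate(word = ifelse(str_length(word) < 3, NA, word)) %>%
  distinct() %>%
  filter(!is.na(word))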
Cool. Now, let’s add the vectors to our data and compute, for each participant, the cosine similarity between perspectives.
df_amd <- df_amd %>%
left_join(df_valuevectors %>%
rename(value = word),by = "value")
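library(dplyr)
library(tidyr)
library(lsa) # assumed source of cosine(); use whichever package was actually loaded
# Average the word vectors within each participant x perspective once, up front.
df_means <- df_amd %>%
  select(PID, type, value, X1:X100) %>%
  group_by(PID, type) %>%
  summarise_at(vars(X1:X100), function(x) mean(x, na.rm = TRUE)) %>%
  ungroup()
# Placeholder first row (PID = -999); it is filtered out when we join back.
# Note: despite the dist_ prefix, these columns hold cosine similarities.
df_distances = tibble(PID = -999, dist_paper_practice = 0, dist_paper_ideal = 0, dist_prac_ideal = 0)
PIDs = unique(df_amd$PID)
for (i in PIDs) {
  # Build a 100 x 3 matrix (one column per perspective), then get the
  # pairwise cosine similarity between its columns.
  mt_cosine <- df_means %>%
    filter(PID == i) %>%
    select(type, X1:X100) %>%
    pivot_longer(X1:X100,
                 names_to = "names",
                 values_to = "values") %>%
    pivot_wider(names_from = "type",
                values_from = "values") %>%
    select(-names) %>%
    as.matrix() %>%
    cosine()
  cosine_ideal_paper = mt_cosine["ideal", "paper"]
  cosine_ideal_prac = mt_cosine["ideal", "prac"]
  cosine_paper_prac = mt_cosine["paper", "prac"]
  current_scores = tibble(PID = i,
                          dist_paper_practice = cosine_paper_prac,
                          dist_paper_ideal = cosine_ideal_paper,
                          dist_prac_ideal = cosine_ideal_prac)
  df_distances <- df_distances %>%
    bind_rows(current_scores)
}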
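To make the loop's logic concrete: for each participant, it averages the word vectors within each perspective and then takes the cosine similarity between those averages. Here is a base-R sketch on made-up 2-dimensional vectors; `toy`, `centroids`, and `cosine_sim` are illustrative names, not objects from the script.

```r
# One hypothetical participant: two values per perspective, 2-d vectors.
toy <- data.frame(
  type = c("paper", "paper", "prac", "prac", "ideal", "ideal"),
  X1   = c(1, 3, 0, 0, 2, 2),
  X2   = c(0, 0, 1, 3, 2, 2)
)

# Average each dimension within a perspective (the "centroid").
centroids <- aggregate(cbind(X1, X2) ~ type, data = toy, FUN = mean)
rownames(centroids) <- centroids$type
centroids <- as.matrix(centroids[, c("X1", "X2")])

# Cosine similarity: dot product divided by the product of the vector norms.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

sim_paper_prac  <- cosine_sim(centroids["paper", ], centroids["prac", ])
sim_ideal_paper <- cosine_sim(centroids["ideal", ], centroids["paper", ])
```

Values near 1 mean the two lists point in the same direction in embedding space. Despite the `dist_` prefix on the column names, higher means more similar.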
Combine back in:
df_amd_inddiff <- df_amd_inddiff %>%
left_join(df_distances %>%
filter(PID != -999),by = "PID")
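Let’s take a couple of extreme PIDs and see if they pass the smell test. I’ll look at paper vs. practice: top 5 cosine similarity vs. bottom 5 cosine similarity. Keep in mind that dist_paper_practice holds the cosine similarity, so low values mean the two lists are far apart. Let’s see.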
PID | dist_paper_practice | cosine_similarity | type | value |
---|---|---|---|---|
17 | -0.1968922 | low | paper | freedom |
17 | -0.1968922 | low | paper | independence |
17 | -0.1968922 | low | paper | democracy |
17 | -0.1968922 | low | paper | liberty |
17 | -0.1968922 | low | paper | life |
17 | -0.1968922 | low | paper | persuit of happiness |
17 | -0.1968922 | low | prac | obedience to authority |
17 | -0.1968922 | low | prac | accumulation of money |
17 | -0.1968922 | low | prac | gathering of power to oneself |
17 | -0.1968922 | low | prac | screw the other guy |
17 | -0.1968922 | low | prac | elections and votes |
PID | dist_paper_practice | cosine_similarity | type | value |
---|---|---|---|---|
86 | -0.1885002 | low | paper | all men are equal |
86 | -0.1885002 | low | paper | the right to bear arms |
86 | -0.1885002 | low | paper | no taxaation without representation |
86 | -0.1885002 | low | paper | the right to vote |
86 | -0.1885002 | low | paper | trial by a jury of his peers |
86 | -0.1885002 | low | prac | right to bear arms |
86 | -0.1885002 | low | prac | trial by jury of one’s peers |
86 | -0.1885002 | low | prac | rule by law |
86 | -0.1885002 | low | prac | free trade |
86 | -0.1885002 | low | prac | foreign aid |
86 | -0.1885002 | low | prac | free speech |
PID | dist_paper_practice | cosine_similarity | type | value |
---|---|---|---|---|
13 | -0.1083258 | low | paper | freedom |
13 | -0.1083258 | low | paper | liberty |
13 | -0.1083258 | low | paper | democracy |
13 | -0.1083258 | low | paper | vote |
13 | -0.1083258 | low | paper | bear arms |
13 | -0.1083258 | low | paper | speech |
13 | -0.1083258 | low | prac | money |
13 | -0.1083258 | low | prac | power |
13 | -0.1083258 | low | prac | greed |
13 | -0.1083258 | low | prac | entitlement |
13 | -0.1083258 | low | prac | division |
PID | dist_paper_practice | cosine_similarity | type | value |
---|---|---|---|---|
53 | 1 | high | paper | freedom of speech |
53 | 1 | high | paper | freedom to practice religion or not |
53 | 1 | high | paper | no search and seizure without warrant |
53 | 1 | high | paper | right to bear arms in a well-regulated militia |
53 | 1 | high | paper | right to be represented in legislative body |
53 | 1 | high | prac | freedom of speech |
53 | 1 | high | prac | taxation without representation in some places |
53 | 1 | high | prac | right to be taken advantage of by corporations |
53 | 1 | high | prac | wealth is 9/10ths of the law |
53 | 1 | high | prac | right to lobby legislators with re-election funds |
Oh, looks like this one is only similar because none of these values have vectors: they’re all NAs. We’re gonna have to fix that.
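One possible fix, offered as a sketch rather than as what the project ended up doing, is to fall back on the average of the single-word vectors when a multi-word value has no vector of its own. The lookup table `toy_vectors` and the helper `phrase_vector` below are hypothetical:

```r
# Toy 2-dimensional "embeddings"; rows are single words.
toy_vectors <- rbind(
  freedom = c(1, 0),
  speech  = c(0, 1)
)

# Return the phrase's own vector if present; otherwise average the vectors
# of its words that are in the vocabulary; all-NA if none are.
phrase_vector <- function(phrase, vectors) {
  if (phrase %in% rownames(vectors)) return(vectors[phrase, ])
  tokens <- strsplit(phrase, " ", fixed = TRUE)[[1]]
  known  <- tokens[tokens %in% rownames(vectors)]
  if (length(known) == 0) return(rep(NA_real_, ncol(vectors)))
  colMeans(vectors[known, , drop = FALSE])
}

vec_fos  <- phrase_vector("freedom of speech", toy_vectors)
vec_none <- phrase_vector("screw the other guy", toy_vectors)
```

Here "freedom of speech" ends up halfway between "freedom" and "speech", while a phrase with no known words stays NA. It would probably also make sense to record the similarity as NA whenever a perspective has no usable vectors at all, so that cases like PID 53 do not show up as a perfect match.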
PID | dist_paper_practice | cosine_similarity | type | value |
---|---|---|---|---|
68 | 0.9535225 | high | paper | greed |
68 | 0.9535225 | high | paper | capitalism |
68 | 0.9535225 | high | paper | war |
68 | 0.9535225 | high | paper | drama |
68 | 0.9535225 | high | paper | entertainment |
68 | 0.9535225 | high | paper | freedom |
68 | 0.9535225 | high | prac | greed |
68 | 0.9535225 | high | prac | war |
68 | 0.9535225 | high | prac | capitalism |
68 | 0.9535225 | high | prac | drama |
68 | 0.9535225 | high | prac | freedom |
PID | dist_paper_practice | cosine_similarity | type | value |
---|---|---|---|---|
44 | 0.9266478 | high | paper | freedom |
44 | 0.9266478 | high | paper | democracy |
44 | 0.9266478 | high | paper | justice |
44 | 0.9266478 | high | paper | individualism |
44 | 0.9266478 | high | paper | equality |
44 | 0.9266478 | high | paper | self-government |
44 | 0.9266478 | high | prac | individualism |
44 | 0.9266478 | high | prac | self-government |
44 | 0.9266478 | high | prac | democracy |
44 | 0.9266478 | high | prac | freedom |
44 | 0.9266478 | high | prac | justice |
Ok, these look a little better.
How about we try to plot these distributions, first overall and then sliced by ideology?
df_amd_inddiff %>%
select(PID,ideo,dist_paper_practice:dist_prac_ideal) %>%
pivot_longer(-c(PID,ideo),
names_to = "names",
values_to = "values") %>%
filter(!is.na(values)) %>%
ggplot(aes(x = values,fill = names)) +
geom_histogram(bins = 30) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
legend.position = "none") +
facet_wrap(~names,nrow = 3)
df_amd_inddiff %>%
select(PID,dist_paper_practice:dist_prac_ideal) %>%
left_join(df_amd_ideo %>%
filter(ideo == "Democratic Socialism" |
ideo == "Conservatism" |
ideo == "Liberalism" |
ideo == "Progressivism" |
ideo == "Libertarianism" |
ideo == "Right-Wing Nationalism"),by = "PID") %>%
select(-ideo_score) %>%
pivot_longer(-c(PID,ideo),
names_to = "names",
values_to = "distance") %>%
filter(!is.na(distance)) %>%
filter(!is.na(ideo)) %>%
ggplot(aes(x = distance,fill = names)) +
geom_histogram(bins = 30) +
theme(panel.grid.major = element_line(color = "grey66"),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
axis.line = element_line(color = "grey66"),
legend.position = "none") +
facet_grid(ideo~names)
## Warning in left_join(., df_amd_ideo %>% filter(ideo == "Democratic Socialism" | : Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 5 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
## warning.
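The warning appears because participants could select more than one ideology, so a single PID can match several rows in `df_amd_ideo`, and the join repeats that participant's scores once per ideology. A small base-R illustration with made-up data (`merge()` stands in for `left_join()` here, and both data frames are hypothetical):

```r
# Hypothetical data: participant 1 selected two ideologies.
scores <- data.frame(PID = c(1, 2), dist_paper_practice = c(0.9, 0.2))
ideo   <- data.frame(PID  = c(1, 1, 2),
                     ideo = c("Liberalism", "Progressivism", "Conservatism"))

# all.x = TRUE keeps every row of scores, like a left join.
joined <- merge(scores, ideo, by = "PID", all.x = TRUE)
print(joined)
```

That is presumably fine for a plot faceted by ideology, but it does mean the same person contributes to more than one ideology panel.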
Alright, now we’ll take an average of each dimension per participant per perspective and then get difference scores between perspectives for each participant.
df_amd_diffscores <- df_amd %>%
group_by(PID,type) %>%
summarise_at(vars(X1:X100),function(x){mean(x,na.rm = T)}) %>%
ungroup() %>%
pivot_longer(-c(PID,type),
names_to = "dim",
values_to = "values") %>%
filter(!is.na(values)) %>%
#mutate(dim_type = paste0(type,"_",names)) %>%
pivot_wider(names_from = type,
values_from = values) %>%
mutate(prac_minus_paper = prac - paper,
ideal_minus_paper = ideal - paper,
ideal_minus_prac = ideal - prac) %>%
group_by(PID) %>%
summarise(prac_minus_paper = mean(prac_minus_paper,na.rm = T),
ideal_minus_paper = mean(ideal_minus_paper,na.rm = T),
ideal_minus_prac = mean(ideal_minus_prac,na.rm = T)) %>%
ungroup()
Hmm, let’s take a look:
df_amd_diffscores %>%
pivot_longer(-PID,
names_to = "names",
values_to = "values") %>%
filter(!is.na(values)) %>%
ggplot(aes(x = values,fill = names)) +
geom_histogram(bins = 30) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
legend.position = "none") +
facet_wrap(~names,nrow = 3)
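Part of why this plot is hard to read may be that averaging signed differences across 100 dimensions lets positive and negative differences cancel out, and the individual embedding dimensions have no fixed meaning to begin with. The toy example below (made-up 4-dimensional averages, hypothetical names) contrasts the mean signed difference with the Euclidean distance between the same two vectors:

```r
# Hypothetical per-perspective averages for one participant.
paper <- c(0.5, -0.5, 0.2, -0.2)
prac  <- c(-0.5, 0.5, -0.2, 0.2)

# What the pipeline above computes: the mean of the dimension-wise differences.
mean_signed_diff <- mean(prac - paper)

# An alternative that cannot cancel out: the Euclidean distance.
euclid_dist <- sqrt(sum((prac - paper)^2))
```

In this example the two vectors point in opposite directions, yet the mean signed difference is zero, while the Euclidean distance is clearly positive.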
Umm, not sure what this tells us, tbh. But basically, we can use the difference scores to predict different things, or see what predicts them. Let’s see if things like ideology predict difference scores:
df_amd_diffscores %>%
left_join(df_amd_ideo %>%
filter(ideo == "Democratic Socialism" |
ideo == "Conservatism" |
ideo == "Liberalism" |
ideo == "Progressivism" |
ideo == "Libertarianism" |
ideo == "Right-Wing Nationalism"),by = "PID") %>%
select(-ideo_score) %>%
pivot_longer(-c(PID,ideo),
names_to = "names",
values_to = "values") %>%
filter(!is.na(values)) %>%
filter(!is.na(ideo)) %>%
ggplot(aes(x = values,fill = names)) +
geom_histogram(bins = 30) +
theme(panel.grid.major = element_line(color = "grey66"),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
axis.line = element_line(color = "grey66"),
legend.position = "none") +
facet_grid(ideo~names)
## Warning in left_join(., df_amd_ideo %>% filter(ideo == "Democratic Socialism" | : Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 5 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
## warning.