Positive Samples

Throughout this project, we have been trying to capture euphemisms for socioeconomic inequality in diplomatic texts. This is a difficult issue to triangulate, as it is likely characterized by both absence and misdirection. There are fewer ways to talk about something directly than there are ways to avoid talking about it.

The aim of this section is to use external text samples that are assumed to embody the euphemization of inequality to find instances of euphemization in United Nations General Assembly (UNGA) speeches. If these external documents embody the concepts of interest, then analyzing their features will indicate what euphemization looks like in the UNGA speeches. I will refer to these external documents as positive samples.

There are three lines of inquiry to pursue using the positive samples. The first line narrows the pool of documents to those most likely to euphemize socioeconomic inequality. To find these documents, I computed the document-level cosine similarity between each UNGA speech and a positive sample (the Reagan speech in this case). Speeches with higher similarity scores are treated as more likely to contain euphemization of socioeconomic inequality.
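As an illustration of the similarity measure, the following is a minimal base R sketch on a toy corpus. The three texts, the whitespace tokenizer, and the raw term counts are assumptions made for the example; the scores reported in this section may rest on different preprocessing and weighting.

```r
# Toy corpus (hypothetical text): the first entry stands in for the positive sample.
docs <- c(
  reference = "growth trade investment free markets growth",
  speech_a  = "free trade and investment drive growth",
  speech_b  = "disarmament peace security council"
)

# Tokenize on whitespace and build a document-term count matrix
# (rows = documents, columns = vocabulary terms).
tokens <- strsplit(tolower(docs), "\\s+")
vocab  <- sort(unique(unlist(tokens)))
dtm    <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))
rownames(dtm) <- names(docs)

# Cosine similarity between two term-count vectors.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Similarity of every speech to the reference document.
cos_ref <- apply(dtm[-1, , drop = FALSE], 1, cosine_sim, b = dtm["reference", ])
sort(cos_ref, decreasing = TRUE)
```

In this toy case speech_a shares vocabulary with the reference and scores well above speech_b, which shares none and scores zero.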

The second line involves content analysis on the pool of documents suspected to euphemize socioeconomic inequality. We are particularly interested in focal terms and n-grams used to perform this euphemization. Given previous results, we should devote specific attention to the WEOG and GRULAC speeches, especially with regard to their points of difference.

The third line involves using the focal terms/n-grams as the basis for analysis using Concept Mover's Distance (CMD) and sentiment analysis. Recall that CMD can be used to gauge the degree of engagement with focal concepts, and sentiment analysis can be used to build a sentiment profile of specific documents. Together, these two metrics should suffice to produce a final list of documents that structure the field of engagement with socioeconomic inequality.

1) Document-level Cosine Similarity

Descriptive Plots - Cosine Similarity to Reagan Speech

The following plots depict the document-level cosine similarity to the 1982 Reagan Speech in Cancun.

# Figure 1  ----
ungdc18 %>%
  ggplot(mapping = aes(x = year, y = cos_reagan)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Document-level Cosine Similarity to Reagan Speech", x = "year", y = "cosine similarity", caption = "Figure 1")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

# Interpretation: This figure shows a relative peak in similarity to the Reagan speech in the mid-1980s, followed by a slight decline. The extreme outlier is the 1981 US speech, which shares a high degree of verbatim overlap with the template document. Including this outlier compresses the scale and diminishes the interpretability of the figure, as its similarity score is much higher than the second-highest score.
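One way to check how much the outlier drives the scale is to drop it before plotting. The sketch below is illustrative only: the toy data frame is an assumption that copies four rows from the table of exemplary documents in this section, and its column names mirror those of ungdc18.

```r
# Toy data frame with the four highest-scoring documents from the table of exemplary documents.
toy <- data.frame(
  doc_id     = c("USA_36_1981.txt", "GBR_41_1986.txt", "MEX_37_1982.txt", "ECU_38_1983.txt"),
  year       = c(1981, 1986, 1982, 1983),
  cos_reagan = c(0.1754281, 0.0935013, 0.0893229, 0.0892983)
)

# Drop the single most similar document (the near-verbatim 1981 US speech).
no_outlier <- toy[toy$cos_reagan < max(toy$cos_reagan), ]

# The remaining scores span a much narrower range than the full set.
range(toy$cos_reagan)
range(no_outlier$cos_reagan)
```

The filtered data frame could then be passed to the same ggplot call as Figure 1. An alternative that keeps every point in the smoother is to zoom the y-axis with coord_cartesian(ylim = ...).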

# Figure 2 ---- 
ungdc18[ which(ungdc18$UN_REGION != "OTHER"),] %>%
  ggplot(mapping = aes(x = year, y = cos_reagan, colour = UN_REGION)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Document-level Cosine Similarity to Reagan Speech",subtitle = "Subset by UN_REGION", x = "year", y = "cosine similarity", caption = "Figure 2")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

# Interpretation: When subset by UN_REGION, the highest degree of similarity appears among WEOG and GRULAC speeches. This is slightly evident in the shapes of the curves, but more so in the colored points that rise above the fray: most of the highest-scoring points belong to WEOG and GRULAC.

# Figure 3 ----
ungdc18 %>%
  group_by(year) %>%
  summarise(year_mean = mean(cos_reagan)) %>%
  ggplot(mapping = aes(x = year, y = year_mean)) +
  geom_smooth() +
  labs(title = "Document-level Cosine Similarity to Reagan Speech",subtitle = "Smoothed to Yearly Average", x = "year", y = "average cosine similarity", caption = "Figure 3")
## `summarise()` ungrouping output (override with `.groups` argument)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# Interpretation: This figure depicts the yearly average cosine similarity to the Reagan speech. Averaging by year changes the scale of the y-axis and makes it appear that a rebound begins in the mid-2000s.

# Figure 4 ----
ungdc18[ which(ungdc18$UN_REGION != "OTHER"),] %>%
  group_by(year, UN_REGION) %>%
  summarise(year_mean = mean(cos_reagan)) %>%
  ggplot(mapping = aes(x = year, y = year_mean, colour = UN_REGION)) +
  geom_smooth() +
  labs(title = "Document-level Cosine Similarity to Reagan Speech",subtitle = "Smoothed to Yearly Average; Subset by UN_REGION", x = "year", y = "average cosine similarity", caption = "Figure 4")
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# Interpretation: This again demonstrates the relatively higher degree of similarity among WEOG and GRULAC countries. WEOG also appears to return more sharply to similarity with the Reagan speech, especially during the late 2000s.

Overall Findings:

These figures depict document-level cosine similarity to the 1982 Reagan speech in Cancun. If we assume that this positive sample embodies the euphemization of socioeconomic inequality, these cosine similarity measures give us a place to look for instances of euphemization.

Instances of euphemization are most likely to occur among WEOG countries during the mid-1980s, with a return from the late 2000s until the present. GRULAC countries also appear to share a similar vocabulary with the WEOG countries, though previous findings suggest that they may take a more critical tone.

Exemplary Documents:

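# Select columns by name (as shown in the table header) rather than by position
top_reagan <- ungdc18 %>%
  select(doc_id, iso, year, UN_REGION, cos_reagan) %>%
  arrange(desc(cos_reagan))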

kable(head(top_reagan, n = 250))
doc_id iso year UN_REGION cos_reagan
USA_36_1981.txt USA 1981 WEOG 0.1754281
GBR_41_1986.txt GBR 1986 WEOG 0.0935013
MEX_37_1982.txt MEX 1982 GRULAC 0.0893229
ECU_38_1983.txt ECU 1983 GRULAC 0.0892983
USA_25_1970.txt USA 1970 WEOG 0.0878042
MEX_40_1985.txt MEX 1985 GRULAC 0.0792422
SUR_36_1981.txt SUR 1981 GRULAC 0.0784855
GBR_36_1981.txt GBR 1981 WEOG 0.0783948
USA_40_1985.txt USA 1985 WEOG 0.0782623
VEN_36_1981.txt VEN 1981 GRULAC 0.0782170
USA_27_1972.txt USA 1972 WEOG 0.0774915
USA_43_1988.txt USA 1988 WEOG 0.0762043
JAM_47_1992.txt JAM 1992 GRULAC 0.0755327
GUY_29_1974.txt GUY 1974 GRULAC 0.0732834
AUS_36_1981.txt AUS 1981 WEOG 0.0726000
BRA_37_1982.txt BRA 1982 GRULAC 0.0714520
USA_49_1994.txt USA 1994 WEOG 0.0713861
MUS_40_1985.txt MUS 1985 AFRICA 0.0712036
GBR_40_1985.txt GBR 1985 WEOG 0.0709652
ATG_71_2016.txt ATG 2016 GRULAC 0.0701165
FIN_41_1986.txt FIN 1986 WEOG 0.0700443
USA_29_1974.txt USA 1974 WEOG 0.0690225
PER_40_1985.txt PER 1985 GRULAC 0.0689318
COL_32_1977.txt COL 1977 GRULAC 0.0689110
USA_31_1976.txt USA 1976 WEOG 0.0683802
VEN_43_1988.txt VEN 1988 GRULAC 0.0682494
BRA_40_1985.txt BRA 1985 GRULAC 0.0679502
CIV_36_1981.txt CIV 1981 AFRICA 0.0677668
FJI_36_1981.txt FJI 1981 ASIAPAC 0.0675230
SWZ_42_1987.txt SWZ 1987 AFRICA 0.0668327
MYS_29_1974.txt MYS 1974 ASIAPAC 0.0666352
KEN_36_1981.txt KEN 1981 AFRICA 0.0660604
VEN_40_1985.txt VEN 1985 GRULAC 0.0653701
SLV_40_1985.txt SLV 1985 GRULAC 0.0651700
JAM_27_1972.txt JAM 1972 GRULAC 0.0641814
TUR_41_1986.txt TUR 1986 ASIAPAC 0.0628166
DMA_36_1981.txt DMA 1981 GRULAC 0.0625899
THA_25_1970.txt THA 1970 ASIAPAC 0.0624449
USA_33_1978.txt USA 1978 WEOG 0.0624196
USA_41_1986.txt USA 1986 WEOG 0.0623864
IND_37_1982.txt IND 1982 ASIAPAC 0.0621173
JAM_39_1984.txt JAM 1984 GRULAC 0.0617113
DOM_40_1985.txt DOM 1985 GRULAC 0.0617065
MEX_30_1975.txt MEX 1975 GRULAC 0.0612485
GRD_35_1980.txt GRD 1980 GRULAC 0.0608898
NZL_35_1980.txt NZL 1980 WEOG 0.0607765
URY_30_1975.txt URY 1975 GRULAC 0.0605374
BEL_36_1981.txt BEL 1981 WEOG 0.0602507
GUY_44_1989.txt GUY 1989 GRULAC 0.0599380
IND_42_1987.txt IND 1987 ASIAPAC 0.0598933
CAN_36_1981.txt CAN 1981 WEOG 0.0598189
BGD_46_1991.txt BGD 1991 ASIAPAC 0.0596244
PHL_26_1971.txt PHL 1971 ASIAPAC 0.0594615
ROU_41_1986.txt ROU 1986 EASTEUROPE 0.0589347
JAM_38_1983.txt JAM 1983 GRULAC 0.0586186
SOM_26_1971.txt SOM 1971 AFRICA 0.0585580
NER_36_1981.txt NER 1981 AFRICA 0.0585361
LBR_27_1972.txt LBR 1972 AFRICA 0.0584763
IRN_29_1974.txt IRN 1974 ASIAPAC 0.0583286
LAO_49_1994.txt LAO 1994 ASIAPAC 0.0580114
GTM_41_1986.txt GTM 1986 GRULAC 0.0578108
DNK_40_1985.txt DNK 1985 WEOG 0.0575797
IND_40_1985.txt IND 1985 ASIAPAC 0.0571793
BRA_38_1983.txt BRA 1983 GRULAC 0.0567973
ZAF_27_1972.txt ZAF 1972 AFRICA 0.0566994
BGD_40_1985.txt BGD 1985 ASIAPAC 0.0564679
CYP_43_1988.txt CYP 1988 ASIAPAC 0.0561653
FRA_39_1984.txt FRA 1984 WEOG 0.0560813
GBR_39_1984.txt GBR 1984 WEOG 0.0558125
HTI_41_1986.txt HTI 1986 GRULAC 0.0556034
ZAF_26_1971.txt ZAF 1971 AFRICA 0.0555637
BWA_41_1986.txt BWA 1986 AFRICA 0.0554320
BRA_36_1981.txt BRA 1981 GRULAC 0.0551938
AUS_42_1987.txt AUS 1987 WEOG 0.0551342
TGO_36_1981.txt TGO 1981 AFRICA 0.0551318
CAN_29_1974.txt CAN 1974 WEOG 0.0550837
MUS_39_1984.txt MUS 1984 AFRICA 0.0550325
NZL_71_2016.txt NZL 2016 WEOG 0.0548733
ITA_41_1986.txt ITA 1986 WEOG 0.0548704
VAT_34_1979.txt VAT 1979 OTHER 0.0548480
JPN_40_1985.txt JPN 1985 ASIAPAC 0.0546678
USA_47_1992.txt USA 1992 WEOG 0.0546613
BFA_31_1976.txt BFA 1976 AFRICA 0.0545478
ATG_38_1983.txt ATG 1983 GRULAC 0.0544008
BRB_41_1986.txt BRB 1986 GRULAC 0.0543958
NZL_38_1983.txt NZL 1983 WEOG 0.0543622
USA_44_1989.txt USA 1989 WEOG 0.0543325
NPL_37_1982.txt NPL 1982 ASIAPAC 0.0541360
IND_41_1986.txt IND 1986 ASIAPAC 0.0541313
AUS_49_1994.txt AUS 1994 WEOG 0.0540912
BEL_33_1978.txt BEL 1978 WEOG 0.0540783
TUN_36_1981.txt TUN 1981 AFRICA 0.0538812
USA_39_1984.txt USA 1984 WEOG 0.0538793
CAN_38_1983.txt CAN 1983 WEOG 0.0538763
TGO_42_1987.txt TGO 1987 AFRICA 0.0536447
KWT_31_1976.txt KWT 1976 ASIAPAC 0.0534427
CUB_40_1985.txt CUB 1985 GRULAC 0.0534330
ITA_26_1971.txt ITA 1971 WEOG 0.0533262
PHL_33_1978.txt PHL 1978 ASIAPAC 0.0529900
UGA_45_1990.txt UGA 1990 AFRICA 0.0527434
MWI_41_1986.txt MWI 1986 AFRICA 0.0525763
MEX_34_1979.txt MEX 1979 GRULAC 0.0523724
USA_71_2016.txt USA 2016 WEOG 0.0523531
CAN_40_1985.txt CAN 1985 WEOG 0.0521708
AUS_48_1993.txt AUS 1993 WEOG 0.0520478
LCA_36_1981.txt LCA 1981 GRULAC 0.0519719
JAM_58_2003.txt JAM 2003 GRULAC 0.0519116
DOM_70_2015.txt DOM 2015 GRULAC 0.0518563
CIV_26_1971.txt CIV 1971 AFRICA 0.0518230
JAM_41_1986.txt JAM 1986 GRULAC 0.0517271
VEN_39_1984.txt VEN 1984 GRULAC 0.0515391
BRB_35_1980.txt BRB 1980 GRULAC 0.0514739
COL_41_1986.txt COL 1986 GRULAC 0.0514014
GMB_33_1978.txt GMB 1978 AFRICA 0.0513847
DOM_42_1987.txt DOM 1987 GRULAC 0.0512716
GBR_29_1974.txt GBR 1974 WEOG 0.0512672
NPL_36_1981.txt NPL 1981 ASIAPAC 0.0512095
MMR_29_1974.txt MMR 1974 ASIAPAC 0.0511581
NZL_33_1978.txt NZL 1978 WEOG 0.0510867
YEM_36_1981.txt YEM 1981 ASIAPAC 0.0508758
TTO_41_1986.txt TTO 1986 GRULAC 0.0507885
COM_43_1988.txt COM 1988 AFRICA 0.0507712
COG_41_1986.txt COG 1986 AFRICA 0.0507275
VEN_38_1983.txt VEN 1983 GRULAC 0.0506410
GBR_26_1971.txt GBR 1971 WEOG 0.0506141
TGO_41_1986.txt TGO 1986 AFRICA 0.0505888
MYS_39_1984.txt MYS 1984 ASIAPAC 0.0505798
FRA_31_1976.txt FRA 1976 WEOG 0.0505193
AUS_34_1979.txt AUS 1979 WEOG 0.0504201
TGO_26_1971.txt TGO 1971 AFRICA 0.0503779
WSM_36_1981.txt WSM 1981 ASIAPAC 0.0502982
FRA_40_1985.txt FRA 1985 WEOG 0.0502096
MYS_54_1999.txt MYS 1999 ASIAPAC 0.0500938
PHL_36_1981.txt PHL 1981 ASIAPAC 0.0500763
URY_38_1983.txt URY 1983 GRULAC 0.0500693
KEN_43_1988.txt KEN 1988 AFRICA 0.0500693
USA_64_2009.txt USA 2009 WEOG 0.0500445
CAN_27_1972.txt CAN 1972 WEOG 0.0500308
MLI_43_1988.txt MLI 1988 AFRICA 0.0500247
MMR_40_1985.txt MMR 1985 ASIAPAC 0.0498939
FRA_25_1970.txt FRA 1970 WEOG 0.0498047
IRL_40_1985.txt IRL 1985 WEOG 0.0497766
GUY_26_1971.txt GUY 1971 GRULAC 0.0496861
NZL_36_1981.txt NZL 1981 WEOG 0.0496750
IND_26_1971.txt IND 1971 ASIAPAC 0.0496577
NPL_31_1976.txt NPL 1976 ASIAPAC 0.0496178
BFA_29_1974.txt BFA 1974 AFRICA 0.0495953
USA_37_1982.txt USA 1982 WEOG 0.0494451
TUR_36_1981.txt TUR 1981 ASIAPAC 0.0494094
BGD_41_1986.txt BGD 1986 ASIAPAC 0.0493791
USA_32_1977.txt USA 1977 WEOG 0.0492658
GRC_38_1983.txt GRC 1983 WEOG 0.0492443
PHL_28_1973.txt PHL 1973 ASIAPAC 0.0492383
FJI_72_2017.txt FJI 2017 ASIAPAC 0.0492153
USA_73_2018.txt USA 2018 WEOG 0.0491806
CHL_44_1989.txt CHL 1989 GRULAC 0.0490360
LUX_40_1985.txt LUX 1985 WEOG 0.0489988
ATG_73_2018.txt ATG 2018 GRULAC 0.0489639
EGY_38_1983.txt EGY 1983 AFRICA 0.0489290
SDN_29_1974.txt SDN 1974 AFRICA 0.0489238
GUY_41_1986.txt GUY 1986 GRULAC 0.0489182
ARG_36_1981.txt ARG 1981 GRULAC 0.0488429
GUY_27_1972.txt GUY 1972 GRULAC 0.0487938
SLV_26_1971.txt SLV 1971 GRULAC 0.0487072
GHA_45_1990.txt GHA 1990 AFRICA 0.0486216
ZMB_27_1972.txt ZMB 1972 OTHER 0.0486216
ATG_51_1996.txt ATG 1996 GRULAC 0.0485684
PER_27_1972.txt PER 1972 GRULAC 0.0485435
GHA_41_1986.txt GHA 1986 AFRICA 0.0484089
DZA_27_1972.txt DZA 1972 AFRICA 0.0483825
BEL_27_1972.txt BEL 1972 WEOG 0.0483414
CIV_27_1972.txt CIV 1972 AFRICA 0.0483115
CAN_32_1977.txt CAN 1977 WEOG 0.0482575
SUR_31_1976.txt SUR 1976 GRULAC 0.0482444
ROU_36_1981.txt ROU 1981 EASTEUROPE 0.0481990
IND_36_1981.txt IND 1981 ASIAPAC 0.0481894
JAM_36_1981.txt JAM 1981 GRULAC 0.0481847
BEL_34_1979.txt BEL 1979 WEOG 0.0481348
BDI_29_1974.txt BDI 1974 AFRICA 0.0481183
GHA_37_1982.txt GHA 1982 AFRICA 0.0480892
DNK_41_1986.txt DNK 1986 WEOG 0.0480721
SLE_36_1981.txt SLE 1981 AFRICA 0.0479166
UGA_30_1975.txt UGA 1975 AFRICA 0.0478836
COL_40_1985.txt COL 1985 GRULAC 0.0478730
USA_46_1991.txt USA 1991 WEOG 0.0477955
PRT_36_1981.txt PRT 1981 WEOG 0.0477393
ETH_31_1976.txt ETH 1976 AFRICA 0.0477110
JPN_42_1987.txt JPN 1987 ASIAPAC 0.0476636
ATG_56_2001.txt ATG 2001 GRULAC 0.0476409
FJI_46_1991.txt FJI 1991 ASIAPAC 0.0476331
DEU_30_1975.txt DEU 1975 WEOG 0.0475797
KEN_41_1986.txt KEN 1986 AFRICA 0.0475375
ARE_40_1985.txt ARE 1985 ASIAPAC 0.0475272
DEU_57_2002.txt DEU 2002 WEOG 0.0474297
SWE_68_2013.txt SWE 2013 WEOG 0.0473760
GUY_39_1984.txt GUY 1984 GRULAC 0.0473381
URY_32_1977.txt URY 1977 GRULAC 0.0473212
AUS_32_1977.txt AUS 1977 WEOG 0.0472359
MWI_42_1987.txt MWI 1987 AFRICA 0.0472172
IND_43_1988.txt IND 1988 ASIAPAC 0.0471982
IRN_30_1975.txt IRN 1975 ASIAPAC 0.0470696
BLZ_44_1989.txt BLZ 1989 GRULAC 0.0470104
IDN_40_1985.txt IDN 1985 ASIAPAC 0.0469197
MUS_42_1987.txt MUS 1987 AFRICA 0.0469114
LUX_41_1986.txt LUX 1986 WEOG 0.0469058
DZA_30_1975.txt DZA 1975 AFRICA 0.0468384
CAN_34_1979.txt CAN 1979 WEOG 0.0467689
DEU_28_1973.txt DEU 1973 WEOG 0.0467403
SYC_33_1978.txt SYC 1978 AFRICA 0.0467260
GHA_25_1970.txt GHA 1970 AFRICA 0.0466784
USA_26_1971.txt USA 1971 WEOG 0.0466334
SLV_27_1972.txt SLV 1972 GRULAC 0.0466285
FRA_29_1974.txt FRA 1974 WEOG 0.0465657
KOR_47_1992.txt KOR 1992 ASIAPAC 0.0464465
COL_29_1974.txt COL 1974 GRULAC 0.0463792
GBR_43_1988.txt GBR 1988 WEOG 0.0462213
BFA_30_1975.txt BFA 1975 AFRICA 0.0462197
GTM_44_1989.txt GTM 1989 GRULAC 0.0462169
THA_29_1974.txt THA 1974 ASIAPAC 0.0461960
SEN_27_1972.txt SEN 1972 AFRICA 0.0461696
NGA_41_1986.txt NGA 1986 AFRICA 0.0459523
UZB_71_2016.txt UZB 2016 ASIAPAC 0.0459247
STP_31_1976.txt STP 1976 AFRICA 0.0458827
CUB_29_1974.txt CUB 1974 GRULAC 0.0458827
ECU_53_1998.txt ECU 1998 GRULAC 0.0458349
SWZ_35_1980.txt SWZ 1980 AFRICA 0.0457184
ARE_39_1984.txt ARE 1984 ASIAPAC 0.0456075
NOR_41_1986.txt NOR 1986 WEOG 0.0455772
NER_41_1986.txt NER 1986 AFRICA 0.0455661
LCA_45_1990.txt LCA 1990 GRULAC 0.0455514
BOL_40_1985.txt BOL 1985 GRULAC 0.0454508
ECU_44_1989.txt ECU 1989 GRULAC 0.0454422
GMB_41_1986.txt GMB 1986 AFRICA 0.0454334
DEU_40_1985.txt DEU 1985 WEOG 0.0453353
HTI_40_1985.txt HTI 1985 GRULAC 0.0451375
BOL_30_1975.txt BOL 1975 GRULAC 0.0450280
BEL_39_1984.txt BEL 1984 WEOG 0.0450224
ARG_40_1985.txt ARG 1985 GRULAC 0.0450170
PHL_40_1985.txt PHL 1985 ASIAPAC 0.0450055
BWA_43_1988.txt BWA 1988 AFRICA 0.0448609
DOM_29_1974.txt DOM 1974 GRULAC 0.0447929
ARE_42_1987.txt ARE 1987 ASIAPAC 0.0447888
GBR_52_1997.txt GBR 1997 WEOG 0.0447333
DEU_36_1981.txt DEU 1981 WEOG 0.0446781
YUG_40_1985.txt YUG 1985 OTHER 0.0446157
CYP_27_1972.txt CYP 1972 ASIAPAC 0.0445553
GRC_43_1988.txt GRC 1988 WEOG 0.0445421
CAN_43_1988.txt CAN 1988 WEOG 0.0444931
GIN_49_1994.txt GIN 1994 AFRICA 0.0444709
LSO_42_1987.txt LSO 1987 AFRICA 0.0443939
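
To summarize which regions and countries dominate the top of this ranking, the documents can be tallied by group. The following is a minimal sketch using only the ten highest-scoring rows of the table above, re-entered by hand as a toy data frame (an assumption made for illustration, not the full ranking).

```r
# The ten highest-scoring documents from the table above (iso and UN_REGION columns).
top10 <- data.frame(
  iso       = c("USA", "GBR", "MEX", "ECU", "USA", "MEX", "SUR", "GBR", "USA", "VEN"),
  UN_REGION = c("WEOG", "WEOG", "GRULAC", "GRULAC", "WEOG",
                "GRULAC", "GRULAC", "WEOG", "WEOG", "GRULAC")
)

# Count documents per region and per country, most frequent first.
region_counts  <- sort(table(top10$UN_REGION), decreasing = TRUE)
country_counts <- sort(table(top10$iso), decreasing = TRUE)

region_counts
country_counts
```

Among these ten documents, WEOG and GRULAC each account for five. The same tally could be run over the full ranking, for example with table(head(top_reagan, n = 250)$UN_REGION), to see how the balance shifts further down the list.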