The unvotes R package contains three datasets in the form of data frames (more precisely tbl_df objects, i.e. tibbles, which print in a more readable format). The first dataset, un_votes, holds the voting history of each country. Each row contains a country/vote pair:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(unvotes)
## Warning: package 'unvotes' was built under R version 3.3.2
## If you use data from the unvotes package, please cite the following:
##
## Erik Voeten "Data and Analyses of Voting in the UN General Assembly" Routledge Handbook of International Organization, edited by Bob Reinalda (published May 27, 2013)
str(un_votes)
## Classes 'tbl_df', 'tbl' and 'data.frame': 711275 obs. of 3 variables:
## $ rcid : atomic 3 3 3 3 3 3 3 3 3 3 ...
## ..- attr(*, "comment")= chr "rcid"
## $ country: chr "Egypt" "Honduras" "Costa Rica" "El Salvador" ...
## $ vote : Factor w/ 3 levels "abstain","no",..: 1 3 3 3 2 3 3 3 3 3 ...
un_votes
## # A tibble: 711,275 × 3
## rcid country vote
## <dbl> <chr> <fctr>
## 1 3 Egypt abstain
## 2 3 Honduras yes
## 3 3 Costa Rica yes
## 4 3 El Salvador yes
## 5 3 France no
## 6 3 Uruguay yes
## 7 3 Chile yes
## 8 3 Ecuador yes
## 9 3 Argentina yes
## 10 3 Haiti yes
## # ... with 711,265 more rows
The package also contains a dataset with information about each roll call vote, including the date, the description, and the resolution that was voted on:
un_roll_calls
## # A tibble: 5,356 × 9
## rcid session importantvote date unres amend para
## <dbl> <dbl> <dbl> <date> <chr> <dbl> <dbl>
## 1 3 1 0 1946-01-01 R/1/66 1 0
## 2 4 1 0 1946-01-02 R/1/79 0 0
## 3 5 1 0 1946-01-04 R/1/98 0 0
## 4 6 1 0 1946-01-04 R/1/107 0 0
## 5 7 1 0 1946-01-02 R/1/295 1 0
## 6 8 1 0 1946-01-05 R/1/297 1 0
## 7 9 1 0 1946-02-05 R/1/329 0 0
## 8 10 1 0 1946-02-05 R/1/361 1 1
## 9 11 1 0 1946-02-05 R/1/376 0 0
## 10 12 1 0 1946-02-06 R/1/394 1 1
## # ... with 5,346 more rows, and 2 more variables: short <chr>, descr <chr>
Finally, the un_roll_call_issues dataset links individual roll calls to the 6 general issues voted on in the UN General Assembly:
un_roll_call_issues
## # A tibble: 4,951 × 3
## rcid short_name issue
## <dbl> <chr> <chr>
## 1 30 me Palestinian conflict
## 2 34 me Palestinian conflict
## 3 77 me Palestinian conflict
## 4 9002 me Palestinian conflict
## 5 9003 me Palestinian conflict
## 6 9004 me Palestinian conflict
## 7 9005 me Palestinian conflict
## 8 9006 me Palestinian conflict
## 9 128 me Palestinian conflict
## 10 129 me Palestinian conflict
## # ... with 4,941 more rows
count(un_roll_call_issues, issue, sort = TRUE)
## # A tibble: 6 × 2
## issue n
## <chr> <int>
## 1 Palestinian conflict 1047
## 2 Colonialism 971
## 3 Human rights 901
## 4 Arms control and disarmament 859
## 5 Nuclear weapons and nuclear material 712
## 6 Economic development 461
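As a side note, count() with sort = TRUE is shorthand for a group_by/summarise/arrange chain. The following is a minimal sketch on made-up data; toy_issues is a hypothetical stand-in and not part of the unvotes package:

```r
library(dplyr)

# Hypothetical toy data standing in for un_roll_call_issues
toy_issues <- data.frame(
  rcid  = c(1, 2, 3, 4, 5, 6),
  issue = c("Colonialism", "Human rights", "Colonialism",
            "Human rights", "Colonialism", "Economic development"),
  stringsAsFactors = FALSE
)

# Shorthand form, as used in the text
counted <- count(toy_issues, issue, sort = TRUE)

# Spelled-out equivalent: group, count rows per group, sort descending
manual <- toy_issues %>%
  group_by(issue) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

counted
manual
```

Both objects should list the issues in the same order with the same counts.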
For more information on any of the individual datasets, use the help() command, for example help(un_votes).
To begin, we perform an inner join of the un_votes and un_roll_calls datasets on their shared rcid column. This gives us a single combined dataset with most of the information relevant to the exploratory analysis that follows.
joined <- inner_join(un_votes, un_roll_calls, by = "rcid")
joined
## # A tibble: 711,275 × 11
## rcid country vote session importantvote date unres amend
## <dbl> <chr> <fctr> <dbl> <dbl> <date> <chr> <dbl>
## 1 3 Egypt abstain 1 0 1946-01-01 R/1/66 1
## 2 3 Honduras yes 1 0 1946-01-01 R/1/66 1
## 3 3 Costa Rica yes 1 0 1946-01-01 R/1/66 1
## 4 3 El Salvador yes 1 0 1946-01-01 R/1/66 1
## 5 3 France no 1 0 1946-01-01 R/1/66 1
## 6 3 Uruguay yes 1 0 1946-01-01 R/1/66 1
## 7 3 Chile yes 1 0 1946-01-01 R/1/66 1
## 8 3 Ecuador yes 1 0 1946-01-01 R/1/66 1
## 9 3 Argentina yes 1 0 1946-01-01 R/1/66 1
## 10 3 Haiti yes 1 0 1946-01-01 R/1/66 1
## # ... with 711,265 more rows, and 3 more variables: para <dbl>,
## # short <chr>, descr <chr>
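The joined table has the same number of rows as un_votes (711,275), which suggests that every vote found a matching roll call. A minimal sketch of how an inner join treats unmatched keys, and how anti_join() can list them, using hypothetical toy tables (toy_votes and toy_roll_calls are made up for illustration):

```r
library(dplyr)

# Hypothetical toy tables; rcid 5 has no matching roll call
toy_votes <- data.frame(
  rcid    = c(3, 3, 4, 5),
  country = c("A", "B", "A", "B"),
  vote    = c("yes", "no", "yes", "abstain"),
  stringsAsFactors = FALSE
)
toy_roll_calls <- data.frame(
  rcid    = c(3, 4),
  session = c(1, 1)
)

# inner_join keeps only votes whose rcid appears in both tables
toy_joined <- inner_join(toy_votes, toy_roll_calls, by = "rcid")

# anti_join returns the votes that would be dropped by the inner join
unmatched <- anti_join(toy_votes, toy_roll_calls, by = "rcid")

nrow(toy_joined)
unmatched
```

If anti_join() returns zero rows on the real data, the inner join loses nothing.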
summarise(joined, total = n(), percent_yes = mean(vote == "yes"))
## # A tibble: 1 × 2
## total percent_yes
## <int> <dbl>
## 1 711275 0.7963952
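The percent_yes column works because a comparison such as vote == "yes" yields a logical vector, and mean() treats TRUE as 1 and FALSE as 0. Despite its name, the result is therefore a proportion between 0 and 1. A small sketch with made-up votes (toy_vote is hypothetical, not taken from the data):

```r
# Hypothetical votes with the same three levels as un_votes$vote
toy_vote <- factor(c("yes", "no", "yes", "abstain", "yes"),
                   levels = c("abstain", "no", "yes"))

is_yes    <- toy_vote == "yes"  # logical vector: TRUE where the vote was "yes"
share_yes <- mean(is_yes)       # TRUE counts as 1, FALSE as 0

is_yes
share_yes                       # 3 of the 5 votes are "yes"
```

Note that abstentions count against the share in the same way as "no" votes.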
by_country <- joined %>%
group_by(country) %>%
summarise(n_votes = n(),
percent_yes = mean(vote == "yes"))
# Print the by_country dataset
by_country
## # A tibble: 200 × 3
## country n_votes percent_yes
## <chr> <int> <dbl>
## 1 Afghanistan 4824 0.8381012
## 2 Albania 3363 0.7204877
## 3 Algeria 4374 0.8978052
## 4 Andorra 1410 0.6510638
## 5 Angola 2950 0.9223729
## 6 Antigua and Barbuda 2521 0.9170964
## 7 Argentina 5207 0.7743422
## 8 Armenia 1479 0.7592968
## 9 Australia 5245 0.5542421
## 10 Austria 4786 0.6320518
## # ... with 190 more rows
arrange(by_country, percent_yes)
## # A tibble: 200 × 3
## country n_votes percent_yes
## <chr> <int> <dbl>
## 1 Zanzibar 2 0.0000000
## 2 United States 5237 0.2850869
## 3 Palau 777 0.3063063
## 4 Israel 4790 0.3503132
## 5 Federal Republic of Germany 2151 0.3984193
## 6 Micronesia, Federated States of 1341 0.4131245
## 7 United Kingdom 5218 0.4269835
## 8 France 5171 0.4320248
## 9 Marshall Islands 1468 0.4788828
## 10 Belgium 5238 0.4925544
## # ... with 190 more rows
arrange(by_country, desc(percent_yes))
## # A tibble: 200 × 3
## country n_votes percent_yes
## <chr> <int> <dbl>
## 1 Seychelles 1698 0.9770318
## 2 Timor-Leste 697 0.9670014
## 3 Sao Tome and Principe 2329 0.9665092
## 4 Djibouti 3193 0.9564673
## 5 Guinea-Bissau 2933 0.9546539
## 6 Comoros 2435 0.9462012
## 7 Cabo Verde 3153 0.9454488
## 8 Mozambique 3306 0.9431337
## 9 Yemen 1527 0.9423707
## 10 Zimbabwe 2766 0.9421547
## # ... with 190 more rows
arrange(by_country, n_votes)
## # A tibble: 200 × 3
## country n_votes percent_yes
## <chr> <int> <dbl>
## 1 Zanzibar 2 0.0000000
## 2 Kiribati 93 0.8172043
## 3 South Sudan 96 0.6979167
## 4 Montenegro 558 0.6433692
## 5 Tuvalu 576 0.8246528
## 6 Nauru 606 0.6089109
## 7 Timor-Leste 697 0.9670014
## 8 Tonga 775 0.7303226
## 9 Palau 777 0.3063063
## 10 Switzerland 857 0.6569428
## # ... with 190 more rows
A few countries took part in far fewer votes than the rest: Zanzibar (2), Kiribati (93) and South Sudan (96). We leave these countries and their data out of the rest of the analysis by filtering the dataset so that only countries with more than 100 votes remain:
by_country %>%
filter( n_votes > 100) %>%
arrange(percent_yes)
## # A tibble: 197 × 3
## country n_votes percent_yes
## <chr> <int> <dbl>
## 1 United States 5237 0.2850869
## 2 Palau 777 0.3063063
## 3 Israel 4790 0.3503132
## 4 Federal Republic of Germany 2151 0.3984193
## 5 Micronesia, Federated States of 1341 0.4131245
## 6 United Kingdom 5218 0.4269835
## 7 France 5171 0.4320248
## 8 Marshall Islands 1468 0.4788828
## 9 Belgium 5238 0.4925544
## 10 Luxembourg 5169 0.5105436
## # ... with 187 more rows
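The filtering step above can be tried on a small made-up table. One detail worth noting is that n_votes > 100 also drops a country with exactly 100 votes, whereas n_votes >= 100 would keep it. In this sketch toy_by_country is hypothetical:

```r
library(dplyr)

# Hypothetical per-country summary with one country exactly at the threshold
toy_by_country <- data.frame(
  country     = c("A", "B", "C", "D"),
  n_votes     = c(2L, 100L, 101L, 5000L),
  percent_yes = c(0.00, 0.70, 0.65, 0.80),
  stringsAsFactors = FALSE
)

strict    <- filter(toy_by_country, n_votes > 100)   # drops A and B
inclusive <- filter(toy_by_country, n_votes >= 100)  # drops only A

strict$country
inclusive$country
```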
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.3.2
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
by_year <- joined %>%
group_by(year = year(date)) %>%
summarize(n_votes = n(),
percent_yes = mean(vote == "yes")) %>%
filter( n_votes > 100)
by_year
## # A tibble: 68 × 3
## year n_votes percent_yes
## <dbl> <int> <dbl>
## 1 1946 2143 0.5734951
## 2 1947 2039 0.5693968
## 3 1948 3454 0.3998263
## 4 1949 5700 0.4254386
## 5 1950 2911 0.4970800
## 6 1951 402 0.6567164
## 7 1952 4082 0.5460559
## 8 1953 1537 0.6317502
## 9 1954 1788 0.6224832
## 10 1955 2169 0.6947902
## # ... with 58 more rows
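The yearly summary relies on deriving a year from each date inside group_by(). A minimal sketch on made-up data, assuming base R's format() in place of lubridate::year() only to keep the example dependency-light (toy_joined is hypothetical):

```r
library(dplyr)

# Hypothetical votes with dates in two different years
toy_joined <- data.frame(
  date = as.Date(c("1946-01-01", "1946-02-05",
                   "1947-03-10", "1947-11-29", "1947-12-01")),
  vote = c("yes", "no", "yes", "yes", "abstain"),
  stringsAsFactors = FALSE
)

# Create the grouping variable on the fly, then summarise per year
toy_by_year <- toy_joined %>%
  group_by(year = as.integer(format(date, "%Y"))) %>%
  summarise(n_votes = n(),
            percent_yes = mean(vote == "yes"))

toy_by_year
```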
# Let's visualize this trend with a line plot
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
ggplot(by_year, aes(x = year, y = percent_yes)) +
geom_line()
# Scatter plot + geom_smooth
ggplot(by_year, aes( year, percent_yes)) +
geom_point() +
geom_smooth()
## `geom_smooth()` using method = 'loess'
An interesting trend. Immediately after the end of the Second World War there were evidently sharp disagreements on many issues: in 1948 and 1949 only about 40% of all votes cast were “yes”. From the 1980s until recently, by contrast, there appears to have been near consensus on many issues among most members of the UN General Assembly.
library(lubridate)
by_year_country <- joined %>%
group_by( year = year(date), country) %>%
summarise(n_votes = n(), percent_yes = mean(vote == "yes"))
by_year_country
## Source: local data frame [9,496 x 4]
## Groups: year [?]
##
## year country n_votes percent_yes
## <dbl> <chr> <int> <dbl>
## 1 1946 Afghanistan 17 0.4117647
## 2 1946 Argentina 43 0.6976744
## 3 1946 Australia 43 0.5581395
## 4 1946 Belarus 43 0.4418605
## 5 1946 Belgium 43 0.6046512
## 6 1946 Bolivia, Plurinational State of 43 0.6976744
## 7 1946 Brazil 43 0.6046512
## 8 1946 Canada 42 0.6428571
## 9 1946 Chile 43 0.6046512
## 10 1946 Colombia 42 0.3095238
## # ... with 9,486 more rows
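First, let's check whether the countries we are interested in (Serbia, Croatia, the United States, the Russian Federation, Yugoslavia and the Soviet Union) are all present in the dataset: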
country <- distinct(by_country, country) %>%
arrange(country)
View(country)
inner_join(country, data.frame(country = c("Serbia", "Croatia", "United States", "Russian Federation", "Yugoslavia", "Soviet Union")))
## Joining, by = "country"
## Warning in inner_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
## character vector and factor, coercing into character vector
## # A tibble: 4 × 1
## country
## <chr>
## 1 Croatia
## 2 Russian Federation
## 3 United States
## 4 Yugoslavia
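The same kind of check can be done without a join, using %in% and setdiff() on plain character vectors. A minimal sketch, in which the available vector is a hypothetical stand-in for the country names in the dataset:

```r
# Hypothetical list of names present in the data
available <- c("Croatia", "Russian Federation", "United States", "Yugoslavia")

# Names we would like to analyse
wanted <- c("Serbia", "Croatia", "United States",
            "Russian Federation", "Yugoslavia", "Soviet Union")

found     <- wanted[wanted %in% available]  # names that exist in the data
not_found <- setdiff(wanted, available)     # names that do not

found
not_found
```

Working with character vectors on both sides also avoids the factor coercion warning shown above.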
It appears that neither Serbia nor the Soviet Union is listed under its own name. We will look at Yugoslavia, Russia, Croatia and the United States:
filt_countries <- filter(by_year_country, country %in% c("Croatia", "United States", "Russian Federation", "Yugoslavia"))
filt_countries
## Source: local data frame [220 x 4]
## Groups: year [68]
##
## year country n_votes percent_yes
## <dbl> <chr> <int> <dbl>
## 1 1946 Russian Federation 43 0.4418605
## 2 1946 United States 43 0.6046512
## 3 1946 Yugoslavia 41 0.4634146
## 4 1947 Russian Federation 38 0.5263158
## 5 1947 United States 38 0.7105263
## 6 1947 Yugoslavia 38 0.5000000
## 7 1948 Russian Federation 64 0.5312500
## 8 1948 United States 64 0.3906250
## 9 1948 Yugoslavia 64 0.5312500
## 10 1949 Russian Federation 103 0.6407767
## # ... with 210 more rows
ggplot(filt_countries, aes(x = year, y = percent_yes, col = country)) +
geom_point() +
geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess'
Apparently, the observations for Serbia, Serbia and Montenegro, and Yugoslavia are all recorded as “Yugoslavia”, and those for Russia and the USSR as “Russian Federation”. Interestingly, though somewhat expectedly, the overall voting trends of the Russian Federation and Yugoslavia/Serbia appear fairly positively correlated over time, while the United States shows the opposite pattern. For a closer comparison of the more recent period, we will look only at the years 1990 to 2014.
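The plot only suggests a correlation visually. One way to put a number on it would be to pair two countries' yearly percent_yes values and compute Pearson's correlation. This is a sketch on made-up numbers, not a result from the data; toy and its countries "A" and "B" are hypothetical:

```r
library(dplyr)

# Hypothetical yearly shares of "yes" votes for two made-up countries
toy <- data.frame(
  year        = rep(2000:2004, times = 2),
  country     = rep(c("A", "B"), each = 5),
  percent_yes = c(0.50, 0.55, 0.60, 0.58, 0.70,
                  0.40, 0.47, 0.52, 0.49, 0.63),
  stringsAsFactors = FALSE
)

# One table per country, with the share renamed so the columns do not clash
a <- toy %>% filter(country == "A") %>% select(year, a_yes = percent_yes)
b <- toy %>% filter(country == "B") %>% select(year, b_yes = percent_yes)

# Pair the two series by year, keeping only years present in both
paired <- inner_join(a, b, by = "year")

# Pearson correlation of the paired yearly shares
r <- cor(paired$a_yes, paired$b_yes)
r
```

On the real data the same steps could be applied to filt_countries, keeping in mind that the countries do not all have observations for the same years.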
# Just to check the value ranges once more, above all for the variable "year"
summary(filt_countries)
## year country n_votes percent_yes
## Min. :1946 Length:220 Min. : 2.00 Min. :0.0000
## 1st Qu.:1965 Class :character 1st Qu.: 54.00 1st Qu.:0.4455
## Median :1983 Mode :character Median : 70.00 Median :0.6317
## Mean :1982 Mean : 75.39 Mean :0.5812
## 3rd Qu.:2001 3rd Qu.: 89.00 3rd Qu.:0.7159
## Max. :2014 Max. :160.00 Max. :0.9865
ggplot(filt_countries, aes(x = year, y = percent_yes, col = country)) +
geom_point() +
geom_smooth(se = FALSE) +
scale_x_continuous(limits = c(1990, 2014))
## `geom_smooth()` using method = 'loess'
## Warning: Removed 129 rows containing non-finite values (stat_smooth).
## Warning: Removed 129 rows containing missing values (geom_point).
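The warnings above arise because scale_x_continuous(limits = ...) discards the observations outside the limits before the points are drawn and the smoother is fitted. An alternative that makes this explicit is to filter the data first and plot the result. A minimal sketch of the filtering step on made-up data (toy_countries is hypothetical):

```r
library(dplyr)

# Hypothetical yearly summary spanning the 1990 cut-off
toy_countries <- data.frame(
  year        = c(1985, 1989, 1990, 2000, 2014),
  country     = "A",
  percent_yes = c(0.40, 0.45, 0.55, 0.60, 0.65),
  stringsAsFactors = FALSE
)

# Keep only the period of interest before plotting
recent <- filter(toy_countries, year >= 1990, year <= 2014)

# Number of rows that a limits-based plot would silently remove
n_dropped <- nrow(toy_countries) - nrow(recent)

recent
n_dropped
```

Passing the filtered table to ggplot() should give the same picture without the warnings.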
ggplot(filt_countries, aes(x = year, y = percent_yes)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
scale_x_continuous(limits = c(1990, 2014)) +
facet_wrap( ~ country)
## Warning: Removed 129 rows containing non-finite values (stat_smooth).
## Warning: Removed 129 rows containing missing values (geom_point).
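With method = "lm", geom_smooth() draws a straight-line fit in each facet. The slopes of such lines can also be computed directly, which makes the trends comparable as numbers. A minimal base R sketch on made-up data (toy and its countries are hypothetical):

```r
# Hypothetical yearly shares with exactly linear trends
toy <- data.frame(
  year        = rep(1990:1994, times = 2),
  country     = rep(c("A", "B"), each = 5),
  percent_yes = c(0.50, 0.52, 0.54, 0.56, 0.58,   # rises by 0.02 per year
                  0.80, 0.77, 0.74, 0.71, 0.68),  # falls by 0.03 per year
  stringsAsFactors = FALSE
)

# Fit one linear model per country and extract the coefficient of year
slopes <- sapply(split(toy, toy$country), function(d) {
  coef(lm(percent_yes ~ year, data = d))[["year"]]
})

slopes
```

Each slope is the estimated change in the share of "yes" votes per year.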
Looking at the 1990-2014 period in isolation, we can see, perhaps a little unexpectedly, that Serbia (recorded as “Yugoslavia”), the Russian Federation and Croatia had a much higher share of “yes” votes than the United States, and that the voting trends of these three countries largely coincide over this period. This holds especially for Croatia and the Russian Federation. Of course, we would need to examine in more detail how each country voted on individual resolutions before drawing any conclusions.
First, let's check exactly which issues are involved:
distinct(un_roll_call_issues, issue)
## # A tibble: 6 × 1
## issue
## <chr>
## 1 Palestinian conflict
## 2 Nuclear weapons and nuclear material
## 3 Arms control and disarmament
## 4 Human rights
## 5 Colonialism
## 6 Economic development
Unfortunately, none of these issue categories covers votes that directly concerned political events in the former Yugoslavia, which would be of particular interest to us. Anyway, let's continue with the analysis:
joined %>%
filter(country %in% c("Croatia", "United States", "Russian Federation", "Yugoslavia")) %>%
inner_join(un_roll_call_issues, by = "rcid") %>%
group_by(year = year(date), country, issue) %>%
summarize(votes = n(),
percent_yes = mean(vote == "yes")) %>%
filter(votes > 5) %>%
ggplot(aes(year, percent_yes, col = country)) +
geom_point() +
geom_smooth(se = FALSE) +
facet_wrap(~ issue)
## `geom_smooth()` using method = 'loess'
# Let's also try a different layout
joined %>%
filter(country %in% c("Croatia", "United States", "Russian Federation", "Yugoslavia")) %>%
inner_join(un_roll_call_issues, by = "rcid") %>%
group_by(year = year(date), country, issue) %>%
summarize(votes = n(),
percent_yes = mean(vote == "yes")) %>%
filter(votes > 5) %>%
ggplot(aes(year, percent_yes, col = issue)) +
geom_point() +
geom_smooth(se = FALSE) +
facet_wrap(~ country)
## `geom_smooth()` using method = 'loess'
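Both plots above start from the same filter/join/summarise chain, so the summary could be computed once and reused. Joining with the issues table also drops every roll call that has no issue attached. A minimal sketch on made-up tables (toy_votes and toy_issues are hypothetical, and the .groups argument assumes dplyr 1.0.0 or later):

```r
library(dplyr)

# Hypothetical votes; roll call 3 has no issue attached
toy_votes <- data.frame(
  rcid    = c(1, 1, 2, 2, 3, 3, 4, 4),
  country = rep(c("A", "B"), times = 4),
  vote    = c("yes", "no", "yes", "yes", "no", "no", "no", "yes"),
  stringsAsFactors = FALSE
)
toy_issues <- data.frame(
  rcid  = c(1, 2, 4),
  issue = c("Colonialism", "Human rights", "Human rights"),
  stringsAsFactors = FALSE
)

# Compute the per-country, per-issue summary once
by_issue <- toy_votes %>%
  inner_join(toy_issues, by = "rcid") %>%   # roll call 3 is dropped here
  group_by(country, issue) %>%
  summarise(votes = n(),
            percent_yes = mean(vote == "yes"),
            .groups = "drop")

by_issue
```

The stored summary can then be passed to ggplot() twice, once coloured by country and once by issue.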
What follows will include a correlation analysis, a regression analysis, and a detailed graphical analysis of the selected countries and their voting in the UN General Assembly, in light of various criteria…