Part 1: Data management (45 points)

Loading libraries and data

We start by loading the relevant packages:

library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)
library(googleVis)
library(plotly)
library(readr)
library(stargazer)

Then we load the two datasets. Because loading the VDEM data takes a while, we cache this code chunk. Although I didn’t mention this in class, we can use the readr library's read_csv() command instead of read.csv(), which the authors of readr suggest should be about 10 times faster.

vdem <- read_csv("V-Dem-DS-CY+Others-v6.2.csv")
undata <- read_csv("UNdata.csv")

First we will work with the VDEM data, then the UN data.

Cleaning the VDEM data

The VDEM data contains 3,173 variables. To make this dataset much easier to work with, the first step will be to keep only 8 interesting variables in addition to the needed IDs. I choose to keep country_name, country_text_id, year, v2x_polyarchy, v2x_api, v2x_mpi, v2x_libdem, v2x_liberal, v2x_partipdem, v2x_delibdem, and v2x_egaldem.

vdem <- select(vdem, country_name, country_text_id, year, v2x_polyarchy, 
               v2x_api, v2x_mpi, v2x_libdem, v2x_liberal, 
               v2x_partipdem, v2x_delibdem, v2x_egaldem)

The VDEM data now appears to be both tidy (rows are observations, columns are variables, units of analysis are comparable) and clean (missing datapoints are coded as NA, no additional variables need to be generated, no special characters need to be removed).

summary(vdem)
##  country_name       country_text_id         year      v2x_polyarchy   
##  Length:16675       Length:16675       Min.   :1900   Min.   :0.0084  
##  Class :character   Class :character   1st Qu.:1932   1st Qu.:0.0965  
##  Mode  :character   Mode  :character   Median :1961   Median :0.2077  
##                                        Mean   :1960   Mean   :0.3212  
##                                        3rd Qu.:1989   3rd Qu.:0.5308  
##                                        Max.   :2015   Max.   :0.9584  
##                                                       NA's   :416     
##     v2x_api          v2x_mpi         v2x_libdem      v2x_liberal     
##  Min.   :0.0168   Min.   :0.0000   Min.   :0.0109   Min.   :0.03104  
##  1st Qu.:0.1930   1st Qu.:0.0000   1st Qu.:0.0776   1st Qu.:0.25252  
##  Median :0.4122   Median :0.0009   Median :0.1498   Median :0.44538  
##  Mean   :0.4679   Mean   :0.1746   Mean   :0.2596   Mean   :0.48289  
##  3rd Qu.:0.7713   3rd Qu.:0.2851   3rd Qu.:0.3795   3rd Qu.:0.70355  
##  Max.   :0.9832   Max.   :0.9337   Max.   :0.9278   Max.   :0.98123  
##  NA's   :416      NA's   :416      NA's   :416      NA's   :55       
##  v2x_partipdem     v2x_delibdem     v2x_egaldem    
##  Min.   :0.0004   Min.   :0.0003   Min.   :0.0074  
##  1st Qu.:0.0484   1st Qu.:0.0134   1st Qu.:0.0619  
##  Median :0.1053   Median :0.0703   Median :0.1455  
##  Mean   :0.1990   Mean   :0.2113   Mean   :0.2448  
##  3rd Qu.:0.3044   3rd Qu.:0.3484   3rd Qu.:0.3431  
##  Max.   :0.8404   Max.   :0.9291   Max.   :0.9247  
##  NA's   :425      NA's   :524      NA's   :416

So we can move on to cleaning the UN data.

Cleaning the UN data

The UN data is not shaped well. Taking a look at the first several rows and columns reveals the problems we must address.

library(knitr)
## Warning: package 'knitr' was built under R version 3.3.2
kable(head(undata[,1:5]))
| Series Name | Series Code | Country Name | Country Code | 1960 [YR1960] |
|:--|:--|:--|:--|:--|
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Afghanistan | AFG | 0.0348442494869232 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Albania | ALB | 0.276291221380234 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Algeria | DZA | 0.173148155212402 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | American Samoa | ASM | .. |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Andorra | ADO | .. |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Angola | AGO | 0.0670681074261665 |

(Incidentally, if you compile a document like this as a PDF, you can use the stargazer library to output the above table as LaTeX code. If you specify results='asis' as a code chunk option, the LaTeX code gets compiled automatically into a nice-looking table. We have to leave this document as HTML, however, to allow the interactive graphics to be displayed.)

stargazer(head(as.matrix(undata[,1:5])), 
          title="The first 6 rows and 5 columns of the UN data")
% Table created by stargazer v.5.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: Fri, Feb 10, 2017 - 15:50:06
\begin{table}[!htbp] \centering
  \caption{The first 6 rows and 5 columns of the UN data}
  \label{}
\begin{tabular}{@{\extracolsep{5pt}} ccccc}
\\[-1.8ex]\hline
\hline \\[-1.8ex]
Series Name & Series Code & Country Name & Country Code & 1960 [YR1960] \\
\hline \\[-1.8ex]
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Afghanistan & AFG & 0.0348442494869232 \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Albania & ALB & 0.276291221380234 \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Algeria & DZA & 0.173148155212402 \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & American Samoa & ASM & .. \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Andorra & ADO & .. \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Angola & AGO & 0.0670681074261665 \\
\hline \\[-1.8ex]
\end{tabular}
\end{table}

First, years should be observations but are coded as columns. Second, the variables are expressed as observations, distinguished by different series names and codes. Third, both the years and the variables are given names that are currently too complex to use. We must reshape the data and rename the various elements of the data.

First, let’s bring the years down to the rows by using the gather() function.

undata <- gather(undata, `1960 [YR1960]`:`2015 [YR2015]`, key="year", value="value")

The truly tricky thing about the above command is dealing with the obnoxious space in variable names like 1960 [YR1960]. Using double or single quotes around the variable names did not work; names like these must be wrapped in backticks (`) instead. I discovered this by typing undata$ into the R console in RStudio, as if I were about to type a variable name after the dollar sign. A drop-down menu appeared that allowed me to click on a variable. I clicked on the first year and it placed the variable name in the command with backticks.
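To make the backtick rule concrete, here is a minimal sketch on a made-up two-country data frame (the data frame `toy` and its values are hypothetical, not taken from the UN file). It only assumes that tidyr is installed.

```r
library(tidyr)

# Toy data with the same awkward column names as the UN file (values are made up)
toy <- data.frame(
  `Country Code`  = c("AAA", "BBB"),
  `1960 [YR1960]` = c("0.1", ".."),
  `1961 [YR1961]` = c("0.2", "0.3"),
  check.names = FALSE,        # keep the spaces and brackets in the names
  stringsAsFactors = FALSE
)

# Backticks let R treat a name containing spaces as a single symbol
long <- gather(toy, `1960 [YR1960]`:`1961 [YR1961]`, key = "year", value = "value")
long
```

In more recent versions of tidyr, pivot_longer() is the recommended successor to gather(); the backtick rule applies there in the same way.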

The reshape appears to have worked as intended, since the years are now in the rows.

library(knitr)
kable(head(undata))
| Series Name | Series Code | Country Name | Country Code | year | value |
|:--|:--|:--|:--|:--|:--|
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Afghanistan | AFG | 1960 [YR1960] | 0.0348442494869232 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Albania | ALB | 1960 [YR1960] | 0.276291221380234 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Algeria | DZA | 1960 [YR1960] | 0.173148155212402 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | American Samoa | ASM | 1960 [YR1960] | .. |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Andorra | ADO | 1960 [YR1960] | .. |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Angola | AGO | 1960 [YR1960] | 0.0670681074261665 |

We aren’t going to need both the series name and the series code, so let’s drop the series code, as this information is more difficult to read than the series name.

undata <- select(undata, -`Series Code`)

Next let’s move the variables listed in the series name to the columns. The series name variable appears to have a name with both a space and a special character at the beginning of the name, and it is difficult to deal with this name. So first we can change this name.

names(undata)[1] <- "Series"

We run into an error with the spread() command that says there are duplicate identifiers. Upon investigating the data in the View() window, it appears that these duplicates are missing values on the Series variable that separate one health indicator from the next. To correct this error, we eliminate the rows with missing values on the series name.

undata <- filter(undata, !is.na(Series))
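Rather than scanning the View() window, the duplicated identifiers can also be located in code. The following is a sketch on hypothetical toy data (the column names `country` and `value` and all values are invented for illustration); it counts each combination of identifiers and keeps those that occur more than once.

```r
library(dplyr)

# Toy long-format data: two rows share the same (country, year, Series) key
toy <- data.frame(
  country = c("AAA", "AAA", "AAA", "BBB"),
  year    = c(1960, 1960, 1960, 1960),
  Series  = c("phys", NA, NA, "phys"),
  value   = c("0.1", NA, NA, "0.3"),
  stringsAsFactors = FALSE
)

# Any key that appears more than once would make spread() fail
dups <- toy %>%
  count(country, year, Series) %>%
  filter(n > 1)
dups
```

Here the only duplicated key is the one with a missing Series, which is the same pattern described above.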

Now we can perform a wide reshape.

undata <- spread(undata, key=Series, value=value)

The year variable still has the unhelpful [YR1960] syntax attached to each year. To isolate the numeric years, we use the separate() command (the inverse of the unite() command). We save the bracketed part as a variable named todrop so we know exactly what to drop.

undata <- separate(undata, year, into=c("year", "todrop"), sep=" ")
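As an alternative sketch, the year can also be extracted directly with a base R regular expression, which avoids creating the temporary todrop column. The vector `year_raw` below is a made-up example in the same format as the UN year labels.

```r
# Example labels in the same format as the UN year column
year_raw <- c("1960 [YR1960]", "2015 [YR2015]")

# Remove everything from the first space onward, then convert to numeric
year_num <- as.numeric(sub(" .*$", "", year_raw))
year_num
```

Either approach should give the same years; separate() has the advantage of keeping the discarded piece visible until it is explicitly dropped.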

Two variables exist because the raw UN data placed headers in the rows (along with missing values) to separate the variables. These variables contain no data, so we can delete them along with todrop.

undata <- select(undata, -todrop, 
                 -`Data from database: Health Nutrition and Population Statistics`,
                -`Last Updated: 12/16/2016`)

Finally, we rename the variables to match VDEM and to be short and descriptive. We also convert year and the three substantive variables to be numeric (since they are all still character vectors).

names(undata) <- c("country_name", "country_text_id", "year",
                   "healthex","phys","undernourish")
undata <- mutate(undata, year=as.numeric(year), healthex=as.numeric(healthex),
                 phys=as.numeric(phys), undernourish=as.numeric(undernourish))
## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion

## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion

## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion
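The coercion warnings above are expected here: the raw UN file marks missing values with "..", which as.numeric() cannot parse and therefore turns into NA. A small sketch with made-up values shows how to recode ".." explicitly first, so that the conversion is silent and any remaining warning would signal a real problem.

```r
# Made-up character values using the same ".." missing-value marker
value_raw <- c("0.0348", "..", "0.2763")

# Recode the marker to NA before converting, so no warning is raised
value_raw[value_raw == ".."] <- NA
value_num <- as.numeric(value_raw)
value_num
```

A related option is the na argument of read_csv(), which accepts a vector of strings to treat as missing when the file is first read.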

Merging the two datasets together

Checking whether country abbreviations match

Merging the two datasets will likely lead to mistakes that don’t produce an error, but nevertheless fail to correctly match countries in one data frame to the other. That’s because different cross-national datasets often use slightly different names and abbreviations for countries. We need a way to quickly see the countries that are matched and unmatched after the merge. If we use the full_join() command for the merge, unmatched rows will have NA values for the variables from the other dataset. The trick is, how can we be sure that a missing value is always due to an unmatched row, and not due to one of the many other reasons why the data might be missing? For that reason, I create indicators in each dataset which are always equal to 1 and are never missing. Also, in order to merge only on the country abbreviations and year, and not the country names, I rename the country name variable in each data frame.

vdem$vdem <- 1
undata$undata <- 1
names(vdem)[1] <- "countryVDEM"
names(undata)[1] <- "countryUN"
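The logic of the indicator trick is easier to see on a tiny example. The sketch below uses two hypothetical data frames, `a` and `b`, with invented indicator columns `in_a` and `in_b`; note that the substantive variables `x` and `y` contain their own missing values, which is exactly why they cannot be used to detect unmatched rows.

```r
library(dplyr)

# Two toy data frames; x and y have genuine missing values of their own
a <- data.frame(id = c("AAA", "BBB", "CCC"), year = 1960, x = c(1, NA, 3),
                in_a = 1, stringsAsFactors = FALSE)
b <- data.frame(id = c("BBB", "CCC", "DDD"), year = 1960, y = c(NA, 5, 6),
                in_b = 1, stringsAsFactors = FALSE)

merged <- full_join(a, b, by = c("id", "year"))

# The indicators are never missing in their own data, so an NA can only
# mean that the row had no partner in the other data frame
only_a <- filter(merged, !is.na(in_a) & is.na(in_b))
only_b <- filter(merged, is.na(in_a) & !is.na(in_b))

# anti_join() gives the same unmatched rows without needing indicators
only_a_alt <- anti_join(a, b, by = c("id", "year"))
```

In this toy case "AAA" is unmatched on one side and "DDD" on the other, even though "BBB" has missing values for both x and y.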

The two datasets have different time frames. VDEM exists from 1900-2015. The UN data exists from 1960-2015. The overlap is 1960-2015. I keep only the rows in VDEM from these years.

vdem <- filter(vdem, year >= 1960)

Now I try the merge, and rearrange the columns to more easily see in the viewer the ID variables and the two indicators I just created.

vdem_un <- full_join(vdem, undata)
## Joining, by = c("country_text_id", "year")
vdem_un <- select(vdem_un, vdem, undata, country_text_id, year, 
                  countryVDEM, countryUN, everything())

Recoding country abbreviations and dropping countries in VDEM

To see what observations from VDEM are unmatched with the UN, we create a subset of the data in which the VDEM indicator exists and the UN indicator is missing. We save the country ID and year variables. Then we repeat to see the observations that exist in the UN data but not VDEM.

unmatch.vdem <- filter(vdem_un, is.na(undata) & !is.na(vdem))
unmatch.vdem <- select(unmatch.vdem, country_text_id, countryVDEM, year)
nrow(unmatch.vdem)
## [1] 401
unmatch.un <- filter(vdem_un, !is.na(undata) & is.na(vdem))
unmatch.un <- select(unmatch.un, country_text_id, countryUN, year)
nrow(unmatch.un)
## [1] 6371

There are 11 countries that exist in VDEM but not in the UN data.

unmatch.vdem <- group_by(unmatch.vdem, country_text_id, countryVDEM)
unmatch.vdem <- summarize(unmatch.vdem, start=min(year), end=max(year))
library(knitr)
kable(unmatch.vdem)
| country_text_id | countryVDEM | start | end |
|:--|:--|--:|--:|
| COD | Congo_Democratic Republic of | 1960 | 2012 |
| DDR | German Democratic Republic | 1960 | 1990 |
| PSE | Palestine_West_Bank | 1968 | 2014 |
| PSG | Palestine_Gaza | 1960 | 2014 |
| ROU | Romania | 1960 | 2015 |
| SML | Somaliland | 1992 | 2015 |
| TLS | East Timor | 1960 | 2015 |
| TWN | Taiwan | 1960 | 2015 |
| VDR | Vietnam_Republic of | 1960 | 1975 |
| XKX | Kosovo | 1999 | 2015 |
| YMD | South Yemen | 1960 | 1990 |

The reasons why these countries are unmatched are as follows:

  • The Democratic Republic of the Congo is coded COD in VDEM and ZAR in the UN data.
  • The UN did not collect data for East Germany, Palestine, Somaliland, Taiwan, South Vietnam, and South Yemen.
  • Romania is ROU in VDEM but ROM in the UN data.
  • East Timor (also called Timor-Leste) is coded TLS in VDEM but TMP in the UN data.
  • Kosovo is XKX in VDEM but KSV in the UN data.

I adjust the abbreviations and drop cases as necessary.

vdem <- filter(vdem, !(country_text_id %in% c("DDR", "PSE", "PSG", "SML", 
                                              "TWN", "VDR", "YMD")))
vdem <- mutate(vdem, country_text_id = as.factor(country_text_id),
               country_text_id = fct_recode(country_text_id,
                                            "ZAR"="COD",
                                            "ROM"="ROU",
                                            "TMP"="TLS",
                                            "KSV"="XKX"),
               country_text_id = as.character(country_text_id))
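The same recoding can be sketched without the round trip through a factor, using a named lookup vector in base R. The vector `codes` below is a made-up sample; the four mappings are the ones listed above, and "USA" is included only to show that codes without a mapping are left untouched.

```r
# Sample of VDEM-style codes, including one that needs no recoding
codes <- c("COD", "ROU", "USA", "TLS", "XKX")

# Names are the VDEM codes, values are the matching UN codes
lookup <- c(COD = "ZAR", ROU = "ROM", TLS = "TMP", XKX = "KSV")

# Replace only the codes that appear in the lookup table
recoded <- codes
hit <- recoded %in% names(lookup)
recoded[hit] <- unname(lookup[recoded[hit]])
recoded
```

This is only an alternative formulation; fct_recode() has the advantage of warning when a level to be recoded does not exist.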

Recoding country abbreviations and dropping countries in the UN data

Having adjusted the VDEM data, I try the merge again.

vdem_un <- full_join(vdem, undata)
## Joining, by = c("country_text_id", "year")
vdem_un <- select(vdem_un, vdem, undata, country_text_id, year, 
                  countryVDEM, countryUN, everything())

There are 204 entries in the UN data that have at least one year with no match in VDEM.

unmatch.un <- filter(vdem_un, !is.na(undata) & is.na(vdem))
unmatch.un <- select(unmatch.un, country_text_id, countryUN, year)
unmatch.un <- group_by(unmatch.un, country_text_id, countryUN)
unmatch.un <- summarize(unmatch.un, start=min(year), end=max(year))
nrow(unmatch.un)
## [1] 204
library(knitr)
kable(unmatch.un)
country_text_id countryUN start end
ABW Aruba 1960 2015
ADO Andorra 1960 2015
AGO Angola 2013 2015
ALB Albania 2013 2015
ARB Arab World 1960 2015
ARE United Arab Emirates 1960 2015
ARM Armenia 1960 1989
ASM American Samoa 1960 2015
ATG Antigua and Barbuda 1960 2015
AUS Australia 2015 2015
AUT Austria 2013 2015
AZE Azerbaijan 1960 1989
BEL Belgium 2015 2015
BGD Bangladesh 1960 2015
BGR Bulgaria 2015 2015
BHR Bahrain 1960 2015
BHS Bahamas, The 1960 2015
BIH Bosnia and Herzegovina 1960 1991
BLR Belarus 1960 1989
BLZ Belize 1960 2015
BMU Bermuda 1960 2015
BRB Barbados 2015 2015
BRN Brunei Darussalam 1960 2015
BWA Botswana 2015 2015
CAF Central African Republic 2013 2015
CAN Canada 2015 2015
CEB Central Europe and the Baltics 1960 2015
CHE Switzerland 2015 2015
CHI Channel Islands 1960 2015
CHL Chile 2015 2015
CHN China 2015 2015
CIV Cote d’Ivoire 2013 2015
CMR Cameroon 1960 1960
COG Congo, Rep. 2013 2015
COM Comoros 2013 2015
CPV Cabo Verde 2015 2015
CSS Caribbean small states 1960 2015
CUW Curacao 1960 2015
CYM Cayman Islands 1960 2015
CYP Cyprus 2013 2015
CZE Czech Republic 2013 2015
DEU Germany 2015 2015
DJI Djibouti 2013 2015
DMA Dominica 1960 2015
DNK Denmark 2015 2015
DOM Dominican Republic 2015 2015
EAP East Asia & Pacific (excluding high income) 1960 2015
EAR Early-demographic dividend 1960 2015
EAS East Asia & Pacific 1960 2015
ECA Europe & Central Asia (excluding high income) 1960 2015
ECS Europe & Central Asia 1960 2015
ECU Ecuador 2013 2015
EGY Egypt, Arab Rep. 2015 2015
EMU Euro area 1960 2015
ESP Spain 2015 2015
EST Estonia 1960 1990
EUU European Union 1960 2015
FCS Fragile and conflict affected situations 1960 2015
FIN Finland 2015 2015
FRA France 2013 2015
FRO Faroe Islands 1960 2015
FSM Micronesia, Fed. Sts. 1960 2015
GAB Gabon 2013 2015
GBR United Kingdom 2013 2015
GEO Georgia 1960 1989
GIB Gibraltar 1960 2015
GIN Guinea 2013 2015
GMB Gambia, The 2013 2015
GNB Guinea-Bissau 2013 2015
GNQ Equatorial Guinea 1960 2015
GRC Greece 2013 2015
GRD Grenada 1960 2015
GRL Greenland 1960 2015
GTM Guatemala 2013 2015
GUM Guam 1960 2015
HIC High income 1960 2015
HKG Hong Kong SAR, China 1960 2015
HND Honduras 2013 2015
HPC Heavily indebted poor countries (HIPC) 1960 2015
HRV Croatia 1960 2015
HTI Haiti 2013 2015
HUN Hungary 2013 2015
IDN Indonesia 2015 2015
IMY Isle of Man 1960 2015
IND India 2015 2015
IRL Ireland 2013 2015
ISL Iceland 2013 2015
ISR Israel 2013 2015
ITA Italy 2013 2015
JAM Jamaica 2013 2015
JPN Japan 2015 2015
KAZ Kazakhstan 1960 1989
KGZ Kyrgyz Republic 1960 1989
KIR Kiribati 1960 2015
KNA St. Kitts and Nevis 1960 2015
KOR Korea, Rep. 2015 2015
KSV Kosovo 1960 1998
KWT Kuwait 1960 2015
LAC Latin America & Caribbean (excluding high income) 1960 2015
LAO Lao PDR 2013 2015
LBR Liberia 2013 2015
LBY Libya 2015 2015
LCA St. Lucia 1960 2015
LCN Latin America & Caribbean 1960 2015
LDC Least developed countries: UN classification 1960 2015
LIC Low income 1960 2015
LIE Liechtenstein 1960 2015
LKA Sri Lanka 2015 2015
LMC Lower middle income 1960 2015
LMY Low & middle income 1960 2015
LSO Lesotho 2013 2015
LTE Late-demographic dividend 1960 2015
LTU Lithuania 1960 1990
LUX Luxembourg 1960 2015
LVA Latvia 1960 1990
MAC Macao SAR, China 1960 2015
MAF St. Martin (French part) 1960 2015
MCO Monaco 1960 2015
MDA Moldova 1960 1989
MDG Madagascar 2013 2015
MEA Middle East & North Africa 1960 2015
MEX Mexico 2015 2015
MHL Marshall Islands 1960 2015
MIC Middle income 1960 2015
MKD Macedonia, FYR 1960 1990
MLI Mali 2013 2015
MLT Malta 1960 2015
MNA Middle East & North Africa (excluding high income) 1960 2015
MNE Montenegro 1960 2015
MNP Northern Mariana Islands 1960 2015
MRT Mauritania 2013 2015
MUS Mauritius 2015 2015
MYS Malaysia 2013 2015
NAC North America 1960 2015
NAM Namibia 2015 2015
NCL New Caledonia 1960 2015
NER Niger 2013 2015
NIC Nicaragua 2013 2015
NLD Netherlands 2015 2015
NOR Norway 2015 2015
NRU Nauru 1960 2015
NZL New Zealand 2013 2015
OED OECD members 1960 2015
OMN Oman 1960 2015
OSS Other small states 1960 2015
PAK Pakistan 2015 2015
PAN Panama 2013 2015
PER Peru 2015 2015
PLW Palau 1960 2015
PNG Papua New Guinea 2015 2015
PRE Pre-demographic dividend 1960 2015
PRI Puerto Rico 1960 2015
PRK Korea, Dem. People’s Rep. 2013 2015
PSS Pacific island small states 1960 2015
PST Post-demographic dividend 1960 2015
PYF French Polynesia 1960 2015
SAS South Asia 1960 2015
SAU Saudi Arabia 2013 2015
SEN Senegal 2013 2015
SGP Singapore 1960 2015
SLE Sierra Leone 2013 2015
SMR San Marino 1960 2015
SRB Serbia 2013 2015
SSA Sub-Saharan Africa (excluding high income) 1960 2015
SSD South Sudan 1960 2010
SSF Sub-Saharan Africa 1960 2015
SST Small states 1960 2015
STP Sao Tome and Principe 2013 2015
SVK Slovak Republic 1960 2015
SVN Slovenia 1960 1988
SWE Sweden 2015 2015
SWZ Swaziland 2013 2015
SXM Sint Maarten (Dutch part) 1960 2015
SYC Seychelles 2013 2015
TCA Turks and Caicos Islands 1960 2015
TCD Chad 2013 2015
TEA East Asia & Pacific (IDA & IBRD countries) 1960 2015
TEC Europe & Central Asia (IDA & IBRD countries) 1960 2015
TGO Togo 2013 2015
TJK Tajikistan 1960 1989
TKM Turkmenistan 1960 2015
TLA Latin America & the Caribbean (IDA & IBRD countries) 1960 2015
TMN Middle East & North Africa (IDA & IBRD countries) 1960 2015
TON Tonga 1960 2015
TSA South Asia (IDA & IBRD) 1960 2015
TSS Sub-Saharan Africa (IDA & IBRD countries) 1960 2015
TTO Trinidad and Tobago 2013 2015
TUV Tuvalu 1960 2015
UKR Ukraine 1960 1989
UMC Upper middle income 1960 2015
URY Uruguay 2015 2015
UZB Uzbekistan 1960 1989
VCT St. Vincent and the Grenadines 1960 2015
VEN Venezuela, RB 2013 2015
VGB British Virgin Islands 1960 2015
VIR Virgin Islands (U.S.) 1960 2015
VNM Vietnam 2013 2015
VUT Vanuatu 2015 2015
WBG West Bank and Gaza 1960 2015
WLD World 1960 2015
WSM Samoa 1960 2015
ZAF South Africa 2015 2015
ZAR Congo, Dem. Rep. 2013 2015
NA NA 1960 2015

It appears that there are three types of entries in this list:

  • Non-democracies: countries like Oman and the United Arab Emirates are not democracies, and as such they do not appear in the Varieties of Democracy data.
  • Non-countries: some of these entries are territories (Greenland), others are regions (Euro area), others have only been countries for part of the 1960-2015 time frame (Estonia).
  • Democracies that haven’t been updated in VDEM: many entries are democracies that haven’t yet been updated through 2014 or 2015.

In all three cases, I am comfortable dropping these entries. I also now remove the vdem and undata indicators.

vdem_un <- filter(vdem_un, !is.na(vdem))
vdem_un <- select(vdem_un, -vdem, -undata)

We now have a complete and cleaned data frame with the merged VDEM and UN data.

Part 2: Collapsing the data (10 points)

I will examine the VDEM indices for liberal democracy, electoral democracy, and participatory democracy.

Collapsing by year

Collapsing by year allows us to examine over-time variation in these democracy indices. To collapse, we use the group_by() and summarize() commands.

vdem_un_time <- group_by(vdem_un, year)
vdem_un_time <- summarize(vdem_un_time, 
                          Liberal=mean(v2x_libdem, na.rm=TRUE),
                          Electoral=mean(v2x_polyarchy, na.rm=TRUE),
                          Participatory=mean(v2x_partipdem, na.rm=TRUE))
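To show what the collapse does, and why na.rm=TRUE matters, here is a sketch on a hypothetical four-row data frame (`toy`, `index`, and `n_obs` are invented names). It also counts how many non-missing observations feed each yearly mean, which can help when judging how comparable the yearly averages are.

```r
library(dplyr)

# Toy panel: one missing value in 1960
toy <- data.frame(
  year  = c(1960, 1960, 1961, 1961),
  index = c(0.2, NA, 0.4, 0.6)
)

# One row per year: the mean ignores NA, and n_obs records how many
# observations actually contributed to it
by_year <- toy %>%
  group_by(year) %>%
  summarize(mean_index = mean(index, na.rm = TRUE),
            n_obs = sum(!is.na(index)))
by_year
```

Without na.rm = TRUE, the 1960 mean in this toy example would be NA rather than 0.2.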

Next I use the kable() command to display the data.

library(knitr)
kable(round(vdem_un_time,3))
year Liberal Electoral Participatory
1960 0.250 0.325 0.184
1961 0.252 0.329 0.187
1962 0.252 0.327 0.187
1963 0.253 0.330 0.188
1964 0.251 0.327 0.187
1965 0.252 0.327 0.188
1966 0.253 0.328 0.190
1967 0.249 0.322 0.187
1968 0.247 0.321 0.186
1969 0.248 0.318 0.186
1970 0.247 0.318 0.188
1971 0.247 0.317 0.189
1972 0.244 0.317 0.190
1973 0.242 0.315 0.189
1974 0.244 0.316 0.191
1975 0.249 0.321 0.195
1976 0.253 0.324 0.198
1977 0.255 0.327 0.200
1978 0.258 0.332 0.203
1979 0.266 0.346 0.212
1980 0.269 0.351 0.216
1981 0.269 0.349 0.216
1982 0.268 0.350 0.216
1983 0.272 0.353 0.218
1984 0.276 0.357 0.221
1985 0.284 0.369 0.228
1986 0.289 0.375 0.234
1987 0.293 0.380 0.238
1988 0.299 0.387 0.245
1989 0.304 0.396 0.249
1990 0.333 0.434 0.271
1991 0.361 0.466 0.293
1992 0.377 0.489 0.311
1993 0.387 0.502 0.320
1994 0.392 0.510 0.328
1995 0.397 0.514 0.333
1996 0.400 0.521 0.336
1997 0.399 0.523 0.337
1998 0.401 0.524 0.339
1999 0.400 0.520 0.338
2000 0.407 0.524 0.344
2001 0.411 0.528 0.346
2002 0.419 0.539 0.351
2003 0.422 0.543 0.355
2004 0.425 0.545 0.358
2005 0.427 0.550 0.361
2006 0.429 0.554 0.364
2007 0.429 0.552 0.364
2008 0.432 0.558 0.366
2009 0.435 0.561 0.368
2010 0.435 0.561 0.369
2011 0.439 0.566 0.372
2012 0.439 0.567 0.373
2013 0.440 0.567 0.371
2014 0.431 0.558 0.366
2015 0.359 0.489 0.312

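Countries, in general, exhibit more electoral democracy than liberal or participatory democracy. Across the entire time frame, the countries have become more democratic overall. Participatory democracy has experienced the greatest proportional growth over time, roughly doubling from 0.184 in 1960 to 0.373 in 2012, while electoral democracy shows the largest absolute increase.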
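The growth comparison can be checked with a few lines of arithmetic on the yearly means printed in the table above. The choice of 2012 as the end point is an assumption made for this sketch, on the grounds that the last few years appear to cover fewer countries; the data frame `means` simply retypes the 1960 and 2012 rows.

```r
# Yearly means copied from the 1960 and 2012 rows of the table above
means <- data.frame(
  index = c("Liberal", "Electoral", "Participatory"),
  y1960 = c(0.250, 0.325, 0.184),
  y2012 = c(0.439, 0.567, 0.373)
)

# Absolute change and proportional change over the period
means$abs_change <- means$y2012 - means$y1960
means$ratio <- round(means$y2012 / means$y1960, 2)
means
```

On these numbers the participatory index has the largest ratio, while the electoral index has the largest absolute change, so which index "grew most" depends on the measure chosen.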

Collapsing by country

Collapsing by country allows us to examine cross-national variation in these democracy indices. To collapse, we convert countryVDEM to a factor and use the group_by() and summarize() commands.

vdem_un <- mutate(vdem_un, countryVDEM=factor(countryVDEM))
vdem_un_xs <- group_by(vdem_un, countryVDEM)
vdem_un_xs <- summarize(vdem_un_xs, 
                          Liberal=round(mean(v2x_libdem, na.rm=TRUE),3),
                          Electoral=round(mean(v2x_polyarchy, na.rm=TRUE),3),
                          Participatory=round(mean(v2x_partipdem, na.rm=TRUE),3))

Next I use the kable() command to display the data.

library(knitr)
kable(vdem_un_xs)
countryVDEM Liberal Electoral Participatory
Afghanistan 0.112 0.181 0.064
Albania 0.208 0.320 0.171
Algeria 0.130 0.243 0.114
Angola 0.080 0.102 0.062
Argentina 0.469 0.612 0.432
Armenia 0.245 0.453 0.236
Australia 0.843 0.898 0.692
Austria 0.789 0.881 0.670
Azerbaijan 0.083 0.252 0.099
Bangladesh 0.252 0.437 0.249
Barbados 0.517 0.584 0.284
Belarus 0.181 0.346 0.164
Belgium 0.738 0.815 0.569
Benin 0.300 0.403 0.223
Bhutan 0.185 0.127 0.120
Bolivia 0.296 0.479 0.278
Bosnia and Herzegovina 0.243 0.340 0.219
Botswana 0.500 0.602 0.400
Brazil 0.467 0.596 0.422
Bulgaria 0.312 0.421 0.253
Burkina Faso 0.254 0.405 0.250
Burma_Myanmar 0.046 0.150 0.082
Burundi 0.156 0.205 0.124
Cambodia 0.108 0.251 0.092
Cameroon 0.126 0.256 0.130
Canada 0.807 0.874 0.624
Cape Verde 0.419 0.477 0.296
Central African Republic 0.130 0.228 0.135
Chad 0.082 0.200 0.119
Chile 0.500 0.585 0.369
China 0.046 0.097 0.036
Colombia 0.344 0.467 0.261
Comoros 0.192 0.331 0.188
Congo_Democratic Republic of 0.089 0.223 0.135
Congo_Republic of the 0.104 0.223 0.143
Costa Rica 0.807 0.867 0.604
Croatia 0.511 0.668 0.449
Cuba 0.062 0.108 0.091
Cyprus 0.553 0.661 0.382
Czech Republic 0.411 0.500 0.298
Denmark 0.892 0.920 0.724
Djibouti 0.090 0.201 0.099
Dominican Republic 0.294 0.506 0.277
East Timor 0.146 0.211 0.086
Ecuador 0.360 0.548 0.354
Egypt 0.148 0.224 0.130
El Salvador 0.210 0.358 0.225
Eritrea 0.022 0.084 0.025
Estonia 0.862 0.911 0.683
Ethiopia 0.076 0.171 0.071
Fiji 0.331 0.434 0.168
Finland 0.847 0.887 0.635
France 0.826 0.893 0.697
Gabon 0.166 0.286 0.154
Gambia 0.284 0.430 0.219
Georgia 0.307 0.498 0.214
Germany 0.727 0.771 0.563
Ghana 0.368 0.438 0.222
Greece 0.609 0.693 0.448
Guatemala 0.186 0.342 0.216
Guinea 0.082 0.209 0.112
Guinea-Bissau 0.132 0.208 0.099
Guyana 0.281 0.442 0.270
Haiti 0.134 0.265 0.146
Honduras 0.247 0.395 0.244
Hungary 0.378 0.434 0.303
Iceland 0.785 0.878 0.671
India 0.554 0.707 0.430
Indonesia 0.184 0.348 0.178
Iran 0.125 0.192 0.100
Iraq 0.126 0.174 0.112
Ireland 0.765 0.866 0.616
Israel 0.605 0.733 0.477
Italy 0.680 0.815 0.608
Ivory Coast 0.192 0.304 0.213
Jamaica 0.419 0.554 0.375
Japan 0.812 0.874 0.605
Jordan 0.171 0.189 0.070
Kazakhstan 0.145 0.295 0.112
Kenya 0.197 0.315 0.148
Korea_North 0.014 0.092 0.021
Korea_South 0.409 0.534 0.312
Kosovo 0.326 0.469 0.294
Kyrgyzstan 0.212 0.352 0.168
Laos 0.091 0.123 0.036
Latvia 0.761 0.866 0.637
Lebanon 0.246 0.437 0.137
Lesotho 0.236 0.286 0.148
Liberia 0.197 0.306 0.170
Libya 0.103 0.117 0.065
Lithuania 0.822 0.868 0.635
Macedonia 0.387 0.529 0.360
Madagascar 0.199 0.331 0.183
Malawi 0.190 0.292 0.133
Malaysia 0.193 0.311 0.167
Maldives 0.110 0.223 0.113
Mali 0.284 0.376 0.238
Mauritania 0.142 0.271 0.140
Mauritius 0.632 0.730 0.470
Mexico 0.267 0.449 0.275
Moldova 0.457 0.603 0.352
Mongolia 0.317 0.440 0.262
Montenegro 0.408 0.516 0.330
Morocco 0.179 0.210 0.121
Mozambique 0.161 0.241 0.147
Namibia 0.261 0.346 0.212
Nepal 0.192 0.236 0.139
Netherlands 0.831 0.894 0.610
New Zealand 0.812 0.880 0.671
Nicaragua 0.238 0.411 0.220
Niger 0.298 0.365 0.210
Nigeria 0.202 0.312 0.205
Norway 0.874 0.905 0.650
Pakistan 0.186 0.293 0.166
Panama 0.332 0.426 0.262
Papua New Guinea 0.305 0.410 0.245
Paraguay 0.207 0.357 0.213
Peru 0.339 0.506 0.293
Philippines 0.315 0.448 0.318
Poland 0.441 0.519 0.356
Portugal 0.631 0.684 0.464
Qatar 0.093 0.041 0.025
Romania 0.238 0.393 0.225
Russia 0.127 0.259 0.125
Rwanda 0.163 0.217 0.139
Sao Tome and Principe 0.301 0.331 0.214
Saudi Arabia 0.066 0.020 0.022
Senegal 0.405 0.562 0.358
Serbia 0.240 0.297 0.197
Seychelles 0.248 0.337 0.118
Sierra Leone 0.176 0.325 0.192
Slovakia 0.626 0.734 0.546
Slovenia 0.729 0.810 0.653
Solomon Islands 0.380 0.490 0.260
Somalia 0.095 0.191 0.081
South Africa 0.279 0.389 0.263
South Sudan 0.102 0.219 0.175
Spain 0.579 0.651 0.477
Sri Lanka 0.392 0.578 0.303
Sudan 0.088 0.203 0.134
Suriname 0.570 0.658 0.401
Swaziland 0.115 0.118 0.113
Sweden 0.851 0.893 0.667
Switzerland 0.822 0.867 0.762
Syria 0.052 0.155 0.059
Tajikistan 0.095 0.248 0.096
Tanzania 0.308 0.356 0.169
Thailand 0.238 0.327 0.164
Togo 0.149 0.251 0.102
Trinidad and Tobago 0.547 0.655 0.339
Tunisia 0.160 0.222 0.113
Turkey 0.364 0.529 0.279
Turkmenistan 0.034 0.153 0.063
Uganda 0.178 0.264 0.163
Ukraine 0.338 0.508 0.342
United Kingdom 0.802 0.873 0.597
United States 0.773 0.838 0.601
Uruguay 0.610 0.699 0.574
Uzbekistan 0.053 0.177 0.075
Vanuatu 0.419 0.480 0.290
Venezuela 0.530 0.716 0.481
Vietnam_Democratic Republic of 0.093 0.194 0.162
Yemen 0.111 0.194 0.098
Zambia 0.279 0.377 0.262
Zimbabwe 0.161 0.258 0.129

As before, we observe higher levels of electoral democracy overall than participatory or liberal democracy. The cross-sectional data reveal, however, the heterogeneity in these levels. They also demonstrate the substantial regional variation in these indices.

Part 3: Graphics (45 points)

Scatterplot grid by year using ggplot2 (5 points)

g <- ggplot(vdem_un, aes(x=v2x_libdem, y=v2x_partipdem)) +
      geom_point() +
      geom_smooth(method="lm") +
      facet_wrap( ~ year)
g
## Warning: Removed 49 rows containing non-finite values (stat_smooth).
## Warning: Removed 49 rows containing missing values (geom_point).

Scatterplot grid by country using ggplot2 (5 points)

g <- ggplot(vdem_un, aes(x=v2x_libdem, y=v2x_partipdem)) +
      geom_point() +
      geom_smooth(method="lm") +
      facet_wrap( ~ countryVDEM)
g
## Warning: Removed 49 rows containing non-finite values (stat_smooth).
## Warning: Removed 49 rows containing missing values (geom_point).

Time series line plot for 3 countries using ggplot2 (5 points)

vdem_un_ts <- filter(vdem_un, country_text_id %in% c("USA", "ESP", "CHL"))
g <- ggplot(vdem_un_ts, aes(x=year, y=v2x_polyarchy, 
                            group=countryVDEM, color=countryVDEM)) + 
      geom_line() +
      ylab("Electoral Democracy index")
g

Time series line plot for 3 countries using plotly (5 points)

plot_ly(vdem_un_ts, x = ~year, y = ~v2x_polyarchy, color = ~countryVDEM, 
        type = "scatter", mode = "lines") %>%
      layout(yaxis = list(title="Electoral Democracy Index"))
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
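A likely cause of the palette warning above is that countryVDEM was converted to a factor earlier, and subsetting a factor keeps all of its original levels, so plotly may be asked for one color per country rather than one per plotted line. The sketch below illustrates that behavior on a made-up factor; whether dropping the unused levels silences the warning in this document is an assumption, not something verified here.

```r
# A factor with five levels, standing in for countryVDEM
country <- factor(c("Chile", "Spain", "United States", "Norway", "Peru"))

# Subsetting keeps every original level, even those no longer present
subset_country <- country[country %in% c("Chile", "Spain", "United States")]
n_before <- nlevels(subset_country)

# droplevels() discards the levels that have no remaining observations
n_after <- nlevels(droplevels(subset_country))

c(before = n_before, after = n_after)
```

If this is indeed the cause, applying droplevels() to the filtered data frame before plotting would be one way to address it.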

Motion graph using googleVis (8 points)

M <- gvisMotionChart(vdem_un, "countryVDEM", "year", 
                     options=list(width=600, height=400))
print(M, "chart")

Interactive world map using googleVis (8 points)

GeoStates <- gvisGeoChart(vdem_un_xs, "countryVDEM", "Electoral",
                          options=list(width=600, height=400))
print(GeoStates, "chart")

Three-dimensional scatterplot using plotly (8 points)

plot_ly(vdem_un, x = ~year, y = ~v2x_polyarchy, z = ~v2x_libdem,
        type = "scatter3d", color = ~undernourish)
## No scatter3d mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
## Warning: Ignoring 40 observations