Problem Set 1: Answers

Part 1: Data management (45 points)
Part 2: Collapsing the data (10 points)
- Collapsing by year
- Collapsing by country
Part 3: Graphics (45 points)

Part 1: Data management (45 points)

Loading libraries and data

We start by loading the relevant packages:

library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)
library(googleVis)
library(plotly)
library(readr)
library(stargazer)

Then we load the two datasets. Because loading the VDEM data takes a while, we cache this code chunk. Although I didn’t mention this in class, we can use the read_csv() command from the readr library instead of base R's read.csv(); the authors of readr suggest it is typically about 10 times faster.

vdem <- read_csv("V-Dem-DS-CY+Others-v6.2.csv")
undata <- read_csv("UNdata.csv")
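In the UN file, missing values appear as the string ".." (visible in the UN tables further below), so the year columns are read in as character. A minimal sketch, assuming a tiny hypothetical CSV written to a temporary file (the file and the name toy are illustrative only), shows that read_csv()'s na argument can treat ".." as missing at load time, so the value columns are parsed as numbers directly:

```r
library(readr)

# Hypothetical three-line extract in the same layout as UNdata.csv
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Series Name,Country Code,1960 [YR1960]",
  "\"Physicians (per 1,000 people)\",AFG,0.0348442494869232",
  "\"Physicians (per 1,000 people)\",ASM,.."
), path)

# Treat "..", along with empty strings and "NA", as missing
toy <- read_csv(path, na = c("", "NA", ".."))
toy$`1960 [YR1960]`  # a numeric column: 0.0348... then NA
```

With this option the year columns should already be numeric, so later as.numeric() conversions would have no ".." strings to coerce.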

First we will work with the VDEM data, then with the UN data.

Cleaning the VDEM data

The VDEM data contains 3,173 variables. To make this dataset much easier to work with, the first step will be to keep only 8 interesting variables in addition to the needed IDs. I choose to keep country_name, country_text_id, year, v2x_polyarchy, v2x_api, v2x_mpi, v2x_libdem, v2x_liberal, v2x_partipdem, v2x_delibdem, and v2x_egaldem.

vdem <- select(vdem, country_name, country_text_id, year, v2x_polyarchy, 
               v2x_api, v2x_mpi, v2x_libdem, v2x_liberal, 
               v2x_partipdem, v2x_delibdem, v2x_egaldem)

The VDEM data now appears to be both tidy (rows are observations, columns are variables, units of analysis are comparable) and clean (missing datapoints are coded as NA, no additional variables need to be generated, no special characters need to be removed).

summary(vdem)

##  country_name       country_text_id         year      v2x_polyarchy   
##  Length:16675       Length:16675       Min.   :1900   Min.   :0.0084  
##  Class :character   Class :character   1st Qu.:1932   1st Qu.:0.0965  
##  Mode  :character   Mode  :character   Median :1961   Median :0.2077  
##                                        Mean   :1960   Mean   :0.3212  
##                                        3rd Qu.:1989   3rd Qu.:0.5308  
##                                        Max.   :2015   Max.   :0.9584  
##                                                       NA's   :416     
##     v2x_api          v2x_mpi         v2x_libdem      v2x_liberal     
##  Min.   :0.0168   Min.   :0.0000   Min.   :0.0109   Min.   :0.03104  
##  1st Qu.:0.1930   1st Qu.:0.0000   1st Qu.:0.0776   1st Qu.:0.25252  
##  Median :0.4122   Median :0.0009   Median :0.1498   Median :0.44538  
##  Mean   :0.4679   Mean   :0.1746   Mean   :0.2596   Mean   :0.48289  
##  3rd Qu.:0.7713   3rd Qu.:0.2851   3rd Qu.:0.3795   3rd Qu.:0.70355  
##  Max.   :0.9832   Max.   :0.9337   Max.   :0.9278   Max.   :0.98123  
##  NA's   :416      NA's   :416      NA's   :416      NA's   :55       
##  v2x_partipdem     v2x_delibdem     v2x_egaldem    
##  Min.   :0.0004   Min.   :0.0003   Min.   :0.0074  
##  1st Qu.:0.0484   1st Qu.:0.0134   1st Qu.:0.0619  
##  Median :0.1053   Median :0.0703   Median :0.1455  
##  Mean   :0.1990   Mean   :0.2113   Mean   :0.2448  
##  3rd Qu.:0.3044   3rd Qu.:0.3484   3rd Qu.:0.3431  
##  Max.   :0.8404   Max.   :0.9291   Max.   :0.9247  
##  NA's   :425      NA's   :524      NA's   :416

So we can move on to cleaning the UN data.

Cleaning the UN data

The UN data is not shaped well. Taking a look at the first several rows and columns reveals the problems we must address.

library(knitr)

## Warning: package 'knitr' was built under R version 3.3.2

kable(head(undata[,1:5]))

Series Name	Series Code	Country Name	Country Code	1960 [YR1960]
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Afghanistan	AFG	0.0348442494869232
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Albania	ALB	0.276291221380234
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Algeria	DZA	0.173148155212402
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	American Samoa	ASM	..
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Andorra	ADO	..
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Angola	AGO	0.0670681074261665

(Incidentally, if you compile a document like this as a PDF, you can use the stargazer library to output the above table as LaTeX code. If you specify results='asis' as a code chunk option, the LaTeX code is compiled automatically into a nice-looking table. We have to keep this document as HTML, however, so that the interactive graphics can be displayed.)

stargazer(head(as.matrix(undata[,1:5])), 
          title="The first 6 rows and 5 columns of the UN data")

% Table created by stargazer v.5.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: Fri, Feb 10, 2017 - 15:50:06
\begin{table}[!htbp] \centering
  \caption{The first 6 rows and 5 columns of the UN data}
  \label{}
\begin{tabular}{@{\extracolsep{5pt}} ccccc}
\\[-1.8ex]\hline
\hline \\[-1.8ex]
Series Name & Series Code & Country Name & Country Code & 1960 [YR1960] \\
\hline \\[-1.8ex]
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Afghanistan & AFG & 0.0348442494869232 \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Albania & ALB & 0.276291221380234 \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Algeria & DZA & 0.173148155212402 \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & American Samoa & ASM & .. \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Andorra & ADO & .. \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Angola & AGO & 0.0670681074261665 \\
\hline \\[-1.8ex]
\end{tabular}
\end{table}

First, the years should be observations (rows) but are coded as columns. Second, the variables are stored as observations (rows), distinguished by different series names and codes. Third, both the years and the variables have names that are currently too complex to use. We must reshape the data and rename its various elements.

First, let’s bring the years down to the rows by using the gather() function.

undata <- gather(undata, `1960 [YR1960]`:`2015 [YR2015]`, key="year", value="value")

The truly tricky thing about the above command is dealing with the obnoxious space in variable names like 1960 [YR1960]. Putting double or single quotes around the variable names did not work; R instead uses backticks (`) to quote non-syntactic names like these. I discovered this by typing undata$ into the R console in RStudio, as if I were about to type a variable name after the dollar sign. A drop-down menu appeared that allowed me to click on a variable. I clicked on the first year, and RStudio placed the variable name in the command wrapped in backticks.
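The same backtick rule works for any column name containing spaces or brackets. The sketch below uses a hypothetical three-country extract (the data and the names wide, long_gather, and long_pivot are illustrative) to run the same gather() call, alongside pivot_longer(), which tidyr introduced in version 1.0.0 as the recommended replacement for gather():

```r
library(tidyr)
library(tibble)

# Hypothetical wide extract: one column per year, with non-syntactic names
wide <- tibble(
  `Country Code`  = c("AFG", "ALB", "DZA"),
  `1960 [YR1960]` = c("0.0348", "0.2763", "0.1731"),
  `1961 [YR1961]` = c("..", "..", "..")
)

# Backticks let the column range be written with the raw names
long_gather <- gather(wide, `1960 [YR1960]`:`1961 [YR1961]`,
                      key = "year", value = "value")

# The same reshape with pivot_longer(); every column except the ID is stacked
long_pivot <- pivot_longer(wide, cols = -`Country Code`,
                           names_to = "year", values_to = "value")

nrow(long_gather)  # 3 countries x 2 years = 6 rows
```

The two results contain the same rows but in a different order: gather() stacks one year column after another, while pivot_longer() walks through the data row by row.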

The reshape appears to have worked as intended, since the years are now in the rows.

library(knitr)
kable(head(undata))

Series Name	Series Code	Country Name	Country Code	year	value
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Afghanistan	AFG	1960 [YR1960]	0.0348442494869232
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Albania	ALB	1960 [YR1960]	0.276291221380234
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Algeria	DZA	1960 [YR1960]	0.173148155212402
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	American Samoa	ASM	1960 [YR1960]	..
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Andorra	ADO	1960 [YR1960]	..
Physicians (per 1,000 people)	SH.MED.PHYS.ZS	Angola	AGO	1960 [YR1960]	0.0670681074261665
We aren’t going to need both the series name and the series code, so let’s drop the series code, as this information is more difficult to read than the series name.

undata <- select(undata, -`Series Code`)

Next, let’s move the variables listed in the series name to the columns. The series name variable has a name that contains both a space and a special character at the beginning, which makes it difficult to refer to, so we first rename it.

names(undata)[1] <- "Series"

We run into an error with the spread() command saying that there are duplicate identifiers. Inspecting the data in the View() window shows that these duplicates are rows with a missing value on the Series variable, which separate one health indicator from the next. To correct this error, we eliminate the rows with a missing series name.

undata <- filter(undata, !is.na(Series))

Now we can perform a wide reshape.

undata <- spread(undata, key=Series, value=value)
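The duplicate-identifier error can be reproduced on a small hypothetical extract (the names long and wide are illustrative): two separator rows share a missing Series value, a missing country, and the same year, so spread() cannot decide which of them fills the single cell for that key and stops. Dropping rows with a missing series name first, as above, lets the reshape go through:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical long extract; the last two rows mimic blank separator rows
long <- tibble(
  Series  = c("Physicians", "Undernourishment", NA, NA),
  Country = c("AFG", "AFG", NA, NA),
  year    = c("1960", "1960", "1960", "1960"),
  value   = c("0.0348", "..", NA, NA)
)

# spread() refuses to build a wide table when identifiers are duplicated
spread_error <- tryCatch(spread(long, key = Series, value = value),
                         error = function(e) e)
inherits(spread_error, "error")  # TRUE

# Removing rows with a missing series name resolves the duplicates
wide <- spread(filter(long, !is.na(Series)), key = Series, value = value)
wide
```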

The year variable still has the unhelpful [YR1960] syntax attached to each year. To isolate the numeric years, we use the separate() command (the inverse of unite()). We save the bracketed part as a variable named todrop so we know exactly what to drop.

undata <- separate(undata, year, into=c("year", "todrop"), sep=" ")
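A quick check of the split on two hypothetical year labels (the names years and split_years are illustrative). As an alternative that skips the throwaway column, readr's parse_number() keeps only the first number in a string and returns it as numeric:

```r
library(tidyr)
library(readr)
library(tibble)

years <- tibble(year = c("1960 [YR1960]", "2015 [YR2015]"))

# separate() splits at the space; the bracketed code lands in todrop
split_years <- separate(years, year, into = c("year", "todrop"), sep = " ")
split_years$year    # "1960" "2015" (still character)

# Alternative: extract the leading number directly
parse_number(years$year)
```

With separate(), year stays character until the later as.numeric() step; parse_number() does both steps at once.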

Two extra variables now exist because the raw UN data placed header text in the rows (along with missing values) to separate the indicators, and spread() turned that text into columns. These variables contain no data, so we can delete them along with todrop.

undata <- select(undata, -todrop, 
                 -`Data from database: Health Nutrition and Population Statistics`,
                 -`Last Updated: 12/16/2016`)

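Finally, we rename the variables to match VDEM and to be short and descriptive. We also convert year and the three substantive variables to numeric, since they are all still character vectors. Because missing values in the raw UN data are coded as ".." (see the tables above), these conversions produce the "NAs introduced by coercion" warnings below; they are expected.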

names(undata) <- c("country_name", "country_text_id", "year",
                   "healthex","phys","undernourish")
undata <- mutate(undata, year=as.numeric(year), healthex=as.numeric(healthex),
                 phys=as.numeric(phys), undernourish=as.numeric(undernourish))

## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion

## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion

## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion
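A small sketch on a hypothetical vector (raw, direct, and clean are illustrative names) shows where the warnings come from and how to avoid them: as.numeric() turns each ".." into NA and warns, whereas recoding ".." to NA first with dplyr's na_if() gives the same numbers silently, which makes any unexpected coercion easier to spot:

```r
library(dplyr)

raw <- c("0.0348442494869232", "..", "0.276291221380234")

# Direct conversion: ".." becomes NA, with a coercion warning (silenced here)
direct <- suppressWarnings(as.numeric(raw))

# Recoding the placeholder first converts cleanly
clean <- as.numeric(na_if(raw, ".."))

identical(direct, clean)  # TRUE
```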

Merging the two datasets together

Checking whether country abbreviations match

Merging the two datasets is likely to produce mistakes that don’t raise an error but nevertheless fail to correctly match countries in one data frame to the other. That’s because different cross-national datasets often use slightly different names and abbreviations for countries. We need a way to quickly see which countries are matched and which are unmatched after the merge. If we use the full_join() command for the merge, unmatched rows will have NA values for the variables from the other dataset. The trick is: how can we be sure that a missing value is always due to an unmatched row, and not due to the myriad other reasons why the data might be missing? For that reason, I create an indicator in each dataset that is always equal to 1 and is never missing. Also, because full_join() by default merges on every variable name the two data frames share, I rename the country name variable in each data frame so that the merge uses only the country abbreviations and year, not the country names.

vdem$vdem <- 1
undata$undata <- 1
names(vdem)[1] <- "countryVDEM"
names(undata)[1] <- "countryUN"

The two datasets have different time frames: VDEM covers 1900-2015 and the UN data covers 1960-2015, so the overlap is 1960-2015. I keep only the rows in VDEM from these years.

vdem <- filter(vdem, year >= 1960)

Now I try the merge and rearrange the columns so that the ID variables and the two indicators I just created are easy to see in the viewer.

vdem_un <- full_join(vdem, undata)

## Joining, by = c("country_text_id", "year")

vdem_un <- select(vdem_un, vdem, undata, country_text_id, year, 
                  countryVDEM, countryUN, everything())
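The indicator trick can be seen on two hypothetical three-row frames (a, b, and merged are illustrative names; ROU/ROM mimic a mismatched abbreviation). ALB has a missing democracy score but is matched, while ROM has a missing health value and is unmatched; only the indicators tell the two cases apart:

```r
library(dplyr)
library(tibble)

a <- tibble(country_text_id = c("AFG", "ALB", "ROU"), year = 1960,
            v2x_polyarchy = c(0.10, NA, 0.20), vdem = 1)
b <- tibble(country_text_id = c("AFG", "ALB", "ROM"), year = 1960,
            phys = c(0.03, 0.28, NA), undata = 1)

merged <- full_join(a, b, by = c("country_text_id", "year"))

# Rows present only in the first frame: the second frame's indicator is NA
filter(merged, is.na(undata) & !is.na(vdem))$country_text_id   # "ROU"
# Rows present only in the second frame
filter(merged, !is.na(undata) & is.na(vdem))$country_text_id   # "ROM"
```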

Recoding country abbreviations and dropping countries in VDEM

To see what observations from VDEM are unmatched with the UN, we create a subset of the data in which the VDEM indicator exists and the UN indicator is missing. We save the country ID and year variables. Then we repeat to see the observations that exist in the UN data but not VDEM.

unmatch.vdem <- filter(vdem_un, is.na(undata) & !is.na(vdem))
unmatch.vdem <- select(unmatch.vdem, country_text_id, countryVDEM, year)
nrow(unmatch.vdem)

## [1] 401

unmatch.un <- filter(vdem_un, !is.na(undata) & is.na(vdem))
unmatch.un <- select(unmatch.un, country_text_id, countryUN, year)
nrow(unmatch.un)

## [1] 6371
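dplyr's anti_join() offers a more direct route to the same lists: it returns the rows of its first argument that have no match in the second, so no indicator variables are needed. A sketch on hypothetical frames (vdem_toy, un_toy, unmatched, and spans are illustrative names), including the by-country summary used below:

```r
library(dplyr)
library(tibble)

vdem_toy <- tibble(country_text_id = c("AFG", "ROU", "ROU"), year = c(1960, 1960, 1961))
un_toy   <- tibble(country_text_id = c("AFG", "ROM", "ROM"), year = c(1960, 1960, 1961))

# VDEM country-years with no partner in the UN data
unmatched <- anti_join(vdem_toy, un_toy, by = c("country_text_id", "year"))

# First and last unmatched year for each country
spans <- summarize(group_by(unmatched, country_text_id),
                   start = min(year), end = max(year))
spans
```

Swapping the two arguments gives the UN country-years that have no match in VDEM.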

There are 11 VDEM country codes with no match in the UN data for some or all of their years.

unmatch.vdem <- group_by(unmatch.vdem, country_text_id, countryVDEM)
unmatch.vdem <- summarize(unmatch.vdem, start=min(year), end=max(year))
library(knitr)
kable(unmatch.vdem)

country_text_id	countryVDEM	start	end
COD	Congo_Democratic Republic of	1960	2012
DDR	German Democratic Republic	1960	1990
PSE	Palestine_West_Bank	1968	2014
PSG	Palestine_Gaza	1960	2014
ROU	Romania	1960	2015
SML	Somaliland	1992	2015
TLS	East Timor	1960	2015
TWN	Taiwan	1960	2015
VDR	Vietnam_Republic of	1960	1975
XKX	Kosovo	1999	2015
YMD	South Yemen	1960	1990

The reasons why these countries are unmatched are as follows:

- The Democratic Republic of the Congo is coded COD in VDEM and ZAR in the UN data.
- The UN did not collect data for East Germany, Somaliland, Taiwan, South Vietnam, and South Yemen.
- VDEM codes the West Bank (PSE) and Gaza (PSG) separately, while the UN data has a single West Bank and Gaza entry (WBG), so the two cannot be matched one-to-one.
- Romania is ROU in VDEM but ROM in the UN data.
- East Timor (also called Timor-Leste) is coded TLS in VDEM but TMP in the UN data.
- Kosovo is XKX in VDEM but KSV in the UN data.

I adjust the abbreviations and drop cases as necessary.

vdem <- filter(vdem, !(country_text_id %in% c("DDR", "PSE", "PSG", "SML", 
                                              "TWN", "VDR", "YMD")))
vdem <- mutate(vdem, country_text_id = as.factor(country_text_id),
               country_text_id = fct_recode(country_text_id,
                                            "ZAR"="COD",
                                            "ROM"="ROU",
                                            "TMP"="TLS",
                                            "KSV"="XKX"),
               country_text_id = as.character(country_text_id))
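Note the argument order in fct_recode(): the new code goes on the left and the old code on the right. Because country_text_id starts and ends as character, the factor round trip can also be skipped with a named lookup vector in base R. The sketch below (on a hypothetical vector; codes, lookup, and hit are illustrative names) checks that both routes agree:

```r
library(forcats)

codes <- c("COD", "USA", "ROU", "TLS", "XKX")

# fct_recode(): new = "old"
via_factor <- as.character(fct_recode(factor(codes),
                                      "ZAR" = "COD", "ROM" = "ROU",
                                      "TMP" = "TLS", "KSV" = "XKX"))

# Base R: names are the old codes, values are the new ones
lookup <- c(COD = "ZAR", ROU = "ROM", TLS = "TMP", XKX = "KSV")
via_lookup <- codes
hit <- codes %in% names(lookup)
via_lookup[hit] <- lookup[codes[hit]]

identical(via_factor, via_lookup)  # TRUE
```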

Recoding country abbreviations and dropping countries in the UN data

Having adjusted the VDEM data, I try the merge again.

vdem_un <- full_join(vdem, undata)

## Joining, by = c("country_text_id", "year")

vdem_un <- select(vdem_un, vdem, undata, country_text_id, year, 
                  countryVDEM, countryUN, everything())

There are 204 country codes in the UN data (counting one missing code, NA) with at least one year that has no match in VDEM.

unmatch.un <- filter(vdem_un, !is.na(undata) & is.na(vdem))
unmatch.un <- select(unmatch.un, country_text_id, countryUN, year)
unmatch.un <- group_by(unmatch.un, country_text_id, countryUN)
unmatch.un <- summarize(unmatch.un, start=min(year), end=max(year))
nrow(unmatch.un)

## [1] 204

library(knitr)
kable(unmatch.un)

country_text_id	countryUN	start	end
ABW	Aruba	1960	2015
ADO	Andorra	1960	2015
AGO	Angola	2013	2015
ALB	Albania	2013	2015
ARB	Arab World	1960	2015
ARE	United Arab Emirates	1960	2015
ARM	Armenia	1960	1989
ASM	American Samoa	1960	2015
ATG	Antigua and Barbuda	1960	2015
AUS	Australia	2015	2015
AUT	Austria	2013	2015
AZE	Azerbaijan	1960	1989
BEL	Belgium	2015	2015
BGD	Bangladesh	1960	2015
BGR	Bulgaria	2015	2015
BHR	Bahrain	1960	2015
BHS	Bahamas, The	1960	2015
BIH	Bosnia and Herzegovina	1960	1991
BLR	Belarus	1960	1989
BLZ	Belize	1960	2015
BMU	Bermuda	1960	2015
BRB	Barbados	2015	2015
BRN	Brunei Darussalam	1960	2015
BWA	Botswana	2015	2015
CAF	Central African Republic	2013	2015
CAN	Canada	2015	2015
CEB	Central Europe and the Baltics	1960	2015
CHE	Switzerland	2015	2015
CHI	Channel Islands	1960	2015
CHL	Chile	2015	2015
CHN	China	2015	2015
CIV	Cote d’Ivoire	2013	2015
CMR	Cameroon	1960	1960
COG	Congo, Rep.	2013	2015
COM	Comoros	2013	2015
CPV	Cabo Verde	2015	2015
CSS	Caribbean small states	1960	2015
CUW	Curacao	1960	2015
CYM	Cayman Islands	1960	2015
CYP	Cyprus	2013	2015
CZE	Czech Republic	2013	2015
DEU	Germany	2015	2015
DJI	Djibouti	2013	2015
DMA	Dominica	1960	2015
DNK	Denmark	2015	2015
DOM	Dominican Republic	2015	2015
EAP	East Asia & Pacific (excluding high income)	1960	2015
EAR	Early-demographic dividend	1960	2015
EAS	East Asia & Pacific	1960	2015
ECA	Europe & Central Asia (excluding high income)	1960	2015
ECS	Europe & Central Asia	1960	2015
ECU	Ecuador	2013	2015
EGY	Egypt, Arab Rep.	2015	2015
EMU	Euro area	1960	2015
ESP	Spain	2015	2015
EST	Estonia	1960	1990
EUU	European Union	1960	2015
FCS	Fragile and conflict affected situations	1960	2015
FIN	Finland	2015	2015
FRA	France	2013	2015
FRO	Faroe Islands	1960	2015
FSM	Micronesia, Fed. Sts.	1960	2015
GAB	Gabon	2013	2015
GBR	United Kingdom	2013	2015
GEO	Georgia	1960	1989
GIB	Gibraltar	1960	2015
GIN	Guinea	2013	2015
GMB	Gambia, The	2013	2015
GNB	Guinea-Bissau	2013	2015
GNQ	Equatorial Guinea	1960	2015
GRC	Greece	2013	2015
GRD	Grenada	1960	2015
GRL	Greenland	1960	2015
GTM	Guatemala	2013	2015
GUM	Guam	1960	2015
HIC	High income	1960	2015
HKG	Hong Kong SAR, China	1960	2015
HND	Honduras	2013	2015
HPC	Heavily indebted poor countries (HIPC)	1960	2015
HRV	Croatia	1960	2015
HTI	Haiti	2013	2015
HUN	Hungary	2013	2015
IDN	Indonesia	2015	2015
IMY	Isle of Man	1960	2015
IND	India	2015	2015
IRL	Ireland	2013	2015
ISL	Iceland	2013	2015
ISR	Israel	2013	2015
ITA	Italy	2013	2015
JAM	Jamaica	2013	2015
JPN	Japan	2015	2015
KAZ	Kazakhstan	1960	1989
KGZ	Kyrgyz Republic	1960	1989
KIR	Kiribati	1960	2015
KNA	St. Kitts and Nevis	1960	2015
KOR	Korea, Rep.	2015	2015
KSV	Kosovo	1960	1998
KWT	Kuwait	1960	2015
LAC	Latin America & Caribbean (excluding high income)	1960	2015
LAO	Lao PDR	2013	2015
LBR	Liberia	2013	2015
LBY	Libya	2015	2015
LCA	St. Lucia	1960	2015
LCN	Latin America & Caribbean	1960	2015
LDC	Least developed countries: UN classification	1960	2015
LIC	Low income	1960	2015
LIE	Liechtenstein	1960	2015
LKA	Sri Lanka	2015	2015
LMC	Lower middle income	1960	2015
LMY	Low & middle income	1960	2015
LSO	Lesotho	2013	2015
LTE	Late-demographic dividend	1960	2015
LTU	Lithuania	1960	1990
LUX	Luxembourg	1960	2015
LVA	Latvia	1960	1990
MAC	Macao SAR, China	1960	2015
MAF	St. Martin (French part)	1960	2015
MCO	Monaco	1960	2015
MDA	Moldova	1960	1989
MDG	Madagascar	2013	2015
MEA	Middle East & North Africa	1960	2015
MEX	Mexico	2015	2015
MHL	Marshall Islands	1960	2015
MIC	Middle income	1960	2015
MKD	Macedonia, FYR	1960	1990
MLI	Mali	2013	2015
MLT	Malta	1960	2015
MNA	Middle East & North Africa (excluding high income)	1960	2015
MNE	Montenegro	1960	2015
MNP	Northern Mariana Islands	1960	2015
MRT	Mauritania	2013	2015
MUS	Mauritius	2015	2015
MYS	Malaysia	2013	2015
NAC	North America	1960	2015
NAM	Namibia	2015	2015
NCL	New Caledonia	1960	2015
NER	Niger	2013	2015
NIC	Nicaragua	2013	2015
NLD	Netherlands	2015	2015
NOR	Norway	2015	2015
NRU	Nauru	1960	2015
NZL	New Zealand	2013	2015
OED	OECD members	1960	2015
OMN	Oman	1960	2015
OSS	Other small states	1960	2015
PAK	Pakistan	2015	2015
PAN	Panama	2013	2015
PER	Peru	2015	2015
PLW	Palau	1960	2015
PNG	Papua New Guinea	2015	2015
PRE	Pre-demographic dividend	1960	2015
PRI	Puerto Rico	1960	2015
PRK	Korea, Dem. People’s Rep.	2013	2015
PSS	Pacific island small states	1960	2015
PST	Post-demographic dividend	1960	2015
PYF	French Polynesia	1960	2015
SAS	South Asia	1960	2015
SAU	Saudi Arabia	2013	2015
SEN	Senegal	2013	2015
SGP	Singapore	1960	2015
SLE	Sierra Leone	2013	2015
SMR	San Marino	1960	2015
SRB	Serbia	2013	2015
SSA	Sub-Saharan Africa (excluding high income)	1960	2015
SSD	South Sudan	1960	2010
SSF	Sub-Saharan Africa	1960	2015
SST	Small states	1960	2015
STP	Sao Tome and Principe	2013	2015
SVK	Slovak Republic	1960	2015
SVN	Slovenia	1960	1988
SWE	Sweden	2015	2015
SWZ	Swaziland	2013	2015
SXM	Sint Maarten (Dutch part)	1960	2015
SYC	Seychelles	2013	2015
TCA	Turks and Caicos Islands	1960	2015
TCD	Chad	2013	2015
TEA	East Asia & Pacific (IDA & IBRD countries)	1960	2015
TEC	Europe & Central Asia (IDA & IBRD countries)	1960	2015
TGO	Togo	2013	2015
TJK	Tajikistan	1960	1989
TKM	Turkmenistan	1960	2015
TLA	Latin America & the Caribbean (IDA & IBRD countries)	1960	2015
TMN	Middle East & North Africa (IDA & IBRD countries)	1960	2015
TON	Tonga	1960	2015
TSA	South Asia (IDA & IBRD)	1960	2015
TSS	Sub-Saharan Africa (IDA & IBRD countries)	1960	2015
TTO	Trinidad and Tobago	2013	2015
TUV	Tuvalu	1960	2015
UKR	Ukraine	1960	1989
UMC	Upper middle income	1960	2015
URY	Uruguay	2015	2015
UZB	Uzbekistan	1960	1989
VCT	St. Vincent and the Grenadines	1960	2015
VEN	Venezuela, RB	2013	2015
VGB	British Virgin Islands	1960	2015
VIR	Virgin Islands (U.S.)	1960	2015
VNM	Vietnam	2013	2015
VUT	Vanuatu	2015	2015
WBG	West Bank and Gaza	1960	2015
WLD	World	1960	2015
WSM	Samoa	1960	2015
ZAF	South Africa	2015	2015
ZAR	Congo, Dem. Rep.	2013	2015
NA	NA	1960	2015

It appears that there are three types of entries in this list:

- Countries not covered by VDEM: countries like Oman and the United Arab Emirates do not appear in this version of the Varieties of Democracy data at all. (This is not simply because they are non-democracies, since VDEM includes autocracies such as China and Saudi Arabia.)
- Non-countries: some of these entries are territories (Greenland), others are regions or aggregates (Euro area), and others have only been countries for part of the 1960-2015 time frame (Estonia).
- Countries not yet updated in VDEM: many entries, democracies and autocracies alike (for example Canada and China), have no VDEM data for the most recent years, 2013 to 2015.

In all three cases, I am comfortable dropping these observations. I also now remove the vdem and undata indicators.

vdem_un <- filter(vdem_un, !is.na(vdem))
vdem_un <- select(vdem_un, -vdem, -undata)

We now have a complete and cleaned data frame with the merged VDEM and UN data.

Part 2: Collapsing the data (10 points)

I will examine the VDEM indices for liberal democracy, electoral democracy, and participatory democracy.

Collapsing by year

Collapsing by year allows us to examine over-time variation in these democracy indices. To collapse, we use the group_by() and summarize() commands.

vdem_un_time <- group_by(vdem_un, year)
vdem_un_time <- summarize(vdem_un_time, 
                          Liberal=mean(v2x_libdem, na.rm=TRUE),
                          Electoral=mean(v2x_polyarchy, na.rm=TRUE),
                          Participatory=mean(v2x_partipdem, na.rm=TRUE))
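The na.rm = TRUE argument matters here: without it, a single missing country-year would make that whole year's mean NA. A minimal sketch on hypothetical values (panel and by_year are illustrative names) also counts the non-missing observations per year:

```r
library(dplyr)
library(tibble)

panel <- tibble(year = c(1960, 1960, 1961, 1961),
                v2x_polyarchy = c(0.2, NA, 0.4, 0.6))

by_year <- summarize(group_by(panel, year),
                     with_na = mean(v2x_polyarchy),               # NA in 1960
                     dropped = mean(v2x_polyarchy, na.rm = TRUE), # 0.2 and 0.5
                     n_obs   = sum(!is.na(v2x_polyarchy)))        # 1 and 2
by_year
```

Reporting the count alongside the mean is a cheap way to notice years in which coverage shrinks.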

Next I use the kable() command to display the data.

library(knitr)
kable(round(vdem_un_time,3))

year	Liberal	Electoral	Participatory
1960	0.250	0.325	0.184
1961	0.252	0.329	0.187
1962	0.252	0.327	0.187
1963	0.253	0.330	0.188
1964	0.251	0.327	0.187
1965	0.252	0.327	0.188
1966	0.253	0.328	0.190
1967	0.249	0.322	0.187
1968	0.247	0.321	0.186
1969	0.248	0.318	0.186
1970	0.247	0.318	0.188
1971	0.247	0.317	0.189
1972	0.244	0.317	0.190
1973	0.242	0.315	0.189
1974	0.244	0.316	0.191
1975	0.249	0.321	0.195
1976	0.253	0.324	0.198
1977	0.255	0.327	0.200
1978	0.258	0.332	0.203
1979	0.266	0.346	0.212
1980	0.269	0.351	0.216
1981	0.269	0.349	0.216
1982	0.268	0.350	0.216
1983	0.272	0.353	0.218
1984	0.276	0.357	0.221
1985	0.284	0.369	0.228
1986	0.289	0.375	0.234
1987	0.293	0.380	0.238
1988	0.299	0.387	0.245
1989	0.304	0.396	0.249
1990	0.333	0.434	0.271
1991	0.361	0.466	0.293
1992	0.377	0.489	0.311
1993	0.387	0.502	0.320
1994	0.392	0.510	0.328
1995	0.397	0.514	0.333
1996	0.400	0.521	0.336
1997	0.399	0.523	0.337
1998	0.401	0.524	0.339
1999	0.400	0.520	0.338
2000	0.407	0.524	0.344
2001	0.411	0.528	0.346
2002	0.419	0.539	0.351
2003	0.422	0.543	0.355
2004	0.425	0.545	0.358
2005	0.427	0.550	0.361
2006	0.429	0.554	0.364
2007	0.429	0.552	0.364
2008	0.432	0.558	0.366
2009	0.435	0.561	0.368
2010	0.435	0.561	0.369
2011	0.439	0.566	0.372
2012	0.439	0.567	0.373
2013	0.440	0.567	0.371
2014	0.431	0.558	0.366
2015	0.359	0.489	0.312

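Countries, in general, exhibit more electoral democracy than liberal or participatory democracy. Across the entire time frame, countries have become more democratic overall. Participatory democracy has grown the most in relative terms (its mean roughly doubles between 1960 and 2012), although electoral democracy shows the largest absolute increase. The drop in 2014 and especially 2015 likely reflects coverage rather than real change: the merge above showed that a number of established democracies, such as Australia, Canada, and Sweden, have no VDEM observation for 2015.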
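Whether one index grew "the most" depends on whether growth is measured in absolute or relative terms. Using the 1960 and 2012 means copied from the table above (2012 rather than 2015, to sidestep the drop at the end of the series), a quick check:

```r
# Means copied from the 1960 and 2012 rows of the table above
y1960 <- c(Liberal = 0.250, Electoral = 0.325, Participatory = 0.184)
y2012 <- c(Liberal = 0.439, Electoral = 0.567, Participatory = 0.373)

absolute <- y2012 - y1960   # about 0.189, 0.242, 0.189
relative <- y2012 / y1960   # about 1.76, 1.74, 2.03

names(which.max(absolute))  # "Electoral"
names(which.max(relative))  # "Participatory"
```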

Collapsing by country

Collapsing by country allows us to examine cross-national variation in these democracy indices. To collapse, we convert countryVDEM to a factor and use the group_by() and summarize() commands.

vdem_un <- mutate(vdem_un, countryVDEM=factor(countryVDEM))
vdem_un_xs <- group_by(vdem_un, countryVDEM)
vdem_un_xs <- summarize(vdem_un_xs, 
                          Liberal=round(mean(v2x_libdem, na.rm=TRUE),3),
                          Electoral=round(mean(v2x_polyarchy, na.rm=TRUE),3),
                          Participatory=round(mean(v2x_partipdem, na.rm=TRUE),3))

Next I use the kable() command to display the data.

library(knitr)
kable(vdem_un_xs)

countryVDEM	Liberal	Electoral	Participatory
Afghanistan	0.112	0.181	0.064
Albania	0.208	0.320	0.171
Algeria	0.130	0.243	0.114
Angola	0.080	0.102	0.062
Argentina	0.469	0.612	0.432
Armenia	0.245	0.453	0.236
Australia	0.843	0.898	0.692
Austria	0.789	0.881	0.670
Azerbaijan	0.083	0.252	0.099
Bangladesh	0.252	0.437	0.249
Barbados	0.517	0.584	0.284
Belarus	0.181	0.346	0.164
Belgium	0.738	0.815	0.569
Benin	0.300	0.403	0.223
Bhutan	0.185	0.127	0.120
Bolivia	0.296	0.479	0.278
Bosnia and Herzegovina	0.243	0.340	0.219
Botswana	0.500	0.602	0.400
Brazil	0.467	0.596	0.422
Bulgaria	0.312	0.421	0.253
Burkina Faso	0.254	0.405	0.250
Burma_Myanmar	0.046	0.150	0.082
Burundi	0.156	0.205	0.124
Cambodia	0.108	0.251	0.092
Cameroon	0.126	0.256	0.130
Canada	0.807	0.874	0.624
Cape Verde	0.419	0.477	0.296
Central African Republic	0.130	0.228	0.135
Chad	0.082	0.200	0.119
Chile	0.500	0.585	0.369
China	0.046	0.097	0.036
Colombia	0.344	0.467	0.261
Comoros	0.192	0.331	0.188
Congo_Democratic Republic of	0.089	0.223	0.135
Congo_Republic of the	0.104	0.223	0.143
Costa Rica	0.807	0.867	0.604
Croatia	0.511	0.668	0.449
Cuba	0.062	0.108	0.091
Cyprus	0.553	0.661	0.382
Czech Republic	0.411	0.500	0.298
Denmark	0.892	0.920	0.724
Djibouti	0.090	0.201	0.099
Dominican Republic	0.294	0.506	0.277
East Timor	0.146	0.211	0.086
Ecuador	0.360	0.548	0.354
Egypt	0.148	0.224	0.130
El Salvador	0.210	0.358	0.225
Eritrea	0.022	0.084	0.025
Estonia	0.862	0.911	0.683
Ethiopia	0.076	0.171	0.071
Fiji	0.331	0.434	0.168
Finland	0.847	0.887	0.635
France	0.826	0.893	0.697
Gabon	0.166	0.286	0.154
Gambia	0.284	0.430	0.219
Georgia	0.307	0.498	0.214
Germany	0.727	0.771	0.563
Ghana	0.368	0.438	0.222
Greece	0.609	0.693	0.448
Guatemala	0.186	0.342	0.216
Guinea	0.082	0.209	0.112
Guinea-Bissau	0.132	0.208	0.099
Guyana	0.281	0.442	0.270
Haiti	0.134	0.265	0.146
Honduras	0.247	0.395	0.244
Hungary	0.378	0.434	0.303
Iceland	0.785	0.878	0.671
India	0.554	0.707	0.430
Indonesia	0.184	0.348	0.178
Iran	0.125	0.192	0.100
Iraq	0.126	0.174	0.112
Ireland	0.765	0.866	0.616
Israel	0.605	0.733	0.477
Italy	0.680	0.815	0.608
Ivory Coast	0.192	0.304	0.213
Jamaica	0.419	0.554	0.375
Japan	0.812	0.874	0.605
Jordan	0.171	0.189	0.070
Kazakhstan	0.145	0.295	0.112
Kenya	0.197	0.315	0.148
Korea_North	0.014	0.092	0.021
Korea_South	0.409	0.534	0.312
Kosovo	0.326	0.469	0.294
Kyrgyzstan	0.212	0.352	0.168
Laos	0.091	0.123	0.036
Latvia	0.761	0.866	0.637
Lebanon	0.246	0.437	0.137
Lesotho	0.236	0.286	0.148
Liberia	0.197	0.306	0.170
Libya	0.103	0.117	0.065
Lithuania	0.822	0.868	0.635
Macedonia	0.387	0.529	0.360
Madagascar	0.199	0.331	0.183
Malawi	0.190	0.292	0.133
Malaysia	0.193	0.311	0.167
Maldives	0.110	0.223	0.113
Mali	0.284	0.376	0.238
Mauritania	0.142	0.271	0.140
Mauritius	0.632	0.730	0.470
Mexico	0.267	0.449	0.275
Moldova	0.457	0.603	0.352
Mongolia	0.317	0.440	0.262
Montenegro	0.408	0.516	0.330
Morocco	0.179	0.210	0.121
Mozambique	0.161	0.241	0.147
Namibia	0.261	0.346	0.212
Nepal	0.192	0.236	0.139
Netherlands	0.831	0.894	0.610
New Zealand	0.812	0.880	0.671
Nicaragua	0.238	0.411	0.220
Niger	0.298	0.365	0.210
Nigeria	0.202	0.312	0.205
Norway	0.874	0.905	0.650
Pakistan	0.186	0.293	0.166
Panama	0.332	0.426	0.262
Papua New Guinea	0.305	0.410	0.245
Paraguay	0.207	0.357	0.213
Peru	0.339	0.506	0.293
Philippines	0.315	0.448	0.318
Poland	0.441	0.519	0.356
Portugal	0.631	0.684	0.464
Qatar	0.093	0.041	0.025
Romania	0.238	0.393	0.225
Russia	0.127	0.259	0.125
Rwanda	0.163	0.217	0.139
Sao Tome and Principe	0.301	0.331	0.214
Saudi Arabia	0.066	0.020	0.022
Senegal	0.405	0.562	0.358
Serbia	0.240	0.297	0.197
Seychelles	0.248	0.337	0.118
Sierra Leone	0.176	0.325	0.192
Slovakia	0.626	0.734	0.546
Slovenia	0.729	0.810	0.653
Solomon Islands	0.380	0.490	0.260
Somalia	0.095	0.191	0.081
South Africa	0.279	0.389	0.263
South Sudan	0.102	0.219	0.175
Spain	0.579	0.651	0.477
Sri Lanka	0.392	0.578	0.303
Sudan	0.088	0.203	0.134
Suriname	0.570	0.658	0.401
Swaziland	0.115	0.118	0.113
Sweden	0.851	0.893	0.667
Switzerland	0.822	0.867	0.762
Syria	0.052	0.155	0.059
Tajikistan	0.095	0.248	0.096
Tanzania	0.308	0.356	0.169
Thailand	0.238	0.327	0.164
Togo	0.149	0.251	0.102
Trinidad and Tobago	0.547	0.655	0.339
Tunisia	0.160	0.222	0.113
Turkey	0.364	0.529	0.279
Turkmenistan	0.034	0.153	0.063
Uganda	0.178	0.264	0.163
Ukraine	0.338	0.508	0.342
United Kingdom	0.802	0.873	0.597
United States	0.773	0.838	0.601
Uruguay	0.610	0.699	0.574
Uzbekistan	0.053	0.177	0.075
Vanuatu	0.419	0.480	0.290
Venezuela	0.530	0.716	0.481
Vietnam_Democratic Republic of	0.093	0.194	0.162
Yemen	0.111	0.194	0.098
Zambia	0.279	0.377	0.262
Zimbabwe	0.161	0.258	0.129

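As before, we observe higher levels of electoral democracy than of participatory or liberal democracy, although a few countries with little electoral competition, such as Bhutan, Qatar, and Saudi Arabia, score higher on liberal than on electoral democracy. The cross-sectional averages also reveal considerable heterogeneity across countries, from electoral democracy scores below 0.1 in Eritrea and North Korea to scores above 0.9 in Denmark and Norway, and they suggest substantial regional variation in these indices.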
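A one-line filter on the collapsed data lists the countries that break the usual ordering. In the sketch below, subset_xs holds a handful of rows copied from the country table above (an illustrative subset, not the full data); running the same filter on vdem_un_xs itself would check every country:

```r
library(dplyr)
library(tibble)

# Rows copied from the country table above
subset_xs <- tibble(
  countryVDEM = c("Bhutan", "Denmark", "Jordan", "Qatar", "Saudi Arabia", "Swaziland"),
  Liberal     = c(0.185, 0.892, 0.171, 0.093, 0.066, 0.115),
  Electoral   = c(0.127, 0.920, 0.189, 0.041, 0.020, 0.118)
)

exceptions <- filter(subset_xs, Liberal > Electoral)$countryVDEM
exceptions  # "Bhutan" "Qatar" "Saudi Arabia"
```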

Part 3: Graphics (45 points)

Scatterplot grid by year using `ggplot2` (5 points)

g <- ggplot(vdem_un, aes(x=v2x_libdem, y=v2x_partipdem)) +
      geom_point() +
      geom_smooth(method="lm") +
      facet_wrap( ~ year)
g
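The "Removed 49 rows" warnings below come from country-years in which one of the two indices is missing; ggplot2 drops those rows on its own. A hedged sketch on hypothetical data (toy, complete_rows, and g are illustrative names): filtering the incomplete rows first draws the same points without the warnings, and ggplot_build() confirms how many points are actually plotted:

```r
library(dplyr)
library(ggplot2)
library(tibble)

toy <- tibble(year = rep(c(1960, 1961), each = 3),
              v2x_libdem    = c(0.1, 0.5, NA, 0.2, 0.6, 0.8),
              v2x_partipdem = c(0.1, 0.4, 0.3, NA, 0.5, 0.7))

# Keep only rows where both plotted variables are observed
complete_rows <- filter(toy, !is.na(v2x_libdem), !is.na(v2x_partipdem))

g <- ggplot(complete_rows, aes(x = v2x_libdem, y = v2x_partipdem)) +
  geom_point() +
  facet_wrap(~ year)

nrow(ggplot_build(g)$data[[1]])  # 4 points drawn across the two panels
```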

## Warning: Removed 49 rows containing non-finite values (stat_smooth).

## Warning: Removed 49 rows containing missing values (geom_point).

Scatterplot grid by country using `ggplot2` (5 points)

g <- ggplot(vdem_un, aes(x=v2x_libdem, y=v2x_partipdem)) +
      geom_point() +
      geom_smooth(method="lm") +
      facet_wrap( ~ countryVDEM)
g

## Warning: Removed 49 rows containing non-finite values (stat_smooth).

## Warning: Removed 49 rows containing missing values (geom_point).

Time series line plot for 3 countries using `ggplot2` (5 points)

vdem_un_ts <- filter(vdem_un, country_text_id %in% c("USA", "ESP", "CHL"))
g <- ggplot(vdem_un_ts, aes(x=year, y=v2x_polyarchy, 
                            group=countryVDEM, color=countryVDEM)) + 
      geom_line() +
      ylab("Electoral Democracy index")
g

Time series line plot for 3 countries using `plotly` (5 points)

# droplevels() keeps only the three plotted countries, so plotly builds a 3-color palette
plot_ly(vdem_un_ts, x = ~year, y = ~v2x_polyarchy, color = ~droplevels(countryVDEM), 
        type = "scatter", mode = "lines") %>%
      layout(yaxis = list(title="Electoral Democracy Index"))

Motion graph using `googleVis` (8 points)

M <- gvisMotionChart(vdem_un, "countryVDEM", "year", 
                     options=list(width=600, height=400))
print(M, "chart")

Interactive world map using `googleVis` (8 points)

GeoStates <- gvisGeoChart(vdem_un_xs, "countryVDEM", "Electoral",
                          options=list(width=600, height=400))
print(GeoStates, "chart")

Three-dimensional scatterplot using `plotly` (8 points)

plot_ly(vdem_un, x = ~year, y = ~v2x_polyarchy, z = ~v2x_libdem,
        type = "scatter3d", color = ~undernourish)

## No scatter3d mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode

## Warning: Ignoring 40 observations

Problem Set 1: Answers

Jonathan Kropko

February 9, 2017
