We start by loading the relevant packages:
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)
library(googleVis)
library(plotly)
library(readr)
library(stargazer)
Then we load the two datasets. Because loading the VDEM data takes a while, we cache this code chunk. Although I didn't mention this in class, we can use the read_csv() command from the readr library instead of base R's read.csv(); the authors of … suggest that read_csv() is about 10 times faster.
vdem <- read_csv("V-Dem-DS-CY+Others-v6.2.csv")
undata <- read_csv("UNdata.csv")
First we will work with the VDEM data, then with the UN data.
The VDEM data contains 3,173 variables. To make this dataset much easier to work with, the first step will be to keep only 8 interesting variables in addition to the needed IDs. I choose to keep country_name, country_text_id, year, v2x_polyarchy, v2x_api, v2x_mpi, v2x_libdem, v2x_liberal, v2x_partipdem, v2x_delibdem, and v2x_egaldem.
vdem <- select(vdem, country_name, country_text_id, year, v2x_polyarchy,
               v2x_api, v2x_mpi, v2x_libdem, v2x_liberal,
               v2x_partipdem, v2x_delibdem, v2x_egaldem)
The VDEM data now appears to be both tidy (rows are observations, columns are variables, units of analysis are comparable) and clean (missing data points are coded as NA, no additional variables need to be generated, no special characters need to be removed).
summary(vdem)
## country_name country_text_id year v2x_polyarchy
## Length:16675 Length:16675 Min. :1900 Min. :0.0084
## Class :character Class :character 1st Qu.:1932 1st Qu.:0.0965
## Mode :character Mode :character Median :1961 Median :0.2077
## Mean :1960 Mean :0.3212
## 3rd Qu.:1989 3rd Qu.:0.5308
## Max. :2015 Max. :0.9584
## NA's :416
## v2x_api v2x_mpi v2x_libdem v2x_liberal
## Min. :0.0168 Min. :0.0000 Min. :0.0109 Min. :0.03104
## 1st Qu.:0.1930 1st Qu.:0.0000 1st Qu.:0.0776 1st Qu.:0.25252
## Median :0.4122 Median :0.0009 Median :0.1498 Median :0.44538
## Mean :0.4679 Mean :0.1746 Mean :0.2596 Mean :0.48289
## 3rd Qu.:0.7713 3rd Qu.:0.2851 3rd Qu.:0.3795 3rd Qu.:0.70355
## Max. :0.9832 Max. :0.9337 Max. :0.9278 Max. :0.98123
## NA's :416 NA's :416 NA's :416 NA's :55
## v2x_partipdem v2x_delibdem v2x_egaldem
## Min. :0.0004 Min. :0.0003 Min. :0.0074
## 1st Qu.:0.0484 1st Qu.:0.0134 1st Qu.:0.0619
## Median :0.1053 Median :0.0703 Median :0.1455
## Mean :0.1990 Mean :0.2113 Mean :0.2448
## 3rd Qu.:0.3044 3rd Qu.:0.3484 3rd Qu.:0.3431
## Max. :0.8404 Max. :0.9291 Max. :0.9247
## NA's :425 NA's :524 NA's :416
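One quick way to back up the "tidy" claim is to confirm that each country-year appears only once and that missing values really are NA. The sketch below runs these checks on a small hypothetical data frame (`toy_vdem`, with made-up values) rather than the real VDEM file; with the real data the same two calls could be applied to `vdem`.

```r
# Hypothetical miniature of the VDEM panel: one row per country-year (made-up values)
toy_vdem <- data.frame(
  country_text_id = c("USA", "USA", "ESP", "ESP"),
  year            = c(2014, 2015, 2014, 2015),
  v2x_polyarchy   = c(0.85, NA, 0.88, 0.87),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when no (country, year) pair occurs twice
dup_rows <- anyDuplicated(toy_vdem[, c("country_text_id", "year")])
dup_rows

# Because missing values are coded as NA, they can be counted per column
colSums(is.na(toy_vdem))
```

A nonzero result from anyDuplicated() would point to the first repeated country-year, which is usually the fastest way to find a problem before reshaping or merging.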
So we can move on to cleaning the UN data.
The UN data is not shaped well. Taking a look at the first several rows and columns reveals the problems we must address.
library(knitr)
## Warning: package 'knitr' was built under R version 3.3.2
kable(head(undata[,1:5]))
| Series Name | Series Code | Country Name | Country Code | 1960 [YR1960] |
|---|---|---|---|---|
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Afghanistan | AFG | 0.0348442494869232 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Albania | ALB | 0.276291221380234 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Algeria | DZA | 0.173148155212402 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | American Samoa | ASM | .. |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Andorra | ADO | .. |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Angola | AGO | 0.0670681074261665 |
(Incidentally, if you compile a document like this as a PDF, you can use the stargazer library to output the above table as LaTeX code. If you specify results='asis' as a code chunk option, the LaTeX code is compiled automatically into a nice-looking table. We have to keep this document in HTML, however, so that the interactive graphics can be displayed.)
stargazer(head(as.matrix(undata[,1:5])),
title="The first 6 rows and 5 columns of the UN data")
% Table created by stargazer v.5.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu % Date and time: Fri, Feb 10, 2017 - 15:50:06
\begin{table}[!htbp] \centering
\caption{The first 6 rows and 5 columns of the UN data}
\label{}
\begin{tabular}{@{\extracolsep{5pt}} ccccc}
\\[-1.8ex]\hline
\hline \\[-1.8ex]
Series Name & Series Code & Country Name & Country Code & 1960 [YR1960] \\
\hline \\[-1.8ex]
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Afghanistan & AFG & 0.0348442494869232 \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Albania & ALB & 0.276291221380234 \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Algeria & DZA & 0.173148155212402 \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & American Samoa & ASM & .. \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Andorra & ADO & .. \\
Physicians (per 1,000 people) & SH.MED.PHYS.ZS & Angola & AGO & 0.0670681074261665 \\
\hline \\[-1.8ex]
\end{tabular}
\end{table}
This reveals three problems. First, years should be rows (observations) but are stored as columns. Second, the variables are stored as rows, distinguished by different series names and codes. Third, both the years and the variables have names that are too complex to use as they are. We must reshape the data and rename its various elements.
First, let’s bring the years down to the rows by using the gather() function.
undata <- gather(undata, `1960 [YR1960]`:`2015 [YR2015]`, key="year", value="value")
The truly tricky part of the above command is dealing with the space (and the brackets) in variable names like 1960 [YR1960]. Using double or single quotes around the variable names did not work. Instead, such names must be wrapped in backticks, the slanted quote marks that R uses to refer to non-syntactic names (names containing spaces or special characters). I discovered this by typing undata$ into the console in RStudio, as if I were about to type a variable name after the dollar sign. A drop-down menu appeared that let me click on a variable; I clicked on the first year, and RStudio placed the variable name in the command wrapped in backticks.
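To see the backtick rule and the gather() step in isolation, here is a minimal sketch on a hypothetical two-country, two-year data frame (`wide`, made-up values) shaped like the raw UN file. `check.names = FALSE` stops data.frame() from "repairing" the non-syntactic names, so they survive exactly as in the CSV.

```r
library(tidyr)

# Hypothetical wide data shaped like the raw UN file: years are columns
# whose names contain a space and brackets (non-syntactic names)
wide <- data.frame(
  `Country Code`  = c("AFG", "ALB"),
  `1960 [YR1960]` = c("0.03", ".."),
  `1961 [YR1961]` = c("0.04", "0.28"),
  check.names = FALSE,       # keep the names exactly as written
  stringsAsFactors = FALSE
)

# Backticks work with $ ...
wide$`1960 [YR1960]`

# ... and inside gather(), including in a column range
long <- gather(wide, `1960 [YR1960]`:`1961 [YR1961]`,
               key = "year", value = "value")
long
```

In tidyr 1.0.0 and later, pivot_longer() is the recommended replacement for gather(); non-syntactic names are referenced with backticks in the same way.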
The reshape appears to have worked as intended, since the years are now in the rows.
library(knitr)
kable(head(undata))
| Series Name | Series Code | Country Name | Country Code | year | value |
|---|---|---|---|---|---|
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Afghanistan | AFG | 1960 [YR1960] | 0.0348442494869232 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Albania | ALB | 1960 [YR1960] | 0.276291221380234 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Algeria | DZA | 1960 [YR1960] | 0.173148155212402 |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | American Samoa | ASM | 1960 [YR1960] | .. |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Andorra | ADO | 1960 [YR1960] | .. |
| Physicians (per 1,000 people) | SH.MED.PHYS.ZS | Angola | AGO | 1960 [YR1960] | 0.0670681074261665 |
We aren't going to need both the series name and the series code, so let's drop the series code, as this information is more difficult to read than the series name.
undata <- select(undata, -`Series Code`)
Next, let's move the variables listed in the series name to the columns. The series name column has a space in its name and appears to begin with an invisible special character (possibly a byte-order mark left over from the CSV file), which makes it difficult to refer to. So first we rename it by position.
names(undata)[1] <- "Series"
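Why can't the name simply be typed? One plausible culprit (an assumption; the output above does not show the character) is a UTF-8 byte-order mark, the invisible character U+FEFF that some programs write at the start of a CSV file. A name that begins with it does not equal the visible name, which is why renaming by position is the easiest fix. The data frame `df` below is hypothetical.

```r
# A column name that starts with an invisible byte-order mark (U+FEFF)
bom_name <- "\ufeffSeries Name"

bom_name == "Series Name"   # FALSE: the names differ by one invisible character
nchar(bom_name)             # one more character than "Series Name"

# Renaming by position sidesteps the problem entirely
df <- data.frame(x = 1:2)
names(df) <- bom_name
names(df)[1] <- "Series"
names(df)
```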
Our first attempt at spread() fails with an error saying there are duplicate identifiers. Inspecting the data in the View() window shows that the duplicates are rows with missing values on the Series variable, which separate one health indicator from the next. To fix this error, we eliminate the rows with a missing series name.
undata <- filter(undata, !is.na(Series))
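The duplicate-identifier problem can be reproduced on a small hypothetical example (`long`, made-up values). The two separator rows have a missing series name, country, and value but share the same year, so they are indistinguishable on the columns that spread() uses as identifiers. The check below uses base R: it counts duplicates on those columns before and after dropping the rows with a missing Series, which mirrors filter(undata, !is.na(Series)).

```r
# Hypothetical long data with two separator rows (Series is NA)
long <- data.frame(
  Series  = c("Physicians", "Physicians", NA, "Undernourishment", NA),
  country = c("AFG", "ALB", NA, "AFG", NA),
  year    = c("1960", "1960", "1960", "1960", "1960"),
  value   = c("0.03", "0.28", NA, "..", NA),
  stringsAsFactors = FALSE
)
id_cols <- c("Series", "country", "year")

# duplicated() treats NA as equal to NA, so the separator rows collide
anyDuplicated(long[, id_cols]) > 0

# Dropping rows with a missing Series removes the collision
clean <- long[!is.na(long$Series), ]
anyDuplicated(clean[, id_cols]) > 0
```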
Now we can perform a wide reshape.
undata <- spread(undata, key=Series, value=value)
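For readers new to spread(), this sketch shows the wide reshape on a hypothetical long data frame (`long`, made-up values): each distinct value of the key column becomes its own column, filled from the value column, and the remaining columns identify the rows.

```r
library(tidyr)

# Hypothetical long data: one row per series, country, and year
long <- data.frame(
  Series  = c("Physicians", "Undernourishment", "Physicians", "Undernourishment"),
  country = c("AFG", "AFG", "ALB", "ALB"),
  year    = c("1960", "1960", "1960", "1960"),
  value   = c("0.03", "..", "0.28", "4.9"),
  stringsAsFactors = FALSE
)

# Each distinct Series value becomes a column; country and year identify rows
wide <- spread(long, key = Series, value = value)
wide
```

In tidyr 1.0.0 and later, the equivalent call is pivot_wider(long, names_from = Series, values_from = value).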
The year variable still has the unhelpful [YR1960] suffix attached to each year. To isolate the numeric years, we use the separate() command (the inverse of the unite() command), splitting on the space. We save the bracketed part as a variable named todrop so we know exactly what to drop.
undata <- separate(undata, year, into=c("year", "todrop"), sep=" ")
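Here is the same split on two hypothetical year labels (the data frame `yrs` is illustrative), together with a base-R alternative that is not used in the original: sub() with a regular expression that deletes everything from the first space onward.

```r
library(tidyr)

yrs <- data.frame(year = c("1960 [YR1960]", "2015 [YR2015]"),
                  stringsAsFactors = FALSE)

# Split on the space: the number goes to "year", the bracketed code to "todrop"
split_yrs <- separate(yrs, year, into = c("year", "todrop"), sep = " ")
split_yrs

# Base-R alternative: strip everything from the first space on
sub(" .*", "", yrs$year)
```

separate() also accepts convert = TRUE, which would let it guess column types and turn year into a number directly; the original instead converts year with as.numeric() a few steps later.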
Two of the new variables exist only because the raw UN data placed header text in the rows (along with missing values) to separate the variables, and spread() turned each of these headers into a column. These variables contain no data, so we can delete them along with todrop.
undata <- select(undata, -todrop,
                 -`Data from database: Health Nutrition and Population Statistics`,
                 -`Last Updated: 12/16/2016`)
Finally, we rename the variables to match VDEM and to be short and descriptive. We also convert year and the three substantive variables to be numeric (since they are all still character vectors).
names(undata) <- c("country_name", "country_text_id", "year",
                   "healthex", "phys", "undernourish")
undata <- mutate(undata, year=as.numeric(year), healthex=as.numeric(healthex),
                 phys=as.numeric(phys), undernourish=as.numeric(undernourish))
## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion
## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion
## Warning in eval(substitute(expr), envir, enclos): NAs introduced by
## coercion
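These three warnings are expected: the UN file codes missing values as ".." (visible in the first table above), and as.numeric() turns any string it cannot parse into NA and warns. The sketch below shows this on a few values copied from that table, and then a way to declare ".." as a missing-value code when the file is read, so the columns arrive numeric. It uses base R's read.csv() on inline text; readr::read_csv() offers the same option through its `na` argument, for example read_csv("UNdata.csv", na = c("", "NA", "..")). The original does not do this, so treat it as an optional alternative.

```r
raw <- c("0.0348442494869232", "..", "0.276291221380234")

# as.numeric() turns ".." into NA ("NAs introduced by coercion")
num <- suppressWarnings(as.numeric(raw))
num

# Declaring ".." as a missing-value code at import time avoids the coercion step
csv_text <- "code,value\nAFG,0.0348442494869232\nASM,..\n"
parsed <- read.csv(text = csv_text, na.strings = c("NA", ".."))
str(parsed)
```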
Merging the two datasets will likely produce mistakes that don't raise an error but nevertheless fail to match countries in one data frame to the correct countries in the other. That's because different cross-national datasets often use slightly different names and abbreviations for countries. We need a way to quickly see which countries are matched and which are unmatched after the merge. If we use the full_join() command for the merge, unmatched rows will have NA values for the variables from the other dataset. The trick is: how can we be sure that a missing value is due to an unmatched row, and not to the myriad other reasons why data might be missing? For that reason, I create an indicator in each dataset that is always equal to 1 and is never missing. Also, so that the merge uses only the country abbreviations and year, and not the country names, I rename the country name variable in each data frame.
vdem$vdem <- 1
undata$undata <- 1
names(vdem)[1] <- "countryVDEM"
names(undata)[1] <- "countryUN"
The two datasets have different time frames: VDEM covers 1900 to 2015, while the UN data covers 1960 to 2015. The overlap is therefore 1960 to 2015, so I keep only the VDEM rows from these years.
vdem <- filter(vdem, year >= 1960)
Now I try the merge and rearrange the columns so that the ID variables and the two indicators I just created appear first in the viewer.
vdem_un <- full_join(vdem, undata)
## Joining, by = c("country_text_id", "year")
vdem_un <- select(vdem_un, vdem, undata, country_text_id, year,
                  countryVDEM, countryUN, everything())
To see which observations from VDEM are unmatched in the UN data, we create a subset of the data in which the VDEM indicator exists and the UN indicator is missing, keeping the country ID, country name, and year variables. Then we repeat the process to find the observations that exist in the UN data but not in VDEM.
unmatch.vdem <- filter(vdem_un, is.na(undata) & !is.na(vdem))
unmatch.vdem <- select(unmatch.vdem, country_text_id, countryVDEM, year)
nrow(unmatch.vdem)
## [1] 401
unmatch.un <- filter(vdem_un, !is.na(undata) & is.na(vdem))
unmatch.un <- select(unmatch.un, country_text_id, countryUN, year)
nrow(unmatch.un)
## [1] 6371
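The indicator trick can be checked on two hypothetical toy data frames (`a` and `b`, with made-up values; the indicator names `in_a` and `in_b` are also illustrative). Note that USA matches even though its `phys` value is NA: only the indicators tell a genuinely unmatched row apart from a matched row with missing data. As an aside not used in the original, dplyr's anti_join() returns the unmatched rows of one table directly.

```r
library(dplyr)

# Hypothetical toy versions of the two datasets
a <- data.frame(country_text_id = c("USA", "ROU", "TWN"), year = 2015,
                polyarchy = c(0.9, 0.7, 0.8), stringsAsFactors = FALSE)
b <- data.frame(country_text_id = c("USA", "ROM", "WLD"), year = 2015,
                phys = c(NA, 2.5, 1.5), stringsAsFactors = FALSE)

# Indicators that are always 1 (never NA) before the merge
a$in_a <- 1
b$in_b <- 1

merged <- full_join(a, b, by = c("country_text_id", "year"))

# NA in an indicator can only mean "no match", unlike NA in a data column
only_a <- filter(merged, !is.na(in_a) & is.na(in_b))
only_b <- filter(merged, is.na(in_a) & !is.na(in_b))
only_a$country_text_id
only_b$country_text_id

# anti_join() lists the rows of `a` with no match in `b`, without indicators
anti_join(a, b, by = c("country_text_id", "year"))$country_text_id
```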
There are 11 countries that exist in VDEM but not in the UN data.
unmatch.vdem <- group_by(unmatch.vdem, country_text_id, countryVDEM)
unmatch.vdem <- summarize(unmatch.vdem, start=min(year), end=max(year))
library(knitr)
kable(unmatch.vdem)
| country_text_id | countryVDEM | start | end |
|---|---|---|---|
| COD | Congo_Democratic Republic of | 1960 | 2012 |
| DDR | German Democratic Republic | 1960 | 1990 |
| PSE | Palestine_West_Bank | 1968 | 2014 |
| PSG | Palestine_Gaza | 1960 | 2014 |
| ROU | Romania | 1960 | 2015 |
| SML | Somaliland | 1992 | 2015 |
| TLS | East Timor | 1960 | 2015 |
| TWN | Taiwan | 1960 | 2015 |
| VDR | Vietnam_Republic of | 1960 | 1975 |
| XKX | Kosovo | 1999 | 2015 |
| YMD | South Yemen | 1960 | 1990 |
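These countries are unmatched for two reasons. Four of them (COD, ROU, TLS, and XKX) use different abbreviations in the UN data (ZAR, ROM, TMP, and KSV, respectively), so I recode them to match. The other seven (DDR, PSE, PSG, SML, TWN, VDR, and YMD) have no matching country code in the UN data, so I drop them.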
vdem <- filter(vdem, !(country_text_id %in% c("DDR", "PSE", "PSG", "SML",
                                              "TWN", "VDR", "YMD")))
vdem <- mutate(vdem, country_text_id = as.factor(country_text_id),
               country_text_id = fct_recode(country_text_id,
                                            "ZAR"="COD",
                                            "ROM"="ROU",
                                            "TMP"="TLS",
                                            "KSV"="XKX"),
               country_text_id = as.character(country_text_id))
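A detail worth remembering about fct_recode(): the pairs are written new = old, and levels that are not listed are left unchanged. The sketch below applies the same four recodes to a short hypothetical vector of codes (`codes`). It also shows a base-R named lookup vector (`lookup`, not used in the original) that does the same job on a character vector without the round trip through a factor.

```r
library(forcats)

codes <- c("COD", "USA", "ROU", "TLS", "XKX")

# fct_recode() takes new = old pairs; unlisted levels (here USA) are untouched
recoded <- as.character(fct_recode(factor(codes),
                                   "ZAR" = "COD",
                                   "ROM" = "ROU",
                                   "TMP" = "TLS",
                                   "KSV" = "XKX"))
recoded

# Base-R alternative: a named lookup vector, written old = new
lookup <- c(COD = "ZAR", ROU = "ROM", TLS = "TMP", XKX = "KSV")
recoded2 <- ifelse(codes %in% names(lookup), lookup[codes], codes)
recoded2
```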
Having adjusted the VDEM data, I try the merge again.
vdem_un <- full_join(vdem, undata)
## Joining, by = c("country_text_id", "year")
vdem_un <- select(vdem_un, vdem, undata, country_text_id, year,
                  countryVDEM, countryUN, everything())
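There are 204 entries (country codes, including aggregates such as World and one row with a missing code) that have at least one year in the UN data with no match in VDEM.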
unmatch.un <- filter(vdem_un, !is.na(undata) & is.na(vdem))
unmatch.un <- select(unmatch.un, country_text_id, countryUN, year)
unmatch.un <- group_by(unmatch.un, country_text_id, countryUN)
unmatch.un <- summarize(unmatch.un, start=min(year), end=max(year))
nrow(unmatch.un)
## [1] 204
library(knitr)
kable(unmatch.un)
| country_text_id | countryUN | start | end |
|---|---|---|---|
| ABW | Aruba | 1960 | 2015 |
| ADO | Andorra | 1960 | 2015 |
| AGO | Angola | 2013 | 2015 |
| ALB | Albania | 2013 | 2015 |
| ARB | Arab World | 1960 | 2015 |
| ARE | United Arab Emirates | 1960 | 2015 |
| ARM | Armenia | 1960 | 1989 |
| ASM | American Samoa | 1960 | 2015 |
| ATG | Antigua and Barbuda | 1960 | 2015 |
| AUS | Australia | 2015 | 2015 |
| AUT | Austria | 2013 | 2015 |
| AZE | Azerbaijan | 1960 | 1989 |
| BEL | Belgium | 2015 | 2015 |
| BGD | Bangladesh | 1960 | 2015 |
| BGR | Bulgaria | 2015 | 2015 |
| BHR | Bahrain | 1960 | 2015 |
| BHS | Bahamas, The | 1960 | 2015 |
| BIH | Bosnia and Herzegovina | 1960 | 1991 |
| BLR | Belarus | 1960 | 1989 |
| BLZ | Belize | 1960 | 2015 |
| BMU | Bermuda | 1960 | 2015 |
| BRB | Barbados | 2015 | 2015 |
| BRN | Brunei Darussalam | 1960 | 2015 |
| BWA | Botswana | 2015 | 2015 |
| CAF | Central African Republic | 2013 | 2015 |
| CAN | Canada | 2015 | 2015 |
| CEB | Central Europe and the Baltics | 1960 | 2015 |
| CHE | Switzerland | 2015 | 2015 |
| CHI | Channel Islands | 1960 | 2015 |
| CHL | Chile | 2015 | 2015 |
| CHN | China | 2015 | 2015 |
| CIV | Cote d’Ivoire | 2013 | 2015 |
| CMR | Cameroon | 1960 | 1960 |
| COG | Congo, Rep. | 2013 | 2015 |
| COM | Comoros | 2013 | 2015 |
| CPV | Cabo Verde | 2015 | 2015 |
| CSS | Caribbean small states | 1960 | 2015 |
| CUW | Curacao | 1960 | 2015 |
| CYM | Cayman Islands | 1960 | 2015 |
| CYP | Cyprus | 2013 | 2015 |
| CZE | Czech Republic | 2013 | 2015 |
| DEU | Germany | 2015 | 2015 |
| DJI | Djibouti | 2013 | 2015 |
| DMA | Dominica | 1960 | 2015 |
| DNK | Denmark | 2015 | 2015 |
| DOM | Dominican Republic | 2015 | 2015 |
| EAP | East Asia & Pacific (excluding high income) | 1960 | 2015 |
| EAR | Early-demographic dividend | 1960 | 2015 |
| EAS | East Asia & Pacific | 1960 | 2015 |
| ECA | Europe & Central Asia (excluding high income) | 1960 | 2015 |
| ECS | Europe & Central Asia | 1960 | 2015 |
| ECU | Ecuador | 2013 | 2015 |
| EGY | Egypt, Arab Rep. | 2015 | 2015 |
| EMU | Euro area | 1960 | 2015 |
| ESP | Spain | 2015 | 2015 |
| EST | Estonia | 1960 | 1990 |
| EUU | European Union | 1960 | 2015 |
| FCS | Fragile and conflict affected situations | 1960 | 2015 |
| FIN | Finland | 2015 | 2015 |
| FRA | France | 2013 | 2015 |
| FRO | Faroe Islands | 1960 | 2015 |
| FSM | Micronesia, Fed. Sts. | 1960 | 2015 |
| GAB | Gabon | 2013 | 2015 |
| GBR | United Kingdom | 2013 | 2015 |
| GEO | Georgia | 1960 | 1989 |
| GIB | Gibraltar | 1960 | 2015 |
| GIN | Guinea | 2013 | 2015 |
| GMB | Gambia, The | 2013 | 2015 |
| GNB | Guinea-Bissau | 2013 | 2015 |
| GNQ | Equatorial Guinea | 1960 | 2015 |
| GRC | Greece | 2013 | 2015 |
| GRD | Grenada | 1960 | 2015 |
| GRL | Greenland | 1960 | 2015 |
| GTM | Guatemala | 2013 | 2015 |
| GUM | Guam | 1960 | 2015 |
| HIC | High income | 1960 | 2015 |
| HKG | Hong Kong SAR, China | 1960 | 2015 |
| HND | Honduras | 2013 | 2015 |
| HPC | Heavily indebted poor countries (HIPC) | 1960 | 2015 |
| HRV | Croatia | 1960 | 2015 |
| HTI | Haiti | 2013 | 2015 |
| HUN | Hungary | 2013 | 2015 |
| IDN | Indonesia | 2015 | 2015 |
| IMY | Isle of Man | 1960 | 2015 |
| IND | India | 2015 | 2015 |
| IRL | Ireland | 2013 | 2015 |
| ISL | Iceland | 2013 | 2015 |
| ISR | Israel | 2013 | 2015 |
| ITA | Italy | 2013 | 2015 |
| JAM | Jamaica | 2013 | 2015 |
| JPN | Japan | 2015 | 2015 |
| KAZ | Kazakhstan | 1960 | 1989 |
| KGZ | Kyrgyz Republic | 1960 | 1989 |
| KIR | Kiribati | 1960 | 2015 |
| KNA | St. Kitts and Nevis | 1960 | 2015 |
| KOR | Korea, Rep. | 2015 | 2015 |
| KSV | Kosovo | 1960 | 1998 |
| KWT | Kuwait | 1960 | 2015 |
| LAC | Latin America & Caribbean (excluding high income) | 1960 | 2015 |
| LAO | Lao PDR | 2013 | 2015 |
| LBR | Liberia | 2013 | 2015 |
| LBY | Libya | 2015 | 2015 |
| LCA | St. Lucia | 1960 | 2015 |
| LCN | Latin America & Caribbean | 1960 | 2015 |
| LDC | Least developed countries: UN classification | 1960 | 2015 |
| LIC | Low income | 1960 | 2015 |
| LIE | Liechtenstein | 1960 | 2015 |
| LKA | Sri Lanka | 2015 | 2015 |
| LMC | Lower middle income | 1960 | 2015 |
| LMY | Low & middle income | 1960 | 2015 |
| LSO | Lesotho | 2013 | 2015 |
| LTE | Late-demographic dividend | 1960 | 2015 |
| LTU | Lithuania | 1960 | 1990 |
| LUX | Luxembourg | 1960 | 2015 |
| LVA | Latvia | 1960 | 1990 |
| MAC | Macao SAR, China | 1960 | 2015 |
| MAF | St. Martin (French part) | 1960 | 2015 |
| MCO | Monaco | 1960 | 2015 |
| MDA | Moldova | 1960 | 1989 |
| MDG | Madagascar | 2013 | 2015 |
| MEA | Middle East & North Africa | 1960 | 2015 |
| MEX | Mexico | 2015 | 2015 |
| MHL | Marshall Islands | 1960 | 2015 |
| MIC | Middle income | 1960 | 2015 |
| MKD | Macedonia, FYR | 1960 | 1990 |
| MLI | Mali | 2013 | 2015 |
| MLT | Malta | 1960 | 2015 |
| MNA | Middle East & North Africa (excluding high income) | 1960 | 2015 |
| MNE | Montenegro | 1960 | 2015 |
| MNP | Northern Mariana Islands | 1960 | 2015 |
| MRT | Mauritania | 2013 | 2015 |
| MUS | Mauritius | 2015 | 2015 |
| MYS | Malaysia | 2013 | 2015 |
| NAC | North America | 1960 | 2015 |
| NAM | Namibia | 2015 | 2015 |
| NCL | New Caledonia | 1960 | 2015 |
| NER | Niger | 2013 | 2015 |
| NIC | Nicaragua | 2013 | 2015 |
| NLD | Netherlands | 2015 | 2015 |
| NOR | Norway | 2015 | 2015 |
| NRU | Nauru | 1960 | 2015 |
| NZL | New Zealand | 2013 | 2015 |
| OED | OECD members | 1960 | 2015 |
| OMN | Oman | 1960 | 2015 |
| OSS | Other small states | 1960 | 2015 |
| PAK | Pakistan | 2015 | 2015 |
| PAN | Panama | 2013 | 2015 |
| PER | Peru | 2015 | 2015 |
| PLW | Palau | 1960 | 2015 |
| PNG | Papua New Guinea | 2015 | 2015 |
| PRE | Pre-demographic dividend | 1960 | 2015 |
| PRI | Puerto Rico | 1960 | 2015 |
| PRK | Korea, Dem. People’s Rep. | 2013 | 2015 |
| PSS | Pacific island small states | 1960 | 2015 |
| PST | Post-demographic dividend | 1960 | 2015 |
| PYF | French Polynesia | 1960 | 2015 |
| SAS | South Asia | 1960 | 2015 |
| SAU | Saudi Arabia | 2013 | 2015 |
| SEN | Senegal | 2013 | 2015 |
| SGP | Singapore | 1960 | 2015 |
| SLE | Sierra Leone | 2013 | 2015 |
| SMR | San Marino | 1960 | 2015 |
| SRB | Serbia | 2013 | 2015 |
| SSA | Sub-Saharan Africa (excluding high income) | 1960 | 2015 |
| SSD | South Sudan | 1960 | 2010 |
| SSF | Sub-Saharan Africa | 1960 | 2015 |
| SST | Small states | 1960 | 2015 |
| STP | Sao Tome and Principe | 2013 | 2015 |
| SVK | Slovak Republic | 1960 | 2015 |
| SVN | Slovenia | 1960 | 1988 |
| SWE | Sweden | 2015 | 2015 |
| SWZ | Swaziland | 2013 | 2015 |
| SXM | Sint Maarten (Dutch part) | 1960 | 2015 |
| SYC | Seychelles | 2013 | 2015 |
| TCA | Turks and Caicos Islands | 1960 | 2015 |
| TCD | Chad | 2013 | 2015 |
| TEA | East Asia & Pacific (IDA & IBRD countries) | 1960 | 2015 |
| TEC | Europe & Central Asia (IDA & IBRD countries) | 1960 | 2015 |
| TGO | Togo | 2013 | 2015 |
| TJK | Tajikistan | 1960 | 1989 |
| TKM | Turkmenistan | 1960 | 2015 |
| TLA | Latin America & the Caribbean (IDA & IBRD countries) | 1960 | 2015 |
| TMN | Middle East & North Africa (IDA & IBRD countries) | 1960 | 2015 |
| TON | Tonga | 1960 | 2015 |
| TSA | South Asia (IDA & IBRD) | 1960 | 2015 |
| TSS | Sub-Saharan Africa (IDA & IBRD countries) | 1960 | 2015 |
| TTO | Trinidad and Tobago | 2013 | 2015 |
| TUV | Tuvalu | 1960 | 2015 |
| UKR | Ukraine | 1960 | 1989 |
| UMC | Upper middle income | 1960 | 2015 |
| URY | Uruguay | 2015 | 2015 |
| UZB | Uzbekistan | 1960 | 1989 |
| VCT | St. Vincent and the Grenadines | 1960 | 2015 |
| VEN | Venezuela, RB | 2013 | 2015 |
| VGB | British Virgin Islands | 1960 | 2015 |
| VIR | Virgin Islands (U.S.) | 1960 | 2015 |
| VNM | Vietnam | 2013 | 2015 |
| VUT | Vanuatu | 2015 | 2015 |
| WBG | West Bank and Gaza | 1960 | 2015 |
| WLD | World | 1960 | 2015 |
| WSM | Samoa | 1960 | 2015 |
| ZAF | South Africa | 2015 | 2015 |
| ZAR | Congo, Dem. Rep. | 2013 | 2015 |
| NA | NA | 1960 | 2015 |
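It appears that there are three types of entries in this list: aggregates that are not countries (e.g., Arab World, High income, World); countries and territories that VDEM does not cover at all (e.g., Aruba, Andorra, Bermuda); and countries that VDEM does cover, but not in every year of the UN data (e.g., Armenia before 1990, or Australia in 2015). The list also contains one row with a missing country code. In all three cases, I am comfortable dropping these rows. I also now remove the vdem and undata indicators.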
vdem_un <- filter(vdem_un, !is.na(vdem))
vdem_un <- select(vdem_un, -vdem, -undata)
We now have a complete and cleaned data frame with the merged VDEM and UN data.
I will examine the VDEM indices for liberal democracy, electoral democracy, and participatory democracy.
Collapsing by year allows us to examine over-time variation in these democracy indices. To collapse, we use the group_by() and summarize() commands.
vdem_un_time <- group_by(vdem_un, year)
vdem_un_time <- summarize(vdem_un_time,
                          Liberal=mean(v2x_libdem, na.rm=TRUE),
                          Electoral=mean(v2x_polyarchy, na.rm=TRUE),
                          Participatory=mean(v2x_partipdem, na.rm=TRUE))
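Because of na.rm=TRUE, each yearly mean is taken over the countries that have a non-missing value in that year, so the set of countries behind the averages can change from year to year. A cheap safeguard is to count the observations alongside each mean. The sketch below does this on a hypothetical four-row panel (`panel`, made-up values); `n_obs` is an illustrative extra column, not part of the original summary.

```r
library(dplyr)

panel <- data.frame(
  country = c("A", "B", "A", "B"),
  year    = c(2000, 2000, 2001, 2001),
  libdem  = c(0.2, 0.6, NA, 0.7)
)

by_year <- summarize(group_by(panel, year),
                     Liberal = mean(libdem, na.rm = TRUE),  # mean over available countries
                     n_obs   = sum(!is.na(libdem)))         # how many countries that is
by_year
```

In this toy panel, the 2001 average rests on a single country, which the `n_obs` column makes visible at a glance.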
Next I use the kable() command to display the data.
library(knitr)
kable(round(vdem_un_time,3))
| year | Liberal | Electoral | Participatory |
|---|---|---|---|
| 1960 | 0.250 | 0.325 | 0.184 |
| 1961 | 0.252 | 0.329 | 0.187 |
| 1962 | 0.252 | 0.327 | 0.187 |
| 1963 | 0.253 | 0.330 | 0.188 |
| 1964 | 0.251 | 0.327 | 0.187 |
| 1965 | 0.252 | 0.327 | 0.188 |
| 1966 | 0.253 | 0.328 | 0.190 |
| 1967 | 0.249 | 0.322 | 0.187 |
| 1968 | 0.247 | 0.321 | 0.186 |
| 1969 | 0.248 | 0.318 | 0.186 |
| 1970 | 0.247 | 0.318 | 0.188 |
| 1971 | 0.247 | 0.317 | 0.189 |
| 1972 | 0.244 | 0.317 | 0.190 |
| 1973 | 0.242 | 0.315 | 0.189 |
| 1974 | 0.244 | 0.316 | 0.191 |
| 1975 | 0.249 | 0.321 | 0.195 |
| 1976 | 0.253 | 0.324 | 0.198 |
| 1977 | 0.255 | 0.327 | 0.200 |
| 1978 | 0.258 | 0.332 | 0.203 |
| 1979 | 0.266 | 0.346 | 0.212 |
| 1980 | 0.269 | 0.351 | 0.216 |
| 1981 | 0.269 | 0.349 | 0.216 |
| 1982 | 0.268 | 0.350 | 0.216 |
| 1983 | 0.272 | 0.353 | 0.218 |
| 1984 | 0.276 | 0.357 | 0.221 |
| 1985 | 0.284 | 0.369 | 0.228 |
| 1986 | 0.289 | 0.375 | 0.234 |
| 1987 | 0.293 | 0.380 | 0.238 |
| 1988 | 0.299 | 0.387 | 0.245 |
| 1989 | 0.304 | 0.396 | 0.249 |
| 1990 | 0.333 | 0.434 | 0.271 |
| 1991 | 0.361 | 0.466 | 0.293 |
| 1992 | 0.377 | 0.489 | 0.311 |
| 1993 | 0.387 | 0.502 | 0.320 |
| 1994 | 0.392 | 0.510 | 0.328 |
| 1995 | 0.397 | 0.514 | 0.333 |
| 1996 | 0.400 | 0.521 | 0.336 |
| 1997 | 0.399 | 0.523 | 0.337 |
| 1998 | 0.401 | 0.524 | 0.339 |
| 1999 | 0.400 | 0.520 | 0.338 |
| 2000 | 0.407 | 0.524 | 0.344 |
| 2001 | 0.411 | 0.528 | 0.346 |
| 2002 | 0.419 | 0.539 | 0.351 |
| 2003 | 0.422 | 0.543 | 0.355 |
| 2004 | 0.425 | 0.545 | 0.358 |
| 2005 | 0.427 | 0.550 | 0.361 |
| 2006 | 0.429 | 0.554 | 0.364 |
| 2007 | 0.429 | 0.552 | 0.364 |
| 2008 | 0.432 | 0.558 | 0.366 |
| 2009 | 0.435 | 0.561 | 0.368 |
| 2010 | 0.435 | 0.561 | 0.369 |
| 2011 | 0.439 | 0.566 | 0.372 |
| 2012 | 0.439 | 0.567 | 0.373 |
| 2013 | 0.440 | 0.567 | 0.371 |
| 2014 | 0.431 | 0.558 | 0.366 |
| 2015 | 0.359 | 0.489 | 0.312 |
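Countries, in general, exhibit more electoral democracy than liberal or participatory democracy. Across the entire time frame, countries have become more democratic overall, although all three averages fall noticeably in 2015. Participatory democracy has experienced the greatest growth in relative terms (its average rises by about 70% between 1960 and 2015), while electoral democracy shows the largest absolute increase.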
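The distinction between relative and absolute growth is easy to verify with the 1960 and 2015 rows of the table above; the numbers below are copied from it.

```r
# Yearly means copied from the table above (1960 and 2015 rows)
start <- c(Liberal = 0.250, Electoral = 0.325, Participatory = 0.184)
end   <- c(Liberal = 0.359, Electoral = 0.489, Participatory = 0.312)

absolute_change <- end - start
relative_change <- (end - start) / start

round(absolute_change, 3)
round(relative_change, 3)
names(which.max(absolute_change))  # index with the largest absolute gain
names(which.max(relative_change))  # index with the largest proportional gain
```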
Collapsing by country allows us to examine cross-national variation in these democracy indices. To collapse, we convert countryVDEM to a factor and use the group_by() and summarize() commands.
vdem_un <- mutate(vdem_un, countryVDEM=factor(countryVDEM))
vdem_un_xs <- group_by(vdem_un, countryVDEM)
vdem_un_xs <- summarize(vdem_un_xs,
                        Liberal=round(mean(v2x_libdem, na.rm=TRUE),3),
                        Electoral=round(mean(v2x_polyarchy, na.rm=TRUE),3),
                        Participatory=round(mean(v2x_partipdem, na.rm=TRUE),3))
Next I use the kable() command to display the data.
library(knitr)
kable(vdem_un_xs)
| countryVDEM | Liberal | Electoral | Participatory |
|---|---|---|---|
| Afghanistan | 0.112 | 0.181 | 0.064 |
| Albania | 0.208 | 0.320 | 0.171 |
| Algeria | 0.130 | 0.243 | 0.114 |
| Angola | 0.080 | 0.102 | 0.062 |
| Argentina | 0.469 | 0.612 | 0.432 |
| Armenia | 0.245 | 0.453 | 0.236 |
| Australia | 0.843 | 0.898 | 0.692 |
| Austria | 0.789 | 0.881 | 0.670 |
| Azerbaijan | 0.083 | 0.252 | 0.099 |
| Bangladesh | 0.252 | 0.437 | 0.249 |
| Barbados | 0.517 | 0.584 | 0.284 |
| Belarus | 0.181 | 0.346 | 0.164 |
| Belgium | 0.738 | 0.815 | 0.569 |
| Benin | 0.300 | 0.403 | 0.223 |
| Bhutan | 0.185 | 0.127 | 0.120 |
| Bolivia | 0.296 | 0.479 | 0.278 |
| Bosnia and Herzegovina | 0.243 | 0.340 | 0.219 |
| Botswana | 0.500 | 0.602 | 0.400 |
| Brazil | 0.467 | 0.596 | 0.422 |
| Bulgaria | 0.312 | 0.421 | 0.253 |
| Burkina Faso | 0.254 | 0.405 | 0.250 |
| Burma_Myanmar | 0.046 | 0.150 | 0.082 |
| Burundi | 0.156 | 0.205 | 0.124 |
| Cambodia | 0.108 | 0.251 | 0.092 |
| Cameroon | 0.126 | 0.256 | 0.130 |
| Canada | 0.807 | 0.874 | 0.624 |
| Cape Verde | 0.419 | 0.477 | 0.296 |
| Central African Republic | 0.130 | 0.228 | 0.135 |
| Chad | 0.082 | 0.200 | 0.119 |
| Chile | 0.500 | 0.585 | 0.369 |
| China | 0.046 | 0.097 | 0.036 |
| Colombia | 0.344 | 0.467 | 0.261 |
| Comoros | 0.192 | 0.331 | 0.188 |
| Congo_Democratic Republic of | 0.089 | 0.223 | 0.135 |
| Congo_Republic of the | 0.104 | 0.223 | 0.143 |
| Costa Rica | 0.807 | 0.867 | 0.604 |
| Croatia | 0.511 | 0.668 | 0.449 |
| Cuba | 0.062 | 0.108 | 0.091 |
| Cyprus | 0.553 | 0.661 | 0.382 |
| Czech Republic | 0.411 | 0.500 | 0.298 |
| Denmark | 0.892 | 0.920 | 0.724 |
| Djibouti | 0.090 | 0.201 | 0.099 |
| Dominican Republic | 0.294 | 0.506 | 0.277 |
| East Timor | 0.146 | 0.211 | 0.086 |
| Ecuador | 0.360 | 0.548 | 0.354 |
| Egypt | 0.148 | 0.224 | 0.130 |
| El Salvador | 0.210 | 0.358 | 0.225 |
| Eritrea | 0.022 | 0.084 | 0.025 |
| Estonia | 0.862 | 0.911 | 0.683 |
| Ethiopia | 0.076 | 0.171 | 0.071 |
| Fiji | 0.331 | 0.434 | 0.168 |
| Finland | 0.847 | 0.887 | 0.635 |
| France | 0.826 | 0.893 | 0.697 |
| Gabon | 0.166 | 0.286 | 0.154 |
| Gambia | 0.284 | 0.430 | 0.219 |
| Georgia | 0.307 | 0.498 | 0.214 |
| Germany | 0.727 | 0.771 | 0.563 |
| Ghana | 0.368 | 0.438 | 0.222 |
| Greece | 0.609 | 0.693 | 0.448 |
| Guatemala | 0.186 | 0.342 | 0.216 |
| Guinea | 0.082 | 0.209 | 0.112 |
| Guinea-Bissau | 0.132 | 0.208 | 0.099 |
| Guyana | 0.281 | 0.442 | 0.270 |
| Haiti | 0.134 | 0.265 | 0.146 |
| Honduras | 0.247 | 0.395 | 0.244 |
| Hungary | 0.378 | 0.434 | 0.303 |
| Iceland | 0.785 | 0.878 | 0.671 |
| India | 0.554 | 0.707 | 0.430 |
| Indonesia | 0.184 | 0.348 | 0.178 |
| Iran | 0.125 | 0.192 | 0.100 |
| Iraq | 0.126 | 0.174 | 0.112 |
| Ireland | 0.765 | 0.866 | 0.616 |
| Israel | 0.605 | 0.733 | 0.477 |
| Italy | 0.680 | 0.815 | 0.608 |
| Ivory Coast | 0.192 | 0.304 | 0.213 |
| Jamaica | 0.419 | 0.554 | 0.375 |
| Japan | 0.812 | 0.874 | 0.605 |
| Jordan | 0.171 | 0.189 | 0.070 |
| Kazakhstan | 0.145 | 0.295 | 0.112 |
| Kenya | 0.197 | 0.315 | 0.148 |
| Korea_North | 0.014 | 0.092 | 0.021 |
| Korea_South | 0.409 | 0.534 | 0.312 |
| Kosovo | 0.326 | 0.469 | 0.294 |
| Kyrgyzstan | 0.212 | 0.352 | 0.168 |
| Laos | 0.091 | 0.123 | 0.036 |
| Latvia | 0.761 | 0.866 | 0.637 |
| Lebanon | 0.246 | 0.437 | 0.137 |
| Lesotho | 0.236 | 0.286 | 0.148 |
| Liberia | 0.197 | 0.306 | 0.170 |
| Libya | 0.103 | 0.117 | 0.065 |
| Lithuania | 0.822 | 0.868 | 0.635 |
| Macedonia | 0.387 | 0.529 | 0.360 |
| Madagascar | 0.199 | 0.331 | 0.183 |
| Malawi | 0.190 | 0.292 | 0.133 |
| Malaysia | 0.193 | 0.311 | 0.167 |
| Maldives | 0.110 | 0.223 | 0.113 |
| Mali | 0.284 | 0.376 | 0.238 |
| Mauritania | 0.142 | 0.271 | 0.140 |
| Mauritius | 0.632 | 0.730 | 0.470 |
| Mexico | 0.267 | 0.449 | 0.275 |
| Moldova | 0.457 | 0.603 | 0.352 |
| Mongolia | 0.317 | 0.440 | 0.262 |
| Montenegro | 0.408 | 0.516 | 0.330 |
| Morocco | 0.179 | 0.210 | 0.121 |
| Mozambique | 0.161 | 0.241 | 0.147 |
| Namibia | 0.261 | 0.346 | 0.212 |
| Nepal | 0.192 | 0.236 | 0.139 |
| Netherlands | 0.831 | 0.894 | 0.610 |
| New Zealand | 0.812 | 0.880 | 0.671 |
| Nicaragua | 0.238 | 0.411 | 0.220 |
| Niger | 0.298 | 0.365 | 0.210 |
| Nigeria | 0.202 | 0.312 | 0.205 |
| Norway | 0.874 | 0.905 | 0.650 |
| Pakistan | 0.186 | 0.293 | 0.166 |
| Panama | 0.332 | 0.426 | 0.262 |
| Papua New Guinea | 0.305 | 0.410 | 0.245 |
| Paraguay | 0.207 | 0.357 | 0.213 |
| Peru | 0.339 | 0.506 | 0.293 |
| Philippines | 0.315 | 0.448 | 0.318 |
| Poland | 0.441 | 0.519 | 0.356 |
| Portugal | 0.631 | 0.684 | 0.464 |
| Qatar | 0.093 | 0.041 | 0.025 |
| Romania | 0.238 | 0.393 | 0.225 |
| Russia | 0.127 | 0.259 | 0.125 |
| Rwanda | 0.163 | 0.217 | 0.139 |
| Sao Tome and Principe | 0.301 | 0.331 | 0.214 |
| Saudi Arabia | 0.066 | 0.020 | 0.022 |
| Senegal | 0.405 | 0.562 | 0.358 |
| Serbia | 0.240 | 0.297 | 0.197 |
| Seychelles | 0.248 | 0.337 | 0.118 |
| Sierra Leone | 0.176 | 0.325 | 0.192 |
| Slovakia | 0.626 | 0.734 | 0.546 |
| Slovenia | 0.729 | 0.810 | 0.653 |
| Solomon Islands | 0.380 | 0.490 | 0.260 |
| Somalia | 0.095 | 0.191 | 0.081 |
| South Africa | 0.279 | 0.389 | 0.263 |
| South Sudan | 0.102 | 0.219 | 0.175 |
| Spain | 0.579 | 0.651 | 0.477 |
| Sri Lanka | 0.392 | 0.578 | 0.303 |
| Sudan | 0.088 | 0.203 | 0.134 |
| Suriname | 0.570 | 0.658 | 0.401 |
| Swaziland | 0.115 | 0.118 | 0.113 |
| Sweden | 0.851 | 0.893 | 0.667 |
| Switzerland | 0.822 | 0.867 | 0.762 |
| Syria | 0.052 | 0.155 | 0.059 |
| Tajikistan | 0.095 | 0.248 | 0.096 |
| Tanzania | 0.308 | 0.356 | 0.169 |
| Thailand | 0.238 | 0.327 | 0.164 |
| Togo | 0.149 | 0.251 | 0.102 |
| Trinidad and Tobago | 0.547 | 0.655 | 0.339 |
| Tunisia | 0.160 | 0.222 | 0.113 |
| Turkey | 0.364 | 0.529 | 0.279 |
| Turkmenistan | 0.034 | 0.153 | 0.063 |
| Uganda | 0.178 | 0.264 | 0.163 |
| Ukraine | 0.338 | 0.508 | 0.342 |
| United Kingdom | 0.802 | 0.873 | 0.597 |
| United States | 0.773 | 0.838 | 0.601 |
| Uruguay | 0.610 | 0.699 | 0.574 |
| Uzbekistan | 0.053 | 0.177 | 0.075 |
| Vanuatu | 0.419 | 0.480 | 0.290 |
| Venezuela | 0.530 | 0.716 | 0.481 |
| Vietnam_Democratic Republic of | 0.093 | 0.194 | 0.162 |
| Yemen | 0.111 | 0.194 | 0.098 |
| Zambia | 0.279 | 0.377 | 0.262 |
| Zimbabwe | 0.161 | 0.258 | 0.129 |
As before, we observe higher levels of electoral democracy overall than of participatory or liberal democracy. The cross-sectional data, however, reveal considerable heterogeneity in these levels: average electoral democracy ranges from 0.020 in Saudi Arabia to 0.920 in Denmark. The table also suggests substantial regional variation in these indices.
ggplot2 (5 points)
g <- ggplot(vdem_un, aes(x=v2x_libdem, y=v2x_partipdem)) +
  geom_point() +
  geom_smooth(method="lm") +
  facet_wrap(~ year)
g
## Warning: Removed 49 rows containing non-finite values (stat_smooth).
## Warning: Removed 49 rows containing missing values (geom_point).
ggplot2 (5 points)
g <- ggplot(vdem_un, aes(x=v2x_libdem, y=v2x_partipdem)) +
  geom_point() +
  geom_smooth(method="lm") +
  facet_wrap(~ countryVDEM)
g
## Warning: Removed 49 rows containing non-finite values (stat_smooth).
## Warning: Removed 49 rows containing missing values (geom_point).
ggplot2 (5 points)
vdem_un_ts <- filter(vdem_un, country_text_id %in% c("USA", "ESP", "CHL"))
g <- ggplot(vdem_un_ts, aes(x=year, y=v2x_polyarchy,
                            group=countryVDEM, color=countryVDEM)) +
  geom_line() +
  ylab("Electoral Democracy index")
g
plotly (5 points)
plot_ly(vdem_un_ts, x = ~year, y = ~v2x_polyarchy, color = ~countryVDEM,
        type = "scatter", mode = "lines") %>%
  layout(yaxis = list(title="Electoral Democracy Index"))
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
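The palette warning is puzzling for a plot of only three countries. A likely explanation (an inference, not something the output states) is that countryVDEM was converted to a factor covering all countries, and filter() keeps every factor level after subsetting rows, so plot_ly() appears to be asked for one color per level rather than three. droplevels() removes levels that no longer occur, as this base-R illustration on a hypothetical factor (`countries`) shows.

```r
countries <- factor(c("Chile", "Spain", "United States", "Denmark", "Japan"))

# Subsetting the values keeps all five levels...
subset_three <- countries[countries %in% c("Chile", "Spain", "United States")]
nlevels(subset_three)

# ...while droplevels() keeps only the levels that still occur
nlevels(droplevels(subset_three))
```

With the real data, the analogous change would be to wrap the subset as droplevels(filter(vdem_un, country_text_id %in% c("USA", "ESP", "CHL"))), which should leave plot_ly() with three colors to assign.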
googleVis (8 points)
M <- gvisMotionChart(vdem_un, "countryVDEM", "year",
                     options=list(width=600, height=400))
print(M, "chart")
googleVis (8 points)
GeoStates <- gvisGeoChart(vdem_un_xs, "countryVDEM", "Electoral",
                          options=list(width=600, height=400))
print(GeoStates, "chart")
plotly (8 points)
plot_ly(vdem_un, x = ~year, y = ~v2x_polyarchy, z = ~v2x_libdem,
        type = "scatter3d", color = ~undernourish)
## No scatter3d mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
## Warning: Ignoring 40 observations