This document is a quick introduction to the R statistical programming language and is meant as a “crash course” for Honors Politics undergraduates. The focus will be on the basics of the language, data wrangling, graphics, and typesetting in R, with basic regression modeling at the end. You can download the code by clicking the “Code” button to the upper right.
First things first: let’s install/load all of the packages that we will be using, and clear out the environment.
packages <- c("haven","dplyr","ggplot2","countrycode","tidyr","gridExtra","grid",
              "stargazer","tidyselect")
for(i in packages){
  if(!require(i, character.only = T, quietly = T)){
    install.packages(i)
  }
  library(i, character.only = T, quietly = T)
}
rm(list=ls())
If you haven’t seen R before, that is a bunch of hieroglyphics. Let’s break it down and get a feel for the most basic operations and entities used in the language. R is an object-oriented programming language, meaning that we create named objects which exist within our working environment and which we can then access or manipulate. To start, let’s create an object named packages which contains a vector of package names:
packages <- c("haven","dplyr","ggplot2","countrycode","tidyr","gridExtra","grid",
              "stargazer")
This is an example of the most basic operation within R, object assignment. In general, the syntax looks like <object name> <- <stuff>, where <- is the most commonly used assignment operator. Suppose we wanted to access only the first three elements of this vector; to do so we can specify the indices we want like so:
packages[1:3]
## [1] "haven" "dplyr" "ggplot2"
Note that, unlike some other languages, indices in R start at 1 rather than 0. Suppose we wanted the 2nd, 5th, and 8th elements of the vector instead. We can do so by supplying a vector of indices like so:
packages[c(2,5,8)]
## [1] "dplyr" "tidyr" "stargazer"
There are a few particularly important classes within R. Above we have a character vector, which is composed of strings. We can check the class of an object with the class function:
class(packages)
## [1] "character"
Other particularly important classes are numeric and factor variables; the former is self-descriptive, while the latter is the name for categorical data in R. We can convert between classes with the as.whatever family of functions. As an example, let’s draw 15 random normal deviates with mean 5 and standard deviation 10 after setting a seed (this ensures that we draw the same pseudo-random samples every time):
set.seed(1234)
num <- rnorm(n = 15, mean = 5, sd = 10)
class(num)
## [1] "numeric"
As expected, our numeric data is, well, numeric. We can convert it to a factor variable with the as.factor function.
fac <- as.factor(num)
class(fac)
## [1] "factor"
Of note, to convert a factor variable back to numeric an additional step is required. Let’s create a data.frame to see what happens and to learn how to conduct basic modifications of such objects.
dat <- data.frame(num,fac)
dat$wrong <- as.numeric(dat$fac)
dat$right <- as.numeric(as.character(dat$fac))
dat
## num fac wrong right
## 1 -7.0706575 -7.07065749385421 2 -7.0706575
## 2 7.7742924 7.7742924211066 11 7.7742924
## 3 15.8444118 15.8444117668306 15 15.8444118
## 4 -18.4569770 -18.4569770262935 1 -18.4569770
## 5 9.2912469 9.2912468881105 12 9.2912469
## 6 10.0605589 10.0605589215757 13 10.0605589
## 7 -0.7473996 -0.747399601346488 6 -0.7473996
## 8 -0.4663186 -0.466318557841871 8 -0.4663186
## 9 -0.6445200 -0.64451999093283 7 -0.6445200
## 10 -3.9003783 -3.90037829044104 4 -3.9003783
## 11 0.2280730 0.22807300246453 9 0.2280730
## 12 -4.9838644 -4.98386444859704 3 -4.9838644
## 13 -2.7625389 -2.7625389463799 5 -2.7625389
## 14 5.6445882 5.64458817276269 10 5.6445882
## 15 14.5949406 14.5949405897077 14 14.5949406
What we did in the above was create a data.frame with two columns, num and fac. Note that converting from a factor directly to numeric returns the underlying integer codes rather than the values themselves, while converting to character in between gives us back the correct information.
This is an example of why it is so important for beginners in R, or any language for that matter, to read the documentation so that mistakes like this are avoided. To access the documentation for a function we can simply ask for help:
help(as.numeric)
R has great documentation and you should always read about functions you are unfamiliar with. Scrolling down a bit we can see under the warning header that “If x is a factor, as.numeric will return the underlying numeric (integer) representation, which is often meaningless as it may not correspond to the factor levels.”
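If you don’t know a function’s exact name, you can also search the documentation by topic with help.search, or its shorthand ??:
help.search("factor")
??factor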
Since we have a data.frame handy, let’s learn how to interact with it. Using the $ operator we can access columns of the data in a straightforward manner:
dat$num
## [1] -7.0706575 7.7742924 15.8444118 -18.4569770 9.2912469 10.0605589
## [7] -0.7473996 -0.4663186 -0.6445200 -3.9003783 0.2280730 -4.9838644
## [13] -2.7625389 5.6445882 14.5949406
There are two other useful ways of extracting information from data.frames. First, we can use indices in a way very similar to the above except for noting that now we have both rows AND column indices. For example, we can access the first 3 rows of columns 2 and 4 like so:
dat[1:3,c(2,4)]
## fac right
## 1 -7.07065749385421 -7.070657
## 2 7.7742924211066 7.774292
## 3 15.8444117668306 15.844412
Alternatively, we can also call variables by their names like this:
dat[,c("wrong","right")]
## wrong right
## 1 2 -7.0706575
## 2 11 7.7742924
## 3 15 15.8444118
## 4 1 -18.4569770
## 5 12 9.2912469
## 6 13 10.0605589
## 7 6 -0.7473996
## 8 8 -0.4663186
## 9 7 -0.6445200
## 10 4 -3.9003783
## 11 9 0.2280730
## 12 3 -4.9838644
## 13 5 -2.7625389
## 14 10 5.6445882
## 15 14 14.5949406
Note that when you leave an index blank you get all of those elements back – in the above we got all of the rows for the two selected columns. Alternatively we could get all of the columns for a particular subset of rows like so:
dat[1:2,]
## num fac wrong right
## 1 -7.070657 -7.07065749385421 2 -7.070657
## 2 7.774292 7.7742924211066 11 7.774292
Of particular importance is that the columns of data.frames can be different classes.
c(class(dat$num),class(dat$fac),class(dat$wrong),class(dat$right))
## [1] "numeric" "factor" "numeric" "numeric"
This is distinct from the matrix class of object, which is generally used only for matrix algebra or in particular machine learning libraries; we won’t talk about those in detail here. Note what happens when we coerce our data.frame to a matrix (accessing elements of matrices is almost identical to data.frames, except that $ no longer works):
head(as.matrix(dat))
## num fac wrong right
## [1,] " -7.0706575" "-7.07065749385421" " 2" " -7.0706575"
## [2,] " 7.7742924" "7.7742924211066" "11" " 7.7742924"
## [3,] " 15.8444118" "15.8444117668306" "15" " 15.8444118"
## [4,] "-18.4569770" "-18.4569770262935" " 1" "-18.4569770"
## [5,] " 9.2912469" "9.2912468881105" "12" " 9.2912469"
## [6,] " 10.0605589" "10.0605589215757" "13" " 10.0605589"
They are all characters now! We get the same behavior with vectors when combining various classes:
numz <- c(1,2,3)
chaz <- c("a","b","c")
c(numz,chaz)
## [1] "1" "2" "3" "a" "b" "c"
To see why, check out the “Details” section of the help file for the c function.
?c
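Before we leave matrices entirely, here is a minimal sketch of working with them; the object m below is new, created purely for illustration. Without $, you index by position or by column name, and numeric matrices support matrix algebra:
m <- as.matrix(dat[,c("num","right")]) # numeric columns only, so no coercion to character
m[1:2,"right"]                         # rows by position, column by name
t(m) %*% m                             # a 2x2 cross-product via matrix multiplication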
The final main object type I want to introduce you to is my favorite: lists! They are kind of a mash-up of data.frames and vectors in that they are one-dimensional but can have elements of any class.
a_list <- list(packages,dat,chaz)
a_list
## [[1]]
## [1] "haven" "dplyr" "ggplot2" "countrycode" "tidyr"
## [6] "gridExtra" "grid" "stargazer"
##
## [[2]]
## num fac wrong right
## 1 -7.0706575 -7.07065749385421 2 -7.0706575
## 2 7.7742924 7.7742924211066 11 7.7742924
## 3 15.8444118 15.8444117668306 15 15.8444118
## 4 -18.4569770 -18.4569770262935 1 -18.4569770
## 5 9.2912469 9.2912468881105 12 9.2912469
## 6 10.0605589 10.0605589215757 13 10.0605589
## 7 -0.7473996 -0.747399601346488 6 -0.7473996
## 8 -0.4663186 -0.466318557841871 8 -0.4663186
## 9 -0.6445200 -0.64451999093283 7 -0.6445200
## 10 -3.9003783 -3.90037829044104 4 -3.9003783
## 11 0.2280730 0.22807300246453 9 0.2280730
## 12 -4.9838644 -4.98386444859704 3 -4.9838644
## 13 -2.7625389 -2.7625389463799 5 -2.7625389
## 14 5.6445882 5.64458817276269 10 5.6445882
## 15 14.5949406 14.5949405897077 14 14.5949406
##
## [[3]]
## [1] "a" "b" "c"
To access their elements we use “double bracket” notation like so:
a_list[[2]]
## num fac wrong right
## 1 -7.0706575 -7.07065749385421 2 -7.0706575
## 2 7.7742924 7.7742924211066 11 7.7742924
## 3 15.8444118 15.8444117668306 15 15.8444118
## 4 -18.4569770 -18.4569770262935 1 -18.4569770
## 5 9.2912469 9.2912468881105 12 9.2912469
## 6 10.0605589 10.0605589215757 13 10.0605589
## 7 -0.7473996 -0.747399601346488 6 -0.7473996
## 8 -0.4663186 -0.466318557841871 8 -0.4663186
## 9 -0.6445200 -0.64451999093283 7 -0.6445200
## 10 -3.9003783 -3.90037829044104 4 -3.9003783
## 11 0.2280730 0.22807300246453 9 0.2280730
## 12 -4.9838644 -4.98386444859704 3 -4.9838644
## 13 -2.7625389 -2.7625389463799 5 -2.7625389
## 14 5.6445882 5.64458817276269 10 5.6445882
## 15 14.5949406 14.5949405897077 14 14.5949406
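Lists can also be named, in which case the $ operator works much like it does for data.frames. The names below are ones I made up for illustration:
names(a_list) <- c("pkgs","dat","letters")
a_list$letters
## [1] "a" "b" "c"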
Beautiful. Now back to where we began. Packages are user-created collections of functions which extend the capabilities of R. “Official” packages are available via CRAN, a network of FTP and web servers around the world which store identical, up-to-date versions of code and documentation for R. To access the functions stored within a package, we first need to install.packages it if it is not already installed and then load it into the current session with library.
Now let’s look at the initial chunk of code you were confronted with:
packages <- c("haven","dplyr","ggplot2","countrycode","tidyr","gridExtra","grid",
              "stargazer")
for(i in packages){
  if(!require(i, character.only = T, quietly = T)){
    install.packages(i)
  }
  library(i, character.only = T, quietly = T)
}
rm(list=ls())
First, we create a vector of package names. Then, for each element of that vector, we try to load the package with require (see ?require); if that fails, we install it with install.packages and then load it with library. Finally, rm(list=ls()) clears out the environment. By the way – if you’d like to install only a single package you might do something like:
install.packages("gamlss")
library(gamlss)
But sometimes starting with something hard and breaking it down is a better way to learn than building up from basics.
Now that we have those basics in our heads we can start putting R to use.
We will be using data from Peterson (2017): Export Diversity and Human Rights. You can download the replication archive by clicking here or download the data directly by running the following chunk.
d <- read_dta("https://bit.ly/3a9nUSL")
d
## # A tibble: 5,188 x 18
## ccode year twoway inhhi comper polity2 physint conflictonlocat~ lnpop
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 1981 0.986 0.990 0.997 10 8 0 19.3
## 2 2 1982 0.988 0.989 0.998 10 8 0 19.3
## 3 2 1983 0.982 0.989 0.994 10 8 1 19.3
## 4 2 1984 0.987 0.987 1 10 8 0 19.3
## 5 2 1985 0.985 0.987 0.998 10 7 0 19.3
## 6 2 1986 0.980 0.987 0.994 10 7 0 19.3
## 7 2 1987 0.982 0.987 0.995 10 8 0 19.3
## 8 2 1988 0.977 0.988 0.989 10 7 0 19.3
## 9 2 1989 0.980 0.988 0.992 10 7 1 19.3
## 10 2 1990 0.983 0.988 0.995 10 8 0 19.4
## # ... with 5,178 more rows, and 9 more variables: lngdppc <dbl>, gdppc <dbl>,
## # expdep <dbl>, gdpgrowth <dbl>, lib_HK <dbl>, meanarab <dbl>, meanpop <dbl>,
## # sdarab_manual <dbl>, sdpop_manual <dbl>
The read_dta function comes from the haven package and is useful for reading in datasets from other statistical software like SPSS, Stata, or SAS.
Usually you’ll be loading data from your computer rather than from a link. For this it is important to get a feel for how file paths work on your computer and how to use working directories.
To check your working directory, you can run:
getwd()
## [1] "C:/Users/cs4770/Dropbox/!2020_Honors"
To introduce you quickly to a few useful functions, let’s have R grab the working directory, create a new folder inside it, move into that folder, write our data out as a .csv file, and list the folder’s contents:
dir <- getwd()
path <- paste(dir,"example_folder",sep="/")
dir.create(path)
setwd(path)
write.csv(d,"peterson_2017.csv",row.names = F)
list.files()
## [1] "peterson_2017.csv"
Boom, there it is! Now if we wanted to read in the data we could:
dat_path <- paste(path,"peterson_2017.csv",sep="/")
dat <- read.csv(dat_path)
And boom, there it is. Now let’s remove just that object:
rm(dat)
In your own time I suggest looking through the documentation for the functions used above: getwd, paste, dir.create, setwd, write.csv, list.files, and rm.
Finally, before we get our hands dirty, let’s see how to take a first look at our data:
summary(d)
## ccode year twoway inhhi
## Min. : 2.0 Min. :1981 Min. :0.0000 Min. :0.0000
## 1st Qu.:232.0 1st Qu.:1989 1st Qu.:0.1989 1st Qu.:0.6856
## Median :450.0 Median :1996 Median :0.4499 Median :0.8636
## Mean :457.3 Mean :1996 Mean :0.4855 Mean :0.7784
## 3rd Qu.:670.0 3rd Qu.:2003 3rd Qu.:0.7919 3rd Qu.:0.9545
## Max. :990.0 Max. :2010 Max. :0.9882 Max. :0.9932
##
## comper polity2 physint conflictonlocation
## Min. :0.00161 Min. :-10.000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.32156 1st Qu.: -6.000 1st Qu.:3.000 1st Qu.:0.0000
## Median :0.58347 Median : 4.000 Median :5.000 Median :0.0000
## Mean :0.57874 Mean : 1.746 Mean :4.838 Mean :0.1714
## 3rd Qu.:0.86127 3rd Qu.: 9.000 3rd Qu.:7.000 3rd Qu.:0.0000
## Max. :1.00000 Max. : 10.000 Max. :8.000 Max. :1.0000
## NA's :808 NA's :701
## lnpop lngdppc gdppc expdep
## Min. :10.61 Min. : 4.889 Min. : 132.8 Min. :0.0007
## 1st Qu.:14.67 1st Qu.: 7.455 1st Qu.: 1729.2 1st Qu.:0.0581
## Median :15.84 Median : 8.507 Median : 4951.4 Median :0.1164
## Mean :15.67 Mean : 8.473 Mean : 9711.5 Mean :0.1684
## 3rd Qu.:16.90 3rd Qu.: 9.453 3rd Qu.: 12743.0 3rd Qu.:0.2103
## Max. :21.00 Max. :11.541 Max. :102804.8 Max. :3.7119
## NA's :731 NA's :731 NA's :731 NA's :731
## gdpgrowth lib_HK meanarab meanpop
## Min. :-0.6532 Min. :-0.0142 Min. : 0.000 Min. : 0.1193
## 1st Qu.:-0.0032 1st Qu.:-0.0016 1st Qu.: 7.661 1st Qu.: 6.8294
## Median : 0.0347 Median :-0.0002 Median :13.599 Median : 17.0428
## Mean : 0.0377 Mean : 0.0002 Mean :15.963 Mean : 62.5552
## 3rd Qu.: 0.0742 3rd Qu.: 0.0016 3rd Qu.:24.007 3rd Qu.: 34.3939
## Max. : 1.8620 Max. : 0.0572 Max. :73.389 Max. :1216.1071
## NA's :762 NA's :1059 NA's :1209 NA's :1186
## sdarab_manual sdpop_manual
## Min. : 0.000 Min. : 0.0000
## 1st Qu.: 1.423 1st Qu.: 0.6023
## Median : 6.301 Median : 8.5648
## Mean : 7.074 Mean : 40.1050
## 3rd Qu.:11.619 3rd Qu.: 26.2434
## Max. :30.083 Max. :629.2051
## NA's :1209 NA's :1186
Of particular importance are the NA counts representing missing data. These are not only worth a look to get a better sense of your data, but also alert you to how functions like mean and sum behave in the presence of missing values.
c(mean(d$polity2),
sum(d$polity2))
## [1] NA NA
Checking documentation with ?mean or help(mean) you’ll note the argument na.rm defaults to FALSE. To compute these things omitting the missing values, you would specify:
c(mean(d$polity2, na.rm=T),
sum(d$polity2, na.rm=TRUE))
## [1] 1.745662 7646.000000
where either T or TRUE can be used to indicate, well, true.
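Relatedly, you can count the missing values yourself: is.na returns a logical vector, and summing it counts the TRUEs (compare with the NA count in the summary above):
sum(is.na(d$polity2))
## [1] 808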
Cool, with all that out of the way we can start getting our hands dirty.
As with most things in R there are multiple ways of accomplishing the same goal. Being able to switch between approaches can help you accomplish more complicated tasks. I generally stick to base R for subsetting. To get indices which satisfy logical statements you can use the which function
which(d$gdppc > 50000)
## [1] 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 2116 2117 2118
## [16] 2119 2120 2121 2122 2123 2124 2125 2126 3981 3982 3983 3984 3985 3986 4017
## [31] 4018 4019 4020 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046
## [46] 4802 4803 4804 4805 4806 4807 4808 4809 4810 4811 4812 4813 4814 4815 4816
## [61] 4817 4818 4820 4821 4823 4824 4825 4826 4827 4828 4830
which(d$gdppc < 2000 & d$polity2 == 10)
## [1] 4277 4278 4279 4280
which(d$gdpgrowth < -.5 | d$gdppc < 100)
## [1] 3040 3369 3717 3805 3967
We can combine this with indexing to subset down the data. We can also call columns in a variety of ways. Remember that you can create objects carrying this information to modularize your code, which can help keep everything clear in more complicated situations.
inds <- which(d$gdppc > 50000)
cols <- c("ccode","year","physint","lnpop")
sub1 <- d[inds,cols]
sub1
## # A tibble: 71 x 4
## ccode year physint lnpop
## <dbl> <dbl> <dbl> <dbl>
## 1 212 1999 8 13.0
## 2 212 2000 8 13.0
## 3 212 2001 8 13.0
## 4 212 2002 8 13.0
## 5 212 2003 8 13.0
## 6 212 2004 8 13.0
## 7 212 2005 8 13.0
## 8 212 2006 8 13.1
## 9 212 2007 8 13.1
## 10 212 2008 8 13.1
## # ... with 61 more rows
Alternatively, one could use the subset function from base R to get the same result.
sub2 <- subset(d,d$gdppc > 50000,cols)
sub2
## # A tibble: 71 x 4
## ccode year physint lnpop
## <dbl> <dbl> <dbl> <dbl>
## 1 212 1999 8 13.0
## 2 212 2000 8 13.0
## 3 212 2001 8 13.0
## 4 212 2002 8 13.0
## 5 212 2003 8 13.0
## 6 212 2004 8 13.0
## 7 212 2005 8 13.0
## 8 212 2006 8 13.1
## 9 212 2007 8 13.1
## 10 212 2008 8 13.1
## # ... with 61 more rows
identical(sub1,sub2)
## [1] TRUE
Another great option is the dplyr package, which is part of the tidyverse alongside the packages haven and ggplot2. One of the best things about the tidyverse family of packages is that they are very well documented, including a variety of cheatsheets and books. One thing that makes the wrangling tools particularly powerful is that they leverage a pipe (%>%) from the magrittr package which says, in pseudo-code, that x %>% f(y) is the same as f(x,y). This can create a nice workflow. For example, to get the same subset yet again:
d %>% filter(gdppc > 50000) %>% dplyr::select(all_of(cols)) -> sub3
all(sub1 == sub3, na.rm=T)
## [1] TRUE
Neat. What we did was take our dataframe, filter the rows that we wanted, and then select the columns of interest.
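To see the pipe equivalence at its simplest, these two lines do exactly the same thing:
mean(c(1,5,9))
## [1] 5
c(1,5,9) %>% mean()
## [1] 5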
If we want to sort data, there is a base R approach for vectors.
head(sort(d$gdpgrowth))
## [1] -0.6532034 -0.6011391 -0.5537450 -0.5498320 -0.5225903 -0.4939937
For dataframes you have to use order, which produces index numbers that can be used as before
d[order(d$gdpgrowth),c("ccode","year","gdpgrowth")]
## # A tibble: 5,188 x 3
## ccode year gdpgrowth
## <dbl> <dbl> <dbl>
## 1 645 1991 -0.653
## 2 572 2003 -0.601
## 3 517 1994 -0.554
## 4 690 1991 -0.550
## 5 660 1989 -0.523
## 6 92 2007 -0.494
## 7 475 1986 -0.480
## 8 450 1990 -0.475
## 9 373 1993 -0.440
## 10 411 1990 -0.423
## # ... with 5,178 more rows
We can also switch the ordering around by setting decreasing = T
d[order(d$gdpgrowth,decreasing = T),c("ccode","year","gdpgrowth")]
## # A tibble: 5,188 x 3
## ccode year gdpgrowth
## <dbl> <dbl> <dbl>
## 1 690 1992 1.86
## 2 572 2005 1.48
## 3 450 1997 1.39
## 4 411 1997 1.38
## 5 92 2009 1.13
## 6 411 2002 0.874
## 7 345 1996 0.854
## 8 411 1999 0.788
## 9 552 2010 0.752
## 10 411 1992 0.731
## # ... with 5,178 more rows
Or we could use the handy %>% here as well. In this case we have to use the placeholder . to mark where the piped input goes – a trick worth knowing for more complicated expressions.
order(d$gdpgrowth,decreasing = T) %>% d[.,c("ccode","year","gdpgrowth")]
## # A tibble: 5,188 x 3
## ccode year gdpgrowth
## <dbl> <dbl> <dbl>
## 1 690 1992 1.86
## 2 572 2005 1.48
## 3 450 1997 1.39
## 4 411 1997 1.38
## 5 92 2009 1.13
## 6 411 2002 0.874
## 7 345 1996 0.854
## 8 411 1999 0.788
## 9 552 2010 0.752
## 10 411 1992 0.731
## # ... with 5,178 more rows
There is also a handy arrange function in dplyr for accomplishing the same sort of task while allowing you to sort on multiple columns.
d %>% dplyr::select(c("ccode","year","gdpgrowth")) %>% arrange(ccode,gdpgrowth)
## # A tibble: 5,188 x 3
## ccode year gdpgrowth
## <dbl> <dbl> <dbl>
## 1 2 2009 -0.0359
## 2 2 1982 -0.0233
## 3 2 2008 -0.00629
## 4 2 1991 -0.00469
## 5 2 2001 0.00829
## 6 2 2002 0.0129
## 7 2 1990 0.0150
## 8 2 2007 0.0195
## 9 2 2003 0.0238
## 10 2 2010 0.0256
## # ... with 5,178 more rows
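To sort a column in descending order inside arrange, wrap it in desc():
d %>% dplyr::select(c("ccode","year","gdpgrowth")) %>% arrange(ccode, desc(gdpgrowth))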
Sometimes data comes in “long” format, like the data we have been working with, where each row represents an observational unit (here, country-year). In other cases data will come in “wide” format, like this or this.
We might want to convert data back and forth depending on what we want to do. Let’s do this for the GDP per capita variable; see NYU Data Services for a handy guide to reshaping.
long <- d[,c("ccode","year","gdppc")]
long
## # A tibble: 5,188 x 3
## ccode year gdppc
## <dbl> <dbl> <dbl>
## 1 2 1981 25977.
## 2 2 1982 25131.
## 3 2 1983 26026.
## 4 2 1984 27751.
## 5 2 1985 28455.
## 6 2 1986 29014.
## 7 2 1987 29639.
## 8 2 1988 30506.
## 9 2 1989 31274.
## 10 2 1990 31432.
## # ... with 5,178 more rows
Long to wide with tidyr (pivot_wider and pivot_longer are the updated versions of spread and gather):
long %>% pivot_wider(names_from = year, values_from = gdppc) -> wide
wide
## # A tibble: 189 x 31
## ccode `1981` `1982` `1983` `1984` `1985` `1986` `1987` `1988` `1989` `1990`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 25977. 25131. 26026. 27751. 28455. 29014. 29639. 30506. 31274. 31432.
## 2 20 22718. 22267. 23001. 24238. 25387. 25987. 27535. 28487. 29005. 28589.
## 3 31 12648. 14004. 14880. 17393. 18428. 17593. 18885. 17582. 18414. 16909.
## 4 40 NA NA NA NA NA NA NA NA NA NA
## 5 41 NA NA NA NA NA NA NA NA NA NA
## 6 42 3600. 3638. 3719. 4064. 4100. 4273. 4673. 4826. 4364. 3923.
## 7 51 3904. 3811. 3862. 3744. 3329. 3539. 3875. 3958. 3962. 4267.
## 8 52 24382. 25326. 22556. 20483. 19106. 17596. 14946. 14170. 12743. 12770.
## 9 53 10365. 10358. 10463. 10725. 9966. 11390. 13180. 13719. 15294. 15270.
## 10 54 4407. 4683. 4862. 5109. 5008. 5477. 5973. 6233. 6168. 6562.
## # ... with 179 more rows, and 20 more variables: `1991` <dbl>, `1992` <dbl>,
## # `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>,
## # `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>,
## # `2003` <dbl>, `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
## # `2008` <dbl>, `2009` <dbl>, `2010` <dbl>
And back to long:
wide %>% pivot_longer(cols=-ccode,names_to = "year", values_to = "gdppc") -> long_again
long_again
## # A tibble: 5,670 x 3
## ccode year gdppc
## <dbl> <chr> <dbl>
## 1 2 1981 25977.
## 2 2 1982 25131.
## 3 2 1983 26026.
## 4 2 1984 27751.
## 5 2 1985 28455.
## 6 2 1986 29014.
## 7 2 1987 29639.
## 8 2 1988 30506.
## 9 2 1989 31274.
## 10 2 1990 31432.
## # ... with 5,660 more rows
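Two things to notice: year came back as a character column, and we gained rows (5,670 versus the original 5,188) because pivot_longer creates a row for every country-year cell in the wide data, including the NA ones. Both are easy to address; a quick sketch:
long_again$year <- as.numeric(long_again$year) # restore year to numeric
# or drop the all-NA cells during the pivot itself:
wide %>% pivot_longer(cols = -ccode, names_to = "year",
                      values_to = "gdppc", values_drop_na = TRUE)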
You may have noticed that the ccode variable isn’t particularly descriptive of which country each row refers to. We can use this to illustrate merging datasets together. At the start we loaded the countrycode package, which contains additional identifying information.
codes <- countrycode::codelist_panel
Let’s see what they have.
colnames(codes)
## [1] "country.name.en" "year" "ar5"
## [4] "continent" "country.name.de" "country.name.de.regex"
## [7] "country.name.en.regex" "cow.name" "cowc"
## [10] "cown" "ecb" "ecb.name"
## [13] "eu28" "eurocontrol_pru" "eurocontrol_statfor"
## [16] "eurostat" "eurostat.name" "fao"
## [19] "fao.name" "fips" "fips.name"
## [22] "gaul" "genc.name" "genc2c"
## [25] "genc3c" "genc3n" "gwc"
## [28] "gwn" "icao.name" "icao.region"
## [31] "imf" "ioc" "ioc.name"
## [34] "iso.name.en" "iso.name.fr" "iso2c"
## [37] "iso3c" "iso3n" "p4.name"
## [40] "p4c" "p4n" "region"
## [43] "un" "un.name.ar" "un.name.en"
## [46] "un.name.es" "un.name.fr" "un.name.ru"
## [49] "un.name.zh" "unpd" "unpd.name"
## [52] "vdem" "vdem.name" "wb"
## [55] "wb_api.name" "wb_api2c" "wb_api3c"
## [58] "wb.name" "wvs" "wvs.name"
The country codes we are currently using are cown (Correlates of War numeric codes). Let’s grab iso3c and region to add to the dataset. We also know that the dataset we are working with only has years from 1981 to 2010, so let’s practice our subsetting skillz:
codes <- codes[codes$year %in% 1981:2010,c("cown","year","iso3c","country.name.en","region")]
One thing to pay attention to is losing or gaining observations during a merge. For a great overview, check out this handy NYU Data Services guide.
nrow(d)
## [1] 5188
out1 <- merge(d,codes,by.x=c("ccode","year"),by.y=c("cown","year"))
nrow(out1)
## [1] 5125
out2 <- merge(d,codes,by.x=c("ccode","year"),by.y=c("cown","year"),all.x=T)
nrow(out2)
## [1] 5188
out3 <- merge(d,codes,by.x=c("ccode","year"),by.y=c("cown","year"),all.y=T)
nrow(out3)
## [1] 6959
out4 <- merge(d,codes,by.x=c("ccode","year"),by.y=c("cown","year"),all=T)
nrow(out4)
## [1] 7022
And, of course, we can do the same merges using dplyr with inner_join, left_join, right_join, and full_join respectively. Going forward we will keep out2 as the working dataset.
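For instance, a left join that reproduces out2 might look like the following; out2_tidy is a new name used just for comparison, and the named by vector maps ccode onto cown:
out2_tidy <- left_join(d, codes, by = c("ccode" = "cown", "year" = "year"))
nrow(out2_tidy)
## [1] 5188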
There are a number of great things you can do with the apply family of functions, including easily going in parallel with the pbapply package. If you are interested in more details on this you should check out this tutorial and this taskview. We will focus on using dplyr to calculate summaries of interest.
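For a quick taste of the apply family before moving on, sapply runs a function over each element of a list – including each column of a data.frame – and simplifies the result:
sapply(d[,c("polity2","physint")], mean, na.rm = TRUE)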
One reason for preferring dplyr is that it makes it super easy to calculate summaries grouped on another variable. For example, if we wanted to think about regional variation in gdppc we could:
out2 %>% group_by(region) %>%
summarize(mean=mean(gdppc,na.rm=T),sd=sd(gdppc,na.rm=T),sum=sum(gdppc,na.rm=T))
## # A tibble: 23 x 4
## region mean sd sum
## <chr> <dbl> <dbl> <dbl>
## 1 Australia and New Zealand 24990. 5898. 1499423.
## 2 Caribbean 9328. 5894. 3059582.
## 3 Central America 5276. 3314. 1107909.
## 4 Central Asia 4762. 2813. 452347.
## 5 Eastern Africa 1803. 2240. 735621.
## 6 Eastern Asia 11792. 10740. 1415087.
## 7 Eastern Europe 9855. 4712. 2305957.
## 8 Melanesia 4459. 558. 133778.
## 9 Micronesia NaN NA 0
## 10 Middle Africa 3184. 4070. 859647.
## # ... with 13 more rows
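summarize is not limited to means and sums; for example, n() counts the observations in each group (n_obs is just an illustrative name):
out2 %>% group_by(region) %>% summarize(n_obs = n())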
We can also use the mutate function to add this information to our dataframe. In base R this would require merging in the output of aggregate, so it can certainly be done, but dplyr makes it somewhat more straightforward and scalable.
out2 %>% group_by(region) %>%
mutate(mean_gdppc=mean(gdppc,na.rm=T),sd_gdppc=sd(gdppc,na.rm=T)) -> out2
out2
## # A tibble: 5,188 x 23
## # Groups: region [23]
## ccode year twoway inhhi comper polity2 physint conflictonlocat~ lnpop
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 1981 0.986 0.990 0.997 10 8 0 19.3
## 2 2 1982 0.988 0.989 0.998 10 8 0 19.3
## 3 2 1983 0.982 0.989 0.994 10 8 1 19.3
## 4 2 1984 0.987 0.987 1 10 8 0 19.3
## 5 2 1985 0.985 0.987 0.998 10 7 0 19.3
## 6 2 1986 0.980 0.987 0.994 10 7 0 19.3
## 7 2 1987 0.982 0.987 0.995 10 8 0 19.3
## 8 2 1988 0.977 0.988 0.989 10 7 0 19.3
## 9 2 1989 0.980 0.988 0.992 10 7 1 19.3
## 10 2 1990 0.983 0.988 0.995 10 8 0 19.4
## # ... with 5,178 more rows, and 14 more variables: lngdppc <dbl>, gdppc <dbl>,
## # expdep <dbl>, gdpgrowth <dbl>, lib_HK <dbl>, meanarab <dbl>, meanpop <dbl>,
## # sdarab_manual <dbl>, sdpop_manual <dbl>, iso3c <chr>,
## # country.name.en <chr>, region <chr>, mean_gdppc <dbl>, sd_gdppc <dbl>
A base R version of the above might be
a1 <- aggregate(out2$gdppc,by=list(out2$region),mean,na.rm=T)
a1
## Group.1 x
## 1 Australia and New Zealand 24990.386
## 2 Caribbean 9327.994
## 3 Central America 5275.757
## 4 Central Asia 4761.549
## 5 Eastern Africa 1802.991
## 6 Eastern Asia 11792.392
## 7 Eastern Europe 9854.516
## 8 Melanesia 4459.267
## 9 Micronesia NaN
## 10 Middle Africa 3183.878
## 11 Northern Africa 3388.796
## 12 Northern America 32390.804
## 13 Northern Europe 24848.061
## 14 Polynesia NaN
## 15 South-Eastern Asia 12756.196
## 16 South America 6143.031
## 17 Southern Africa 5279.329
## 18 Southern Asia 3029.750
## 19 Southern Europe 14384.153
## 20 Western Africa 1175.762
## 21 Western Asia 15632.999
## 22 Western Europe 30016.166
colnames(a1) <- c("region","mean_gdppc")
a2 <- aggregate(out2$gdppc,by=list(out2$region),sd,na.rm=T)
colnames(a2) <- c("region","sd_gdppc")
t1 <- merge(out2,a1,by="region")
t2 <- merge(t1,a2,by="region")
tbl_df(t2)
## # A tibble: 5,063 x 25
## region ccode year twoway inhhi comper polity2 physint conflictonlocat~ lnpop
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austr~ 900 1982 0.915 0.955 0.958 10 8 0 16.5
## 2 Austr~ 900 1983 0.920 0.951 0.968 10 7 0 16.5
## 3 Austr~ 900 1984 0.917 0.952 0.963 10 8 0 16.6
## 4 Austr~ 900 1986 0.912 0.945 0.965 10 7 0 16.6
## 5 Austr~ 900 1987 0.921 0.954 0.966 10 8 0 16.6
## 6 Austr~ 900 1988 0.924 0.958 0.965 10 6 0 16.6
## 7 Austr~ 900 1985 0.903 0.945 0.955 10 8 0 16.6
## 8 Austr~ 900 1990 0.928 0.961 0.966 10 7 0 16.7
## 9 Austr~ 900 1991 0.914 0.958 0.955 10 7 0 16.7
## 10 Austr~ 900 1992 0.932 0.958 0.972 10 7 0 16.7
## # ... with 5,053 more rows, and 15 more variables: lngdppc <dbl>, gdppc <dbl>,
## # expdep <dbl>, gdpgrowth <dbl>, lib_HK <dbl>, meanarab <dbl>, meanpop <dbl>,
## # sdarab_manual <dbl>, sdpop_manual <dbl>, iso3c <chr>,
## # country.name.en <chr>, mean_gdppc.x <dbl>, sd_gdppc.x <dbl>,
## # mean_gdppc.y <dbl>, sd_gdppc.y <dbl>
but the dplyr approach really is quite nice.
We will focus on using ggplot2 for graphics in R, although base R has nice capabilities of its own. ggplot is all about the “grammar of graphics”, which takes a layered approach to describing and constructing graphics in a structured manner. To begin, we will always initialize a plot:
p1 <- ggplot(out2[which(out2$region == "Northern America"),], aes(x=log(gdppc)))
To get different plots, we will add layers. For example, if we wanted a dot plot
p1 + geom_dotplot(binwidth=0.03)
or a histogram
p1 + geom_histogram(binwidth=0.03)
or a density plot
p1 + geom_density()
we can just add a different layer to the same underlying plot.
The order of these additional layers generally does not matter (though geoms themselves are drawn in the order you add them), and there are a bunch more customizations that we can add.
p1 + geom_histogram(color="black",fill="darkblue",binwidth = 0.03) +
xlab("Natural Log of Per Capita GDP") +
ylab("Frequency") +
ggtitle('North American GDPPC') +
theme_bw() -> g1
g1
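As an aside, to save a single ggplot to disk you can use ggsave, which infers the file type from the extension. A minimal sketch reusing the path object from earlier (the filename is arbitrary):
ggsave(paste0(path,"/histogram.png"), g1, width = 6, height = 4)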
You can also add multiple geometries to the same underlying plot.
p2 <- ggplot(out2[which(out2$region == "Eastern Europe"),],aes(x=year,y=log(gdppc),color=iso3c))
p2 + geom_point(na.rm=T) +
geom_line(na.rm=T) +
labs(color="Country") +
scale_color_brewer(palette="Spectral") -> g2
g2
You can even add some smoothers if you want.
p3 <- ggplot(out3[which(out3$iso3c=="RUS"),],aes(x=year,y=gdppc))
p3 + geom_point(na.rm=T) +
geom_smooth(color ="gray", method = "lm", se = TRUE,na.rm=T, formula=y~x)
p3 <- ggplot(out3[which(out3$iso3c=="RUS"),],aes(x=year,y=gdppc))
p3 + geom_point(na.rm=T) +
geom_smooth(color ="gray", method = "loess", se = TRUE,formula=y~x, na.rm=T) -> g3
g3
Two last notes on plots – faceting and adding plots together into a larger image.
Faceting can be a nice way to break up a continuous variable by category.
p4 <- ggplot(na.omit(out2[which(out2$region %in% c("Western Europe","Western Africa","Eastern Europe")),]),aes(x=log(gdppc)))
p4 + geom_histogram(binwidth = 0.1) +
facet_grid(region ~ .)
p4 <- ggplot(na.omit(out2[which(out2$region %in% c("Western Europe","Western Africa","Eastern Europe")),]),aes(x=log(gdppc)))
p4 + geom_histogram(binwidth = 0.1) +
facet_grid(. ~ region)
Once we do all that, we might want to add multiple plots together into a larger multi-panel graphic. The gridExtra package is great for this.
grid.arrange(g1,g2,g3,textGrob("Spiffy!"),ncol=2,nrow=2)
You should take a look at the ggplot2 book linked to above, as well as other packages like ggpubr; some examples can be found here.
In R it is relatively easy to create tables that are presentable in Word, LaTeX, or HTML. A large number of packages – apsrtable, xtable, texreg, memisc, outreg, and others – cater to LaTeX users. If you’re using Word to prepare your documents, stargazer is an option that can work nicely. One thing we might want to do from the above is create a table of descriptive statistics.
vars <- c("polity2","physint","conflictonlocation","lnpop")
path <- getwd()
stargazer(as.data.frame(out2[,vars]), type = "html",
          title = "Descriptive Statistics",
          covariate.labels = c("Democracy","Human Rights","Any Conflict?",
                               "Population (Logged)"),
          out = paste0(path,"/summary1.doc"))
As you might have gathered from the code above, what we are going to do is use Word’s ability to render HTML by saving the output with a .doc extension.
Otherwise, the table looks like this:
| Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
|---|---|---|---|---|---|---|---|
| Democracy | 4,380 | 1.746 | 7.203 | -10.000 | -6.000 | 9.000 | 10.000 |
| Human Rights | 4,487 | 4.838 | 2.322 | 0.000 | 3.000 | 7.000 | 8.000 |
| Any Conflict? | 5,188 | 0.171 | 0.377 | 0 | 0 | 0 | 1 |
| Population (Logged) | 4,457 | 15.674 | 1.925 | 10.615 | 14.671 | 16.902 | 21.000 |
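If you typeset in LaTeX rather than Word, the same call works with the type and extension switched (type = "latex" is in fact stargazer’s default):
stargazer(as.data.frame(out2[,vars]), type = "latex",
          title = "Descriptive Statistics",
          covariate.labels = c("Democracy","Human Rights","Any Conflict?",
                               "Population (Logged)"),
          out = paste0(path,"/summary1.tex"))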
If you’re interested in what other summary statistics are available natively from stargazer, check out the omit.summary.stat entry in its documentation:
?stargazer
Something a bit more interesting would be to take a look at regression tables. First, let’s run some models. To replicate the Peterson (2017) results, we need to define a function for leading a variable, i.e., shifting next period’s value onto the current row within each country:
tscslead <- function(x, cs, ts){
  obs <- 1:length(x)  # row positions (not actually used below)
  # for each (unit, time) pair, find the row for the same unit at time + 1
  lagobs <- match(paste(cs, ts+1, sep="::"), paste(cs, ts, sep="::"))
  lagx <- x[lagobs]   # pull next period's value; NA where no such row exists
  lagx
}
The next thing that we will do is create the dependent variable with a one-period lead:
out2$physint_lead <- tscslead(out2$physint, out2$ccode, out2$year)
And then we will replicate models 1-3
m1 <- lm(physint_lead ~ twoway +lib_HK + polity2 + conflictonlocation
+ lngdppc + gdpgrowth + lnpop + year + physint, data = out2)
m2 <- lm(physint_lead ~ twoway + expdep + polity2 + conflictonlocation
+ lngdppc + gdpgrowth + lnpop + year + physint, data = out2)
m3 <- lm(physint_lead ~ twoway + expdep +lib_HK + polity2 + conflictonlocation
+ lngdppc + gdpgrowth + lnpop + year + physint, data = out2)
stargazer(m1, m2, m3, type = "html", out = paste0(path,"/regressions.doc"),
          covariate.labels = c("Export Diversification","Liberalization",
                               "Export Dependence","Democracy","Conflict",
                               "Log Income","Growth","Log Population",
                               "Linear Time Trend","Lagged DV"),
          title = "Replication of Peterson (2017) Models 1-3",
          dep.var.caption = "All States",
          dep.var.labels = "Human Rights")
The output looks like this:
| | All States | | |
|---|---|---|---|
| | Human Rights | | |
| | (1) | (2) | (3) |
| Export Diversification | 0.276** | 0.242** | 0.226** |
| | (0.114) | (0.113) | (0.114) |
| Liberalization | -10.638 | | -57.858*** |
| | (8.896) | | (13.994) |
| Export Dependence | | 0.202** | 0.707*** |
| | | (0.101) | (0.162) |
| Democracy | 0.026*** | 0.026*** | 0.027*** |
| | (0.004) | (0.004) | (0.004) |
| Conflict | -0.583*** | -0.555*** | -0.611*** |
| | (0.062) | (0.061) | (0.062) |
| Log Income | 0.126*** | 0.118*** | 0.131*** |
| | (0.023) | (0.022) | (0.023) |
| Growth | -0.154 | -0.207 | -0.140 |
| | (0.193) | (0.191) | (0.192) |
| Log Population | -0.190*** | -0.174*** | -0.226*** |
| | (0.020) | (0.018) | (0.022) |
| Linear Time Trend | -0.014*** | -0.014*** | -0.016*** |
| | (0.002) | (0.002) | (0.002) |
| Lagged DV | 0.675*** | 0.679*** | 0.666*** |
| | (0.013) | (0.012) | (0.013) |
| Constant | 31.817*** | 31.360*** | 35.327*** |
| | (4.914) | (4.876) | (4.967) |
| Observations | 3,479 | 3,562 | 3,479 |
| R2 | 0.751 | 0.754 | 0.753 |
| Adjusted R2 | 0.751 | 0.753 | 0.752 |
| Residual Std. Error | 1.118 (df = 3469) | 1.121 (df = 3552) | 1.115 (df = 3468) |
| F Statistic | 1,164.520*** (df = 9; 3469) | 1,208.221*** (df = 9; 3552) | 1,055.422*** (df = 10; 3468) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
Stargazer has a number of useful “style” options which format tables to closely match a number of major journals, and these may be worth playing around with.
The best way to learn R is to use R. Let’s play around with some of the above concepts and coding approaches to replicate a number of other aspects of the paper and ‘probe robustness’ to deviations in specification.
Figure 1 of Peterson (2017) shows two histograms of export diversity (twoway), one for all states and the other excluding high-income states (those with a GDP per capita larger than 11101.2). Reproduce this image as closely as possible; this will take digging into the documentation or Googling to accomplish particular customizations! The final output should look something like this:
sub <- out2[which(out2$gdppc < 11101.2),]
out2$all_states <- "All States"
sub$exc <- "Excluding high-income states"
out2 %>% ggplot(aes(x=twoway)) +
geom_histogram(binwidth=0.03,col="black") +
theme_bw() +
theme(plot.title = element_text(hjust=0.5),
strip.background = element_rect(fill="grey"),
strip.text = element_text(size=13)) +
xlab("Export diversity") +
ylab("Frequency") +
facet_grid(.~all_states)-> f1
sub %>% ggplot(aes(x=twoway)) +
geom_histogram(binwidth=0.03,col="black") +
theme_bw() +
theme(plot.title = element_text(hjust=0.5),
strip.background = element_rect(fill="grey"),
strip.text = element_text(size=13)) +
xlab("Export diversity") +
ylab("Frequency") +
facet_grid(.~exc) -> f2
jpeg(paste0(path,"/Exercise1.jpg"))
grid.arrange(f1,f2,ncol=2,nrow=1,
bottom = textGrob("Figure 1. Histogram of export diversity for all states (left) and excluding high-income states (right).", gp=gpar(fontsize=10)))
grid.rect(width = 1, height = 0.96, vjust=0.485 , gp = gpar(lwd = 2, col = "black", fill = NA))
dev.off()
Models 4-6 use the same specifications as above but exclude high-income states. Run these regressions and format the results to get the following:
| | Excluding high income states | | |
|---|---|---|---|
| | Human Rights | | |
| | (4) | (5) | (6) |
| Export Diversification | 0.472*** | 0.471*** | 0.450*** |
| | (0.141) | (0.139) | (0.141) |
| Liberalization | -19.891* | | -44.903** |
| | (11.636) | | (20.707) |
| Export Dependence | | -0.077 | 0.333 |
| | | (0.126) | (0.228) |
| Democracy | 0.024*** | 0.023*** | 0.025*** |
| | (0.004) | (0.004) | (0.004) |
| Conflict | -0.598*** | -0.578*** | -0.597*** |
| | (0.072) | (0.071) | (0.072) |
| Log Income | -0.030 | -0.038 | -0.017 |
| | (0.034) | (0.034) | (0.035) |
| Growth | -0.066 | -0.141 | -0.060 |
| | (0.232) | (0.229) | (0.232) |
| Log Population | -0.290*** | -0.273*** | -0.307*** |
| | (0.027) | (0.024) | (0.029) |
| Linear Time Trend | -0.015*** | -0.014*** | -0.015*** |
| | (0.003) | (0.003) | (0.003) |
| Lagged DV | 0.622*** | 0.628*** | 0.621*** |
| | (0.016) | (0.016) | (0.016) |
| Constant | 36.093*** | 34.344*** | 37.349*** |
| | (6.419) | (6.352) | (6.475) |
| Observations | 2,488 | 2,550 | 2,488 |
| R2 | 0.682 | 0.682 | 0.683 |
| Adjusted R2 | 0.681 | 0.681 | 0.681 |
| Residual Std. Error | 1.190 (df = 2478) | 1.196 (df = 2540) | 1.190 (df = 2477) |
| F Statistic | 591.627*** (df = 9; 2478) | 605.829*** (df = 9; 2540) | 532.921*** (df = 10; 2477) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
m4 <- lm(physint_lead ~ twoway + lib_HK + polity2 + conflictonlocation
         + lngdppc + gdpgrowth + lnpop + year + physint, data = sub)
m5 <- lm(physint_lead ~ twoway + expdep + polity2 + conflictonlocation
         + lngdppc + gdpgrowth + lnpop + year + physint, data = sub)
m6 <- lm(physint_lead ~ twoway + expdep + lib_HK + polity2 + conflictonlocation
         + lngdppc + gdpgrowth + lnpop + year + physint, data = sub)
stargazer(m4, m5, m6, type = "html", out = paste0(path,"/regressions2.doc"),
          covariate.labels = c("Export Diversification","Liberalization",
                               "Export Dependence","Democracy","Conflict",
                               "Log Income","Growth","Log Population",
                               "Linear Time Trend","Lagged DV"),
          column.labels = c("(4)","(5)","(6)"),
          model.numbers = F,
          title = "Replication of Peterson (2017) Models 4-6",
          dep.var.caption = "Excluding high income states",
          dep.var.labels = "Human Rights")
One might wonder whether the results are robust to the inclusion of region or state fixed effects. These can be entered by adding, for example, as.factor(ccode) to the regression formula. Fixed effects are generally omitted from regression tables; figure out how to exclude regressors from stargazer output and add an additional note to the table indicating that fixed effects were included. You should also remove the R-squared, Residual Standard Error, and F Statistic from being reported. The output should look something like this:
| | All States | | |
|---|---|---|---|
| | Human Rights | | |
| | (1a) | (2a) | (3a) |
| Export Diversification | 0.190 | 0.220 | 0.248 |
| | (0.259) | (0.257) | (0.260) |
| Liberalization | 21.905 | | -9.210 |
| | (17.509) | | (20.824) |
| Export Dependence | | 0.587*** | 0.633*** |
| | | (0.191) | (0.230) |
| Democracy | 0.037*** | 0.039*** | 0.039*** |
| | (0.006) | (0.006) | (0.006) |
| Conflict | -0.520*** | -0.495*** | -0.508*** |
| | (0.077) | (0.075) | (0.077) |
| Log Income | 0.029 | 0.086 | 0.077 |
| | (0.091) | (0.090) | (0.092) |
| Growth | 0.016 | -0.039 | 0.032 |
| | (0.188) | (0.186) | (0.188) |
| Log Population | -0.324 | -0.244 | -0.267 |
| | (0.237) | (0.234) | (0.238) |
| Linear Time Trend | -0.017*** | -0.022*** | -0.022*** |
| | (0.005) | (0.006) | (0.006) |
| Lagged DV | | | |
| Constant | 44.060*** | 50.103*** | 50.438*** |
| | (7.467) | (7.665) | (7.811) |
| Observations | 3,479 | 3,562 | 3,479 |
| Adjusted R2 | 0.775 | 0.779 | 0.776 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
| | All models include country fixed effects. | | |
m7 <- lm(physint_lead ~ twoway +lib_HK + polity2 + conflictonlocation
+ lngdppc + gdpgrowth + lnpop + year + physint + as.factor(ccode), data = out2)
m8 <- lm(physint_lead ~ twoway + expdep + polity2 + conflictonlocation
+ lngdppc + gdpgrowth + lnpop + year + physint + as.factor(ccode), data = out2)
m9 <- lm(physint_lead ~ twoway + expdep +lib_HK + polity2 + conflictonlocation
+ lngdppc + gdpgrowth + lnpop + year + physint + as.factor(ccode), data = out2)
stargazer(m7, m8, m9, type = "html", out = paste0(path,"/regressions3.doc"),
          covariate.labels = c("Export Diversification","Liberalization",
                               "Export Dependence","Democracy","Conflict",
                               "Log Income","Growth","Log Population",
                               "Linear Time Trend","Lagged DV"),
          title = "Robustness of Peterson (2017) Models 1-3",
          omit = 11:length(m9$coefficients),
          notes = "All models include country fixed effects.",
          notes.append = TRUE,
          column.labels = c("(1a)","(2a)","(3a)"),
          model.numbers = F,
          dep.var.caption = "All States",
          dep.var.labels = "Human Rights",
          omit.stat = c("f","rsq","ser"))
That’s all I have for you today. Hope it was helpful! Feel free to reach out to NYU Data Services if you encounter any issues with programming in R or would like to be directed to additional resources. Over the session we have only scratched the surface of what R is able to do thanks to the vibrant community of scholars and practitioners who contribute to the project. If you are looking to get further into R for statistical analysis, I highly recommend checking out what is available on the various taskviews or even these neat swirl courses where you can learn R in R. There is also a great R for Data Science book, which is free!