This handout introduces and extends estimation procedures for panel data. Note that the Bailey “Computing Corners” will still be useful.
library(foreign)
library(stargazer)
library(plm) ## if you don't have this, install.packages("plm")
data <- read.csv("qog_std_ts_jan16.csv")
# This is a comparatively huge file. It will take
# a moment to run. You might actually consider
# shutting down unnecessary programs. But I
# want you to get used to using the QoGPanel.dta
# file since it will be important to you very soon.
This is a huge dataset. It is an (old) version of the Quality of Government Time-Series dataset, which is a compilation of many different data sets; see more at http://qog.pol.gu.se/data/datadownloads/qogstandarddata. To see how huge, run the names() command:
names(data) # variable names
We begin with an example of something that we know won’t work: a simple, pooled cross-sectional analysis. We fit a standard OLS regression in which we regress an indicator of military expenditure (wdi_expmilgdp, military expenditure as percent of GDP) on real per capita GDP, Economic Globalization (an index created by a social scientist named Axel Dreher), and regime type (Polity score).
out.plain <- lm(wdi_expmilgdp~I(wdi_gdppcpppcon/1000)+dr_eg+p_polity2,data=data)
# The I() operator lets us make transformations to the data
# directly from the regression line. Here, I'm rescaling
# Real Per Capita GDP because the coefficients are easier to interpret
# when this variable is in thousands than in single dollars
# Note wdi_gdppcpppcon, which is the World Development Indicator's
# GDP Per Capita
# at Purchasing Power Parity
# in Constant (2011) dollars...
# which makes for an ugly but precise variable name
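To see what the I() rescaling does and does not change, here is a minimal sketch on simulated data. The variable names (gdp_dollars, y) and the numbers are made up for illustration and are not QoG variables. Dividing the regressor by 1000 should multiply its coefficient by 1000 and leave the fit itself untouched.

```r
# Simulated data (hypothetical; not the QoG variables)
set.seed(1)
gdp_dollars <- runif(200, min = 500, max = 50000)  # "income" in single dollars
y <- 2 + 0.00003 * gdp_dollars + rnorm(200, sd = 0.5)

# Same regression, two different units for the regressor
fit_dollars   <- lm(y ~ gdp_dollars)
fit_thousands <- lm(y ~ I(gdp_dollars / 1000))

b_dollars   <- unname(coef(fit_dollars)[2])
b_thousands <- unname(coef(fit_thousands)[2])

# The slope per 1000 dollars is 1000 times the slope per dollar
print(c(per_dollar = b_dollars, per_thousand = b_thousands))
print(all.equal(b_thousands, 1000 * b_dollars))

# The fitted values are the same in both versions
print(all.equal(unname(fitted(fit_dollars)), unname(fitted(fit_thousands))))
```

Only the unit of the coefficient changes; the rescaled version is simply easier to read in a table.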
This is just for illustration, though. Since we know that these are panel data (look at the data using something like head(data[,1:10]) to see for yourself!), we know that we should begin thinking about how that knowledge changes our modeling strategy. For instance, we can use the least-squares dummy variable (LSDV) method, which adds a dummy variable for each country:
out.lsdv <- lm(wdi_expmilgdp~I(wdi_gdppcpppcon/1000)+dr_eg+p_polity2+as.factor(cname),data=data)
(Why these variables? We might think that military expenditure is affected by societies’ wealth, regime type, and integration into the global economy.)
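In equation form, the dummy-variable specification can be sketched as follows. The notation is assumed here for illustration: \(i\) indexes countries, \(t\) indexes years, \(D_{j,i}\) equals 1 when \(i = j\) and 0 otherwise, and one country serves as the baseline so that the dummies are not collinear with the constant.

```latex
\[
\text{milexp}_{it} = \beta_0
  + \beta_1\,\text{gdppc}_{it}
  + \beta_2\,\text{econglob}_{it}
  + \beta_3\,\text{polity}_{it}
  + \sum_{j=2}^{N} \delta_j D_{j,i}
  + \varepsilon_{it}
\]
```

Dropping the sum over \(\delta_j D_{j,i}\) gives the pooled model, which forces every country to share the single intercept \(\beta_0\).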
And now compare them:
stargazer(out.plain,out.lsdv,
type="html",
column.labels = c("Plain","With Fixed Effects"),title="Comparing Pooled and Fixed-Effects",
omit="as.factor",
covariate.labels = c("Per Capita GDP in 1000s", "Economic Globalization", "Polity 2 Score"),
notes=c("Fixed effects estimated but not shown in Fixed Effects column"),
add.lines = list(c("Fixed effects?", "No", "Yes")),
dep.var.labels = "Military Expenditure"
)
| Dependent variable: Military Expenditure | Plain (1) | With Fixed Effects (2) |
| --- | --- | --- |
| Per Capita GDP in 1000s | 0.027*** | -0.034*** |
| | (0.004) | (0.012) |
| Economic Globalization | -0.007* | -0.033*** |
| | (0.004) | (0.006) |
| Polity 2 Score | -0.101*** | -0.024 |
| | (0.009) | (0.017) |
| Constant | 2.664*** | 3.702*** |
| | (0.176) | (0.502) |
| Fixed effects? | No | Yes |
| Observations | 2,806 | 2,806 |
| R² | 0.077 | 0.427 |
| Adjusted R² | 0.076 | 0.396 |
| Residual Std. Error | 2.580 (df = 2802) | 2.086 (df = 2665) |
| F Statistic | 78.432*** (df = 3; 2802) | 14.161*** (df = 140; 2665) |

Note: *p<0.1; **p<0.05; ***p<0.01. Fixed effects estimated but not shown in Fixed Effects column.
We suspect that time, and not just country, might be a source of unexplained variation. Accordingly, we seek to use two-way fixed effects, including controls for both year and country.
This is easy.
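Written out, a two-way specification adds a year intercept alongside the country intercept. The symbols are assumed notation for illustration: \(\alpha_i\) is the country effect, \(\gamma_t\) is the year effect, and \(x_{1,it}\), \(x_{2,it}\), \(x_{3,it}\) stand for GDP per capita, economic globalization, and the Polity score.

```latex
\[
y_{it} = \alpha_i + \gamma_t
  + \beta_1 x_{1,it} + \beta_2 x_{2,it} + \beta_3 x_{3,it}
  + \varepsilon_{it}
\]
```

Setting \(\gamma_t = 0\) for every year recovers the one-way (country only) fixed-effects model.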
We begin by replicating the one-way fixed effects from earlier to verify that plm and LSDV are interchangeable. We then proceed to add the two-way model.
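# as.factor(cname) should be redundant in a within model: country dummies do not
# vary within a country, so the within transformation leaves them nothing to explain
out.plm1way <- plm(wdi_expmilgdp~I(wdi_gdppcpppcon/1000)+dr_eg+p_polity2+as.factor(cname),data=data,
index=c("ccode"),model="within")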
## These series are constants and have been removed: version, arda_isnatpct, cpds_lmo, p_sf, scip_ameantst, vi_ext, vi_nmw, vi_rag, vi_ram, vi_rcbg, vi_rcbm, vi_rsg, vi_rsm, wdi_ebrdpngnfl
# Note that ccode is a country code -- there are many, I'm just using this for convenience
out.plm2way <- plm(wdi_expmilgdp~I(wdi_gdppcpppcon/1000)+dr_eg+p_polity2+as.factor(cname),data=data,
index=c("ccode","year"),model="within",effect="twoways")
## These series are constants and have been removed: version, arda_isnatpct, cpds_lmo, p_sf, scip_ameantst, vi_ext, vi_nmw, vi_rag, vi_ram, vi_rcbg, vi_rcbm, vi_rsg, vi_rsm, wdi_ebrdpngnfl
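The equivalence between the dummy-variable regression and the de-meaned ("within") regression can be checked on simulated data without plm. This is a minimal sketch in base R; the names unit, x, and y are hypothetical and are not QoG columns. It subtracts each unit's mean from x and y by hand and compares the resulting slope with the one from a regression that includes a dummy for every unit.

```r
# Simulated panel (hypothetical): 30 units observed for 10 periods each
set.seed(42)
n_units   <- 30
n_periods <- 10
unit  <- rep(seq_len(n_units), each = n_periods)
alpha <- rnorm(n_units, sd = 2)[unit]            # unit-specific intercepts
x     <- 0.5 * alpha + rnorm(n_units * n_periods) # x is correlated with alpha
y     <- 1 + alpha - 0.3 * x + rnorm(n_units * n_periods)

# LSDV: one dummy per unit
fit_lsdv <- lm(y ~ x + factor(unit))

# Within: subtract each unit's mean, then regress without a constant
x_dm <- x - ave(x, unit)
y_dm <- y - ave(y, unit)
fit_within <- lm(y_dm ~ x_dm - 1)

b_lsdv   <- unname(coef(fit_lsdv)["x"])
b_within <- unname(coef(fit_within)["x_dm"])
print(c(lsdv = b_lsdv, within = b_within))

# R-squared differs even though the slope does not
r2_lsdv   <- summary(fit_lsdv)$r.squared
r2_within <- summary(fit_within)$r.squared
print(c(r2_lsdv = r2_lsdv, r2_within = r2_within))
```

The slopes agree to numerical precision, but the R² values do not: the dummy regression gets credit for variation explained by the unit intercepts, while the de-meaned regression only counts within-unit variation. The standard errors from this naive de-meaned lm() are also too small, because lm() does not know that the unit means were estimated; plm corrects the degrees of freedom.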
And now we report the results:
stargazer(out.plain,out.lsdv,out.plm1way,out.plm2way,
type="html",
column.labels = c("Plain","LSDV","De-Meaned","Two Ways"),
title="Comparing Pooled and Fixed-Effects",
omit="as.factor",
covariate.labels = c("Per Capita GDP in 1000s", "Economic Globalization", "Polity 2 Score"),
notes=c("Fixed effects estimated but not shown in Fixed Effects column"),
add.lines = list(c("Fixed effects?", "No", "Country","Country","Country and Year")), # Note how this has changed!
dep.var.labels = "Military Expenditure"
)
| Dependent variable: Military Expenditure | Plain (1) | LSDV (2) | De-Meaned (3) | Two Ways (4) |
| --- | --- | --- | --- | --- |
| Model type | OLS | OLS | panel linear | panel linear |
| Per Capita GDP in 1000s | 0.027*** | -0.034*** | -0.034*** | -0.012 |
| | (0.004) | (0.012) | (0.012) | (0.013) |
| Economic Globalization | -0.007* | -0.033*** | -0.033*** | -0.015* |
| | (0.004) | (0.006) | (0.006) | (0.008) |
| Polity 2 Score | -0.101*** | -0.024 | -0.024 | -0.003 |
| | (0.009) | (0.017) | (0.017) | (0.017) |
| Constant | 2.664*** | 3.702*** | | |
| | (0.176) | (0.502) | | |
| Fixed effects? | No | Country | Country | Country and Year |
| Observations | 2,806 | 2,806 | 2,806 | 2,806 |
| R² | 0.077 | 0.427 | 0.024 | 0.002 |
| Adjusted R² | 0.076 | 0.396 | -0.027 | -0.060 |
| Residual Std. Error | 2.580 (df = 2802) | 2.086 (df = 2665) | | |
| F Statistic | 78.432*** (df = 3; 2802) | 14.161*** (df = 140; 2665) | 22.151*** (df = 3; 2665) | 1.442 (df = 3; 2643) |

Note: *p<0.1; **p<0.05; ***p<0.01. Fixed effects estimated but not shown in Fixed Effects column.
In this very stripped-down model, you can see how our substantive interpretation of the data changes radically based on our modeling choices. In the pooled model, higher GDP per capita is associated with higher military spending; once we control for country-specific effects, the association turns negative; and once we also control for period effects, it is no longer statistically significant (although our estimate of \(\beta_1\) remains negative). Indeed, the only thing that holds up across these specifications is that more economic globalization goes with lower military expenditure, and even that estimate is significant only at the 10 percent level in the pooled and two-way models. To be even more skeptical, I suspect part of that is driven by EU countries, which are highly “globalized” and have lower military expenditure as a percent of GDP than other countries.
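A small simulation shows how a pooled slope and a fixed-effects slope can disagree in sign. This is a sketch under made-up assumptions, not a model of the QoG data: each hypothetical unit has a persistent trait that raises both x and y, while within a unit, higher x lowers y.

```r
# Simulated panel (hypothetical): 40 units observed for 15 periods each
set.seed(7)
n_units   <- 40
n_periods <- 15
unit  <- rep(seq_len(n_units), each = n_periods)
trait <- rnorm(n_units, sd = 3)[unit]  # persistent unit-level trait

# The trait pushes x and y up together; within a unit, x pushes y down
x <- trait + rnorm(n_units * n_periods)
y <- 2 * trait - 0.5 * x + rnorm(n_units * n_periods)

# Pooled OLS ignores the unit structure
b_pooled <- unname(coef(lm(y ~ x))["x"])

# Unit dummies absorb the trait
b_fe <- unname(coef(lm(y ~ x + factor(unit)))["x"])

print(c(pooled = b_pooled, fixed_effects = b_fe))
```

The pooled slope comes out positive because it mostly compares units with different traits; the fixed-effects slope comes out negative because it only compares each unit with itself over time.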
Please note: this is a toy analysis, not one that you should use as dispositive (“I learned in class that regime type only weakly affects military spending!”). A proper analysis would take way more time and effort to overcome problems that, by now, you should recognize as second nature. But the fact that we asked the same question with three different methods and got three different answers should be terrifying to you! It’s super easy to trick yourself into thinking that you have the right answer … when you might just have the wrong method.
Over the next term, we will begin to think about what modeling strategy is most appropriate. But these results should force you to think hard about appropriate choices.