Running Multiple Regressions at Once

The goal here is to show how to run many regressions at once, which allows for very fast comparisons. We analyze the behavior of economic openness through time for different countries. Openness is measured as a percentage of GDP at current prices. A simple linear regression model is fit for each country, with year as the independent variable and openness as the dependent variable. The data come from Penn World Table 7.1.

First load the libraries and the data.

library(pwt)
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(pander)

The code is written as a sequence of dplyr pipes, but could be broken down into intermediate tibbles if desired. The “trick” is combining tidyr’s nest and purrr’s map functions. nest performs an operation similar to grouping: it stores each country’s rows as a data frame inside a list-column. map then applies a function, here a linear regression, to each element of that list-column. broom’s tidy function extracts the coefficients from each regression, which allows the creation of a data frame with the intercept and slope for each country.

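open_coefs <- pwt7.1 %>%
  select(country, year, openc) %>%
  na.omit() %>%
  # one row per country; that country's rows go into the list-column `data`
  nest(data = -country) %>%
  # fit openc ~ year for each country, then extract the coefficient table
  mutate(model = map(data, ~ lm(openc ~ year, data = .x)),
         tidied = map(model, tidy)) %>%
  # drop the other list-columns so the result prints as a flat table
  select(country, tidied) %>%
  unnest(tidied) %>%
  filter(term == "year") %>%
  # Holm correction (the p.adjust default) across all country slopes
  mutate(adjusted = p.adjust(p.value)) %>%
  filter(adjusted < 0.05) %>%
  arrange(desc(estimate))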

The result of this operation is a table of slope coefficients, one for each country that passes the significance filter, sorted from the largest slope to the smallest. Note that only significant coefficients are shown: the p-values are corrected for multiple hypothesis testing with the p.adjust function (whose default method is Holm’s), and only slopes with an adjusted p-value below 0.05 are kept. The first 10 rows of this table are:

pander(head(open_coefs, 10))

country	term	estimate	std.error	statistic	p.value	adjusted
Hong Kong	year	4.928	0.3353	14.7	1.366e-19	2.144e-17
Slovak Republic	year	4.584	0.4834	9.483	3.145e-09	3.428e-07
Kyrgyzstan	year	4.06	0.6464	6.281	1.096e-05	0.0008876
Cambodia	year	3.647	0.3147	11.59	3.361e-14	4.739e-12
Malaysia	year	2.896	0.1732	16.72	5.238e-23	8.8e-21
Lesotho	year	2.715	0.1856	14.63	1.631e-19	2.544e-17
Georgia	year	2.674	0.3985	6.711	5.005e-06	0.0004355
Singapore	year	2.656	0.3627	7.323	2.099e-09	2.33e-07
Vietnam	year	2.626	0.2462	10.67	3.972e-13	5.283e-11
Serbia	year	2.513	0.4143	6.066	7.802e-06	0.0006476
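
To make the multiple-testing correction above more concrete, here is a minimal sketch of what p.adjust does with its default method. The three p-values are made up for illustration and do not come from the openness data.

```r
# Hypothetical raw p-values from three separate tests (not from pwt7.1)
p_raw <- c(0.01, 0.02, 0.03)

# With no method argument, p.adjust applies the Holm correction
p_holm <- p.adjust(p_raw)

# Spelling the method out gives the same result
p_holm_explicit <- p.adjust(p_raw, method = "holm")

# Holm multiplies the smallest p-value by 3, the next by 2, the last by 1,
# and then forces the sequence to be non-decreasing
print(p_holm)
# → 0.03 0.04 0.04

# Which tests survive the 0.05 threshold after correction
print(p_holm < 0.05)
```

In this toy case all three tests remain significant, but each adjusted value is larger than its raw counterpart, which is why the filter on the adjusted column is stricter than a filter on p.value would be.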

A simple way to see whether most countries are becoming more open is to plot the coefficients.

barplot(open_coefs$estimate, ylab = "Coefficients", xlab = "Countries", main = "Openness Coefficients")

As we can see, the majority of the countries with a significant slope have coefficients above zero, meaning they became more open with time.
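
As mentioned earlier, the pipeline in this section can also be broken down into intermediate tibbles, which makes each step easier to inspect. The following is only a sketch on simulated data: the toy tibble, its three "countries", and the intermediate names (nested, models, coefs, slopes) are all assumptions made for illustration, and nest(data = -country) assumes tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Simulated stand-in for pwt7.1: three hypothetical countries with
# true slopes of 2, 0 and -1 per year, plus random noise
set.seed(1)
toy <- tibble(
  country = rep(c("A", "B", "C"), each = 20),
  year    = rep(1991:2010, times = 3),
  openc   = rep(c(2, 0, -1), each = 20) * (year - 1990) + rnorm(60)
)

# Step 1: one row per country, with its rows stored in the list-column `data`
nested <- toy %>% nest(data = -country)

# Step 2: fit one linear model per country and keep it in a list-column
models <- nested %>%
  mutate(model = map(data, ~ lm(openc ~ year, data = .x)))

# Step 3: turn each model into a coefficient table and flatten the result
coefs <- models %>%
  mutate(tidied = map(model, tidy)) %>%
  select(country, tidied) %>%
  unnest(tidied)

# Step 4: keep only the slope on year
slopes <- coefs %>% filter(term == "year")
print(slopes)
```

Printing nested or models at each step shows the list-columns directly, which can help when a model fails for a single group.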

Other Applications

The same code can be adapted for other purposes. Let’s do a quick example with multiple regression, in which per capita GDP (cgdp) is explained by openness (openc) and the government share of GDP (cg).

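multiple_growth <- pwt7.1 %>%
  select(country, openc, cg, cgdp) %>%
  na.omit() %>%
  nest(data = -country) %>%
  # regress per capita GDP on openness and government share, per country
  mutate(model = map(data, ~ lm(cgdp ~ openc + cg, data = .x)),
         tidied = map(model, tidy)) %>%
  # drop the other list-columns so the result prints as a flat table
  select(country, tidied) %>%
  unnest(tidied)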

This time we didn’t filter for a specific coefficient or keep only the significant ones. Let’s take a look at the output.

pander(head(multiple_growth, 18))

country	term	estimate	std.error	statistic	p.value
Afghanistan	(Intercept)	255	77.66	3.283	0.00221
Afghanistan	openc	-5.025	1.093	-4.597	4.632e-05
Afghanistan	cg	69.99	10.29	6.803	4.549e-08
Albania	(Intercept)	615.3	768.7	0.8004	0.4284
Albania	openc	80.04	12.53	6.39	1.661e-07
Albania	cg	-254	95.63	-2.656	0.0115
Algeria	(Intercept)	3775	1943	1.942	0.05799
Algeria	openc	24.93	17.73	1.406	0.1661
Algeria	cg	-169.4	118	-1.435	0.1578
Angola	(Intercept)	5213	886	5.884	8.214e-07
Angola	openc	0.4503	4.256	0.1058	0.9163
Angola	cg	-99.51	16.32	-6.097	4.185e-07
Antigua and Barbuda	(Intercept)	-44051	10384	-4.242	0.0001368
Antigua and Barbuda	openc	40.93	26.06	1.57	0.1246
Antigua and Barbuda	cg	1225	237	5.169	7.822e-06
Argentina	(Intercept)	9681	1165	8.309	1.863e-11
Argentina	openc	155.5	22.09	7.039	2.517e-09
Argentina	cg	-1040	105.2	-9.888	4.707e-14

We won’t go into interpreting any of the results, since the idea was just to show how powerful this code can be in providing quick comparisons.
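
For side-by-side comparisons, a long coefficient table like the one above can be reshaped so that each group occupies a single row. The sketch below is one possible way to do this and is not part of the original analysis: it uses the built-in mtcars data as a stand-in for pwt7.1 (groups defined by cyl instead of country), and it assumes tidyr 1.0 or later for nest(data = ...) and pivot_wider.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Stand-in example: one multiple regression per cylinder group in mtcars
wide_coefs <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, ~ lm(mpg ~ wt + hp, data = .x)),
         tidied = map(model, tidy)) %>%
  select(cyl, tidied) %>%
  unnest(tidied) %>%
  # keep only the point estimates, then spread the terms into columns
  select(cyl, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate)

# One row per group, one column per coefficient
print(wide_coefs)
```

The same reshaping step could be appended to the multiple_growth pipeline, with country in place of cyl, to compare the openc and cg coefficients across countries at a glance.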


Analyzing Economic Openness Through Time

Marcelo Bohrer
