The goal here is to show how to run many regressions at once, allowing for very fast comparisons. We analyze the behavior of economic openness through time for different countries. Openness is measured as a percentage of GDP at current prices. A simple linear regression model is fit for each country, with year as the independent variable and openness as the dependent variable. The data come from Penn World Table 7.1.
First load the libraries and the data.
library(pwt)
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(pander)
The code is written as a sequence of dplyr pipes, but could be broken down into intermediate tibbles if desired. The “trick” is combining tidyr’s nest and purrr’s map functions. nest performs an operation similar to grouping, but stores each country’s rows as a data frame inside a list-column; map then applies a function, namely linear regression, to each element of that list. Broom’s tidy function extracts the coefficients from each regression, which allows the creation of a data frame with intercepts and slopes for each country.
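To make the nest, map and tidy steps concrete, here is a minimal sketch that breaks the pipe into intermediate tibbles. It uses R's built-in mtcars data rather than the Penn World Table, so the grouping variable (cyl) and the model (mpg ~ wt) are stand-ins chosen only for illustration, and it assumes tidyr 1.0 or later for the nest(data = ...) syntax.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Step 1: one row per group; the remaining columns are packed into `data`
nested <- mtcars %>%
  select(cyl, mpg, wt) %>%
  nest(data = -cyl)

# Step 2: fit one linear model per nested data frame
fitted <- nested %>%
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)))

# Step 3: tidy each model, keep only the key and the tidied output,
# then unnest into one row per group and coefficient
coefs <- fitted %>%
  mutate(tidied = map(model, tidy)) %>%
  select(cyl, tidied) %>%
  unnest(tidied)

print(coefs)
```

Inspecting nested and fitted at each step shows the list-columns directly, which can help when a model fails for one particular group.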
open_coefs <- pwt7.1 %>%
  select(country, year, openc) %>%
  na.omit() %>%
  nest(data = -country) %>%  # tidyr >= 1.0 syntax; older versions used nest(-country)
  mutate(model = map(data, ~ lm(openc ~ year, data = .x)),
         tidied = map(model, tidy)) %>%
  select(country, tidied) %>%  # drop the data and model list-columns before unnesting
  unnest(tidied) %>%
  filter(term == "year") %>%
  mutate(adjusted = p.adjust(p.value)) %>%
  filter(adjusted < 0.05) %>%
  arrange(desc(estimate))
The result of this operation is a table of slope coefficients, one for each country that passes the significance filter, sorted in descending order of the estimate. Only significant coefficients are shown: the p.adjust function corrects the p-values for multiple hypothesis testing (using Holm's method, its default), and slopes with an adjusted p-value of 0.05 or more are dropped. The first 10 rows of this table are:
pander(head(open_coefs, 10))
| country | term | estimate | std.error | statistic | p.value | adjusted |
|---|---|---|---|---|---|---|
| Hong Kong | year | 4.928 | 0.3353 | 14.7 | 1.366e-19 | 2.144e-17 |
| Slovak Republic | year | 4.584 | 0.4834 | 9.483 | 3.145e-09 | 3.428e-07 |
| Kyrgyzstan | year | 4.06 | 0.6464 | 6.281 | 1.096e-05 | 0.0008876 |
| Cambodia | year | 3.647 | 0.3147 | 11.59 | 3.361e-14 | 4.739e-12 |
| Malaysia | year | 2.896 | 0.1732 | 16.72 | 5.238e-23 | 8.8e-21 |
| Lesotho | year | 2.715 | 0.1856 | 14.63 | 1.631e-19 | 2.544e-17 |
| Georgia | year | 2.674 | 0.3985 | 6.711 | 5.005e-06 | 0.0004355 |
| Singapore | year | 2.656 | 0.3627 | 7.323 | 2.099e-09 | 2.33e-07 |
| Vietnam | year | 2.626 | 0.2462 | 10.67 | 3.972e-13 | 5.283e-11 |
| Serbia | year | 2.513 | 0.4143 | 6.066 | 7.802e-06 | 0.0006476 |
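The p-value correction used in the pipeline can be illustrated on its own. The sketch below uses three made-up p-values (they are not taken from the Penn World Table results) to show what p.adjust does with its default Holm method, next to the Benjamini-Hochberg method as one possible alternative. Because the correction depends on how many p-values are passed in, it matters that the pipeline filters down to the year term before calling p.adjust.

```r
# Three made-up p-values, purely for illustration
p_raw <- c(0.01, 0.02, 0.03)

# Holm is the default method of p.adjust(): the i-th smallest p-value is
# multiplied by (n - i + 1), then the results are made non-decreasing
p_holm <- p.adjust(p_raw)                 # same as method = "holm"

# Benjamini-Hochberg controls the false discovery rate instead
p_bh <- p.adjust(p_raw, method = "BH")

print(p_holm)  # 0.03 0.04 0.04
print(p_bh)    # 0.03 0.03 0.03
```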
A simple way of determining if more countries are becoming open is to plot the coefficients.
barplot(open_coefs$estimate, ylab = "Coefficients", xlab = "Countries", main = "Openness Coefficients")
As we can see, the majority of the countries with a significant trend have coefficients above zero, meaning they became more open over time.
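The visual impression from the bar plot can also be backed by a simple count. The sketch below uses a short vector of hypothetical slope values as a stand-in for open_coefs$estimate; the numbers are invented for illustration and are not results from the data.

```r
# Hypothetical slope estimates standing in for open_coefs$estimate
estimates <- c(4.9, 2.7, 0.8, -0.3, -1.2)

# Share of slopes above zero
share_positive <- mean(estimates > 0)

# Tally of negative (-1) and positive (1) slopes
sign_counts <- table(sign(estimates))

print(share_positive)
print(sign_counts)
```

With the real table, replacing estimates by open_coefs$estimate would give the share among countries whose slope survived the significance filter.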
The same code can be adapted to other purposes. Let’s do a quick example with multiple regression, with GDP per capita (cgdp) explained by openness (openc) and the government share of GDP (cg).
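multiple_growth <- pwt7.1 %>%
  select(country, openc, cg, cgdp) %>%
  na.omit() %>%
  nest(data = -country) %>%  # tidyr >= 1.0 syntax; older versions used nest(-country)
  mutate(model = map(data, ~ lm(cgdp ~ openc + cg, data = .x)),
         tidied = map(model, tidy)) %>%
  select(country, tidied) %>%  # drop the data and model list-columns before unnesting
  unnest(tidied)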
This time we didn’t filter for a specific coefficient or for significance. Let’s take a look at the output.
pander(head(multiple_growth, 18))
| country | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| Afghanistan | (Intercept) | 255 | 77.66 | 3.283 | 0.00221 |
| Afghanistan | openc | -5.025 | 1.093 | -4.597 | 4.632e-05 |
| Afghanistan | cg | 69.99 | 10.29 | 6.803 | 4.549e-08 |
| Albania | (Intercept) | 615.3 | 768.7 | 0.8004 | 0.4284 |
| Albania | openc | 80.04 | 12.53 | 6.39 | 1.661e-07 |
| Albania | cg | -254 | 95.63 | -2.656 | 0.0115 |
| Algeria | (Intercept) | 3775 | 1943 | 1.942 | 0.05799 |
| Algeria | openc | 24.93 | 17.73 | 1.406 | 0.1661 |
| Algeria | cg | -169.4 | 118 | -1.435 | 0.1578 |
| Angola | (Intercept) | 5213 | 886 | 5.884 | 8.214e-07 |
| Angola | openc | 0.4503 | 4.256 | 0.1058 | 0.9163 |
| Angola | cg | -99.51 | 16.32 | -6.097 | 4.185e-07 |
| Antigua and Barbuda | (Intercept) | -44051 | 10384 | -4.242 | 0.0001368 |
| Antigua and Barbuda | openc | 40.93 | 26.06 | 1.57 | 0.1246 |
| Antigua and Barbuda | cg | 1225 | 237 | 5.169 | 7.822e-06 |
| Argentina | (Intercept) | 9681 | 1165 | 8.309 | 1.863e-11 |
| Argentina | openc | 155.5 | 22.09 | 7.039 | 2.517e-09 |
| Argentina | cg | -1040 | 105.2 | -9.888 | 4.707e-14 |
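With three rows per country, the long output can be awkward to scan. One possible way to line the coefficients up side by side is to reshape the estimates into one row per group with tidyr's pivot_wider. The sketch below is an illustration on the built-in mtcars data, not part of the original analysis: cyl stands in for country, and mpg ~ wt + hp stands in for the growth model.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

wide_coefs <- mtcars %>%
  select(cyl, mpg, wt, hp) %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, ~ lm(mpg ~ wt + hp, data = .x)),
         tidied = map(model, tidy)) %>%
  select(cyl, tidied) %>%
  unnest(tidied) %>%
  # keep only the estimate, then spread the terms into columns
  select(cyl, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate)

print(wide_coefs)
```

The result has one row per group and one column per model term, which makes it easier to sort or filter on a particular coefficient.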
We won’t go into interpreting any of the results, since the idea was just to show how powerful this code can be in providing quick comparisons.