In this session I’m going to replicate some of the results from the paper “Determinants of Long-Term Growth: A Bayesian Averaging of Classical Estimates (BACE) Approach” by Xavier Sala-i-Martin, Gernot Doppelhofer and Ronald I. Miller, The American Economic Review, Vol. 94, No. 4 (Sep. 2004), pp. 813-835.
The objective of the paper is to develop a method for model specification, called “Bayesian Averaging of Classical Estimates” (BACE). In this method the researcher takes a handful of candidate variables and tries to narrow the support to the meaningful ones. Very often different researchers use different specifications to explain the same phenomenon, so one has to ask which specification is the right one. This method tries to address that issue.
The BACE model, as its name suggests, uses a Bayesian approach in order to choose the right model; i.e., it uses the notion that we have prior assumptions about the model’s important explanatory variables. Given those priors we look at the data and update our knowledge. Specifically, \(g(\beta|y) = f(y|\beta)g(\beta)/f(y)\), where \(g(\beta)\) is the prior density on \(\beta\), \(f(y|\beta)\) is the likelihood of the data given \(\beta\), \(f(y)\) is the marginal density of \(y\) and \(g(\beta|y)\) is the posterior. Intuitively, this means that the researcher has prior beliefs, but she also updates those beliefs in light of the data.
The idea that the authors introduce is that the researcher should randomize over specifications, and then see which variables remain in the model given the data and the priors. It is clear, though, that variables with strong priors will be more likely to remain in each model specification. Once we have many model specifications we estimate the coefficients for each specification and average them:
\(E(\beta|y) = \sum_{j=1}^{2^K} P(M_j|y)\hat{\beta}_j\)
where \(K\) is the total number of candidate variables, so the sum runs over all \(2^K\) possible specifications \(M_j\), and \(\hat{\beta}_j\) is the classical (OLS) estimate in the \(j\)th specification. The only quantity the researcher has to choose is \(\bar{k}\), the prior expected number of variables included in a specification.
To wrap it up: given our priors and data we can calculate the probability that each variable is included in the model. All the researcher has to do is choose \(\bar{k}\), the expected number of variables per sampled specification. The bigger \(\bar{k}\) is, the larger the sampled models tend to be. Once we have our multiple models, we average the coefficients across them.
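To be concrete, the paper approximates the posterior model weights with a Schwarz (BIC)-style formula: \(P(M_j|y) = \frac{P(M_j)\, T^{-k_j/2}\, SSE_j^{-T/2}}{\sum_{i=1}^{2^K} P(M_i)\, T^{-k_i/2}\, SSE_i^{-T/2}}\), where \(T\) is the sample size, \(k_j\) is the number of regressors in model \(j\) and \(SSE_j\) is its OLS sum of squared errors; the prior model probability implied by an expected size \(\bar{k}\) is \(P(M_j) = (\bar{k}/K)^{k_j}(1-\bar{k}/K)^{K-k_j}\).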
The paper uses the BACE method to determine which variables are important in explaining the GDP growth of 88 countries. The authors start with 67 candidate variables, from which they are going to choose the relevant ones using BACE.
In what follows I am going to reproduce some of their paper.
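Throughout I use the following packages (a setup chunk inferred from the function calls below; all of them are on CRAN):
library(readxl)     # read_xls, to load the AER data file
library(dplyr)      # filter, left_join and the %>% pipe
library(magrittr)   # the %<>% compound-assignment pipe
library(tibble)     # rownames_to_column
library(BMS)        # bms, the Bayesian model sampling routine
library(knitr)      # kable
library(kableExtra) # kable_styling, scroll_box
library(reshape2)   # melt, for the heat maps
library(ggplot2)    # geom_tile heat maps
library(hdm)        # rlassoEffect, the double-selection lasso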
First, we load the data (the file can be found on the AER website):
data <- readxl::read_xls("C:/Users/dorgo/Documents/R/ml4econ/BACE_data.xls")
Next, we will subset the data:
{
# The three vectors below just split the 88-country sample into
# manageable chunks; the names are loose labels rather than strict
# regions (e.g. Canada and several Latin American countries sit in
# the "Africa" vector). Spellings follow the AER data file.
Africa <- c("Algeria","Benin","Botswana","Burundi","Cameroon",
            "Cent'l Afr. Rep.","Congo","Egypt",
            "Ethiopia","Gabon","Gambia","Ghana","Kenya","Lesotho",
            "Liberia","Madagascar","Malawi",
            "Mauritania","Morocco","Niger","Nigeria","Rwanda","Senegal",
            "South africa","Tanzania","Togo","Tunisia",
            "Uganda","Zaire","Zambia","Zimbabwe","Canada",
            "Costa Rica","Dominican Rep.","El Salvador","Guatemala",
            "Haiti","Honduras","Jamaica")
Europe_etc <- c("Netherlands","Norway","Portugal","Spain","Sweden",
                "Turkey","United Kingdom","Australia","Fiji",
                "Papua New Guinea")
Asia_etc <- c("Mexico","Panama","Trinidad & Tobago","United States",
              "Argentina","Bolivia","Brazil","Chile","Colombia",
              "Ecuador","Paraguay","Peru","Uruguay","Venezuela","Hong Kong",
              "India","Indonesia","Israel","Japan","Jordan","Korea","Malaysia",
              "Nepal","Pakistan","Philippines","Singapore","Sri Lanka","Syria","Taiwan",
              "Thailand","Austria","Belgium","Denmark","Finland","France",
              "Germany, West","Greece","Ireland","Italy")
included <- c(Africa, Europe_etc, Asia_etc)

# Keep only the 88 countries in the paper's sample, and coerce the
# numeric columns (GR6096 plus the 67 candidate regressors).
data1 <- data %>%
  filter(COUNTRY %in% included)
data1[4:71] %<>% sapply(FUN = as.numeric)
}
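A quick sanity check on the subset (assuming every country name above matches the spelling in the data file; the paper's sample has 88 countries, and columns 4 to 71 should now all be numeric):
stopifnot(nrow(data1) == 88)                     # the paper's 88-country sample
stopifnot(all(sapply(data1[4:71], is.numeric)))  # growth + 67 regressors coerced above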
Now we will write some functions to help us replicate the paper. It is important to note that the results will not be exactly the same, due to limits on processing time (the authors used 87 million iterations; I will use only 1 million).
# Run the BMS sampler for a given prior model size and stash the
# result as an object named bms<k_size> in the global environment.
bayesian <- function(iterations, k_size){
  a <- bms(X.data = GR6096 ~ ., mcmc = "bd",        # birth/death MCMC sampler
           iter = iterations,
           g = "UIP", mprior = "fixed",             # unit-information g-prior,
           mprior.size = k_size, user.int = FALSE,  # fixed expected model size kbar
           nmodel = 10, data = data1[, 4:71])
  b <- paste("bms", k_size, sep = "")
  assign(b, a, envir = .GlobalEnv)
}
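Because the birth/death sampler is stochastic, two runs will not match exactly; fixing a seed first makes them reproducible (an optional step I am adding, not in the original workflow):
set.seed(123)  # any fixed seed; makes the MCMC draws below reproducible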
# The prior model sizes kbar examined below; note that seven runs of
# one million draws each take a while.
ks <- c(5, 7, 9, 11, 16, 22, 28)
sapply(ks, bayesian, iterations = 1000000)
# Fetch the stored bms<num> object and pull one column of its coefficient
# summary. coef() sorts rows by PIP by default, which would misalign rows
# across different kbar runs; order.by.pip = FALSE keeps the variables in
# their original order so the runs stay comparable.
extract <- function(num, colu){
  name <- paste("bms", num, sep = "")
  a <- get(name)
  b <- coef(a, order.by.pip = FALSE)[, colu]
  return(b)
}
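For example, the call below pulls the PIP column from the \(\bar{k} = 7\) run:
head(extract(7, 1))   # first few posterior inclusion probabilities from bms7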
Now, I will replicate the results from the paper.
table2 <- coef(bms7) %>% as.data.frame()                      # full summary, kbar = 7
table3 <- sapply(ks, extract, colu = 1) %>% as.data.frame()   # column 1: PIP
colnames(table3) <- paste("kbar", ks, sep = "")
table4 <- sapply(ks, extract, colu = 2) %>% as.data.frame()   # column 2: posterior mean
colnames(table4) <- paste("kbar", ks, sep = "")
table5 <- sapply(ks, extract, colu = 4) %>% as.data.frame()   # column 4: Cond.Pos.Sign
colnames(table5) <- paste("kbar", ks, sep = "")
I will only display two of the results (though the rest were computed in the chunk above).
kable(table2, "html") %>%
kable_styling(position = "center") %>%
scroll_box(width = "500px", height = "200px")
Variable | PIP | Post Mean | Post SD | Cond.Pos.Sign | Idx |
---|---|---|---|---|---|
EAST | 0.838140 | 0.0193115 | 0.0101693 | 0.9999726 | 14 |
P60 | 0.756978 | 0.0189050 | 0.0126921 | 1.0000000 | 44 |
IPRICE1 | 0.703474 | -0.0000580 | 0.0000429 | 0.0000000 | 29 |
TROPICAR | 0.551773 | -0.0080563 | 0.0079246 | 0.0004096 | 62 |
GDPCH60L | 0.494529 | -0.0039265 | 0.0044591 | 0.0005035 | 20 |
DENS65C | 0.342597 | 0.0000028 | 0.0000043 | 0.9994688 | 11 |
MALFAL66 | 0.271176 | -0.0044091 | 0.0078719 | 0.0002581 | 36 |
CONFUC | 0.206310 | 0.0115408 | 0.0247881 | 1.0000000 | 9 |
LIFE060 | 0.191369 | 0.0001490 | 0.0003441 | 0.9917228 | 34 |
LAAM | 0.169619 | -0.0021592 | 0.0053244 | 0.0149570 | 30 |
SAFRICA | 0.148902 | -0.0022143 | 0.0058705 | 0.0040362 | 55 |
SPAIN | 0.130906 | -0.0013871 | 0.0040037 | 0.0097704 | 59 |
MUSLIM00 | 0.098725 | 0.0011993 | 0.0041273 | 0.9904280 | 38 |
BUDDHA | 0.098542 | 0.0021020 | 0.0071650 | 1.0000000 | 5 |
AVELF | 0.097996 | -0.0011479 | 0.0039739 | 0.0006939 | 3 |
GVR61 | 0.092900 | -0.0040794 | 0.0145970 | 0.0038213 | 25 |
MINING | 0.086407 | 0.0032135 | 0.0118888 | 0.9999768 | 37 |
OPENDEC1 | 0.085635 | 0.0007870 | 0.0029552 | 0.9935540 | 41 |
RERD | 0.078479 | -0.0000061 | 0.0000241 | 0.0000000 | 53 |
YRSOPEN | 0.070527 | 0.0008070 | 0.0034025 | 0.9993903 | 66 |
H60 | 0.062957 | -0.0042903 | 0.0192760 | 0.0043998 | 26 |
POP1560 | 0.060308 | 0.0030890 | 0.0153386 | 0.9601545 | 48 |
GOVSH61 | 0.059808 | -0.0021945 | 0.0105906 | 0.0143794 | 24 |
OTHFRAC | 0.058741 | 0.0003834 | 0.0018024 | 0.9976167 | 43 |
DENS60 | 0.056617 | 0.0000006 | 0.0000031 | 0.9969797 | 10 |
TROPPOP | 0.046315 | -0.0004906 | 0.0026574 | 0.0066069 | 63 |
PRIGHTS | 0.046093 | -0.0000771 | 0.0004391 | 0.0362094 | 47 |
BRIT | 0.041742 | 0.0001965 | 0.0011794 | 0.9914714 | 4 |
GGCFD3 | 0.040958 | -0.0022662 | 0.0140473 | 0.0347673 | 22 |
HINDU00 | 0.035763 | 0.0005569 | 0.0037296 | 0.9705282 | 28 |
SCOUT | 0.035608 | -0.0001281 | 0.0008419 | 0.0026679 | 56 |
PRIEXP70 | 0.035540 | -0.0003277 | 0.0022927 | 0.0621272 | 51 |
GOVNOM1 | 0.035295 | -0.0012530 | 0.0081051 | 0.0069698 | 23 |
PROT00 | 0.034142 | -0.0003317 | 0.0024040 | 0.0627380 | 52 |
ABSLATIT | 0.031772 | 0.0000028 | 0.0000475 | 0.6243548 | 1 |
EUROPE | 0.030006 | 0.0000238 | 0.0016896 | 0.5544558 | 17 |
FERTLDC1 | 0.028848 | -0.0001240 | 0.0019735 | 0.3484124 | 18 |
REVCOUP | 0.028195 | -0.0001987 | 0.0015540 | 0.0069870 | 54 |
SIZE60 | 0.027492 | -0.0000305 | 0.0002830 | 0.0988651 | 57 |
CATH00 | 0.026810 | -0.0001690 | 0.0016731 | 0.1383066 | 6 |
CIV72 | 0.026782 | -0.0001796 | 0.0015182 | 0.0392428 | 7 |
COLONY | 0.025539 | -0.0001112 | 0.0010417 | 0.0929950 | 8 |
POP6560 | 0.022854 | -0.0000589 | 0.0183851 | 0.4751903 | 50 |
AIRDIST | 0.022404 | 0.0000000 | 0.0000001 | 0.3221300 | 2 |
LHCPC | 0.020237 | 0.0000046 | 0.0000666 | 0.7334091 | 33 |
DPOP6090 | 0.019800 | 0.0014847 | 0.0437023 | 0.6449495 | 13 |
POP60 | 0.019630 | 0.0000000 | 0.0000000 | 0.9422313 | 49 |
PI6090 | 0.019296 | -0.0000015 | 0.0000171 | 0.0476783 | 45 |
SQPI6090 | 0.019247 | 0.0000000 | 0.0000002 | 0.0985608 | 46 |
TOT1DEC1 | 0.018641 | 0.0005538 | 0.0074665 | 0.8701250 | 60 |
NEWSTATE | 0.018607 | 0.0000178 | 0.0002950 | 0.7736336 | 39 |
GDE1 | 0.018409 | 0.0008055 | 0.0112503 | 0.8679994 | 19 |
WARTORN | 0.018230 | -0.0000120 | 0.0004073 | 0.3302798 | 65 |
GEEREC1 | 0.018129 | 0.0017092 | 0.0269759 | 0.8450549 | 21 |
TOTIND | 0.017559 | -0.0001013 | 0.0015030 | 0.1546216 | 61 |
SOCIALIST | 0.016747 | 0.0000637 | 0.0008142 | 0.9855496 | 58 |
OIL | 0.015823 | 0.0000551 | 0.0009593 | 0.7935284 | 40 |
WARTIME | 0.015761 | -0.0000231 | 0.0011902 | 0.3749128 | 64 |
ENGFRAC | 0.015729 | -0.0000259 | 0.0009258 | 0.3936042 | 16 |
ORTH00 | 0.015568 | 0.0001036 | 0.0018590 | 0.8799460 | 42 |
LT100CR | 0.015507 | -0.0000270 | 0.0007473 | 0.3774424 | 35 |
ZTROPICS | 0.015416 | -0.0000290 | 0.0008354 | 0.3641671 | 67 |
HERF00 | 0.015220 | -0.0000642 | 0.0010526 | 0.1444810 | 27 |
ECORG | 0.015093 | -0.0000009 | 0.0001336 | 0.4694892 | 15 |
LANDLOCK | 0.014914 | -0.0000273 | 0.0005522 | 0.2572750 | 32 |
LANDAREA | 0.014498 | 0.0000000 | 0.0000000 | 0.4819285 | 31 |
DENS65I | 0.014181 | 0.0000000 | 0.0000019 | 0.3790283 | 12 |
The table above follows the second table from the paper, in which \(\bar{k}=7\). In this table we can see the posterior inclusion probability (PIP) in the first column, then the posterior mean conditional on inclusion, its standard deviation, and the certainty of the sign (the posterior probability that the coefficient is positive, conditional on inclusion). The table shows that 18 variables stand out as the most important given \(\bar{k} = 7\).
kable(table5, "html") %>%
kable_styling(position = "center") %>%
scroll_box(width = "500px", height = "200px")
Variable | kbar5 | kbar7 | kbar9 | kbar11 | kbar16 | kbar22 | kbar28 |
---|---|---|---|---|---|---|---|
EAST | 1.0000000 | 0.9999726 | 1.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
P60 | 1.0000000 | 1.0000000 | 0.9998830 | 1.0000000 | 1.0000000 | 1.0000000 | 1.0000000 |
IPRICE1 | 0.0000000 | 0.0000000 | 0.0000000 | 0.9992479 | 0.0000939 | 0.0000356 | 0.0000000 |
TROPICAR | 0.0002919 | 0.0004096 | 0.0004425 | 0.0000921 | 0.9982069 | 1.0000000 | 1.0000000 |
MALFAL66 | 0.0000359 | 0.0005035 | 0.0005803 | 0.0011109 | 0.0020900 | 0.9998971 | 0.9997700 |
GDPCH60L | 0.0014540 | 0.9994688 | 0.9998854 | 0.9995790 | 1.0000000 | 0.9921956 | 0.0006346 |
DENS65C | 1.0000000 | 0.0002581 | 1.0000000 | 1.0000000 | 0.9992396 | 0.0031843 | 0.0042482 |
LIFE060 | 0.9901516 | 1.0000000 | 0.0009440 | 0.9944940 | 0.9998980 | 0.0031432 | 0.9988453 |
SPAIN | 0.0029136 | 0.9917228 | 0.0194972 | 0.0120155 | 0.0029026 | 0.0124670 | 0.0178356 |
CONFUC | 1.0000000 | 0.0149570 | 0.9938396 | 0.0034777 | 0.0124001 | 0.0004788 | 0.9864349 |
LAAM | 0.0135675 | 0.0040362 | 0.0038540 | 0.0012028 | 0.9967089 | 0.9991885 | 0.9975094 |
SAFRICA | 0.0056695 | 0.0097704 | 0.0072230 | 0.9997827 | 0.9997824 | 0.9984524 | 0.9997626 |
GVR61 | 0.0010448 | 0.9904280 | 1.0000000 | 0.9998965 | 0.0028309 | 0.9966282 | 0.0055281 |
AVELF | 0.0002597 | 1.0000000 | 1.0000000 | 0.9953774 | 0.9975634 | 0.9994641 | 0.9935003 |
BUDDHA | 1.0000000 | 0.0006939 | 0.9878105 | 0.0070153 | 0.9965085 | 0.9966865 | 0.9934124 |
OPENDEC1 | 0.9960361 | 0.0038213 | 0.0013374 | 0.0004702 | 0.9990012 | 0.9954507 | 0.9953407 |
YRSOPEN | 1.0000000 | 0.9999768 | 0.0027741 | 0.0050941 | 0.0062613 | 0.0002133 | 0.0066676 |
MUSLIM00 | 0.9764486 | 0.9935540 | 0.9974608 | 0.9973672 | 0.0088176 | 0.0180082 | 0.0008388 |
RERD | 0.0000000 | 0.0000000 | 0.9866140 | 0.0000000 | 0.0000060 | 0.0323440 | 0.0388872 |
GOVSH61 | 0.0258568 | 0.9993903 | 0.9976769 | 0.9988442 | 0.0021382 | 0.0063637 | 0.0292408 |
H60 | 0.0014897 | 0.0043998 | 0.0000000 | 0.0253577 | 0.0303822 | 0.0126392 | 0.0020297 |
MINING | 1.0000000 | 0.9601545 | 0.9969188 | 0.0131370 | 0.0195550 | 0.9616922 | 0.9638914 |
POP1560 | 0.9432698 | 0.0143794 | 0.0236151 | 0.9783628 | 0.9596923 | 0.0026265 | 0.7919412 |
TROPPOP | 0.0088262 | 0.9976167 | 0.0136628 | 0.9957744 | 0.0140196 | 0.0344739 | 0.0392752 |
BRIT | 0.9973896 | 0.9969797 | 0.0052691 | 0.0338398 | 0.0270944 | 0.0273567 | 0.0020213 |
OTHFRAC | 0.9956865 | 0.0066069 | 0.9449499 | 0.0068680 | 0.9849362 | 0.0441570 | 0.9795539 |
PRIEXP70 | 0.0427285 | 0.0362094 | 0.0300684 | 0.9435402 | 0.9640732 | 0.0532203 | 0.0347143 |
DENS60 | 0.9945519 | 0.9914714 | 0.0103547 | 0.0115381 | 0.0181061 | 0.9470223 | 0.0425533 |
PRIGHTS | 0.0297478 | 0.0347673 | 0.0071478 | 0.9730671 | 0.0632707 | 0.0575358 | 0.9187230 |
ABSLATIT | 0.7517146 | 0.9705282 | 0.9779650 | 0.0175436 | 0.0200521 | 0.0028198 | 0.0362566 |
PROT00 | 0.0459113 | 0.0026679 | 0.9790616 | 0.9738543 | 0.0038267 | 0.7156889 | 0.0607893 |
SCOUT | 0.0021540 | 0.0621272 | 0.0018104 | 0.0026235 | 0.6966108 | 0.9663910 | 0.0719798 |
HINDU00 | 0.9767083 | 0.0069698 | 0.0582253 | 0.0041866 | 0.0035652 | 0.9583467 | 0.0486292 |
GOVNOM1 | 0.0164067 | 0.0627380 | 0.0713037 | 0.0858475 | 0.8631939 | 0.0344140 | 0.0557547 |
SIZE60 | 0.0796244 | 0.6243548 | 0.2525893 | 0.5901216 | 0.1246206 | 0.8024032 | 0.9188941 |
EUROPE | 0.5477184 | 0.5544558 | 0.0050172 | 0.0789799 | 0.9476055 | 0.0304840 | 0.8966488 |
GGCFD3 | 0.0825900 | 0.3484124 | 0.5499705 | 0.1847163 | 0.1251417 | 0.0664721 | 0.1965394 |
REVCOUP | 0.0034276 | 0.0069870 | 0.1519555 | 0.0795802 | 0.0660271 | 0.0604580 | 0.0527306 |
CIV72 | 0.0344701 | 0.0988651 | 0.5827687 | 0.0723053 | 0.0728469 | 0.8664876 | 0.0611841 |
COLONY | 0.0956391 | 0.1383066 | 0.1213194 | 0.1391926 | 0.9516164 | 0.9489659 | 0.7522167 |
POP6560 | 0.4377194 | 0.0392428 | 0.0693072 | 0.2476116 | 0.4110972 | 0.1752038 | 0.9760258 |
FERTLDC1 | 0.4170570 | 0.0929950 | 0.2894308 | 0.1605457 | 0.1431437 | 0.3507316 | 0.3227974 |
CATH00 | 0.1834077 | 0.4751903 | 0.8074614 | 0.8746640 | 0.2293214 | 0.2967356 | 0.9414636 |
PI6090 | 0.0405174 | 0.3221300 | 0.5344786 | 0.4809732 | 0.9298539 | 0.3264203 | 0.9316514 |
SQPI6090 | 0.0547602 | 0.7334091 | 0.1346959 | 0.9190784 | 0.7585217 | 0.1573904 | 0.1768022 |
DPOP6090 | 0.6793459 | 0.6449495 | 0.9185238 | 0.6099967 | 0.2374492 | 0.9817841 | 0.9226316 |
GDE1 | 0.9446998 | 0.9422313 | 0.8788833 | 0.8240765 | 0.0997153 | 0.9055678 | 0.2589900 |
WARTORN | 0.2873589 | 0.0476783 | 0.0541076 | 0.8146693 | 0.9190261 | 0.9329807 | 0.1434518 |
NEWSTATE | 0.8217165 | 0.0985608 | 0.8301538 | 0.9948881 | 0.4936576 | 0.0704008 | 0.3399390 |
AIRDIST | 0.2894000 | 0.8701250 | 0.9885421 | 0.1805732 | 0.7809978 | 0.7191525 | 0.0705414 |
SOCIALIST | 0.9987054 | 0.7736336 | 0.6312985 | 0.1144136 | 0.6902596 | 0.1999461 | 0.3786010 |
WARTIME | 0.2028565 | 0.8679994 | 0.7861844 | 0.5507275 | 0.1315756 | 0.5796536 | 0.5382243 |
POP60 | 0.9462052 | 0.3302798 | 0.3868591 | 0.7599710 | 0.6860807 | 0.3158392 | 0.6096466 |
LHCPC | 0.5310281 | 0.8450549 | 0.7983607 | 0.1226738 | 0.9842786 | 0.2503271 | 0.6091377 |
OIL | 0.7358931 | 0.1546216 | 0.3102515 | 0.7799627 | 0.6482800 | 0.5606528 | 0.6189761 |
ENGFRAC | 0.2787334 | 0.9855496 | 0.1451161 | 0.9129400 | 0.5406897 | 0.4975778 | 0.4012236 |
TOTIND | 0.1683113 | 0.7935284 | 0.1890896 | 0.1559409 | 0.2238068 | 0.5509662 | 0.6828875 |
LT100CR | 0.3741114 | 0.3749128 | 0.2888909 | 0.4978541 | 0.3350369 | 0.5440671 | 0.3420667 |
ORTH00 | 0.8885191 | 0.3936042 | 0.4633074 | 0.3466382 | 0.3672915 | 0.4329094 | 0.4591506 |
TOT1DEC1 | 0.8789141 | 0.8799460 | 0.2832606 | 0.4079328 | 0.2616272 | 0.3932785 | 0.4538845 |
GEEREC1 | 0.8475396 | 0.3774424 | 0.4545767 | 0.4846810 | 0.3064087 | 0.6499269 | 0.5344941 |
ZTROPICS | 0.2395763 | 0.3641671 | 0.7779417 | 0.2301488 | 0.4457482 | 0.5041232 | 0.1625185 |
ECORG | 0.5883517 | 0.1444810 | 0.3132020 | 0.3296926 | 0.2866268 | 0.2512713 | 0.5790139 |
LANDLOCK | 0.4828340 | 0.4694892 | 0.1511065 | 0.3264070 | 0.3046215 | 0.1844666 | 0.3224001 |
DENS65I | 0.4065801 | 0.2572750 | 0.8207921 | 0.4095387 | 0.5985423 | 0.6416562 | 0.6047849 |
HERF00 | 0.1408257 | 0.4819285 | 0.4542343 | 0.2606749 | 0.2310337 | 0.3814386 | 0.5458431 |
LANDAREA | 0.2766586 | 0.3790283 | 0.3708982 | 0.7974722 | 0.6899957 | 0.4457971 | 0.2075870 |
The second table follows the fifth table in the paper. It reports the sign certainty per \(\bar{k}\) (with \(\bar{k}\) taking the values 5, 7, 9, 11, 16, 22 and 28). Each entry is the posterior probability, conditional on inclusion, that the coefficient is positive (BMS's Cond.Pos.Sign column); values close to 1 or to 0 mean the sign is nearly certain.
I am going to argue that we can use this method, combined with the lasso, to uncover some patterns of causality. Recall that the lasso throws away variables, in particular variables that are highly correlated with other variables in our set. In our case we have a dummy variable for East Asian countries (EAST) and a variable for the percentage of Confucians in the country (CONFUC). These variables are, obviously, highly correlated, and we expect the lasso to throw the Confucian variable away for a \(\lambda\) high enough. If so, we would be able to say that being an East Asian country -> more growth. Nonetheless, we would not know what mechanism it works through. Here BACE comes in handy.
The BACE method samples specifications whose expected size is \(\bar{k}\). So the higher the \(\bar{k}\), the higher the probability that a sampled model contains both the East Asian dummy and the Confucian variable, and the importance of the dummy will decrease as CONFUC absorbs part of its variance. If we see that happen, we may infer that the mechanism behind the East Asia variable goes through the Confucian variable.
To summarise - we can use the lasso for causation, and BACE for the mechanism, given the lasso's selection.
I will now show that the importance of the East Asian dummy decreases as \(\bar{k}\) goes up. I will also show the influence of being in East Asia on GDP growth. I leave it to someone else to show the relationship between the Confucian population percentage and GDP growth.
First, I will show the change in the importance of the variables as \(\bar{k}\) increases.
# Like bayesian(), but instead of stashing the bms object this returns a
# small data frame per run: variable name, its PIP rank (coef() sorts rows
# by PIP, so the row number is an importance rank) and the PIP itself.
bayesian2 <- function(iterations, k_size){
  a <- bms(X.data = GR6096 ~ ., mcmc = "bd",
           iter = iterations,
           g = "UIP", mprior = "fixed",
           mprior.size = k_size, user.int = FALSE,
           nmodel = 10, data = data1[, 4:71])
  b <- coef(a) %>% as.data.frame() %>% rownames_to_column()
  colnames(b)[1] <- "name"
  b %<>% rownames_to_column()   # adds the PIP rank as a new first column
  b <- b[, c(2, 1, 3)]          # reorder to: name, rank, PIP
  return(b)
}
# Run BACE for every prior model size kbar = 1, ..., 67 and collect, per
# variable: its PIP rank (df1) and its PIP (df2), one column per kbar.
n <- 100000
df <- bayesian2(iterations = n, k_size = 1)
df1 <- df[, 1:2]      # name + rank
df2 <- df[, c(1, 3)]  # name + PIP
colnames(df1)[2] <- "1"
colnames(df2)[2] <- "1"
for (i in 2:67) {
  tmp <- bayesian2(iterations = n, k_size = i)
  tmp1 <- tmp[, 1:2]
  tmp2 <- tmp[, c(1, 3)]
  colnames(tmp1)[2] <- i
  colnames(tmp2)[2] <- i
  df1 %<>% left_join(tmp1, by = "name")   # joining by name keeps rows aligned
  df2 %<>% left_join(tmp2, by = "name")
}
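Before plotting, we can eyeball the EAST row directly (a quick check against the df2 built above):
df2[df2$name == "EAST", ]   # EAST's posterior inclusion probability at each kbar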
The heat maps below show that the EAST variable is the most important one, but its importance goes down as \(\bar{k}\) goes up, because larger sampled models also include the Confucian variable, which takes over some of the variance of EAST. (This is consistent with Table 3 in the paper, which is not shown here.) As \(\bar{k}\) increases, the EAST row fades: in the first graph, which plots each variable's PIP rank per \(\bar{k}\), it turns whiter (a worse rank), and in the second graph, which plots the inclusion probability itself, it turns darker (a lower PIP).
long_df1 <- melt(df1, id = "name")                          # wide kbar columns -> long format
long_df1$name <- factor(long_df1$name, levels = df1$name)   # order the y-axis by the kbar = 1 ranking
long_df1$value <- as.numeric(as.character(long_df1$value))  # the ranks came through as text
p <- long_df1 %>%
ggplot(aes(y = name, x = variable)) +
geom_tile(aes(fill = value),colour = "white") +
scale_fill_gradient(low = "firebrick3",high = "white")
p + theme_grey(base_size = 9) + labs(x = "",
y = "") + scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0))
long_df2 <- melt(df2, id = "name")
long_df2$name <- factor(long_df2$name, levels = df2$name)
v <- long_df2 %>%
ggplot(aes(y = name, x = variable)) +
geom_tile(aes(fill = value),colour = "white") +
scale_fill_gradient(low = "dodgerblue4",high = "white")
v + theme_grey(base_size = 9) + labs(x = "",
y = "") + scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0))
Now, to finish this section, we will use the double-selection lasso to estimate the influence of the East Asia dummy on GDP growth. We would like to see whether it throws the Confucian variable away. In any case we can still argue that it is part of the East Asia mechanism, as explained above.
YT <- data1$GR6096                              # outcome: GDP growth
Y0 <- data1$EAST                                # treatment: the East Asia dummy
X <- data1[, c(5:17, 19:71)] %>% as.matrix()    # controls: all candidates except GR6096 and EAST
double_Lasso <- rlassoEffect(x = X, y = YT, d = Y0,
                             method = "double selection")
double_Lasso$selection.index
## ABSLATIT AIRDIST AVELF BRIT BUDDHA CATH00 CIV72
## FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## COLONY CONFUC DENS60 DENS65C DENS65I DPOP6090 ECORG
## FALSE TRUE FALSE TRUE FALSE FALSE FALSE
## ENGFRAC EUROPE FERTLDC1 GDE1 GDPCH60L GEEREC1 GGCFD3
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## GOVNOM1 GOVSH61 GVR61 H60 HERF00 HINDU00 IPRICE1
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## LAAM LANDAREA LANDLOCK LHCPC LIFE060 LT100CR MALFAL66
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## MINING MUSLIM00 NEWSTATE OIL OPENDEC1 ORTH00 OTHFRAC
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## P60 PI6090 SQPI6090 PRIGHTS POP1560 POP60 POP6560
## TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## PRIEXP70 PROT00 RERD REVCOUP SAFRICA SCOUT SIZE60
## FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## SOCIALIST SPAIN TOT1DEC1 TOTIND TROPICAR TROPPOP WARTIME
## FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## WARTORN YRSOPEN ZTROPICS
## FALSE TRUE FALSE
double_Lasso$coefficients.reg
## (Intercept) xd1 xBUDDHA xCONFUC xDENS65C
## 1.430168e-02 8.660365e-03 1.383491e-02 5.270209e-02 2.504732e-06
## xP60 xRERD xTROPPOP xYRSOPEN
## 1.304975e-02 -6.886884e-05 -9.932221e-03 9.374284e-03
As we can see, the double-selection lasso kept the Confucian variable. It is possible, though, that for a higher \(\lambda\) it would have been thrown away.
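To see the estimated effect itself with its standard error, one can summarise the fitted object (summary() should dispatch to hdm's method for these objects):
summary(double_Lasso)   # point estimate and s.e. of the EAST treatment effect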
Someone, someday, may try to pin down the exact pattern between CONFUC and growth. I leave it open for further work.