Perform forward stepwise selection on all variables included in Robert’s canonical specification (“pre_coverage”, “rGDPg2020”, “digIDcov”, “dig_reg”, “eap”, “log_deaths”). At each iteration, the variable that would have the smallest p-value if added to the model is added. I then add the few variables that are not statistically significant all at once (column (1) of the table below, which includes all six variables).
Note that the final regression on the right weights each observation by the country population.
##
##                                Stepwise Selection Summary
## -----------------------------------------------------------------------------------------
##                          Added/                   Adj.
## Step      Variable      Removed     R-Square    R-Square     C(p)        AIC        RMSE
## -----------------------------------------------------------------------------------------
##    1    pre_coverage    addition       0.505       0.499    55.3290     -8.9381    0.2239
##    2    dig_reg         addition       0.644       0.635    19.6250    -34.2834    0.1910
##    3    eap             addition       0.675       0.663    13.1690    -39.8843    0.1836
##    4    log_deaths      addition       0.711       0.696     5.4160    -47.6148    0.1743
## -----------------------------------------------------------------------------------------
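The selection procedure summarized above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code that produced the output: the function names, the toy data, and the entry threshold `f_enter` are all hypothetical. At a given step every candidate model has the same residual degrees of freedom, so picking the candidate with the largest partial F statistic (the square of its t statistic) is equivalent to picking the one with the smallest p-value.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(data, y, names):
    """Residual sum of squares from an OLS fit of y on an intercept plus `names`."""
    rows = [[1.0] + [data[v][i] for v in names] for i in range(len(y))]
    k = len(names) + 1
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def forward_select(data, y, candidates, f_enter=4.0):
    """Add one variable per step, choosing the largest partial F statistic.

    f_enter is an assumed stopping threshold, not a value taken from the report.
    """
    selected, remaining = [], list(candidates)
    current = rss(data, y, selected)
    n = len(y)
    while remaining:
        df = n - len(selected) - 2  # intercept + already selected + the new variable
        scores = {}
        for v in remaining:
            new = rss(data, y, selected + [v])
            scores[v] = (current - new) / (new / df)
        best = max(scores, key=scores.get)
        if scores[best] < f_enter:
            break
        selected.append(best)
        remaining.remove(best)
        current = rss(data, y, selected)
    return selected


# Toy data (hypothetical): y depends strongly on x1, moderately on x2.
data = {
    "x1": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "x2": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "x3": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8],
}
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, -0.05, 0.1, -0.1, 0.0]
y = [2 * a + 3 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]
order = forward_select(data, y, ["x3", "x2", "x1"])
print(order)
```

The returned list gives the order of entry, which plays the same role as the Step column in the summary above.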
| Dependent variable: coverage | (1) | (2) | (3) | (4) | (5) |
| --- | --- | --- | --- | --- | --- |
| pre_coverage | 0.626*** | 1.147*** | 0.849*** | 0.731*** | 0.673*** |
| | (0.112) | (0.117) | (0.138) | (0.134) | (0.121) |
| rGDPg2020 | -0.005 | | | | |
| | (0.004) | | | | |
| log_deaths | 0.034** | | | | 0.040*** |
| | (0.015) | | | | (0.013) |
| dig_id | 0.058 | | | | |
| | (0.065) | | | | |
| dig_reg | 0.190*** | | 0.262*** | 0.267*** | 0.204*** |
| | (0.063) | | (0.057) | (0.054) | (0.059) |
| eap | 0.268*** | | | 0.164** | 0.279*** |
| | (0.080) | | | (0.082) | (0.077) |
| Constant | -0.089** | 0.131*** | 0.055** | 0.046** | -0.070* |
| | (0.044) | (0.028) | (0.022) | (0.019) | (0.037) |
| Observations | 83 | 83 | 83 | 83 | 83 |
| R² | 0.720 | 0.505 | 0.644 | 0.675 | 0.711 |
| Adjusted R² | 0.698 | 0.499 | 0.635 | 0.663 | 0.696 |
| Residual Std. Error | 0.174 (df = 76) | 0.224 (df = 81) | 0.191 (df = 80) | 0.184 (df = 79) | 0.174 (df = 78) |
| F Statistic | 32.592*** (df = 6; 76) | 82.746*** (df = 1; 81) | 72.416*** (df = 2; 80) | 54.771*** (df = 3; 79) | 48.029*** (df = 4; 78) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | | | |
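The note before this table mentions a regression that weights each observation by country population. As a rough illustration of what such weighting does, here is a minimal weighted least squares sketch for a single regressor. The function name and the numbers are hypothetical and are not taken from the data behind the table.

```python
def weighted_simple_ols(x, y, w):
    """Return (intercept, slope) minimizing sum_i w_i * (y_i - a - b * x_i) ** 2."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total  # weighted mean of y
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical example: three countries, weights standing in for population.
pre_coverage = [0.0, 1.0, 2.0]
coverage = [0.0, 1.0, 4.0]

equal = weighted_simple_ols(pre_coverage, coverage, [1, 1, 1])
drop_last = weighted_simple_ols(pre_coverage, coverage, [1, 1, 0])
print(equal, drop_last)
```

With equal weights the fit reduces to ordinary least squares. Giving the third observation zero weight makes the line pass through the first two points only, which shows how heavily weighted countries can dominate the estimates.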
The following regressions each include only one of the variables digital ID coverage (dig_id), digital ID registration (dig_reg), FSI (fsi), and soc_reg_coverage.
| Dependent variable: coverage | (1) | (2) | (3) | (4) |
| --- | --- | --- | --- | --- |
| pre_coverage | 0.739*** | 0.641*** | 0.735*** | 0.792*** |
| | (0.115) | (0.112) | (0.121) | (0.101) |
| rGDPg2020 | -0.004 | -0.005 | -0.004 | -0.007 |
| | (0.004) | (0.004) | (0.005) | (0.005) |
| eap | 0.318*** | 0.268*** | 0.287*** | 0.294*** |
| | (0.073) | (0.076) | (0.078) | (0.063) |
| log_deaths | 0.052*** | 0.035** | 0.052*** | 0.058*** |
| | (0.013) | (0.015) | (0.015) | (0.016) |
| dig_id | 0.139** | | | |
| | (0.057) | | | |
| dig_reg | | 0.205*** | | |
| | | (0.059) | | |
| fsi | | | -0.019 | |
| | | | (0.016) | |
| soc_reg_coverage | | | | 0.0003 |
| | | | | (0.001) |
| Constant | -0.141*** | -0.063* | 0.086 | -0.089** |
| | (0.038) | (0.038) | (0.159) | (0.045) |
| Observations | 83 | 83 | 82 | 78 |
| R² | 0.667 | 0.717 | 0.653 | 0.673 |
| Adjusted R² | 0.646 | 0.699 | 0.630 | 0.650 |
| Residual Std. Error | 0.188 (df = 77) | 0.174 (df = 77) | 0.192 (df = 76) | 0.184 (df = 72) |
| F Statistic | 30.898*** (df = 5; 77) | 39.021*** (df = 5; 77) | 28.559*** (df = 5; 76) | 29.630*** (df = 5; 72) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | | |
The following regressions include either GDP growth or deaths, but not both.
| Dependent variable: coverage | (1) | (2) |
| --- | --- | --- |
| pre_coverage | 0.647*** | 0.667*** |
| | (0.135) | (0.127) |
| dig_reg | 0.252*** | 0.200*** |
| | (0.065) | (0.067) |
| eap | 0.191** | 0.283*** |
| | (0.086) | (0.078) |
| soc_reg_coverage | 0.0001 | -0.001 |
| | (0.001) | (0.001) |
| rGDPg2020 | -0.012*** | |
| | (0.004) | |
| log_deaths | | 0.048*** |
| | | (0.014) |
| Constant | 0.028 | -0.086** |
| | (0.019) | (0.038) |
| Observations | 78 | 78 |
| R² | 0.710 | 0.723 |
| Adjusted R² | 0.690 | 0.704 |
| Residual Std. Error (df = 72) | 0.173 | 0.169 |
| F Statistic (df = 5; 72) | 35.269*** | 37.651*** |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
The first figure below shows predicted versus actual coverage for a model with the following right-hand-side variables: pre-COVID coverage, real GDP growth in 2020, digital ID coverage, digital registration, and log of deaths. I wasn’t able to see any pattern in the outliers, but I could be missing something. The next few graphs are partial adjustment graphs, which are sometimes useful for checking whether there are any important non-linearities that we should be taking into account. (There don’t seem to be any.)
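One way to make the search for patterns among outliers more systematic is to list the observations whose residuals are large relative to the typical residual. The sketch below is only an illustration: the paragraph above does not say how outliers were identified, so the cutoff of 2, the function name, and the toy numbers are all assumptions.

```python
import math


def flag_outliers(labels, actual, predicted, threshold=2.0):
    """Return (label, residual, scaled residual) for points beyond `threshold`.

    Residuals are scaled by their root mean square; the threshold is an
    assumed convention, not one taken from the report.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    flagged = []
    for label, r in zip(labels, residuals):
        z = r / rms
        if abs(z) > threshold:
            flagged.append((label, r, z))
    return flagged


# Hypothetical predicted and actual coverage for ten countries labeled A to J.
labels = list("ABCDEFGHIJ")
predicted = [0.5] * 10
actual = [0.6, 0.4, 0.6, 0.4, 0.6, 0.4, 0.6, 0.4, 0.6, -0.4]

outliers = flag_outliers(labels, actual, predicted)
print([label for label, _, _ in outliers])
```

The flagged labels can then be compared against region, income group, or other characteristics to see whether the large misses share anything in common.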
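Perform forward stepwise selection to choose the independent variables in a regression of spending on the other variables, then do the same with spending / coverage as the dependent variable. The first summary below is for spending (its steps match columns (2) to (4) of the table that follows); the second, which adds only fsi, is presumably for spending / coverage.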
##
##                                 Stepwise Selection Summary
## ------------------------------------------------------------------------------------------
##                          Added/                   Adj.
## Step      Variable      Removed     R-Square    R-Square     C(p)        AIC         RMSE
## ------------------------------------------------------------------------------------------
##    1    pre_coverage    addition       0.194       0.180     1.7080    -409.7216    0.0077
##    2    pre_spending    addition       0.244       0.217     0.1740    -411.5137    0.0075
##    3    dig_reg         addition       0.291       0.253    -1.2420    -413.4207    0.0074
## ------------------------------------------------------------------------------------------
##
##                               Stepwise Selection Summary
## --------------------------------------------------------------------------------------
##                      Added/                   Adj.
## Step    Variable    Removed     R-Square    R-Square     C(p)        AIC         RMSE
## --------------------------------------------------------------------------------------
##    1    fsi         addition       0.046       0.029    -3.8090    -304.4924    0.0185
## --------------------------------------------------------------------------------------
| Dependent variable: spending | (1) | (2) | (3) | (4) |
| --- | --- | --- | --- | --- |
| pre_coverage | 0.016*** | 0.019*** | 0.015*** | 0.010** |
| | (0.006) | (0.004) | (0.005) | (0.005) |
| pre_spending | 0.001 | | 0.002 | 0.002 |
| | (0.001) | | (0.002) | (0.002) |
| dig_reg | -0.001 | | | 0.004** |
| | (0.002) | | | (0.002) |
| expansion | 0.021*** | | | |
| | (0.005) | | | |
| Constant | -0.0001 | 0.004*** | 0.003*** | 0.001 |
| | (0.001) | (0.001) | (0.001) | (0.001) |
| Observations | 60 | 60 | 60 | 60 |
| R² | 0.501 | 0.194 | 0.244 | 0.291 |
| Adjusted R² | 0.465 | 0.180 | 0.217 | 0.253 |
| Residual Std. Error | 0.006 (df = 55) | 0.008 (df = 58) | 0.008 (df = 57) | 0.007 (df = 56) |
| F Statistic | 13.812*** (df = 4; 55) | 13.991*** (df = 1; 58) | 9.183*** (df = 2; 57) | 7.675*** (df = 3; 56) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | | |
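As a quick consistency check on the table above, the adjusted R² row can be recomputed from the R² row, the number of observations, and the number of regressors in each column, using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1). The helper name below is hypothetical. Because the reported R² values are themselves rounded, agreement to three decimals is not guaranteed in general, although it holds for these four columns.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared for a model with an intercept and n_regressors slopes."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# (R-squared, number of regressors) for columns (1) to (4) of the spending table.
columns = [(0.501, 4), (0.194, 1), (0.244, 2), (0.291, 3)]
recomputed = [round(adjusted_r2(r2, 60, k), 3) for r2, k in columns]
print(recomputed)
```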