For this assignment, you will utilize the Garrett data available on the Canvas site. The Garrett data are employed in his book, Partisan Politics in the Global Economy (Cambridge University Press, 1998). With the exception of country, the variables are largely self-explanatory.
The country variable includes identifiers for 14 OECD countries.
For this assignment, we will examine whether power of labor parties (leftlab) influence economic growth, controlling for other economic factors. For this assignment, treat the data as repeated observations (time) nested within countries.
Important
I implemented all models in both R and Python (See respective tabs). However, the results are marginally different. Therefore, I base all interpretations on the results from R…cause R is just cooler!
Load and Clean Data (select variables)
Show the code
oecd_df <-read_dta('data/GarrettData.dta') %>%select(!contains(c("Icc", "per"))) %>%select(!c('clint', 'demand', 'corp', 'capmob', 'year', 'gdpl', 'unem')) %>%mutate(country_lab =fct_recode(as_factor(country),US ="2", Canada ="20", UK ="200", Netherlands ="210", Belgium ="211", France ="220", Germany ="260", Austria ="305", Italy ="325", Finland ="375", Sweden ="380", Norway ="385", Denmark ="390", Japan ="740")) %>%# mutate_at(vars, function(x) rescale(x,to = c(0,10)) ) %>% group_by(country) %>%mutate(mean_leftlab =mean(leftlab, na.rm =TRUE)) %>%ungroup() oecd_df %>%head() %>% knitr::kable()
country
infl
gdp
uneml
infll
trade
oild
leftlab
country_lab
mean_leftlab
2
2.9
5.111141
4.5
1.99005
9.622906
0.000675
1.3764145
US
0.7189868
2
2.8
2.277283
3.8
2.90000
9.983546
0.000590
1.1766782
US
0.7189868
2
4.2
4.700000
3.8
2.80000
10.089120
0.000662
1.1766782
US
0.7189868
2
5.4
2.800000
3.6
4.20000
10.435930
0.000720
0.4331716
US
0.7189868
2
5.9
-0.200000
3.5
5.40000
10.495350
0.000773
0.4331716
US
0.7189868
2
4.3
3.100000
4.9
5.90000
11.278270
0.001063
0.4840479
US
0.7189868
Question 1.
Prompt:
In a few sentences, theorize the relationship between labor party power and economic growth.
Response: Labor party power could influence economic growth through a range of policies, such as pro-worker reforms or redistributive measures, which may boost consumer spending and stimulate economic activity. However, the impact could differ depending on contextual factors, political institutions, and economic conditions in each region. By employing a random effects model, the analysis can account for country-specific variations and unobserved heterogeneity.
Question 2
Prompt:
Estimate a random intercepts model with economic development (gdp) as the dependent variable, leftlab as the independent variable, and include three economic-based variables. Report the random effects component of the results in standard deviations. Interpret both the fixed and random effects results.
FIXED EFFECT
The variable of interest is leftlab. Once controlling for inflation (lagged one year), openness to trade, and oil dependence, the changes in the power of left-labor parties no longer has a statistically significant association with changes in gdp (based on an exceedingly high p-value).
If, for the sake of exercise, we pretend that the power of left-labor parting was significant, we might say that a one unit increase in the power of left-labor parties, the effect of left-labor parties on gdp is 0.19 units controlling the the variables stated above. However, as stated earlier - this effect is NOT statistically significant at any of the excepted standards.
Notably, the remaining control variables do have statistically significant associations with changes in gdp.
RANDOM EFFECT (country)
The random effects represent the variation across different groups (countries) that are not explained by the fixed effects. The estimated standard deviation of the random intercept for “country” is 0.85. It quantifies the variability in the baseline “gdp” values among different countries. (NB: A larger value indicates higher variation between countries.
ICC represents the proportion of total variance in the dependent variable (in this case, gdp) that can be attributed to the variability between groups (countries) relative to the total variability. It is a measure of the similarity or homogeneity of observations within each group compared to the overall variability in the entire dataset.
In this specific model, the ICC for the random effect country is 0.14, which indicates that approximately 14% of the total variance in gdp can be attributed to differences between countries, while the remaining 86% is due to variation within individual countries.
In the given model, leftlab itself may be considered a within level (fixed effect) variable, as it is likely to vary within each country, and each country may have its own relationship between leftlab and gdp. On the other hand, mean_leftlab is a between level (fixed effect) variable as it represents an aggregate measure of leftlab across all countries and has the same value for all observations within the same country. In other words, it does not vary within each country; instead, it captures the overall variation of leftlabbetween different countries.
Notably, neither leftlab nor mean_leftlab were statistically significant.
If we pretend for a second that these predictors are significant, it would be interesting to notice that the between level predicted line has a radically different slope than the within predictions.
Show the code
## Example 4.3 - Within and Between Group Regressions jtools::summ( mod_rand_int_mean <-lmer(gdp ~ leftlab +# uneml + infl + trade + oild + mean_leftlab + (1|country),data = oecd_df,REML =FALSE))
MODEL INFO:
Observations: 350
Dependent Variable: gdp
Type: Mixed effects linear regression
MODEL FIT:
AIC = 1542.06, BIC = 1572.92
Pseudo-R² (fixed effects) = 0.23
Pseudo-R² (total) = 0.35
FIXED EFFECTS:
-----------------------------------------------------------
Est. S.E. t val. d.f. p
------------------ -------- ------ -------- -------- ------
(Intercept) 6.09 0.88 6.96 25.68 0.00
leftlab 0.15 0.20 0.74 340.64 0.46
infl -0.18 0.03 -5.61 357.98 0.00
trade -0.04 0.01 -3.74 60.01 0.00
oild -13.34 6.00 -2.22 239.94 0.03
mean_leftlab 0.17 0.45 0.37 41.02 0.71
-----------------------------------------------------------
p values calculated using Kenward-Roger standard errors and d.f.
RANDOM EFFECTS:
------------------------------------
Group Parameter Std. Dev.
---------- ------------- -----------
country (Intercept) 0.87
Residual 2.07
------------------------------------
Grouping variables:
---------------------------
Group # groups ICC
--------- ---------- ------
country 14 0.15
---------------------------
Show the code
### Between effect of leftlab (deltamethod get std errors of the effect)# print("The mean effect of leftlab:")# car::deltaMethod(mod_rand_int_mean,# "b1+b2",# parameterNames = paste("b",# 0:2,# sep = ""))### Empirical bayes predictions of the mean in each groupBLUPres <-ranef(mod_rand_int_mean)BLUPres <- BLUPres[["country"]] %>%rownames_to_column("country") %>%transmute(country =as.numeric(country),BLUPres =`(Intercept)`)### Graph with predictionsoecd_df <- oecd_df %>%inner_join(BLUPres,by ="country") %>%mutate(betweenPred = mod_rand_int_mean@beta[1] + (mod_rand_int_mean@beta[2] + mod_rand_int_mean@beta[3])*mean_leftlab,withinPred = (mod_rand_int_mean@beta[1] + BLUPres) + (mod_rand_int_mean@beta[2]*leftlab))oecd_df %>%group_by(country) %>%arrange(yhat,.by_group =TRUE) %>%ungroup() %>%ggplot() +geom_line(aes(x = leftlab,y = withinPred,col ="withinPred",group = country),linewidth =1,alpha =0.5) +geom_line(aes(x = mean_leftlab,y = betweenPred,col ="betweenPred"),linewidth =2,alpha =0.8) +scale_color_manual(values =c("green", "grey40")) +labs(title ="Within and Between Predictions",subtitle ="Whaaaaat?!?!?!?",x ="Left-Labor Party Power",y ="")+ my_gg_theme +theme(legend.position ="bottom",legend.title =element_blank())
Mixed Linear Model Regression Results
========================================================
Model: MixedLM Dependent Variable: gdp
No. Observations: 350 Method: REML
No. Groups: 14 Scale: 4.3192
Min. group size: 25 Log-Likelihood: -768.2382
Max. group size: 25 Converged: Yes
Mean group size: 25.0
--------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------
Intercept 6.403 0.725 8.831 0.000 4.982 7.824
leftlab 0.191 0.175 1.090 0.276 -0.152 0.533
infl -0.176 0.031 -5.650 0.000 -0.236 -0.115
trade -0.038 0.011 -3.577 0.000 -0.059 -0.017
oild -13.338 5.817 -2.293 0.022 -24.738 -1.937
country Var 0.945 0.270
========================================================
Show the code
# Extract random intercepts for each country and convert to DataFramerandom_intercepts = pd.DataFrame.from_dict(model1.random_effects, orient='index', columns=['raneff'])random_intercepts.reset_index(inplace=True)random_intercepts.rename(columns={'index': 'country'}, inplace=True)# Calculate the predicted values (yhat)r.oecd_df['yhat'] = (model1.params[0] + random_intercepts['raneff']) + (model1.params[1] * r.oecd_df['leftlab'])# Sort the dataframe by yhat within each country groupoecd_df_sorted = r.oecd_df.sort_values(by=['country', 'yhat'])# Plot with seabornplt.figure(figsize=(8, 5))sns.lineplot(oecd_df_sorted, x='leftlab', y='yhat', hue='country_lab', palette='tab20', linewidth=1.1)plt.title("Random intercepts by country")plt.xlabel("Labor Party Power")plt.ylabel("Predicted level")plt.legend(title="", bbox_to_anchor=(1, 1), loc='upper left', frameon=False)plt.tight_layout() # Ensures all elements fit within the plot areaplt.show()
Question 3
Prompt:
Using the model estimated in Question #2, add a random effect to one of the independent variables of your choosing. Report the random effects component of the results in standard deviations. Interpret both the fixed and random effects results.
Here I added a random effect on leftlab. In short, in this mixed effects model, the random effects analysis suggests that there is variability in the baseline gdp values between different countries, as indicated by the standard deviation of the random intercept for country (1.34). Additionally, the standard deviation of the random effect for leftlab within country (0.49) suggests that the impact of leftlab on gdp varies across different countries.
Regarding fixed effects, the variables infl, trade, and oild show statistically significant effects on gdp, while leftlab does not have a statistically significant impact
Random Effects Component (in Standard Deviations):
Country Intercept: The estimated standard deviation of the random intercept for country is 1.34. This value represents the variability in the baseline gdp values among (or between) different countries. A larger standard deviation indicates higher variability between countries in their baseline gdp values. Given the scale of the data, this represents a pretty large variability.
Country Leftlab: The estimated standard deviation of the random effect for leftlab within country is 0.49 “units.” This value represents the variability in the impact of trade on gdp between different countries. That is, the impact of trade on gdp varies by 0.49 units.
Interpretation of Fixed Effects:
leftlab: The estimated coefficient for leftlab is 0.25. Its p-value (0.32) indicates that the effect of leftlab on gdp is not statistically significant at the conventional significance level (alpha = 0.05). Therefore, there is no strong evidence to suggest that the variable leftlab has a significant impact on gdp in this model.
infl: The estimated coefficient for infl is -0.18. It is statistically significant with a p-value less than 0.01. For each unit increase in infl, gdp is estimated to decrease by 0.18 units, holding the other variables constant.
trade: The estimated coefficient for trade is -0.03. It is statistically significant at the 0.001 level (p = 0.00). For each unit increase in trade, gdp is estimated to decrease by 0.03 units, holding other variables constant.
oild: The estimated coefficient for oild is -14.90. It has a p-value of 0.01. For each unit increase in oild, gdp is estimated to decrease by 14 units, holding other variables constant. There is likely a scaling issue with the data that should be addressed.
Inter-Country Variability (ICC):
In this model, the ICC for the country grouping variable is 0.30, which means that approximately 30% of the total variability in GDP can be attributed to differences between countries.
Random Intercept and Random Slope Model on Leftlab
MODEL INFO:
Observations: 350
Dependent Variable: gdp
Type: Mixed effects linear regression
MODEL FIT:
AIC = 1541.60, BIC = 1576.32
Pseudo-R² (fixed effects) = 0.20
Pseudo-R² (total) = 0.30
FIXED EFFECTS:
----------------------------------------------------------
Est. S.E. t val. d.f. p
----------------- -------- ------ -------- -------- ------
(Intercept) 5.71 0.77 7.44 16.08 0.00
leftlab 0.25 0.24 1.04 13.22 0.32
infl -0.18 0.03 -5.76 351.56 0.00
trade -0.03 0.01 -3.11 41.29 0.00
oild -14.90 5.89 -2.53 150.32 0.01
----------------------------------------------------------
p values calculated using Kenward-Roger standard errors and d.f.
RANDOM EFFECTS:
------------------------------------
Group Parameter Std. Dev.
---------- ------------- -----------
country (Intercept) 1.34
country leftlab 0.49
Residual 2.07
------------------------------------
Grouping variables:
---------------------------
Group # groups ICC
--------- ---------- ------
country 14 0.30
---------------------------
Estimate model (random intercept, random slope on leftlab:
Show the code
model2 = smf.\ mixedlm(formula, r.oecd_df, groups ="country", re_formula ="1 + leftlab").\ fit()# Estimate model (no random intercept) with random slope on leftlab would be:# mixedlm(formula, r.oecd_df, groups = "country", re_formula = "0 + leftlab")print(model2.summary())
Mixed Linear Model Regression Results
==================================================================
Model: MixedLM Dependent Variable: gdp
No. Observations: 350 Method: REML
No. Groups: 14 Scale: 4.2764
Min. group size: 25 Log-Likelihood: -766.9980
Max. group size: 25 Converged: Yes
Mean group size: 25.0
------------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
------------------------------------------------------------------
Intercept 5.886 0.823 7.150 0.000 4.272 7.499
leftlab 0.282 0.238 1.188 0.235 -0.184 0.749
infl -0.181 0.031 -5.836 0.000 -0.241 -0.120
trade -0.032 0.012 -2.684 0.007 -0.055 -0.009
oild -13.790 6.036 -2.284 0.022 -25.620 -1.959
country Var 1.979 0.668
country x leftlab Cov -0.617 0.295
leftlab Var 0.263 0.132
==================================================================
Question 4
Prompt:
Conduct a likelihood ratio test to determine whether the model warrants a random slope effect. For R users, you will use the anova() function (make sure to specify the object name for the “restricted model” (random intercept) first, followed by the object name for the “unrestricted model” (random slope) second).
Does adding the random intercept help our analysis?
In short, based on the ANOVA test results, the difference in model fit between “reMod1” and “mod_rand_int” is not statistically significant (p = 0.2681). In other words, the inclusion of the random slope for leftlab in “mod_rand_int” does not significantly improve the model’s fit over “reMod1” in explaining the variance in the dependent variable “gdp.”
In addition, all other test statistics are consistent. For instance, both the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion), measures of the models’ goodness-of-fit. For both AIC and BIC, lower score are better, and both point to the smaller model (model1).
A high p-value, here it’s close to 1, says we are not gaining any explanatory power by include the random slope.
Show the code
# get the log likelihoods for the two modelslog_likelihood_full = model1.llflog_likelihood_restricted = model2.llf# calc the likelihood ratio statisticlr_stat =2* (log_likelihood_full - log_likelihood_restricted)# Calc the p-valuep_value = stats.chi2.cdf(lr_stat, df=1)print(p_value)
0.0
Question 5
Prompt:
Explain the importance for estimating a multilevel model for nested data. What are the issues one may encounter if utilizing a pooled model?
Response: Using a multilevel model for nested data is essential to appropriately account for the hierarchical structure of the data and to obtain accurate and reliable estimates of fixed and random effects. Being from Las Vegas, I always love a backyard pool. However, utilizing a pooled model can lead to biased estimates, underestimated standard errors, and improper inference, making it inadequate for analyzing nested data.