Random Effects

Homework 1

Author

Michael L. Davies

Published

July 25, 2023

For this assignment, you will utilize the Garrett data available on the Canvas site. The Garrett data are employed in his book, Partisan Politics in the Global Economy (Cambridge University Press, 1998). With the exception of country, the variables are largely self-explanatory.

The country variable includes identifiers for 14 OECD countries.

For this assignment, we will examine whether power of labor parties (leftlab) influence economic growth, controlling for other economic factors. For this assignment, treat the data as repeated observations (time) nested within countries.

Important

I implemented all models in both R and Python (See respective tabs). However, the results are marginally different. Therefore, I base all interpretations on the results from R…cause R is just cooler!


Load and Clean Data (select variables)

Show the code
oecd_df <- read_dta('data/GarrettData.dta') %>% 
  select(!contains(c("Icc", "per"))) %>% 
  select(
    !c('clint', 'demand', 'corp', 'capmob', 'year', 'gdpl', 'unem')) %>%  
  mutate(country_lab = 
           fct_recode(as_factor(country),
                      US = "2", 
                      Canada = "20", 
                      UK = "200", 
                      Netherlands = "210", 
                      Belgium = "211", 
                      France = "220", 
                      Germany = "260", 
                      Austria = "305", 
                      Italy = "325", 
                      Finland = "375", 
                      Sweden = "380", 
                      Norway = "385", 
                      Denmark = "390", 
                      Japan = "740")) %>% 
  # mutate_at(vars, function(x) rescale(x,to = c(0,10)) ) %>% 
  group_by(country) %>% 
  mutate(mean_leftlab = mean(leftlab, na.rm = TRUE)) %>% 
  ungroup() 

oecd_df %>% 
  head() %>% 
  knitr::kable()
country infl gdp uneml infll trade oild leftlab country_lab mean_leftlab
2 2.9 5.111141 4.5 1.99005 9.622906 0.000675 1.3764145 US 0.7189868
2 2.8 2.277283 3.8 2.90000 9.983546 0.000590 1.1766782 US 0.7189868
2 4.2 4.700000 3.8 2.80000 10.089120 0.000662 1.1766782 US 0.7189868
2 5.4 2.800000 3.6 4.20000 10.435930 0.000720 0.4331716 US 0.7189868
2 5.9 -0.200000 3.5 5.40000 10.495350 0.000773 0.4331716 US 0.7189868
2 4.3 3.100000 4.9 5.90000 11.278270 0.001063 0.4840479 US 0.7189868

Question 1.

Prompt:

In a few sentences, theorize the relationship between labor party power and economic growth.

Response: Labor party power could influence economic growth through a range of policies, such as pro-worker reforms or redistributive measures, which may boost consumer spending and stimulate economic activity. However, the impact could differ depending on contextual factors, political institutions, and economic conditions in each region. By employing a random effects model, the analysis can account for country-specific variations and unobserved heterogeneity.

Question 2

Prompt:

Estimate a random intercepts model with economic development (gdp) as the dependent variable, leftlab as the independent variable, and include three economic-based variables. Report the random effects component of the results in standard deviations. Interpret both the fixed and random effects results.

FIXED EFFECT

The variable of interest is leftlab. Once controlling for inflation (lagged one year), openness to trade, and oil dependence, the changes in the power of left-labor parties no longer has a statistically significant association with changes in gdp (based on an exceedingly high p-value).

If, for the sake of exercise, we pretend that the power of left-labor parting was significant, we might say that a one unit increase in the power of left-labor parties, the effect of left-labor parties on gdp is 0.19 units controlling the the variables stated above. However, as stated earlier - this effect is NOT statistically significant at any of the excepted standards.

Notably, the remaining control variables do have statistically significant associations with changes in gdp.

RANDOM EFFECT (country)

The random effects represent the variation across different groups (countries) that are not explained by the fixed effects. The estimated standard deviation of the random intercept for “country” is 0.85. It quantifies the variability in the baseline “gdp” values among different countries. (NB: A larger value indicates higher variation between countries.

ICC represents the proportion of total variance in the dependent variable (in this case, gdp) that can be attributed to the variability between groups (countries) relative to the total variability. It is a measure of the similarity or homogeneity of observations within each group compared to the overall variability in the entire dataset.

In this specific model, the ICC for the random effect country is 0.14, which indicates that approximately 14% of the total variance in gdp can be attributed to differences between countries, while the remaining 86% is due to variation within individual countries.

Random Intercept Model

Here I conduct a simple random intercepts model, the focus of this assignment.

Show the code
## Example 4.2 
jtools::summ(
  mod_rand_int <- lmer(gdp ~ 
                   leftlab +
                   # uneml +
                   infl +
                   trade +
                   oild +
                   # mean_leftlab +
                   # 1 inicates allowing for random intercept
                   (1|country),
                 data = oecd_df,
                 REML = FALSE))
MODEL INFO:
Observations: 350
Dependent Variable: gdp
Type: Mixed effects linear regression 

MODEL FIT:
AIC = 1540.23, BIC = 1567.23
Pseudo-R² (fixed effects) = 0.23
Pseudo-R² (total) = 0.34 

FIXED EFFECTS:
----------------------------------------------------------
                      Est.   S.E.   t val.     d.f.      p
----------------- -------- ------ -------- -------- ------
(Intercept)           6.26   0.66     9.49    45.00   0.00
leftlab               0.19   0.18     1.07   200.04   0.29
infl                 -0.18   0.03    -5.63   355.77   0.00
trade                -0.04   0.01    -3.95    48.53   0.00
oild                -13.77   5.85    -2.36   220.08   0.02
----------------------------------------------------------

p values calculated using Kenward-Roger standard errors and d.f.

RANDOM EFFECTS:
------------------------------------
  Group      Parameter    Std. Dev. 
---------- ------------- -----------
 country    (Intercept)     0.85    
 Residual                   2.07    
------------------------------------

Grouping variables:
---------------------------
  Group    # groups   ICC  
--------- ---------- ------
 country      14      0.14 
---------------------------
Show the code
## Figure 4.2 - specific regression lines

res_oecd <- ranef(mod_rand_int)

res_oecd <- res_oecd[["country"]] %>%
  rownames_to_column("country") %>%
  transmute(country = as.numeric(country),
            raneff = `(Intercept)`)

oecd_df <- oecd_df %>%
  inner_join(res_oecd,
             by = "country") %>%
  mutate(yhat = (mod_rand_int@beta[1] + raneff) +
           (mod_rand_int@beta[2]*leftlab))

oecd_df %>% 
  group_by(country) %>%
  arrange(yhat,
          .by_group = TRUE) %>%
  ungroup() %>% 
  ggplot(aes(x = leftlab,
             y = yhat,
             col = country_lab)) +
  geom_line(linewidth = 1.1) +
  labs(title = "Random intercepts by country",
       y = 'Predicted level',
       x = "Labor Party Power",
       col = '') +
  my_gg_theme +
  theme(
    legend.position = 'right',
    legend.key = element_rect(
      fill = "white", colour = "white"))

Provocative Findings

This is just for practice…

In the given model, leftlab itself may be considered a within level (fixed effect) variable, as it is likely to vary within each country, and each country may have its own relationship between leftlab and gdp. On the other hand, mean_leftlab is a between level (fixed effect) variable as it represents an aggregate measure of leftlab across all countries and has the same value for all observations within the same country. In other words, it does not vary within each country; instead, it captures the overall variation of leftlab between different countries.

Notably, neither leftlab nor mean_leftlab were statistically significant.

If we pretend for a second that these predictors are significant, it would be interesting to notice that the between level predicted line has a radically different slope than the within predictions.

Show the code
## Example 4.3 - Within and Between Group Regressions 

jtools::summ(
  mod_rand_int_mean <- 
    lmer(gdp ~ 
           leftlab +
           # uneml +
           infl +
           trade +
           oild +
           mean_leftlab +
           (1|country),
         data = oecd_df,
         REML = FALSE))
MODEL INFO:
Observations: 350
Dependent Variable: gdp
Type: Mixed effects linear regression 

MODEL FIT:
AIC = 1542.06, BIC = 1572.92
Pseudo-R² (fixed effects) = 0.23
Pseudo-R² (total) = 0.35 

FIXED EFFECTS:
-----------------------------------------------------------
                       Est.   S.E.   t val.     d.f.      p
------------------ -------- ------ -------- -------- ------
(Intercept)            6.09   0.88     6.96    25.68   0.00
leftlab                0.15   0.20     0.74   340.64   0.46
infl                  -0.18   0.03    -5.61   357.98   0.00
trade                 -0.04   0.01    -3.74    60.01   0.00
oild                 -13.34   6.00    -2.22   239.94   0.03
mean_leftlab           0.17   0.45     0.37    41.02   0.71
-----------------------------------------------------------

p values calculated using Kenward-Roger standard errors and d.f.

RANDOM EFFECTS:
------------------------------------
  Group      Parameter    Std. Dev. 
---------- ------------- -----------
 country    (Intercept)     0.87    
 Residual                   2.07    
------------------------------------

Grouping variables:
---------------------------
  Group    # groups   ICC  
--------- ---------- ------
 country      14      0.15 
---------------------------
Show the code
### Between effect of leftlab (deltamethod get std errors of the effect)
# print("The mean effect of leftlab:")
# car::deltaMethod(mod_rand_int_mean,
#                  "b1+b2",
#                  parameterNames = paste("b",
#                                         0:2,
#                                         sep = ""))

### Empirical bayes predictions of the mean in each group

BLUPres <- ranef(mod_rand_int_mean)

BLUPres <- BLUPres[["country"]] %>%
  rownames_to_column("country") %>%
  transmute(country = as.numeric(country),
            BLUPres = `(Intercept)`)

### Graph with predictions

oecd_df <- oecd_df %>%
  inner_join(BLUPres,
             by = "country") %>%
  mutate(betweenPred = mod_rand_int_mean@beta[1] +
           (mod_rand_int_mean@beta[2] + mod_rand_int_mean@beta[3])*mean_leftlab,
         withinPred = (mod_rand_int_mean@beta[1] + BLUPres) +
           (mod_rand_int_mean@beta[2]*leftlab))

oecd_df %>%
  group_by(country) %>%
  arrange(yhat,
          .by_group = TRUE) %>%
  ungroup() %>% 
  ggplot() +
  geom_line(aes(x = leftlab,
                y = withinPred,
                col = "withinPred",
                group = country),
            linewidth = 1,
            alpha = 0.5) +
  geom_line(aes(x = mean_leftlab,
                y = betweenPred,
                col = "betweenPred"),
            linewidth = 2,
            alpha = 0.8) +
  scale_color_manual(values = c("green", "grey40")) +
  labs(
    title = "Within and Between Predictions",
    subtitle = "Whaaaaat?!?!?!?",
    x = "Left-Labor Party Power",
    y = "")+
  my_gg_theme +
  theme(legend.position = "bottom",
        legend.title = element_blank())

Estimate random intercept model:

Show the code
# Estimate random intercept model:
formula = "gdp ~ + leftlab + infl + trade + oild"

model1 = smf.\
    mixedlm(formula, r.oecd_df, groups = "country").\
    fit()

print(model1.summary())
         Mixed Linear Model Regression Results
========================================================
Model:             MixedLM Dependent Variable: gdp      
No. Observations:  350     Method:             REML     
No. Groups:        14      Scale:              4.3192   
Min. group size:   25      Log-Likelihood:     -768.2382
Max. group size:   25      Converged:          Yes      
Mean group size:   25.0                                 
--------------------------------------------------------
             Coef.  Std.Err.   z    P>|z|  [0.025 0.975]
--------------------------------------------------------
Intercept     6.403    0.725  8.831 0.000   4.982  7.824
leftlab       0.191    0.175  1.090 0.276  -0.152  0.533
infl         -0.176    0.031 -5.650 0.000  -0.236 -0.115
trade        -0.038    0.011 -3.577 0.000  -0.059 -0.017
oild        -13.338    5.817 -2.293 0.022 -24.738 -1.937
country Var   0.945    0.270                            
========================================================
Show the code
# Extract random intercepts for each country and convert to DataFrame
random_intercepts = pd.DataFrame.from_dict(model1.random_effects, 
    orient='index',  
    columns=['raneff'])
random_intercepts.reset_index(inplace=True)
random_intercepts.rename(columns={'index': 'country'}, inplace=True)

# Calculate the predicted values (yhat)
r.oecd_df['yhat'] = (model1.params[0] + random_intercepts['raneff']) + (model1.params[1] * r.oecd_df['leftlab'])

# Sort the dataframe by yhat within each country group
oecd_df_sorted = r.oecd_df.sort_values(by=['country', 'yhat'])

# Plot with seaborn
plt.figure(figsize=(8, 5))
sns.lineplot(oecd_df_sorted, 
    x='leftlab', 
    y='yhat', 
    hue='country_lab', 
    palette='tab20', 
    linewidth=1.1)
    
plt.title("Random intercepts by country")
plt.xlabel("Labor Party Power")
plt.ylabel("Predicted level")
plt.legend(title="", 
    bbox_to_anchor=(1, 1), 
    loc='upper left', 
    frameon=False)
plt.tight_layout()  # Ensures all elements fit within the plot area
plt.show()

Question 3

Prompt:

Using the model estimated in Question #2, add a random effect to one of the independent variables of your choosing. Report the random effects component of the results in standard deviations. Interpret both the fixed and random effects results.

Here I added a random effect on leftlab. In short, in this mixed effects model, the random effects analysis suggests that there is variability in the baseline gdp values between different countries, as indicated by the standard deviation of the random intercept for country (1.34). Additionally, the standard deviation of the random effect for leftlab within country (0.49) suggests that the impact of leftlab on gdp varies across different countries.

Regarding fixed effects, the variables infl, trade, and oild show statistically significant effects on gdp, while leftlab does not have a statistically significant impact

Random Effects Component (in Standard Deviations):

Country Intercept: The estimated standard deviation of the random intercept for country is 1.34. This value represents the variability in the baseline gdp values among (or between) different countries. A larger standard deviation indicates higher variability between countries in their baseline gdp values. Given the scale of the data, this represents a pretty large variability.

Country Leftlab: The estimated standard deviation of the random effect for leftlab within country is 0.49 “units.” This value represents the variability in the impact of trade on gdp between different countries. That is, the impact of trade on gdp varies by 0.49 units.

Interpretation of Fixed Effects:

leftlab: The estimated coefficient for leftlab is 0.25. Its p-value (0.32) indicates that the effect of leftlab on gdp is not statistically significant at the conventional significance level (alpha = 0.05). Therefore, there is no strong evidence to suggest that the variable leftlab has a significant impact on gdp in this model.

infl: The estimated coefficient for infl is -0.18. It is statistically significant with a p-value less than 0.01. For each unit increase in infl, gdp is estimated to decrease by 0.18 units, holding the other variables constant.

trade: The estimated coefficient for trade is -0.03. It is statistically significant at the 0.001 level (p = 0.00). For each unit increase in trade, gdp is estimated to decrease by 0.03 units, holding other variables constant.

oild: The estimated coefficient for oild is -14.90. It has a p-value of 0.01. For each unit increase in oild, gdp is estimated to decrease by 14 units, holding other variables constant. There is likely a scaling issue with the data that should be addressed.

Inter-Country Variability (ICC):

In this model, the ICC for the country grouping variable is 0.30, which means that approximately 30% of the total variability in GDP can be attributed to differences between countries.

Random Intercept and Random Slope Model on Leftlab

Show the code
jtools::summ(
  reMod4 <- lmer(gdp ~ 
                   leftlab +
                   infl +
                   trade +
                   oild +
                   #mean_leftlab +
                   (1 + leftlab|country),
                 data = oecd_df,
                 REML = FALSE))
MODEL INFO:
Observations: 350
Dependent Variable: gdp
Type: Mixed effects linear regression 

MODEL FIT:
AIC = 1541.60, BIC = 1576.32
Pseudo-R² (fixed effects) = 0.20
Pseudo-R² (total) = 0.30 

FIXED EFFECTS:
----------------------------------------------------------
                      Est.   S.E.   t val.     d.f.      p
----------------- -------- ------ -------- -------- ------
(Intercept)           5.71   0.77     7.44    16.08   0.00
leftlab               0.25   0.24     1.04    13.22   0.32
infl                 -0.18   0.03    -5.76   351.56   0.00
trade                -0.03   0.01    -3.11    41.29   0.00
oild                -14.90   5.89    -2.53   150.32   0.01
----------------------------------------------------------

p values calculated using Kenward-Roger standard errors and d.f.

RANDOM EFFECTS:
------------------------------------
  Group      Parameter    Std. Dev. 
---------- ------------- -----------
 country    (Intercept)     1.34    
 country      leftlab       0.49    
 Residual                   2.07    
------------------------------------

Grouping variables:
---------------------------
  Group    # groups   ICC  
--------- ---------- ------
 country      14      0.30 
---------------------------

Estimate model (random intercept, random slope on leftlab:

Show the code
model2 = smf.\
    mixedlm(formula, r.oecd_df, groups = "country", re_formula = "1 + leftlab").\
    fit()

# Estimate model (no random intercept) with random slope on leftlab would be:
# mixedlm(formula, r.oecd_df, groups = "country", re_formula = "0 + leftlab")

print(model2.summary())
              Mixed Linear Model Regression Results
==================================================================
Model:                 MixedLM    Dependent Variable:    gdp      
No. Observations:      350        Method:                REML     
No. Groups:            14         Scale:                 4.2764   
Min. group size:       25         Log-Likelihood:        -766.9980
Max. group size:       25         Converged:             Yes      
Mean group size:       25.0                                       
------------------------------------------------------------------
                       Coef.  Std.Err.   z    P>|z|  [0.025 0.975]
------------------------------------------------------------------
Intercept               5.886    0.823  7.150 0.000   4.272  7.499
leftlab                 0.282    0.238  1.188 0.235  -0.184  0.749
infl                   -0.181    0.031 -5.836 0.000  -0.241 -0.120
trade                  -0.032    0.012 -2.684 0.007  -0.055 -0.009
oild                  -13.790    6.036 -2.284 0.022 -25.620 -1.959
country Var             1.979    0.668                            
country x leftlab Cov  -0.617    0.295                            
leftlab Var             0.263    0.132                            
==================================================================

Question 4

Prompt:

Conduct a likelihood ratio test to determine whether the model warrants a random slope effect. For R users, you will use the anova() function (make sure to specify the object name for the “restricted model” (random intercept) first, followed by the object name for the “unrestricted model” (random slope) second).

Does adding the random intercept help our analysis?

In short, based on the ANOVA test results, the difference in model fit between “reMod1” and “mod_rand_int” is not statistically significant (p = 0.2681). In other words, the inclusion of the random slope for leftlab in “mod_rand_int” does not significantly improve the model’s fit over “reMod1” in explaining the variance in the dependent variable “gdp.”

In addition, all other test statistics are consistent. For instance, both the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion), measures of the models’ goodness-of-fit. For both AIC and BIC, lower score are better, and both point to the smaller model (model1).

Show the code
anova(mod_rand_int,
      mod_rand_int_mean,
      reMod4)
Data: oecd_df
Models:
mod_rand_int: gdp ~ leftlab + infl + trade + oild + (1 | country)
mod_rand_int_mean: gdp ~ leftlab + infl + trade + oild + mean_leftlab + (1 | country)
reMod4: gdp ~ leftlab + infl + trade + oild + (1 + leftlab | country)
                  npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
mod_rand_int         7 1540.2 1567.2 -763.11   1526.2                     
mod_rand_int_mean    8 1542.1 1572.9 -763.03   1526.1 0.1718  1     0.6785
reMod4               9 1541.6 1576.3 -761.80   1523.6 2.4608  1     0.1167

A high p-value, here it’s close to 1, says we are not gaining any explanatory power by include the random slope.

Show the code
# get the log likelihoods for the two models
log_likelihood_full = model1.llf
log_likelihood_restricted = model2.llf

# calc the likelihood ratio statistic
lr_stat = 2 * (log_likelihood_full - log_likelihood_restricted)

# Calc the p-value
p_value = stats.chi2.cdf(lr_stat, df=1)

print(p_value)
0.0

Question 5

Prompt:

Explain the importance for estimating a multilevel model for nested data. What are the issues one may encounter if utilizing a pooled model?

Response: Using a multilevel model for nested data is essential to appropriately account for the hierarchical structure of the data and to obtain accurate and reliable estimates of fixed and random effects. Being from Las Vegas, I always love a backyard pool. However, utilizing a pooled model can lead to biased estimates, underestimated standard errors, and improper inference, making it inadequate for analyzing nested data.


Simple OLS

\(Y = B_0 + B_1X_1 + e\)

Fixed Effects Model (group-specific intercepts, fixed slopes)

\(Y_{it} = B_{X_{it}} + a_i\)

Random Intercept Model

\(Y_j = B_{0j} + B_1X_1 + e + u_{0j}\)

Random Slope Model

\(Y_{ij} = B_{0i} + B_{1ij}X_{1ij} + e + u_{1j}\)

Random Slope and Intercept Model

\(Y_{ij} = B_{0j} + B_1X_{1ij} + e + u_{0j} + u_{1j}\)

```{r}

# Simple OLS
lm(dep_var ~ indep_var1 + indep_var2, data = my_data)

# Fixed effects (fixed slopes, group-specific via dummifying the group)
lmer(dep_var ~ indep_var1 + indep_var2 + (as.factor(group_var)), data = my_data)

# Random Intercept (with fixed slopes)
lmer(depe_var ~ indep_var1 + indep_var2 + (1 | group_var), data = my_data)

# Random Slope (fixed intercept) - nonsensical 
lmer(dep_var ~ indep_var1 + indep_var2 + (0 + indep_var1 | group_var), data = my_data)

# Random Slope and Random Intercept
lmer(dep_var ~ indep_var1 + indep_var2 + (1 + indep_var1 | group_var), data = my_data)

```
```{python}

# Simple OLS
lm("dep_var ~ indep_var1 + indep_var2", my_data)

# Fixed effects (fixed slopes, group-specific intercepts represented)
import statsmodels.formula.api as smf
smf.mixedlm("dep_var ~ indep_var1 + indep_var2", my_data, groups="group_var")

# Random Intercept
smf.mixedlm("dep_var ~ indep_var1 + indep_var2", my_data, groups="group_var")

# Random Slope on indep_var1 (fixed intercept)
smf.mixedlm("dep_var ~ indep_var1 + indep_var2", my_data, groups = "group_var", re_formula = "0 + indep_var1")

# Random Slope and Intercept
smf.mixedlm("dep_var ~ indep_var1 + indep_var2", my_data, groups="group_var", re_formula = "1 + indep_var1")
```

AND, what is the role of REML?

Restricted Maximum Likelihood - use with small data. Account for the loss of information as the number of parameters (and dof) grows large.

```{r}

lmer(langpost ~ 
       iq_verb + 
       mean_iqverb + 
       (1|schoolnr),
     data = dataSchools,
     REML = FALSE)
```