Packages

Loading relevant packages:

library(lme4) # Hierarchical models
library(stargazer) # Regression tables
library(dplyr) # Data wrangling and pipes
library(ggplot2) # Graphics
library(ggeffects) # ggpredict

Data

Loading the relevant data:

load('/Users/davidhamad/Documents/Cand.Scient.Pol/2. Semester/Statistical models beyond linear regression - applied statistics for political scientists/4-5) Hierarchical data structures/MEP.rda')

df <- MEP

Exercise 1: Descriptive statistics

Q1: What is the size of the data? How many individual MEPs are there in the data set? How often are they observed?

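A1: 7143 observations, 116 variables, 1174 unique MEPs, and 10 periods. Most MEPs are observed either 5 times (711 MEPs) or 10 times (298 MEPs); only 41 are observed once.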

nrow(df) # Observations
[1] 7143
ncol(df) # Variables
[1] 116
n_distinct(df$ID) # Unique MEPs
[1] 1174
n_distinct(df$Period) # Periods
[1] 10
table(table(df$ID)) # Distribution

  1   2   3   4   5   6   7   8   9  10 
 41  27  21  33 711  11  14   8  10 298 

Q2: How is the relevant variation here? How are the variables measured? What is the within-group and between group variation?

A2: The dependent variable (ShareOfLocalAssistants) is continuous (0 to 43). ProxNatElection measures how close we are to the next national parliamentary election (-4 to 0) and is continuous. EPElection indicates whether the period contains a European Parliament election (no/yes = 0/1), which makes it binary. OpenList is also binary: 0 is closed list (party-centered) and 1 is open list (candidate-centered). NationalPartyCentered is binary as well: 0 is more candidate-centered and 1 is more party-centered with regard to national parliamentary elections. The electoral calendar variables (ProxNatElection & EPElection) have within-individual variation because they change over time for the same MEP. The electoral system variables (OpenList & NationalPartyCentered) have only between-individual variation because they never change for the same MEP, since no country changes its electoral system during the study period. The dependent variable (ShareOfLocalAssistants) has both within-individual and between-individual variation.

summary(df$ShareOfLocalAssistants) # Continuous
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.000   2.000   2.493   3.000  43.000 
summary(df$ProxNatElection) # Continuous
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-4.00000 -3.27671 -2.17260 -2.14068 -1.04658 -0.06575 
table(df$EPElection) # Binary

   0    1 
6404  739 
table(df$OpenList) # Binary

   0    1 
3840 3303 
table(df$NationalPartyCentered) # Binary

   0    1 
2952 4191 
# Within-individual variation. How many MEPs experience change?
df %>%
  group_by(ID) %>%
  summarise(
    change_y    = n_distinct(ShareOfLocalAssistants) > 1,
    change_prox = n_distinct(ProxNatElection) > 1,
    change_ep   = n_distinct(EPElection) > 1,
    change_open = n_distinct(OpenList) > 1,
    change_npc  = n_distinct(NationalPartyCentered) > 1
  ) %>%
  summarise(
    y    = paste0(sum(change_y), " / ", n()),
    prox = paste0(sum(change_prox), " / ", n()),
    ep   = paste0(sum(change_ep), " / ", n()),
    open = paste0(sum(change_open), " / ", n()),
    npc  = paste0(sum(change_npc), " / ", n())
  )

Q3: How could I model this? What would be the advantages and drawbacks? I.e. what variation would I leverage?

A3: Pooled OLS treats all 7143 observations as independent. Simple, and can estimate the effect of all variables. But the independence assumption is violated. The same MEP appears up to 10 times. This means standard errors will be too small, making us overconfident in our results. Fixed effects (individual level) adds a dummy for each MEP, leveraging only within-individual variation. The advantage is strong causal identification because it controls for everything that is constant about each MEP. Drawback is that OpenList and NationalPartyCentered would be dropped from the model because they have zero within-individual variation. So we cannot test hypothesis 1 (personal vote) with this model. Random effects/hierarchical model (varying intercepts) pools between- and within-individual variation using shrinkage. The advantage is that it can estimate effects of both electoral calendar (within) and electoral system (between), and corrects standard errors for the clustered data structure. Drawback is that it assumes that the group-level effects are uncorrelated with the predictors, which is a stronger assumption than fixed effects. Since we want to test both hypotheses, the random effects model is the most suitable choice here.


Exercise 2: Choice of covariates

Q1: I want to model the change in staff size as a function of the electoral calendar. How can I do this?

A1: To model the change in staff size, I need to leverage within-individual variation by using a varying-intercept model at the individual level (either fixed effects or random effects). This controls for everything constant about each MEP and isolates how their staff size changes over time as a function of the electoral calendar.


Q2: I’m considering a fixed-effects model at the individual level. Please advise me on the following covariates: electoral calendar, electoral system, labor cost, gender, age, and lag of the dependent variable.

A2: The electoral calendar variables can be estimated. They vary over time within each MEP and are the main predictors of interest. The electoral system variables would be dropped. They have zero within-individual variation because no MEP changes electoral system, so the individual fixed effect absorbs them completely. LaborCost technically survives. But it is primarily a between-country variable, and since individual fixed effects already absorb nationality, most of its explanatory power is already controlled away. Gender would be dropped. No MEP changes gender, so there is zero within-variation. Age survives. But it increases at the exact same rate for everyone, which makes it nearly collinear with period effects. The lag of the dependent variable can be estimated: last period’s staff size predicts this period’s. It controls for persistence in staffing decisions.


Q3: Now, I’m considering a fixed-effect on time-period. Would this be a good idea? What would be the effects of my two election variables? (EPElection & ProxNatElection).

A3: Adding period fixed effects would not be a good idea if we want to estimate the effect of EPElection, because it would be perfectly collinear with the period dummies and therefore dropped. ProxNatElection would survive because it varies across countries within the same period (different countries hold national elections at different times). So period fixed effects would prevent us from testing the EP election hypothesis.


Q4: How do you think the random-effects model would perform here?

A4: The random effects model would perform well here because it can estimate all our variables. The within-individual variables (ProxNatElection & EPElection) are estimated using within-variation, while the between-individual variables (OpenList & NationalPartyCentered) are estimated using between-variation. It also corrects the standard errors for the clustered data structure (same MEP observed multiple times).


Exercise 3: Fit and interpret the model

Q1: Fit the following model as a pooled linear regression, fixed-effects model and a random effects model (with varying individual intercepts): ShareOfLocalAssistants ~ ProxNatElection + NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag. Present the results in stargazer().

A1: See the models below:

# Pooled OLS
mod.pool <- lm(ShareOfLocalAssistants ~ ProxNatElection + NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag, data = df)

# Fixed effects (individual level)
mod.fe <- lm(ShareOfLocalAssistants ~ ProxNatElection + NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag + as.factor(ID), data = df)

# Random effects (varying individual intercepts)
mod.re <- lmer(ShareOfLocalAssistants ~ ProxNatElection + NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag + (1|ID), data = df)

# Compare
stargazer(mod.pool, mod.fe, mod.re, 
          type = "text", 
          omit = "as.factor",
          column.labels = c("Pooled", "Fixed effects", "Random effects"))

================================================================================================
                                                   Dependent variable:                          
                          ----------------------------------------------------------------------
                                                  ShareOfLocalAssistants                        
                                                    OLS                               linear    
                                                                                  mixed-effects 
                                    Pooled                   Fixed effects        Random effects
                                      (1)                         (2)                  (3)      
------------------------------------------------------------------------------------------------
ProxNatElection                     0.035**                     0.032**              0.034**    
                                    (0.014)                     (0.014)              (0.014)    
                                                                                                
NationalCandidateCentered          0.177***                     -0.083               0.360***   
                                    (0.038)                     (1.330)              (0.066)    
                                                                                                
EPElection                         0.298***                    0.378***              0.355***   
                                    (0.059)                     (0.052)              (0.054)    
                                                                                                
OpenList                           0.213***                      0.795               0.360***   
                                    (0.036)                     (1.793)              (0.063)    
                                                                                                
ShareOfLocal.lag                   0.838***                    0.418***              0.636***   
                                    (0.006)                     (0.009)              (0.008)    
                                                                                                
Constant                           0.400***                      0.282               0.679***   
                                    (0.049)                     (1.268)              (0.070)    
                                                                                                
------------------------------------------------------------------------------------------------
Observations                         7,047                       7,047                7,047     
R2                                   0.727                       0.834                          
Adjusted R2                          0.726                       0.802                          
Log Likelihood                                                                     -12,742.830  
Akaike Inf. Crit.                                                                   25,501.660  
Bayesian Inf. Crit.                                                                 25,556.540  
Residual Std. Error            1.489 (df = 7041)           1.268 (df = 5894)                    
F Statistic               3,741.136*** (df = 5; 7041) 25.709*** (df = 1152; 5894)               
================================================================================================
Note:                                                                *p<0.1; **p<0.05; ***p<0.01

Q2: What do you find? What happened?

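A2: The fixed effects model cannot identify NationalCandidateCentered and OpenList because they have no within-individual variation: they are perfectly collinear with the MEP dummies. lm() still prints coefficients for them (-0.083 and 0.795), most likely because they enter the formula before as.factor(ID), so R drops two of the ID dummies instead. These values are therefore not interpretable, which shows in the very large standard errors (1.330 and 1.793). The estimates for ProxNatElection and EPElection are similar across the three models, while the coefficient on ShareOfLocal.lag differs markedly (0.838, 0.418 and 0.636). The random effects model solves the identification problem by pooling between- and within-variation, estimating all variables with reasonable precision.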


Q3: Discuss my choice of variables.

A3: ProxNatElection and EPElection are good choices. They directly test hypothesis 2 (electoral calendar). Both have within-individual variation, so they can be estimated in all model types. NationalCandidateCentered and OpenList are good choices for testing hypothesis 1 (personal vote). They capture electoral system incentives at two levels (national and EU). However, they only have between-individual variation, which means they cannot be estimated with individual fixed effects and require a pooled or random effects model. ShareOfLocal.lag is a useful control variable. It accounts for persistence in staffing decisions, so the model estimates the effect of the other variables on changes in staff size rather than the level. However, including a lagged dependent variable in a random effects model can introduce bias, because the lag is correlated with the random intercept by construction: an MEP with a high intercept will also tend to have a high lagged value. What might be missing is a control for time-varying confounders like Reform2016 (the spending cap), which we know affected local staff size for all MEPs. Including it could improve the estimates of the other variables.


Exercise 4: Interpretation

Context: Interpret the results from the random-effects model. What is the effect of the two measures of the electoral calendar and the electoral system?

Q1: Create two scenarios and interpret either the marginal effect or the first-difference. Justify your choice.

A1: The marginal effect is the change in y for a one-unit increase in x. The first-difference is the change in y for a specific, chosen change in x. I use first-differences because the electoral system variables are binary (the only meaningful change is from 0 to 1), and for ProxNatElection a first-difference based on the full range gives a more substantively interpretable result than a one-unit marginal effect. In scenario 1 (electoral calendar), when an MEP moves from the furthest point from a national election (ProxNatElection = -4) to immediately before an election (ProxNatElection = 0), the predicted increase in local staff is 0.034 × 4 ≈ 0.135. In other words, roughly 1 in 7 MEPs would hire one additional local assistant as the election approaches. In scenario 2 (electoral system), MEPs in candidate-centered national electoral systems have on average 0.36 more local assistants than MEPs in party-centered systems. In other words, roughly 1 in 3 MEPs from candidate-centered systems hires one additional local assistant compared to their party-centered counterparts.

# Scenario 1: Electoral calendar (ProxNatElection)
# An MEP goes from far from election (-4) to close to election (0)
# Holding all other variables constant
fd_prox <- fixef(mod.re)["ProxNatElection"] * (0 - (-4))
fd_prox
ProxNatElection 
      0.1351446 
# Scenario 2: Electoral system (NationalCandidateCentered)
# Comparing an MEP in a party-centered system (0) vs candidate-centered system (1)
# Holding all other variables constant
fd_ncc <- fixef(mod.re)["NationalCandidateCentered"] * (1 - 0)
fd_ncc
NationalCandidateCentered 
                0.3604779 

Q2: Descriptive statistics from a survey conducted among MEPs shows that 31 % claim that they envision staying in Parliament for 10 or more years (i.e. they will seek reelection). How does this change your understanding of the results?

A2: Our model estimates the average effect across all MEPs. But the theory assumes that MEPs respond to electoral incentives, which only makes sense for MEPs who actually plan to seek reelection. If only 31 % of MEPs intend to stay, then 69 % have no reason to respond to the electoral calendar or invest in personal vote cultivation. These 69 % are likely not changing their staff at all, which dilutes our estimates toward zero. The true effect for the 31 % who actually seek reelection is probably much larger than what we estimate. This means our results are conservative, and the electoral incentives for those who care about reelection are likely stronger than the model suggests.


Q3: Illustrate the effects. What plot would you opt for?

A3: I would opt for effect plots using ggpredict() from the ggeffects package. For ProxNatElection (continuous), an effect plot shows the predicted staff size across the full range of the variable, with a confidence interval. For the binary variables (EPElection, NationalCandidateCentered, OpenList), the same type of plot shows the predicted value at 0 and 1.

# Effect of ProxNatElection (continuous)
ggpredict(mod.re, terms = "ProxNatElection") %>%
  plot() +
  ggtitle("Effect of proximity to national election on local staff size")

# Effect of EPElection (binary)
ggpredict(mod.re, terms = "EPElection") %>%
  plot() +
  ggtitle("Effect of EP election on local staff size")

# Effect of NationalCandidateCentered (binary)
ggpredict(mod.re, terms = "NationalCandidateCentered") %>%
  plot() +
  ggtitle("Effect of candidate-centered system on local staff size")

# Effect of OpenList (binary)
ggpredict(mod.re, terms = "OpenList") %>%
  plot() +
  ggtitle("Effect of open list system on local staff size")


Q4: In your opinion, are the two hypotheses supported? How big are the electoral incentives?

A4: Both hypotheses find some support, but the effects are substantively small. Hypothesis 1 (personal vote) is supported. MEPs in candidate-centered systems (NationalCandidateCentered) have 0.36 more local assistants, and MEPs with open lists (OpenList) also have 0.36 more. Both are statistically significant. But substantively, this means roughly 1 in 3 MEPs hires one additional assistant due to electoral system incentives. Hypothesis 2 (electoral calendar) is supported, but the effect is weak. Moving from the furthest point to immediately before a national election increases staff by only 0.135 (roughly 1 in 7 MEPs hiring one extra assistant). EP elections have a larger effect (0.355), comparable to the electoral system variables. Overall, the electoral incentives exist, but they are modest. MEPs do appear to use parliamentary resources for electoral purposes, but the magnitude is small. Combined with the fact that only 31 % of MEPs plan to seek reelection, the true effect among those who actually respond to electoral incentives is likely larger than what we estimate here.


Exercise 5: Interaction effect

Context: I have a third hypothesis: I believe MEPs with higher incentives to cultivate a personal vote are more sensitive to the electoral calendar.

Q1 + Q2: How could I model this? Can you implement your suggestion using a random-intercept model?

A1 + A2: This hypothesis says that the effect of the electoral calendar depends on the electoral system. That is an interaction effect. I would include an interaction term between ProxNatElection and NationalCandidateCentered.

mod.int <- lmer(ShareOfLocalAssistants ~ ProxNatElection * NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag + (1|ID), data = df)
summary(mod.int)
Linear mixed model fit by REML ['lmerMod']
Formula: ShareOfLocalAssistants ~ ProxNatElection * NationalCandidateCentered +  
    EPElection + OpenList + ShareOfLocal.lag + (1 | ID)
   Data: df

REML criterion at convergence: 25482

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-19.0678  -0.3163  -0.1004   0.2481  21.4927 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 0.7981   0.8934  
 Residual             1.7579   1.3259  
Number of obs: 7047, groups:  ID, 1150

Fixed effects:
                                           Estimate Std. Error t value
(Intercept)                                0.563795   0.079659   7.078
ProxNatElection                           -0.016548   0.021642  -0.765
NationalCandidateCentered                  0.541118   0.089391   6.053
EPElection                                 0.346492   0.053696   6.453
OpenList                                   0.356806   0.063068   5.657
ShareOfLocal.lag                           0.639107   0.007892  80.979
ProxNatElection:NationalCandidateCentered  0.084216   0.028004   3.007

Correlation of Fixed Effects:
            (Intr) PrxNtE NtnlCC EPElct OpnLst ShrOL.
ProxNtElctn  0.607                                   
NtnlCnddtCn -0.727 -0.544                            
EPElection  -0.026  0.063 -0.036                     
OpenList    -0.410  0.012  0.067 -0.003              
ShrOfLcl.lg -0.151 -0.024 -0.052 -0.013 -0.084       
PrxNtEl:NCC -0.473 -0.774  0.678 -0.051 -0.013  0.046

Q3: Can you interpret the results following Berry et al.'s recommendation? What is the effect of: national electoral calendar when electoral system is candidate-centered, national electoral calendar when electoral system is party-centered, national electoral system when elections are far away, and national electoral system when elections are tomorrow?

A3: Following Berry et al., I compute the conditional effects at meaningful values rather than interpreting the interaction term alone. Effect of electoral calendar when candidate-centered (NCC = 1): 0.068 per unit. MEPs in candidate-centered systems hire more staff as elections approach. Effect of electoral calendar when party-centered (NCC = 0): -0.017 per unit. Close to zero and not significant (t = -0.765). MEPs in party-centered systems do not respond to the electoral calendar. Effect of electoral system when elections are far away (Prox = -4): 0.204. A small difference between the two systems when there is no electoral pressure. Effect of electoral system when elections are tomorrow (Prox = 0): 0.541. A much larger difference. Candidate-centered MEPs have about half an assistant more than party-centered MEPs right before an election. The pattern is clear. The two systems diverge most when elections are near. When elections are far away, it matters much less whether the system is candidate-centered or party-centered. This supports hypothesis 3.

# Compute conditional effects
b_prox <- fixef(mod.int)["ProxNatElection"]
b_ncc  <- fixef(mod.int)["NationalCandidateCentered"]
b_int  <- fixef(mod.int)["ProxNatElection:NationalCandidateCentered"]
# Effect of calendar when candidate-centered
b_prox + b_int * 1
ProxNatElection 
     0.06766755 
# Effect of calendar when party-centered
b_prox + b_int * 0
ProxNatElection 
    -0.01654839 
# Effect of system when elections far away (Prox = -4)
b_ncc + b_int * (-4)
NationalCandidateCentered 
                0.2042542 
# Effect of system when elections tomorrow (Prox = 0)
b_ncc + b_int * 0
NationalCandidateCentered 
                 0.541118 
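
The conditional effects above are point estimates only. As a rough sketch, their standard errors can be approximated from the numbers printed in summary(mod.int), using the variance of a linear combination: Var(b_main + m * b_int) = Var(b_main) + m^2 * Var(b_int) + 2 * m * Cov(b_main, b_int). The helper `cond_effect()` is hypothetical, and because the printed correlations are rounded to three decimals the results are approximate; with the fitted object at hand, vcov(mod.int) would give the exact covariances.

```r
# Estimates, standard errors and correlations copied from summary(mod.int)
b   <- c(prox = -0.016548, ncc = 0.541118, int = 0.084216)
se  <- c(prox =  0.021642, ncc = 0.089391, int = 0.028004)
rho <- c(prox_int = -0.774, ncc_int = 0.678)  # "Correlation of Fixed Effects"

# Conditional effect b_main + b_int * m with an approximate standard error
# (hypothetical helper; m is the value of the moderating variable)
cond_effect <- function(main, m) {
  cov_mi <- rho[[paste0(main, "_int")]] * se[[main]] * se[["int"]]
  est <- b[[main]] + b[["int"]] * m
  s   <- sqrt(se[[main]]^2 + m^2 * se[["int"]]^2 + 2 * m * cov_mi)
  c(estimate = est, se = s, t = est / s)
}

round(rbind(
  calendar_if_candidate = cond_effect("prox", 1),
  calendar_if_party     = cond_effect("prox", 0),
  system_if_far         = cond_effect("ncc", -4),
  system_if_tomorrow    = cond_effect("ncc", 0)
), 3)
```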
---
title: "Hierarchical modeling"
output: html_notebook
---

```{css, echo=FALSE}
body{text-align: Justify;}
h1.title {border-bottom: 3px solid black; padding-bottom: 10px;}
```

```{r, echo=FALSE}
setwd('/Users/davidhamad/Documents/Cand.Scient.Pol/2. Semester/Statistical models beyond linear regression - applied statistics for political scientists/4-5) Hierarchical data structures')
```


### Packages
Loading relevant packages:

```{r}
library(lme4) # Hierarchical models
library(stargazer) # Regression tables
library(dplyr) # Data wrangling and pipes
library(ggplot2) # Graphics
library(ggeffects) # ggpredict
```

---

### Data
Loading the relevant data:

```{r}
load('/Users/davidhamad/Documents/Cand.Scient.Pol/2. Semester/Statistical models beyond linear regression - applied statistics for political scientists/4-5) Hierarchical data structures/MEP.rda')

df <- MEP
```

---

### Exercise 1: Descriptive statistics
**Q1:** What is the size of the data? How many individual MEPs are there in the data set? How often are they observed?

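**A1:** 7143 observations, 116 variables, 1174 unique MEPs, and 10 periods. Most MEPs are observed either 5 times (711 MEPs) or 10 times (298 MEPs); only 41 are observed once.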

```{r}
nrow(df) # Observations
ncol(df) # Variables
n_distinct(df$ID) # Unique MEPs
n_distinct(df$Period) # Periods
table(table(df$ID)) # Distribution
```

---

**Q2:** How is the relevant variation here? How are the variables measured? What is the within-group and between group variation?

**A2:** The dependent variable (ShareOfLocalAssistants) is continuous (0 to 43). ProxNatElection measures how close we are to the next national parliamentary election (-4 to 0) and is continuous. EPElection indicates whether the period contains a European Parliament election (no/yes = 0/1), which makes it binary. OpenList is also binary: 0 is closed list (party-centered) and 1 is open list (candidate-centered). NationalPartyCentered is binary as well: 0 is more candidate-centered and 1 is more party-centered with regard to national parliamentary elections. The electoral calendar variables (ProxNatElection & EPElection) have within-individual variation because they change over time for the same MEP. The electoral system variables (OpenList & NationalPartyCentered) have only between-individual variation because they never change for the same MEP, since no country changes its electoral system during the study period. The dependent variable (ShareOfLocalAssistants) has both within-individual and between-individual variation.


```{r}
summary(df$ShareOfLocalAssistants) # Continuous
summary(df$ProxNatElection) # Continuous
table(df$EPElection) # Binary
table(df$OpenList) # Binary
table(df$NationalPartyCentered) # Binary
```

```{r}
# Within-individual variation. How many MEPs experience change?
df %>%
  group_by(ID) %>%
  summarise(
    change_y    = n_distinct(ShareOfLocalAssistants) > 1,
    change_prox = n_distinct(ProxNatElection) > 1,
    change_ep   = n_distinct(EPElection) > 1,
    change_open = n_distinct(OpenList) > 1,
    change_npc  = n_distinct(NationalPartyCentered) > 1
  ) %>%
  summarise(
    y    = paste0(sum(change_y), " / ", n()),
    prox = paste0(sum(change_prox), " / ", n()),
    ep   = paste0(sum(change_ep), " / ", n()),
    open = paste0(sum(change_open), " / ", n()),
    npc  = paste0(sum(change_npc), " / ", n())
  )
```
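
A complementary way to look at within- and between-variation is to split each variable's spread into a between-unit part (the standard deviation of the unit means) and a within-unit part (the standard deviation of the deviations from those means). The block below is a minimal base-R sketch on a small simulated panel; the data frame `toy` and the helper `within_between()` are hypothetical and not part of the MEP data.

```r
# Hypothetical toy panel: 6 units observed 4 times each (not the MEP data)
set.seed(1)
toy <- data.frame(id = rep(1:6, each = 4))
toy$x <- rnorm(nrow(toy))            # varies within and between units
toy$z <- as.numeric(toy$id > 3)      # constant within each unit

# Split a variable's spread into between-unit and within-unit parts
within_between <- function(v, g) {
  g_mean <- ave(v, g)                # unit mean repeated on every row
  c(between = sd(tapply(v, g, mean)),  # spread of the unit means
    within  = sd(v - g_mean))          # spread around the unit means
}

round(rbind(x = within_between(toy$x, toy$id),
            z = within_between(toy$z, toy$id)), 3)
```

A variable that is constant within units, like `z` here, has a within standard deviation of zero, which is the situation described for the electoral system variables.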

---

**Q3:** How could I model this? What would be the advantages and drawbacks? I.e. what variation would I leverage?

**A3:** Pooled OLS treats all 7143 observations as independent. Simple, and can estimate the effect of all variables. But the independence assumption is violated. The same MEP appears up to 10 times. This means standard errors will be too small, making us overconfident in our results. Fixed effects (individual level) adds a dummy for each MEP, leveraging only within-individual variation. The advantage is strong causal identification because it controls for everything that is constant about each MEP. Drawback is that OpenList and NationalPartyCentered would be dropped from the model because they have zero within-individual variation. So we cannot test hypothesis 1 (personal vote) with this model. Random effects/hierarchical model (varying intercepts) pools between- and within-individual variation using shrinkage. The advantage is that it can estimate effects of both electoral calendar (within) and electoral system (between), and corrects standard errors for the clustered data structure. Drawback is that it assumes that the group-level effects are uncorrelated with the predictors, which is a stronger assumption than fixed effects. Since we want to test both hypotheses, the random effects model is the most suitable choice here.

---

### Exercise 2: Choice of covariates

**Q1:** I want to model the change in staff size as a function of the electoral calendar. How can I do this?

**A1:** To model the change in staff size, I need to leverage within-individual variation by using a varying-intercept model at the individual level (either fixed effects or random effects). This controls for everything constant about each MEP and isolates how their staff size changes over time as a function of the electoral calendar.
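
For the fixed-effects variant, "leveraging within-individual variation" can be made concrete: including one dummy per unit gives the same slope as regressing the demeaned outcome on the demeaned predictor. The sketch below checks this on simulated data in base R; all names (`toy`, `alpha`, `x_dm`, `y_dm`) are hypothetical and the numbers have nothing to do with the MEP data.

```r
# Hypothetical toy panel: 8 units observed 5 times each
set.seed(2)
toy <- data.frame(id = rep(1:8, each = 5))
alpha <- rnorm(8)[toy$id]                    # unit-specific intercepts
toy$x <- rnorm(nrow(toy)) + 0.5 * alpha      # predictor correlated with the intercepts
toy$y <- alpha + 0.3 * toy$x + rnorm(nrow(toy), sd = 0.5)

# (a) Fixed effects via one dummy per unit
b_dummy <- coef(lm(y ~ x + factor(id), data = toy))["x"]

# (b) Fixed effects via demeaning: subtract each unit's own mean
toy$x_dm <- toy$x - ave(toy$x, toy$id)
toy$y_dm <- toy$y - ave(toy$y, toy$id)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = toy))["x_dm"]

# Both routes give the same slope (up to numerical precision)
all.equal(unname(b_dummy), unname(b_within))
```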

---

**Q2:** I'm considering a fixed-effects model at the individual level. Please advise me on the following covariates: electoral calendar, electoral system, labor cost, gender, age, and lag of the dependent variable.

**A2:** The electoral calendar variables can be estimated. They vary over time within each MEP and are the main predictors of interest. The electoral system variables would be dropped. They have zero within-individual variation because no MEP changes electoral system, so the individual fixed effect absorbs them completely. LaborCost technically survives. But it is primarily a between-country variable, and since individual fixed effects already absorb nationality, most of its explanatory power is already controlled away. Gender would be dropped. No MEP changes gender, so there is zero within-variation. Age survives. But it increases at the exact same rate for everyone, which makes it nearly collinear with period effects. The lag of the dependent variable can be estimated: last period's staff size predicts this period's. It controls for persistence in staffing decisions.

---

**Q3:** Now, I’m considering a fixed-effect on time-period. Would this be a good idea? What would be the effects of my two election variables? (EPElection & ProxNatElection).

**A3:** Adding period fixed effects would not be a good idea if we want to estimate the effect of EPElection, because it would be perfectly collinear with the period dummies and therefore dropped. ProxNatElection would survive because it varies across countries within the same period (different countries hold national elections at different times). So period fixed effects would prevent us from testing the EP election hypothesis.
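
The collinearity argument can be illustrated with a small simulation. In the sketch below, `ep` takes the same value for every unit in a given period, while `prox` differs between two made-up countries within the same period. All names and values are hypothetical; the point is only which coefficient lm() can still estimate once period dummies are in the model.

```r
# Hypothetical toy panel: 8 units in 2 countries, observed over 5 periods
set.seed(3)
toy <- expand.grid(period = 1:5, id = 1:8)
toy$country <- ifelse(toy$id <= 4, "A", "B")
toy$ep <- as.numeric(toy$period == 3)          # same value for everyone in a period
offset <- ifelse(toy$country == "A", 0, 2)     # countries sit at different points of the cycle
toy$prox <- -((toy$period + offset) %% 4)      # differs by country within a period
toy$y <- 1 + 0.4 * toy$ep + 0.1 * toy$prox + rnorm(nrow(toy))

# Period dummies come first, so the aliased period-level variable is dropped
fit <- lm(y ~ factor(period) + ep + prox, data = toy)
coef(fit)[c("ep", "prox")]   # ep is NA (absorbed), prox is estimated
```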

---

**Q4:** How do you think the random-effects model would perform here?

**A4:** The random effects model would perform well here because it can estimate all our variables. The within-individual variables (ProxNatElection & EPElection) are estimated using within-variation, while the between-individual variables (OpenList & NationalPartyCentered) are estimated using between-variation. It also corrects the standard errors for the clustered data structure (same MEP observed multiple times).

---

### Exercise 3: Fit and interpret the model

**Q1:** Fit the following model as a pooled linear regression, fixed-effects model and a random effects model (with varying individual intercepts): ShareOfLocalAssistants ~ ProxNatElection + NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag. Present the results in stargazer().

**A1:** See the models below:

```{r}
# Pooled OLS
mod.pool <- lm(ShareOfLocalAssistants ~ ProxNatElection + NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag, data = df)

# Fixed effects (individual level)
mod.fe <- lm(ShareOfLocalAssistants ~ ProxNatElection + NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag + as.factor(ID), data = df)

# Random effects (varying individual intercepts)
mod.re <- lmer(ShareOfLocalAssistants ~ ProxNatElection + NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag + (1|ID), data = df)

# Compare
stargazer(mod.pool, mod.fe, mod.re, 
          type = "text", 
          omit = "as.factor",
          column.labels = c("Pooled", "Fixed effects", "Random effects"))
```

---

**Q2:** What do you find? What happened?

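**A2:** The fixed effects model cannot identify NationalCandidateCentered and OpenList because they have no within-individual variation: they are perfectly collinear with the MEP dummies. lm() still prints coefficients for them (-0.083 and 0.795), most likely because they enter the formula before as.factor(ID), so R drops two of the ID dummies instead. These values are therefore not interpretable, which shows in the very large standard errors (1.330 and 1.793). The estimates for ProxNatElection and EPElection are similar across the three models, while the coefficient on ShareOfLocal.lag differs markedly (0.838, 0.418 and 0.636). The random effects model solves the identification problem by pooling between- and within-variation, estimating all variables with reasonable precision.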
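
When a time-invariant variable is perfectly collinear with the unit dummies, which coefficient lm() reports as NA depends on the order of the terms in the formula. The sketch below shows this on simulated data; `toy`, `z`, `fit_before` and `fit_after` are hypothetical names, and the example is only meant to illustrate the mechanism, not to reproduce the MEP models.

```r
# Hypothetical toy panel: 6 units observed 4 times each
set.seed(4)
toy <- data.frame(id = rep(1:6, each = 4))
toy$z <- as.numeric(toy$id > 3)       # time-invariant, like an electoral system
toy$x <- rnorm(nrow(toy))             # time-varying
toy$y <- toy$id / 2 + 0.3 * toy$x + rnorm(nrow(toy))

fit_before <- lm(y ~ x + z + factor(id), data = toy)  # z listed before the dummies
fit_after  <- lm(y ~ x + factor(id) + z, data = toy)  # z listed after the dummies

coef(fit_before)["z"]                     # a number, but it only reflects which dummy was dropped
names(which(is.na(coef(fit_before))))     # one unit dummy is dropped instead of z
coef(fit_after)["z"]                      # NA: z is absorbed by the unit dummies
c(fit_before$rank, fit_after$rank)        # both fits have the same rank
```

In both fits the coefficient on `x` is the same; only the bookkeeping of the collinear columns changes.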

---

**Q3:** Discuss my choice of variables.

**A3:** ProxNatElection and EPElection are good choices. They directly test hypothesis 2 (electoral calendar). Both have within-individual variation, so they can be estimated in all model types. NationalCandidateCentered and OpenList are good choices for testing hypothesis 1 (personal vote). They capture electoral system incentives at two levels (national and EU). However, they only have between-individual variation, which means they cannot be estimated with individual fixed effects and require a pooled or random effects model. ShareOfLocal.lag is a useful control variable. It accounts for persistence in staffing decisions, so the model estimates the effect of the other variables on changes in staff size rather than the level. However, including a lagged dependent variable in a random effects model can introduce bias, because the lag is correlated with the random intercept by construction: an MEP with a high intercept will also tend to have a high lagged value. What might be missing is a control for time-varying confounders like Reform2016 (the spending cap), which we know affected local staff size for all MEPs. Including it could improve the estimates of the other variables.

---

### Exercise 4: Interpretation

**Context:** Interpret the results from the random-effects model. What is the effect of the two measures of the electoral calendar and the electoral system?

**Q1:** Create two scenarios and interpret either the marginal effect or the first-difference. Justify your choice.

**A1:** A marginal effect is the change in y for a one-unit increase in x. A first-difference is the change in y for a specific, chosen change in x. I use first-differences because the electoral system variables are binary (the only meaningful change is from 0 to 1), and because for ProxNatElection a first-difference over the full range is more substantively interpretable than a one-unit marginal effect. In scenario 1 (electoral calendar), when an MEP moves from the furthest point from a national election (ProxNatElection = -4) to immediately before an election (ProxNatElection = 0), the predicted increase in local staff is 4 × 0.034 ≈ 0.135 (computed from the unrounded coefficient). In other words, roughly 1 in 7 MEPs would hire one additional local assistant as the election approaches. In scenario 2 (electoral system), MEPs in candidate-centered national electoral systems have on average 0.36 more local assistants than MEPs in party-centered systems. In other words, roughly 1 in 3 MEPs from candidate-centered systems hires one additional local assistant compared to their party-centered counterparts.

```{r}
# Scenario 1: Electoral calendar (ProxNatElection)
# An MEP goes from far from election (-4) to close to election (0)
# Holding all other variables constant
fd_prox <- fixef(mod.re)["ProxNatElection"] * (0 - (-4))
fd_prox
```

```{r}
# Scenario 2: Electoral system (NationalCandidateCentered)
# Comparing an MEP in a party-centered system (0) vs candidate-centered system (1)
# Holding all other variables constant
fd_ncc <- fixef(mod.re)["NationalCandidateCentered"] * (1 - 0)
fd_ncc
```

---

**Q2:** Descriptive statistics from a survey conducted among MEPs show that 31 % claim that they envision staying in Parliament for 10 or more years (i.e. they will seek reelection). How does this change your understanding of the results?

**A2:** Our model estimates the average effect across all MEPs. But the theory assumes that MEPs respond to electoral incentives, which only makes sense for MEPs who actually plan to seek reelection. If only 31 % of MEPs intend to stay, then the remaining 69 % have little reason to respond to the electoral calendar or to invest in personal vote cultivation. These 69 % are likely changing their staff much less, if at all, which pulls our estimates toward zero. The true effect for the 31 % who actually seek reelection is probably much larger than what we estimate. This means our results are conservative, and the electoral incentives for those who care about reelection are likely stronger than the model suggests.
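
A back-of-the-envelope sketch of the dilution argument, under the strong and purely illustrative assumption that the 69 % respond with an effect of exactly zero. The estimated average is then a weighted average of the two groups, so the effect among reelection-seekers is the estimate divided by their share. The inputs are the first-differences reported in Q1; the variable names are my own:

```r
# Assumption (illustrative only): MEPs not seeking reelection have an effect of exactly 0
share_seekers <- 0.31

# First-differences reported in Exercise 4, Q1
fd_calendar <- 0.135  # ProxNatElection from -4 to 0
fd_system   <- 0.36   # NationalCandidateCentered from 0 to 1

# average = share * effect_seekers + (1 - share) * 0
# => effect_seekers = average / share
implied_calendar <- fd_calendar / share_seekers
implied_system   <- fd_system / share_seekers
round(c(calendar = implied_calendar, system = implied_system), 2)
```

Under this assumption the implied effects are roughly three times the estimated averages (1 / 0.31 ≈ 3.2). If the 69 % respond weakly rather than not at all, the implied effects would be smaller than this.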

---

**Q3:** Illustrate the effects. What plot would you opt for?

**A3:** I would opt for effect plots using ggpredict() from the ggeffects package. For ProxNatElection (continuous), an effect plot shows the predicted staff size across the full range of the variable, with a confidence interval. For the binary variables (EPElection, NationalCandidateCentered, OpenList), the same type of plot shows the predicted value at 0 and 1.

```{r}
# Effect of ProxNatElection (continuous)
ggpredict(mod.re, terms = "ProxNatElection") %>%
  plot() +
  ggtitle("Effect of proximity to national election on local staff size")
```

```{r}
# Effect of EPElection (binary)
ggpredict(mod.re, terms = "EPElection") %>%
  plot() +
  ggtitle("Effect of EP election on local staff size")
```

```{r}
# Effect of NationalCandidateCentered (binary)
ggpredict(mod.re, terms = "NationalCandidateCentered") %>%
  plot() +
  ggtitle("Effect of candidate-centered system on local staff size")
```

```{r}
# Effect of OpenList (binary)
ggpredict(mod.re, terms = "OpenList") %>%
  plot() +
  ggtitle("Effect of open list system on local staff size")
```

---

**Q4:** In your opinion, are the two hypotheses supported? How big are the electoral incentives?

**A4:** Both hypotheses find some support, but the effects are substantively small. Hypothesis 1 (personal vote) is supported. MEPs in candidate-centered systems (NationalCandidateCentered) have 0.36 more local assistants, and MEPs with open lists (OpenList) also have 0.36 more. Both are statistically significant. But substantively, this means roughly 1 in 3 MEPs hires one additional assistant due to electoral system incentives. Hypothesis 2 (electoral calendar) is supported, but only weakly. Moving from the furthest point to immediately before a national election increases staff by only 0.135 (roughly 1 in 7 MEPs hiring one extra assistant). EP elections have a larger effect (0.355), comparable to the electoral system variables. Overall, the electoral incentives exist, but they are modest. MEPs do appear to use parliamentary resources for electoral purposes, but the magnitude is small. Combined with the fact that only 31 % of MEPs plan to seek reelection, the true effect among those who actually respond to electoral incentives is likely larger than what we estimate here.

---

### Exercise 5: Interaction effect

**Context:** I have a third hypothesis: I believe MEPs with higher incentives to cultivate a personal vote are more sensitive to the electoral calendar.

**Q1 + Q2:** How could I model this? Can you implement your suggestion using a random-intercept model?

**A1 + A2:** This hypothesis says that the effect of the electoral calendar depends on the electoral system. That is an interaction effect. I would include an interaction term between ProxNatElection and NationalCandidateCentered.

```{r}
mod.int <- lmer(ShareOfLocalAssistants ~ ProxNatElection * NationalCandidateCentered + EPElection + OpenList + ShareOfLocal.lag + (1|ID), data = df)
```
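
Written out, the interaction model and the conditional effects it implies look as follows. The notation is my own shorthand, not taken from the exercise: Prox is ProxNatElection, NCC is NationalCandidateCentered, the dots stand for the remaining controls, and $u_i$ is the random intercept for MEP $i$.

$$
y_{it} = \beta_0 + \beta_1 \text{Prox}_{it} + \beta_2 \text{NCC}_i + \beta_3 \left(\text{Prox}_{it} \times \text{NCC}_i\right) + \dots + u_i + \varepsilon_{it}
$$

$$
\frac{\partial y_{it}}{\partial \text{Prox}_{it}} = \beta_1 + \beta_3 \text{NCC}_i,
\qquad
\Delta_{\text{NCC}}\, y_{it} = \beta_2 + \beta_3 \text{Prox}_{it}
$$

The effect of the calendar therefore equals $\beta_1$ only when NCC = 0, and the effect of the electoral system equals $\beta_2$ only when Prox = 0 (immediately before an election).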

```{r}
summary(mod.int)
```

---

**Q3:** Can you interpret the results following Berry et al.'s recommendation? What is the effect of: the national electoral calendar when the electoral system is candidate-centered, the national electoral calendar when the electoral system is party-centered, the national electoral system when elections are far away, and the national electoral system when elections are tomorrow?

**A3:** Following Berry et al., I compute the conditional effects at meaningful values rather than interpreting the interaction term alone. Effect of the electoral calendar when candidate-centered (NCC = 1): 0.068 per unit. MEPs in candidate-centered systems hire more staff as elections approach. Effect of the electoral calendar when party-centered (NCC = 0): -0.017 per unit. Close to zero and not significant. MEPs in party-centered systems do not respond to the electoral calendar. Effect of the electoral system when elections are far away (Prox = -4): 0.204. A small difference between the two systems when there is little electoral pressure. Effect of the electoral system when elections are tomorrow (Prox = 0): 0.541. A much larger difference. Candidate-centered MEPs have about half an assistant more than party-centered MEPs right before an election. The pattern is clear: the gap between the two systems is small when elections are far away and more than doubles as they approach. This supports hypothesis 3.

```{r}
# Extract the coefficients needed for the conditional effects
b_prox <- fixef(mod.int)["ProxNatElection"]
b_ncc  <- fixef(mod.int)["NationalCandidateCentered"]
b_int  <- fixef(mod.int)["ProxNatElection:NationalCandidateCentered"]
```

```{r}
# Effect of calendar when candidate-centered
b_prox + b_int * 1
```

```{r}
# Effect of calendar when party-centered
b_prox + b_int * 0
```

```{r}
# Effect of system when elections far away (Prox = -4)
b_ncc + b_int * (-4)
```

```{r}
# Effect of system when elections tomorrow (Prox = 0)
b_ncc + b_int * 0
```
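
The conditional effects above are point estimates; judging whether one is distinguishable from zero also needs a standard error. A minimal sketch of that calculation, using the variance of a linear combination of two coefficients. It runs on simulated stand-in data with a plain `lm()` so that it is self-contained; the helper `conditional_effect()` and the names `sim_int`, `prox`, `ncc` are hypothetical, and the 1.96 multiplier assumes a normal approximation:

```r
# Conditional effect b[main] + b[int] * at, with the standard error of that linear combination
conditional_effect <- function(b, V, main, int, at) {
  est <- b[[main]] + b[[int]] * at
  se  <- sqrt(V[main, main] + at^2 * V[int, int] + 2 * at * V[main, int])
  c(estimate = est, se = se, lower = est - 1.96 * se, upper = est + 1.96 * se)
}

# Simulated stand-in data (not the MEP data)
set.seed(2)
n <- 500
sim_int <- data.frame(prox = sample(-4:0, n, replace = TRUE),
                      ncc  = rbinom(n, 1, 0.5))
sim_int$y <- 1 - 0.02 * sim_int$prox + 0.5 * sim_int$ncc +
  0.08 * sim_int$prox * sim_int$ncc + rnorm(n)
fit_int <- lm(y ~ prox * ncc, data = sim_int)

b <- coef(fit_int)  # named coefficient vector
V <- vcov(fit_int)  # coefficient variance-covariance matrix

conditional_effect(b, V, "prox", "prox:ncc", at = 1)   # calendar effect when ncc = 1
conditional_effect(b, V, "ncc",  "prox:ncc", at = -4)  # system effect when prox = -4
```

For the model in this exercise, the analogous inputs would presumably be `fixef(mod.int)` and `as.matrix(vcov(mod.int))`, together with the coefficient names used in the chunks above.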