---
title: "Problem set"
output: html_notebook
---

```{=html}
<style>
body {
  text-align: justify;
  font-size: 14px;
  line-height: 1.6;
  max-width: 800px;
  margin: auto;
}

h1.title {
  font-size: 28px;
  border-bottom: 2px solid #333;
  padding-bottom: 10px;
  margin-bottom: 30px;
}

h1 {
  font-size: 22px;
  margin-top: 40px;
  color: #2c3e50;
}

li {
  margin-bottom: 10px;
}
</style>
```

# Conceptual points

-   **Q1:** Can you express in layman’s terms what a “standard deviation” of a variable is?
-   **A1:** A standard deviation tells you how spread out the values of a variable are around the average. If the standard deviation is small, most values are close to the average. If it's large, values are scattered far from the average. Think of it like this: if the average income is \$50,000, a small standard deviation means most people earn close to \$50,000, while a large one means some earn \$20,000 and others \$80,000. It's basically a measure of "how much do values typically differ from the mean?"
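
The income example can be made concrete with a small sketch. The two vectors `tight` and `wide` are made-up numbers chosen only for illustration, and `sd_by_hand()` is a hypothetical helper that spells out what `sd()` computes.

```{r}
# Two hypothetical income samples with the same mean (50,000) but different spread
tight <- c(48000, 49000, 50000, 51000, 52000)
wide  <- c(20000, 35000, 50000, 65000, 80000)

mean(tight)
mean(wide)

# sd() uses the sample formula: sqrt(sum((x - mean(x))^2) / (n - 1))
sd_by_hand <- function(x) sqrt(sum((x - mean(x))^2) / (length(x) - 1))

sd(tight)           # small: values sit close to the mean
sd(wide)            # large: values are scattered far from the mean
sd_by_hand(wide)    # same number as sd(wide)
```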

------------------------------------------------------------------------

-   **Q2:** What are the “residuals” of the regression? How are they calculated?
-   **A2:** Residuals are the mistakes your model makes in the sample – the difference between what actually happened (the observed Y) and what your model predicted would happen (the estimated Y). They are calculated as: Residual = Observed Y – Predicted Y. If someone actually earns \$60,000 but your model predicted \$55,000, the residual is \$5,000. Residuals are the observable trace of the "fundamental uncertainty" King et al. discuss – all the random stuff (weather, illness, luck) that affects Y but isn't captured by your explanatory variables X. In their framework, this randomness enters through the stochastic component, Yᵢ ~ f(θᵢ, α), in Equation 1; the residuals are our sample-based estimate of it.
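
A minimal sketch of the residual calculation, using simulated data rather than the MEP data. The variables `x` and `y` and the chosen coefficients are assumptions made purely for illustration.

```{r}
# Simulated data: the "true" relationship is y = 2 + 3x + noise (all values hypothetical)
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100, sd = 1.5)

fit <- lm(y ~ x)

# Residual = observed Y - fitted (predicted) Y
res_by_hand <- y - fitted(fit)

# The hand-computed residuals agree with what residuals() returns
all.equal(unname(res_by_hand), unname(residuals(fit)))

# With an intercept in the model, OLS residuals average to (numerically) zero
mean(residuals(fit))

# The residual standard error estimates the spread of the noise (1.5 in this simulation)
sigma(fit)
```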

------------------------------------------------------------------------

-   **Q3:** What is a variance-covariance matrix?
-   **A3:** A variance-covariance matrix is a table that summarizes two things about your parameter estimates. The diagonal contains the variances of each parameter. Variance measures how spread out an estimate is, and its square root is the standard error. For example, if you estimate a coefficient to be 5 with a variance of 4 (standard error = 2), a rough 95% confidence interval is 5 ± 2 × 2, i.e. between about 1 and 9. The larger the variance, the more uncertain you are about that estimate. The off-diagonal elements contain the covariances, which tell you how two estimates move together. Imagine you estimate two coefficients: b₁ (effect of education) and b₂ (effect of experience). If their covariance is negative, it means: when your model overestimates the effect of education, it typically underestimates the effect of experience – they compensate for each other. If the covariance is positive, they move in the same direction. If it's zero, the two estimates are uncorrelated. This matters because when you simulate parameters (as King et al. recommend), you need to draw values that respect these connections. If you ignore the covariances and simulate each parameter independently, you get unrealistic combinations and your uncertainty estimates will be wrong. Standard errors alone only tell you about each parameter individually, but the covariance matrix gives you the complete map of how your uncertainties are connected.

------------------------------------------------------------------------

-   **Q4:** What is the role of the variance-covariance matrix in the King et al. article?
-   **A4:** In King et al., the variance-covariance matrix is the engine of their entire simulation approach. Their method works in three steps. First, estimate the model and record the point estimates (γ̂) and the variance-covariance matrix V(γ̂). Second, draw simulated parameter values from a multivariate normal distribution with mean γ̂ and variance V(γ̂). Third, compute the quantity of interest (an expected value, a predicted value or a first difference) for each draw, so that the spread across draws reflects the uncertainty. The variance-covariance matrix determines how spread out and correlated these simulated draws are.
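
The three steps can be sketched roughly as follows. This is a simplified illustration, not King et al.'s own code: it uses the built-in `mtcars` data instead of the MEP data, draws only the regression coefficients (not the error variance), and the scenario `x_c` is a hypothetical choice.

```{r}
library(MASS)  # provides mvrnorm(); MASS ships with standard R installations

# Step 1: estimate the model, keep point estimates and variance-covariance matrix
fit       <- lm(mpg ~ wt + hp, data = mtcars)
gamma_hat <- coef(fit)
V_hat     <- vcov(fit)

# Step 2: draw simulated coefficient vectors from a multivariate normal
set.seed(123)
draws <- mvrnorm(n = 5000, mu = gamma_hat, Sigma = V_hat)

# Step 3: compute a quantity of interest for every draw,
# here the expected mpg for a hypothetical car with wt = 3 and hp = 120
x_c <- c(1, 3, 120)
ev  <- as.vector(draws %*% x_c)

mean(ev)                        # simulation-based point estimate
quantile(ev, c(0.025, 0.975))   # simulation-based 95% interval

# The draws reproduce the correlation structure stored in V_hat
round(cor(draws), 2)
round(cov2cor(V_hat), 2)
```

Because the draws come from one joint distribution, each simulated coefficient vector is a plausible combination, which is what makes the interval in step 3 meaningful.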

------------------------------------------------------------------------

-   **Q5:** Can you explain what the covariance matrix is good for in this example?
-   **A5:** In practical terms, the covariance matrix allows researchers to do something powerful: translate raw regression output into quantities that anyone can understand, while properly accounting for uncertainty. For example, instead of saying "the coefficient on education is 0.3 with a standard error of 0.1," you can say "an extra year of education is associated with \$1,500 more income on average, plus or minus about \$500." The covariance matrix makes this possible because when you calculate quantities like predicted values or first differences, the uncertainty in those quantities depends on the uncertainty in all the parameters together – not just one at a time. The covariance matrix captures those connections. Without it, you would either ignore uncertainty entirely (just plugging in point estimates) or misstate it (by treating parameters as independent when they're not).

------------------------------------------------------------------------

-   **Q6:** What is the difference between fundamental and estimation uncertainty?
-   **A6:** Estimation uncertainty comes from not having infinite data. We estimate β and α from a sample, so our estimates are imperfect. If we had more observations, our estimates would be more precise. This type of uncertainty can be reduced by collecting more data. Fundamental uncertainty comes from the randomness of the world itself. Even if you knew the exact parameter values (eliminating all estimation uncertainty), you still couldn't predict Y perfectly because countless random factors (weather, illness, mood, luck) influence Y but aren't in your model. This is the stochastic component – the randomness built into the real world. This uncertainty cannot be reduced by collecting more observations; it's inherent to the phenomenon as modelled. In short: estimation uncertainty is about not knowing the parameters perfectly, while fundamental uncertainty is about the world being inherently unpredictable.

------------------------------------------------------------------------

-   **Q7:** What is the difference between expected and predicted values of Y, and how does this relate to fundamental vs. estimation uncertainty? When am I interested in one rather than the other?
-   **A7:** Expected values (E(Y)) give you the average outcome for a given set of X values. They only contain estimation uncertainty – the variability comes solely from not knowing the parameters perfectly. Fundamental uncertainty is averaged away. Predicted values give you a single simulated outcome of Y for a given set of X values. They contain both estimation uncertainty AND fundamental uncertainty. So predicted values have a wider interval than expected values, even though both are centered on roughly the same value. When to use which: Use expected values when you care about the average effect of a variable – for example: "on average, how many more assistants does a candidate-centered MEP have compared to a party-centered one?" You want to highlight the systematic pattern, not random noise. Use predicted values when you care about a specific case – for example: "how many assistants will this particular MEP actually have?" Here you need to account for all the random factors that could push the actual outcome away from the average. Election forecasting is another example – you don't just want the expected winner, you want to know how likely an upset is. The key intuition: expected values tell you about the signal, predicted values tell you about the signal plus noise.
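
The difference in interval width can be seen with base R's `predict()`. Note the assumptions: this uses analytical formulas rather than King et al.'s simulation, the built-in `mtcars` data rather than the MEP data, and a hypothetical scenario `new_car`.

```{r}
fit     <- lm(mpg ~ wt + hp, data = mtcars)
new_car <- data.frame(wt = 3, hp = 120)   # hypothetical scenario

# Expected value: the interval reflects estimation uncertainty only
ev <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)

# Predicted value: the interval adds fundamental uncertainty (the residual variance)
pv <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

ev
pv

# Same centre, but the prediction interval is wider
width <- function(m) unname(m[, "upr"] - m[, "lwr"])
c(expected = width(ev), predicted = width(pv))
```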

------------------------------------------------------------------------

# Exercises in R

-   **Q1:** Can you re-fit model 2 with each MEP’s national party size in the national parliament as a predictor?
-   **A1:** Adding SeatsNatPal.prop (party size in the national parliament) to model 2 yields a coefficient of -0.184 (standard error 0.625), which is not statistically significant (p = 0.768). The other coefficients remain largely unchanged: OpenList increases from 0.829 to 0.937, and LaborCost moves from -0.070 to -0.068. The model's R² barely changes (from 0.081 to 0.085), suggesting that party size does not meaningfully improve the model. Note that 17 observations are lost due to missing data on party size (N drops from 739 to 722), so the two models are fitted to slightly different samples and their R² values are not strictly comparable.

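```{r}
# Packages used throughout the notebook
library(stargazer)  # regression tables
library(ggeffects)  # ggpredict()
library(ggplot2)    # histograms and axis labels

# Adjust the working directory to wherever MEP2014.rda is stored on your machine
setwd('/Users/davidhamad/Documents/Cand.Scient.Pol/2. Semester/Statistical models beyond linear regression - applied statistics for political scientists/3) Linear regression')
load("MEP2014.rda")
df <- MEP2014

# Model 2, and model 2 with national party size added
mod2 <- lm(LocalAssistants ~ OpenList + LaborCost, data = df)
mod2.party <- lm(LocalAssistants ~ OpenList + LaborCost + SeatsNatPal.prop, data = df)

stargazer(mod2, mod2.party, type = "text")
```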

---

- **Q2:** What is the marginal effect of party size on MEP’s local investment?
- **A2:** In a linear model without interactions, the marginal effect of party size is its coefficient: -0.184. A one-unit increase in party size (from 0% to 100% of national parliament seats, a wider range than anything in the data, where the maximum is 0.67) is associated with 0.18 fewer local assistants, on average. A more realistic 10-percentage-point increase corresponds to about 0.02 fewer. However, since the estimate is not statistically significant, we cannot distinguish it from zero. In substantive terms, the data provide no evidence that party size affects how many local assistants an MEP hires, once we control for the electoral system and labor costs.

---

- **Q3:** Create two scenarios, justify your choice and calculate the first difference between the two.
- **A3:** I chose two scenarios based on the quartiles of the party size variable (Q1 = 0.09, Q3 = 0.40), so that both represent realistic values in the data. The other variables are held constant: OpenList at 0 (a fixed reference value, not a mean) and LaborCost at its mean of 22.96. The expected staff size (what ggpredict reports as the predicted value) is 2.47 (95% CI: 2.09–2.85) for a small party and 2.41 (95% CI: 2.08–2.75) for a large party. The first difference is 2.41 - 2.47 = -0.06, which matches the coefficient times the change in party size (-0.184 × 0.31 ≈ -0.06). Moving from a small to a large national party is thus associated with essentially no change in local staff size. The two confidence intervals overlap almost entirely, but overlap is only an informal check. In a linear model the first difference is simply the coefficient scaled by 0.31, so it inherits the coefficient's non-significance (p = 0.768).


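```{r}
summary(df$SeatsNatPal.prop)
# Expected staff size at the first and third quartile of party size
eff <- ggpredict(mod2.party, terms = "SeatsNatPal.prop [0.09, 0.40]")
eff
diff(eff$predicted)  # first difference: large-party minus small-party scenario
```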
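
A first difference can also be given its own standard error and interval directly from the variance-covariance matrix. The sketch below shows the idea on the built-in `mtcars` data rather than on `mod2.party`. The scenario vectors `x_low` and `x_high` are hypothetical, and the calculation is analytical rather than simulation-based.

```{r}
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical scenarios that differ only in wt (first and third quartile)
qs     <- unname(quantile(mtcars$wt, c(0.25, 0.75)))
x_low  <- c(1, qs[1], mean(mtcars$hp))
x_high <- c(1, qs[2], mean(mtcars$hp))

# First difference d'b and its standard error sqrt(d' V d), where d = x_high - x_low
d  <- x_high - x_low
fd <- sum(d * coef(fit))
se <- sqrt(as.numeric(t(d) %*% vcov(fit) %*% d))

# 95% interval based on the t distribution with the model's residual degrees of freedom
ci <- fd + c(-1, 1) * qt(0.975, df.residual(fit)) * se
c(first_difference = fd, lower = ci[1], upper = ci[2])
```

Since only `wt` changes between the scenarios, the result equals the `wt` coefficient times the change in `wt`, and the standard error scales in the same way.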

---

- **Q4:** Visualize the effect of party size on MEP’s local investment.
- **A4:** The plot shows a nearly flat line with a wide confidence interval, visually confirming that party size has no meaningful effect on the number of local assistants. This is consistent with the non-significant coefficient from the regression.

```{r}
eff_full <- ggpredict(mod2.party, terms = "SeatsNatPal.prop")

# plot() on a ggeffects object returns a ggplot, so ggplot2 layers can be added
plot(eff_full) +
  ylab("Predicted local staff size") +
  xlab("Party size in national parliament (proportion)") +
  ggtitle("Effect of national party size on MEP local staff",
          subtitle = "Controlling for OpenList and LaborCost")
```

---

# Fundamental variation

- **Q1:** Can you calculate the residuals for model 1, then model 2 and store them as separate variables in R?

```{r}
mod1 <- lm(LocalAssistants ~ OpenList, df)
mod2 <- lm(LocalAssistants ~ OpenList + LaborCost, df)

df$resid_mod1 <- residuals(mod1)
df$resid_mod2 <- residuals(mod2)
```
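
One caveat when storing residuals in the data frame: if any rows are dropped from the model because of missing values, `residuals()` returns fewer values than the data has rows. Whether that applies to models 1 and 2 is not shown here, so the sketch below uses the built-in `airquality` data, where `Ozone` has missing values. `na.action = na.exclude` is one way to keep the residuals aligned with the rows.

```{r}
# Default behaviour drops incomplete rows; na.exclude remembers where they were
fit_omit    <- lm(Ozone ~ Wind, data = airquality)
fit_exclude <- lm(Ozone ~ Wind, data = airquality, na.action = na.exclude)

nrow(airquality)                 # rows in the data
length(residuals(fit_omit))      # only the rows actually used in the fit
length(residuals(fit_exclude))   # padded with NA, one value per row of the data

# Because of the padding, this assignment lines up with the data frame
aq <- airquality
aq$resid <- residuals(fit_exclude)
sum(is.na(aq$resid))             # number of rows dropped from the model
```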

---

- **Q2:** Can you describe the residuals of the two models in a histogram, then in numbers by calculating the mean and standard deviation?

```{r}
ggplot(data.frame(r = residuals(mod1)), aes(r)) +
  geom_histogram(bins = 30) +
  ggtitle("Residuals: Model 1 (OpenList only)")

ggplot(data.frame(r = residuals(mod2)), aes(r)) +
  geom_histogram(bins = 30) +
  ggtitle("Residuals: Model 2 (OpenList + LaborCost)")

mean(residuals(mod1))
sd(residuals(mod1))

mean(residuals(mod2))
sd(residuals(mod2))    
```

---

- **Q3:** What is the difference between the two sets of residuals and why?
- **A3:** Model 2's residuals have a smaller standard deviation (3.08 vs 3.18) because LaborCost explains part of the variation in LocalAssistants that Model 1 left in the residuals. The R² increases from 0.022 to 0.081, meaning Model 2 captures more of the systematic pattern in the data, leaving less unexplained variation (i.e. smaller residuals). In King et al.'s terms, variation moves from the stochastic to the systematic component: the world is no less random, but there is less "leftover" randomness that the model cannot account for. The reduction is modest, though, since Model 2 still leaves about 92% of the variance unexplained.

---

- **Q4:** Can you extract the variance-covariance matrix for model 2?

```{r}
vcov(mod2)
```

---

- **Q5:** Can you calculate the standard error for the regression coefficients (parameters) from the variance-covariance matrix?
- **A5:** The standard errors are the square root of the diagonal elements: Intercept = 0.286, OpenList = 0.228, LaborCost = 0.010. These match exactly what summary(mod2) reports – because that's where standard errors come from.

```{r}
sqrt(diag(vcov(mod2)))
```

---

- **Q6:** Are there any predictors that correlate more than others?
- **A6:** Strictly speaking, the matrix describes correlations between the coefficient estimates, not between the predictors themselves. The correlation between the Intercept and LaborCost estimates is -0.84, which is very strong. This means: when the model overestimates the intercept, it tends to underestimate the effect of LaborCost, and vice versa. The correlation between Intercept and OpenList is moderate (-0.44), while the OpenList and LaborCost estimates are nearly uncorrelated (0.08).


```{r}
cov2cor(vcov(mod2))
```
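
What `cov2cor()` does can be checked by hand: each covariance is divided by the product of the two standard errors. The sketch below types in the variance-covariance values reported by `vcov(mod2)` so that it runs without the data. The object names `V`, `se` and `R` are chosen here for illustration.

```{r}
# Variance-covariance matrix of model 2, typed in from the vcov(mod2) output
nm <- c("(Intercept)", "OpenList", "LaborCost")
V <- matrix(c( 0.081867230, -0.0284615942, -0.0024363118,
              -0.028461594,  0.0519105737,  0.0001759822,
              -0.002436312,  0.0001759822,  0.0001031140),
            nrow = 3, byrow = TRUE, dimnames = list(nm, nm))

se <- sqrt(diag(V))        # standard errors: square roots of the diagonal
R  <- V / outer(se, se)    # correlation_ij = covariance_ij / (se_i * se_j)

round(se, 3)
round(R, 2)
```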

---

- **Q7:** How does this relate to King et al.'s argument?
- **A7:** King et al. argue that you need the full variance-covariance matrix – not just individual standard errors – to properly account for uncertainty. This example shows why: the intercept and LaborCost are strongly correlated (-0.84). If you simulated these parameters independently (ignoring that correlation), you would get unrealistic combinations where both are overestimated simultaneously. The covariance matrix ensures that when you simulate parameters to calculate quantities like predicted values or first differences, the draws respect these connections, giving you accurate confidence intervals. Without it, your uncertainty estimates would be wrong.
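
How much the covariances matter can be illustrated by computing the standard error of an expected value twice, once with the full matrix and once with the diagonal only. This is a sketch on the built-in `mtcars` data with a hypothetical scenario `x_c`, not a result for the MEP models.

```{r}
fit <- lm(mpg ~ wt + hp, data = mtcars)
V   <- vcov(fit)

# Hypothetical scenario: intercept term, wt = 3, hp = 120
x_c <- c(1, 3, 120)

# Correct standard error of the expected value x'b uses the full matrix: sqrt(x' V x)
se_full <- sqrt(as.numeric(t(x_c) %*% V %*% x_c))

# Treating the estimates as independent keeps only the diagonal of V
se_diag <- sqrt(sum(x_c^2 * diag(V)))

# Cross-check against predict(), which also uses the full matrix
se_pred <- predict(fit, data.frame(wt = 3, hp = 120), se.fit = TRUE)$se.fit

c(full = se_full, independent = se_diag, predict = unname(se_pred))
```

In this example the diagonal-only version overstates the uncertainty considerably, because the negative covariances that let the estimates compensate for each other are thrown away.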