Q1: Your colleague starts analyzing this data using a pooled OLS model.

Q1a: Write the formula of a pooled regression model (5 points).

Accidents_it = β0 + β1 * Taxes_it + ε_it, where i indexes states and t indexes years.

Q1b: Run a pooled OLS model and present your results in a table. (5 + 5 points)

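See the output below.

Pooled OLS
=========================================
                 Dependent variable:
                 accidents
-----------------------------------------
Constant         1,359.74***
                 (262.70)
taxes            760.12
                 (494.49)
-----------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01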
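
A pooled model of this kind can be fitted with lm(), which is one of several ways to do it. The sketch below is only an illustration: the assignment data is not reproduced here, so it uses a small simulated panel, and the data frame name panel, the baseline values, and every number in it are made up. Its output will not match the table above.

```r
# Simulated panel: 7 states x 7 years (illustrative numbers only, not the assignment data)
set.seed(1)
panel <- expand.grid(state = 1:7, year = 1:7)
baseline <- c(2090, 2097, 2088, 931, 2100, 1484, 1595)  # hypothetical state levels
panel$taxes <- runif(nrow(panel))
panel$accidents <- baseline[panel$state] - 30 * panel$taxes + rnorm(nrow(panel), sd = 10)

# Pooled OLS: a single intercept and a single slope for all state-year observations
pooled <- lm(accidents ~ taxes, data = panel)

# Coefficient table: estimate, standard error, t value, p value
pooled_table <- summary(pooled)$coefficients
print(round(pooled_table, 2))
```

The state column is deliberately left out of the formula: that omission is what makes the model pooled.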

Q1c: Do beer taxes have a significant effect on reducing car accidents? (5 points)

No. In the pooled OLS model the coefficient on taxes (760.12, standard error 494.49) is positive and not statistically significant at any conventional level, so the model provides no evidence that beer taxes reduce car accidents.

Q1d: Interpret the coefficient of “taxes”. What is its effect on car accidents? Is the effect across or within states? (5 + 5 points)

The coefficient (760.12) is the estimated average increase in car accidents associated with a one-unit increase in beer taxes. However, the estimate is not statistically significant, so we cannot conclude that beer taxes affect car accidents. Because pooled OLS ignores the panel structure, this is an effect across states, not within states: it compares state-year observations with different tax levels rather than tracking changes within a state over time.
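
To illustrate why an across-state estimate can differ from the within-state effect, here is a sketch on simulated data. Everything in it is an assumption made for the example: the data are constructed so that higher-tax states also have more accidents overall, while within each state higher taxes lower accidents (true effect of -30).

```r
set.seed(2)
panel <- expand.grid(state = 1:7, year = 1:7)

# Across states: higher-tax states also have a higher accident level (by construction)
state_tax   <- 0.5 * panel$state
state_level <- 1000 + 200 * panel$state

# Within states: taxes drift a little over the years
panel$taxes <- state_tax + 0.05 * (panel$year - 4)

# True within-state effect of taxes is -30
panel$accidents <- state_level - 30 * panel$taxes + rnorm(nrow(panel), sd = 1)

pooled <- lm(accidents ~ taxes, data = panel)                  # ignores the panel structure
within <- lm(accidents ~ taxes + factor(state), data = panel)  # state dummies absorb state levels

slopes <- c(pooled = unname(coef(pooled)["taxes"]),
            within = unname(coef(within)["taxes"]))
print(round(slopes, 1))
```

In this constructed example the pooled slope comes out positive and the within-state slope negative, because the pooled model picks up the differences in levels between states.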

Q3: We now have a look at state-level differences.

Q3a: What is the value of the intercept in the first model? What does the intercept represent in the first model? (5 points)

Note: This question is ambiguous. It does not specify which models are meant, or which one is the first and which the second. If the plm() model from above were used, there would be no like-for-like comparison to make here. I am assuming that model 1 is the fixed effect model with an intercept and model 2 is the one without.

Answer: The intercept in model 1 is 2,090.57 and is statistically significant. Because State 1 is the omitted reference category, it represents the predicted average number of car accidents in State 1 when taxes are equal to zero.

======================================================================
                       Dependent variable: Car Accidents
                       FE with Intercept      FE without Intercept
                       (1)                    (2)
----------------------------------------------------------------------
Constant - State 1     2,090.57***
                       (8.64)
Taxes                  -31.37**               -31.37**
                       (13.76)                (13.76)
State 1                                       2,090.57***
                                              (8.64)
State 2                6.42                   2,096.99***
                       (5.94)                 (8.52)
State 3                -2.80                  2,087.77***
                       (5.95)                 (8.29)
State 4                -1,159.82***           930.75***
                       (6.05)                 (7.64)
State 5                9.66                   2,100.24***
                       (5.94)                 (8.72)
State 6                -607.00***             1,483.57***
                       (5.95)                 (8.42)
State 7                -495.99***             1,594.58***
                       (6.06)                 (7.63)
----------------------------------------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01
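
The two columns are the same model written in two ways. A minimal sketch of that, assuming the models are fitted with lm() and state dummies (the data below are simulated and the name panel is made up, so the numbers differ from the table):

```r
set.seed(42)
panel <- expand.grid(state = 1:7, year = 1:7)
baseline <- c(2090, 2097, 2088, 931, 2100, 1484, 1595)  # hypothetical state levels
panel$taxes <- runif(nrow(panel))
panel$accidents <- baseline[panel$state] - 30 * panel$taxes + rnorm(nrow(panel), sd = 10)

# Model 1: intercept kept, State 1 becomes the omitted reference category
fe_with <- lm(accidents ~ taxes + factor(state), data = panel)
# Model 2: intercept dropped with "- 1", every state gets its own intercept
fe_without <- lm(accidents ~ taxes + factor(state) - 1, data = panel)

b1 <- coef(fe_with)
b2 <- coef(fe_without)

# The slope on taxes is identical in both parameterizations
same_slope <- isTRUE(all.equal(unname(b1["taxes"]), unname(b2["taxes"])))

# State 4 level in model 2 = model 1 intercept + model 1 State 4 offset
state4_from_m1 <- unname(b1["(Intercept)"] + b1["factor(state)4"])
same_state4 <- isTRUE(all.equal(state4_from_m1, unname(b2["factor(state)4"])))

print(c(same_slope = same_slope, same_state4 = same_state4))
```

Only the meaning of the state coefficients changes between the two versions: offsets from State 1 in the first, state-specific intercepts in the second.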

Q3b: What is the value of the coefficient of “State4” in the first model? What does it say about the number of car accidents in state 4? (5 points)

The value is -1,159.82 and is significant. This coefficient represents the difference in the average number of car accidents between State 4 and State 1 (the reference state), holding taxes constant. State 4 has 1,159.82 fewer car accidents on average compared to State 1.

Q3c: What is the value of the coefficient of “State4” in the second model? What does it say about the number of car accidents in state 4? (5 points)

The value is 930.75 and is significant. This coefficient represents the average number of car accidents in State 4 when taxes are equal to zero.
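
The answers to Q3b and Q3c are linked: each state coefficient in model 2 should equal the model 1 intercept plus that state's model 1 coefficient. A quick check using only the numbers reported in the table (the vector names are just labels chosen for this sketch):

```r
# Values copied from the regression table
intercept_m1 <- 2090.57
offsets_m1 <- c(State2 = 6.42, State3 = -2.80, State4 = -1159.82,
                State5 = 9.66, State6 = -607.00, State7 = -495.99)
levels_m2 <- c(State2 = 2096.99, State3 = 2087.77, State4 = 930.75,
               State5 = 2100.24, State6 = 1483.57, State7 = 1594.58)

# Gap between (intercept + offset) and the model 2 coefficient, per state
gap <- intercept_m1 + offsets_m1 - levels_m2
print(round(gap, 2))
```

For State 4 this is 2,090.57 - 1,159.82 = 930.75. Every gap is zero except for a 0.01 difference for State 5, which comes from rounding in the table.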

Q3d: Why is the coefficient of “State3” non-significant in model 1 while it is significant in model 2? (10 points)

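The two models test different null hypotheses. In model 1, the State 3 coefficient is the difference from State 1, and the test shows no significant difference in the average number of car accidents between State 3 and State 1 (-2.80 with a standard error of 5.95). In model 2, the coefficient is State 3's own intercept, and the test shows that the average number of car accidents in State 3 is significantly different from zero (2,087.77 with a standard error of 8.29).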

Q4: Considering what you know about fixed effect models and the current study, which of these variables would you suggest that your colleague add in the model? Specify why. (10 points)

State geographical location (north, south, east, ...) and state form of government are both time-invariant (they stay constant for each state over time), so the state fixed effects already absorb them and their coefficients cannot be estimated separately. I would therefore suggest adding annual unemployment rates, which vary within states over time. The direction of the effect is ambiguous: higher unemployment might leave less disposable income for drinking, but it could also lead to more despair, which might increase accident rates over time.

This would be the best choice because it adds explanatory power to the model and controls for a potential time-varying confounder of the relationship between accidents and taxes.
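
The point about time-invariant variables can be seen directly in R. In the sketch below, all data are simulated and the variables south (a stand-in for geographical location) and unemployment are hypothetical. With state dummies in the model, lm() cannot estimate a coefficient for a variable that never changes within a state and reports NA for it.

```r
set.seed(7)
panel <- expand.grid(state = 1:7, year = 1:7)
panel$taxes <- runif(nrow(panel))
panel$unemployment <- runif(nrow(panel), 3, 10)          # varies within state over time
panel$south <- as.numeric(panel$state %in% c(1, 2, 3))   # constant within each state
panel$accidents <- 1500 + 100 * panel$state - 30 * panel$taxes +
  5 * panel$unemployment + rnorm(nrow(panel), sd = 10)

# State dummies plus one time-varying and one time-invariant control
fit <- lm(accidents ~ taxes + unemployment + factor(state) + south, data = panel)

# south is perfectly collinear with the state dummies, so its coefficient is NA
print(coef(fit)[c("taxes", "unemployment", "south")])
```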

BONUS QUESTION: We can run a fixed effect model by de-meaning our data and then using an OLS predictor.

BQ1: De-mean the data (help yourself with the codes in the lecture)

library(dplyr)

# De-mean taxes and accidents by subtracting each state's own mean
data_demeaned <-
  data %>%
  group_by(state) %>%
  mutate(
    mean_taxes = mean(taxes),
    taxes_demeaned = taxes - mean_taxes,
    mean_accidents = mean(accidents),
    accidents_demeaned = accidents - mean_accidents
  ) %>%
  ungroup()

BQ2: Run a de-meaned OLS model and present results in a nice table

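=============================================
                    Dependent variable:
                    De-meaned Car Accidents
---------------------------------------------
Constant            -0.00
                    (1.48)
De-meaned Taxes     -31.37**
                    (12.85)
---------------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01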

BQ3: Do results change from the fixed effect model? Explain why.

The coefficient on taxes is the same in both models (-31.37), but the standard errors differ slightly (13.76 in the fixed effect model vs. 12.85 in the de-meaned model).

The similarity is expected because de-meaning the data is equivalent to running a fixed effect model. Both methods control for unobserved time-invariant variables of each state.

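The standard errors most likely differ because of the degrees of freedom. OLS on the de-meaned data does not know that the state means were estimated first, so it divides the residual sum of squares by the number of observations minus 2, instead of also subtracting one degree of freedom for each state as the fixed effect model does. This makes the de-meaned standard error slightly too small.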
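
The degrees-of-freedom explanation can be checked on simulated data. The sketch below is an illustration under stated assumptions, not the assignment's own code: the panel is made up (7 states, 7 years), de-meaning is done with base R's ave() instead of dplyr, and the fixed effect model is fitted with lm() and state dummies.

```r
set.seed(123)
n_states <- 7
n_years <- 7
panel <- expand.grid(state = 1:n_states, year = 1:n_years)
panel$taxes <- runif(nrow(panel))
panel$accidents <- 1500 + 100 * panel$state - 30 * panel$taxes + rnorm(nrow(panel), sd = 10)

# De-mean within state; ave() returns each row's group mean
panel$taxes_dm <- panel$taxes - ave(panel$taxes, panel$state)
panel$accidents_dm <- panel$accidents - ave(panel$accidents, panel$state)

lsdv <- lm(accidents ~ taxes + factor(state), data = panel)  # fixed effects via state dummies
dm <- lm(accidents_dm ~ taxes_dm, data = panel)              # OLS on de-meaned data

se_lsdv <- summary(lsdv)$coefficients["taxes", "Std. Error"]
se_dm <- summary(dm)$coefficients["taxes_dm", "Std. Error"]

# Residual degrees of freedom: n - 2 (de-meaned OLS) vs. n - n_states - 1 (dummies)
n <- nrow(panel)
correction <- sqrt((n - 2) / (n - n_states - 1))

print(c(lsdv = se_lsdv, demeaned = se_dm, demeaned_corrected = se_dm * correction))
```

The slopes agree, and the de-meaned standard error matches the fixed effect one after multiplying by the correction factor. The reported ratio, 13.76 / 12.85 ≈ 1.071, would be consistent with this factor if the data had 49 observations and 7 states (sqrt(47 / 41) ≈ 1.071), but the sample size is not stated here, so that is an assumption.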