Q1: Your colleague starts analyzing this data using a pooled OLS model.

Q1a: Write the formula of a pooled regression model (5 points).

Accidents_it = β0 + β1 * Taxes_it + e_it

where i indexes states, t indexes time periods, and e_it is the error term.

Q1b: Run a pooled OLS model and present your results in a table. (5 + 5 points)

See the output.

**Pooled OLS**

Dependent variable: accidents

| | accidents |
|---|---|
| Constant | 1,359.74^*** |
| | (262.70) |
| taxes | 760.12 |
| | (494.49) |

Note: standard errors in parentheses; ^* p<0.1; ^** p<0.05; ^*** p<0.01
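A minimal sketch of how a pooled model of this kind could be fit in base R. The lab's data are not reproduced here, so the data frame `toy` and all of its values are simulated and hypothetical; only the column names `accidents`, `taxes`, and `state` follow the lab.

```r
# Hypothetical toy panel: 7 states observed 7 times each
# (simulated values, not the lab's data).
set.seed(1)
toy <- data.frame(
  state = rep(paste0("S", 1:7), each = 7),
  year  = rep(1:7, times = 7)
)
toy$taxes     <- rep(c(1.0, 1.2, 1.1, 0.2, 1.3, 0.6, 0.7), each = 7) +
  rnorm(49, sd = 0.2)
toy$accidents <- rep(c(2000, 2100, 2050, 900, 2150, 1500, 1600), each = 7) -
  30 * toy$taxes + rnorm(49, sd = 5)

# Pooled OLS: one intercept and one slope for all state-period observations;
# the state column is ignored entirely.
pooled <- lm(accidents ~ taxes, data = toy)
summary(pooled)$coefficients
```

The coefficient matrix printed at the end holds the estimate, standard error, t value, and p-value for each term, which are the same quantities a regression table reports.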

Q1c: Do beer taxes have a significant effect on reducing car accidents? (5 points)

No. In the pooled OLS model the coefficient on taxes is positive (760.12) and not statistically significant (standard error 494.49, no significance stars), so the model provides no evidence that beer taxes reduce car accidents.
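The significance call can be checked by hand from the two numbers in the pooled table. The sketch below uses a large-sample normal approximation for the p-value, which is an assumption made here because the table does not report the residual degrees of freedom.

```r
# Values copied from the pooled OLS table.
beta_taxes <- 760.12
se_taxes   <- 494.49

# t statistic = estimate / standard error (about 1.54).
t_stat <- beta_taxes / se_taxes

# Normal approximation to the two-sided p-value. The exact t-based p-value
# would be slightly larger, so the conclusion does not change.
p_approx <- 2 * pnorm(-abs(t_stat))

round(t_stat, 2)
p_approx > 0.10   # TRUE: not significant even at the 10% level
```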

Q1d: Interpret the coefficient of “taxes”. What is its effect on car accidents? Is the effect across or within states? (5 + 5 points)

The coefficient (760.12) is the estimated average change in car accidents associated with a one-unit increase in beer taxes; its positive sign points to more accidents, not fewer. However, the estimate is not statistically significant, so we cannot conclude that beer taxes affect car accidents. Because pooled OLS ignores which state each observation comes from, the estimate largely reflects differences across states, not changes within a state over time.

Q2: However, you know that grouped data might lead to biased results because of Simpson’s paradox (trends in the data differ depending on whether the data are viewed at the group level or the aggregate level). You propose a fixed effect model.

Q2a: Using an OLS approach, write the formula of the fixed effect model. (5 points)

Accidents_it = α_i + β1 * Taxes_it + e_it

where α_i is a state-specific intercept and e_it is the error term.

Q2b: Looking at the formulas you wrote in response to Q1a and Q2a, what is the main difference between a pooled OLS and a fixed effect model? (5 points)

The pooled OLS model uses a single intercept (β0) for all states, so it assumes every state has the same baseline level of accidents. The fixed effect model replaces β0 with a state-specific intercept (α_i), which absorbs all time-invariant differences between states. Both models estimate one common slope (β1) for taxes.
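The difference between a common intercept and state-specific intercepts can be illustrated with a small simulation. The data below are hypothetical and built so that high-tax states also have high baseline accidents while, within a state, higher taxes lower accidents; none of the numbers come from the lab's data.

```r
# Hypothetical toy panel (simulated, not the lab's data).
set.seed(2)
base_level <- c(2000, 2100, 2050, 900, 2150, 1500, 1600)  # per-state baseline
base_tax   <- c(1.0, 1.2, 1.1, 0.2, 1.3, 0.6, 0.7)        # per-state average tax
toy   <- data.frame(state = rep(paste0("S", 1:7), each = 7))
shock <- rnorm(49, sd = 0.2)                               # within-state tax changes
toy$taxes     <- rep(base_tax, each = 7) + shock
toy$accidents <- rep(base_level, each = 7) - 30 * shock + rnorm(49, sd = 5)

# Pooled OLS: a single intercept shared by every state.
pooled <- lm(accidents ~ taxes, data = toy)
# Fixed effects via dummy variables: one intercept per state.
fe <- lm(accidents ~ taxes + factor(state), data = toy)

# The pooled slope picks up the across-state pattern (positive here),
# while the fixed effect slope recovers the within-state pattern (negative).
c(pooled = coef(pooled)[["taxes"]], fe = coef(fe)[["taxes"]])
```

In this constructed example the sign of the slope flips once state intercepts are included, which is the Simpson's paradox pattern the question describes.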

Q2c: Now run two fixed-effect models, one including the intercept and one excluding the intercept (you can use the plm function for the latter). Present the results in a nice table. (5 + 5 + 5 points)

This question is incredibly confusing as written.

See the output.


Dependent variable: Car Accidents

| | FE with Intercept (1) | FE without Intercept (2) |
|---|---|---|
| Constant - State 1 | 2,090.57^*** | |
| | (8.64) | |
| Taxes | -31.37^** | -31.37^** |
| | (13.76) | (13.76) |
| State 1 | | 2,090.57^*** |
| | | (8.64) |
| State 2 | 6.42 | 2,096.99^*** |
| | (5.94) | (8.52) |
| State 3 | -2.80 | 2,087.77^*** |
| | (5.95) | (8.29) |
| State 4 | -1,159.82^*** | 930.75^*** |
| | (6.05) | (7.64) |
| State 5 | 9.66 | 2,100.24^*** |
| | (5.94) | (8.72) |
| State 6 | -607.00^*** | 1,483.57^*** |
| | (5.95) | (8.42) |
| State 7 | -495.99^*** | 1,594.58^*** |
| | (6.06) | (7.63) |

Note: standard errors in parentheses; ^* p<0.1; ^** p<0.05; ^*** p<0.01
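A minimal sketch of the two parameterizations using base R `lm()` with state dummies. The toy data are simulated and hypothetical; the plm package mentioned in the question is another route and is not shown here.

```r
# Hypothetical toy panel (simulated values, not the lab's data).
set.seed(3)
toy <- data.frame(state = rep(paste0("S", 1:7), each = 7))
toy$taxes     <- rep(c(1.0, 1.2, 1.1, 0.2, 1.3, 0.6, 0.7), each = 7) +
  rnorm(49, sd = 0.2)
toy$accidents <- rep(c(2000, 2100, 2050, 900, 2150, 1500, 1600), each = 7) -
  30 * toy$taxes + rnorm(49, sd = 5)

# Model 1: intercept kept, so the first state (S1) is the reference category
# and the other state coefficients are differences from S1.
fe_with <- lm(accidents ~ taxes + factor(state), data = toy)
# Model 2: intercept removed with "0 +", so every state gets its own intercept.
fe_without <- lm(accidents ~ 0 + taxes + factor(state), data = toy)

# The slope on taxes is identical in both parameterizations.
all.equal(coef(fe_with)[["taxes"]], coef(fe_without)[["taxes"]])

# Model 1 intercept plus the S4 offset reproduces the model 2 intercept for S4.
coef(fe_with)[["(Intercept)"]] + coef(fe_with)[["factor(state)S4"]]
coef(fe_without)[["factor(state)S4"]]
```

The two fits describe the same model; only the meaning of the state coefficients changes, which is why the Taxes row is identical across both columns of the table.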

Q2d: When using a fixed effect model, do beer taxes have a significant effect on reducing car accidents? (5 points)

Yes. Beer taxes have a statistically significant negative effect in both fixed effect models (-31.37, significant at the 5% level): an increase in beer taxes is associated with a decrease in car accidents.

Q2e: Describe the effect of “taxes” on car accidents. Make sure to specify whether the estimated effect is across or within states (5 points).

The effect represents the within-state effect. This means that the model measures the relationship, controlling for unobserved, time-invariant factors that differ by state. The taxes coefficient (-31.37) is statistically significant and suggests a one-unit increase in beer taxes is associated with a decrease of 31.37 car accidents within a state.

Q3: We now have a look at state-level differences.

Q3a: What is the value of the intercept in the first model? What does the intercept represent in the first model? (5 points)

This question is confusing because it does not specify which models are meant or which one counts as model 1 and which as model 2. If the plm() model had been used above, there would be no apples-to-apples comparison here. I am assuming that model 1 is the one with the intercept and model 2 is the one without.

Answer: The State 1 intercept is 2,090.57 and is significant. It represents the average number of car accidents in State 1, when taxes are equal to zero.


Q3b: What is the value of the coefficient of “State4” in the first model? What does it say about the number of car accidents in state 4? (5 points)

The value is -1,159.82 and is significant. This coefficient represents the difference in the average number of car accidents between State 4 and State 1. State 4 has 1,159.82 fewer car accidents on average compared to State 1.

Q3c: What is the value of the coefficient of “State4” in the second model? What does it say about the number of car accidents in state 4? (5 points)

The value is 930.75 and is significant. This coefficient represents the average number of car accidents in State 4 when taxes are equal to zero.
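The link between the two columns of the fixed-effect table can be verified with simple arithmetic: each state's intercept in model 2 should equal the State 1 intercept in model 1 plus that state's offset. The check below uses only numbers copied from the table.

```r
# Values copied from the fixed-effect table, model 1 (with intercept).
intercept_state1 <- 2090.57
offsets <- c(State2 = 6.42, State3 = -2.80, State4 = -1159.82,
             State5 = 9.66, State6 = -607.00, State7 = -495.99)

# Values copied from model 2 (without intercept): each state's own intercept.
model2 <- c(State2 = 2096.99, State3 = 2087.77, State4 = 930.75,
            State5 = 2100.24, State6 = 1483.57, State7 = 1594.58)

# Intercept implied by model 1 for each state.
implied <- intercept_state1 + offsets

# Differences are at most 0.01 in absolute value (rounding in the table).
round(implied - model2, 2)
```

For State 4 this gives 2,090.57 - 1,159.82 = 930.75, exactly the model 2 coefficient.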

Q3d: Why is the coefficient of “State3” non-significant in model 1 while it is significant in model 2? (10 points)

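The two models test different null hypotheses. In model 1, State 1 is the reference category, so the State 3 coefficient (-2.80) is the difference between State 3 and State 1, and that difference is not significantly different from zero. In model 2 there is no reference category, so the State 3 coefficient (2,087.77) is State 3's own intercept, and the test asks whether the average number of car accidents in State 3 (when taxes are zero) differs from zero, which it clearly does.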
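The contrast shows up directly in the t statistics, computed here from the State 3 rows of the fixed-effect table. The comparison with 1.96 is a rough large-sample benchmark for the 5% level, used here as an assumption because the table does not report degrees of freedom.

```r
# State 3 values copied from the fixed-effect table.
t_model1 <- -2.80 / 5.95      # H0: State 3 intercept equals State 1 intercept
t_model2 <- 2087.77 / 8.29    # H0: State 3 intercept equals zero

round(c(model1 = t_model1, model2 = t_model2), 2)

abs(t_model1) < 1.96   # TRUE: the State 3 vs. State 1 difference is not significant
abs(t_model2) > 1.96   # TRUE: the State 3 intercept is far from zero
```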

Q4: Considering what you know about fixed effect models and the current study, which of these variables would you suggest that your colleague add in the model? Specify why. (10 points)

State geographical location (north, south, east, ...) and State form of government are both time-invariant: they stay constant for each state over time, so the state fixed effects already absorb them and their coefficients cannot be estimated separately. Annual unemployment rates vary over time within a state, so that is the variable I would suggest. Higher unemployment might leave less disposable income for drinking; on the other hand, it could lead to more despair, which might increase drinking and accident rates over time.

This would be the best choice because it adds explanatory power to the model and controls for a time-varying confounder that could affect the relationship between accidents and taxes.
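A short simulation illustrates why a time-invariant variable cannot be added to a fixed effect model. The columns `south` and `unemp` and all values are hypothetical stand-ins for a location indicator and an unemployment rate; they are not taken from the lab's data.

```r
# Hypothetical toy panel; `south` and `unemp` are made-up illustrative columns.
set.seed(5)
toy <- data.frame(state = rep(paste0("S", 1:7), each = 7))
toy$south <- rep(c(0, 0, 1, 1, 0, 1, 0), each = 7)   # constant within each state
toy$unemp <- runif(49, 3, 10)                         # varies within each state
toy$taxes <- rep(c(1.0, 1.2, 1.1, 0.2, 1.3, 0.6, 0.7), each = 7) +
  rnorm(49, sd = 0.2)
toy$accidents <- rep(c(2000, 2100, 2050, 900, 2150, 1500, 1600), each = 7) -
  30 * toy$taxes + 4 * toy$unemp + rnorm(49, sd = 5)

fit <- lm(accidents ~ taxes + unemp + factor(state) + south, data = toy)

# `south` is a linear combination of the state dummies, so lm() cannot
# estimate it and reports NA; `unemp` is estimated normally.
coef(fit)[c("taxes", "unemp", "south")]
```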

BONUS QUESTION: We can run a fixed effect model by de-meaning our data and then using an OLS predictor.

BQ1: De-mean the data (help yourself with the codes in the lecture)

library(dplyr)

# Subtract each state's mean from taxes and accidents (the within transformation)
data_demeaned <-
  data %>%
  group_by(state) %>%
  mutate(
    mean_taxes = mean(taxes),
    taxes_demeaned = taxes - mean_taxes,
    mean_accidents = mean(accidents),
    accidents_demeaned = accidents - mean_accidents
  ) %>%
  ungroup()

BQ2: Run a de-meaned OLS model and present results in a nice table


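Dependent variable: De-meaned Car Accidents

| | De-meaned Car Accidents |
|---|---|
| Constant | -0.00 |
| | (1.48) |
| De-meaned Taxes (taxes_demeaned) | -31.37^** |
| | (12.85) |

Note: standard errors in parentheses; ^* p<0.1; ^** p<0.05; ^*** p<0.01. The first row was labeled "De-meaned Taxes" in the original output, but its value of -0.00 is the intercept, which is zero by construction after de-meaning; the slope is the taxes_demeaned row.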

BQ3: Do results change from the fixed effect model? Explain why.

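The coefficient is the same (-31.37), but the standard errors differ slightly (13.76 vs. 12.85).

The matching coefficient is expected because de-meaning the data is equivalent to running a fixed effect model. Both methods remove the unobserved time-invariant characteristics of each state.

The standard errors differ because of a degrees-of-freedom adjustment. OLS on the de-meaned data does not know that the state means were estimated first, so it subtracts only the two coefficients it estimates itself from the degrees of freedom. The fixed effect model also counts the state intercepts, which leaves fewer residual degrees of freedom and therefore a slightly larger standard error.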
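The standard-error question can be checked on simulated data. In the sketch below the toy panel (7 states, 7 periods) and all values are hypothetical; it compares a dummy-variable fixed effect model with OLS on de-meaned data and rescales the de-meaned standard error by the ratio of residual degrees of freedom.

```r
# Hypothetical toy panel (simulated values, not the lab's data).
set.seed(4)
n_states <- 7
n_years  <- 7
toy <- data.frame(state = rep(paste0("S", 1:n_states), each = n_years))
toy$taxes     <- rep(runif(n_states, 0, 2), each = n_years) +
  rnorm(nrow(toy), sd = 0.2)
toy$accidents <- rep(runif(n_states, 900, 2100), each = n_years) -
  30 * toy$taxes + rnorm(nrow(toy), sd = 5)

# Fixed effects via state dummies.
lsdv <- lm(accidents ~ taxes + factor(state), data = toy)

# De-mean by state (ave() returns the group mean for each row), then plain OLS.
toy$taxes_dm     <- toy$taxes - ave(toy$taxes, toy$state)
toy$accidents_dm <- toy$accidents - ave(toy$accidents, toy$state)
dm <- lm(accidents_dm ~ taxes_dm, data = toy)

se_lsdv <- summary(lsdv)$coefficients["taxes", "Std. Error"]
se_dm   <- summary(dm)$coefficients["taxes_dm", "Std. Error"]

# Same slope from both approaches.
all.equal(coef(lsdv)[["taxes"]], coef(dm)[["taxes_dm"]])

# The de-meaned model has more residual degrees of freedom, so its standard
# error is smaller; rescaling by the df ratio recovers the fixed effect one.
c(df_dm = df.residual(dm), df_lsdv = df.residual(lsdv))
all.equal(se_lsdv, se_dm * sqrt(df.residual(dm) / df.residual(lsdv)))
```

Both fits produce the same residuals, so the only difference in the standard error is the divisor used for the residual variance.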

Lab-03, CPP 525, Brett Foster