In this homework assignment you will use linear regression to study speeding tickets. Each question builds on the previous question. Your regressions should have more controls as you move through the assignment. Try to capture all of these regressions in one nicely formatted table.

What determines how much drivers are fined if they are stopped for speeding? Do demographics like age, gender, and race matter? To answer this question, we’ll investigate traffic stops and citations in Massachusetts using data from Makowsky and Stratmann (2009). Even though state law sets a formula for tickets based on how fast a person was driving, police officers in practice often deviate from the formula. An amount for the fine is given only for observations in which the police officer decided to assess a fine.

  1. Plot a histogram of fines. Does it look normally distributed or skewed?

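The histogram shows that fines are not normally distributed. The distribution is skewed: most observations sit at the low end of the scale, on the left of the plot, with a long tail toward large fines. This shape is conventionally called right (positive) skew. Because stops without a fine are recoded to 0 in the code below, the histogram also has a spike at zero.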

library(haven)
options(repos="https://cran.rstudio.com" )

speeding_tickets_text <- read_dta("Ch07.Ex5.SpeedingTicketsData/speeding_tickets_text.dta")

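# Fine amounts; NA marks stops where no fine was assessed
Fines <- speeding_tickets_text$Amount

# Recode missing fines as 0 so that unfined stops appear in the histogram
Fines[is.na(Fines)] <- 0
hist(Fines)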
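
A visual impression of skew can be cross-checked numerically. The sketch below is only an illustration on simulated data (the vector `fines_sim` and the helper `skewness()` are hypothetical and are not part of the ticket data or of any package used here). It computes the sample skewness and compares the mean with the median.

```r
# Hypothetical data: simulated amounts with a long right tail.
# These are NOT the Massachusetts ticket data.
set.seed(42)
fines_sim <- rgamma(5000, shape = 2, scale = 60)

# Sample skewness: the third standardized moment (helper defined here).
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / sd(x)^3
}

skew <- skewness(fines_sim)
skew                                  # positive: right skew; negative: left skew
mean(fines_sim) > median(fines_sim)   # a mean above the median is typical of right skew
```

The same two checks could be applied to `Fines` after the histogram is drawn.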

  1. Estimate a simple linear regression model in which the ticket amount is the dependent variable as a function of age. Is age statistically significant? (Use Stargazer to present your results. If you are unfamiliar with stargazer, then go to the following link

Age is statistically significant in the simple linear regression model: the coefficient is -0.289 with a standard error of 0.025, which is significant at the 1% level.

reg1 <- lm(Amount~ Age, data=speeding_tickets_text) 
library(stargazer)
stargazer(reg1, type="html")
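Dependent variable: Amount
--------------------------------------
Age                   -0.289***
                      (0.025)
Constant              131.707***
                      (0.886)
--------------------------------------
Observations          31,674
R2                    0.004
Adjusted R2           0.004
Residual Std. Error   56.125
  df                  31672
F Statistic           136.323***
  df                  1; 31672
--------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01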
  1. What does it mean for a variable to be endogenous? Is age possibly endogenous? Please explain your answer.

An endogenous variable is an explanatory variable that is correlated with the error term. Age could be endogenous here. Younger drivers may drive at higher speeds than older drivers, and speed affects the size of the fine. Because the model in part b) does not include speed, speed is part of the error term, and the error term is then correlated with age.
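
The standard omitted variable bias formula makes this concern concrete. As a sketch, assume that speed (MPHover) is the only relevant omitted variable and that the remaining error is uncorrelated with age. If the true model includes both regressors, the age coefficient from the regression that leaves out speed converges to the true coefficient plus a bias term:

```latex
\text{Amount}_i = \beta_0 + \beta_1\,\text{Age}_i + \beta_2\,\text{MPHover}_i + \varepsilon_i
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\beta}_1^{\text{short}}
  = \beta_1 + \beta_2\,
    \frac{\operatorname{Cov}(\text{Age}_i,\ \text{MPHover}_i)}{\operatorname{Var}(\text{Age}_i)}
```

If faster driving raises fines (a positive coefficient on MPHover) and younger drivers drive faster (a negative covariance between age and MPHover), the bias term is negative, so the short regression would make age look more negative than it is.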

  1. Estimate the model from part b), also controlling for miles per hour over the speed limit. Explain what happens to the coefficient on age and why.

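Once miles per hour over the limit is added, the coefficient on age changes from -0.289 to 0.025 and is no longer statistically significant. Speed is a strong predictor of the fine: each additional mile per hour over the limit raises the predicted fine by 6.892, and R2 rises from 0.004 to 0.503. Speed is also correlated with age, so in part b) the age coefficient was picking up part of the effect of speed. Controlling for speed removes that omitted variable bias.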
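
The mechanism can be illustrated with a small simulation. This is a sketch on made-up data, not the ticket data: the sample size, the coefficients, and the names `age`, `mph_over`, `amount`, `short`, and `long` are all assumptions chosen for illustration. By construction the true effect of age is zero, yet the regression that omits speed still reports a negative age coefficient.

```r
# Hypothetical simulated data, NOT the ticket data.
set.seed(1)
n   <- 5000
age <- runif(n, 16, 80)

# Assumption: younger drivers speed more (negative age-speed relationship).
mph_over <- 20 - 0.1 * age + rnorm(n, sd = 5)

# Assumption: the fine depends on speed only; the true age effect is zero.
amount <- 5 + 7 * mph_over + rnorm(n, sd = 40)

short <- lm(amount ~ age)              # omits speed
long  <- lm(amount ~ age + mph_over)   # controls for speed

b_short <- coef(short)[["age"]]
b_long  <- coef(long)[["age"]]
c(short = b_short, long = b_long)      # short is clearly negative, long is near zero
```

In the simulation the short coefficient is close to 7 times -0.1, which is the bias term implied by the way the data were generated.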

reg2 <- lm(Amount~ Age + MPHover, data=speeding_tickets_text) 
stargazer(reg1,reg2,type="html",covariate.labels = c("Age","Miles Per Hour Over"))
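Dependent variable: Amount
------------------------------------------------------
                      (1)             (2)
------------------------------------------------------
Age                   -0.289***       0.025
                      (0.025)         (0.018)
Miles Per Hour Over                   6.892***
                                      (0.039)
Constant              131.707***      3.494***
                      (0.886)         (0.954)
------------------------------------------------------
Observations          31,674          31,674
R2                    0.004           0.503
Adjusted R2           0.004           0.503
Residual Std. Error   56.125          39.669
  df                  31672           31671
F Statistic           136.323***      16,001.640***
  df                  1; 31672        2; 31671
------------------------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01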
  1. Is the effect of age on fines linear or non-linear? Assess this question by estimating a model with a quadratic age term, controlling for MPHover, Female, Black, and Hispanic. Interpret the coefficients on the age variables.

The effect of age on fines is non-linear: the coefficient on Age Squared (0.004) is statistically significant at the 1% level. In model (1) the age coefficient was -0.289. Once Miles Per Hour (MPH) Over was added in model (2), the age coefficient shrank to 0.025 and lost significance. After adding the quadratic term and the race and gender controls in model (3), the age coefficient is significant again (-0.265), while the coefficient on MPH Over is almost unchanged (6.873 versus 6.892). The negative coefficient on Age combined with the positive coefficient on Age Squared implies a U-shaped relationship: predicted fines fall with age at a decreasing rate, reach a minimum, and then rise.
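
To read the two age coefficients together, it helps to write out the marginal effect of age implied by the quadratic model. The numbers below plug in the rounded coefficients from column (3), so they are approximate:

```latex
\frac{\partial\,\widehat{\text{Amount}}}{\partial\,\text{Age}}
  = \hat{\beta}_{\text{Age}} + 2\,\hat{\beta}_{\text{Age}^2}\,\text{Age}
  \approx -0.265 + 0.008\,\text{Age}

\text{At Age} = 20:\quad -0.265 + 0.008 \times 20 = -0.105
\qquad
\text{At Age} = 40:\quad -0.265 + 0.008 \times 40 = 0.055
```

The marginal effect depends on age and changes sign, which is what a non-linear relationship looks like in this specification.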

speeding_tickets_text$age2 <- speeding_tickets_text$Age^2
reg3 <- lm(Amount~ Age + age2 + MPHover + Female + Black + Hispanic, data=speeding_tickets_text)
stargazer(reg1,reg2,reg3, type = "html", covariate.labels = c("Age","Age Squared", "Miles Per Hour Over", "Female", "Black", "Hispanic" ))
Dependent variable: Amount
----------------------------------------------------------------------
                      (1)             (2)             (3)
----------------------------------------------------------------------
Age                   -0.289***       0.025           -0.265***
                      (0.025)         (0.018)         (0.089)
Age Squared                                           0.004***
                                                      (0.001)
Miles Per Hour Over                   6.892***        6.873***
                                      (0.039)         (0.039)
Female                                                -3.510***
                                                      (0.475)
Black                                                 -1.920*
                                                      (1.018)
Hispanic                                              2.068*
                                                      (1.058)
Constant              131.707***      3.494***        9.867***
                      (0.886)         (0.954)         (1.788)
----------------------------------------------------------------------
Observations          31,674          31,674          31,674
R2                    0.004           0.503           0.504
Adjusted R2           0.004           0.503           0.504
Residual Std. Error   56.125          39.669          39.624
  df                  31672           31671           31667
F Statistic           136.323***      16,001.640***   5,358.445***
  df                  1; 31672        2; 31671        6; 31667
----------------------------------------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01
  1. Sketch the relationship between age and ticket amount from the foregoing quadratic model: calculate the fitted value for a white male with 0 MPHover for ages equal to 20, 25, 30, 35, 40, and 70. Use R to calculate these values and plot them.
# White male (Female = Black = Hispanic = 0) with MPHover = 0 at each age
newdata <- data.frame(Age = c(20, 25, 30, 35, 40, 70), MPHover = 0, Female = 0, Black = 0, Hispanic = 0)
newdata$age2 <- newdata$Age^2

# Fitted values from the quadratic model, stored alongside the inputs
newdata$yhat <- predict(reg3, newdata)

library(ggplot2)
library(dplyr)
library(ggrepel)
library(tidyr)

# Plot fitted fine against age using the columns of newdata
ggplot(newdata, aes(x = Age, y = yhat)) +
  geom_point(shape = 21, color = "black", fill = "#69b3a2", size = 4) +
  geom_line(color = "grey") +
  labs(y = "Fine Amount", x = "Age")
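
As a rough cross-check on the plot, the fitted values can also be computed by hand from the coefficients printed in column (3) of the regression table. This sketch uses the coefficients as rounded to three decimals, so the results will differ somewhat from what `predict()` returns with the unrounded estimates. The object names `b0`, `b_age`, `b_age2`, and `fitted_rounded` are introduced here for illustration.

```r
# Coefficients copied from column (3) of the table (rounded to 3 decimals).
b0     <- 9.867
b_age  <- -0.265
b_age2 <- 0.004

ages <- c(20, 25, 30, 35, 40, 70)

# White male with MPHover = 0: every other regressor is zero,
# so only the constant and the two age terms contribute.
fitted_rounded <- b0 + b_age * ages + b_age2 * ages^2

data.frame(Age = ages, fitted = round(fitted_rounded, 3))
# fitted: 6.167, 5.742, 5.517, 5.492, 5.667, 10.917
```

With these rounded coefficients the fitted values fall from age 20 to the mid-30s and rise afterward.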

  1. Calculate the age that is associated with the lowest predicted fines.

The age associated with the lowest predicted fine is about 35.37 years.

b_age1 <- coef(reg3)[["Age"]]
b_age2 <- coef(reg3)[["age2"]]
lowest <- -b_age1/(2*b_age2)
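
The expression used for `lowest` comes from setting the derivative of the fitted quadratic with respect to age equal to zero. A short derivation, writing the two age coefficients as beta_1 and beta_2:

```latex
\frac{\partial\,\widehat{\text{Amount}}}{\partial\,\text{Age}}
  = \hat{\beta}_1 + 2\,\hat{\beta}_2\,\text{Age} = 0
\quad\Longrightarrow\quad
\text{Age}^{*} = -\frac{\hat{\beta}_1}{2\,\hat{\beta}_2},
\qquad
\frac{\partial^2\,\widehat{\text{Amount}}}{\partial\,\text{Age}^2} = 2\,\hat{\beta}_2 > 0
```

The positive second derivative confirms that the turning point is a minimum. Plugging in the rounded table values gives 0.265 / (2 x 0.004), or about 33.1, rather than 35.37. The gap comes from rounding: the Age Squared coefficient is printed to only three decimals, and the turning point is sensitive to it, so the calculation should use the unrounded coefficients as the code does.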

  1. Do drivers from out of town and out of state get treated differently? Do state police and local police treat nonlocals differently? Estimate a model that allows us to assess whether out of towners and out of staters are treated differently and whether state police respond differently to out of towners and out of staters. Interpret the coefficients on the relevant variables. Hint: you have to do something more than just including the dummy variables.

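Holding the other variables fixed, out-of-town drivers stopped by local police pay more than in-town drivers (Out of Town coefficient 1.386, significant at the 10% level), and out-of-state drivers pay a significantly higher amount (Out of State coefficient 7.508, significant at the 1% level). The State Police coefficient (0.884) is not significant, so for in-town drivers there is no detectable difference between fines from state police and local police. The interaction terms show how state police treat nonlocals. The Out of Town * State Police coefficient (6.173) is significant at the 1% level, so state police fine out-of-town drivers more heavily than local police do. The Out of State * State Police coefficient (0.285) is not significant, so there is no evidence that state police treat out-of-state drivers differently than local police do.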
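
The code that estimated model (4) is not shown in this write-up. The sketch below is one way such a model could be specified, based on the formula reported in the `linearHypothesis()` output later in this document. It runs on simulated data so that it is self-contained. The data frame `sim`, the object `reg4_sketch`, and the definition of `policestate` and `policetown` as products of the dummy variables are assumptions, not the original code.

```r
# Hypothetical simulated data with the same column names as the ticket data.
# The values are made up; only the model specification is the point here.
set.seed(7)
n <- 2000
sim <- data.frame(
  Age      = round(runif(n, 16, 80)),
  MPHover  = round(runif(n, 5, 40)),
  Female   = rbinom(n, 1, 0.4),
  Black    = rbinom(n, 1, 0.1),
  Hispanic = rbinom(n, 1, 0.1),
  OutTown  = rbinom(n, 1, 0.5),
  OutState = rbinom(n, 1, 0.2),
  StatePol = rbinom(n, 1, 0.3)
)
sim$age2   <- sim$Age^2
sim$Amount <- 5 + 7 * sim$MPHover + rnorm(n, sd = 40)

# Assumption: each interaction variable is the product of two dummies.
sim$policestate <- sim$StatePol * sim$OutState
sim$policetown  <- sim$StatePol * sim$OutTown

reg4_sketch <- lm(Amount ~ Age + age2 + MPHover + Female + Black + Hispanic +
                    OutTown + OutState + StatePol + policestate + policetown,
                  data = sim)
names(coef(reg4_sketch))
```

Creating the interactions as named columns, rather than writing `StatePol:OutTown` in the formula, keeps the coefficient names short, which makes them easy to refer to in a joint hypothesis test.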

Dependent variable: Amount
----------------------------------------------------------------------------------------------
                              (1)             (2)             (3)             (4)
----------------------------------------------------------------------------------------------
Age                           -0.289***       0.025           -0.265***       -0.341***
                              (0.025)         (0.018)         (0.089)         (0.088)
Age Squared                                                   0.004***        0.005***
                                                              (0.001)         (0.001)
Miles per Hour Over                           6.892***        6.873***        6.884***
                                              (0.039)         (0.039)         (0.038)
Female                                                        -3.510***       -2.974***
                                                              (0.475)         (0.471)
Black                                                         -1.920*         -2.151**
                                                              (1.018)         (1.010)
Hispanic                                                      2.068*          3.361***
                                                              (1.058)         (1.050)
Out of Town                                                                   1.386*
                                                                              (0.722)
Out of State                                                                  7.508***
                                                                              (0.869)
State Police                                                                  0.884
                                                                              (1.542)
Out of State * State Police                                                   0.285
                                                                              (1.128)
Out of Town * State Police                                                    6.173***
                                                                              (1.641)
Constant                      131.707***      3.494***        9.867***        5.058***
                              (0.886)         (0.954)         (1.788)         (1.834)
----------------------------------------------------------------------------------------------
Observations                  31,674          31,674          31,674          31,674
R2                            0.004           0.503           0.504           0.513
Adjusted R2                   0.004           0.503           0.504           0.513
Residual Std. Error           56.125          39.669          39.624          39.251
  df                          31672           31671           31667           31662
F Statistic                   136.323***      16,001.640***   5,358.445***    3,034.083***
  df                          1; 31672        2; 31671        6; 31667        11; 31662
----------------------------------------------------------------------------------------------
Note: *p<0.1; **p<0.05; ***p<0.01
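
With interaction terms, the coefficient on each nonlocal dummy depends on which police force made the stop. As a worked reading of column (4), the coefficient on each dummy by police type is:

```latex
\begin{aligned}
\text{Out of Town, local police:}\quad  & \hat{\beta}_{\text{OutTown}} = 1.386 \\
\text{Out of Town, state police:}\quad  & \hat{\beta}_{\text{OutTown}} + \hat{\beta}_{\text{OutTown}\times\text{StatePol}} = 1.386 + 6.173 = 7.559 \\
\text{Out of State, local police:}\quad & \hat{\beta}_{\text{OutState}} = 7.508 \\
\text{Out of State, state police:}\quad & \hat{\beta}_{\text{OutState}} + \hat{\beta}_{\text{OutState}\times\text{StatePol}} = 7.508 + 0.285 = 7.793
\end{aligned}
```

These are point estimates only. Whether each sum differs significantly from zero would need its own test, since the standard error of a sum is not the sum of the standard errors.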
  1. Test whether the two state police interaction terms are jointly significant. Briefly explain your results. Hint: it says jointly so it is not a T-test.

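As we can see from the results, the F statistic is 7.48 with a p-value of about 0.0006, so we reject the null hypothesis that both interaction coefficients are zero. The two state police interaction terms are jointly significant, even though the Out of State * State Police term is not individually significant.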

library(AER)
linearHypothesis(reg4, c("policetown=0", "policestate=0"))
## Linear hypothesis test
## 
## Hypothesis:
## policetown = 0
## policestate = 0
## 
## Model 1: restricted model
## Model 2: Amount ~ Age + age2 + MPHover + Female + Black + Hispanic + OutTown + 
##     OutState + StatePol + policestate + policetown
## 
##   Res.Df      RSS Df Sum of Sq      F    Pr(>F)    
## 1  31664 48802637                                  
## 2  31662 48779577  2     23060 7.4839 0.0005631 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
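
The F statistic in this output can be reproduced by hand from the two residual sums of squares, which shows what the joint test is doing: it asks whether dropping both interaction terms makes the fit significantly worse. The sketch below copies the numbers from the output above. The object names are introduced here for illustration.

```r
# Values copied from the linearHypothesis() output.
rss_r <- 48802637   # restricted model (both interactions dropped)
rss_u <- 48779577   # unrestricted model
q     <- 2          # number of restrictions tested
df_u  <- 31662      # residual degrees of freedom, unrestricted model

# F = [(RSS_r - RSS_u) / q] / [RSS_u / df_u]
f_stat <- ((rss_r - rss_u) / q) / (rss_u / df_u)

# Upper-tail probability of the F(q, df_u) distribution
p_val <- pf(f_stat, q, df_u, lower.tail = FALSE)

c(F = f_stat, p = p_val)
```

The result agrees with the reported F of 7.4839 and p-value of 0.0005631 up to rounding of the printed sums of squares.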
Variable Name   Description
-------------   -----------
MPHover         Miles per hour over the speed limit
Amount          Assessed fine for the ticket
Age             Age of driver
Female          Equals 1 for women and 0 for men
Black           Equals 1 for African-American and 0 otherwise
Hispanic        Equals 1 for Hispanics and 0 otherwise
StatePol        Equals 1 if ticketing officer was state patrol officer and 0 otherwise
OutTown         Equals 1 if driver from out of town and 0 otherwise
OutState        Equals 1 if driver from out of state and 0 otherwise