In this homework assignment you will use linear regression to study speeding tickets. Each question builds on the previous one, so your regressions should add more controls as you move through the assignment. Try to capture all of these regressions in one nicely formatted table.
What determines how much drivers are fined if they are stopped for speeding? Do demographics like age, gender, and race matter? To answer this question, we'll investigate traffic stops and citations in Massachusetts using data from Makowsky and Stratmann (2009). Even though state law sets a formula for fines based on how fast a person was driving, police officers in practice often deviate from the formula. The fine amount (Amount) is recorded only for stops in which the officer decided to assess a fine; it is missing otherwise.
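With stops that received no fine recoded from missing to zero, the histogram shows fine amounts concentrated at the low end of the range with a long tail of larger fines, so the distribution is skewed to the right.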
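library(haven)
options(repos = "https://cran.rstudio.com")
# Load the Makowsky and Stratmann (2009) traffic-stop data
speeding_tickets_text <- read_dta("Ch07.Ex5.SpeedingTicketsData/speeding_tickets_text.dta")
# Amount is missing when no fine was assessed; count those stops as 0 for the histogram only
Fines <- speeding_tickets_text$Amount
Fines[is.na(Fines)] <- 0
hist(Fines, main = "Distribution of fine amounts", xlab = "Fine amount")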
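The direction of skew can also be checked numerically rather than only by eye. The sketch below computes the moment-based sample skewness (third central moment divided by the second central moment raised to the power 3/2); a positive value indicates a long right tail. The function name `sample_skewness` and the toy vector are illustrative assumptions, not part of the original analysis.

```r
# Moment-based sample skewness: m3 / m2^(3/2), using 1/n (population) moments.
# Positive values indicate a long right tail; negative values a long left tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3 / 2)
}

# Hypothetical toy data: mostly zeros (no fine) plus one large fine
toy_fines <- c(0, 0, 0, 0, 100)
sample_skewness(toy_fines)
```

Applied to the recoded vector above, `sample_skewness(Fines)` gives a single number that backs up the reading of the histogram.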
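In a simple regression of the fine amount on age (model 1, estimated on the 31,674 stops that received a fine), age is statistically significant at the 1% level: each additional year of age is associated with a fine about 0.29 dollars lower.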
reg1 <- lm(Amount~ Age, data=speeding_tickets_text)
library(stargazer)
stargazer(reg1, type="html")
| | Dependent variable: Amount |
|---|---|
| Age | -0.289*** |
| | (0.025) |
| Constant | 131.707*** |
| | (0.886) |
| Observations | 31,674 |
| R² | 0.004 |
| Adjusted R² | 0.004 |
| Residual Std. Error | 56.125 (df = 31672) |
| F Statistic | 136.323*** (df = 1; 31672) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 |
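The significance stars come from a t-test of each coefficient against zero, with t equal to the estimate divided by its standard error. As a rough check, the reported age estimate and standard error can be plugged in directly; because both are rounded to three decimals, the result is approximate. The variable names here are illustrative.

```r
# Reported (rounded) values from model (1)
b_age    <- -0.289
se_age   <- 0.025
df_resid <- 31672

# t statistic and two-sided p-value for H0: coefficient on Age = 0
t_age <- b_age / se_age
p_age <- 2 * pt(-abs(t_age), df = df_resid)
c(t = t_age, p = p_age)
```

A |t| above about 11 is far beyond the roughly 2.58 cutoff for the 1% level, which is why age gets three stars.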
An explanatory variable is endogenous if it is correlated with the error term. Age may be endogenous in model (1): younger drivers may drive faster than older drivers, which would make them both more likely to be ticketed and likely to receive larger fines. Because speed is left out of model (1), it sits in the error term, so the age coefficient partly captures the effect of speed.
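This omitted-variable story can be illustrated with simulated data. In the sketch below, which uses made-up parameters rather than the Massachusetts data, age has no direct effect on the fine at all: younger drivers simply drive faster, and only speed affects the fine. Leaving speed out of the regression still produces a clearly negative age coefficient.

```r
set.seed(42)
n <- 10000

# Hypothetical data-generating process (illustrative numbers only):
# younger drivers go faster, and only speed affects the fine.
age    <- runif(n, 16, 80)
mph    <- 25 - 0.15 * age + rnorm(n, sd = 5)
amount <- 3.5 + 6.9 * mph + rnorm(n, sd = 40)

short_fit <- lm(amount ~ age)        # speed omitted: age absorbs part of its effect
long_fit  <- lm(amount ~ age + mph)  # speed controlled for

coef(short_fit)["age"]  # negative, near 6.9 * -0.15 = -1.035
coef(long_fit)["age"]   # near zero, the true direct effect of age
```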
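Once miles per hour over the limit (MPHover) is added, the age coefficient moves from -0.289 to 0.025 and is no longer statistically significant. Each additional mile per hour over the limit raises the fine by about 6.89 dollars, and R² rises from 0.004 to 0.503. This suggests that the negative age effect in model (1) largely reflected the fact that younger drivers were stopped at higher speeds.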
reg2 <- lm(Amount~ Age + MPHover, data=speeding_tickets_text)
stargazer(reg1,reg2,type="html",covariate.labels = c("Age","Miles Per Hour Over"))
| Dependent variable: Amount | (1) | (2) |
|---|---|---|
| Age | -0.289*** | 0.025 |
| | (0.025) | (0.018) |
| Miles Per Hour Over | | 6.892*** |
| | | (0.039) |
| Constant | 131.707*** | 3.494*** |
| | (0.886) | (0.954) |
| Observations | 31,674 | 31,674 |
| R² | 0.004 | 0.503 |
| Adjusted R² | 0.004 | 0.503 |
| Residual Std. Error | 56.125 (df = 31672) | 39.669 (df = 31671) |
| F Statistic | 136.323*** (df = 1; 31672) | 16,001.640*** (df = 2; 31671) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 | |
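The size of the change in the age coefficient between models (1) and (2) can be read through the omitted-variable-bias identity for OLS. Let $\hat\delta$ be the slope from an auxiliary regression of MPHover on Age (an illustrative quantity, not estimated in this assignment). When both models use the same sample, as they do here with 31,674 observations, the identity holds exactly; plugging in the rounded estimates from the table gives:

$$
\hat\beta^{(1)}_{\text{Age}} = \hat\beta^{(2)}_{\text{Age}} + \hat\beta^{(2)}_{\text{MPHover}}\,\hat\delta
\quad\Longrightarrow\quad
\hat\delta = \frac{-0.289 - 0.025}{6.892} \approx -0.046
$$

Read this way, each additional year of age is associated with driving roughly 0.05 mph less over the limit among fined drivers. This is a back-of-the-envelope figure based on rounded coefficients, not a reported estimate.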
Next we check whether the effect of age on fines is linear by adding age squared, together with indicators for gender and race (model 3). When MPHover was added in model (2), the age coefficient fell from -0.289 to an insignificant 0.025. Once age squared, Female, Black, and Hispanic are included, age is again significant at the 1% level (-0.265), while the MPHover coefficient barely moves (6.892 to 6.873). The negative coefficient on age together with the positive, significant coefficient on age squared (0.004) means the relationship is not linear: predicted fines fall with age, but at a decreasing rate, bottoming out and then rising for older drivers.
speeding_tickets_text$age2 <- speeding_tickets_text$Age^2
reg3 <- lm(Amount~ Age + age2 + MPHover + Female + Black + Hispanic, data=speeding_tickets_text)
stargazer(reg1,reg2,reg3, type = "html", covariate.labels = c("Age","Age Squared", "Miles Per Hour Over", "Female", "Black", "Hispanic" ))
| Dependent variable: Amount | (1) | (2) | (3) |
|---|---|---|---|
| Age | -0.289*** | 0.025 | -0.265*** |
| | (0.025) | (0.018) | (0.089) |
| Age Squared | | | 0.004*** |
| | | | (0.001) |
| Miles Per Hour Over | | 6.892*** | 6.873*** |
| | | (0.039) | (0.039) |
| Female | | | -3.510*** |
| | | | (0.475) |
| Black | | | -1.920* |
| | | | (1.018) |
| Hispanic | | | 2.068* |
| | | | (1.058) |
| Constant | 131.707*** | 3.494*** | 9.867*** |
| | (0.886) | (0.954) | (1.788) |
| Observations | 31,674 | 31,674 | 31,674 |
| R² | 0.004 | 0.503 | 0.504 |
| Adjusted R² | 0.004 | 0.503 | 0.504 |
| Residual Std. Error | 56.125 (df = 31672) | 39.669 (df = 31671) | 39.624 (df = 31667) |
| F Statistic | 136.323*** (df = 1; 31672) | 16,001.640*** (df = 2; 31671) | 5,358.445*** (df = 6; 31667) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 | | |
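With a quadratic in age, the effect of one more year of age is not a single number: it is the derivative b1 + 2·b2·Age, which changes with age. The sketch below evaluates it using the rounded model (3) coefficients from the table (-0.265 on Age, 0.004 on Age Squared). Because the squared term is rounded heavily, the exact age where the sign flips should come from the unrounded `coef(reg3)`; the function name `age_marginal_effect` is hypothetical.

```r
# Marginal effect of age on the predicted fine in a quadratic model:
# d(Amount)/d(Age) = b1 + 2 * b2 * Age
age_marginal_effect <- function(age, b1 = -0.265, b2 = 0.004) {
  b1 + 2 * b2 * age
}

# Negative for younger drivers (fines falling with age), shrinking toward zero,
# then positive for older drivers (fines rising with age)
age_marginal_effect(c(20, 30, 50, 70))
```

Passing `b1 = coef(reg3)[["Age"]]` and `b2 = coef(reg3)[["age2"]]` uses the full-precision estimates instead of the rounded defaults.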
library(ggplot2)
library(dplyr)
library(ggrepel)
library(tidyr)
# Illustrative drivers: non-Black, non-Hispanic men going 0 mph over the limit, at several ages
newdata <- data.frame(Age = c(20, 25, 30, 35, 40, 70), MPHover = 0, Female = 0, Black = 0, Hispanic = 0)
newdata$age2 <- newdata$Age^2
# Predicted fine from model (3) for each age
newdata$yhat <- predict(reg3, newdata)
ggplot(newdata, aes(x = Age, y = yhat)) +
  geom_point(shape = 21, color = "black", fill = "#69b3a2", size = 4) +
  geom_line(color = "grey") +
  labs(y = "Predicted Fine Amount", x = "Age")
Based on model (3), predicted fines are lowest for drivers about 35.37 years old, the turning point of the quadratic in age.
# Turning point of b1*Age + b2*Age^2: the age at which predicted fines are lowest
b_age1 <- coef(reg3)[["Age"]]
b_age2 <- coef(reg3)[["age2"]]
lowest <- -b_age1 / (2 * b_age2)
lowest
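The turning-point formula used in the code comes from the first-order condition for the quadratic in age. Writing $\hat\beta_1$ and $\hat\beta_2$ for the model (3) coefficients on Age and age2, and holding the other regressors fixed:

$$
\frac{\partial \widehat{\text{Amount}}}{\partial \text{Age}} = \hat\beta_1 + 2\hat\beta_2\,\text{Age} = 0
\quad\Longrightarrow\quad
\text{Age}^{*} = -\frac{\hat\beta_1}{2\hat\beta_2},
\qquad
\frac{\partial^2 \widehat{\text{Amount}}}{\partial \text{Age}^2} = 2\hat\beta_2 > 0
$$

The positive second derivative (age squared has a positive coefficient) is what makes this turning point a minimum rather than a maximum.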
Turning to where drivers live, out-of-town drivers pay somewhat more than local drivers (1.386, significant only at the 10% level), and out-of-state drivers pay substantially more (7.508, significant at the 1% level). For in-town drivers, fines from state police do not differ significantly from fines from local police (0.884, standard error 1.542). The interaction terms show where state police behave differently: the Out of Town × State Police coefficient is 6.173 and significant at the 1% level, whereas the Out of State × State Police coefficient (0.285) is not significant. This suggests that state police fine out-of-town drivers more heavily than local police do, but do not treat out-of-state drivers differently from local police.
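The code that estimates model (4) does not appear in this write-up. Judging from the model formula printed by `linearHypothesis` further down, it adds OutTown, OutState, StatePol and two interaction variables, `policestate` and `policetown`. The sketch below assumes those interactions are OutState × StatePol and OutTown × StatePol, matching the table labels; the helper name `fit_reg4` is hypothetical.

```r
# Hypothetical helper reconstructing model (4). Assumes df already has the
# columns used earlier (Amount, Age, age2, MPHover, Female, Black, Hispanic)
# plus OutTown, OutState and StatePol.
fit_reg4 <- function(df) {
  # Assumed interaction terms: out-of-state / out-of-town driver stopped by state police
  df$policestate <- df$OutState * df$StatePol
  df$policetown  <- df$OutTown * df$StatePol
  lm(Amount ~ Age + age2 + MPHover + Female + Black + Hispanic +
       OutTown + OutState + StatePol + policestate + policetown,
     data = df)
}

# Usage with the assignment's data:
# reg4 <- fit_reg4(speeding_tickets_text)
# stargazer(reg1, reg2, reg3, reg4, type = "html")
```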
| Dependent variable: Amount | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Age | -0.289*** | 0.025 | -0.265*** | -0.341*** |
| | (0.025) | (0.018) | (0.089) | (0.088) |
| Age Squared | | | 0.004*** | 0.005*** |
| | | | (0.001) | (0.001) |
| Miles Per Hour Over | | 6.892*** | 6.873*** | 6.884*** |
| | | (0.039) | (0.039) | (0.038) |
| Female | | | -3.510*** | -2.974*** |
| | | | (0.475) | (0.471) |
| Black | | | -1.920* | -2.151** |
| | | | (1.018) | (1.010) |
| Hispanic | | | 2.068* | 3.361*** |
| | | | (1.058) | (1.050) |
| Out of Town | | | | 1.386* |
| | | | | (0.722) |
| Out of State | | | | 7.508*** |
| | | | | (0.869) |
| State Police | | | | 0.884 |
| | | | | (1.542) |
| Out of State * State Police | | | | 0.285 |
| | | | | (1.128) |
| Out of Town * State Police | | | | 6.173*** |
| | | | | (1.641) |
| Constant | 131.707*** | 3.494*** | 9.867*** | 5.058*** |
| | (0.886) | (0.954) | (1.788) | (1.834) |
| Observations | 31,674 | 31,674 | 31,674 | 31,674 |
| R² | 0.004 | 0.503 | 0.504 | 0.513 |
| Adjusted R² | 0.004 | 0.503 | 0.504 | 0.513 |
| Residual Std. Error | 56.125 (df = 31672) | 39.669 (df = 31671) | 39.624 (df = 31667) | 39.251 (df = 31662) |
| F Statistic | 136.323*** (df = 1; 31672) | 16,001.640*** (df = 2; 31671) | 5,358.445*** (df = 6; 31667) | 3,034.083*** (df = 11; 31662) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 | | | |
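With the interactions in model (4), the gap between a state-police fine and a local-police fine depends on where the driver is from. Assuming the two interaction rows are OutState × StatePol and OutTown × StatePol, and holding everything else fixed, the column (4) estimates imply:

$$
\frac{\Delta \widehat{\text{Amount}}}{\Delta\,\text{StatePol}} = 0.884 + 0.285\,\text{OutState} + 6.173\,\text{OutTown}
$$

For an in-town driver (OutTown = 0, OutState = 0) the implied gap is 0.884. For a driver with OutTown = 1 and OutState = 0 it is 0.884 + 6.173 = 7.057, and for a driver with OutState = 1 and OutTown = 0 it is 0.884 + 0.285 = 1.169. These are point estimates only; judging whether each sum differs from zero would require a test that accounts for the covariance between the coefficients.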
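The two interaction terms are tested jointly, so this is an F-test rather than a t-test. The p-value (about 0.00056) is far below 0.01, so we reject the null hypothesis that both interaction coefficients equal zero: the interaction terms are jointly significant. Individually, only the out-of-town interaction is significant in model (4), so it is likely the main source of the joint result.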
library(AER)
linearHypothesis(reg4, c("policetown=0", "policestate=0"))
## Linear hypothesis test
##
## Hypothesis:
## policetown = 0
## policestate = 0
##
## Model 1: restricted model
## Model 2: Amount ~ Age + age2 + MPHover + Female + Black + Hispanic + OutTown +
## OutState + StatePol + policestate + policetown
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 31664 48802637
## 2 31662 48779577 2 23060 7.4839 0.0005631 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
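The F statistic in this output can be reproduced from the two residual sums of squares, F = ((RSS_r − RSS_u) / q) / (RSS_u / df_u), where q = 2 restrictions and df_u = 31,662 residual degrees of freedom in the unrestricted model (4). The RSS values below are copied from the printout, so they carry its rounding; the variable names are illustrative.

```r
# Residual sums of squares printed by linearHypothesis above
rss_restricted   <- 48802637  # model without the two interaction terms
rss_unrestricted <- 48779577  # full model (4)
q    <- 2      # number of restrictions (policetown = 0, policestate = 0)
df_u <- 31662  # residual df of the unrestricted model

# F statistic for the joint null that both interaction coefficients are zero
f_stat <- ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_u)
p_val  <- pf(f_stat, df1 = q, df2 = df_u, lower.tail = FALSE)
c(F = f_stat, p = p_val)
```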
| Variable Name | Description |
|---|---|
| MPHover | Miles per hour over the speed limit |
| Amount | Assessed fine for the ticket |
| Age | Age of driver |
| Female | Equals 1 for women and 0 for men |
| Black | Equals 1 for African-American and 0 otherwise |
| Hispanic | Equals 1 for Hispanics and 0 otherwise |
| StatePol | Equals 1 if ticketing officer was a state patrol officer and 0 otherwise |
| OutTown | Equals 1 if driver from out of town and 0 otherwise |
| OutState | Equals 1 if driver from out of state and 0 otherwise |