knitr::opts_chunk$set(message = FALSE, warning = FALSE)

Exploratory Data Analysis

  1. In how many schools is the percentage of students exempt for medical reasons (med_exempt) greater than the percentage exempt for religious reasons (rel_exempt)? [2 pts] Of this set of schools, what percent are public schools? [2 pts]
dim(cavax) #7032 rows, 11 columns. 7032 schools
## [1] 7032   11
cavax_medical = cavax %>%
  mutate("greater" = if_else(med_exempt > rel_exempt, "high", "low")
         ) 

table(cavax_medical$greater) # 518 schools is med_exempt > rel_exempt
## 
## high  low 
##  518 6514
# filter for these 518 schools.
cavax_medical2 = cavax_medical %>%
  filter(greater == "high")


table(cavax_medical2$type)
## 
## PRIVATE  PUBLIC 
##      70     448
 #70 private, 448 public.  So 448/518 = 86.4% of these schools are public

Answer: In 518 schools, 86.4%

  1. Which c®ounty, when averaging across all the schools in that county, has the highest average percentage of exempt students (exempt)? Note, we are using the variable exempt here. [2 pts]
table(cavax$county)
## 
##         ALAMEDA          AMADOR           BUTTE       CALAVERAS          COLUSA 
##             286               6              48               9               4 
##    CONTRA COSTA       DEL NORTE       EL DORADO          FRESNO           GLENN 
##             197               8              32             210               5 
##        HUMBOLDT        IMPERIAL            INYO            KERN           KINGS 
##              35              41               4             164              34 
##            LAKE          LASSEN     LOS ANGELES          MADERA           MARIN 
##              14              11            1671              30              62 
##        MARIPOSA       MENDOCINO          MERCED           MODOC            MONO 
##               5              22              58               3               2 
##        MONTEREY            NAPA          NEVADA          ORANGE          PLACER 
##              83              32              19             550              84 
##          PLUMAS       RIVERSIDE      SACRAMENTO      SAN BENITO  SAN BERNARDINO 
##               4             347             271              14             389 
##       SAN DIEGO   SAN FRANCISCO     SAN JOAQUIN SAN LUIS OBISPO       SAN MATEO 
##             557             135             172              49             148 
##   SANTA BARBARA     SANTA CLARA      SANTA CRUZ          SHASTA          SIERRA 
##              90             338              55              42               1 
##        SISKIYOU          SOLANO          SONOMA      STANISLAUS          SUTTER 
##              13              66             111             111              29 
##          TEHAMA         TRINITY          TULARE        TUOLUMNE         VENTURA 
##              14               4             107              10             169 
##            YOLO            YUBA 
##              34              23
cavax_county = 
  cavax %>%
  group_by(county) %>%
  summarize("meanexempt" = mean(exempt)) %>%
  arrange(desc(meanexempt))

cavax_county
## # A tibble: 57 × 2
##    county     meanexempt
##    <chr>           <dbl>
##  1 NEVADA          24.0 
##  2 MARIPOSA        16.5 
##  3 HUMBOLDT        14.2 
##  4 DEL NORTE       13.2 
##  5 LASSEN          12.5 
##  6 SANTA CRUZ      11.8 
##  7 EL DORADO       10.6 
##  8 CALAVERAS        9.8 
##  9 SHASTA           9.78
## 10 MENDOCINO        9.37
## # … with 47 more rows

Answer: Nevada has the highest average of exempt students.

  1. Create a bar chart that shows for private and public schools (type) the percent of students exempt from providing vaccination records (exempt). [2 pts] ( are we looking at average here???)
cavax_type2 = 
  cavax %>%
  group_by(type) %>%
  summarize("sumexempt" = sum(exempt))

ggplot(data=cavax_type2, aes(x=type, y=sumexempt)) +
  geom_bar(stat="identity") + labs(title = "Total Percent of Students Exempt From Providing Vaccination Records and School Types",
      x = "Type of School",
      y = "Total Percent of Students Exempt")

Linear Regression Model

  1. Estimate a model predicting exempt (exempt) by district type (type) and enrollment (enrollment). [2 pts] Treat exempt as a continuous variable (and thus you can use standard OLS). Interpret the intercept and the coefficients. [4 pts] What is the predicted exempt percentage for a public school with 100 students in kindergarten? [2 pts] What is the predicted exempt percentage for a private school with 80 students in kindergarten? [2pts]
# We are predicting the percent of students who are exempt from providing vaccination records by type of school (public vs private) and total kindergarten enrollment

reg1= lm(exempt ~ type + enrollment, data= cavax)
summary(reg1)
## 
## Call:
## lm(formula = exempt ~ type + enrollment, data = cavax)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.993 -3.296 -1.727  0.588 86.986 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.288602   0.209140  30.069  < 2e-16 ***
## typePUBLIC  -0.860822   0.263483  -3.267  0.00109 ** 
## enrollment  -0.029606   0.002383 -12.425  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.396 on 7029 degrees of freedom
## Multiple R-squared:  0.04034,    Adjusted R-squared:  0.04007 
## F-statistic: 147.7 on 2 and 7029 DF,  p-value: < 2.2e-16

Intercept: This is the estimated percentage of students who are exempt from vaccination for students who are in private schools and have 0 total enrollment. The estimated percentage is 6.28%. This is statistically significant at the p<.001 level. ***

typePUBLIC: This is comparing the percentage of students who are exempt from vaccination for students who are in public compared to those in private while holding total enrollment constant. Students who are in public schools have -0.86% less percentage of students who are exempt from vaccination compared to students in private schools. In other words, students who are in private schools have 0.86% more percentage of students exempt from vaccination. This is significant at the p<.01 level. **

enrollment: this is the effect of enrollment on the estimated percentage of students who are exempt from vaccination while holding type of school constant. As total enrollment goes up by 1 student, the percentage of students exempt from vaccination goes down by -0.02%. This is significant at the p<.001 level. ***

What is the predicted exempt percentage for a public school with 100 students in kindergarten? [2 pts]

#reg1: lm(formula = exempt ~ type(public) + enrollment, data = cavax)

#1st way
pred.exempt1 = as.matrix(c(1,1,100))

exempt = crossprod(pred.exempt1, coef(reg1))
exempt     
##          [,1]
## [1,] 2.467164
# 2.46%


###2nd way

6.288602 + (-0.860822*1) + (-0.029606*100)
## [1] 2.46718
#2.46%

Answer: 2.46%

What is the predicted exempt percentage for a private school with 80 students in kindergarten? [2pts]

#reg1: lm(formula = exempt ~ type(public) + enrollment, data = cavax)
pred.exempt2 = as.matrix(c(1,0,80))
#1st way
exempt2 = crossprod(pred.exempt2, coef(reg1))
exempt2                          
##         [,1]
## [1,] 3.92011
# 3.92%                         
     

###2nd way 
6.288602 + (-0.860822*0) + (-0.029606*80)                                           
## [1] 3.920122
#3.92%

Answer: 3.92%

Testing linear regression assumption

  1. Test whether the assumption of homoskedasticy has been met. [2 pts] Discuss results. [2 pts] Calculate the VIF for each variable. [2 pts] Should we be concerned with multicollinearity. [2 pts] (Note, the vif() command is in the ‘car’ package. You can also calculate the VIF yourself as we did in class)
# test for heteroskedasticity, the 5th assumption of linear regression model. The assumption here is that the variance of the error term is the same across all combinations of values of the explanatory variable. We don't want IV's, in this case, school type and total number of enrolled kids to predict the error term. .

# if heteroskedasticity is present, this means that the variance of the error term(the unexplained) is not the same and changes based on the value of the IV's. 

bptest(reg1)
## 
##  studentized Breusch-Pagan test
## 
## data:  reg1
## BP = 106.31, df = 2, p-value < 2.2e-16
#This is statistically significant so there is heteroskedasticity. We should use robust standard errors

After running the Breusch-Pagan test, the p-value is significant. This means that the variance of the error term is not the same and changes based on the value of the Independent Variables ( school type and enrollment).

vif to detect multicollinearity

vif(reg1)
##       type enrollment 
##   1.413965   1.413965

Looks good. There doesn’t seem to be multicollinearity between school type and enrollment.

Interaction model

  1. Recenter the variable enrollment at its mean. [2 pts] Create an interaction between type (type) and student enrollment (enrollment) recentered and rerun the model predicting exempt (exempt). [2 pts] Assume that type moderates the effect of enrollment in your interpretation of the interaction. (i) Interpret the results on each coefficient. (ii) Create a plot to visualize the interaction. [4 pts]
summary(cavax$enrollment)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.00   41.00   73.00   75.65  104.00  556.00
mean(cavax$enrollment) # mean is 75.65
## [1] 75.64562
cavax$enrollment2 = scale(cavax$enrollment, center = TRUE, scale = FALSE) # recenter at the mean. Mean is now 0
cavax$enrollment2 <- as.numeric(cavax$enrollment2)

reg2 = lm(exempt ~ type + enrollment2 + type*enrollment2, data= cavax)
summary(reg2)
## 
## Call:
## lm(formula = exempt ~ type + enrollment2 + type * enrollment2, 
##     data = cavax)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.070 -3.293 -1.726  0.587 86.926 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             3.837202   0.600435   6.391 1.76e-10 ***
## typePUBLIC             -0.651213   0.609131  -1.069  0.28507    
## enrollment2            -0.034017   0.011800  -2.883  0.00395 ** 
## typePUBLIC:enrollment2  0.004599   0.012048   0.382  0.70272    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.396 on 7028 degrees of freedom
## Multiple R-squared:  0.04036,    Adjusted R-squared:  0.03995 
## F-statistic: 98.53 on 3 and 7028 DF,  p-value: < 2.2e-16

Intercept: the predicted percentage of students exempt from providing vaccination records in private schools with 76 students ( the mean enrollment) . The estimated percent is 3.83%. This is significant at the P<.001 level ***.

Type Public: Conditional effect of type of school when enrollment is at the mean. Public vs private exempt at the mean enrollment. At the mean enrollment of 76 students, public schools have -0.65% less percentage of students exempt from providing vaccination records compared to private schools. This is not statistically significant.

Enrollment2: Conditional effect of enrollment for private schools. For every additional kindergartener in private schools starting from the mean enrollment of 76 students, the percentage of students exempt from providing vaccination records in private schools goes down by -0.034017%. This is stat. significant at the P.001 level **.

typepublic*enrollment2: slope of enrollment moderated by the type of school. Whether the school is public or private moderates the effect of enrollment on the percentage of students exempt from providing vaccination records. Here, this is the difference in percentage of students exempt from providing vaccination records between public and private schools. For public schools, the effect of a +1 enrollment starting from the mean enrollment of 76 students is 0.004599% more on exempt percentage compared to private schools. Since +1 enrollment decreases slope by only -0.034017% for private schools, a +1 enrollment decreases slope of public schools by (0.004599 - 0.034017) = -0.029418%. However, this is not statistically significant.

Graph of interaction

#create plot to visualize interaction
summary(cavax$enrollment2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -65.646 -34.646  -2.646   0.000  28.354 480.354
DATA1= expand.grid(type = c('PRIVATE','PUBLIC'),
                        enrollment2 = c(0:480) )
                   


DATA1$preds = predict(reg2, new=DATA1, type="response")


ggplot(DATA1, aes(x=enrollment2, y=preds, color=factor(type))) + 
  geom_point(size = 3) 

Log Models

Question 5: Let’s log transform (using the natural log) the variable enrollment and call the new variable log_enroll. [2pts] Estimate a model predicting exempt (exempt) by district type (type) and log of enrollment (log_enrollment). [ 2pts] Interpret the coefficient on the log of enrollment. [2 pts] Does it make more sense to use enrollment or the log of enrollment as the predictor variable? Why? [4 pts]

cavax = 
  cavax %>%
  mutate(lenrollment = log(enrollment))



logm1 = lm(exempt ~ type + lenrollment, data= cavax )
summary(logm1)
## 
## Call:
## lm(formula = exempt ~ type + lenrollment, data = cavax)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.417 -2.982 -1.461  0.867 85.934 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  14.1408     0.5410  26.137   <2e-16 ***
## typePUBLIC    0.5707     0.2878   1.983   0.0474 *  
## lenrollment  -2.7338     0.1589 -17.201   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.324 on 7029 degrees of freedom
## Multiple R-squared:  0.05888,    Adjusted R-squared:  0.05861 
## F-statistic: 219.9 on 2 and 7029 DF,  p-value: < 2.2e-16

A 1 % increase in total number of students enrolled decreases percentage of students exempt from vaccination by -2.73/100= -0.027338 percentage points while holding type of school constant. This is significant at the p<.001 level ***. Yes, this makes sense because total enrollment across all schools may be different, so a +1 increase in enrollment in a small school is different than a +1 increase in enrollment in a big school. Instead, we want to compare the effect of a 1% increase in enrollment in percentage of students exempt from vaccination records across all the schools regardless of enrollment size.

-2.73/100

Question 6: Create a binary variable to indicate high versus low exempt rates. For schools with exempt percentages equal to or greater than 33 percent, indicate them as “high”, for all other schools indicate them as “low”. [2 pts] Run a logistic regression predicting whether a school is high versus low, in other words, we want our model to predict schools falling into the high category. In your model use the predictors of school type (type) and enrollment (enrollment) (note: do not use log_enroll in this model). [2 pts]

Interpret the coefficients on type and enrollment in terms of both log odds and odds. [ 6 pts]

cavax = 
  cavax %>%
  mutate(exemptrate = case_when( exempt >= 33 ~ 'high',
                                 exempt < 33 ~ 'low'))


cavax = 
  cavax %>%
  mutate(exemptrate = ifelse(exemptrate == "high", 1, 0))



lg3 = glm(exemptrate ~ type + enrollment, family=binomial(link="logit"), data= cavax)
summary(lg3)
## 
## Call:
## glm(formula = exemptrate ~ type + enrollment, family = binomial(link = "logit"), 
##     data = cavax)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.3496  -0.2117  -0.1293  -0.0801   4.5249  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.569319   0.174010 -14.765  < 2e-16 ***
## typePUBLIC   0.115309   0.232708   0.496     0.62    
## enrollment  -0.031009   0.004197  -7.388 1.49e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1108.0  on 7031  degrees of freedom
## Residual deviance: 1005.3  on 7029  degrees of freedom
## AIC: 1011.3
## 
## Number of Fisher Scoring iterations: 8

logged odds interpretation:

typePublic: the logged odds of a public school being a high exempt school is 0.11 greater than private schools while holding total enrollment constant. This is not statistically significant. public schools are 1.12 times greater to be a high exempt school than private schools.

enrollment: As the number of enrollment goes up by 1 kindergartner, the logged odds of a school being a high exempt school goes down by -0.03. This is statistically significant at the p<.001 level. as total enrollment goes up by 1 kindergartner, the odds decrease by 3% for every additional kindergartner.

exp(lg3$coef)
## (Intercept)  typePUBLIC  enrollment 
##  0.07658769  1.12222035  0.96946696
(exemptrate ~ type + enrollment)
## exemptrate ~ type + enrollment
#first way
Ex1 = as.matrix(c(1,0,100))

P1= crossprod(Ex1,coef(lg3))

exp(P1)/(1+exp(P1))
##             [,1]
## [1,] 0.003435308
#2nd way
P2= -2.569319 + (-0.031009*100)
exp(P2)/(1+exp(P2))
## [1] 0.003435269

What is the probability of being a high exempt school if the school is private and has 100 students enrolled? [2 pts]

The probability of a private school being a high exempt school with 100 students enrolled is 0.003%

Exploratory Data Analysis II

QUESTION 7 [10 pts] a. Let’s begin by exploring the data. How many years are there in concealed_carry data? How many observations per state? [2 pts]

concealed_carry
## # A tibble: 1,173 × 6
##    stateid statename  year   vio   mur shall
##      <dbl> <chr>     <dbl> <dbl> <dbl> <dbl>
##  1       1 Alabama    1977  414. 14.2      0
##  2       1 Alabama    1978  419. 13.3      0
##  3       1 Alabama    1979  413. 13.2      0
##  4       1 Alabama    1980  448. 13.2      0
##  5       1 Alabama    1981  470. 11.9      0
##  6       1 Alabama    1982  448. 10.6      0
##  7       1 Alabama    1983  416   9.20     0
##  8       1 Alabama    1984  431.  9.40     0
##  9       1 Alabama    1985  458.  9.80     0
## 10       1 Alabama    1986  558  10.1      0
## # … with 1,163 more rows
table(concealed_carry$statename)
## 
##              Alabama               Alaska              Arizona 
##                   23                   23                   23 
##             Arkansas           California             Colorado 
##                   23                   23                   23 
##          Connecticut             Delaware District of Columbia 
##                   23                   23                   23 
##              Florida              Georgia               Hawaii 
##                   23                   23                   23 
##                Idaho             Illinois              Indiana 
##                   23                   23                   23 
##                 Iowa               Kansas             Kentucky 
##                   23                   23                   23 
##            Louisiana                Maine             Maryland 
##                   23                   23                   23 
##        Massachusetts             Michigan            Minnesota 
##                   23                   23                   23 
##          Mississippi             Missouri              Montana 
##                   23                   23                   23 
##             Nebraska               Nevada        New Hampshire 
##                   23                   23                   23 
##           New Jersey           New Mexico      New York     NY 
##                   23                   23                   23 
##       North Carolina         North Dakota                 Ohio 
##                   23                   23                   23 
##             Oklahoma               Oregon         Pennsylvania 
##                   23                   23                   23 
##         Rhode Island       South Carolina         South Dakota 
##                   23                   23                   23 
##            Tennessee                Texas                 Utah 
##                   23                   23                   23 
##              Vermont             Virginia           Washington 
##                   23                   23                   23 
##        West Virginia            Wisconsin              Wyoming 
##                   23                   23                   23

There are 23 years from 1977-1999. Excluding state id and year, there are 3 observations per state for 23 years starting from 1977-1999. These 3 observations are violent crime rate per 100k people, murder rate per 100k people, and whether the state had concealed carry that year ( shall=1 means yes, shall=0 means no).

  1. How many states had concealed carry laws (shall) in 1977 and how many had concealed carry laws in 1999? [ 4pts]
states1977  =
  concealed_carry %>%
  filter(year == "1977", shall == "1")

states1977 
## # A tibble: 4 × 6
##   stateid statename      year   vio   mur shall
##     <dbl> <chr>         <dbl> <dbl> <dbl> <dbl>
## 1      18 Indiana        1977  311.  7.40     1
## 2      33 New Hampshire  1977  113.  3.20     1
## 3      50 Vermont        1977  150.  1.40     1
## 4      53 Washington     1977  375.  4.30     1
states1999  =
  concealed_carry %>%
  filter(year == "1999", shall == "1")
  
  states1999
## # A tibble: 29 × 6
##    stateid statename  year   vio   mur shall
##      <dbl> <chr>     <dbl> <dbl> <dbl> <dbl>
##  1       2 Alaska     1999  632.  8.60     1
##  2       4 Arizona    1999  551.  8        1
##  3       5 Arkansas   1999  425.  5.60     1
##  4      12 Florida    1999  854   5.70     1
##  5      13 Georgia    1999  534   7.5      1
##  6      16 Idaho      1999  245.  2        1
##  7      18 Indiana    1999  375.  6.60     1
##  8      21 Kentucky   1999  301.  5.40     1
##  9      22 Louisiana  1999  733. 10.7      1
## 10      23 Maine      1999  112.  2.20     1
## # … with 19 more rows

Only 4 states had concealed carry in 1977, compared to 29 states in 1999.

  1. Create a plot tracking the violent crime rate (vio) over time for states that have adopted conceal carry laws (shall) and those that have never adopted the law. [4 pts]

Graph 1:

#x = year
#y = vio
#color = vif (1= yes, 0 = no)


#let's have 2 groups per year. VIO=1, VIO=0. Let's have the average of all VIO per year for each of these 2 groups.

concealed_carry$year <- as.double(concealed_carry$year)

concealed_carry %>%
  group_by (shall, year) %>%
  summarize(meanvio = mean(vio)) %>% 
  ggplot() + geom_line(mapping =
aes(x = year, y = meanvio,
color = as.character(shall)), size = 5,
alpha = 0.8,
) + 
  ylab( "Average Violent crime rate per 100,000 people") + 
  xlab( "year") + 
  labs(title = "Differences in VIO between states that have adopted and have not adopted \n conceal carry laws") +
    scale_color_manual(name= "Shall", 
                      labels= c("Have not adopted carry law", "Have adopted carry law"),
                      values = c("antiquewhite3", "cyan1")) 

States that have not adopted concealed carry laws are more likely to have higher violent crime rates compared to states that have adoped concealed carry laws.

concealed_carry %>%
  group_by (shall, year) %>%
  summarize(sumvio = sum(vio)) %>% 
  ggplot() + geom_line(mapping =
aes(x = year, y = sumvio,
color = as.character(shall)), size = 5,
alpha = 0.8,
) + 
  ylab( "Total Violent crime rate per 100,000 people  ") + 
  xlab( "year") + 
  labs(title = "Differences in VIO between states that have adopted and have not adopted \n conceal carry laws") +
    scale_color_manual(name= "Shall", 
                      labels= c("Have not adopted carry law", "Have adopted carry law"),
                      values = c("antiquewhite3", "cyan1"))

Pooled Cross Sectional Analysis

QUESTION 8 [10 pts] Convert the violent crime rate (vio) into a logged variable (using the natural log), call it log_vio. [2 pts] This will be our dependent variable.

Run a pooled regression of the data (i.e., standard OLS model as if this was cross-sectional data) predicting the log of violent crimes (log_vio) as a function of the presence of concealed carry laws (shall) and a set of dummy variables for year.

[2 pts] Interpret the effect of shall. [2 pts]

In general terms, what do the year dummy variables tell us about crime trends? [2 pts] In our current specification of the model, is the effect of shall the same for all years? Why or why not? [2 pts]

pooled cross sectional data without interaction

concealed_carry$log.vio = log(concealed_carry$vio)

reg3 = lm(log.vio ~ shall + factor(year), data = concealed_carry)
summary(reg3)
## 
## Call:
## lm(formula = log.vio ~ shall + factor(year), data = concealed_carry)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.14538 -0.43698  0.05355  0.40556  1.68492 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       5.86727    0.08372  70.086  < 2e-16 ***
## shall            -0.59812    0.04446 -13.452  < 2e-16 ***
## factor(year)1978  0.04838    0.11829   0.409 0.682589    
## factor(year)1979  0.14372    0.11829   1.215 0.224631    
## factor(year)1980  0.18781    0.11829   1.588 0.112634    
## factor(year)1981  0.17746    0.11829   1.500 0.133838    
## factor(year)1982  0.14767    0.11829   1.248 0.212166    
## factor(year)1983  0.08999    0.11829   0.761 0.446945    
## factor(year)1984  0.10077    0.11829   0.852 0.394470    
## factor(year)1985  0.12826    0.11829   1.084 0.278473    
## factor(year)1986  0.20558    0.11832   1.738 0.082559 .  
## factor(year)1987  0.19364    0.11834   1.636 0.102056    
## factor(year)1988  0.24561    0.11837   2.075 0.038211 *  
## factor(year)1989  0.28168    0.11837   2.380 0.017490 *  
## factor(year)1990  0.41694    0.11849   3.519 0.000451 ***
## factor(year)1991  0.48647    0.11868   4.099 4.44e-05 ***
## factor(year)1992  0.52834    0.11883   4.446 9.59e-06 ***
## factor(year)1993  0.53923    0.11883   4.538 6.28e-06 ***
## factor(year)1994  0.51493    0.11883   4.333 1.60e-05 ***
## factor(year)1995  0.54706    0.11921   4.589 4.94e-06 ***
## factor(year)1996  0.54408    0.11983   4.540 6.21e-06 ***
## factor(year)1997  0.55553    0.12028   4.619 4.30e-06 ***
## factor(year)1998  0.49975    0.12028   4.155 3.50e-05 ***
## factor(year)1999  0.44010    0.12028   3.659 0.000265 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5973 on 1149 degrees of freedom
## Multiple R-squared:  0.161,  Adjusted R-squared:  0.1442 
## F-statistic: 9.585 on 23 and 1149 DF,  p-value: < 2.2e-16

Shall coefficient: Compared to states without a concealed carry law, states with concealed carry law on average had 45% less violent crime while holding year constant. This is stat. significant at the p<.001 level.

100*(exp(-0.59812) - 1)
## [1] -45.01556

Currently, it is the same. This assumes that the effect of whether the state had a concealed carry law or not is the same across all years which is not true. This should significantly alter concealed carry law. So let’s run an interaction.

First, let’s test for heteroskedasticity.

bptest(reg3) # not a problem. Let's use same standard errors.
## 
##  studentized Breusch-Pagan test
## 
## data:  reg3
## BP = 31.942, df = 23, p-value = 0.1014

Let’s run an interaction now.

reg4 = lm(log.vio ~ shall + factor(year)*shall , data = concealed_carry)
summary(reg4)
## 
## Call:
## lm(formula = log.vio ~ shall + factor(year) * shall, data = concealed_carry)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.14834 -0.40496  0.06386  0.40265  1.68479 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             5.86037    0.08721  67.196  < 2e-16 ***
## shall                  -0.51019    0.31141  -1.638 0.101640    
## factor(year)1978        0.04667    0.12334   0.378 0.705238    
## factor(year)1979        0.14204    0.12334   1.152 0.249720    
## factor(year)1980        0.18140    0.12334   1.471 0.141638    
## factor(year)1981        0.18449    0.12334   1.496 0.134985    
## factor(year)1982        0.16776    0.12401   1.353 0.176377    
## factor(year)1983        0.10646    0.12401   0.858 0.390819    
## factor(year)1984        0.11160    0.12401   0.900 0.368339    
## factor(year)1985        0.13811    0.12401   1.114 0.265630    
## factor(year)1986        0.27650    0.12542   2.204 0.027691 *  
## factor(year)1987        0.26127    0.12617   2.071 0.038615 *  
## factor(year)1988        0.28130    0.12696   2.216 0.026911 *  
## factor(year)1989        0.31794    0.12696   2.504 0.012408 *  
## factor(year)1990        0.45381    0.12951   3.504 0.000476 ***
## factor(year)1991        0.53436    0.13243   4.035 5.82e-05 ***
## factor(year)1992        0.57196    0.13461   4.249 2.33e-05 ***
## factor(year)1993        0.58199    0.13461   4.323 1.67e-05 ***
## factor(year)1994        0.54888    0.13461   4.077 4.87e-05 ***
## factor(year)1995        0.52475    0.13972   3.756 0.000182 ***
## factor(year)1996        0.44850    0.14801   3.030 0.002499 ** 
## factor(year)1997        0.41117    0.15445   2.662 0.007876 ** 
## factor(year)1998        0.36360    0.15445   2.354 0.018739 *  
## factor(year)1999        0.28321    0.15445   1.834 0.066971 .  
## shall:factor(year)1978  0.02192    0.44041   0.050 0.960320    
## shall:factor(year)1979  0.02138    0.44041   0.049 0.961286    
## shall:factor(year)1980  0.08167    0.44041   0.185 0.852918    
## shall:factor(year)1981 -0.08967    0.44041  -0.204 0.838696    
## shall:factor(year)1982 -0.22255    0.41982  -0.530 0.596142    
## shall:factor(year)1983 -0.18549    0.41982  -0.442 0.658696    
## shall:factor(year)1984 -0.12807    0.41982  -0.305 0.760380    
## shall:factor(year)1985 -0.11806    0.41982  -0.281 0.778596    
## shall:factor(year)1986 -0.55433    0.39519  -1.403 0.160983    
## shall:factor(year)1987 -0.47511    0.38727  -1.227 0.220149    
## shall:factor(year)1988 -0.25106    0.38107  -0.659 0.510134    
## shall:factor(year)1989 -0.25434    0.38107  -0.667 0.504619    
## shall:factor(year)1990 -0.21531    0.36870  -0.584 0.559351    
## shall:factor(year)1991 -0.22730    0.36158  -0.629 0.529714    
## shall:factor(year)1992 -0.19813    0.35850  -0.553 0.580608    
## shall:factor(year)1993 -0.19552    0.35850  -0.545 0.585594    
## shall:factor(year)1994 -0.16910    0.35850  -0.472 0.637241    
## shall:factor(year)1995 -0.01699    0.35485  -0.048 0.961829    
## shall:factor(year)1996  0.11308    0.35359   0.320 0.749175    
## shall:factor(year)1997  0.17808    0.35434   0.503 0.615361    
## shall:factor(year)1998  0.16363    0.35434   0.462 0.644325    
## shall:factor(year)1999  0.20010    0.35434   0.565 0.572375    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5979 on 1127 degrees of freedom
## Multiple R-squared:  0.1755, Adjusted R-squared:  0.1425 
## F-statistic:  5.33 on 45 and 1127 DF,  p-value: < 2.2e-16
  1. The effect of shall in 1977 is no longer significant even though this was significant without the interaction model.

  2. The effect of conceal carry laws(shall) on violent crime rates declines over time ( because the % change in VIO goes up instead of going down). As more states adopt conceal carry laws, the percentage of violent crime rate goes up instead of going down each year starting from 1986 but this is not statistically significant. This can be because of other factors, not just the effect of shall. The effect was fairly good from 1978 to 1986 because VIO went down but again, this is not significant. Moreover, Shall and the interaction are not statistically significant.