PAF 573
Elaine MacPherson
Run a regression that estimates the difference in the mean number of vandalism calls in 2018 between the lots that were greened and those that were not.
# put regression here:
##                       Est.    S.E.   t val.      p
##----------------- --------- ------- -------- ------
##(Intercept)         1434.09   15.90    90.19   0.00
##greened               97.77   22.50     4.35   0.00
##y18                  -78.89   31.80    -2.48   0.01
##greened:y18         -162.44   45.00    -3.61   0.00
##---------------------------------------------------
What are some reasons why the difference you estimated might be biased upward? What are some reasons that it might be biased downward?
ANSWER: The estimated difference (-162.44) could be biased downward (too negative) if something other than greening also lowered vandalism calls around the greened lots, because that drop would be credited to greening. Greening itself should have a direct effect of fewer vandalism calls. However, the greened and non-greened areas may differ in ways that are related to vandalism calls but are not easily measured, such as a new marketing campaign that encourages reporting vandalism, or activism around vandalism tied to other sociopolitical events. If factors like these raised calls around the greened lots, the difference would be biased upward instead.
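To see the direction of the bias concretely, here is a minimal sketch of the 2018-only comparison. It assumes the rounded 2018 group means from the hand calculation later in this assignment (1355 for lots that were not greened, 1291 for greened lots). The two-row data frame `means_2018` is a hypothetical stand-in for the 2018 rows of `lots`, so only the coefficient is meaningful, not the standard error.

```r
# Hypothetical stand-in for the 2018 rows of `lots`: one row per group,
# holding the rounded 2018 group means (an assumption, not the raw data).
means_2018 <- data.frame(
  greened = c(0, 1),
  vcalls  = c(1355, 1291)
)

# Regressing on the greened dummy returns the simple difference in 2018 means.
# With the real data this would be something like:
#   lm(vcalls ~ greened, data = subset(lots, year == 2018))
naive_fit  <- lm(vcalls ~ greened, data = means_2018)
naive_diff <- unname(coef(naive_fit)["greened"])
round(naive_diff, 2)  # -64
```

This simple gap of about -64 is much smaller in magnitude than the -162.44 interaction estimate. That is what would be expected if the greened lots started out with more calls (the `greened` coefficient in the output is 97.77), which pushes the 2018-only comparison upward.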
Create a plot of the mean vandalism calls vs year, breaking out the means by whether or not the lot was in the greened group:
library(ggplot2)

# Mean vandalism calls per year, one line per group (greened vs. not greened)
ggplot(lots, aes(x = year, y = vcalls, group = greened, color = as.factor(greened))) +
  stat_summary(fun = "mean", geom = "point") +
  stat_summary(fun = "mean", geom = "line")
What does this plot imply about which lots were selected for greening? What does it tell you about whether or not the greening had an effect on vandalism?
ANSWER: The plot implies that the lots selected for greening were the ones that had more vandalism calls to begin with. The teal line represents the lots chosen for greening, and each of its points sits about 100 calls above the corresponding point for the lots that were not greened. The teal line also drops off abruptly between 2017 and 2018, indicating that something happened during this year that had a large effect on the greened lots, which is consistent with the greening reducing vandalism.
Estimate the effect of greening on vandalism by calculating a difference in difference estimate by hand. Use the year right before (2017) and the year right after the greening (2018).
ANSWER:
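OK, so I'm not sure I did this right, because why are these numbers so big? The math based on the group means that were generated is:
(greened 2018 - greened 2017) - (not greened 2018 - not greened 2017)
(1291 - 1490) - (1355 - 1391) = -199 - (-36) = -163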
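The same arithmetic can be written as a short R sketch. The four numbers are the rounded group means used above; the variable names are labels made up for this illustration.

```r
# Rounded group means from the hand calculation above.
treat_pre  <- 1490  # greened lots, 2017
treat_post <- 1291  # greened lots, 2018
ctrl_pre   <- 1391  # lots not greened, 2017
ctrl_post  <- 1355  # lots not greened, 2018

# Change within each group from 2017 to 2018.
treat_change <- treat_post - treat_pre  # -199
ctrl_change  <- ctrl_post - ctrl_pre    # -36

# Difference-in-differences: the greened change minus the comparison change.
did <- treat_change - ctrl_change
did  # -163
```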
Write down an equation for an interaction model that will give you the same information as in Q2.2.
ANSWER:
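The basic idea of an interaction model is to replicate the calculation (TPost - CPost) - (TPre - CPre), which can be expressed as: vcalls = B0 + B1(greened) + B2(y18) + B3(greened × y18), where y18 is a dummy equal to 1 in 2018 and B3 is the difference-in-differences estimate. The last term has to be the product of greened and y18, not their sum.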
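As a rough check that the interaction coefficient carries the difference-in-differences, the model can be fit to a toy data set. This is only a sketch: `toy` is a hypothetical four-row stand-in for `lots` that holds the rounded 2017 and 2018 group means from the hand calculation, so the fit is exact and the standard errors are not meaningful. The toy intercept will not match the intercept in the reported output, which was estimated on the full data.

```r
# Hypothetical stand-in for `lots`: one row per group and period, holding
# the rounded group means from the hand calculation (not the raw data).
toy <- data.frame(
  greened = c(0, 0, 1, 1),
  y18     = c(0, 1, 0, 1),
  vcalls  = c(1391, 1355, 1490, 1291)
)

# In an R formula, greened * y18 expands to greened + y18 + greened:y18.
fit <- lm(vcalls ~ greened * y18, data = toy)
b <- round(coef(fit), 2)
b["greened:y18"]  # -163, the same number as the hand calculation
```

With the real data the call would presumably be `lm(vcalls ~ greened * y18, data = lots)`, assuming `y18` already exists as a column.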
Estimate the model you wrote down in Q3.2. What is the difference-in-differences estimate of the effect of greening on vandalism? Is it statistically significant?
ANSWER: Wait, so the last line of this actually matches the math I did above. Does that mean I did it correctly? Here it's -162.44, but what I calculated above was -163. It is statistically significant: the p-value is very small (it rounds to 0.00, with t = -3.61).
put regression here:
##                       Est.    S.E.   t val.      p
##----------------- --------- ------- -------- ------
##(Intercept)         1434.09   15.90    90.19   0.00
##greened               97.77   22.50     4.35   0.00
##y18                  -78.89   31.80    -2.48   0.01
##greened:y18         -162.44   45.00    -3.61   0.00
Interpret the other coefficients of the model, including the intercept.
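ANSWER: The intercept (1434.09) is the average number of vandalism calls for lots that were not greened, before 2018 (greened = 0, y18 = 0). The greened coefficient (97.77) means the greened lots averaged about 98 more calls than the other lots before the greening, so the two groups were already different at baseline. The y18 coefficient (-78.89) means calls in the lots that were not greened fell by about 79 in 2018, which is the change we would expect even without greening. The interaction (-162.44) is the extra drop in the greened lots on top of that, which is the difference-in-differences estimate. Greening does not change from year to year for a given lot, which is why its effect only shows up through the interaction with y18.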
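One way to read the coefficients is to add them up into the four group-by-period means they imply. The sketch below copies the estimates from the regression output above; the cell names are labels made up for this illustration.

```r
# Coefficients copied from the regression output above.
b0 <- 1434.09   # (Intercept)
b1 <- 97.77     # greened
b2 <- -78.89    # y18
b3 <- -162.44   # greened:y18

# Predicted mean calls in each group-by-period cell.
cells <- round(c(
  control_pre  = b0,                 # not greened, y18 = 0
  control_post = b0 + b2,            # not greened, y18 = 1
  greened_pre  = b0 + b1,            # greened, y18 = 0
  greened_post = b0 + b1 + b2 + b3   # greened, y18 = 1
), 2)
cells
# control_pre 1434.09, control_post 1355.20,
# greened_pre 1531.86, greened_post 1290.53
```

The two 2018 cells line up with the 1355 and 1291 used in the hand calculation. The pre-period cells need not equal the 2017 means if y18 = 0 covers more than one earlier year, which is an assumption about how the data were coded.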