PAF 573
YOUR NAME Elaine MacPherson
library(tidyverse)
library(stargazer)
library(jtools)
library(GGally)
# read in data
URL <- "https://raw.githubusercontent.com/spiromar/files/main/paf573/data-crime-levitt.csv"
crime <- read.csv( URL )
What was Detroit’s murder rate in 1991?
ANSWER: 59.34884
What was Omaha’s murder rate in 1991?
ANSWER: 10.32488
Did Portland have a mayoral election in 1976? How do you know?
ANSWER: yes
Is Mesa’s mayoral election held every two years or every four
years?
ANSWER: every two years
What is the average (mean) murder rate across all of the cities in
Levitt’s sample? What is the lowest murder rate recorded? The highest
rate?
ANSWER: average: 18.7602, lowest: .6496, highest:
80.6020
What is the average murder rate across all of the cities in the first year of Levitt’s study period (1970)?
ANSWER: 16.494
What is the average murder rate in the last year of Levitt’s study
period (1992)?
ANSWER: 22.605
What is Detroit’s murder rate in 1970 and 1992?
ANSWER: 1970: 32.69485, 1992: 56.98535
Reproduce the summary statistics presented in Table 1 of Levitt’s
paper. Focus only on recreating the results in the mean, across cities
standard deviation, minimum, and maximum columns. Note that some of your
figures will deviate slightly from those in Table 1. This is due to some
minor changes that Levitt made to his dataset. (Hint: Use stargazer
after selecting the appropriate columns)
ANSWER:
| Statistic | N | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|---|
| citypop | 1,353 | 718,042.500 | 1,039,385.000 | 85,000.000 | 7,896,000.000 |
| violent | 1,335 | 1,158.669 | 684.721 | 103.333 | 4,352.834 |
| property | 1,342 | 7,747.592 | 2,112.436 | 2,707.286 | 16,739.040 |
| sworn | 1,342 | 237.120 | 98.891 | 69.904 | 781.018 |
| x_black | 1,357 | 23.085 | 17.968 | 0.100 | 78.220 |
| x_femhea | 1,357 | 14.957 | 4.298 | 5.800 | 31.860 |
| x_a15_24 | 1,357 | 0.171 | 0.021 | 0.115 | 0.253 |
| x_welfare | 1,357 | 255.159 | 126.021 | 33.493 | 847.743 |
| x_education | 1,357 | 765.230 | 122.939 | 445.916 | 1,193.437 |
| x_unemp | 1,355 | 0.067 | 0.020 | 0.020 | 0.155 |
Tabulate the mayoral election year variable. What do the “0” and “1”
mean? Interpret the data. For what fraction of the observations did
cities hold a mayoral election?
ANSWER: 0 means it was not an election year, while 1
means it was an election year. The proportion is .2991894, so cities
held a mayoral election in roughly 30% of the observations.
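A minimal sketch of one way to get both the counts and the fraction for a 0/1 variable. The data frame `toy` and its values are made up for illustration and are not Levitt's data; only the column name `mayor_t` comes from the assignment.

```r
# Toy data (hypothetical values, not Levitt's data)
toy <- data.frame(mayor_t = c(0, 1, 0, 0, 1, 0, 0, 0, 1, 0))

# Counts of non-election (0) and election (1) observations
counts <- table(toy$mayor_t)
print(counts)

# Share of observations in each category
shares <- prop.table(counts)
print(shares)

# For a 0/1 variable, the mean equals the share of 1s
election_share <- mean(toy$mayor_t, na.rm = TRUE)
print(election_share)
```

Because the variable only takes the values 0 and 1, its mean and the "1" entry of the proportion table are the same number.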
Produce an analogous set of results and interpretations for the
gubernatorial election year variable, governor_t.
ANSWER: The mean for gubernatorial elections is
.2608696, or about 26% of the time.
Sometimes we want to tabulate the frequencies by more than one categorical variable …
# Re-executing code that was already done within the question:
# code here
crime %>% group_by(mayor_t,governor_t) %>% tally()
What is the output of this command telling you? Explain by describing the contents of one row in the table it produced.
ANSWER: This is telling us how many times the following scenarios happened:
• There were neither mayoral nor gubernatorial races (640 times)
• There were only gubernatorial races (311 times)
• There were only mayoral races (363 times)
• There were both mayoral and gubernatorial races (43 times)
In the first row, both of the factors were zero, indicating neither election.
When you are not using R code, you can just type your answer directly into the file like I have with this sentence.
We will focus first on the relationship between murder and the size
of the police force of a city …
Examine the output it produces. Does the relationship between the
murder rate in a city and the number of sworn officers make sense to
you?
ANSWER: I’m not sure why both the murder and sworn
distributions appear to be left-skewed while the actual regression
line in the lower-left panel appears to go in the other
direction. The correlation is .571, which makes sense based on the basic
shape of the regression line (there is some discernible relationship,
but not a very strong one). If we were to overlay the murder and sworn
graphs, they would match pretty closely. So there does appear to be a
relationship.
To investigate a little more, let’s split Levitt’s sample of city-years into two groups—cities with large police forces and cities with small police forces—and compare various crime rates across the two groups …
How many observations are categorized as “small”? How many as “large”? Does this make sense? Do you have the same number of NAs before and after?
ANSWER: There are 671 small and 671 large, which makes sense once the 15 NAs are taken into account. Splitting at the median puts half of the non-missing observations on each side, so an even split is expected. Multiplying 671 by 2 and adding 15 gives 1357, which is the total number of observations in the dataset.
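A minimal sketch of a median split, assuming the dummy is coded 1 for "small" and 0 for "large" (that coding is an assumption; the assignment's own code is not shown here). The data frame `toy` is made up for illustration.

```r
# Toy data (hypothetical values); one sworn value is missing
toy <- data.frame(sworn = c(100, 150, 200, 250, 300, 350, NA))

# Median of the non-missing values
sworn_median <- median(toy$sworn, na.rm = TRUE)

# 1 = below the median ("small"), 0 = at or above it ("large");
# a missing sworn value stays missing in the dummy
toy$sworn_small <- ifelse(toy$sworn < sworn_median, 1, 0)

# Tabulate, keeping the NA category visible
split_counts <- table(toy$sworn_small, useNA = "ifany")
print(split_counts)

# The number of NAs should be the same before and after the split
na_before <- sum(is.na(toy$sworn))
na_after  <- sum(is.na(toy$sworn_small))
```

Comparing `na_before` with `na_after` is a quick check that the split did not silently assign missing observations to one of the groups.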
Compare the murder distributions of the large and small police forces
graphically …
Eyeball the differences between the distributions, then get the exact
means of each distribution by piping the data to the
group_by and summarize functions like we did
in the in-class assignment. Which murder rate is larger and by how much?
Does this conform to your expectations about the effect of police on
crime? In your opinion, what might be driving the differential crime
rates?
ANSWER: I did not successfully run this code; I got stuck. All it said was “dplyr:::group_by.data.frame(.,sworn_small)”.
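The quoted text looks like a fragment of a call trace rather than the error itself, so its cause cannot be determined from it. One possibility, offered only as a guess, is that the `sworn_small` column did not exist in the data frame when the pipeline ran. A minimal sketch of the group means, using a made-up data frame `toy` and assuming `sworn_small` is 1 for small forces:

```r
library(dplyr)

# Toy data (hypothetical values, not Levitt's data)
toy <- data.frame(
  sworn_small = c(1, 1, 1, 0, 0, 0),
  murder      = c(10, 12, 14, 20, 22, 24)
)

# Mean murder rate within each police-force group
group_means <- toy %>%
  filter(!is.na(sworn_small)) %>%   # drop rows with no group assignment
  group_by(sworn_small) %>%
  summarize(mean_murder = mean(murder, na.rm = TRUE))

print(group_means)
```

The difference between the two rows of `group_means` is the quantity the question asks about.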
But hold on. Perhaps the difference we are seeing is just due to
chance. We can run a regression to check and see how likely it would be
to observe a sample difference of this size even if there really were no
difference between the means of the underlying populations (the null
hypothesis). Run a regression to estimate the difference in mean murder
rates between cities with large and small police forces. (Save the
output of lm as an object called m1). What is
the difference in mean murder rates? Is it statistically significant
from zero? How can you tell?
ANSWER: The difference is very small: .069 in the first output and .07 in the second (the same estimate, rounded). Both have a p-value of < 2e-16, which is low enough that we can reject the null hypothesis (see below).
summary(model)

Call:
lm(formula = murder ~ sworn, data = crime)

Residuals:
| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -21.257 | -6.632 | -2.388 | 5.788 | 49.124 |

Coefficients:
| | Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|---|
| (Intercept) | 2.209985 | 0.702167 | 3.147 | 0.00168 | ** |
| sworn | 0.069277 | 0.002731 | 25.370 | < 2e-16 | *** |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.883 on 1330 degrees of freedom
(25 observations deleted due to missingness)
Multiple R-squared: 0.3261, Adjusted R-squared: 0.3256
F-statistic: 643.6 on 1 and 1330 DF, p-value: < 2.2e-16
m1 <- lm(murder ~ sworn, data = crime)
summ(m1)

MODEL INFO:
Observations: 1332 (25 missing obs. deleted)
Dependent Variable: murder
Type: OLS linear regression

MODEL FIT:
F(1,1330) = 643.63, p = 0.00
R² = 0.33
Adj. R² = 0.33

| | Est. | S.E. | t val. | p |
|---|---|---|---|---|
| (Intercept) | 2.21 | 0.70 | 3.15 | 0.00 |
| sworn | 0.07 | 0.00 | 25.37 | 0.00 |
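Note that the output above regresses murder on the continuous `sworn` variable, so the 0.069 estimate is a slope (the change in the murder rate per additional unit of `sworn`) rather than a difference between two group means. To estimate the difference in means, the regressor would be the large/small dummy. A minimal sketch, using a made-up data frame `toy`, a hypothetical model name `m_dummy`, and assuming `sworn_small` is coded 1 for small forces:

```r
# Toy data (hypothetical values, not Levitt's data)
toy <- data.frame(
  sworn_small = c(1, 1, 1, 0, 0, 0),
  murder      = c(10, 12, 14, 20, 22, 24)
)

# Regress the murder rate on the dummy
m_dummy <- lm(murder ~ sworn_small, data = toy)

# The dummy's coefficient is the difference in group means (small minus large);
# its p-value tests the null hypothesis that the difference is zero
coef_table <- summary(m_dummy)$coefficients
diff_in_means <- coef_table["sworn_small", "Estimate"]
p_value       <- coef_table["sworn_small", "Pr(>|t|)"]

print(coef_table)
```

In this setup the intercept is the mean for the group coded 0, and the intercept plus the dummy's coefficient is the mean for the group coded 1.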
Use the regression output to estimate the mean murder rate conditional on being a city with a large police force. Do this by hand.
ANSWER: I don’t understand which values to insert for n (is it 1357?), Xi (doesn’t this just refer to sworn_small, how do I enter this numerically?) and yi (again, isn’t this just murder?). Admittedly, I need to refresh my statistics knowledge but I am of course doing this assignment at 7:13 pm and did not give myself sufficient time to do so.
A reliable and repeatable way to do this that will come in handy when
you have more complicated regressions is to use the information
contained in the m1 object you saved off. You can access
the coefficients using the coef function and then use the
results as any other variable.
Use that approach to calculate the mean murder rate for cities with small police forces.
ANSWER: This did not work for me, I got NA for both
(see below). I replaced the 0 with a 1 (because that dummy variable
would make it for small cities, right?)
> # view all coefficients
> coef(m1)
(Intercept) sworn
2.2099851 0.0692766
>
> # view specific coefficients
> coef(m1)["sworn_small"]
> # calculate conditional mean by plugging in values to
> # the regression equation
> coef(m1)["(Intercept)"] + coef(m1)["sworn_small"]*(0)
(Intercept)
NA
> # calculate conditional mean by plugging in values to
> # the regression equation
> coef(m1)["(Intercept)"] + coef(m1)["sworn_small"]*(1)
(Intercept)
NA
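The NA most likely arises because `m1` was fit on `sworn`, so it has no coefficient named `sworn_small`; looking up a name that is not in the coefficient vector returns NA, and NA plus anything is NA. A minimal sketch of that behavior and of the `coef` approach on a model that does include the dummy. The data frame `toy` and the model names `m_cont` and `m_dummy` are hypothetical, and `sworn_small` is assumed to be 1 for small forces:

```r
# Toy data (hypothetical values, not Levitt's data)
toy <- data.frame(
  sworn       = c(100, 150, 200, 250, 300, 350),
  sworn_small = c(1, 1, 1, 0, 0, 0),
  murder      = c(10, 12, 14, 20, 22, 24)
)

# A model fit on sworn has no coefficient called "sworn_small"
m_cont <- lm(murder ~ sworn, data = toy)
missing_coef <- coef(m_cont)["sworn_small"]   # NA: no such term in this model

# A model fit on the dummy does
m_dummy <- lm(murder ~ sworn_small, data = toy)
b0 <- coef(m_dummy)["(Intercept)"]
b1 <- coef(m_dummy)["sworn_small"]

# Conditional means: plug 0 (large) or 1 (small) into the regression equation
cm_large <- unname(b0 + b1 * 0)
cm_small <- unname(b0 + b1 * 1)

print(c(large = cm_large, small = cm_small))
```

Checking `names(coef(m1))` before indexing is a quick way to confirm which terms a saved model actually contains.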
The number of sworn police officers in the first column of the table
below correspond to the 25th, 50th, and 75th percentiles of the
sworn distribution …
ANSWER: conditional mean is .07
| Number of Sworn Police Officers | Conditional Mean of Murder Rate (1970) |
|---|---|
| 147.96 (25th percentile) | ? |
| 188.25 (50th percentile) | ? |
| 249.69 (75th percentile) | ? |
Your goal in this problem is to use regression to fill out the table above. Get started by saving off a temporary dataset that includes only the 1970 observations that you will use in your regression:
workingData <- crime %>% filter(year==70)
model <- lm(murder ~ sworn, data = crime)
# fit the regression on the 1970 subset saved above
m2 <- lm(murder ~ sworn, data = workingData)
summ(m2)
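The 0.07 reported earlier is the slope coefficient, not a conditional mean. A conditional mean at a given value of `sworn` is the intercept plus the slope times that value. A minimal sketch of filling such a table with `predict`, using a made-up data frame `toy` (values chosen so the fitted line is exact) and hypothetical names `working`, `m_1970`, `pctiles`, and `cond_means`:

```r
# Toy data (hypothetical values, not Levitt's data); year is coded 70, 71, ...
toy <- data.frame(
  year   = c(70, 70, 70, 70, 71, 71),
  sworn  = c(100, 200, 300, 400, 150, 250),
  murder = c(5, 10, 15, 20, 30, 40)
)

# Keep only the 1970 observations and fit the regression on that subset
working <- toy[toy$year == 70, ]
m_1970  <- lm(murder ~ sworn, data = working)

# 25th, 50th, and 75th percentiles of sworn in the 1970 subset
pctiles <- quantile(working$sworn, probs = c(0.25, 0.50, 0.75), na.rm = TRUE)

# Conditional means: intercept + slope * sworn, evaluated at each percentile
cond_means <- predict(m_1970, newdata = data.frame(sworn = unname(pctiles)))

print(data.frame(sworn = unname(pctiles), cond_mean = unname(cond_means)))
```

With the real data, the same pattern would use the 147.96, 188.25, and 249.69 values from the table in place of `pctiles`.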
The goal of this question is to visualize how well our regression
line does at predicting means conditional on particular values of
sworn…
Plot the conditional mean values for the 25th, 50th, and 75th
percentiles of sworn on your graph as red squares [i.e.,
plot the points (sworn.p25, sworn.p25.cm), (sworn.p50, sworn.p50.cm), and
(sworn.p75, sworn.p75.cm) on the graph]. Where do those points fall
with respect to the regression line?
ANSWER: I didn’t do this right. This is the error
message I got: Error in e1 %+% e2: ! Cannot add +.gg(…) 2. e1 %+% e2
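The error text is truncated, so its cause cannot be pinned down. Errors raised by ggplot2's `+` commonly occur when a line begins with `+` instead of ending with it, or when a layer is added to something that is not a ggplot object; that is offered as a possibility, not a diagnosis. A minimal sketch of overlaying red squares on a scatterplot with a fitted line, using made-up data frames `toy` and `pts` (the conditional-mean values in `pts` are hypothetical):

```r
library(ggplot2)

# Toy data (hypothetical values, not Levitt's data)
toy <- data.frame(
  sworn  = c(100, 200, 300, 400),
  murder = c(6, 9, 16, 19)
)

# Hypothetical conditional means at three sworn values
pts <- data.frame(
  sworn = c(175, 250, 325),
  cm    = c(8, 12, 17)
)

# Each line ends with `+` so that R keeps reading the same expression
p <- ggplot(toy, aes(x = sworn, y = murder)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(data = pts, aes(x = sworn, y = cm),
             color = "red", shape = 15, size = 3)   # shape 15 = filled square

print(p)
```

The second `geom_point` layer takes its own `data` argument, which is how points from a separate table can be drawn on top of the original plot.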
Use regression to complete the following table of conditional means based on those percentiles (i.e., each cell contains the mean murder rate for that cell). Hint: You will need to add a variable to the regression.
sworn.p25.cm <- crime %>%
  filter(sworn > (sworn.p25 - 5) & sworn < (sworn.p25 + 5)) %>%
  pull(murder) %>%
  mean(na.rm = TRUE)
sworn.p50.cm <- crime %>%
  filter(sworn > (sworn.p50 - 5) & sworn < (sworn.p50 + 5)) %>%
  pull(murder) %>%
  mean(na.rm = TRUE)
sworn.p75.cm <- crime %>%
  filter(sworn > (sworn.p75 - 5) & sworn < (sworn.p75 + 5)) %>%
  pull(murder) %>%
  mean(na.rm = TRUE)
ANSWER: Ooof. I really don’t know. Would love to have someone walk me through.
Expected Value of Murder Rate (1970), by Number of Sworn Officers and Percent Female Headed Household
| Nbr Sworn \ % Fem Head HH | 25th (12.4) | 50th (14.1) | 75th (17.5) |
|---|---|---|---|
| 25th (147.96) | ? | ? | ? |
| 50th (188.25) | ? | ? | ? |
| 75th (249.69) | ? | ? | ? |
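A minimal sketch of one way to fill a table like this: add the second variable to the regression, then predict at every combination of the two sets of percentiles. The data frame `toy` is made up (its values are chosen so the fit is exact), the names `m_two`, `grid`, and `cm_table` are hypothetical, and the numbers it prints are not the answers for Levitt's data. Only the column names and the percentile values come from the assignment.

```r
# Toy data (hypothetical values, not Levitt's data)
toy <- data.frame(
  sworn    = c(100, 200, 300, 400, 100, 200, 300, 400),
  x_femhea = c(10, 10, 10, 10, 20, 20, 20, 20),
  murder   = c(7, 12, 17, 22, 12, 17, 22, 27)
)

# Regression with both explanatory variables
m_two <- lm(murder ~ sworn + x_femhea, data = toy)

# Every combination of the percentile values from the table
sworn_vals  <- c(147.96, 188.25, 249.69)
femhea_vals <- c(12.4, 14.1, 17.5)
grid <- expand.grid(sworn = sworn_vals, x_femhea = femhea_vals)

# Predicted (conditional mean) murder rate for each combination
grid$cm <- predict(m_two, newdata = grid)

# expand.grid varies sworn fastest, so filling column-wise gives
# rows = sworn percentiles, columns = x_femhea percentiles
cm_table <- matrix(
  grid$cm, nrow = 3,
  dimnames = list(sworn = sworn_vals, x_femhea = femhea_vals)
)

print(round(cm_table, 2))
```

Each cell is the intercept plus the `sworn` coefficient times the row value plus the `x_femhea` coefficient times the column value.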
# code here
# code here