HW1 Answers

PAF 573

YOUR NAME: Elaine MacPherson

load libraries and read in data

library(tidyverse)
library(stargazer)
library(jtools)
library(GGally)


# read in data
URL <- "https://raw.githubusercontent.com/spiromar/files/main/paf573/data-crime-levitt.csv"
crime <- read.csv( URL )

Problem 1

1a

What was Detroit’s murder rate in 1991?
ANSWER: 59.34884

1b

What was Omaha’s murder rate in 1991?
ANSWER: 10.32488

1c

Did Portland have a mayoral election in 1976? How do you know?
ANSWER: yes

1d

Is Mesa’s mayoral election held every two years or every four years?
ANSWER: every two years

1e

What is the average (mean) murder rate across all of the cities in Levitt’s sample? What is the lowest murder rate recorded? The highest rate?
ANSWER: average: 18.7602, lowest: .6496, highest: 80.6020

1f

What is the average murder rate across all of the cities in the first year of Levitt’s study period (1970)?
ANSWER: 16.494

1g

What is the average murder rate in the last year of Levitt’s study period (1992)?
ANSWER: 22.605
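A minimal sketch of how figures like those in 1e through 1g can be computed. It uses a small made-up data frame (`crime_toy`, invented values) instead of the real file, and assumes the columns are named `murder` and `year` with two-digit years, as in the code used elsewhere in this assignment.

```r
# Toy stand-in for the crime data; the values are invented for illustration only.
crime_toy <- data.frame(
  year   = c(70, 70, 70, 92, 92, 92),
  murder = c(10, 20, NA, 15, 25, 35)
)

# Overall mean, minimum, and maximum, ignoring missing values
overall <- c(
  mean = mean(crime_toy$murder, na.rm = TRUE),
  min  = min(crime_toy$murder, na.rm = TRUE),
  max  = max(crime_toy$murder, na.rm = TRUE)
)

# Mean murder rate within each year
by_year <- tapply(crime_toy$murder, crime_toy$year, mean, na.rm = TRUE)

print(overall)
print(by_year)
```

On the real data, the same two calls would be applied to `crime` rather than `crime_toy`.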

1h

What is Detroit’s murder rate in 1970 and 1992?
ANSWER: 1970: 32.69485, 1992: 56.98535

1i

Reproduce the summary statistics presented in Table 1 of Levitt’s paper. Focus only on recreating the results in the mean, across cities standard deviation, minimum, and maximum columns. Note that some of your figures will deviate slightly from those in Table 1. This is due to some minor changes that Levitt made to his dataset. (Hint: Use stargazer after selecting the appropriate columns)
ANSWER:
=========================================================================
Statistic       N        Mean       St. Dev.         Min            Max
-------------------------------------------------------------------------
citypop       1,353  718,042.500  1,039,385.000  85,000.000  7,896,000.000
violent       1,335    1,158.669        684.721     103.333      4,352.834
property      1,342    7,747.592      2,112.436   2,707.286     16,739.040
sworn         1,342      237.120         98.891      69.904        781.018
x_black       1,357       23.085         17.968       0.100         78.220
x_femhea      1,357       14.957          4.298       5.800         31.860
x_a15_24      1,357        0.171          0.021       0.115          0.253
x_welfare     1,357      255.159        126.021      33.493        847.743
x_education   1,357      765.230        122.939     445.916      1,193.437
x_unemp       1,355        0.067          0.020       0.020          0.155
-------------------------------------------------------------------------
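A sketch of the kind of call that produces a table like the one above, assuming the stargazer package is installed. The data frame `crime_toy` and its values are invented; only the column names come from the table. The `summary.stat` argument restricts the output to the five columns the question asks for.

```r
library(stargazer)

# Toy stand-in for the crime data; the values are invented for illustration only.
crime_toy <- data.frame(
  citypop = c(85000, 400000, 7896000, NA),
  sworn   = c(70, 190, 250, 780),
  x_unemp = c(0.02, 0.06, 0.07, 0.15),
  year    = c(70, 71, 72, 73)
)

# Keep only the columns that belong in the table, then summarize them
vars <- c("citypop", "sworn", "x_unemp")
stargazer(crime_toy[, vars],
          type = "text",
          summary.stat = c("n", "mean", "sd", "min", "max"))
```

stargazer expects a plain data frame, which is what `read.csv` returns, so no conversion should be needed on the real data.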

Problem 2

2a

Tabulate the mayoral election year variable. What do the “0” and “1” mean? Interpret the data. For what fraction of the observations did cities hold a mayoral election?
ANSWER: 0 means it was not an election year, while 1 means it was an election year. The proportion is .2991894, so cities held a mayoral election in roughly 30% of the observations.

2b

Produce an analogous set of results and interpretations for the gubernatorial election year variable, governor_t.
ANSWER: The mean for gubernatorial elections is .2608696, or about 26% of the time.

2c

Sometimes we want to tabulate the frequencies by more than one categorical variable …

# Re-executing code that was already done within the question: 
# code here

crime %>% group_by(mayor_t,governor_t) %>% tally()

What is the output of this command telling you? Explain by describing the contents of one row in the table it produced.

ANSWER: This is telling us how many times the following scenarios happened:
• There were neither mayoral nor gubernatorial races (640 times)
• There were only gubernatorial races (311 times)
• There were only mayoral races (363 times)
• There were both mayoral and gubernatorial races (43 times)
In the first row, both of the factors were zero, indicating neither election.
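As a cross-check, the four counts in the answer above can be recombined to recover the election-year shares reported in 2a and 2b. This sketch only uses the numbers already given; the object names are made up for the example.

```r
# Cell counts copied from the tally in the answer above
counts <- data.frame(
  mayor_t    = c(0, 0, 1, 1),
  governor_t = c(0, 1, 0, 1),
  n          = c(640, 311, 363, 43)
)

# Total number of city-year observations
total <- sum(counts$n)

# Share of observations with a mayoral / gubernatorial election
share_mayor    <- sum(counts$n[counts$mayor_t == 1]) / total
share_governor <- sum(counts$n[counts$governor_t == 1]) / total

# The same counts laid out as a 2 x 2 cross-tabulation
print(xtabs(n ~ mayor_t + governor_t, data = counts))

print(c(total = total, mayor = share_mayor, governor = share_governor))
```

The marginal shares agree with the means of `mayor_t` and `governor_t` because the mean of a 0/1 variable is the fraction of ones.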

When you are not using R code, you can just type your answer directly into the file like I have with this sentence.

Problem 3

3a

We will focus first on the relationship between murder and the size of the police force of a city …

Examine the output it produces. Does the relationship between the murder rate in a city and the number of sworn officers make sense to you?
ANSWER: I’m not sure why both the murder and sworn distributions appear to be left-skewed while the actual regression line in the lower-left quadrant appears to be going in the other direction. The correlation is .571, which makes sense based on the basic shape of the regression line (there is some discernible relationship, but not a super strong one). If we were to overlay the murder and sworn graphs, they would match pretty closely. So there does appear to be a relationship.

3b

To investigate a little more, let’s split Levitt’s sample of city-years into two groups—cities with large police forces and cities with small police forces—and compare various crime rates across the two groups …

How many observations are categorized as “small”? How many as “large”? Does this make sense? Do you have the same number of NA’s before and after?

ANSWER: There are 671 small and 671 large, which makes sense once you account for the 15 “NA” values. Splitting at the median puts half of the non-missing observations on each side, so an even split is what we would expect. Multiplying 671 by 2 and adding 15 gives 1357, which is the total number of observations in the dataset.
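A sketch of a median split and the NA check the question asks about. The question's own code is not shown here, so the rule used below (strictly below the median counts as small) is an assumption, and the `sworn` values are invented.

```r
# Toy stand-in for the sworn column; the values are invented for illustration only.
sworn <- c(100, 150, 200, 250, 300, 350, NA, NA)

# Median of the non-missing values
cutoff <- median(sworn, na.rm = TRUE)

# 1 = below-median force, 0 = at or above the median; missing values stay missing
sworn_small <- ifelse(sworn < cutoff, 1, 0)

# Counts per group, including the NA group
print(table(sworn_small, useNA = "ifany"))

# The number of NA values should be the same before and after the split
print(c(na_before = sum(is.na(sworn)), na_after = sum(is.na(sworn_small))))
```

Because `ifelse` returns NA wherever its test is NA, the dummy inherits exactly the missing values of `sworn`.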

3c

Compare the murder distributions of the large and small police forces graphically …

Eyeball the differences between the distributions, then get the exact means of each distribution by piping the data to the group_by and summarize functions like we did in the in-class assignment. Which murder rate is larger and by how much? Does this conform to your expectations about the effect of police on crime? In your opinion, what might be driving the differential crime rates?

ANSWER: I did not successfully run this code; I got stuck. All it said was, “dplyr:::group_by.data.frame(.,sworn_small)”
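For reference, a sketch of the group_by and summarize step the question describes, assuming the dplyr package is installed. The message quoted above is incomplete, so this does not diagnose it; it only assumes that a `sworn_small` column already exists in the data frame. The data are invented.

```r
library(dplyr)

# Toy stand-in for the crime data; the values are invented for illustration only.
crime_toy <- data.frame(
  sworn_small = c(1, 1, 1, 0, 0, 0, NA),
  murder      = c(8, 10, 12, 20, 25, NA, 30)
)

# Mean murder rate and row count within each police-force group
group_means <- crime_toy %>%
  filter(!is.na(sworn_small)) %>%        # drop rows with no group assignment
  group_by(sworn_small) %>%
  summarize(mean_murder = mean(murder, na.rm = TRUE), n = n())

print(group_means)
```

The difference between the two rows of `mean_murder` is the quantity the question asks you to compare.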

3d

But hold on. Perhaps the difference we are seeing is just due to chance. We can run a regression to check and see how likely it would be to observe a sample difference of this size even if there really were no difference between the means of the underlying populations (the null hypothesis). Run a regression to estimate the difference in mean murder rates between cities with large and small police forces. (Save the output of lm as an object called m1). What is the difference in mean murder rates? Is it statistically significant from zero? How can you tell?

ANSWER: The difference is very small: .069 in the first output and .07 in the second. Both have a p-value of < 2e-16, which is so low that we can reject the null hypothesis. (see below)

summary(model)

Call:
lm(formula = murder ~ sworn, data = crime)

Residuals:
    Min      1Q  Median      3Q     Max
-21.257  -6.632  -2.388   5.788  49.124

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.209985   0.702167   3.147  0.00168 **
sworn       0.069277   0.002731  25.370  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.883 on 1330 degrees of freedom
  (25 observations deleted due to missingness)
Multiple R-squared:  0.3261, Adjusted R-squared:  0.3256
F-statistic: 643.6 on 1 and 1330 DF,  p-value: < 2.2e-16

m1 <- lm(murder ~ sworn, data = crime)
summ(m1)

MODEL INFO:
Observations: 1332 (25 missing obs. deleted)
Dependent Variable: murder
Type: OLS linear regression

MODEL FIT:
F(1,1330) = 643.63, p = 0.00
R² = 0.33
Adj. R² = 0.33

Standard errors: OLS

             Est.  S.E.  t val.     p
(Intercept)  2.21  0.70    3.15  0.00
sworn        0.07  0.00   25.37  0.00
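The output above regresses murder on the continuous `sworn` variable, whereas the question asks about the difference between the large and small groups. A sketch of that comparison, on invented data, regresses murder on the `sworn_small` dummy; the object names `crime_toy` and `m1_toy` are made up for the example.

```r
# Toy stand-in for the crime data; the values are invented for illustration only.
crime_toy <- data.frame(
  sworn_small = c(1, 1, 1, 0, 0, 0),
  murder      = c(8, 10, 12, 20, 25, 30)
)

# Regress the murder rate on the small-force dummy
m1_toy <- lm(murder ~ sworn_small, data = crime_toy)

# Group means computed directly, for comparison
mean_large <- mean(crime_toy$murder[crime_toy$sworn_small == 0])
mean_small <- mean(crime_toy$murder[crime_toy$sworn_small == 1])

# Intercept = mean for large forces (dummy = 0)
# Slope     = mean for small forces minus mean for large forces
print(coef(m1_toy))
print(c(mean_large = mean_large, difference = mean_small - mean_large))

# p-value for the test that the difference in means is zero
print(summary(m1_toy)$coefficients["sworn_small", "Pr(>|t|)"])
```

With a single 0/1 regressor, the slope is exactly the difference in group means, and its p-value is what tells you whether that difference is statistically distinguishable from zero.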

3e

Use the regression output to estimate the mean murder rate conditional on being a city with a large police force. Do this by hand.

ANSWER: I don’t understand which values to insert for n (is it 1357?), Xi (doesn’t this just refer to sworn_small, how do I enter this numerically?) and yi (again, isn’t this just murder?). Admittedly, I need to refresh my statistics knowledge but I am of course doing this assignment at 7:13 pm and did not give myself sufficient time to do so.

3f

A reliable and repeatable way to do this that will come in handy when you have more complicated regressions is to use the information contained in the m1 object you saved off. You can access the coefficients using the coef function and then use the results as any other variable.

Use that approach to calculate the mean murder rate for cities with small police forces.

ANSWER: This did not work for me; I got NA for both (see below). I replaced the 0 with a 1 (because that dummy variable would make it for small cities, right?)

> # view all coefficients
> coef(m1)
(Intercept)       sworn
  2.2099851   0.0692766
>
> # view specific coefficients
> coef(m1)["sworn_small"]
NA
> coef(m1)["(Intercept)"]
(Intercept)
   2.209985
>
> # calculate conditional mean by plugging in values to
> # the regression equation
> coef(m1)["(Intercept)"] + coef(m1)["sworn_small"]*(0)
(Intercept)
         NA
> # calculate conditional mean by plugging in values to
> # the regression equation
> coef(m1)["(Intercept)"] + coef(m1)["sworn_small"]*(1)
(Intercept)
         NA
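A likely reason for the NA results: `coef()` can only return terms that are in the fitted model, and the coefficients listed for `m1` are `(Intercept)` and `sworn`, with no `sworn_small`. The sketch below reproduces that behavior on invented data and then computes both conditional means from a model that does include the dummy. The names `m_cont` and `m_dummy` are made up for the example.

```r
# Toy stand-in for the crime data; the values are invented for illustration only.
crime_toy <- data.frame(
  sworn       = c(100, 150, 200, 250, 300, 350),
  sworn_small = c(1, 1, 1, 0, 0, 0),
  murder      = c(8, 10, 12, 20, 25, 30)
)

m_cont  <- lm(murder ~ sworn, data = crime_toy)        # continuous regressor
m_dummy <- lm(murder ~ sworn_small, data = crime_toy)  # 0/1 regressor

# Asking for a coefficient that is not in the model returns NA
print(coef(m_cont)["sworn_small"])

# With the dummy in the model, plug in 0 (large) or 1 (small)
b <- coef(m_dummy)
mean_large <- unname(b["(Intercept)"] + b["sworn_small"] * 0)
mean_small <- unname(b["(Intercept)"] + b["sworn_small"] * 1)

print(c(mean_large = mean_large, mean_small = mean_small))
```

`unname()` is used only to drop the leftover `(Intercept)` label, which otherwise carries over to the result.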

Problem 4

4a

The number of sworn police officers in the first column of the table below correspond to the 25th, 50th, and 75th percentiles of the sworn distribution …

ANSWER: conditional mean is .07

Number of Sworn Police Officers    Conditional Mean of Murder Rate (1970)
147.96 (25th percentile)           ?
188.25 (50th percentile)           ?
249.69 (75th percentile)           ?

Your goal in this problem is to use regression to fill out the table above. Get started by saving off a temporary dataset that includes only the 1970 observations that you will use in your regression:

workingData <- crime %>% filter(year==70)

model <- lm(murder ~ sworn, data = crime)

# fit on the 1970 subset saved above, as the question asks
m2 <- lm(murder ~ sworn, data = workingData)
summ(m2)
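A conditional mean from a regression is the intercept plus the slope times the chosen value of `sworn`, not the slope alone. The sketch below shows that arithmetic using the full-sample coefficients printed by `coef(m1)` in 3f. These are not the 1970-only coefficients, so the resulting numbers illustrate the method and are not the answers for the table.

```r
# Full-sample coefficients copied from coef(m1) in 3f (NOT the 1970-only fit)
b0 <- 2.2099851   # intercept
b1 <- 0.0692766   # slope on sworn

# Percentile values of sworn given in the table
sworn_pcts <- c(p25 = 147.96, p50 = 188.25, p75 = 249.69)

# Conditional mean = intercept + slope * value of sworn
cond_mean <- b0 + b1 * sworn_pcts
print(round(cond_mean, 2))

# With a fitted model object, predict() does the same arithmetic, e.g.:
# predict(m2, newdata = data.frame(sworn = sworn_pcts))
```

Swapping in the coefficients from a model fitted on the 1970 observations would give the values the table asks for.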

4b

The goal of this question is to visualize how well our regression line does at predicting means conditional on particular values of sworn

Plot the conditional mean values for the 25th, 50th, and 75th percentiles of sworn on your graph as red squares [i.e., plot the points (sworn.p25,sworn.p25.cm),(sworn.p50,sworn.p50.cm), and (sworn.p75,sworn.p75.cm) ) on the graph]. Where do those points fall with respect to the regression line?

ANSWER: I didn’t do this right. This is the error message I got:
Error in e1 %+% e2:
! Cannot add objects together
ℹ Did you forget to add this object to a object?
Backtrace:
1. GGally:::+.gg(…)
2. e1 %+% e2
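An error of this kind typically appears when plot layers are added to each other rather than to a plot object, for example when the chain does not start from `ggplot(...)`. That is a guess about the cause, since the code that produced it is not shown. The sketch below, on invented data and assuming the ggplot2 package is installed, builds a scatterplot with a regression line and overlays three red squares. The points in `cm_points` are hypothetical placeholders for the conditional means.

```r
library(ggplot2)

# Toy stand-in for the 1970 data; the values are invented for illustration only.
crime_toy <- data.frame(
  sworn  = c(100, 150, 200, 250, 300, 350),
  murder = c(8, 10, 12, 20, 25, 30)
)

# Hypothetical conditional means at three values of sworn
cm_points <- data.frame(
  sworn  = c(150, 200, 300),
  murder = c(11, 14, 24)
)

# Every layer is added to the plot object created by ggplot()
p <- ggplot(crime_toy, aes(x = sworn, y = murder)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(data = cm_points, colour = "red", shape = 15, size = 3)
```

Calling `print(p)` draws the figure; `shape = 15` is the filled square.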

4c

Use regression to complete the following table of conditional means based on those percentiles (i.e., each cell contains the mean murder rate for that cell). Hint: You will need to add a variable to the regression.

sworn.p25.cm <- crime %>%
  filter(sworn > (sworn.p25 - 5) & sworn < (sworn.p25 + 5)) %>%
  pull(murder) %>%
  mean(na.rm=TRUE)

sworn.p50.cm <- crime %>%
  filter(sworn > (sworn.p50 - 5) & sworn < (sworn.p50 + 5)) %>%
  pull(murder) %>%
  mean(na.rm=TRUE)

sworn.p75.cm <- crime %>%
  filter(sworn > (sworn.p75 - 5) & sworn < (sworn.p75 + 5)) %>%
  pull(murder) %>%
  mean(na.rm=TRUE)

ANSWER: Ooof, I really don’t know. I would love to have someone walk me through it.

Expected Value of Murder Rate (1970), by Number of Sworn Officers and Percent Female Headed Household

                   % Fem Head HH
Nbr Sworn          25th (12.4)    50th (14.1)    75th (17.5)
25th (147.96)      ?              ?              ?
50th (188.25)      ?              ?              ?
75th (249.69)      ?              ?              ?
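One way to approach a table like this is to add the second variable to the regression and then predict at every combination of the percentile values. The sketch assumes the added variable is `x_femhea` (the female-headed household column in the Table 1 output) and uses invented data, so the printed numbers are not the answers. The names `crime_toy`, `m3_toy`, and `tab` are made up for the example.

```r
# Toy stand-in for the 1970 data; the values are invented for illustration only.
crime_toy <- data.frame(
  sworn    = c(120, 150, 180, 200, 240, 260, 300, 330),
  x_femhea = c(11, 16, 12, 18, 13, 15, 19, 14),
  murder   = c(8, 14, 11, 19, 15, 18, 26, 21)
)

# Regression with both explanatory variables
m3_toy <- lm(murder ~ sworn + x_femhea, data = crime_toy)

# Every combination of the percentile values given in the table
grid <- expand.grid(
  sworn    = c(147.96, 188.25, 249.69),
  x_femhea = c(12.4, 14.1, 17.5)
)
grid$pred <- predict(m3_toy, newdata = grid)

# Reshape to 3 x 3: rows are sworn percentiles, columns are x_femhea percentiles.
# expand.grid varies sworn fastest, which matches column-wise matrix filling.
tab <- matrix(
  grid$pred, nrow = 3,
  dimnames = list(sworn = c("25th", "50th", "75th"),
                  x_femhea = c("25th", "50th", "75th"))
)
print(round(tab, 2))
```

Each cell equals the intercept plus the `sworn` coefficient times the row value plus the `x_femhea` coefficient times the column value, which is the by-hand version of the same calculation.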