The cities.csv datafile contains hypothetical data on all cities in the U.S. with population greater than 100,000. The following variables are included in this dataset:
- size: number of inhabitants (in tens of thousands)
- pov: percent of inhabitants living below official poverty line
- unemp: percent of inhabitants unemployed
- police: number of full time police officers (in hundreds)
- crime: number of serious crimes during year (in thousands)
- budget: amount of city budget surplus or deficit (in tens of millions of dollars. Note: positive numbers represent surplus, negative numbers deficit.)
Using these data, your goal is to explore how a city’s budget surplus or deficit is affected by its size and the poverty of the residents.
- Start by simply exploring a multiple regression model in which budget is regressed on size and pov. Test whether the set of both predictors significantly predicts budget and whether each predictor, controlling for the other, is also significant. Interpret the resulting parameter estimates. Write a brief news summary of your conclusions.
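library(lmSupport)  # assumed source of mcSummary() and modelCompare()
cities <- read.csv("cities.csv")  # assumes the file is in the working directory
# Mean-center both predictors
size.c <- cities$size - mean(cities$size)
pov.c <- cities$pov - mean(cities$pov)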
m.a1 <- lm(budget ~ size.c + pov.c, data = cities)
mcSummary(m.a1)
## lm(formula = budget ~ size.c + pov.c, data = cities)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 226.938 2 113.469 0.46 72.915 0
## Error 266.108 171 1.556
## Corr Total 493.046 173 2.850
##
## RMSE AdjEtaSq
## 1.247 0.454
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -1.795 0.095 -18.976 560.392 0.678 NA -1.981 -1.608 0
## size.c -0.047 0.006 -7.518 87.953 0.248 0.952 -0.059 -0.035 0
## pov.c -0.144 0.019 -7.574 89.267 0.251 0.952 -0.182 -0.107 0
m.c1 <- lm(budget ~ 1, data = cities)
mcSummary(m.c1)
## lm(formula = budget ~ 1, data = cities)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.000 0 Inf 0
## Error 493.046 173 2.85
## Corr Total 493.046 173 2.85
##
## RMSE AdjEtaSq
## 1.688 0
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -1.795 0.128 -14.022 560.392 0.532 NA -2.047 -1.542 0
modelCompare(m.c1, m.a1)
## SSE (Compact) = 493.0462
## SSE (Augmented) = 266.1082
## Delta R-Squared = 0.4602773
## Partial Eta-Squared (PRE) = 0.4602773
## F(2,171) = 72.91467, p = 1.260653e-23
In this study, we wanted to know whether the number of inhabitants (in tens of thousands) and the percent of inhabitants living below the official poverty line predicted city budget (in tens of millions of dollars) for cities with a population over 100,000. Together, size and poverty rate significantly predicted budget; F(2, 171) = 72.91, p < .001, PRE = .46. We found that, controlling for city size, a one percentage point increase in poverty rate corresponded with a statistically significant -0.144 change in budget; b = -0.14, p < .001, t(171) = -7.57, PRE = .251, 95% CI = [-0.18, -0.11]. Findings also revealed that, controlling for poverty rate, an increase of one unit (ten thousand inhabitants) in size corresponded with a statistically significant -0.047 change in budget; b = -0.05, p < .001, t(171) = -7.52, PRE = .248, 95% CI = [-0.06, -0.04].
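The omnibus test reported by modelCompare() can be reproduced by hand from the two error sums of squares printed above. This is a minimal sketch, assuming n = 174 cities (inferred from the Corr Total df of 173); the variable names are illustrative and not part of the original analysis.

```r
# SSE values copied from the modelCompare() output above
sse_compact   <- 493.0462   # intercept-only model (m.c1)
sse_augmented <- 266.1082   # budget ~ size.c + pov.c (m.a1)

n  <- 174   # assumed: Corr Total df (173) + 1
pc <- 1     # parameters in the compact model
pa <- 3     # parameters in the augmented model

# Proportional reduction in error (PRE) and the F statistic built from it
pre    <- (sse_compact - sse_augmented) / sse_compact
f_stat <- (pre / (pa - pc)) / ((1 - pre) / (n - pa))
p_val  <- pf(f_stat, df1 = pa - pc, df2 = n - pa, lower.tail = FALSE)

round(c(PRE = pre, F = f_stat), 3)
p_val
```

The values should agree with the PRE and F(2, 171) lines in the modelCompare() output, up to rounding of the printed SSEs.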
- It seems reasonable to think that budget problems of larger cities are more severe than such problems in smaller cities. This might happen in two ways. First of all, bigger cities may simply have smaller budget surpluses or larger deficits. The analysis you did in part A should help you address this issue. Secondly, the things that cause budget problems in cities, such as a very poor population, may play a larger role in larger cities. Thus the problems that confront cities (such as having a poor tax base) may be made more severe as the city grows larger. To examine this second possibility, you should examine whether the effects of pov on budget are larger as size increases in value. To answer this question you will need to create a new variable that is the product of pov and size and then regress budget on pov, size, and their product. Test whether the interaction is significant (you can do this either by creating a new variable manually or by creating the product term directly in the lm() function). Interpret all of the parameter estimates in this model. Write a brief news summary of what you conclude from this model.
m.a2 <- lm(budget ~ size + pov + size*pov, data = cities)
mcSummary(m.a2)
## lm(formula = budget ~ size + pov + size * pov, data = cities)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 233.080 3 77.693 0.473 50.806 0
## Error 259.967 170 1.529
## Corr Total 493.046 173 2.850
##
## RMSE AdjEtaSq
## 1.237 0.463
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 0.819 1.298 0.631 0.609 0.002 NA -1.743 3.382 0.529
## size 0.018 0.033 0.536 0.439 0.002 0.034 -0.047 0.082 0.593
## pov -0.039 0.056 -0.700 0.749 0.003 0.109 -0.149 0.071 0.485
## size:pov -0.003 0.001 -2.004 6.142 0.023 0.023 -0.005 0.000 0.047
In this study, we wanted to know whether the number of inhabitants (in tens of thousands) and the percent of inhabitants living below the official poverty line interact in predicting city budget for cities with a population over 100,000. We found that for a city size of zero, a one percentage point increase in poverty rate corresponded with a non-significant -0.04 change in budget; b = -0.04, p = .485, t(170) = -0.70, PRE = .003, 95% CI = [-0.15, 0.07]. Findings also revealed that for a city with a poverty rate of zero, an increase of one unit (ten thousand inhabitants) in size corresponded with a non-significant 0.02 change in budget; b = 0.02, p = .593, t(170) = 0.54, PRE = .002, 95% CI = [-0.05, 0.08]. Importantly, the interaction was statistically significant: for each one-point increase in poverty rate, the size to budget relationship changes by -0.003, and equivalently, for each one-unit increase in size, the poverty to budget relationship changes by -0.003; b = -0.003, p = .047, t(170) = -2.00, PRE = .023, 95% CI = [-0.005, 0.000]. Lastly, when size and poverty rate are both zero, budget is predicted to be 0.82; b = 0.82, p = .529, t(170) = 0.63, PRE = .002, 95% CI = [-1.74, 3.38]. Because every city in the data has a size of at least 10, the estimates at size zero describe a point outside the observed range.
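The interpretation of each coefficient as a "simple" effect follows from regrouping the interactive model around pov. The rearrangement below is a sketch in which b_0 through b_3 stand for the intercept, size, pov, and size:pov estimates; the symbols are notation introduced here, not taken from the output.

```latex
\begin{align*}
\widehat{\text{budget}} &= b_0 + b_1\,\text{size} + b_2\,\text{pov} + b_3\,(\text{size}\times\text{pov}) \\
                        &= (b_0 + b_1\,\text{size}) + (b_2 + b_3\,\text{size})\,\text{pov}
\end{align*}
```

The simple intercept is b_0 + b_1 size and the simple slope of pov is b_2 + b_3 size, so b_2 alone is the pov slope only when size equals zero, and b_3 is the change in that slope per one-unit increase in size.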
- Redo the analysis in B, this time putting both pov and size in mean-centered form and recomputing their product. Demonstrate to yourself that the test of the interaction yields the same result as in B. Provide new interpretations of the parameter estimates in this model.
size.c <- cities$size - mean(cities$size)
pov.c <- cities$pov - mean(cities$pov)
m.a3 <- lm(budget ~ size.c + pov.c + size.c*pov.c, data = cities)
mcSummary(m.a3)
## lm(formula = budget ~ size.c + pov.c + size.c * pov.c, data = cities)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 233.080 3 77.693 0.473 50.806 0
## Error 259.967 170 1.529
## Corr Total 493.046 173 2.850
##
## RMSE AdjEtaSq
## 1.237 0.463
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -1.747 0.097 -18.077 499.740 0.658 NA -1.938 -1.557 0.000
## size.c -0.044 0.006 -6.895 72.691 0.219 0.899 -0.056 -0.031 0.000
## pov.c -0.144 0.019 -7.615 88.674 0.254 0.952 -0.181 -0.107 0.000
## size.c:pov.c -0.003 0.001 -2.004 6.142 0.023 0.940 -0.005 0.000 0.047
In this study, we wanted to know whether the number of inhabitants (in tens of thousands) and the percent of inhabitants living below the official poverty line interact in predicting city budget for cities with a population over 100,000. We found that for a city of average size, a one percentage point increase in poverty rate corresponded with a statistically significant -0.14 change in budget; b = -0.14, p < .001, t(170) = -7.62, PRE = .254, 95% CI = [-0.18, -0.11]. Findings also revealed that for a city of average poverty rate, an increase of one unit (ten thousand inhabitants) in size corresponded with a statistically significant -0.04 change in budget; b = -0.04, p < .001, t(170) = -6.90, PRE = .219, 95% CI = [-0.06, -0.03]. Importantly, the test of the interaction is identical to the one in the uncentered model: for each one-point increase in poverty rate, the size to budget relationship changes by -0.003, and equivalently, for each one-unit increase in size, the poverty to budget relationship changes by -0.003; b = -0.003, p = .047, t(170) = -2.00, PRE = .023, 95% CI = [-0.005, 0.000]. Lastly, when size and poverty rate are both at their means, budget is predicted to be -1.75; b = -1.75, p < .001, t(170) = -18.08, PRE = .658, 95% CI = [-1.94, -1.56].
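Why centering leaves the interaction test untouched while changing the lower-order coefficients can be checked numerically. The sketch below uses simulated data (the data frame `d` and its generating values are hypothetical, not the cities.csv data) and plain lm() rather than mcSummary().

```r
set.seed(1)
n <- 200

# Hypothetical data for illustration only; not the cities.csv values
d <- data.frame(size = runif(n, 10, 100), pov = runif(n, 5, 30))
d$budget <- 1 - 0.02 * d$size - 0.05 * d$pov -
  0.003 * d$size * d$pov + rnorm(n)

# Mean-center both predictors
d$size.c <- d$size - mean(d$size)
d$pov.c  <- d$pov - mean(d$pov)

raw      <- lm(budget ~ size * pov, data = d)
centered <- lm(budget ~ size.c * pov.c, data = d)

b_raw <- coef(raw)
b_cen <- coef(centered)

# The interaction estimate is unchanged by centering
same_interaction <- all.equal(unname(b_raw["size:pov"]),
                              unname(b_cen["size.c:pov.c"]))

# The centered pov slope equals the raw simple slope evaluated at mean size
same_slope <- all.equal(unname(b_cen["pov.c"]),
                        unname(b_raw["pov"] + b_raw["size:pov"] * mean(d$size)))

same_interaction
same_slope
```

Centering only moves the point at which the lower-order slopes are evaluated, which is why the pov.c estimate describes a city of average size.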
- You are particularly interested in the relationship between pov and budget for cities of 500,000 in population. In the context of the interactive model, estimate and test the simple relationship between pov and budget in cities where size equals 50 (500,000 inhabitants). To do this, you will need to deviate size from 50 and then recompute the product. You can either do this with pov mean deviated or not. Again, interpret the resulting parameter estimates and write a conclusion about the simple relationship between pov and budget for cities with half a million residents.
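# size is in tens of thousands, so 500,000 inhabitants corresponds to size = 50 (not 500).
# The summary printed below came from the 500 offset, so its (Intercept) and pov.c rows describe size = 500.
size.50 <- cities$size - 50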
pov.c <- cities$pov - mean(cities$pov)
m.a4 <- lm(budget ~ size.50 + pov.c + size.50*pov.c, data = cities)
mcSummary(m.a4)
## lm(formula = budget ~ size.50 + pov.c + size.50 * pov.c, data = cities)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 233.080 3 77.693 0.473 50.806 0
## Error 259.967 170 1.529
## Corr Total 493.046 173 2.850
##
## RMSE AdjEtaSq
## 1.237 0.463
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -22.006 2.945 -7.471 85.356 0.247 NA -27.820 -16.191 0.000
## size.50 -0.044 0.006 -6.895 72.691 0.219 0.899 -0.056 -0.031 0.000
## pov.c -1.402 0.628 -2.233 7.624 0.028 0.001 -2.641 -0.163 0.027
## size.50:pov.c -0.003 0.001 -2.004 6.142 0.023 0.001 -0.005 0.000 0.047
In this study, we wanted to know how the percent of inhabitants living below the official poverty line relates to city budget for cities with a population of 500,000 (size = 50), in the context of the interactive model. We found that for a city of 500,000 inhabitants, a one percentage point increase in poverty rate corresponded with a statistically significant -0.18 change in budget; b = -0.18, p < .001, t(170) = -7.18, PRE = .232, 95% CI = [-0.22, -0.13]. Findings also revealed that for a city of average poverty rate, an increase of one unit (ten thousand inhabitants) in size corresponded with a statistically significant -0.04 change in budget; b = -0.04, p < .001, t(170) = -6.90, PRE = .219, 95% CI = [-0.06, -0.03]. Importantly, for each one-point increase in poverty rate, the size to budget relationship changes by -0.003, and equivalently, for each one-unit increase in size, the poverty to budget relationship changes by -0.003; b = -0.003, p = .047, t(170) = -2.00, PRE = .023, 95% CI = [-0.005, 0.000]. Lastly, for a city of 500,000 inhabitants with an average poverty rate, budget is predicted to be -2.26; b = -2.26, p < .001, t(170) = -18.06, PRE = .658, 95% CI = [-2.50, -2.01].
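The simple slope at a chosen size can also be obtained without refitting, by combining the coefficients and their covariance matrix, and then cross-checked against a recentered model. A minimal sketch on simulated data (the data frame `d`, the value `s0`, and all generating values are hypothetical):

```r
set.seed(2)
n <- 200

# Hypothetical data for illustration only; not the cities.csv values
d <- data.frame(size = runif(n, 10, 100), pov = runif(n, 5, 30))
d$budget <- 1 - 0.02 * d$size - 0.05 * d$pov -
  0.003 * d$size * d$pov + rnorm(n)

fit <- lm(budget ~ size * pov, data = d)
b   <- coef(fit)
V   <- vcov(fit)
s0  <- 50   # size value of interest (500,000 inhabitants)

# Simple slope of pov at size = s0 and its standard error
slope <- unname(b["pov"] + s0 * b["size:pov"])
se    <- sqrt(V["pov", "pov"] +
              s0^2 * V["size:pov", "size:pov"] +
              2 * s0 * V["pov", "size:pov"])
t_stat <- slope / se

# Cross-check: deviate size from s0 and read the pov row directly
d$size.50 <- d$size - s0
fit50 <- lm(budget ~ size.50 * pov, data = d)
recentered <- summary(fit50)$coefficients["pov", ]

c(slope = slope, se = se, t = t_stat)
recentered
```

Both routes give the same estimate, standard error, and t statistic; recentering is simply the more convenient way to get them from a standard summary table.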
- Based on the analyses you have done so far, draw a graph of the predicted simple relationship between budget and pov at different levels of size, plotting the simple relationship at multiple values of size (include lines for when size equals its mean and when size equals 50, as well as two additional values - you will need to figure out the simple intercepts and slopes at these other values). You can do this any way you’d like. I’d suggest either by using a prebuilt package in R (such as ‘sjPlot’ or ‘effects’) and telling the function what levels to plot, or you can hand calculate the simple slopes by plugging in values to your regression equation and directly specifying the lines in R. See the Chapt 7 R code for examples of ways to do this with ggplot.
library(effects)  # provides predictorEffects()
plot(predictorEffects(m.a2))
plot(predictorEffects(m.a3))
plot(predictorEffects(m.a4))
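The hand-calculated alternative mentioned in the prompt amounts to computing a simple intercept and simple slope for each chosen size and drawing one line per size. The sketch below uses base R graphics and simulated data; the helper `simple_lines()`, the data frame `d`, and the two extra size values (20 and 80) are hypothetical choices for illustration.

```r
set.seed(3)
n <- 200

# Hypothetical data for illustration only; not the cities.csv values
d <- data.frame(size = runif(n, 10, 100), pov = runif(n, 5, 30))
d$budget <- 1 - 0.02 * d$size - 0.05 * d$pov -
  0.003 * d$size * d$pov + rnorm(n)

fit <- lm(budget ~ size * pov, data = d)

# Simple intercept and slope of the budget ~ pov line at each size value
simple_lines <- function(fit, size_values) {
  b <- coef(fit)
  data.frame(
    size      = size_values,
    intercept = unname(b["(Intercept)"]) + unname(b["size"]) * size_values,
    slope     = unname(b["pov"]) + unname(b["size:pov"]) * size_values
  )
}

lines_tbl <- simple_lines(fit, c(20, mean(d$size), 50, 80))
lines_tbl

# Scatterplot with one predicted line per size value
plot(d$pov, d$budget, col = "grey70",
     xlab = "pov (% below poverty line)", ylab = "budget")
for (i in seq_len(nrow(lines_tbl))) {
  abline(a = lines_tbl$intercept[i], b = lines_tbl$slope[i], lty = i)
}
legend("bottomleft", bty = "n",
       legend = paste("size =", round(lines_tbl$size, 1)),
       lty = seq_len(nrow(lines_tbl)))
```

With a negative interaction, the lines should fan out so that the pov slope becomes steeper (more negative) at larger sizes.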