codes = c("testNoArts", "testJan", "testJanMethod")
noArt = c(3/4, 0, 0)
ArtificialsinJan = c(-1/4, -1/3, 1/2)
ArtificialsinJanSpread = c(-1/4, -1/3, -1/2)
ArtificialsinAprilSpread = c(-1/4, 2/3, 0)
codeTable = data.frame(codes, noArt, ArtificialsinJan, ArtificialsinJanSpread, ArtificialsinAprilSpread)
colnames(codeTable) = c("", "No Artificials", "Artificials in January", "Artificials in January with Spreading", "Artificials in April with Spreading")
kable(codeTable, digits = 2, align = "lcccc", caption = "Helmert coding scheme used to test questions 1a-c") %>% kable_styling("striped")
| No Artificials | Artificials in January | Artificials in January with Spreading | Artificials in April with Spreading | |
|---|---|---|---|---|
| testNoArts | 0.75 | -0.25 | -0.25 | -0.25 |
| testJan | 0.00 | -0.33 | -0.33 | 0.67 |
| testJanMethod | 0.00 | 0.50 | -0.50 | 0.00 |
The table above shows the Helmert contrast codes used in the model comparisons for each of question 1a-c, which are given in turn below:
For question 1a) we have the model comparison: \[ Model_{A}: yield_i = \beta_0 + \beta_1*testNoArts_i + \beta_2*testJan_i + \beta_3*testJanMethod_i + \epsilon_i \] \[ Model_{C}: yield_i = \beta_0 + \beta_2*testJan_i + \beta_3*testJanMethod_i + \epsilon_i \] \[ Null: \beta_1 = 0 \] For question 1b) we have \[ Model_{A}: yield_i = \beta_0 + \beta_1*testNoArts_i + \beta_2*testJan_i + \beta_3*testJanMethod_i + \epsilon_i \] \[ Model_{C}: yield_i = \beta_0 + \beta_1*testNoArts_i + \beta_3*testJanMethod_i + \epsilon_i \] \[ Null: \beta_2 = 0 \] Finally, for question 1c) we have \[ Model_{A}: yield_i = \beta_0 + \beta_1*testNoArts_i + \beta_2*testJan_i + \beta_3*testJanMethod_i + \epsilon_i \] \[ Model_{C}: yield_i = \beta_0 + \beta_1*testNoArts_i + \beta_2*testJan_i + \epsilon_i \] \[ Null: \beta_3 = 0 \]
Importantly, the Helmert contrast coding scheme employed here yields a complete set of orthogonal contrast codes. This is readily apparent from the table since the products of the “No Artificials” and “Artificials in April with Spreading” levels each give 0 while the products of the inner two columns together sum to zero (\((-1/4)(-1/3)(1/2) + (-1/4)(-1/3)(-1/2) = 0\)).
To run the appropriate model comparisons for question 2a and 2b, we first summarize and generate the correct set of contrast codes:
codes = c("testAnyCollege", "testOneorBoth")
noCollege = c(2/3, 0)
OneCollege = c(-1/3, 1/2)
TwoCollege = c(-1/3, -1/2)
collegeCodeTable1 = data.frame(codes, noCollege, OneCollege, TwoCollege)
colnames(collegeCodeTable1) = c("", "Neither Parent with Degree", "One Parent with Degree", "Two Parents with Degrees")
kable(collegeCodeTable1, digits = 2, align = "lccc", caption = "Helmert coding scheme used to test questions 2a-b") %>% kable_styling("striped")
| Neither Parent with Degree | One Parent with Degree | Two Parents with Degrees | |
|---|---|---|---|
| testAnyCollege | 0.67 | -0.33 | -0.33 |
| testOneorBoth | 0.00 | 0.50 | -0.50 |
#generate corresponding contrast codes
nursery$testAnyCollege = 2/3*(nursery$COLLEGE=="none") -1/3*(nursery$COLLEGE=="one") -1/3*(nursery$COLLEGE=="both")
nursery$testOneorBoth = 0*(nursery$COLLEGE=="none") + 1/2*(nursery$COLLEGE=="one") -1/2*(nursery$COLLEGE=="both")
The table above summarizes the coding scheme used here. We now run the appropriate model comparisons for 2a and 2b in turn:
model.2aA = lm(nursery$PEABODY ~ nursery$testAnyCollege + nursery$testOneorBoth)
mcSummary(model.2aA)
## lm(formula = nursery$PEABODY ~ nursery$testAnyCollege + nursery$testOneorBoth)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 330.99 2 165.495 0.074 1.882 0.164
## Error 4131.99 47 87.915
## Corr Total 4462.98 49 91.081
##
## RMSE AdjEtaSq
## 9.376 0.035
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5
## (Intercept) 76.743 1.683 45.597 182778.803 0.978 NA 73.357
## nursery$testAnyCollege -5.690 3.110 -1.830 294.328 0.066 0.758 -11.946
## nursery$testOneorBoth -6.720 4.593 -1.463 188.160 0.044 0.758 -15.961
## CI_97.5 p
## (Intercept) 80.129 0.000
## nursery$testAnyCollege 0.566 0.074
## nursery$testOneorBoth 2.521 0.150
In order to run the appropriate model comparison to test 2c, we must first rearrange the coding scheme:
codes = c("testOneCollege", "testNoneorBoth")
noCollege = c(-1/3, 1/2)
OneCollege = c(2/3, 0)
TwoCollege = c(1/3, -1/2)
collegeCodeTable2 = data.frame(codes, noCollege, OneCollege, TwoCollege)
colnames(collegeCodeTable2) = c("", "Neither Parent with Degree", "One Parent with Degree", "Two Parents with Degrees")
kable(collegeCodeTable2, digits = 2, align = "lccc", caption = "Helmert coding scheme used to test questions 2c") %>% kable_styling("striped")
| Neither Parent with Degree | One Parent with Degree | Two Parents with Degrees | |
|---|---|---|---|
| testOneCollege | -0.33 | 0.67 | 0.33 |
| testNoneorBoth | 0.50 | 0.00 | -0.50 |
#generate corresponding contrast codes
nursery$testOneCollege = -1/3*(nursery$COLLEGE=="none") + 2/3*(nursery$COLLEGE=="one") + 1/3*(nursery$COLLEGE=="both")
nursery$testNoneorBoth = 1/2*(nursery$COLLEGE=="none") + 0*(nursery$COLLEGE=="one") -1/2*(nursery$COLLEGE=="both")
We now run the appropriate model comparison for 2c:
model.2cA = lm(nursery$PEABODY ~ nursery$testOneCollege + nursery$testNoneorBoth)
mcSummary(model.2cA)
## lm(formula = nursery$PEABODY ~ nursery$testOneCollege + nursery$testNoneorBoth)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 330.99 2 165.495 0.074 1.882 0.164
## Error 4131.99 47 87.915
## Corr Total 4462.98 49 91.081
##
## RMSE AdjEtaSq
## 9.376 0.035
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5
## (Intercept) 77.475 2.344 33.051 96038.010 0.959 NA 72.759
## nursery$testOneCollege -3.292 4.503 -0.731 47.005 0.011 0.388 -12.351
## nursery$testNoneorBoth -11.245 6.647 -1.692 251.642 0.057 0.388 -24.616
## CI_97.5 p
## (Intercept) 82.191 0.000
## nursery$testOneCollege 5.766 0.468
## nursery$testNoneorBoth 2.126 0.097
We provide statistical summaries for the results of question 2a-c in turn:
2a - In a model not controlling for any other factors, we found that there were no significant differences between the PEABODY means of children of 0, 1, or 2 parents with college degrees, F(2, 47) = 1.882, PRE = .074, p > .1.
2b - Tests of individual, single degree-of-freedom contrasts showed no significant differences between the mean of children of non-degree holding parents versus the combined means of children of parents where one or both attained degrees, F(1, 47) = 3.34, PRE = .066, p = .074.
2c - Moreover, a further single degree-of-freedom contrast showed no significant differences between the PEABODY means of children of no degree-holders versus the means of children of two degree-holders, F = 2.82, PRE = .057, p = .097.
We use the code below to generate a dummy coding scheme to test simultaneously whether there is a difference between the “none” and “one” groups as well as between the “none” and “two” groups:
codes = c("testNoneorBoth_Dummy", "testNoneorOne_Dummy")
noCollege = c(0, 0)
OneCollege = c(0, 1)
TwoCollege = c(1, 0)
collegeCodeTable3 = data.frame(codes, noCollege, OneCollege, TwoCollege)
colnames(collegeCodeTable3) = c("", "Neither Parent with Degree", "One Parent with Degree", "Two Parents with Degrees")
kable(collegeCodeTable3, digits = 2, align = "lccc", caption = "Helmert coding scheme used to test questions 2c") %>% kable_styling("striped")
| Neither Parent with Degree | One Parent with Degree | Two Parents with Degrees | |
|---|---|---|---|
| testNoneorBoth_Dummy | 0 | 0 | 1 |
| testNoneorOne_Dummy | 0 | 1 | 0 |
#generate corresponding contrast codes
nursery$testNoneorBoth_Dummy = 0*(nursery$COLLEGE=="none") + 0*(nursery$COLLEGE=="one") + 1*(nursery$COLLEGE=="both")
nursery$testNoneorOne_Dummy = 0*(nursery$COLLEGE=="none") + 1*(nursery$COLLEGE=="one") + 0*(nursery$COLLEGE=="both")
model.2eA = lm(nursery$PEABODY ~ nursery$testNoneorBoth_Dummy + nursery$testNoneorOne_Dummy)
mcSummary(model.2eA)
## lm(formula = nursery$PEABODY ~ nursery$testNoneorBoth_Dummy +
## nursery$testNoneorOne_Dummy)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 330.99 2 165.495 0.074 1.882 0.164
## Error 4131.99 47 87.915
## Corr Total 4462.98 49 91.081
##
## RMSE AdjEtaSq
## 9.376 0.035
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol
## (Intercept) 72.95 2.097 34.794 106434.050 0.963 NA
## nursery$testNoneorBoth_Dummy 9.05 4.688 1.930 327.610 0.073 0.889
## nursery$testNoneorOne_Dummy 2.33 2.813 0.828 60.321 0.014 0.889
## CI_2.5 CI_97.5 p
## (Intercept) 68.732 77.168 0.000
## nursery$testNoneorBoth_Dummy -0.381 18.481 0.060
## nursery$testNoneorOne_Dummy -3.329 7.989 0.412
The output provided above shows that the PEABODY means of children of no degree-holders were not significantly different from the means of children of two degree-holders, F(1, 47) = 3.72, PRE = .073, p = .060, and from those of children of one degree-holder, F(1, 47) = .685, PRE = .014, p = .412.
We first predict the size of the budget deficit from city size and poverty rate:
model.3aA = lm(cities$budget ~ cities$size + cities$pov)
mcSummary(model.3aA)
## lm(formula = cities$budget ~ cities$size + cities$pov)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 226.938 2 113.469 0.46 72.915 0
## Error 266.108 171 1.556
## Corr Total 493.046 173 2.850
##
## RMSE AdjEtaSq
## 1.247 0.454
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 3.260 0.453 7.188 80.395 0.232 NA 2.364 4.155 0
## cities$size -0.047 0.006 -7.518 87.953 0.248 0.952 -0.059 -0.035 0
## cities$pov -0.144 0.019 -7.574 89.267 0.251 0.952 -0.182 -0.107 0
From the output above, we observe that for every additional 10,000 inhabitants, cities can expect budget surpluses to decrease by $470,000 (or, if they are running a deficit, deficits will increase by this amount). Similarly, for every additional percentage point of residents living below the poverty line, cities can expect budget surpluses to decrease by $1.44 million. We provide a news summary below to summarize the above results:
News Summary: Our group was interested in identifying which economic and social factors predict the magnitude of budget surpluses/deficits in large American cities. In a preliminary analysis, both the size (in tens of thousands) and percentage of residents below the poverty line together predicted budget shortages. We aimed to quantify the indpendent contributions of each variable in a follow-up analysis, which showed that both variables significantly predict budget shortages independently. In particular, we found that for every additional 10,000 inhabitants, cities can expect budget deficits to increase by $470,000. Similarly, for every additional percentage point of residents living below the poverty line, cities can expect budget deficits to increase by $1.44 million.
To extend the results above, we include a size-by-poverty interaction term in the following model comparison:
model.3bA = lm(cities$budget ~ cities$size + cities$pov + cities$size:cities$pov)
mcSummary(model.3bA)
## lm(formula = cities$budget ~ cities$size + cities$pov + cities$size:cities$pov)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 233.080 3 77.693 0.473 50.806 0
## Error 259.967 170 1.529
## Corr Total 493.046 173 2.850
##
## RMSE AdjEtaSq
## 1.237 0.463
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5
## (Intercept) 0.819 1.298 0.631 0.609 0.002 NA -1.743
## cities$size 0.018 0.033 0.536 0.439 0.002 0.034 -0.047
## cities$pov -0.039 0.056 -0.700 0.749 0.003 0.109 -0.149
## cities$size:cities$pov -0.003 0.001 -2.004 6.142 0.023 0.023 -0.005
## CI_97.5 p
## (Intercept) 3.382 0.529
## cities$size 0.082 0.593
## cities$pov 0.071 0.485
## cities$size:cities$pov 0.000 0.047
From the output above we observe that for every additional 10,000 inhabitants, city surpluses can be expected to increase by $180,000 when that city has a 0% poverty level (however, this parameter estimate is far from significance). Similarly, for every percentage point increase in residents below the poverty line, a city with 0 residents can expected to see surpluses decrease by $390,000 (however, this scenario is clearly nonsensical and the associated parameter estimate is far from significance). Finally, an interaction effect was detected: for every percentage point increase in the poverty rate, the negative relationship between city size and surplus levels was magnified by $30,000.
Statistical Summary: In a model predicting surplus levels from the main effects of city size and poverty rate as well as their interaction, simple effects were not significant for city size at a poverty rate of 0%, b = .018, F(1, 170) = .287, PRE = .034, p = .593, nor significant for poverty rate at a city size of 0 residents, b = -.039, F(1, 170) = .49, PRE = .003, .485. However, their interaction was significant, b = -.003, F(1, 170) = 4.016, PRE = .023, p = .047, such that for every percentage point increase in the poverty rate, the negative relationship between city size and surplus levels was magnified by $30,000.
To confirm the results above, we mean-center the predictors and re-run the model comparison:
cities$size_C = cities$size - mean(cities$size)
cities$pov_C = cities$pov - mean(cities$pov)
model.3cA = lm(cities$budget ~ cities$size_C*cities$pov_C)
mcSummary(model.3cA)
## lm(formula = cities$budget ~ cities$size_C * cities$pov_C)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 233.080 3 77.693 0.473 50.806 0
## Error 259.967 170 1.529
## Corr Total 493.046 173 2.850
##
## RMSE AdjEtaSq
## 1.237 0.463
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5
## (Intercept) -1.747 0.097 -18.077 499.740 0.658 NA -1.938
## cities$size_C -0.044 0.006 -6.895 72.691 0.219 0.899 -0.056
## cities$pov_C -0.144 0.019 -7.615 88.674 0.254 0.952 -0.181
## cities$size_C:cities$pov_C -0.003 0.001 -2.004 6.142 0.023 0.940 -0.005
## CI_97.5 p
## (Intercept) -1.557 0.000
## cities$size_C -0.031 0.000
## cities$pov_C -0.107 0.000
## cities$size_C:cities$pov_C 0.000 0.047
From the output above we observe that for every additional 10,000 inhabitants, city surpluses can be expected to decrease by $440,000 when that city has poverty level at the mean. Similarly, for every percentage point increase in residents below the poverty line, a city with the mean number of residents can expected to see surpluses decrease by $1,440,000. Finally, an interaction effect was detected: for every percentage point increase in the poverty rate, the negative relationship between city size and surplus levels was magnified by $30,000.
To investigate these effects in cities with populations at or above 500,000 we deviate size and re-run the corresponding model comparisons:
cities$size_50 = cities$size - 50
model.3dA = lm(cities$budget ~ cities$size_50 + cities$pov + cities$size_50*cities$pov, data = cities)
mcSummary(model.3dA)
## lm(formula = cities$budget ~ cities$size_50 + cities$pov + cities$size_50 *
## cities$pov, data = cities)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 233.080 3 77.693 0.473 50.806 0
## Error 259.967 170 1.529
## Corr Total 493.046 173 2.850
##
## RMSE AdjEtaSq
## 1.237 0.463
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5
## (Intercept) 1.697 0.601 2.824 12.199 0.045 NA 0.511
## cities$size_50 0.018 0.033 0.536 0.439 0.002 0.034 -0.047
## cities$pov -0.175 0.024 -7.176 78.740 0.232 0.568 -0.224
## cities$size_50:cities$pov -0.003 0.001 -2.004 6.142 0.023 0.036 -0.005
## CI_97.5 p
## (Intercept) 2.883 0.005
## cities$size_50 0.082 0.593
## cities$pov -0.127 0.000
## cities$size_50:cities$pov 0.000 0.047
From the output above we observe that for every additional 10,000 inhabitants, city surpluses can be expected to increase by $440,000 when that city has a 0% poverty level. Similarly, for every percentage point increase in residents below the poverty line, a city with 50,000 residents can expected to see surpluses decrease by $1,750,000. Finally, an interaction effect was detected: for every percentage point increase in the poverty rate, the negative relationship between city size and surplus levels was magnified by $30,000.
News Summary: Our group was interested in identifying which economic and social factors predict the magnitude of budget surpluses/deficits in large American cities. We found that city size was not related to budget surpluses in cases where cities experienced a 0% poverty rate. On the other hand, poverty rate did influence surpluses, such that for every percentage point increase in residents below the poverty line, a city with 50,000 residents can expected to see surpluses decrease by $1,750,000. Importantly, this effect was magnified by the city size: for every percentage point increase in the poverty rate, the negative relationship between city size and surplus levels was magnified by $30,000.
To visualize the relationship between poverty and budget at various levels of city size, we plot the regression lines at four selected levels of size:
summary(cities)
## obs size pov unemp
## Min. : 1.00 Min. :10.30 Min. : 9.509 Min. : 4.89
## 1st Qu.: 44.25 1st Qu.:26.72 1st Qu.:19.646 1st Qu.:11.28
## Median : 87.50 Median :36.88 Median :22.241 Median :14.57
## Mean : 87.50 Mean :38.42 Mean :22.544 Mean :14.53
## 3rd Qu.:130.75 3rd Qu.:48.45 3rd Qu.:25.774 3rd Qu.:17.48
## Max. :174.00 Max. :87.53 Max. :39.929 Max. :24.39
## police crime budget size_C
## Min. : 0.2963 Min. : 15.91 Min. :-6.4838 Min. :-28.129
## 1st Qu.:24.0452 1st Qu.: 82.67 1st Qu.:-2.9091 1st Qu.:-11.706
## Median :35.5413 Median :117.17 Median :-1.7208 Median : -1.549
## Mean :34.4456 Mean :118.33 Mean :-1.7946 Mean : 0.000
## 3rd Qu.:43.3959 3rd Qu.:152.83 3rd Qu.:-0.5446 3rd Qu.: 10.025
## Max. :76.5688 Max. :240.67 Max. : 1.6955 Max. : 49.111
## pov_C size_50
## Min. :-13.0351 Min. :-39.705
## 1st Qu.: -2.8979 1st Qu.:-23.282
## Median : -0.3029 Median :-13.125
## Mean : 0.0000 Mean :-11.576
## 3rd Qu.: 3.2295 3rd Qu.: -1.551
## Max. : 17.3854 Max. : 37.535
cities %>% ggplot(aes(pov, budget)) +
geom_point(alpha = .2) +
geom_abline(intercept = .999, slope = -.069, size = 2, color = "red") +
geom_abline(intercept = 1.51, slope = -.154, size = 2, color = "orange") +
geom_abline(intercept = 1.719, slope = -.189, size = 2, color = "yellow") +
geom_abline(intercept = 2.169, slope = -.264, size = 2, color = "green") +
coord_cartesian(ylim = c(-10, 5), xlim = c(0, 45)) +
theme_classic(base_size = 22)