## # A tibble: 10 x 5
##        X PUMA_label                    ed_avg income_median region
##    <int> <fct>                          <dbl>         <int>  <int>
##  1     1 Cheshire & Sullivan Counties    18.6         35390      0
##  2     2 Concord City                    19.0         36790      0
##  3     3 Grafton & Coos Counties         18.5         30000      0
##  4     4 Greater Nashua City             18.9         40800      1
##  5     5 Hillsborough County (Western)   19.2         42900      1
##  6     6 Lakes Region                    18.6         33050      0
##  7     7 Manchester City                 18.2         32000      1
##  8     8 Outer Manchester City           19.1         44700      1
##  9     9 Portsmouth City                 19.3         45000      1
## 10    10 Strafford Region                19.0         36200      0
The dataset was computed from a sample of 67,248 New Hampshire residents aged 25 to 65. The sample data come from the U.S. Census Bureau's 2012-2016 ACS PUMS data.
For Grafton & Coos Counties, the average years of schooling is 18.5, the median income is 30000, and region is 0 because the area is not located in the southeastern region.
Hint: Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.
## [1] 0.8622811
The correlation coefficient between ed_avg and income_median is 0.8622811, which indicates a strong positive relationship: areas with higher average years of education tend to have higher median income. This shows an association only, not causation.
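As a rough check on the magnitude, the Pearson correlation can be written out and squared. The symbols below are assumptions introduced for illustration: x_i stands for ed_avg and y_i for income_median in area i. In a regression with a single predictor, the squared correlation equals the Multiple R-squared, which matches the 0.7435 reported in the lm() summary for income_median ~ ed_avg.

```latex
\[
r = \frac{\sum_{i=1}^{10} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{10} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{10} (y_i - \bar{y})^2}}
  = 0.8622811,
\qquad
r^2 = 0.8622811^2 \approx 0.7435
\]
```

Read this way, about 74 percent of the variation in median income across the ten areas is associated with variation in average education.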
##
## Call:
## lm(formula = income_median ~ ed_avg, data = residents_25to65)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3643.9 -2548.6   655.8  1730.7  4150.6
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -201503      49675  -4.056  0.00365 **
## ed_avg         12695       2636   4.816  0.00133 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2891 on 8 degrees of freedom
## Multiple R-squared: 0.7435, Adjusted R-squared: 0.7115
## F-statistic: 23.19 on 1 and 8 DF, p-value: 0.001328
Hint: Discuss your answer in terms of the number of stars in the summary result. Refer to the interpretation section in quiz4_a.
The coefficient of ed_avg is marked with two stars (p = 0.00133), which means it is statistically significant at the 1 percent level. It is therefore also statistically significant at the 5 percent level.
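The stars can be traced back to the t value in the summary, which is the estimate divided by its standard error. The sketch below redoes that arithmetic from the printed (rounded) numbers. The critical values are the standard two-sided Student t values for 8 degrees of freedom and are not part of the R output.

```latex
\[
t = \frac{\hat{\beta}_{\text{ed\_avg}}}{\operatorname{SE}(\hat{\beta}_{\text{ed\_avg}})}
  = \frac{12695}{2636} \approx 4.816
\]
\[
|t| \approx 4.816 > t_{0.995,\,8} \approx 3.355 > t_{0.975,\,8} \approx 2.306
\]
```

Because the t value exceeds the 1 percent critical value, it also exceeds the 5 percent one, which agrees with the two-star marking.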
##
## Call:
## lm(formula = income_median ~ ed_avg + region, data = residents_25to65)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2016.2  -778.4  -373.5   353.4  2780.7
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -166192      30700  -5.413 0.000994 ***
## ed_avg         10701       1638   6.532 0.000324 ***
## region          4524       1136   3.981 0.005314 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1711 on 7 degrees of freedom
## Multiple R-squared: 0.9214, Adjusted R-squared: 0.899
## F-statistic: 41.05 on 2 and 7 DF, p-value: 0.0001359
Hint: Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.
Model 2 has a lower residual standard error (1711 versus 2891) and a higher adjusted R-squared (0.899 versus 0.7115) than model 1, so model 2 fits the data better and is the preferable model.
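The adjusted R-squared values can be approximately reproduced from the Multiple R-squared values in the two summaries. This is a sketch using the usual adjustment formula, where n = 10 is the number of areas and p is the number of predictors (notation assumed here, not taken from the output). Small differences in the last digit come from using the rounded R-squared values.

```latex
\[
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\]
\[
\text{Model 1: } 1 - (1 - 0.7435)\,\frac{9}{8} \approx 0.711,
\qquad
\text{Model 2: } 1 - (1 - 0.9214)\,\frac{9}{7} \approx 0.899
\]
```

The adjustment penalizes the extra predictor in model 2, so its higher adjusted R-squared suggests that region adds explanatory value beyond the added parameter.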
Hint: Note that the second model has two predictors. Use both predictors to compute the predicted income.
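Based on the model 2 output, predicted income_median = -166192 + 10701 × ed_avg + 4524 × region. Substituting the given ed_avg and region values into this equation gives the predicted median income.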
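As an illustration of the calculation the hint describes, the model 2 coefficients can be applied to two rows of the table. These are not the values asked for in the question. Grafton & Coos Counties (ed_avg 18.5, region 0) and Portsmouth City (ed_avg 19.3, region 1) are chosen here only as examples, and because the table shows rounded ed_avg values the results are approximate.

```latex
\[
\widehat{\text{income}} = -166192 + 10701 \cdot \text{ed\_avg} + 4524 \cdot \text{region}
\]
\[
\begin{aligned}
\text{Grafton \& Coos: } & -166192 + 10701(18.5) + 4524(0) = 31776.5 \\
\text{Portsmouth City: } & -166192 + 10701(19.3) + 4524(1) = 44861.3
\end{aligned}
\]
```

The same substitution works for any pair of ed_avg and region values.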
Hint: Discuss your answer based on the coefficient of region. You may refer to the interpretation section in quiz4_a.
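The coefficient of region is 4524 and is statistically significant at the 1 percent level (two stars, p = 0.005314). Holding ed_avg constant, areas in the southeastern region (region = 1) have a predicted median income about 4524 higher than areas outside it (region = 0). This is an association, not a causal effect.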
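The meaning of the region coefficient can be seen by comparing the model 2 predictions for two areas with the same average education e (a symbol assumed here for illustration), one in the southeastern region and one outside it. The intercept and the ed_avg term cancel, leaving only the region coefficient.

```latex
\[
\begin{aligned}
\widehat{\text{income}}(e,\ \text{region}=1) - \widehat{\text{income}}(e,\ \text{region}=0)
  &= (-166192 + 10701\,e + 4524) - (-166192 + 10701\,e) \\
  &= 4524
\end{aligned}
\]
```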