The given dataset was computed from a sample of 67,248 New Hampshire residents at the age of 25-65. The sample data was obtained from the U.S. Census, 2012-2016 ACS PUMS DATA.
In the data set Grafton & Coos Counties have an ed_avg of 18.52291 and an income median of 30,000 and the region is 0 which means it is not in southeastern region.
Hint: Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.
## [1] 0.8622811
There is a positive cooralation between varibales median income and education average on the scatterplot.
##
## Call:
## lm(formula = income_median ~ ed_avg, data = residents_25to65)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3643.9 -2548.6 655.8 1730.7 4150.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -201503 49675 -4.056 0.00365 **
## ed_avg 12695 2636 4.816 0.00133 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2891 on 8 degrees of freedom
## Multiple R-squared: 0.7435, Adjusted R-squared: 0.7115
## F-statistic: 23.19 on 1 and 8 DF, p-value: 0.001328
Hint: Discuss your answer in terms of the number of stars in the summary result. Refer to the interpretation section in quiz4_a.
** The end of the intercept line shows that the coefficiant is significant at .1% level. It shows we are 99% confident the intercept is true.
##
## Call:
## lm(formula = income_median ~ ed_avg + region, data = residents_25to65)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2016.2 -778.4 -373.5 353.4 2780.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -166192 30700 -5.413 0.000994 ***
## ed_avg 10701 1638 6.532 0.000324 ***
## region 4524 1136 3.981 0.005314 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1711 on 7 degrees of freedom
## Multiple R-squared: 0.9214, Adjusted R-squared: 0.899
## F-statistic: 41.05 on 2 and 7 DF, p-value: 0.0001359
Hint: Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.
It looks like model 2 is a better mod because the residual standard error is lower. Also r-squared value is slightly higher, indicates stronger predictor.
Hint: Note that the second model has two predictors. Use both predictors to compute the predicted income.
income=y coefficiant + intercept 166192+1070150=701242+166192=867434 166192+4524 50=392392+166192=558584
Hint: Discuss your answer based on the coefficient of region. You may refer to the interpretation section in quiz4_a.
According to the second regression model, residents of southeastern regions will make less income because the intercept is 166192+4524 *50=392392+166192=558584