The given dataset was computed from a sample of 67,248 New Hampshire residents aged 25 to 65. The sample data were obtained from the U.S. Census Bureau's 2012-2016 ACS PUMS data.
This observation is for Grafton and Coos counties. ed_avg = 18.52291 is the average years of schooling of the residents in 2012-2016. income_median = 30000 is the median income of the residents in 2012-2016; it represents total income, including wages and salaries, self-employment income, and interest, dividends, and rental income. region = 0 indicates the place within New Hampshire: region takes 1 for southeastern regions and 0 otherwise.
Hint: Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.
## [1] 0.8622811
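There is a strong, positive correlation (r ≈ 0.86) between ed_avg and income_median: areas whose residents have more years of schooling on average tend to have a higher median income. This is an association only; it does not show that more education causes higher income.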
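As a quick check on the magnitude, with a single predictor the squared correlation gives the share of the variance in income_median that is linearly associated with ed_avg. A minimal sketch in R, using only the correlation printed above (the names `r` and `r_squared` are illustrative, not from the quiz):

```r
# Correlation between ed_avg and income_median, copied from the output above
r <- 0.8622811

# With one predictor, r^2 is the proportion of variance in income_median
# that is linearly associated with ed_avg
r_squared <- r^2

print(round(r_squared, 4))
```

The result, about 0.74, should agree with the Multiple R-squared reported for the one-predictor regression.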
##
## Call:
## lm(formula = income_median ~ ed_avg, data = residents_25to65)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3643.9 -2548.6   655.8  1730.7  4150.6
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -201503      49675  -4.056  0.00365 **
## ed_avg         12695       2636   4.816  0.00133 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2891 on 8 degrees of freedom
## Multiple R-squared: 0.7435, Adjusted R-squared: 0.7115
## F-statistic: 23.19 on 1 and 8 DF, p-value: 0.001328
Hint: Discuss your answer in terms of the number of stars in the summary result. Refer to the interpretation section in quiz4_a.
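The ** at the end of the (Intercept) line indicates that the intercept is statistically significant at the 1% significance level, because its p-value (0.00365) lies between 0.001 and 0.01. It is not significant at the 0.1% level, which would be marked with ***. Being significant at the 1% level means it is also significant at the 5% and 10% levels. In other words, we reject the null hypothesis that the intercept equals zero at the 1% level; it does not mean we are 99.9% confident that the estimate is the true value.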
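The link between the t value, the p-value, and the stars can be sketched in R as follows. The t value and degrees of freedom are copied from the first model's output; `stars_for` is a hypothetical helper written for this illustration, not part of `summary()`.

```r
# t value and residual degrees of freedom for (Intercept), from the first model
t_value <- -4.056
df_resid <- 8

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Hypothetical helper: map a p-value to the significance codes
# listed in the "Signif. codes" legend
stars_for <- function(p) {
  cutpoints <- c(0.001, 0.01, 0.05, 0.1)
  symbols <- c("***", "**", "*", ".", " ")
  symbols[findInterval(p, cutpoints, left.open = TRUE) + 1]
}

print(round(p_value, 5))
print(stars_for(p_value))
```

Because the p-value falls between 0.001 and 0.01, the helper returns two stars, matching the printed output.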
##
## Call:
## lm(formula = income_median ~ ed_avg + region, data = residents_25to65)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2016.2  -778.4  -373.5   353.4  2780.7
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -166192      30700  -5.413 0.000994 ***
## ed_avg         10701       1638   6.532 0.000324 ***
## region          4524       1136   3.981 0.005314 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1711 on 7 degrees of freedom
## Multiple R-squared: 0.9214, Adjusted R-squared: 0.899
## F-statistic: 41.05 on 2 and 7 DF, p-value: 0.0001359
Hint: Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.
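Model 2 fits the data better: its residual standard error is considerably lower (1711 versus 2891) and its adjusted R-squared is higher (0.899 versus 0.7115). Because adjusted R-squared penalizes the extra predictor, the increase indicates that adding region genuinely improves the fit.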
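To see how adjusted R-squared accounts for the extra predictor, the standard formula can be applied to the Multiple R-squared values from the two summaries. This is a sketch: `adj_r_squared` is an illustrative helper, and the sample size of 10 is inferred from the residual degrees of freedom (8 plus 2 estimated coefficients in the first model) rather than stated in the quiz.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors k
adj_r_squared <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# Inferred, not stated: 8 residual df + 2 coefficients in the first model
n <- 10

adj_model1 <- adj_r_squared(0.7435, n, k = 1)  # income_median ~ ed_avg
adj_model2 <- adj_r_squared(0.9214, n, k = 2)  # income_median ~ ed_avg + region

print(round(c(model1 = adj_model1, model2 = adj_model2), 3))
```

Up to rounding of the inputs, the two values reproduce the adjusted R-squared figures in the summaries.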
Hint: Note that the second model has two predictors. Use both predictors to compute the predicted income.
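predicted income_median = intercept + (coefficient of ed_avg × ed_avg) + (coefficient of region × region) = -166192 + 10701 × ed_avg + 4524 × region
With ed_avg = 50: -166192 + 10701 × 50 + 4524 × 0 = 368858 for region = 0, and -166192 + 10701 × 50 + 4524 × 1 = 373382 for region = 1.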
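A prediction from the second model uses both predictors in one equation, with the negative intercept added once. The sketch below hard-codes the rounded coefficients from the summary and applies them to the Grafton and Coos observation described at the top (ed_avg = 18.52291, region = 0); `predict_income` is an illustrative name, and the choice of this observation as the example is an assumption.

```r
# Rounded coefficients copied from the summary of
# lm(income_median ~ ed_avg + region)
predict_income <- function(ed_avg, region) {
  -166192 + 10701 * ed_avg + 4524 * region
}

# Grafton and Coos counties: ed_avg = 18.52291, region = 0
grafton_coos <- predict_income(ed_avg = 18.52291, region = 0)

print(round(grafton_coos))
```

The prediction is roughly 32,000, compared with the observed income_median of 30000 for that observation. With the fitted model object available, `predict()` with a `newdata` data frame would give the same kind of result without rounded coefficients.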
Hint: Discuss your answer based on the coefficient of region. You may refer to the interpretation section in quiz4_a.
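Holding ed_avg constant, residents of the southeastern region (region = 1) are predicted to have a median income about 4,524 higher than residents of the rest of New Hampshire (region = 0), not lower. The ** on the region line shows that this difference is significant at the 1% level (p = 0.005314). The intercept, -166192, is the predicted median income when both ed_avg and region equal 0; it is not added to the region coefficient to interpret the regional difference.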
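The sign and precision of the region coefficient can also be checked with a 95% confidence interval. This sketch uses only the estimate, standard error, and residual degrees of freedom printed in the second summary; the variable names are illustrative.

```r
# Estimate and standard error of the region coefficient, from the second model
estimate <- 4524
std_error <- 1136
df_resid <- 7

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Confidence interval: estimate plus or minus t_crit * standard error
ci <- estimate + c(-1, 1) * t_crit * std_error

print(round(ci))
```

The interval lies entirely above zero, which is consistent with a positive association between the southeastern region and median income once ed_avg is held constant.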