The given dataset was computed from a sample of 67,248 New Hampshire residents aged 25 to 65. The sample data comes from the U.S. Census Bureau's 2012-2016 American Community Survey (ACS) Public Use Microdata Sample (PUMS).
Grafton and Coos counties are listed as observation number three in the data set. The average years of schooling (ed_avg) in these counties is about 18.52. The median income for these counties over 2012-2016 is $30,000. Grafton and Coos counties are not located in the southeastern region of the state, as shown by the value 0 in the region column of the data set.
Hint: Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.
## [1] 0.8622811
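The correlation between ed_avg and income_median is 0.8622811. The sign is positive, so areas with a higher average level of schooling tend to have a higher median income. This is a strong relationship because the absolute value is greater than 0.60. Because correlation only measures association, this does not show that more schooling causes higher income.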
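As a quick check of direction and magnitude, the short R sketch below uses only the numbers reported above. It assumes the 0.60 cutoff used in this report as the threshold for "strong", and it relies on the fact that, in a regression with a single predictor, Multiple R-squared equals the squared correlation.

```r
# Correlation reported above between ed_avg and income_median
r <- 0.8622811

# Direction: the sign of r
direction <- if (r > 0) "positive" else if (r < 0) "negative" else "none"

# Magnitude: the rule of thumb used in this report, |r| > 0.60 counts as strong
is_strong <- abs(r) > 0.60

# With one predictor, Multiple R-squared of the regression equals r squared
r_squared <- r^2

direction            # "positive"
is_strong            # TRUE
round(r_squared, 4)  # 0.7435
```

The squared value, about 0.7435, matches the Multiple R-squared of mod_1 below, which supports reading 0.8622811 as the correlation between ed_avg and income_median.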
##
## Call:
## lm(formula = income_median ~ ed_avg, data = residents_25to65)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3643.9 -2548.6 655.8 1730.7 4150.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -201503 49675 -4.056 0.00365 **
## ed_avg 12695 2636 4.816 0.00133 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2891 on 8 degrees of freedom
## Multiple R-squared: 0.7435, Adjusted R-squared: 0.7115
## F-statistic: 23.19 on 1 and 8 DF, p-value: 0.001328
Hint: Discuss your answer in terms of the number of stars in the summary result. Refer to the interpretation section in quiz4_a.
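The coefficient of ed_avg is statistically significant at the 1% level (and therefore also at the 5% level) because it has two stars, meaning its p-value (0.00133) is below 0.01. In other words, we can be 99% confident that the true coefficient of ed_avg is not zero.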
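The stars are only a lookup on the p-value. The sketch below is a minimal helper (the name sig_stars is hypothetical) that reproduces the cutpoints from the "Signif. codes" legend printed by summary(). R's own printCoefmat() does this internally with symnum(), so this is an illustration of the rule rather than R's code.

```r
# Map p-values to R's significance codes using the legend's cutpoints:
# 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Intervals are right-closed, e.g. 0.001 < p <= 0.01 gives "**".
sig_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   include.lowest = TRUE))
}

# p-values of the intercept and ed_avg in mod_1
sig_stars(c(0.00365, 0.00133))  # "**" "**"
```

Two stars means the p-value falls between 0.001 and 0.01, so the coefficient passes the 1% threshold but not the 0.1% threshold.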
##
## Call:
## lm(formula = income_median ~ ed_avg + region, data = residents_25to65)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2016.2 -778.4 -373.5 353.4 2780.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -166192 30700 -5.413 0.000994 ***
## ed_avg 10701 1638 6.532 0.000324 ***
## region 4524 1136 3.981 0.005314 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1711 on 7 degrees of freedom
## Multiple R-squared: 0.9214, Adjusted R-squared: 0.899
## F-statistic: 41.05 on 2 and 7 DF, p-value: 0.0001359
Hint: Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.
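The residual standard error for mod_1 is 2891 and its adjusted R-squared is 0.7115. The residual standard error for mod_2 is 1711 and its adjusted R-squared is 0.899. This means that mod_2 fits the data better on both measures: its residual standard error is lower, so its predicted median incomes are typically closer to the observed values, and its adjusted R-squared is higher, so it explains about 90% of the variation in median income compared with about 71% for mod_1. Because adjusted R-squared penalizes additional predictors, this improvement is not just a result of adding region to the model. The stars on ed_avg (two in mod_1, three in mod_2) describe that coefficient's significance, not the overall fit of the model.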
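Adjusted R-squared can be recovered from Multiple R-squared with the standard formula adj = 1 - (1 - R2)(n - 1)/(n - k - 1), where k is the number of predictors. The R sketch below assumes n = 10, which is inferred from the output (8 residual degrees of freedom plus 2 coefficients in mod_1, or 7 plus 3 in mod_2), and uses the rounded values printed above.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors k
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# n = 10 is inferred from the residual degrees of freedom in the output
n <- 10

fit <- data.frame(
  model = c("mod_1", "mod_2"),
  rse   = c(2891, 1711),      # residual standard error (dollars)
  r2    = c(0.7435, 0.9214),  # Multiple R-squared
  k     = c(1, 2)             # number of predictors
)
fit$adj_r2 <- round(adj_r2(fit$r2, n, fit$k), 3)
fit

# Lower residual standard error and higher adjusted R-squared both point to mod_2
fit$model[which.min(fit$rse)]
fit$model[which.max(fit$adj_r2)]
```

The recomputed values (0.711 and 0.899) agree with the printed adjusted R-squared up to rounding.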
Hint: Note that the second model has two predictors. Use both predictors to compute the predicted income.
Predictors: ed_avg and region.
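Predicted median income = -166,192 + 10,701(ed_avg) + 4,524(region)
For Grafton and Coos counties, ed_avg = 18.5229 and region = 0, so the region term drops out:
Predicted median income = -166,192 + 10,701(18.5229) + 4,524(0) ≈ $32,022
The predicted median income for Grafton and Coos counties is about $32,022, roughly $2,022 more than their actual median income of $30,000.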
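The same arithmetic as a small R sketch, using the mod_2 coefficients exactly as printed in the summary. Those coefficients are rounded, so the result can differ slightly from R's own fitted value. The function name predict_income is hypothetical.

```r
# mod_2 coefficients as printed in the summary (rounded to whole dollars)
b0       <- -166192  # intercept
b_ed     <- 10701    # ed_avg
b_region <- 4524     # region (1 = southeastern, 0 = otherwise)

# Both predictors enter the same equation
predict_income <- function(ed_avg, region) {
  b0 + b_ed * ed_avg + b_region * region
}

# Grafton and Coos counties: ed_avg from the data set, region = 0
grafton_coos <- predict_income(ed_avg = 18.5229099678457, region = 0)
round(grafton_coos, 2)  # 32021.66

# Actual minus predicted median income (about -2021.66)
30000 - grafton_coos
```

If mod_2 is still available in the R session, predict(mod_2, newdata = data.frame(ed_avg = 18.5229099678457, region = 0)) should return nearly the same value, computed from the unrounded coefficients.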
Hint: Discuss your answer based on the coefficient of region. You may refer to the interpretation section in quiz4_a.
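Residents of the southeastern region of the state are likely to earn more income, because the coefficient (Estimate) of the variable "region" in mod_2 is 4524. Holding ed_avg constant, areas in the southeastern region (region = 1) have a median income about $4,524 higher than areas elsewhere in the state (region = 0), and this difference is statistically significant at the 1% level (two stars, p = 0.005314). If areas outside the southeast earned more income, the estimate for "region" would have been negative; an estimate of 0 would mean no difference between the regions.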
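Because mod_2 is additive, the region coefficient acts as a constant shift: for any fixed ed_avg, switching region from 0 to 1 raises the predicted median income by the same amount. The sketch below checks this at a few ed_avg values chosen for illustration (17 and 20 are hypothetical), again using the rounded coefficients from the summary.

```r
# mod_2 coefficients as printed in the summary (rounded)
b0 <- -166192; b_ed <- 10701; b_region <- 4524

# Hypothetical helper: predicted median income from both predictors
predict_income <- function(ed_avg, region) b0 + b_ed * ed_avg + b_region * region

# Same schooling level, southeastern (1) vs. not southeastern (0)
ed_values <- c(17, 18.5229099678457, 20)
gap <- predict_income(ed_values, region = 1) - predict_income(ed_values, region = 0)
gap  # 4524 4524 4524
```

The gap equals the region coefficient at every ed_avg value, which is why the coefficient is read as the difference between regions holding schooling constant.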