The given dataset was computed from a sample of 67,248 New Hampshire residents aged 25 to 65. The sample data was obtained from the U.S. Census, 2012-2016 ACS PUMS data.
Answer: Grafton and Coos Counties are in row 3. PUMA_label is the column that lists the counties/cities. ed_avg is the average years of schooling of New Hampshire residents in 2012-2016. It is stored as a decimal number, and the ed_avg for Grafton and Coos Counties is 18.52291 (shown rounded to 18.5 in the table). income_median is the median income of New Hampshire residents in 2012-2016, where income is the total amount including wages, salaries, self-employment income, dividends, and rent income. income_median is 30,000 for this area, expressed in dollars. region indicates the area within New Hampshire: southeastern regions take 1, and all others take 0. Grafton and Coos Counties have a region value of 0, meaning they are not in the southeastern region of the state.
## # A tibble: 10 x 5
##        X PUMA_label                    ed_avg income_median region
##    <int> <fct>                          <dbl>         <int>  <int>
##  1     1 Cheshire & Sullivan Counties    18.6         35390      0
##  2     2 Concord City                    19.0         36790      0
##  3     3 Grafton & Coos Counties         18.5         30000      0
##  4     4 Greater Nashua City             18.9         40800      1
##  5     5 Hillsborough County (Western)   19.2         42900      1
##  6     6 Lakes Region                    18.6         33050      0
##  7     7 Manchester City                 18.2         32000      1
##  8     8 Outer Manchester City           19.1         44700      1
##  9     9 Portsmouth City                 19.3         45000      1
## 10    10 Strafford Region                19.0         36200      0
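As a rough sketch of how the row lookup described in the answer could be done in R, the table can be rebuilt by hand and filtered by label. The name `residents_tbl` is an assumption made for this illustration, and the values are the rounded ones printed in the tibble, not the full-precision data.

```r
# Values copied from the printed tibble; ed_avg is rounded to one decimal
# there, so it differs slightly from the full-precision data.
# `residents_tbl` is a hypothetical name used only for this sketch.
residents_tbl <- data.frame(
  X = 1:10,
  PUMA_label = c(
    "Cheshire & Sullivan Counties", "Concord City",
    "Grafton & Coos Counties", "Greater Nashua City",
    "Hillsborough County (Western)", "Lakes Region",
    "Manchester City", "Outer Manchester City",
    "Portsmouth City", "Strafford Region"
  ),
  ed_avg = c(18.6, 19.0, 18.5, 18.9, 19.2, 18.6, 18.2, 19.1, 19.3, 19.0),
  income_median = c(35390, 36790, 30000, 40800, 42900,
                    33050, 32000, 44700, 45000, 36200),
  region = c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0)
)

# Pull the single row for Grafton & Coos Counties.
grafton_coos <- residents_tbl[
  residents_tbl$PUMA_label == "Grafton & Coos Counties", ]
print(grafton_coos)
```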
Hint: Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.
## [1] 0.8622811
Answer: The correlation coefficient between ed_avg and income_median, computed from the data, is 0.8622811. By the rule of thumb, this is a strong correlation, as the number is greater than 0.6. It is also a positive correlation, since there is no negative sign in front, so areas with higher ed_avg tend to have higher income_median. This gives us a strong positive correlation, which shows association only and not causation.
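A minimal sketch of how the direction and magnitude could be read off in R, assuming the rounded values from the printed table. Because ed_avg is rounded to one decimal there, the result should land close to, but not exactly on, 0.8622811. The 0.6 cutoff is the rule of thumb quoted in the answer.

```r
# Rounded values taken from the printed tibble (an approximation of the
# full-precision data used for the reported correlation).
ed_avg <- c(18.6, 19.0, 18.5, 18.9, 19.2, 18.6, 18.2, 19.1, 19.3, 19.0)
income_median <- c(35390, 36790, 30000, 40800, 42900,
                   33050, 32000, 44700, 45000, 36200)

# Pearson correlation between the two variables.
r <- cor(ed_avg, income_median)

# Direction comes from the sign, strength from the magnitude.
direction <- if (r > 0) "positive" else "negative"
strength <- if (abs(r) > 0.6) "strong" else "not strong"

cat("r =", round(r, 3), "->", strength, direction, "correlation\n")
```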
##
## Call:
## lm(formula = income_median ~ ed_avg, data = residents_25to65)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3643.9 -2548.6   655.8  1730.7  4150.6
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -201503      49675  -4.056  0.00365 **
## ed_avg         12695       2636   4.816  0.00133 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2891 on 8 degrees of freedom
## Multiple R-squared: 0.7435, Adjusted R-squared: 0.7115
## F-statistic: 23.19 on 1 and 8 DF, p-value: 0.001328
Hint: Discuss your answer in terms of the number of stars in the summary result. Refer to the interpretation section in quiz4_a.
Answer: The coefficient on ed_avg has two stars (**), so it is statistically significant at the 1% level, and therefore also at the 5% level. We can see this by looking at the coefficients table and reading the far right of the ed_avg row, where the stars sit next to the p-value of 0.00133. The code at the bottom of the output shows that two stars mean the p-value is between 0.001 and 0.01. In terms of confidence, that is more than 99% confident, but short of the 99.9% that three stars would indicate.
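To make the star legend concrete, here is a small sketch that maps a p-value onto the `Signif. codes` cutpoints printed in the summary. The helper name `p_to_stars` is hypothetical and is not part of base R.

```r
# Hypothetical helper: map a p-value to the star code from the legend
# "0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1".
p_to_stars <- function(p) {
  as.character(cut(
    p,
    breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
    labels = c("***", "**", "*", ".", " "),
    include.lowest = TRUE
  ))
}

# p-value of ed_avg in the first model, read from the summary output.
p_ed_avg <- 0.00133
stars <- p_to_stars(p_ed_avg)
cat("ed_avg:", stars, "\n")
```

Two stars correspond to a p-value above 0.001 and at most 0.01, which is why the coefficient counts as significant at the 1% level.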
##
## Call:
## lm(formula = income_median ~ ed_avg + region, data = residents_25to65)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2016.2  -778.4  -373.5   353.4  2780.7
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -166192      30700  -5.413 0.000994 ***
## ed_avg         10701       1638   6.532 0.000324 ***
## region          4524       1136   3.981 0.005314 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1711 on 7 degrees of freedom
## Multiple R-squared: 0.9214, Adjusted R-squared: 0.899
## F-statistic: 41.05 on 2 and 7 DF, p-value: 0.0001359
Hint: Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.
Answer: Mod_2 fits the data better. The residual standard error for mod_1 is 2891 on 8 degrees of freedom, while the residual standard error for mod_2 is 1711 on 7 degrees of freedom. A lower residual standard error means the predictions sit closer to the observed values, so this is a drastic improvement over what we had previously. The significance of ed_avg goes from two stars (p = 0.00133, significant at the 1% level) to three stars (p = 0.000324, significant at the 0.1% level), so we go from being confident to being very confident. The adjusted R squared is 0.7115 for mod_1, but in mod_2 it has increased to 0.899. Because adjusted R squared penalizes extra predictors, this tells us that the new term, region, improves the model more than would be expected by chance.
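As a check on the comparison, the adjusted R squared values can be reproduced from the multiple R squared figures in the two summaries. This sketch assumes n = 10 observations (the ten rows of the table, consistent with 8 and 7 residual degrees of freedom); the helper name `adj_r2` is made up for the illustration, and small differences come from the rounded inputs.

```r
# Hypothetical helper: adjusted R-squared from R-squared, sample size n,
# and number of predictors k (intercept not counted).
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# Multiple R-squared values read from the two summary outputs.
adj_mod_1 <- adj_r2(0.7435, n = 10, k = 1)  # income_median ~ ed_avg
adj_mod_2 <- adj_r2(0.9214, n = 10, k = 2)  # income_median ~ ed_avg + region

cat("mod_1 adjusted R-squared:", round(adj_mod_1, 4), "\n")
cat("mod_2 adjusted R-squared:", round(adj_mod_2, 4), "\n")
```

The penalty factor (n - 1) / (n - k - 1) grows with each added predictor, so adjusted R squared only rises when the new predictor explains enough extra variation to offset it.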
Hint: Note that the second model has two predictors. Use both predictors to compute the predicted income.
Answer: The second model predicts that the income will be 38,729, a jump from 30,000 as displayed in the first model.
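A sketch of how a prediction from the second model uses both predictors. The coefficients are the rounded estimates from the summary output. The inputs behind the quiz question are not shown here, so the example plugs in the Grafton & Coos values from the first answer (ed_avg = 18.52291, region = 0) purely as an illustration; the function name `predict_income` is an assumption.

```r
# Rounded coefficient estimates from the mod_2 summary output.
b_intercept <- -166192
b_ed_avg <- 10701
b_region <- 4524

# Hypothetical helper: predicted median income from both predictors.
predict_income <- function(ed_avg, region) {
  b_intercept + b_ed_avg * ed_avg + b_region * region
}

# Illustrative inputs only: Grafton & Coos Counties (not in the southeast).
pred_grafton_coos <- predict_income(ed_avg = 18.52291, region = 0)
cat("Predicted income_median:", round(pred_grafton_coos, 2), "\n")
```

With the fitted model object available, `predict(mod_2, newdata = data.frame(ed_avg = ..., region = ...))` would do the same calculation using the unrounded coefficients, assuming the model was stored under the name `mod_2`.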
Hint: Discuss your answer based on the coefficient of region. You may refer to the interpretation section in quiz4_a.
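Answer: According to the second model, areas in the southeastern region are predicted to have a higher median income than other regions. The coefficient of region is 4524, so for the same ed_avg a southeastern area (region = 1) is predicted to have a median income about 4,524 dollars higher than an area elsewhere in the state (region = 0). The coefficient has two stars (p = 0.005314), so it is statistically significant at the 1% level. When comparing both models, model two also has a less negative intercept than model one (-166192 versus -201503). This shows an association between region and income, not causation.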
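The meaning of the region coefficient can be sketched by predicting income for the same ed_avg with region set to 1 and then to 0. The ed_avg levels below are arbitrary illustrative values within the range of the table, and the coefficients are the rounded estimates from the second model's summary.

```r
# Rounded coefficient estimates from the mod_2 summary output.
b_intercept <- -166192
b_ed_avg <- 10701
b_region <- 4524

# Arbitrary ed_avg levels chosen for illustration.
ed_levels <- c(18.5, 19.0)

# Predicted income in and outside the southeastern region.
income_southeast <- b_intercept + b_ed_avg * ed_levels + b_region * 1
income_elsewhere <- b_intercept + b_ed_avg * ed_levels + b_region * 0

# The gap is the same at every ed_avg level: the region coefficient.
gap <- income_southeast - income_elsewhere
print(gap)
```

Because the model has no interaction term, the predicted gap does not depend on ed_avg.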