## # A tibble: 10 x 5
##        X PUMA_label                    ed_avg income_median region
##    <int> <fct>                          <dbl>         <int>  <int>
##  1     1 Cheshire & Sullivan Counties    18.6         35390      0
##  2     2 Concord City                    19.0         36790      0
##  3     3 Grafton & Coos Counties         18.5         30000      0
##  4     4 Greater Nashua City             18.9         40800      1
##  5     5 Hillsborough County (Western)   19.2         42900      1
##  6     6 Lakes Region                    18.6         33050      0
##  7     7 Manchester City                 18.2         32000      1
##  8     8 Outer Manchester City           19.1         44700      1
##  9     9 Portsmouth City                 19.3         45000      1
## 10    10 Strafford Region                19.0         36200      0

The dataset was computed from a sample of 67,248 New Hampshire residents aged 25 to 65. The sample was obtained from the U.S. Census Bureau's 2012-2016 ACS PUMS data.

Q1. Describe the Grafton and Coos Counties using ALL variables in the data set.

For Grafton and Coos Counties (X = 3), the average years of schooling is 18.5, the median income is 30,000, and region is 0 because the area is not located in the southeastern region.

Q2. Create a scatterplot to examine the relationship between ed_avg and income_median.
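
The code for this plot is not shown in the rendered page. A minimal sketch of one way to draw it in base R follows, assuming the ten rows are re-entered by hand from the printed tibble (where ed_avg is rounded) and reusing the data frame name `residents_25to65` from the regression output. The axis labels and title are illustrative choices.

```r
# Values re-entered by hand from the printed tibble (assumption: ed_avg is
# rounded to three significant digits there, so results are approximate).
residents_25to65 <- data.frame(
  ed_avg        = c(18.6, 19.0, 18.5, 18.9, 19.2, 18.6, 18.2, 19.1, 19.3, 19.0),
  income_median = c(35390, 36790, 30000, 40800, 42900,
                    33050, 32000, 44700, 45000, 36200)
)

# Scatterplot with the predictor on the x-axis and the outcome on the y-axis.
plot(income_median ~ ed_avg, data = residents_25to65,
     xlab = "ed_avg", ylab = "income_median",
     main = "Median income vs. average education")
```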

Q3. Compute the correlation coefficient between the two variables and interpret it.

Hint: Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.

## [1] 0.8622811

The correlation coefficient is 0.8622811, meaning that the two variables have a strong positive relationship: as the average years of education increases, median income tends to increase. This shows an association, not causation.
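
A minimal sketch of how such a coefficient could be computed with `cor()`, assuming the values are re-entered by hand from the printed tibble. Because ed_avg is rounded in that printout, the result will be close to, but not exactly, the 0.8622811 reported above.

```r
# Values re-entered by hand from the printed tibble (assumption: rounded ed_avg).
residents_25to65 <- data.frame(
  ed_avg        = c(18.6, 19.0, 18.5, 18.9, 19.2, 18.6, 18.2, 19.1, 19.3, 19.0),
  income_median = c(35390, 36790, 30000, 40800, 42900,
                    33050, 32000, 44700, 45000, 36200)
)

# Pearson correlation (the default method of cor()).
r <- cor(residents_25to65$ed_avg, residents_25to65$income_median)
print(r)
```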

Q4. Build a regression model to predict income_median using ed_avg, save the regression result in mod_1, and show the summary result.

## 
## Call:
## lm(formula = income_median ~ ed_avg, data = residents_25to65)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3643.9 -2548.6   655.8  1730.7  4150.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -201503      49675  -4.056  0.00365 **
## ed_avg         12695       2636   4.816  0.00133 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2891 on 8 degrees of freedom
## Multiple R-squared:  0.7435, Adjusted R-squared:  0.7115 
## F-statistic: 23.19 on 1 and 8 DF,  p-value: 0.001328
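
The call that produced this summary is visible in the output, but the surrounding code is hidden. A minimal sketch of the same kind of fit follows, assuming the data are re-entered by hand from the printed tibble. Since ed_avg is rounded there, the estimates will differ somewhat from those shown above.

```r
# Values re-entered by hand from the printed tibble (assumption: rounded ed_avg).
residents_25to65 <- data.frame(
  ed_avg        = c(18.6, 19.0, 18.5, 18.9, 19.2, 18.6, 18.2, 19.1, 19.3, 19.0),
  income_median = c(35390, 36790, 30000, 40800, 42900,
                    33050, 32000, 44700, 45000, 36200)
)

# Simple linear regression of median income on average education.
mod_1 <- lm(income_median ~ ed_avg, data = residents_25to65)
print(summary(mod_1))
```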

Q5. Is the coefficient of ed_avg statistically significant at 5%? How do you know?

Hint: Discuss your answer in terms of the number of stars in the summary result. Refer to the interpretation section in quiz4_a.

Yes. The coefficient of ed_avg has two stars (p = 0.00133), so it is statistically significant at the 1 percent level and therefore also at the 5 percent level.

Q6. Further develop the regression model above by adding another variable, region, save the regression result in mod_2, and show the summary result.

## 
## Call:
## lm(formula = income_median ~ ed_avg + region, data = residents_25to65)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2016.2  -778.4  -373.5   353.4  2780.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -166192      30700  -5.413 0.000994 ***
## ed_avg         10701       1638   6.532 0.000324 ***
## region          4524       1136   3.981 0.005314 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1711 on 7 degrees of freedom
## Multiple R-squared:  0.9214, Adjusted R-squared:  0.899 
## F-statistic: 41.05 on 2 and 7 DF,  p-value: 0.0001359

Q7. Compare mod_1 and mod_2. Which of the two models better fits the data?

Hint: Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.

Model 2 has a lower residual standard error (1711 versus 2891) and a higher adjusted R-squared (0.899 versus 0.7115) than model 1, so model 2 fits the data better.
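
The two fit statistics can also be pulled out of the model summaries side by side. The sketch below assumes the data are re-entered by hand from the printed tibble (rounded ed_avg), so the numbers will be near, not equal to, those in the summaries above. `fit_stats` is a hypothetical helper name introduced only for this illustration.

```r
# Values re-entered by hand from the printed tibble (assumption: rounded ed_avg).
residents_25to65 <- data.frame(
  ed_avg        = c(18.6, 19.0, 18.5, 18.9, 19.2, 18.6, 18.2, 19.1, 19.3, 19.0),
  income_median = c(35390, 36790, 30000, 40800, 42900,
                    33050, 32000, 44700, 45000, 36200),
  region        = c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0)
)

mod_1 <- lm(income_median ~ ed_avg, data = residents_25to65)
mod_2 <- lm(income_median ~ ed_avg + region, data = residents_25to65)

# Hypothetical helper: residual standard error and adjusted R-squared of a fit.
fit_stats <- function(mod) {
  s <- summary(mod)
  c(rse = s$sigma, adj_r2 = s$adj.r.squared)
}

comparison <- rbind(mod_1 = fit_stats(mod_1), mod_2 = fit_stats(mod_2))
print(comparison)
```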

Q8. How much median income does the second model predict for the Grafton and Coos Counties?

Hint: Note that the second model has two predictors. Use both predictors to compute the predicted income.

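Using the rounded coefficients shown in the mod_2 summary, with ed_avg = 18.5 and region = 0 for Grafton and Coos Counties: -166192 + 10701 × 18.5 + 4524 × 0 = 31776.5. The second model therefore predicts a median income of roughly 31,777.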
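
The arithmetic for this prediction can be sketched in R by plugging the Grafton and Coos values (ed_avg = 18.5, region = 0) into the rounded coefficients printed in the mod_2 summary. The variable names below are illustrative, and the result is approximate because both the coefficients and ed_avg are rounded in the printed output.

```r
# Rounded coefficients copied from the printed mod_2 summary.
b <- c(intercept = -166192, ed_avg = 10701, region = 4524)

# Predictor values for Grafton & Coos Counties from the printed tibble.
grafton_coos <- c(ed_avg = 18.5, region = 0)

# Predicted value = intercept + slope * ed_avg + region effect * region.
predicted_income <- b[["intercept"]] +
  b[["ed_avg"]] * grafton_coos[["ed_avg"]] +
  b[["region"]] * grafton_coos[["region"]]
print(predicted_income)  # 31776.5
```

With the fitted model object available, `predict(mod_2, newdata = data.frame(ed_avg = 18.5, region = 0))` would do the same calculation with unrounded coefficients.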

Q9. According to the result of the second regression model, are residents of southeastern regions of the State likely to make more income? Why or why not?

Hint: Discuss your answer based on the coefficient of region. You may refer to the interpretation section in quiz4_a.

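Yes. The coefficient of region is positive (4524) and statistically significant at the 1 percent level (two stars, p = 0.005314). Holding ed_avg constant, the model estimates that median income in the southeastern region (region = 1) is about 4,524 higher than in the rest of the state.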
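
One way to inspect the region coefficient directly is to pull its row out of the coefficient table. The sketch below assumes the data are re-entered by hand from the printed tibble (rounded ed_avg), so the estimate and p-value will be close to, but not identical with, the printed mod_2 summary.

```r
# Values re-entered by hand from the printed tibble (assumption: rounded ed_avg).
residents_25to65 <- data.frame(
  ed_avg        = c(18.6, 19.0, 18.5, 18.9, 19.2, 18.6, 18.2, 19.1, 19.3, 19.0),
  income_median = c(35390, 36790, 30000, 40800, 42900,
                    33050, 32000, 44700, 45000, 36200),
  region        = c(0, 0, 0, 1, 1, 0, 1, 1, 1, 0)
)

mod_2 <- lm(income_median ~ ed_avg + region, data = residents_25to65)

# Row of the coefficient table for region: estimate, std. error, t, p-value.
region_row <- coef(summary(mod_2))["region", ]
print(region_row)
```

A positive estimate with a small p-value indicates higher predicted median income for region = 1 at the same ed_avg.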

Q10.a. Hide the code but display the results of the code on the webpage.

Q10.b. Display the title and your name correctly at the top of the webpage.

Q10.c. Use the correct slug.
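
For Q10.a, one common approach in R Markdown is to turn off code echoing for every chunk through knitr's chunk options, typically in a setup chunk near the top of the document. This is a sketch of that approach and not necessarily how this page was built. Setting `echo=FALSE` on individual chunks is an alternative.

```r
# Hide source code in the rendered page while still showing its results.
# Applies to all chunks evaluated after this call.
knitr::opts_chunk$set(echo = FALSE)
```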