Q1. Describe the Grafton and Coos Counties using ALL variables in the data set.

Grafton & Coos Counties have an average education of 18.5 years, a median income of $30,000, and a region value of 0.

Q2. Create a scatterplot to examine the relationship between ed_avg and income_median.
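
The plot itself is not reproduced here. As a minimal sketch of how such a scatterplot could be drawn in base R, the block below uses a small stand-in data frame called `toy`. Its values are made up for illustration and are not the `residents_25to65` data; only the column names `ed_avg` and `income_median` come from the assignment.

```r
# Hypothetical stand-in for residents_25to65.
# These numbers are invented for illustration only.
toy <- data.frame(
  ed_avg        = c(18.2, 18.5, 18.7, 19.0, 19.3),
  income_median = c(29000, 30000, 34000, 38000, 42000)
)

# Scatterplot with the predictor on the x-axis and the outcome on the y-axis.
plot(income_median ~ ed_avg, data = toy,
     xlab = "Average years of education (ed_avg)",
     ylab = "Median income (income_median)",
     main = "Median income vs. average education")

# Overlay the least-squares line to make the direction of the trend visible.
abline(lm(income_median ~ ed_avg, data = toy), lty = 2)
```

With the real data, replacing `toy` with `residents_25to65` should give the plot the question asks for.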

Q3. Compute the correlation coefficient between the two variables and interpret them.

## [1] 0.8622811

The correlation coefficient (r ≈ 0.86) indicates a strong, positive linear relationship between average years of education and median income. Counties with a higher average education tend to have a higher median income. This describes an association in the data and does not by itself show that more education causes higher income.
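
One way to make "strong" concrete is to square the coefficient. With a single predictor, r squared equals the share of variance in income_median explained by ed_avg, which is the Multiple R-squared reported for the one-predictor model in Q4. A quick check, using only the value printed above:

```r
r <- 0.8622811        # correlation coefficient reported above
r_squared <- r^2      # share of variance explained in a one-predictor model

round(r_squared, 4)   # 0.7435
```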

Q4. Build a regression model to predict income_median using ed_avg, save the regression result in mod_1, and show the summary result.

## 
## Call:
## lm(formula = income_median ~ ed_avg, data = residents_25to65)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3643.9 -2548.6   655.8  1730.7  4150.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -201503      49675  -4.056  0.00365 **
## ed_avg         12695       2636   4.816  0.00133 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2891 on 8 degrees of freedom
## Multiple R-squared:  0.7435, Adjusted R-squared:  0.7115 
## F-statistic: 23.19 on 1 and 8 DF,  p-value: 0.001328

Q5. Is the coefficient of ed_avg statistically significant at 5%? How do you know?

Yes, the coefficient is statistically significant at 5%. Its p-value is 0.00133, which is below 0.05. The two-star code in the summary marks p-values between 0.001 and 0.01, so the coefficient is significant at the 1% level and therefore also at the 5% level.
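
The same conclusion can be reached without the star codes. The sketch below recomputes the t value and the two-sided p-value from the rounded numbers in the summary (estimate, standard error, and 8 residual degrees of freedom), so the results may differ from the printed ones in the last digit.

```r
estimate  <- 12695   # ed_avg coefficient from the mod_1 summary
std_error <- 2636    # its standard error
df_resid  <- 8       # residual degrees of freedom

# t statistic: how many standard errors the estimate is from zero.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

t_value
p_value
p_value < 0.05   # TRUE means significant at the 5% level
```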

Q6. Further develop the regression model above by adding another variable, region, save the regression result in mod_2, and show the summary result.

## 
## Call:
## lm(formula = income_median ~ ed_avg + region, data = residents_25to65)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2016.2  -778.4  -373.5   353.4  2780.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -166192      30700  -5.413 0.000994 ***
## ed_avg         10701       1638   6.532 0.000324 ***
## region          4524       1136   3.981 0.005314 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1711 on 7 degrees of freedom
## Multiple R-squared:  0.9214, Adjusted R-squared:  0.899 
## F-statistic: 41.05 on 2 and 7 DF,  p-value: 0.0001359

Q7. Compare mod_1 and mod_2. Which of the two models better fits the data?

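Model 2 fits the data better. Its adjusted R-squared is higher (0.899 vs. 0.7115), and its residual standard error is smaller (1711 vs. 2891). Adjusted R-squared is the fairer comparison here because it accounts for the extra predictor in Model 2. On this data set, the predictions from Model 2 are therefore closer to the observed median incomes.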
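
As a rough check on this comparison, the sketch below rebuilds the adjusted R-squared values and a partial F-test from the rounded numbers in the two summaries. The sample size of 10 is inferred from the residual degrees of freedom (8 plus 2 estimated coefficients in mod_1), and the helper `adj_r2` is a name introduced here for illustration.

```r
n <- 10  # inferred: 8 residual df + 2 coefficients in mod_1

# Adjusted R-squared for a model with k predictors.
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj1 <- adj_r2(0.7435, n, k = 1)  # mod_1
adj2 <- adj_r2(0.9214, n, k = 2)  # mod_2

# Residual sums of squares recovered from the residual standard errors.
rss1 <- 2891^2 * 8
rss2 <- 1711^2 * 7

# Partial F-test: does adding region reduce the residual error significantly?
f_stat <- (rss1 - rss2) / (rss2 / 7)
p_f    <- pf(f_stat, df1 = 1, df2 = 7, lower.tail = FALSE)

c(adj1 = adj1, adj2 = adj2, f_stat = f_stat, p_f = p_f)
```

With the fitted models available, `anova(mod_1, mod_2)` runs the same F-test on the unrounded values.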

Q8. How much median income does the second model predict for the Grafton and Coos Counties?

y = 10701x + 4524z - 166192

y = 10701(18.5) + 4524(0) - 166192

y = 31776.5

The model predicts a median income of $31,776.50 for Grafton & Coos Counties.
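
The same arithmetic can be written as a small R function. This is a sketch that uses the rounded coefficients printed in the mod_2 summary; `predict_income` is a name introduced here for illustration.

```r
# Rounded coefficients from the mod_2 summary.
b_intercept <- -166192
b_ed_avg    <- 10701
b_region    <- 4524

# Predicted median income for a given average education and region code.
predict_income <- function(ed_avg, region) {
  b_intercept + b_ed_avg * ed_avg + b_region * region
}

predict_income(ed_avg = 18.5, region = 0)   # 31776.5
```

With the fitted model available, `predict(mod_2, newdata = data.frame(ed_avg = 18.5, region = 0))` does the same thing with the unrounded coefficients, so its result may differ slightly.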

Q9. According to the result of the second regression model, are residents of southeastern regions of the State likely to make more income? Why or why not?

Yes. In the second regression model, region takes only two values, 0 and 1, with 1 marking the southeastern region. Its coefficient is positive (4524) and statistically significant (p = 0.0053). Holding average education constant, a value of 1 for region adds $4,524 to the predicted median income, so residents of the southeastern region are likely to make more income.
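
To see how precisely this effect is estimated, a 95% confidence interval for the region coefficient can be sketched from the rounded estimate, standard error, and the 7 residual degrees of freedom in the mod_2 summary.

```r
estimate  <- 4524   # region coefficient from the mod_2 summary
std_error <- 1136   # its standard error
df_resid  <- 7      # residual degrees of freedom

# Critical t value for a two-sided 95% interval.
t_crit <- qt(0.975, df = df_resid)

lower <- estimate - t_crit * std_error
upper <- estimate + t_crit * std_error

c(lower = lower, upper = upper)
lower > 0   # TRUE means the interval excludes zero
```

With the fitted model available, `confint(mod_2)` reports the same interval from the unrounded values.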