```r
# Load packages
library(tidyverse)
library(openintro)
countyComplete <- as_tibble(countyComplete)
```
Each observation in the data is a U.S. county. One example is Autauga County, Alabama. In 2010 it had a population of 54,571, 21.6% of its residents held a bachelor's degree, and its population density was 91.8 people per square mile. Its per-capita income was $24,568.
```r
countyComplete %>%
  ggplot(aes(bachelors, per_capita_income)) +
  geom_point()
```
Hint: Make sure to interpret both the direction and the magnitude of the relationship. Keep in mind that correlation (or regression) coefficients show only association, not causation.
```r
cor(countyComplete$bachelors, countyComplete$per_capita_income, use = "pairwise.complete.obs")
## [1] 0.7924464
```
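There is a strong, positive correlation (r ≈ 0.79) between per-capita income and the percentage of residents who hold a bachelor's degree: counties with a larger share of degree holders tend to have higher per-capita incomes. Because a correlation shows association rather than causation, these data alone cannot say why. Earning a bachelor's degree might raise earnings, people with higher incomes might tend to move to areas with more degree holders, or a third factor might drive both.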
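For reference, the Pearson correlation that `cor()` reports is the sum of the products of the centered values divided by the square root of the product of their sums of squares. The minimal sketch below checks that definition against `cor()`. It assumes the built-in `mtcars` data as a stand-in for `countyComplete` (so it runs without openintro), and the missing value it inserts is hypothetical, added only to show what `use = "pairwise.complete.obs"` does.

```r
# Pearson correlation by hand, compared with cor(); mtcars stands in for countyComplete
x <- mtcars$wt
y <- mtcars$mpg
x[3] <- NA  # hypothetical missing value, to mimic NAs in the county data

# use = "pairwise.complete.obs" drops any pair where either value is NA
r_cor <- cor(x, y, use = "pairwise.complete.obs")

# The same quantity from its definition, computed on the complete pairs only
ok <- complete.cases(x, y)
xc <- x[ok] - mean(x[ok])
yc <- y[ok] - mean(y[ok])
r_manual <- sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))

r_cor
r_manual
```

With only two variables, pairwise and complete-case deletion drop the same rows, so both approaches give the same value.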
```r
mod_1 <- lm(per_capita_income ~ bachelors, data = countyComplete)
summary(mod_1)
##
## Call:
## lm(formula = per_capita_income ~ bachelors, data = countyComplete)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -18032.7  -1708.2     73.8   1748.0  21756.5
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13087.680    142.091   92.11   <2e-16 ***
## bachelors     494.753      6.795   72.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3299 on 3141 degrees of freedom
## Multiple R-squared:  0.628,  Adjusted R-squared:  0.6279
## F-statistic:  5302 on 1 and 3141 DF,  p-value: < 2.2e-16
```
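The p-value for the bachelors coefficient is below 2e-16 (t = 72.81), far below any conventional significance level such as 0.05, so the coefficient is statistically significant: the data give very strong evidence that the slope is not zero. The estimate means that each additional percentage point of residents holding a bachelor's degree is associated with about $494.75 more in per-capita income, on average.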
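The t value in the output is simply the estimate divided by its standard error, and the p-value is the two-sided tail probability of that t value under a t distribution with the residual degrees of freedom. The short sketch below reproduces both from the printed numbers (copied from the `summary(mod_1)` output above, so they carry its rounding).

```r
# Values copied from summary(mod_1) above
estimate <- 494.753
std_error <- 6.795
df_resid <- 3141

# t statistic for H0: slope = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

round(t_value, 2)  # compare with the printed t value of 72.81
p_value            # far below 2e-16, which is why summary() prints <2e-16
```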
For a county where 70% of residents hold a bachelor's degree, the predicted per-capita income is 13,087.68 + 494.753 × 70 = 47,720.39, or about $47,720.39.
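The prediction is just the fitted line evaluated at `bachelors = 70`. A minimal sketch using the printed coefficients is below; `predict_income` is a hypothetical helper name introduced only for illustration. With the fitted model in memory, `predict()` with a `newdata` data frame gives the same result without rounding the coefficients.

```r
# Coefficients copied from summary(mod_1) above
b0 <- 13087.680  # intercept
b1 <- 494.753    # slope for bachelors (per percentage point)

# Hypothetical helper: evaluate the fitted line at a given percentage
predict_income <- function(bachelors_pct) b0 + b1 * bachelors_pct

predict_income(70)

# Equivalent call when the fitted model object is available:
# predict(mod_1, newdata = data.frame(bachelors = 70))
```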
The residual standard error is $3,299, so the model's predictions of a county's per-capita income typically miss the actual value by roughly $3,299.
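The residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom (n minus the number of estimated coefficients). The sketch below checks that definition against `summary()`, assuming the built-in `mtcars` data as a stand-in because it runs without openintro; applied to `mod_1`, the same calculation would give the 3299 shown above.

```r
# Residual standard error from its definition, checked against summary()
fit <- lm(mpg ~ wt, data = mtcars)

sse <- sum(residuals(fit)^2)         # residual sum of squares
rse <- sqrt(sse / df.residual(fit))  # df = n - 2 for one predictor

rse
summary(fit)$sigma  # the value printed as "Residual standard error"
```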
The multiple R-squared of 0.628 (adjusted R-squared 0.6279) indicates that the percentage of a county's residents holding a bachelor's degree accounts for about 62.8% of the variability in per-capita income across counties.
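With a single predictor, R-squared is the square of the correlation computed earlier, and adjusted R-squared applies a small penalty for the number of predictors. The sketch below recomputes both from printed values; it assumes n = 3143, inferred from the 3141 residual degrees of freedom plus the two estimated coefficients.

```r
# Values copied from the output above
r <- 0.7924464  # correlation between bachelors and per_capita_income
n <- 3141 + 2   # residual df + 2 estimated coefficients (assumed from the output)
p <- 1          # number of predictors

r_squared <- r^2  # in simple linear regression, R-squared equals r squared
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

round(r_squared, 3)      # compare with Multiple R-squared: 0.628
round(adj_r_squared, 4)  # compare with Adjusted R-squared: 0.6279
```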
```r
# Store the two-predictor model separately so mod_1 is not overwritten
mod_2 <- lm(per_capita_income ~ bachelors + density, data = countyComplete)
summary(mod_2)
##
## Call:
## lm(formula = per_capita_income ~ bachelors + density, data = countyComplete)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -18257.9  -1707.7     71.5   1749.6  22101.1
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.319e+04  1.433e+02  92.042  < 2e-16 ***
## bachelors   4.872e+02  6.963e+00  69.973  < 2e-16 ***
## density     1.623e-01  3.499e-02   4.639 3.65e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3289 on 3140 degrees of freedom
## Multiple R-squared:  0.6305,  Adjusted R-squared:  0.6303
## F-statistic:  2679 on 2 and 3140 DF,  p-value: < 2.2e-16
```
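Model 2 (bachelors + density) fits the data slightly better than Model 1: its residual standard error is lower ($3,289 vs. $3,299) and its adjusted R-squared is higher (0.6303 vs. 0.6279). Adjusted R-squared is the fairer comparison, because multiple R-squared can never decrease when a predictor is added. The density coefficient is statistically significant (p = 3.65e-06), but the gain in fit is small, so adding density improves the predictions only marginally.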
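Another way to compare nested models like these is a partial F-test with `anova()`, which asks whether the added predictor reduces the residual sum of squares by more than chance would. The sketch below assumes the built-in `mtcars` data as a stand-in so it runs on its own (`hp` plays the role of the added predictor). When the models differ by a single predictor, the F statistic equals that predictor's squared t value, so for the county models it would be about 4.639² ≈ 21.5.

```r
# Nested-model comparison, with mtcars standing in for countyComplete
small <- lm(mpg ~ wt, data = mtcars)       # analogous to the bachelors-only model
large <- lm(mpg ~ wt + hp, data = mtcars)  # analogous to bachelors + density

# Adjusted R-squared penalizes the extra predictor; multiple R-squared does not
c(small = summary(small)$adj.r.squared, large = summary(large)$adj.r.squared)

# Partial F-test for the added predictor
comparison <- anova(small, large)
comparison

# With one added predictor, F equals the squared t value of that predictor
f_stat <- comparison$F[2]
t_hp <- summary(large)$coefficients["hp", "t value"]
c(F = f_stat, t_squared = t_hp^2)
```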