library(car)
## Loading required package: carData
cor(Duncan$prestige, Duncan$income)
## [1] 0.8378014
cor(Duncan$prestige, Duncan$education)
## [1] 0.8519156
cor(Duncan$income, Duncan$education)
## [1] 0.7245124
Correlations (Pearson's r):
Prestige & Income = 0.8378
Prestige & Education = 0.8519
Income & Education = 0.7245
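The three pairwise correlations can also be obtained in a single call by passing the three columns to `cor()`. This is only a sketch of an alternative; the names `vars` and `cor_matrix` are illustrative and do not appear in the original analysis.

```r
# Load the same Duncan data frame that library(car) makes available via carData.
data("Duncan", package = "carData")

# Illustrative names: `vars` and `cor_matrix` are assumptions, not from the original.
vars <- c("prestige", "income", "education")
cor_matrix <- cor(Duncan[, vars])

# Rounded for readability; the off-diagonal entries are the pairwise correlations.
round(cor_matrix, 3)
```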
prestige.predictor <- lm(prestige ~ income, data = Duncan)
summary(prestige.predictor)
##
## Call:
## lm(formula = prestige ~ income, data = Duncan)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -46.566  -9.421   0.257   9.167  61.855
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   2.4566     5.1901   0.473    0.638
## income        1.0804     0.1074  10.062 7.14e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.4 on 43 degrees of freedom
## Multiple R-squared: 0.7019, Adjusted R-squared: 0.695
## F-statistic: 101.3 on 1 and 43 DF, p-value: 7.144e-13
Percent of variance explained: R² = 0.7019. About 70% of the variance in prestige is explained by income.
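In a simple regression with a single predictor, R² equals the squared Pearson correlation between the predictor and the response, so the value can be checked against the prestige and income correlation reported earlier:

\[ R^2 = r^2 = (0.8378014)^2 \approx 0.7019 \]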
The estimated regression equation is as follows: \[ \hat{y}_i = 2.456 + 1.08 x_{i} \]
Predicted prestige for an income of 70:
2.456 + (1.08*70)
## [1] 78.056
Predicted prestige for an income of 90:
2.456 + (1.08*90)
## [1] 99.656
The prediction for a value of 90 is inappropriate because 90 lies outside the range of income values observed in the data set. This is called extrapolation. Predictions should be made only within the range of the observed predictor values.
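The same predictions, together with a check for extrapolation, can be sketched with `predict()` and `range()`. The object names `fit`, `new_income`, `pred`, and `in_range` are illustrative assumptions. Because `predict()` uses the unrounded coefficients, its results may differ slightly from the hand-computed values.

```r
# Load the Duncan data and refit the same simple regression.
data("Duncan", package = "carData")
fit <- lm(prestige ~ income, data = Duncan)

# Illustrative names (not from the original): values of income to predict for.
new_income <- data.frame(income = c(70, 90))
pred <- predict(fit, newdata = new_income)

# A value is an extrapolation when it falls outside the observed income range.
income_range <- range(Duncan$income)
in_range <- new_income$income >= income_range[1] &
  new_income$income <= income_range[2]

data.frame(income = new_income$income,
           predicted_prestige = pred,
           in_range = in_range)
```

Rows with `in_range` equal to `FALSE` are extrapolations and should be interpreted with caution.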
y <- Duncan$prestige
x <- Duncan$income
y_diff <- y-mean(y)
x_diff <-x-mean(x)
b_1 <- sum(y_diff*x_diff)/sum((x_diff)^2)
b_1
## [1] 1.08039
b_0 <-mean(y) - mean(x)*b_1
b_0
## [1] 2.456574
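The code above follows the usual least-squares formulas for a simple regression, written here with x as income and y as prestige:

\[ b_1 = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} \]

The resulting values, 1.08039 and 2.456574, agree with the `income` and `(Intercept)` estimates reported by `lm()`.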
plot(Duncan$income, Duncan$prestige, pch = 16, xlab = "Income", ylab = "Prestige")
abline(a = b_0, b = b_1, col = "blue")