library(car)
## Loading required package: carData

Question 2:

cor(Duncan$prestige, Duncan$income)
## [1] 0.8378014
cor(Duncan$prestige, Duncan$education)
## [1] 0.8519156
cor(Duncan$income, Duncan$education)
## [1] 0.7245124

Prestige & Income = .8378014

Prestige & Education = .8519156

Income & Education = .7245124
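The three pairwise correlations can also be obtained in one call by passing the columns to `cor()` as a data frame. This is a minimal sketch; the names `vars` and `cor_matrix` are illustrative and not part of the original answer, and the data are loaded directly from the `carData` package.

```r
# Load the Duncan data without attaching the whole car package
data(Duncan, package = "carData")

# Columns of interest (illustrative name)
vars <- c("prestige", "income", "education")

# Correlation matrix: each off-diagonal cell is one pairwise correlation
cor_matrix <- cor(Duncan[, vars])
round(cor_matrix, 3)
```

The off-diagonal entries should match the three values reported above.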

prestige.predictor <- lm(prestige ~ income, data = Duncan)
summary(prestige.predictor)
## 
## Call:
## lm(formula = prestige ~ income, data = Duncan)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.566  -9.421   0.257   9.167  61.855 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.4566     5.1901   0.473    0.638    
## income        1.0804     0.1074  10.062 7.14e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.4 on 43 degrees of freedom
## Multiple R-squared:  0.7019, Adjusted R-squared:  0.695 
## F-statistic: 101.3 on 1 and 43 DF,  p-value: 7.144e-13

Question 3:

What percent of variance can be explained by your model?

The percent of variance explained is given by R2 = .7019. This means that about 70% of the variance in prestige is explained by income.
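As a check, R2 can be pulled out of the fitted model rather than read off the printed summary. In a simple regression with one predictor it equals the squared correlation between the two variables. The object names `fit`, `r2`, and `r_sq` below are illustrative.

```r
data(Duncan, package = "carData")

# Same model as above, refit under an illustrative name
fit <- lm(prestige ~ income, data = Duncan)

# R-squared stored in the summary object
r2 <- summary(fit)$r.squared

# Squared Pearson correlation between prestige and income
r_sq <- cor(Duncan$prestige, Duncan$income)^2

# TRUE if the two quantities agree up to numerical tolerance
all.equal(r2, r_sq)
```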

The estimated regression equation is as follows: \[ \hat{y}_i = 2.456 + 1.08 x_{i} \]

Interpret the regression slope. What is the predicted prestige when income = 70 vs. 90?

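Each one-unit increase in income is associated with a predicted increase of about 1.08 points in prestige.

Income = 70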

2.456 + (1.08*70)
## [1] 78.056

Income = 90

2.456 + (1.08*90)
## [1] 99.656
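The same predictions can be obtained from the fitted model with `predict()`, which uses the unrounded coefficients, so the results may differ slightly in the second decimal place from the hand calculations. A minimal sketch, with `fit`, `new_income`, and `pred` as illustrative names:

```r
data(Duncan, package = "carData")

# Same model as above, refit under an illustrative name
fit <- lm(prestige ~ income, data = Duncan)

# New data must use the same column name as the predictor in the formula
new_income <- data.frame(income = c(70, 90))

# Predicted prestige at income = 70 and income = 90
pred <- predict(fit, newdata = new_income)
pred
```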

Which prediction would be inappropriate and why?

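The prediction at income = 90 would be inappropriate because it falls outside the range of income values observed in the data set. This is called extrapolation. Predictions should be made only within the observed range of the predictor, because the data give no evidence that the linear relationship holds beyond it.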
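Whether a value lies inside the observed range of income can be checked directly with `range()`. This is a small sketch; `income_range`, `new_values`, and `inside` are illustrative names.

```r
data(Duncan, package = "carData")

# Smallest and largest income values in the data
income_range <- range(Duncan$income)

# Values at which predictions were requested
new_values <- c(70, 90)

# TRUE where the value lies within the observed range
inside <- new_values >= income_range[1] & new_values <= income_range[2]
data.frame(income = new_values, inside_observed_range = inside)
```

A value flagged `FALSE` is an extrapolation and should be treated with caution.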

y <- Duncan$prestige
x <- Duncan$income
y_diff <- y-mean(y)
x_diff <-x-mean(x)
b_1 <- sum(y_diff*x_diff)/sum((x_diff)^2)
b_1
## [1] 1.08039
b_0 <-mean(y) - mean(x)*b_1
b_0
## [1] 2.456574
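# Predictor (income) on the x-axis and response (prestige) on the y-axis,
# so the axes match the labels and the fitted line drawn by abline()
plot(Duncan$income, Duncan$prestige, pch = 16, xlab = "Income", ylab = "Prestige")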
abline(a = b_0, b = b_1, col = "blue")
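The hand-computed slope and intercept can be compared with the coefficients from `lm()`. The sketch below uses the equivalent form slope = cov(x, y) / var(x), which is the same ratio as the sum-of-products formula above because the (n - 1) factors cancel. The name `fit` is illustrative.

```r
data(Duncan, package = "carData")

x <- Duncan$income
y <- Duncan$prestige

# Least-squares slope and intercept computed by hand
b_1 <- cov(x, y) / var(x)
b_0 <- mean(y) - b_1 * mean(x)

# Coefficients from lm(), returned in the order (intercept, slope)
fit <- lm(prestige ~ income, data = Duncan)

# TRUE if the manual estimates agree with lm() up to numerical tolerance
all.equal(unname(coef(fit)), c(b_0, b_1))
```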