data() # list the datasets that ship with base R
# load libraries
library(ggplot2)
For this week's discussion, I selected the base R dataset "mtcars", which contains data from the Motor Trend Car Road Tests.
# load and review the dataset
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
help(mtcars)
## starting httpd help server ... done
# look at the linear relationship between hp (horsepower) and mpg
theme_set(theme_bw())
# hp goes on the x-axis and mpg on the y-axis so the fitted line matches lm(mpg ~ hp)
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Motor Trend Car Road Tests",
       subtitle = "Linear Model",
       x = "Horsepower",
       y = "Miles Per Gallon")
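The scatter plot shows the direction of the relationship but not its strength. As a small sketch (the variable names `correlation` and `r` are illustrative, not part of the original code), the Pearson correlation between the two variables can be computed with `cor.test()`:

```r
# Pearson correlation between mpg and hp (Pearson is the default method)
correlation <- cor.test(mtcars$mpg, mtcars$hp)

# Extract the correlation coefficient as a plain number
r <- unname(correlation$estimate)

print(round(r, 3))
print(correlation$p.value)
```

A coefficient close to -1 would indicate a strong negative linear relationship, which is what the downward trend in the plot suggests.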
lm <- lm(mpg ~ hp, data=mtcars)
lm
##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Coefficients:
## (Intercept)           hp
##    30.09886     -0.06823
summary(lm)
##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.7121 -2.1122 -0.8854  1.5819  8.2360
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
## F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
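The p-value for hp (1.79e-07) is well below 0.05, so we can say that horsepower is a statistically significant predictor of mpg. The slope of -0.06823 means each additional unit of horsepower is associated with a drop of about 0.07 miles per gallon, and the R-squared of 0.6024 means the model explains about 60% of the variation in mpg. For a single predictor, this is a reasonably good model.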
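To make the numbers behind this judgment explicit, the sketch below refits the same model and pulls the R-squared and the slope p-value out of the summary object, then predicts mpg for two made-up horsepower values. The names `fit`, `fit_summary`, and `new_cars` are illustrative and not from the original code; `fit` is used instead of `lm` only to avoid reusing the name of the `lm()` function.

```r
# Refit the same model under an illustrative name
fit <- lm(mpg ~ hp, data = mtcars)
fit_summary <- summary(fit)

# Share of the variance in mpg explained by hp
r_squared <- fit_summary$r.squared

# p-value for the hp slope, taken from the coefficient table
hp_p_value <- fit_summary$coefficients["hp", "Pr(>|t|)"]

# Predicted mpg for hypothetical cars with 100 and 200 horsepower
new_cars <- data.frame(hp = c(100, 200))
predicted_mpg <- predict(fit, newdata = new_cars)

print(round(r_squared, 4))
print(signif(hp_p_value, 3))
print(round(predicted_mpg, 2))
```

Looking at predictions alongside the p-value helps show what the model says in practical terms, not only whether the slope is statistically significant.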
qqnorm(resid(lm))
qqline(resid(lm))
Based on the Q-Q plot, the residuals appear to be approximately normally distributed. Overall, this is a reasonable linear model of mpg as a function of horsepower.
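A Q-Q plot is a visual check, so it can help to back it up with a formal test and a second diagnostic plot. The sketch below is one possible way to do that, assuming the same model refit under the illustrative name `fit`: it runs a Shapiro-Wilk test on the residuals and plots residuals against fitted values.

```r
# Refit the same model under an illustrative name
fit <- lm(mpg ~ hp, data = mtcars)
model_residuals <- resid(fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal,
# so a small p-value is evidence against normality
normality_test <- shapiro.test(model_residuals)
print(normality_test)

# Residuals vs. fitted values: a curved pattern would suggest that
# a straight line does not fully capture the relationship
plot(fitted(fit), model_residuals,
     xlab = "Fitted mpg", ylab = "Residuals",
     main = "Residuals vs. Fitted")
abline(h = 0, lty = 2)
```

If the test or the residual plot disagrees with the Q-Q plot, that would be a reason to revisit the conclusion about normality or to try a transformed or non-linear model.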