# Load packages
library(dplyr)
library(ggplot2)
library(openintro)

Chapter 3: Simple linear regression

3.1 The “best fit” line

The simple linear regression model can be visualized as a straight line, a “best fit” line that cuts through the data so as to minimize the sum of squared vertical distances (residuals) between the line and the data points. In ggplot2, geom_smooth() with method = "lm" adds this line to a scatterplot.

# Scatterplot with regression line
ggplot(data = cars, aes(x = weight, y = price)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors
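
As a minimal sketch of what “best fit” means, the least-squares slope and intercept can be computed by hand and compared with lm(). The built-in mtcars data set (mpg as a function of wt) is used here only as a stand-in so the example runs without extra packages; it is an assumption, not the data used in this chapter. The helper sse() is also hypothetical, introduced just for illustration.

```r
# Stand-in data: built-in mtcars (an assumption, not this chapter's data)
x <- mtcars$wt   # explanatory variable
y <- mtcars$mpg  # response variable

# Least-squares estimates for simple linear regression
slope <- cor(x, y) * sd(y) / sd(x)
intercept <- mean(y) - slope * mean(x)

# The same estimates from lm()
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical helper: sum of squared residuals for an arbitrary line
sse <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

print(c(intercept = intercept, slope = slope))
print(coef(fit))

# Nudging the slope away from the least-squares value increases the SSE
print(sse(intercept, slope) < sse(intercept, slope + 0.1))
```

The hand-computed values should agree with coef(fit) up to floating-point error, which is the sense in which the line drawn by geom_smooth(method = "lm") is the “best fit”.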

Chapter 4: Interpreting regression models

4.1 Fitting simple linear models

Create a linear model using lm(). This function returns a model object of class “lm”. This object contains lots of information about your regression model, including:

  • the data used to fit the model,
  • the specification of the model,
  • the fitted values,
  • the residuals.
# Linear model for price as a function of weight
lm(price ~ weight, data = cars)
## 
## Call:
## lm(formula = price ~ weight, data = cars)
## 
## Coefficients:
## (Intercept)       weight  
##   -20.29521      0.01326
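
The pieces of information stored in an “lm” object can be pulled out with standard accessor functions. The sketch below again uses the built-in mtcars data as a stand-in (an assumption, so that it runs without the chapter's data); the same calls should apply to any model returned by lm().

```r
# Stand-in model on built-in data (an assumption, not this chapter's data)
fit <- lm(mpg ~ wt, data = mtcars)

class(fit)              # class of the model object
formula(fit)            # specification of the model
coef(fit)               # estimated intercept and slope
head(fitted(fit))       # first few fitted values
head(residuals(fit))    # first few residuals
head(model.frame(fit))  # data used to fit the model

# Fitted values plus residuals reproduce the observed response
all.equal(unname(fitted(fit) + residuals(fit)), mtcars$mpg)
```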

Interpretation

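  • coefficient: A one-pound increase in the weight of a car is associated with an increase of about $13.26 in expected price (0.01326 in the model's units, which this interpretation takes to be thousands of dollars).
  • intercept: When a car's weight is 0, the predicted price is about -$20,295. No car weighs 0 pounds, so the intercept has no meaningful interpretation in this case.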
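
The dollar figures in this interpretation follow from a unit conversion. The short sketch below copies the two coefficients from the printed output and assumes that price is recorded in thousands of US dollars; that unit is an assumption implied by the interpretation, not something the output itself states. The variable names are illustrative only.

```r
# Coefficients copied from the printed lm() output
b0 <- -20.29521  # intercept
b1 <- 0.01326    # slope for weight

# Assumption: price is recorded in thousands of US dollars
dollars_per_pound <- b1 * 1000   # change in price per extra pound
intercept_dollars <- b0 * 1000   # predicted price at weight 0

print(dollars_per_pound)
print(intercept_dollars)
```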
# Create a linear model
mod <- lm(price ~ weight, data = cars)

# View summary of model
summary(mod)
## 
## Call:
## lm(formula = price ~ weight, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.767  -3.766  -1.155   2.568  35.440 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.295205   4.915159  -4.129 0.000132 ***
## weight        0.013264   0.001582   8.383 3.17e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5666 
## F-statistic: 70.28 on 1 and 52 DF,  p-value: 3.173e-11

Interpretation

  • Yes, the weight coefficient is statistically significant at the 5% level: its p-value (3.17e-11) is far below 0.05, which the output marks with three stars.
  • Yes, the y-intercept is also statistically significant at the 5% level (p-value 0.000132).
  • The predicted price of a car that weighs 3,000 pounds is about $19,500 (-20.295 + 0.013264 × 3000 ≈ 19.497).
  • The residual standard error is 7.575. This means that the model's estimated price typically misses the actual price by about $7,575.
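
The numbers behind these statements can be checked with a few lines of arithmetic. This is a sketch that copies the estimates from the summary() output rather than refitting the model, and it assumes price is in thousands of US dollars; the variable names are illustrative.

```r
# Estimates copied from the summary() output
b0 <- -20.295205       # intercept
b1 <- 0.013264         # slope for weight
rse <- 7.575           # residual standard error
p_slope <- 3.17e-11    # p-value for weight
p_intercept <- 0.000132  # p-value for the intercept

# Predicted price (in model units) for a 3,000-pound car
pred <- b0 + b1 * 3000
print(pred)

# Assumption: price is in thousands of US dollars
print(pred * 1000)  # predicted price in dollars
print(rse * 1000)   # typical prediction error in dollars

# Both p-values fall below the 5% level
print(c(slope = p_slope, intercept = p_intercept) < 0.05)
```

With the fitted object available, predict(mod, newdata = data.frame(weight = 3000)) should return the same prediction up to rounding of the copied coefficients.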

5.4 Interpretation of R^2

The R^2 reported for the regression model for price in terms of weight is 0.5747 (the Multiple R-squared in the summary output; 0.5666 is the adjusted R^2). This means that 57.47% of the variability in price can be explained by weight.
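
To make the definition concrete, R^2 can be computed as one minus the ratio of residual to total sum of squares. In simple linear regression it also equals the squared correlation between the two variables. The sketch below uses the built-in mtcars data as a stand-in (an assumption, not this chapter's data) and also reproduces the adjusted R^2 that summary() reports.

```r
# Stand-in model on built-in data (an assumption, not this chapter's data)
fit <- lm(mpg ~ wt, data = mtcars)
y <- mtcars$mpg

# R^2 = 1 - SSE / SST
sse <- sum(residuals(fit)^2)   # unexplained variability
sst <- sum((y - mean(y))^2)    # total variability
r2_manual <- 1 - sse / sst

# In simple linear regression, R^2 equals the squared correlation
r2_cor <- cor(mtcars$wt, mtcars$mpg)^2

# Adjusted R^2 penalizes the number of predictors p
n <- nrow(mtcars)
p <- 1
adj_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

s <- summary(fit)
print(c(manual = r2_manual, cor_squared = r2_cor, summary = s$r.squared))
print(c(manual = adj_manual, summary = s$adj.r.squared))
```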

# Scatterplot with regression line
ggplot(data = cars, aes(x = mpgCity, y = price)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors

Interpretation

  • The plot shows a negative association between mpgCity (vehicle mileage in the city, in miles per gallon) and price. Cars with lower mpgCity tend to cost more, and cars with higher mpgCity tend to cost less.