# Load packages
library(dplyr)
library(ggplot2)
library(openintro)

# Load the cars data (the 1993 cars data from openintro, not base R's cars data set)
data(cars)
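
Before plotting, it can help to confirm that the variables used below (price and weight) are actually in the data. A minimal check using glimpse() from dplyr:

# Quick look at the variables in the cars data
glimpse(cars)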

Chapter 3: Simple linear regression

3.1 The “best fit” line

The simple linear regression model can be visualized as a straight line, a “best fit” line that cuts through the data so as to minimize the sum of the squared vertical distances (residuals) between the line and the data points. The line can be added to a scatterplot with the geom_smooth() function.

# Scatterplot with regression line (price on the y-axis, since price is the response)
ggplot(data = cars, aes(x = weight, y = price)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) # method = "lm" fits a linear model; se = FALSE hides the standard error band

Chapter 4: Interpreting regression models

4.1 Fitting simple linear models

Create a linear model using lm(). This function returns a model object of class “lm”. The object contains a lot of information about your regression model (a short sketch of how to extract these pieces follows the output below), including:

  • the data used to fit the model,
  • the specification of the model (the formula),
  • the fitted values and residuals,
  • the estimated coefficients.
# Linear model for price as a function of weight
lm(price ~ weight, data = cars)
## 
## Call:
## lm(formula = price ~ weight, data = cars)
## 
## Coefficients:
## (Intercept)       weight  
##   -20.29521      0.01326
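
These pieces can be pulled out of a stored model object with the usual accessor functions. A minimal sketch (the name mod is just a convenient choice here, matching the object created in Chapter 5):

# Store the model object so its components can be inspected
mod <- lm(price ~ weight, data = cars)

coef(mod)            # estimated coefficients (intercept and slope)
formula(mod)         # the model specification
head(fitted(mod))    # first few fitted values
head(residuals(mod)) # first few residuals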

Interpretation

  • coefficient (slope): each additional pound of weight is associated with an expected increase of about 0.01326 in price. Since price is recorded in thousands of dollars, that is roughly $13.26 per pound.
  • intercept: the model predicts a price of -20.29521 (thousand dollars) for a car that weighs nothing; no car weighs zero pounds, so the intercept has no practical interpretation here.

Chapter 5: Model Fit

5.2 Standard error of residuals (Residual Standard Error)

Show that the mean of the residuals is zero (not exactly zero, because of floating-point rounding), and calculate the residual standard error.

# Create a linear model
mod <- lm(price ~ weight, data = cars)

# View summary of model
summary(mod)
## 
## Call:
## lm(formula = price ~ weight, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.767  -3.766  -1.155   2.568  35.440 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.295205   4.915159  -4.129 0.000132 ***
## weight        0.013264   0.001582   8.383 3.17e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5666 
## F-statistic: 70.28 on 1 and 52 DF,  p-value: 3.173e-11
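
Both quantities mentioned above can also be computed directly from the residuals. A minimal sketch, reusing mod:

# Mean of the residuals: essentially zero, up to floating-point error
mean(residuals(mod))

# Residual standard error computed by hand:
# square root of the sum of squared residuals divided by the residual degrees of freedom
sqrt(sum(residuals(mod)^2) / df.residual(mod))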

Interpretation

  • The slope coefficient for weight is statistically significant: its p-value (3.17e-11) is far below 0.001, the level marked by three stars.
  • The intercept is also statistically significant: its p-value (0.000132) is below 0.001.
  • The predicted price of a car that weighs 3000 pounds is about -20.295 + 0.01326 × 3000 ≈ 19.5 thousand dollars, roughly $20,000 (see the predict() sketch below).
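
The 3000-pound prediction can be checked with predict(), again reusing mod:

# Predicted price (in thousands of dollars) for a car weighing 3000 pounds
predict(mod, newdata = data.frame(weight = 3000))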

5.4 Interpretation of residual standard error and Adjusted R-squared

  • Residual standard error: 7.575 on 52 degrees of freedom means that the typical deviation of an observed price from the fitted line is about 7.575 (thousand dollars), based on 52 residual degrees of freedom.
  • The adjusted R-squared is 0.5666, meaning that about 56.7% of the variability in price is explained by weight, after adjusting for the number of predictors (both values can be read from the summary object, as sketched below).
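
A minimal sketch of pulling these two values straight from the summary object:

# Residual standard error and adjusted R-squared from the summary object
summary(mod)$sigma          # residual standard error
summary(mod)$adj.r.squared  # adjusted R-squared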

A couple of sentences on a second variable

  • The higher a car's city mpg, the lower its price typically is.
  • In a simple regression of price on city mpg, the slope of about -1.106 means each additional mile per gallon is associated with a decrease of about 1.106 thousand dollars in price; the intercept of 45.784 is the predicted price at 0 mpg, which is not meaningful (a sketch of this second model follows).
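
A sketch of the second model those numbers appear to come from, assuming the city-mpg variable in the openintro cars data is named mpgCity:

# Simple regression of price on city mpg (variable name mpgCity is assumed here)
mod_mpg <- lm(price ~ mpgCity, data = cars)
coef(mod_mpg)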