# Load packages
library(dplyr)
library(ggplot2)
library(openintro)

Chapter 3: Simple linear regression

3.1 The “best fit” line

The simple linear regression model can be visualized as a straight line, a “best fit” line that cuts through the data in a way that minimizes the sum of squared vertical distances between the line and the data points. In ggplot2, this line can be drawn with the geom_smooth() function and method = "lm".

# Scatterplot with regression line
ggplot(data = cars, aes(x = weight, y = price)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors
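
To make the “best fit” idea concrete, here is a minimal sketch that computes the least-squares slope and intercept from their closed-form expressions and compares them with lm(). The data frame toy is simulated and purely hypothetical (it is not the openintro cars data); only the column names weight and price mirror the plot above.

```r
# Simulated stand-in data (hypothetical; not the openintro cars data)
set.seed(1)
toy <- data.frame(weight = runif(50, min = 1700, max = 4100))
toy$price <- -10 + 0.01 * toy$weight + rnorm(50, sd = 5)

# Closed-form least-squares estimates for price as a function of weight
slope <- cov(toy$weight, toy$price) / var(toy$weight)
intercept <- mean(toy$price) - slope * mean(toy$weight)

# lm() fits the same line, so its coefficients are expected to agree
# with the hand-computed values up to floating point tolerance
fit <- lm(price ~ weight, data = toy)
all.equal(unname(coef(fit)), c(intercept, slope))
```

The line drawn by geom_smooth(method = "lm") is the line with these two coefficients.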

Chapter 4: Interpreting regression models

4.1 Fitting simple linear models

Create a linear model using lm(). This function returns a model object of class “lm”. This object contains lots of information about your regression model, including:

  • the data used to fit the model,
  • the specification of the model,
  • the fitted values and residuals.

# Linear model for weight as a function of price
lm(weight ~ price, data = cars)
## 
## Call:
## lm(formula = weight ~ price, data = cars)
## 
## Coefficients:
## (Intercept)        price  
##     2171.11        43.33
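
The pieces of information listed above can be pulled out of the model object with standard accessor functions. The sketch below uses a simulated data frame (toy and mod_toy are hypothetical names, not part of the original analysis) so that it runs without the openintro package.

```r
# Simulated stand-in data (hypothetical; not the openintro cars data)
set.seed(2)
toy <- data.frame(price = runif(40, min = 8, max = 60))
toy$weight <- 2200 + 40 * toy$price + rnorm(40, sd = 400)

mod_toy <- lm(weight ~ price, data = toy)

class(mod_toy)            # class of the model object
formula(mod_toy)          # the specification of the model
head(mod_toy$model)       # the data used to fit the model
coef(mod_toy)             # estimated intercept and slope
head(fitted(mod_toy))     # fitted values
head(residuals(mod_toy))  # residuals (observed minus fitted)
```

Fitted values plus residuals add back up to the observed response, which is a quick sanity check on any fitted model.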

Interpretation

Is the coefficient statistically significant at 5%?

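Yes, the coefficient is statistically significant at 5%. The printed lm() output shows only the estimates, but summary() of the same model reports a p-value of 3.17e-11 for price, which is far below 0.05. Three * are not required: a single * already marks a p-value below 0.05.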

Is the y-intercept statistically significant at 5%?

Yes, the y-intercept is statistically significant at 5%. summary() of the same model reports a p-value below 2e-16 for the intercept, which is far below 0.05.
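
Rather than counting stars, the p-values can be compared with 0.05 directly. This is a minimal sketch on simulated data (toy and mod_toy are hypothetical names); the same calls would apply to the cars model.

```r
# Simulated stand-in data (hypothetical; not the openintro cars data)
set.seed(3)
toy <- data.frame(price = runif(40, min = 8, max = 60))
toy$weight <- 2200 + 40 * toy$price + rnorm(40, sd = 400)

mod_toy <- lm(weight ~ price, data = toy)

# coef(summary(...)) returns the coefficient table as a matrix
coefs <- coef(summary(mod_toy))
p_values <- coefs[, "Pr(>|t|)"]

# TRUE for each term whose p-value is below the 5% level
significant <- p_values < 0.05
significant
```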

Chapter 5: Model Fit

5.2 Standard error of residuals (Residual Standard Error)

Show that the mean of residuals is zero (not exactly zero due to rounding error). Calculate residual standard error.

# Create a linear model
mod <- lm(weight ~ price, data = cars)

# View summary of model
summary(mod)
## 
## Call:
## lm(formula = weight ~ price, data = cars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1328.29  -228.09    10.92   258.19   924.27 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2171.113    118.956  18.251  < 2e-16 ***
## price         43.331      5.169   8.383 3.17e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 433 on 52 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5666 
## F-statistic: 70.28 on 1 and 52 DF,  p-value: 3.173e-11
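
The two tasks in the exercise statement can be checked directly from the residuals. The sketch below is run on simulated data (toy and mod_toy are hypothetical names); it computes the mean of the residuals and rebuilds the residual standard error as the square root of the residual sum of squares divided by the residual degrees of freedom.

```r
# Simulated stand-in data (hypothetical; not the openintro cars data)
set.seed(4)
toy <- data.frame(price = runif(54, min = 8, max = 60))
toy$weight <- 2200 + 40 * toy$price + rnorm(54, sd = 400)

mod_toy <- lm(weight ~ price, data = toy)
res <- residuals(mod_toy)

# Mean of residuals: zero up to floating point rounding error
mean(res)

# Residual standard error computed by hand
rse <- sqrt(sum(res^2) / df.residual(mod_toy))

# sigma() returns the residual standard error shown by summary()
all.equal(rse, sigma(mod_toy))
```

With 54 rows and two estimated coefficients, df.residual() is 52, matching the degrees of freedom in the summary output above.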

Interpretation

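Interpret the coefficient of price. The coefficient of price is 43.33: each additional thousand dollars in price is associated with an average increase of about 43.33 pounds in weight.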

Interpret the y-intercept. The y-intercept is 2171.11: the model predicts a weight of about 2,171 pounds for a car whose price is zero. This is an extrapolation, since no car is free.

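What would be the price of a car that weighs 3000 pounds? Solving the fitted equation 3000 = 2171.11 + 43.33 × price for price gives (3000 - 2171.11) / 43.33 ≈ 19.13, so it would cost about 19,100 dollars. Because this model treats weight as the response, this is only a rough answer; a model of price as a function of weight would be the direct way to predict price.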
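
The two routes to a price for a 3000-pound car can be compared side by side. The following is a sketch on simulated data (toy, mod_wp and mod_pw are hypothetical names): one value comes from inverting the weight ~ price line, the other from predict() on a price ~ weight model. The two generally differ unless the points lie exactly on a line.

```r
# Simulated stand-in data (hypothetical; not the openintro cars data)
set.seed(5)
toy <- data.frame(price = runif(54, min = 8, max = 60))
toy$weight <- 2200 + 40 * toy$price + rnorm(54, sd = 400)

# Route 1: invert the fitted line of weight on price
mod_wp <- lm(weight ~ price, data = toy)
b <- coef(mod_wp)
price_inverted <- unname((3000 - b["(Intercept)"]) / b["price"])

# Route 2: model price as the response and predict at weight = 3000
mod_pw <- lm(price ~ weight, data = toy)
price_predicted <- unname(predict(mod_pw, newdata = data.frame(weight = 3000)))

c(inverted = price_inverted, predicted = price_predicted)
```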

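What is the reported residual standard error? What does it mean? The reported residual standard error is 433 on 52 degrees of freedom. It means that the observed weights typically differ from the fitted line by about 433 pounds. The 52 degrees of freedom are the 54 observations minus the 2 estimated coefficients.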

What is the reported adjusted R squared? What does it mean? The reported adjusted R squared is 0.5666, which means that about 57% of the variability in weight is explained by price, after adjusting for the number of predictors in the model.
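
For reference, both R squared values can be rebuilt from sums of squares. This sketch uses simulated data (toy and mod_toy are hypothetical names) and the standard formulas: R squared is 1 minus SSE over SST, and the adjusted version rescales by (n - 1) / (n - p - 1), where p is the number of predictors.

```r
# Simulated stand-in data (hypothetical; not the openintro cars data)
set.seed(6)
toy <- data.frame(price = runif(54, min = 8, max = 60))
toy$weight <- 2200 + 40 * toy$price + rnorm(54, sd = 400)

mod_toy <- lm(weight ~ price, data = toy)

y <- toy$weight
sse <- sum(residuals(mod_toy)^2)   # unexplained variation
sst <- sum((y - mean(y))^2)        # total variation
r2 <- 1 - sse / sst

n <- nrow(toy)  # number of observations
p <- 1          # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Both are expected to match the values reported by summary()
all.equal(r2, summary(mod_toy)$r.squared)
all.equal(adj_r2, summary(mod_toy)$adj.r.squared)
```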

Carat and Price

# Scatterplot with regression line
ggplot(data = diamonds, aes(x = carat, y = price)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors

# Create a linear model
mod <- lm(carat ~ price, data = diamonds)

# View summary of model
summary(mod)
## 
## Call:
## lm(formula = carat ~ price, data = diamonds)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.35765 -0.11329 -0.02442  0.10344  2.66973 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.673e-01  1.112e-03   330.2   <2e-16 ***
## price       1.095e-04  1.986e-07   551.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.184 on 53938 degrees of freedom
## Multiple R-squared:  0.8493, Adjusted R-squared:  0.8493 
## F-statistic: 3.041e+05 on 1 and 53938 DF,  p-value: < 2.2e-16

The more carats a diamond has, the more expensive it tends to be. Prices in the data top out at about $18,000, however, regardless of carat.