# Load packages
library(dplyr)
library(ggplot2)
library(openintro)

Chapter 3: Simple linear regression

3.1 The “best fit” line

The simple linear regression model can be visualized as a straight line, a “best fit” line that cuts through the data so as to minimize the sum of squared vertical distances between the line and the data points. The geom_smooth() function with method = "lm" adds this line to a scatterplot.

# Scatterplot with regression line
ggplot(data = cars, aes(x = weight, y = price)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors

Chapter 5: Model Fit

5.2 Standard error of residuals (Residual Standard Error)

Show that the mean of residuals is zero (not exactly zero due to rounding error). Calculate residual standard error.

# Create a linear model
mod <- lm(price ~ weight, data = cars)

# View summary of model
summary(mod)
## 
## Call:
## lm(formula = price ~ weight, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.767  -3.766  -1.155   2.568  35.440 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.295205   4.915159  -4.129 0.000132 ***
## weight        0.013264   0.001582   8.383 3.17e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5666 
## F-statistic: 70.28 on 1 and 52 DF,  p-value: 3.173e-11
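
The residual checks asked for in 5.2 can be sketched as follows. This is a minimal example, not the only way to do it. It assumes the 54-car data set is available as `cars` (older openintro releases) or as `cars93` (newer releases); the `car_data` name is introduced here only for illustration.

```r
library(openintro)

# Assumption: older openintro releases expose the data as `cars`, newer ones
# as `cars93`; pick whichever object actually has a price column.
car_data <- if ("price" %in% names(cars)) cars else cars93

# Same model as in the text
mod <- lm(price ~ weight, data = car_data)
res <- residuals(mod)

# Mean of the residuals: zero up to floating-point rounding
mean_res <- mean(res)

# Residual standard error: sqrt(sum of squared residuals / residual df)
rse <- sqrt(sum(res^2) / df.residual(mod))

print(mean_res)
print(rse)         # compare with the value reported by summary(mod)
print(sigma(mod))  # built-in extractor for the same quantity
```

The hand-computed `rse` should agree with the "Residual standard error" line in the summary output.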

Interpretation

Is the coefficient statistically significant at 5%? Yes. The p-value for weight (3.17e-11) is far below 0.05.

Is the y-intercept statistically significant at 5%? Yes. The p-value for the intercept (0.000132) is below 0.05.
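
As a rough check on the two significance questions, the t values and p-values can be recomputed from the estimates and standard errors printed in the coefficient table. The numbers below are copied from that table, so the results match the summary only up to rounding.

```r
# Values copied from the summary(mod) coefficient table
est <- c(intercept = -20.295205, weight = 0.013264)
se  <- c(intercept = 4.915159,   weight = 0.001582)
df  <- 52  # residual degrees of freedom from the summary

# t value = estimate / standard error
t_val <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_val), df = df)

print(round(t_val, 3))
print(signif(p_val, 3))
print(p_val < 0.05)  # TRUE means significant at the 5% level
```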

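Interpret the coefficient of weight. Each additional pound of weight is associated with an increase of about 0.0133 in predicted price. Since price is recorded in thousands of dollars, that is about $13.26 per pound.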

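Interpret the y-intercept. The y-intercept (-20.295) is the predicted price, in thousands of dollars, of a car that weighs 0 pounds. A negative price has no practical meaning because a weight of zero lies far outside the range of the data; it does not make the data or the model invalid.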

What would be the price of a car that weighs 3000 pounds? -20.295205 + 0.013264 × 3000 ≈ 19.497, or about $19,497.
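
The prediction for a 3000-pound car can be worked out by plugging the reported coefficients into the fitted equation. This sketch assumes price is recorded in thousands of dollars; the variable names are illustrative only.

```r
# Coefficients copied from the summary(mod) output
b0 <- -20.295205  # intercept
b1 <- 0.013264    # slope for weight

weight_lb <- 3000

# Fitted equation: price = b0 + b1 * weight
pred_thousands <- b0 + b1 * weight_lb

# Assumption: price is in thousands of dollars, so scale to dollars
pred_dollars <- pred_thousands * 1000

print(pred_thousands)
print(pred_dollars)
```

With the fitted model object in hand, `predict(mod, newdata = data.frame(weight = 3000))` should give the same value up to rounding of the coefficients.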

What is the reported residual standard error? What does it mean? The reported residual standard error is 7.575 on 52 degrees of freedom, meaning that observed prices typically differ from the prices predicted by the best fit line by about 7.575 (roughly $7,575).

What is the reported adjusted R squared? What does it mean? The adjusted R squared is 0.5666, meaning about 56.66% of the variability in price is explained by weight, after adjusting for the number of predictors in the model.
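
The adjusted R squared can be recovered from the multiple R squared with the usual adjustment formula. The sketch below takes the sample size as 54, inferred from 52 residual degrees of freedom plus 2 estimated coefficients. Because it starts from the rounded R squared, it matches the reported 0.5666 only up to rounding.

```r
r2 <- 0.5747  # Multiple R-squared from the summary
n  <- 54      # 52 residual df + 2 estimated coefficients
p  <- 1       # number of predictors (weight)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adj_r2, 4))
```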

Find another variable that is strongly correlated to price. Demonstrate the nature of the relationship with a scatterplot and a regression model. Explain them in a sentence or two.

Chapter 3: Simple linear regression

3.1 The “best fit” line

The simple linear regression model can be visualized as a straight line, a “best fit” line that cuts through the data so as to minimize the sum of squared vertical distances between the line and the data points. The geom_smooth() function with method = "lm" adds this line to a scatterplot.

# Scatterplot with regression line
ggplot(data = cars, aes(x = mpgCity, y = price)) + # predictor on x, response on y, matching price ~ mpgCity
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors

Chapter 5: Model Fit

5.2 Standard error of residuals (Residual Standard Error)

Show that the mean of residuals is zero (not exactly zero due to rounding error). Calculate residual standard error.

# Create a linear model
mod <- lm(price ~ mpgCity, data = cars)

# View summary of model
summary(mod)
## 
## Call:
## lm(formula = price ~ mpgCity, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.447  -5.368  -2.850   5.140  37.134 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  45.7839     4.4983  10.178 5.63e-14 ***
## mpgCity      -1.1062     0.1857  -5.956 2.26e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.956 on 52 degrees of freedom
## Multiple R-squared:  0.4056, Adjusted R-squared:  0.3941 
## F-statistic: 35.48 on 1 and 52 DF,  p-value: 2.256e-07

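Price is negatively correlated with mpgCity: each additional mile per gallon in the city is associated with a predicted price about 1.106 lower (roughly $1,106, with price in thousands of dollars). The relationship is statistically significant (p = 2.26e-07), but mpgCity explains only about 40.56% of the variability in price, less than weight does (57.47%).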
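
In simple linear regression, the correlation coefficient equals the square root of R squared, with the sign of the slope. This gives a quick way to compare how strongly each predictor is correlated with price. The values below are copied from the two summary outputs, so the results are approximate.

```r
# Values copied from the two summary() outputs
r2_weight    <- 0.5747
slope_weight <- 0.013264
r2_mpg       <- 0.4056
slope_mpg    <- -1.1062

# Correlation = sign of the slope * sqrt(R-squared)
r_weight <- sign(slope_weight) * sqrt(r2_weight)
r_mpg    <- sign(slope_mpg) * sqrt(r2_mpg)

print(round(c(weight = r_weight, mpgCity = r_mpg), 3))
```

With the data loaded, `cor()` applied to the two columns directly should give matching values.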