# Load packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(openintro)
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following object is masked from 'package:ggplot2':
##
## diamonds
## The following objects are masked from 'package:datasets':
##
## cars, trees
The simple linear regression model can be visualized as a straight line: a “best fit” line through the data that minimizes the sum of squared vertical distances between the line and the data points. The geom_smooth() function with method = "lm" draws this line.
# Scatterplot with regression line
ggplot(data = cars, aes(x = weight, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors
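For a single predictor, the least-squares line has a closed form, which may make the idea of a “best fit” line more concrete. The sketch below assumes the built-in mtcars data as a stand-in (mpg against wt), because the cars data used in this document depends on the installed openintro version. The names x, y, slope, intercept, and fit are illustrative only.

```r
# Stand-in data (assumption): built-in mtcars, not the openintro cars data
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least-squares estimates for simple linear regression
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() fits the same least-squares line that geom_smooth(method = "lm") draws
fit <- lm(mpg ~ wt, data = mtcars)

c(intercept = intercept, slope = slope)  # hand-computed estimates
coef(fit)                                # estimates from lm()
```

The two printed results should agree up to floating-point error.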
Create a linear model using lm(). This function returns a model object of class “lm”. The object contains a lot of information about the regression model, including the call and the estimated coefficients, which are shown when the object is printed:
# Linear model for price as a function of weight
lm(price ~ weight, data = cars)
##
## Call:
## lm(formula = price ~ weight, data = cars)
##
## Coefficients:
## (Intercept) weight
## -20.29521 0.01326
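Beyond what is printed, the pieces stored in an “lm” object can be pulled out with standard accessor functions. This is a minimal sketch, again assuming the built-in mtcars data as a stand-in for the cars data; the object name fit is hypothetical.

```r
# Stand-in model (assumption): mtcars instead of the openintro cars data
fit <- lm(mpg ~ wt, data = mtcars)

class(fit)            # the class of the model object
names(fit)            # components stored inside the object

coef(fit)             # estimated intercept and slope
head(fitted(fit))     # fitted values for the first few observations
head(residuals(fit))  # observed minus fitted values
```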
Interpretation
Show that the mean of residuals is zero (not exactly zero due to rounding error). Calculate residual standard error.
# Create a linear model
mod <- lm(price ~ weight, data = cars)
# View summary of model
summary(mod)
##
## Call:
## lm(formula = price ~ weight, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.767 -3.766 -1.155 2.568 35.440
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.295205 4.915159 -4.129 0.000132 ***
## weight 0.013264 0.001582 8.383 3.17e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared: 0.5747, Adjusted R-squared: 0.5666
## F-statistic: 70.28 on 1 and 52 DF, p-value: 3.173e-11
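One way to check the two claims about residuals is sketched below: the mean of the residuals should be zero up to floating-point error, and the residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom. The example assumes the built-in mtcars data as a stand-in; with the model above, the same calls could be applied to mod.

```r
# Stand-in model (assumption): mtcars instead of the openintro cars data
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

# Mean of residuals: zero apart from floating-point error
mean_res <- mean(res)
mean_res

# Residual standard error: sqrt(SSE / (n - 2)) for simple regression
rse <- sqrt(sum(res^2) / df.residual(fit))
rse

# The value reported by summary() for comparison
summary(fit)$sigma
```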
Interpretation
The weight coefficient is statistically significant at the 5% level: three stars indicate a p-value below 0.001 (here 3.17e-11).
The y-intercept is also statistically significant at the 5% level, again with three stars (p-value 0.000132).
The predicted price of a car that weighs 3000 pounds is about 19.50 in the units of the price variable (roughly $19,500, assuming price is recorded in thousands of dollars). The prediction comes from plugging the weight into the fitted equation: price = -20.295205 + 0.013264 * 3000 = 19.50 (rounded).
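The prediction for a 3000-pound car can be reproduced from the coefficients in the summary output. The sketch below hard-codes those reported estimates, so it runs without the data; the variable names b0, b1, weight_lbs, and predicted_price are illustrative.

```r
# Coefficients copied from the summary(mod) output above
b0 <- -20.295205  # intercept
b1 <- 0.013264    # slope for weight

# Predicted price for a car weighing 3000 pounds
weight_lbs <- 3000
predicted_price <- b0 + b1 * weight_lbs
predicted_price
```

With the fitted object available, predict(mod, newdata = data.frame(weight = 3000)) should give the same value up to rounding of the printed coefficients.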
The R^2 reported for the regression model for price in terms of weight is 0.5747. This means that 57.47% of the variability in price can be explained by weight.
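R^2 can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares, and in simple linear regression it equals the squared correlation between the two variables. A minimal sketch, assuming the built-in mtcars data as a stand-in for the cars data:

```r
# Stand-in model (assumption): mtcars instead of the openintro cars data
fit <- lm(mpg ~ wt, data = mtcars)
y <- mtcars$mpg

sse <- sum(residuals(fit)^2)   # variability left unexplained by the line
sst <- sum((y - mean(y))^2)    # total variability in the response

r_squared <- 1 - sse / sst
r_squared

# Two cross-checks that should match the hand computation
summary(fit)$r.squared
cor(mtcars$wt, mtcars$mpg)^2
```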
# Scatterplot with regression line
ggplot(data = cars, aes(x = price, y = weight)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors
Interpretation