title: "Quiz on regression"
author: "karl kuus"
date: "12/11/2017"
output: html_document
# Load packages
library(dplyr)
library(ggplot2)
library(openintro)
The simple linear regression model can be visualized by a straight line, a “best fit” line that cuts through the data in a way that minimizes the sum of the squared vertical distances between the line and the data points. This line can be added to a scatterplot with the geom_smooth() function.
# Scatterplot with regression line
ggplot(data = cars, aes(x = weight, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors
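To make the "best fit" idea concrete, here is a minimal sketch that computes the least-squares slope and intercept by hand and compares them with lm(). It uses R's built-in mtcars data as a stand-in (an assumption, chosen only so the example runs without extra packages); the helper `ssr()` is a hypothetical name introduced for illustration.

```r
# Stand-in data (assumption): built-in mtcars, wt = weight, mpg = fuel economy
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least-squares estimates for a simple linear regression
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same estimates obtained from lm()
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# Hypothetical helper: sum of squared residuals for the line b0 + b1 * x
ssr <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

ssr(intercept, slope)        # value at the least-squares line
ssr(intercept, slope + 0.5)  # a different line gives a larger value
```

The line drawn by geom_smooth(method = "lm") is this same least-squares line, so any other slope or intercept produces a larger sum of squared residuals.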
Create a linear model using lm(). This function returns a model object of class “lm”. This object contains a lot of information about your regression model, including the estimated coefficients printed in the output below.
# Linear model for price as a function of weight
lm(price ~ weight, data = cars)
##
## Call:
## lm(formula = price ~ weight, data = cars)
##
## Coefficients:
## (Intercept) weight
## -20.29521 0.01326
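As a rough illustration of what an “lm” object holds, the sketch below fits a model and pulls out a few of its components with standard accessor functions. The built-in mtcars data is used as a stand-in (an assumption); the same calls should work on the price ~ weight model.

```r
# Stand-in model (assumption): built-in mtcars instead of the cars data
fit <- lm(mpg ~ wt, data = mtcars)

class(fit)            # the object's class
names(fit)            # components stored inside the object
coef(fit)             # estimated intercept and slope
head(fitted(fit))     # fitted values for the first few observations
head(residuals(fit))  # observed minus fitted values
```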
Interpretation
* Both the coefficient and the y-intercept are statistically significant at the 5% level: each has three stars in the model summary, which means its p-value is below 0.001.
* Coefficient: a one-pound increase in the weight of a car is associated with an average increase of 0.01326 thousand USD (about 13 USD) in the price.
* Y-intercept: if a car's weight is 0 pounds, its predicted price is -20.29521 thousand USD. The intercept is meaningless in this case, because no car weighs 0 pounds and a price cannot be negative.
* Prediction: Price = y-intercept + coefficient x 3000 pounds = -20.29521 + 0.01326 x 3000 = 19.48479 (if the car's weight is 3000 pounds, the predicted price is about 19.5 thousand USD).
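The prediction arithmetic above can be reproduced in a few lines. This is a minimal sketch that hard-codes the rounded coefficients from the printed lm() output; `predict_price()` is a hypothetical helper name, not part of any package.

```r
# Rounded coefficients copied from the printed lm() output
intercept <- -20.29521  # thousand USD
slope <- 0.01326        # thousand USD per pound

# Hypothetical helper: predicted price (thousand USD) for a weight in pounds
predict_price <- function(weight_lbs) intercept + slope * weight_lbs

predict_price(3000)
```

With a fitted model object, `predict(mod, newdata = data.frame(weight = 3000))` performs the same calculation using the unrounded coefficients, so its result can differ slightly in the later decimals.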
Show that the mean of the residuals is zero (in practice not exactly zero, because of floating-point rounding error). Calculate the residual standard error.
# Create a linear model
mod <- lm(price ~ weight, data = cars)
# View summary of model
summary(mod)
##
## Call:
## lm(formula = price ~ weight, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.767 -3.766 -1.155 2.568 35.440
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.295205 4.915159 -4.129 0.000132 ***
## weight 0.013264 0.001582 8.383 3.17e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared: 0.5747, Adjusted R-squared: 0.5666
## F-statistic: 70.28 on 1 and 52 DF, p-value: 3.173e-11
Interpretation
* The magnitude of a typical residual is 7.575 thousand USD.
* In other words, the model's estimated price typically misses the actual price by about 7.6 thousand USD.
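The two checks asked for in this part (mean of the residuals and residual standard error) can be sketched as follows. The built-in mtcars data is used as a stand-in (an assumption) so the code runs on its own; replacing `fit` with the `mod` object from above should apply the same steps to the price ~ weight model.

```r
# Stand-in model (assumption): built-in mtcars instead of the cars data
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

# Mean of the residuals: zero up to floating-point rounding error
mean(res)

# Residual standard error: sqrt(sum of squared residuals / residual df)
rse <- sqrt(sum(res^2) / df.residual(fit))
rse

# Same quantity as reported by summary(); sigma() extracts it directly
sigma(fit)
```

For a simple linear regression the residual degrees of freedom are n - 2, which is why the summary above reports 52 degrees of freedom.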
Interpretation of the adjusted R^2
* The adjusted R^2 reported for the regression of price on weight is 0.5666. This means that about 56.66% of the variability in price can be explained by weight, after adjusting for the number of predictors in the model.
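The link between the multiple R^2 and the adjusted R^2 in the summary can be checked by hand. The sketch below uses only the rounded values printed in the summary output; the sample size is not stated in the document and is derived here from the residual degrees of freedom (an assumption of this sketch).

```r
# Values copied from the printed summary output
r2 <- 0.5747     # multiple R-squared (rounded)
df_resid <- 52   # residual degrees of freedom
p <- 1           # number of predictors (weight)

# Sample size derived from the residual df: n = df_resid + p + 1
n <- df_resid + p + 1

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
adj_r2
```

The result agrees with the reported 0.5666 to about three decimal places; the small difference comes from starting with the rounded R^2.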