title: "Quiz on regression"
author: "karl kuus"
date: "12/11/2017"
output: html_document

# Load packages
library(dplyr)
library(ggplot2)
library(openintro)

Chapter 3: Simple linear regression

3.1 The “best fit” line

The simple linear regression model can be visualized as a straight line, a “best fit” line that cuts through the data in a way that minimizes the sum of the squared vertical distances between the line and the data points. The line can be drawn on a scatterplot with the geom_smooth() function, using method = "lm".

# Scatterplot with regression line
ggplot(data = cars, aes(x = weight, y = price)) + 
geom_point() + 
geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors
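
As a rough illustration of what “best fit” means, the sketch below computes the least-squares slope and intercept by hand and compares them with lm(). The data are simulated stand-ins (hypothetical values, not the cars data), and the names x, y and ssr are made up for this example.

```r
# Simulated stand-in data (hypothetical values, not the cars data set)
set.seed(1)
x <- runif(50, min = 1500, max = 4000)      # e.g. weight in pounds
y <- -20 + 0.013 * x + rnorm(50, sd = 5)    # e.g. price in thousand USD

# Closed-form least-squares estimates for a single predictor
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() arrives at the same line
fit <- lm(y ~ x)
ls_by_hand <- c(intercept, slope)
ls_by_lm   <- unname(coef(fit))

# Sum of squared residuals for a candidate line
ssr <- function(b0, b1) sum((y - (b0 + b1 * x))^2)
ssr_best    <- ssr(intercept, slope)        # the fitted line
ssr_shifted <- ssr(intercept + 1, slope)    # same slope, shifted up by 1

print(rbind(by_hand = ls_by_hand, by_lm = ls_by_lm))
print(c(best = ssr_best, shifted = ssr_shifted))
```

The shifted line has a larger sum of squared residuals than the fitted one, which is the sense in which the regression line is the “best” one.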

Chapter 4: Interpreting regression models

4.1 Fitting simple linear models

Create a linear model using lm(). This function returns a model object of class “lm”. This object contains lots of information about your regression model, including:

  • the data used to fit the model,
  • the specification of the model,
  • the fitted values,
  • the residuals.

# Linear model for price as a function of weight
lm(price ~ weight, data = cars)
## 
## Call:
## lm(formula = price ~ weight, data = cars)
## 
## Coefficients:
## (Intercept)       weight  
##   -20.29521      0.01326
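
The pieces stored in an “lm” object can be pulled out with standard accessor functions. The sketch below uses the built-in mtcars data (mpg explained by wt) purely as a stand-in so that it runs without extra packages; the same calls should work on the price ~ weight model.

```r
# Stand-in example: built-in mtcars data (mpg explained by wt),
# used here only so the code runs without extra packages
fit <- lm(mpg ~ wt, data = mtcars)

class(fit)            # "lm"
formula(fit)          # the specification of the model
head(fit$model)       # the data used to fit the model (model frame)
coef(fit)             # estimated intercept and slope
head(fitted(fit))     # fitted values
head(residuals(fit))  # residuals

# Each observed response equals its fitted value plus its residual
recomposed <- fitted(fit) + residuals(fit)
```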

Interpretation

  • Significance: both the slope coefficient and the y-intercept are statistically significant at the 5% level. In the summary() output for this model (Section 5.2), each carries three stars, which marks a p-value below 0.001.

  • Coefficient: a one-pound increase in the weight of a car is associated with an increase of 0.01326 thousand USD (about 13 USD) in the predicted price.

  • Y-intercept: a car weighing 0 pounds would have a predicted price of -20.29521 thousand USD. The intercept has no practical meaning here, since no car weighs 0 pounds and a price cannot be negative.

  • Prediction: price = y-intercept + coefficient × weight = -20.29521 + 0.01326 × 3000 = 19.48479. A car weighing 3000 pounds has a predicted price of about 19.5 thousand USD.
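
The prediction arithmetic can be reproduced in R from the printed coefficients. This is a minimal sketch; predict_price is a hypothetical helper written for this example, and the coefficients are the rounded values from the lm() output above.

```r
# Coefficients as printed by lm(price ~ weight, data = cars)
intercept <- -20.29521   # thousand USD
slope     <- 0.01326     # thousand USD per pound

# Hypothetical helper: predicted price for a given weight in pounds
predict_price <- function(weight) intercept + slope * weight

predict_price(3000)                  # about 19.48 thousand USD
predict_price(c(2500, 3000, 3500))   # predictions for several weights
```

With a fitted model object at hand, predict() with newdata = data.frame(weight = 3000) should give essentially the same figure, using the unrounded coefficients.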

Chapter 5: Model Fit

5.2 Standard error of residuals (Residual Standard Error)

Show that the mean of residuals is zero (not exactly zero due to rounding error). Calculate residual standard error.

# Create a linear model
mod <- lm(price ~ weight, data = cars)

# View summary of model
summary(mod)
## 
## Call:
## lm(formula = price ~ weight, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.767  -3.766  -1.155   2.568  35.440 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.295205   4.915159  -4.129 0.000132 ***
## weight        0.013264   0.001582   8.383 3.17e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.575 on 52 degrees of freedom
## Multiple R-squared:  0.5747, Adjusted R-squared:  0.5666 
## F-statistic: 70.28 on 1 and 52 DF,  p-value: 3.173e-11

Interpretation

  • The magnitude of a typical residual is 7.575 thousand USD.
  • In other words, the model's estimated price typically misses the actual price by about 7.6 thousand USD.
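
The two tasks stated for this section (residuals averaging to zero, and the residual standard error) can be checked directly. The sketch below again uses the built-in mtcars data as a stand-in; the same steps should apply to the price ~ weight model.

```r
# Stand-in example on built-in mtcars data; the same steps apply to
# lm(price ~ weight, data = cars)
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

# Mean of residuals: zero up to floating-point error
mean_res <- mean(res)

# Residual standard error:
# sqrt(sum of squared residuals / residual degrees of freedom)
rse_by_hand  <- sqrt(sum(res^2) / df.residual(fit))
rse_reported <- summary(fit)$sigma

print(mean_res)
print(c(by_hand = rse_by_hand, reported = rse_reported))
```

The hand-computed value agrees with the one stored in summary(), and the divisor is the residual degrees of freedom (52 in the price ~ weight output) rather than the number of observations.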

Interpretation of the adjusted R^2: the adjusted R^2 reported for the regression of price on weight is 0.5666. This means that about 56.66% of the variability in price can be explained by weight, after adjusting for the number of predictors in the model.
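
The link between the multiple R^2 and the adjusted R^2 can be checked from the numbers in the summary() output. This sketch assumes the usual adjustment formula for a model with an intercept; because it starts from the rounded R^2, it matches the reported adjusted value only up to rounding.

```r
# Values read off the summary() output for price ~ weight
r_squared <- 0.5747
df_resid  <- 52                 # residual degrees of freedom
p         <- 1                  # number of predictors
n         <- df_resid + p + 1   # 54 observations

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
adj_r_squared
```

With a single predictor the penalty is small, which is why the two values differ by less than one percentage point.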

Another strongly correlated variable

# Scatterplot with regression line
ggplot(data = cars, aes(x = mpgCity, y = price)) + 
geom_point() + 
geom_smooth(method = "lm", se = FALSE) # lm stands for linear model; se for standard errors

# Linear model for price as a function of mpgCity
lm(price ~ mpgCity, data = cars)
## 
## Call:
## lm(formula = price ~ mpgCity, data = cars)
## 
## Coefficients:
## (Intercept)      mpgCity  
##      45.784       -1.106

# Create a linear model
mod <- lm(price ~ mpgCity, data = cars)

# View summary of model
summary(mod)
## 
## Call:
## lm(formula = price ~ mpgCity, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.447  -5.368  -2.850   5.140  37.134 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  45.7839     4.4983  10.178 5.63e-14 ***
## mpgCity      -1.1062     0.1857  -5.956 2.26e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.956 on 52 degrees of freedom
## Multiple R-squared:  0.4056, Adjusted R-squared:  0.3941 
## F-statistic: 35.48 on 1 and 52 DF,  p-value: 2.256e-07

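Interpretation

  • Significance: both the slope coefficient and the y-intercept are statistically significant at the 5% level; each carries three stars, which marks a p-value below 0.001.

  • Coefficient: a one-unit increase in mpgCity (city mileage) is associated with a decrease of 1.1062 thousand USD in the predicted price.

  • Y-intercept: a car with an mpgCity of 0 would have a predicted price of 45.7839 thousand USD. No car has a city mileage of 0, so the intercept is an extrapolation with no practical meaning.

  • Model fit: the residual standard error is 8.956 thousand USD and the adjusted R^2 is 0.3941, against 7.575 and 0.5666 for the weight model, so weight explains more of the variability in price than mpgCity does.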