suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(UsingR))

2. The data set homedata (UsingR) contains assessed values of houses in Maplewood, NH, for the years 1970 and 2000. Suppose you are interested in predicting house values in 2000 from house values in 1970 using a linear model.

head(homedata)
##    y1970  y2000
## 1  89700 359100
## 2 118400 504500
## 3 116400 477300
## 4 122000 500400
## 5  91500 433900
## 6 102800 464800
str(homedata)
## 'data.frame':    6841 obs. of  2 variables:
##  $ y1970: num  89700 118400 116400 122000 91500 ...
##  $ y2000: num  359100 504500 477300 500400 433900 ...

a. Create a scatter plot showing the data

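# Scatter plot of each house's 2000 assessed value against its 1970 assessed value
sp1 = ggplot(homedata, aes(x = y1970, y = y2000)) + 
        geom_point() +
        xlab("Assessed value in 1970 ($)") +
        ylab("Assessed value in 2000 ($)")
sp1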
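With 6,841 houses, many points overlap and dense regions look like solid blobs. A minimal variant, assuming only that transparency helps here, draws the same plot with semi-transparent, smaller points; the name `sp1_alpha` and the values `alpha = 0.2` and `size = 0.8` are illustrative choices, not part of the assignment.

```r
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(UsingR))

# Same scatter plot as sp1, but semi-transparent, smaller points make dense
# regions appear darker instead of merging into one solid area.
# alpha = 0.2 and size = 0.8 are arbitrary illustrative values.
sp1_alpha <- ggplot(homedata, aes(x = y1970, y = y2000)) +
  geom_point(alpha = 0.2, size = 0.8) +
  xlab("Assessed value in 1970 ($)") +
  ylab("Assessed value in 2000 ($)")
sp1_alpha
```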

b. Include a least-squares regression line on the plot

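# Add the least-squares line; formula = y ~ x is stated explicitly (which also
# silences ggplot2's formula message) and se = FALSE hides the confidence band
sp2 = sp1 + geom_smooth(method = lm, formula = y ~ x, color = "red", se = FALSE)
sp2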
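`geom_smooth(method = lm)` fits an ordinary least-squares line behind the scenes. A quick sketch to confirm this visually: fit the same model with `lm()` and overlay a dashed line built from its coefficients using `geom_abline()`; if the two fits agree, the dashed line lies exactly on the red one. The names `fit`, `b`, and `sp_check` are hypothetical.

```r
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(UsingR))

# Fit the same least-squares model that geom_smooth(method = lm) uses
fit <- lm(y2000 ~ y1970, data = homedata)
b <- unname(coef(fit))  # b[1] = intercept, b[2] = slope

# Red line: ggplot2's smoother; dashed line: drawn directly from lm() coefficients
sp_check <- ggplot(homedata, aes(x = y1970, y = y2000)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = lm, formula = y ~ x, color = "red", se = FALSE) +
  geom_abline(intercept = b[1], slope = b[2], linetype = "dashed")
sp_check
```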

c. Determine the equation for the regression line and interpret the model

# Simple linear regression of 2000 assessed values on 1970 assessed values
model = lm(y2000 ~ y1970, data = homedata)
model
## 
## Call:
## lm(formula = y2000 ~ y1970, data = homedata)
## 
## Coefficients:
## (Intercept)        y1970  
##  -1.040e+05    5.258e+00

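– The equation of the regression line is: predicted y2000 = -104,000 + 5.258 × y1970 (coefficients rounded as printed above: -1.040e+05 and 5.258e+00).

– Slope: each additional $1 of assessed value in 1970 is associated with an increase of about $5.26 in the predicted 2000 value (about $5,258 per $1,000 of 1970 value). The slope compares houses with different 1970 values; it is not a change per year.

– Intercept: about -$104,000 is the predicted 2000 value for a house assessed at $0 in 1970. That is not a realistic house value, so the intercept only positions the line and has no practical interpretation here.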
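The coefficients printed by `lm()` can be reproduced from the closed-form least-squares formulas for simple regression, slope b1 = cov(x, y) / var(x) and intercept b0 = mean(y) - b1 × mean(x). The sketch below also checks the standard identity that R² equals the squared correlation between x and y, which summarizes how much of the variation in 2000 values the 1970 values explain. Both `all.equal()` calls should return `TRUE`; the names `hd`, `x`, `y`, `b0`, `b1`, and `r2` are illustrative.

```r
suppressPackageStartupMessages(library(UsingR))

# lm() drops incomplete rows by default, so do the same before hand computation
hd <- homedata[complete.cases(homedata), ]
x <- hd$y1970
y <- hd$y2000

# Closed-form least-squares estimates for simple linear regression
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept

model <- lm(y2000 ~ y1970, data = hd)

# Hand-computed estimates vs. lm(), equal up to floating-point error
all.equal(unname(coef(model)), c(b0, b1))

# For simple regression with an intercept, R^2 = cor(x, y)^2
r2 <- cor(x, y)^2
all.equal(r2, summary(model)$r.squared)
```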

d. Use the model to predict housing values in 2000 for houses with assessed values of $55,000, $60,000, and $65,000 in 1970

# Predicted 2000 values for three 1970 assessed values
predict(model, data.frame(y1970 = c(55000, 60000, 65000)))
##        1        2        3 
## 185183.8 211473.6 237763.5

– According to the model, houses assessed at $55,000, $60,000, and $65,000 in 1970 have predicted 2000 values of $185,183.8, $211,473.6, and $237,763.5, respectively.
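`predict()` simply plugs each new 1970 value into the fitted equation. The sketch below, using hypothetical names `b`, `new_values`, `manual`, and `auto`, reproduces the predictions by hand and shows that, because the three inputs are $5,000 apart, consecutive predictions differ by 5000 × slope (about $26,290), which is what the slope of the fitted line implies. It also prints the observed range of 1970 values, since inputs outside that range would be extrapolations where a linear fit is least trustworthy.

```r
suppressPackageStartupMessages(library(UsingR))

model <- lm(y2000 ~ y1970, data = homedata)
b <- unname(coef(model))              # b[1] = intercept, b[2] = slope
new_values <- c(55000, 60000, 65000)

# Plug each 1970 value into: predicted y2000 = b0 + b1 * y1970
manual <- b[1] + b[2] * new_values
auto <- unname(predict(model, data.frame(y1970 = new_values)))
all.equal(manual, auto)

# Inputs are $5,000 apart, so predictions step by 5000 * slope
diff(auto)
5000 * b[2]

# Observed range of 1970 values, to judge whether the new inputs are extrapolations
range(homedata$y1970, na.rm = TRUE)
```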