Multiple linear regression on housing price

library(ggplot2)
hdata <- read.csv(file="C:/Users/fangy/Desktop/All/3study/harrisburg/anly502/mrl.csv", header=TRUE, sep=",")

1. Scatter plots of the relationships

p1<-ggplot(hdata,aes(x=Size,y=Price))+geom_point(color="blue")
p1

p2<-ggplot(hdata,aes(x=Tax,y=Price))+geom_point(color="green")
p2

p3<-ggplot(hdata,aes(x=Bedroom,y=Price))+geom_point(color="red")
p3

#### From the scatter plots, we can't find an obvious relationship between Price and Size, Tax, or Bedroom

2. We perform a correlation analysis and linear regression

library("Hmisc")
## Warning: package 'Hmisc' was built under R version 3.4.1
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
rcorr(as.matrix(hdata))
##         Price  Size  Tax Bedroom
## Price    1.00 -0.11 0.17   -0.14
## Size    -0.11  1.00 0.10    0.13
## Tax      0.17  0.10 1.00    0.05
## Bedroom -0.14  0.13 0.05    1.00
## 
## n= 100 
## 
## 
## P
##         Price  Size   Tax    Bedroom
## Price          0.2654 0.0948 0.1637 
## Size    0.2654        0.3117 0.2124 
## Tax     0.0948 0.3117        0.5922 
## Bedroom 0.1637 0.2124 0.5922
myvars=c("Price","Size","Tax","Bedroom")
hdata2=hdata[myvars]
plot(hdata2)
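
The P matrix printed by `rcorr` holds one p-value per pair of variables. To look at a single pair in more detail, base R's `cor.test` runs a Pearson correlation test and also returns a confidence interval for the correlation. The sketch below is only an illustration: it uses a synthetic data frame named `demo` (an assumption, since mrl.csv is not reproduced here), so its numbers will not match the output above.

```r
# Synthetic stand-in for hdata; these values are made up for illustration.
set.seed(1)
n <- 100
demo <- data.frame(
  Size = rnorm(n, mean = 2500, sd = 500),
  Tax  = rnorm(n, mean = 4000, sd = 800)
)
demo$Price <- 300000 + 20 * demo$Tax + rnorm(n, sd = 100000)

# Pearson correlation test for one pair of variables.
ct <- cor.test(demo$Price, demo$Tax, method = "pearson")

ct$estimate   # sample correlation r
ct$p.value    # two-sided p-value for H0: true correlation is 0
ct$conf.int   # 95% confidence interval for the correlation

# Decision at the 5% significance level: TRUE means reject H0.
ct$p.value < 0.05
```

With the real data, replacing `demo` with `hdata2` would test Price against Tax directly.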

#### Predictions with just Tax, then Size and Tax, then Size, Tax and Bedroom

guess=lm(Price~Tax, data=hdata2)
guess
## 
## Call:
## lm(formula = Price ~ Tax, data = hdata2)
## 
## Coefficients:
## (Intercept)          Tax  
##   322708.22        22.53
result=lm(Price~Size+Tax,data=hdata2)
result
## 
## Call:
## lm(formula = Price ~ Size + Tax, data = hdata2)
## 
## Coefficients:
## (Intercept)         Size          Tax  
##   385107.62       -26.55        24.33
full=lm(Price~Size+Tax+Bedroom,data=hdata2)
full
## 
## Call:
## lm(formula = Price ~ Size + Tax + Bedroom, data = hdata2)
## 
## Coefficients:
## (Intercept)         Size          Tax      Bedroom  
##   424506.61       -23.20        25.09    -16318.30
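
Printing an `lm` object shows only the coefficients. To judge whether each coefficient differs from zero at the 5% level, and how much of the variation in Price the model explains, `summary()` is the usual tool. The following is a minimal sketch on synthetic data (the data frame `demo` and its values are assumptions for illustration, not the contents of mrl.csv):

```r
# Synthetic stand-in for hdata2; values are made up for illustration.
set.seed(2)
n <- 100
demo <- data.frame(
  Size    = rnorm(n, mean = 2500, sd = 500),
  Tax     = rnorm(n, mean = 4000, sd = 800),
  Bedroom = sample(2:5, n, replace = TRUE)
)
demo$Price <- 300000 + 20 * demo$Tax + rnorm(n, sd = 100000)

fit <- lm(Price ~ Size + Tax + Bedroom, data = demo)
s <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- s$coefficients
coef_table

# p-values of the predictors (intercept row dropped)
pvals <- coef_table[-1, "Pr(>|t|)"]

# Names of predictors significant at the 5% level
names(pvals)[pvals < 0.05]

# Share of variance explained, plain and adjusted for model size
s$r.squared
s$adj.r.squared
```

Running `summary(guess)`, `summary(result)` and `summary(full)` on the real data would give the corresponding tables for the three models above.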

3. Check at the 5% significance level (95% confidence intervals)

g1<-predict(guess,data.frame(Price=500000,Tax=4000),interval="confidence")
g1
##        fit      lwr      upr
## 1 412843.6 367740.8 457946.3
g2<-predict(result,data.frame(Price=500000,Size=3000,Tax=4000),interval="confidence")
g2
##        fit      lwr    upr
## 1 402782.3 355336.6 450228
g3<-predict(full,data.frame(Price=500000,Size=3000,Tax=4000,Bedroom=3),interval="confidence")
g3
##        fit      lwr      upr
## 1 406333.8 358810.7 453856.8
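
Note that `interval="confidence"` bounds the mean price of all houses with the given predictor values. When the question is whether one particular house's price is plausible under the model, `interval="prediction"` is usually the more relevant choice, and it is always wider. A small sketch comparing the two, again on synthetic data (the data frame `demo` and the object `new_house` are assumptions for illustration):

```r
# Synthetic stand-in for hdata2; values are made up for illustration.
set.seed(3)
n <- 100
demo <- data.frame(Tax = rnorm(n, mean = 4000, sd = 800))
demo$Price <- 300000 + 20 * demo$Tax + rnorm(n, sd = 100000)

fit <- lm(Price ~ Tax, data = demo)

# newdata only needs the predictors; a Price column would be ignored.
new_house <- data.frame(Tax = 4000)

# 95% interval for the mean price at Tax = 4000
conf_int <- predict(fit, new_house, interval = "confidence", level = 0.95)

# 95% interval for the price of a single new house at Tax = 4000
pred_int <- predict(fit, new_house, interval = "prediction", level = 0.95)

conf_int
pred_int
```

Both calls return the same fitted value; only the `lwr` and `upr` columns differ.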

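As a result, our predictions are not great: for a house priced at 500,000, the three models predict between about 403,000 and 413,000, and even the upper 95% confidence limits (about 450,000 to 458,000) fall short of 500,000. This is consistent with the weak correlations found earlier (|r| at most 0.17, none significant at the 5% level). We may need to consider more variables, such as the age of the house, the area, and so on.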