R Essentials: Acme Realty

Jiang Li

8/29/2017


You are the data scientist at Acme Realty, a real estate company specializing in listings in upscale Hillsborough, California. Senior management has asked you to build a predictive sales model using R.

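You have two sets of data at your disposal: a time series of sale prices (in $M) recorded quarterly from 2008 through 2010 (DataScience_7_Case_TimeSeries.xls), and a table of 18 home sales giving sale price, house size and lot size (DataScience_7_Case_realdata.xls).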

To be thorough, you plan to build two regression-based predictive models in R. The first uses time series regression to forecast future sale prices from historical sale prices. The second uses multiple regression to predict sale price from house size and lot size.
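In equation form, using the column names of the two datasets, the models are as follows. Here ε is the error term, and lm() estimates the β coefficients (separately for each model) by ordinary least squares.

```latex
\begin{aligned}
\text{Model 1 (trend):}\quad \text{Price} &= \beta_0 + \beta_1\,\text{Date} + \varepsilon \\
\text{Model 2 (size):}\quad \text{Price} &= \beta_0 + \beta_1\,\text{House} + \beta_2\,\text{Lot} + \varepsilon
\end{aligned}
```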

Load the data

library(readxl)
## Read time series dataset
time.df = read_excel(path = "DataScience_7_Case_TimeSeries.xls",sheet = "Sheet1")
dim(time.df)
## [1] 12  2
head(time.df)
##      Date Price ($M)
## 1 2008.00        2.6
## 2 2008.25        2.5
## 3 2008.50        2.5
## 4 2008.75        2.6
## 5 2009.00        2.7
## 6 2009.25        2.7
## Read the house dataset (sale price, house size and lot size)
sale.df = read_excel(path = "DataScience_7_Case_realdata.xls")
dim(sale.df)
## [1] 18  3
sale.df
##    Price House  Lot
## 1    6.0   6.9 42.7
## 2    5.8   8.0 36.6
## 3    5.6   8.0 44.0
## 4    3.5   3.8 18.0
## 5    3.4   6.1 27.4
## 6    3.4   4.3 22.2
## 7    2.7   3.8 22.0
## 8    2.6   5.0 29.3
## 9    2.6   3.6 31.4
## 10   2.3   3.1 22.2
## 11   2.3   3.9 21.7
## 12   2.3   3.2 24.4
## 13   1.9   3.5 25.3
## 14   1.9   3.4 24.0
## 15   1.9   3.2 21.8
## 16   1.6   3.3  6.6
## 17   1.6   2.3 15.9
## 18   1.5   2.3 21.2

Forecast sales for 2011

Using the regression analysis capability in R, forecast home sales for 2011. Use the TimeSeries dataset. State the governing equation.

library(ggplot2)
ggplot(data = time.df,aes(x = Date,y = `Price ($M)`)) + 
  geom_bar(stat = "identity",fill='grey') +
  geom_point(color="blue")+
  geom_line(color='red')+
  ggtitle("Sale price from years 2008 to 2010")

fit = lm(formula = `Price ($M)`~Date,data=time.df)
summary(fit)
## 
## Call:
## lm(formula = `Price ($M)` ~ Date, data = time.df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.243473 -0.046970 -0.003613  0.039744  0.203380 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 216.06725   83.86487   2.576   0.0276 *
## Date         -0.10629    0.04174  -2.547   0.0290 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1248 on 10 degrees of freedom
## Multiple R-squared:  0.3934, Adjusted R-squared:  0.3328 
## F-statistic: 6.486 on 1 and 10 DF,  p-value: 0.02902
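Reading the estimates off the summary, the governing equation of the fitted trend line (Price in $M, Date in decimal years) is:

```latex
\widehat{\text{Price}} = 216.06725 - 0.10629\,\text{Date}
```

The negative slope means the fitted price falls by about 0.106 $M (roughly $106,000) per year. The slope is significant at the 5% level (p = 0.029), but the line explains only about 39% of the variation in price (R² = 0.39), so the forecast is best read as a rough trend. The large intercept is just the line extrapolated back to year 0 and has no practical meaning.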
## Replot with fitted line
ggplot(data = time.df,aes(x = Date,y = `Price ($M)`)) + 
  geom_bar(stat = "identity",fill='grey') +
  geom_point(color="blue")+
  geom_line(color='red')+
  geom_abline(slope = fit$coefficients[2],intercept = fit$coefficients[1],color = 'purple')+ ## Fitted regression line from fit
  geom_smooth(method='lm',color='purple',fill='green')+ ## Same line refitted by ggplot2, with its 95% confidence band shaded
  ggtitle("Sale price from years 2008 to 2010\n(Purple is the regression line)")

d.2011 = data.frame(Date=2011)
predict.2011 = predict(fit,d.2011)

cat("Forecast sale price for 2011 is $",round(predict.2011,2),"M",sep = "")
## Forecast sale price for 2011 is $2.31M
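Date is measured in decimal years at quarterly steps, so Date = 2011 above covers only the first quarter of 2011. A minimal sketch, assuming a forecast is wanted for every quarter of 2011 and for the year as a whole, evaluates the governing equation at 2011.00 through 2011.75. To run on its own it hard-codes the rounded coefficients from summary(fit); the names b0, b1, dates_2011 and forecast_2011 are illustrative. With fit in the session, predict(fit, data.frame(Date = 2011 + c(0, 0.25, 0.5, 0.75))) gives the same forecasts at full precision.

```r
# Coefficients as printed (rounded) by summary(fit) above
b0 <- 216.06725   # intercept
b1 <- -0.10629    # slope: change in price ($M) per year

# Hypothetical forecast grid: the four quarters of 2011 in decimal years
dates_2011 <- 2011 + c(0, 0.25, 0.5, 0.75)

# Evaluate the governing equation Price = b0 + b1 * Date at each quarter
forecast_2011 <- b0 + b1 * dates_2011
print(data.frame(Date = dates_2011, Price_M = round(forecast_2011, 3)))

# Average of the four quarterly forecasts as a single figure for the year
cat("Average forecast for 2011: $", round(mean(forecast_2011), 2), "M\n", sep = "")
```

The first value (Date = 2011.00) comes out at about 2.318 rather than the 2.31 printed by predict(). The gap is only rounding: the slope is multiplied by about 2011, so a rounding error of 0.000005 in it moves the result by about 0.01. Each quarter is about 0.027 ($M) lower than the one before, a quarter of the annual slope.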

Forecast price based on house size and lot size

Using the regression analysis capability in R, forecast the price for a house size of 4000 square feet and a lot size of 22000 square feet. State the governing equation. Use the realdata dataset.

library(scatterplot3d)
## view the data
scatterplot3d(x = sale.df$House,sale.df$Lot,sale.df$Price,
              xlab = "House size (1000 square feet)",
              ylab = "Lot size (1000 square feet)",
              zlab = "Price ($M)",
              main="Sale price vs house and lot size",highlight.3d=TRUE,type="h")


## Build the model
fit2 = lm(formula = Price~House+Lot,data = sale.df)
summary(fit2)
## 
## Call:
## lm(formula = Price ~ House + Lot, data = sale.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.88939 -0.25993 -0.03057  0.21752  1.09898 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.55415    0.39559  -1.401 0.181614    
## House        0.64680    0.12713   5.088 0.000134 ***
## Lot          0.02763    0.02478   1.115 0.282361    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5486 on 15 degrees of freedom
## Multiple R-squared:  0.8738, Adjusted R-squared:  0.857 
## F-statistic: 51.94 on 2 and 15 DF,  p-value: 1.808e-07
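Reading the estimates off the summary, the governing equation (Price in $M; House and Lot in thousands of square feet) is given below, together with the substitution for a 4,000 sf house on a 22,000 sf lot:

```latex
\begin{aligned}
\widehat{\text{Price}} &= -0.55415 + 0.64680\,\text{House} + 0.02763\,\text{Lot} \\
\widehat{\text{Price}}(4,\,22) &= -0.55415 + 0.64680 \times 4 + 0.02763 \times 22 \\
&= -0.55415 + 2.58720 + 0.60786 \approx 2.64
\end{aligned}
```

House size dominates: each extra 1,000 sf of house adds about $0.65M with lot size held fixed (p < 0.001), while the lot coefficient is not statistically significant (p ≈ 0.28). Together the two predictors explain about 87% of the variation in price (R² = 0.87).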
s3d = scatterplot3d(x = sale.df$House,sale.df$Lot,sale.df$Price,
              xlab = "House size (1000 square feet)",
              ylab = "Lot size (1000 square feet)",
              zlab = "Price ($M)",
              main="Sale price vs house and lot size",highlight.3d=TRUE,type = "h")
s3d$plane3d(fit2) ## Overlay the fitted regression plane

## House and Lot are in thousands of square feet: 4000 sf -> 4, 22000 sf -> 22
xx = data.frame("House"=4,"Lot"=22)
pp = predict(object = fit2,xx)
cat("Forecast sale price for a house with 4000 sf and 22000 sf lot is $",round(pp,2),"M",sep = "")
## Forecast sale price for a house with 4000 sf and 22000 sf lot is $2.64M
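As a cross-check that needs neither the Excel file nor the earlier session, the sketch below rebuilds the 18 sales from the printed sale.df (assuming the printed values are the full values stored in the file), refits the same model, and uses predict() with interval = "prediction" to put a 95% prediction interval around the point forecast. The names sales, fit_check, new_house, pred and manual are illustrative.

```r
# Sale data copied from the printed sale.df (Price in $M; House and Lot in 1000 sf)
sales <- data.frame(
  Price = c(6.0, 5.8, 5.6, 3.5, 3.4, 3.4, 2.7, 2.6, 2.6,
            2.3, 2.3, 2.3, 1.9, 1.9, 1.9, 1.6, 1.6, 1.5),
  House = c(6.9, 8.0, 8.0, 3.8, 6.1, 4.3, 3.8, 5.0, 3.6,
            3.1, 3.9, 3.2, 3.5, 3.4, 3.2, 3.3, 2.3, 2.3),
  Lot   = c(42.7, 36.6, 44.0, 18.0, 27.4, 22.2, 22.0, 29.3, 31.4,
            22.2, 21.7, 24.4, 25.3, 24.0, 21.8, 6.6, 15.9, 21.2)
)

# Same specification as fit2: Price regressed on House and Lot
fit_check <- lm(Price ~ House + Lot, data = sales)

# The house to price: 4000 sf on a 22000 sf lot
new_house <- data.frame(House = 4, Lot = 22)

# Point forecast plus a 95% prediction interval for a single new sale
pred <- predict(fit_check, newdata = new_house, interval = "prediction", level = 0.95)
print(round(pred, 2))

# The same point forecast computed directly from the governing equation
b <- coef(fit_check)
manual <- b[["(Intercept)"]] + b[["House"]] * 4 + b[["Lot"]] * 22
cat("Governing equation gives $", round(manual, 2), "M\n", sep = "")
```

A prediction interval describes the likely price of one individual sale, so it is much wider than a confidence interval for the average price of all such houses; passing interval = "confidence" instead gives the latter.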