#Introduction

This report builds a linear regression model to predict the sale price of residential properties in Ames, Iowa. The analysis covers data cleaning, transformation, modeling, and performance evaluation.

Bathrooms are not the only feature that contributes significantly to higher sale prices; overall quality (OverallQual), above-ground living area (GrLivArea), and neighborhood (Neighborhood) do as well.

#Data Description

The data used in this analysis covers Ames, Iowa, from 2006 to 2010 and comes from the town's Assessor’s Office. Rows with a missing “SalePrice” have been removed completely.

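library(dplyr)    # provides %>% and filter()
library(ggplot2)  # provides ggplot()
library(knitr)    # provides kable()

ames <- read.csv("ames.csv")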

#Data Cleaning and Transformation

ames <- ames %>% 
  filter(!is.na(SalePrice))

ames$Neighborhood <- as.factor(ames$Neighborhood)

ames$TotalBathrooms <- ames$FullBath + 0.5 * ames$HalfBath

ames$LogSalePrice <- log(ames$SalePrice)

model <- lm(LogSalePrice ~ GrLivArea + OverallQual + Neighborhood, data = ames)

prediction <- predict(model, newdata = ames)

rmse <- sqrt(mean((prediction - ames$LogSalePrice)^2))

r2 <- summary(model)$r.squared
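
The RMSE and R-squared computed above are in-sample: the model is scored on the same rows it was fitted to, which tends to give an optimistic picture. A minimal sketch of a held-out check follows. It runs on simulated data so that it is self-contained; the `sim` data frame, its coefficients, the neighborhood labels, and the 80/20 split are all illustrative assumptions and not part of the original analysis.

```r
# Simulated stand-in for the Ames data (hypothetical values, for illustration only)
set.seed(42)
n <- 500
sim <- data.frame(
  GrLivArea    = round(runif(n, 600, 3000)),
  OverallQual  = sample(1:10, n, replace = TRUE),
  Neighborhood = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
sim$LogSalePrice <- 10.5 + 0.0003 * sim$GrLivArea + 0.1 * sim$OverallQual +
  0.1 * as.integer(sim$Neighborhood) + rnorm(n, sd = 0.15)

# Hold out 20% of the rows for testing
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit on the training rows only, then score on the unseen rows
fit <- lm(LogSalePrice ~ GrLivArea + OverallQual + Neighborhood, data = train)
test_pred  <- predict(fit, newdata = test)
train_rmse <- sqrt(mean(residuals(fit)^2))
test_rmse  <- sqrt(mean((test$LogSalePrice - test_pred)^2))

print(c(train = train_rmse, test = test_rmse))
```

Applied to the real data, the same pattern would split `ames` before calling `lm()`. A test RMSE close to the training RMSE would suggest the model is not overfitting.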

#Key Findings

- GrLivArea: larger living areas lead to higher sale prices, as we expected
- OverallQual: higher quality ratings strongly correlate with higher prices
- Neighborhood: certain neighborhoods are associated with higher prices
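
Because the response is log(SalePrice), each coefficient describes a multiplicative effect on price: a coefficient b corresponds to roughly a 100 * (exp(b) - 1) percent change in sale price per one-unit increase in the predictor, holding the others fixed. The sketch below shows the conversion. The helper `pct_change()` and the coefficient values 0.10 and 0.0003 are hypothetical, chosen only to illustrate the arithmetic; they are not estimates from this report.

```r
# Convert a log-scale coefficient into an approximate percent change in price
pct_change <- function(b) 100 * (exp(b) - 1)

# Hypothetical coefficient of 0.10 on a quality score:
# effect of a one-point increase in the score
qual_effect <- pct_change(0.10)

# Hypothetical coefficient of 0.0003 per square foot:
# effect of 100 additional square feet
area_effect <- pct_change(0.0003 * 100)

print(round(c(quality = qual_effect, area_100sqft = area_effect), 2))
```

With the fitted model from this report, the same conversion could be applied to the actual estimates, for example `pct_change(coef(model)["OverallQual"])`.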

#Model Performance

kable(data.frame(RMSE = rmse, R_squared = r2))

| RMSE | R_squared |
|---|---|
| 0.1664275 | 0.8332147 |
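
The RMSE above is measured on the log scale, so it is not a dollar amount. One rough way to read it is as a multiplicative error: exponentiating the RMSE gives the factor by which a prediction that is off by one RMSE differs from the actual price. The sketch below works through that arithmetic using the reported value; the variable names are illustrative, and the percentages are an approximate interpretation rather than an exact error bound.

```r
# RMSE on the log(SalePrice) scale, as reported in the table above
rmse_log <- 0.1664275

# An error of one RMSE on the log scale is a multiplicative factor on the price scale
typical_factor <- exp(rmse_log)

# Expressed as percentages above and below the actual price
pct_over  <- 100 * (typical_factor - 1)
pct_under <- 100 * (1 - 1 / typical_factor)

print(round(c(factor = typical_factor, pct_over = pct_over, pct_under = pct_under), 2))
```

Read this way, a prediction that misses by one RMSE is roughly 18% too high or 15% too low on the original price scale.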

#Plot

ggplot(data = NULL, aes(x = prediction, y = ames$LogSalePrice - prediction)) +  # residual = observed - predicted
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Predicted Values", y = "Residuals", title = "Residuals vs. Predicted Values")

###URL https://rpubs.com/ef0052/1300945