Competition Details

Sold! How do home features add up to its price tag?

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Data Loading

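# Helper that downloads an R script over HTTPS and evaluates it (requires RCurl).
# It is defined here but not called below; source() reads the URL directly.
source_github <- function(u) {
  suppressWarnings(require(RCurl))
  script <- getURL(u, ssl.verifypeer = FALSE)
  eval(parse(text = script))
}
# Load the helper functions used later in this document.
suppressWarnings(source("https://raw.githubusercontent.com/RobertSellers/r_Scripts/master/regression_suggestions.R"))

# dplyr supplies %>%, select() and select_if(), which are used below.
library(dplyr)

# Training data; the Id column is dropped so it is not used as a predictor.
train <- read.csv("https://raw.githubusercontent.com/RobertSellers/605_MATH/master/final/train.csv", header = TRUE)
train <- train %>% select(-Id)

# Test data; Id is kept so each prediction can be matched to its house.
test <- read.csv("https://raw.githubusercontent.com/RobertSellers/605_MATH/master/final/test.csv", header = TRUE)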

Data Exploration

Predictor Variable Analysis

The following script is disabled. When run, it writes a .txt file of predictor variable interpretations, which is useful when deciding on transformations.

Example variable output from suggest function.

Numeric Predictor Analysis

Only numeric variables are shown to save on document space.

predictorPlots(train,train$SalePrice)
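
predictorPlots() comes from the sourced helper script, whose code is not reproduced in this document. As a rough stand-in, the sketch below assumes the goal is one scatterplot per numeric predictor against the response. The function name plot_numeric_predictors and the use of the built-in mtcars data are illustrative assumptions, not the author's implementation.

```r
# Hypothetical stand-in for predictorPlots(); illustrated on mtcars, not the Ames data.
plot_numeric_predictors <- function(data, response_name) {
  # Keep numeric columns and drop the response from the predictor list.
  numeric_cols <- names(data)[sapply(data, is.numeric)]
  predictors <- setdiff(numeric_cols, response_name)

  # Draw a 2 x 2 grid of panels and restore the old layout on exit.
  old_par <- par(mfrow = c(2, 2))
  on.exit(par(old_par))

  for (p in predictors) {
    plot(data[[p]], data[[response_name]],
         xlab = p, ylab = response_name, main = p)
    # Overlay a simple least-squares line as a visual guide.
    abline(lm(data[[response_name]] ~ data[[p]]), col = "red")
  }
  invisible(predictors)
}

plotted <- plot_numeric_predictors(mtcars[, c("mpg", "wt", "hp", "disp", "qsec")], "mpg")
```

Curvature or fanning in such panels is the kind of pattern that would suggest transforming a predictor before fitting.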

Model Selection

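The test data received the same treatment as the training data: only numeric columns were kept and missing values were replaced with 0. Predictors were then chosen by stepwise selection in both directions with step(), which compares candidate models by AIC. The search starts from the intercept-only model and uses the model with all numeric predictors as its upper bound.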

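# Keep numeric columns only and replace missing values with 0.
# Note: this name masks stats::reshape for the rest of the session.
reshape <- function(data) {
  subset.int <- select_if(data, is.numeric)
  subset.int[is.na(subset.int)] <- 0
  # After the zero fill no NA remains, so complete.cases() keeps every row.
  subset.int <- subset.int[complete.cases(subset.int), ]
  return(subset.int)
}
trainMod <- reshape(train)
testMod <- reshape(test)

# Bounds of the search: the full model and the intercept-only model.
all <- lm(SalePrice ~ ., trainMod)
null <- lm(SalePrice ~ 1, trainMod)

# Stepwise selection in both directions, starting from the null model.
stepResults <- step(null, scope = list(lower = null, upper = all), direction = "both", trace = 0)

# Score the test set; the first column holds the house Id, not an observed price.
result <- data.frame('Id' = testMod$Id, 'Predicted' = predict(stepResults, testMod))

# Compare the density of the training prices with that of the predictions.
par(mfrow = c(2, 1))
plot(density(trainMod$SalePrice))
plot(density(na.omit(result$Predicted)))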
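
The stepwise search can also be tried without downloading the competition files. The sketch below is a minimal illustration on the built-in mtcars data, with mpg standing in for SalePrice. The object names null_fit, full_fit, step_fit and new_rows are assumptions made for this example.

```r
# Toy illustration on the built-in mtcars data (not the Ames data).
data(mtcars)

# The intercept-only and full models bound the search.
null_fit <- lm(mpg ~ 1, data = mtcars)
full_fit <- lm(mpg ~ ., data = mtcars)

# Stepwise search in both directions; step() compares candidates by AIC.
step_fit <- step(null_fit,
                 scope = list(lower = formula(null_fit),
                              upper = formula(full_fit)),
                 direction = "both", trace = 0)

# Terms kept by the search, and the AIC of the null and selected models.
print(formula(step_fit))
print(AIC(null_fit, step_fit))

# Score new rows; they must carry the columns the model was fitted on.
new_rows <- head(mtcars, 3)
print(predict(step_fit, newdata = new_rows))
```

Because step() only accepts a move that lowers the AIC, the selected model never has a higher AIC than the starting model. A lower AIC does not by itself guarantee better predictions on unseen data.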