Machine Learning and Kaggle.

Following the instructions, I read the data, run the code, generate my predictions, and submit them to Kaggle. I then take a picture of my submission.

# Install any required packages that are missing, then load them all
necessaryPackages <- c("tidyverse", "rpart", "rattle")
new.packages <- necessaryPackages[!(necessaryPackages %in% installed.packages()[, "Package"])]
if (length(new.packages)) {
  install.packages(new.packages, repos = "http://cran.us.r-project.org")
}
lapply(necessaryPackages, require, character.only = TRUE)
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] TRUE
# The files are comma-separated, so read.csv() applies directly
train <- read.csv("data/train.csv", stringsAsFactors = TRUE)
test <- read.csv("data/test.csv", stringsAsFactors = TRUE)
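# Replace missing factor values with the most frequent level
replace_na_most <- function(x) {
  fct_explicit_na(x, na_level = names(which.max(table(x))))
}

# Replace missing numeric values with the column median
replace_na_med <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Minimal cleanup: impute every factor and numeric column
cleanup_minimal <- function(data) {
  data %>%
    mutate_if(is.factor, replace_na_most) %>%
    mutate_if(is.numeric, replace_na_med)
}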

train_minclean <- cleanup_minimal(train)

test_minclean <- cleanup_minimal(test)
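
To see what the minimal cleanup does, here is a small sketch on made-up data. It uses base R only, so `fill_median` and `fill_mode` are hypothetical stand-ins for the helpers above rather than the same functions, and the `toy` columns are invented for illustration.

```r
# Toy data (invented values, not taken from the Kaggle files)
toy <- data.frame(
  area = c(10, NA, 30, 50),
  zone = factor(c("RL", "RL", NA, "RM"))
)

# Median imputation for a numeric vector
fill_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Mode imputation for a factor: table() ignores NA by default,
# so which.max() picks the most frequent observed level
fill_mode <- function(x) {
  x[is.na(x)] <- names(which.max(table(x)))
  x
}

toy$area <- fill_median(toy$area)  # NA becomes 30, the median of 10, 30, 50
toy$zone <- fill_mode(toy$zone)    # NA becomes "RL", the most frequent level
print(toy)
```

Note that applying the cleanup to the training and test sets separately means each set is filled with its own medians and most frequent levels.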


mod_rpart <- rpart(SalePrice~., data = train_minclean)
fancyRpartPlot(mod_rpart, caption = NULL)

pred_rpart <- predict(mod_rpart, newdata = test_minclean)

submission_rpart <- tibble(Id = test$Id, SalePrice = pred_rpart)

## These are my predictions
write_csv(submission_rpart, file = "submission_rpart.csv")

head(submission_rpart)
## # A tibble: 6 x 2
##      Id SalePrice
##   <int>     <dbl>
## 1  1461   118199.
## 2  1462   151246.
## 3  1463   185210.
## 4  1464   185210.
## 5  1465   249392.
## 6  1466   185210.
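
Before uploading, it may help to check the structure of the submission table. The sketch below is one possible set of checks, not a Kaggle requirement list; `check_submission` and `toy_sub` are hypothetical names, and the toy prices are rounded values resembling the output above.

```r
# Hypothetical helper: basic structural checks on a two-column submission
check_submission <- function(sub, expected_ids) {
  stopifnot(
    identical(names(sub), c("Id", "SalePrice")),  # exactly these two columns
    nrow(sub) == length(expected_ids),            # one row per test Id
    !anyDuplicated(sub$Id),                       # no repeated Ids
    all(sub$Id == expected_ids),                  # same Ids, same order
    !any(is.na(sub$SalePrice)),                   # no missing predictions
    all(sub$SalePrice > 0)                        # sale prices should be positive
  )
  invisible(TRUE)
}

# Toy submission with three rows
toy_sub <- data.frame(
  Id = 1461:1463,
  SalePrice = c(118199, 151246, 185210)
)
check_submission(toy_sub, expected_ids = 1461:1463)
```

With the objects from this report, the call would presumably be `check_submission(submission_rpart, test$Id)`; it stops with an error at the first check that fails.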

After I submit my two-column predictions to Kaggle, I get the following rank.

Thank you so much, Professor, for the wonderful class! Have a great summer!