Kaggle Digit Recognizer

knn_benchmark.R baseline solution


This is the knn_benchmark.R solution taken directly from Kaggle, except where noted below.

Added: Setup and timing code

# Timestamp for the output file name, formatted as MM-DD-HHMM
tms <- format(Sys.time(), "%m-%d-%H%M")
fileout <- paste("kaggle-digit-winter14-", tms, ".csv", sep = "")
setwd("C:\\R\\digit\\src")
startClock <- proc.time()

This section is directly from the Kaggle knn_benchmark.R solution:

library(FNN)
## Warning: package 'FNN' was built under R version 2.15.3

train <- read.csv("../data/train.csv", header = TRUE)
test <- read.csv("../data/test.csv", header = TRUE)

# To test on a subset, uncomment:
# train <- train[1:100, ]
# test <- test[1:100, ]

labels <- train[, 1]
train <- train[, -1]
results <- (0:9)[knn(train, test, labels, k = 10, algorithm = "cover_tree")]
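The `(0:9)[...]` indexing in the last line works because indexing by a factor uses its integer codes (1 to 10 here), which map back onto the digits 0 to 9. The following is a minimal sketch of that step, not part of the Kaggle benchmark: `pred` is a hypothetical factor standing in for the `knn()` output, and it assumes all ten digits appear as levels.

```r
# Hypothetical stand-in for the factor returned by knn(); levels are "0".."9"
pred <- factor(c("3", "0", "9", "3"), levels = as.character(0:9))

# Indexing by a factor uses its integer codes (1..10), which recovers the digits
digits_by_index <- (0:9)[pred]

# Alternative that does not depend on every digit being present as a level
digits_by_label <- as.integer(as.character(pred))

print(digits_by_index)
print(digits_by_label)
```

If a small training subset happened to lack one of the digits, the factor would have fewer than ten levels and the index-based mapping would shift, so the label-based conversion is the safer of the two in that case.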

The Kaggle knn_benchmark.R example writes a single column of predicted digits to the output file, but Kaggle does not accept that format on submission. Two comma-separated columns are needed instead, preceded by a header row: the first column is the row number of the test image (starting at 1) and the second column is the predicted digit.

# Invalid format (single column): write(results, file='knn_benchmark.csv', ncolumns=1)
# Correct format: write the column headers first
write("ImageId,label", file = fileout, ncolumns = 1)
# Then append one "row number,digit" line per prediction
write(t(cbind(seq_along(results), results)), file = fileout, ncolumns = 2, 
    sep = ",", append = TRUE)
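The same two-column layout can also be produced from a data frame with `write.csv()`. This is only a sketch of an alternative, not the code used for the submission above: `results_demo`, `submission`, and `out_path` are hypothetical names, and the predictions are made-up values.

```r
# Hypothetical predictions standing in for `results`
results_demo <- c(2L, 0L, 9L)

# One row per test image: 1-based row number, then the predicted digit
submission <- data.frame(ImageId = seq_along(results_demo), label = results_demo)

# Hypothetical output path; tempfile() avoids overwriting a real submission
out_path <- tempfile(fileext = ".csv")
write.csv(submission, file = out_path, row.names = FALSE, quote = FALSE)

# Inspect the file as plain text
print(readLines(out_path))
```

Setting `row.names = FALSE` keeps R from adding its own index column, and `quote = FALSE` leaves the header unquoted, so the file has the same shape as the one written with `write()`.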

Added: Timing code

cat("elapsed minutes: ", (proc.time()[3] - startClock[3])/60)
## elapsed minutes:  31.25
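The timing pattern can be tried on its own without the data. In this small sketch, `Sys.sleep()` is a placeholder for the real work, and `start_clock` and `elapsed_min` are hypothetical names; selecting the `"elapsed"` entry by name is equivalent to the `[3]` index used above.

```r
# Record the starting time
start_clock <- proc.time()

# Placeholder for the real work (reading data, running knn, writing output)
Sys.sleep(0.2)

# Wall-clock time since the start, converted from seconds to minutes
elapsed_min <- (proc.time() - start_clock)[["elapsed"]] / 60
cat(sprintf("elapsed minutes: %.2f\n", elapsed_min))
```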