Below is an example of how to impute missing data with the k nearest neighbors method, using the iris data set and the knnImpute method of the preProcess function in the caret package.

First, we create a data set with some missing values. Because knnImpute only works with numeric data, we use only the first four (numeric) variables in the iris data set. We then use the prodNA function from the missForest package to set 10% of the values in those four variables to missing. The code that generated the data is commented out so that the same saved file, iris.mis.csv, is read back on every run.

#library(missForest)  # provides prodNA
#data(iris)
#iris.mis = prodNA(iris[1:4], noNA = 0.1); head(iris.mis)
#write.csv(iris.mis, "iris.mis.csv")
iris.mis = read.csv("iris.mis.csv", header = TRUE); head(iris.mis)
##   X Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 1           NA         3.5          1.4         0.2
## 2 2          4.9         3.0           NA         0.2
## 3 3          4.7         3.2          1.3          NA
## 4 4           NA         3.1          1.5         0.2
## 5 5          5.0         3.6          1.4          NA
## 6 6          5.4         3.9          1.7          NA
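The masking step described above can also be sketched in base R, without the missForest dependency. This is only an illustration of the idea (pick 10% of the cells at random and set them to NA), not a claim about how prodNA is implemented; the seed and the name iris.mis.sketch are hypothetical and chosen for this example.

```r
# Sketch: blank out 10% of the cells in the four numeric iris columns.
set.seed(12345)                       # hypothetical seed, for reproducibility only
x.mat   <- as.matrix(iris[1:4])       # 150 rows x 4 columns
n.cells <- length(x.mat)              # 600 cells in total
n.miss  <- floor(n.cells * 0.1)       # 60 cells to blank out
idx     <- sample(n.cells, n.miss)    # distinct cell positions (column-major)
x.mat[idx] <- NA
iris.mis.sketch <- as.data.frame(x.mat)

sum(is.na(iris.mis.sketch))           # 60
mean(is.na(iris.mis.sketch))          # 0.1
```

Because the cells are sampled over the whole table, individual columns will usually not contain exactly 10% missing values; only the overall proportion is fixed.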

Then we set the seed for reproducibility and load the caret package. The specific method we are using from the caret package is knnImpute, which uses the k nearest neighbors algorithm. For each row with a missing value, the algorithm finds the rows that are most similar on the non-missing variables and uses their values to predict the missing one. With knnImpute, caret automatically centers and scales the values, because the creators state that standardization almost always produces more accurate results; without it, variables measured on larger scales would dominate the distance calculation.
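To make the idea concrete, here is a minimal base R sketch of k nearest neighbors imputation on standardized data. It is an illustration under stated assumptions, not caret's actual code: the function name knn.impute.sketch is hypothetical, neighbors are drawn only from rows with no missing values, distance is Euclidean over the columns observed in the incomplete row, and missing cells are filled with the mean of the k neighbors.

```r
# Sketch of k-nearest-neighbour imputation on standardized data (base R only).
knn.impute.sketch <- function(df, k = 5) {
  z <- scale(as.matrix(df))                          # center and scale; NAs are skipped
  complete <- z[complete.cases(z), , drop = FALSE]   # candidate neighbours
  out <- z
  for (i in which(!complete.cases(z))) {
    obs <- !is.na(z[i, ])                            # columns observed in row i
    if (!any(obs)) next                              # nothing to match on; leave as NA
    # Euclidean distance to every complete row, using observed columns only
    diffs <- sweep(complete[, obs, drop = FALSE], 2, z[i, obs])
    d  <- sqrt(rowSums(diffs^2))
    nn <- order(d)[seq_len(min(k, nrow(complete)))]  # indices of the k closest rows
    # fill each missing cell with the neighbours' mean for that column
    out[i, !obs] <- colMeans(complete[nn, !obs, drop = FALSE])
  }
  as.data.frame(out)
}

# Usage: blank two cells by hand, then impute them.
demo <- iris[1:4]
demo[1, "Sepal.Length"] <- NA
demo[2, "Petal.Length"] <- NA
filled <- knn.impute.sketch(demo, k = 5)
anyNA(filled)                                        # FALSE
head(filled, 3)
```

As with the caret output, the result is on the standardized scale rather than in the original units.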

To build the model that will be used to predict the missing values, we use the preProcess function. It takes the data and the preprocessing method we want to apply, which in this case is "knnImpute".

set.seed(12345)
library(caret)
iris.mis.model = preProcess(iris.mis, method = "knnImpute")

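Finally, we use the predict function to fill in the missing values using the model we just built. Note that the returned values are centered and scaled rather than in the original units, and that the X column (the row index that write.csv saved and read.csv read back) is standardized and treated as a variable along with the four measurements.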

iris.mis.pred = predict(iris.mis.model, iris.mis); head(iris.mis.pred)
##           X Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 -1.714797   -0.9749811  0.97343759    -1.320918   -1.295742
## 2 -1.691780   -1.1682466 -0.15105066    -1.320918   -1.295742
## 3 -1.668762   -1.4098285  0.29874464    -1.377452   -1.321792
## 4 -1.645745   -1.4339867  0.07384699    -1.264384   -1.295742
## 5 -1.622728   -1.0474557  1.19833524    -1.320918   -1.191542
## 6 -1.599710   -0.5642920  1.87302820    -1.151316   -1.191542
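Since the imputed values above are z-scores, reading them in the original units requires reversing the standardization: multiply each column by its standard deviation and add its mean. The base R sketch below shows that arithmetic on the complete iris data; the names mu, sdv, z, and back are hypothetical. With caret, the corresponding means and standard deviations should be available in the fitted object as iris.mis.model$mean and iris.mis.model$std, but treat that as an assumption to check against the caret documentation.

```r
# Sketch: standardize, then undo the standardization (x = z * sd + mean).
x   <- iris[1:4]
mu  <- colMeans(x, na.rm = TRUE)               # per-column means
sdv <- apply(x, 2, sd, na.rm = TRUE)           # per-column standard deviations

z    <- scale(x, center = mu, scale = sdv)     # z-scores, as in the output above
back <- sweep(sweep(z, 2, sdv, `*`), 2, mu, `+`)   # back to original units

all.equal(as.matrix(x), back, check.attributes = FALSE)   # TRUE
```

If the row index X is not meant to influence the imputation, it can be dropped before calling preProcess, for example with iris.mis[ , -1].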