This simulation shows the difference between two variants of regression imputation.
Deterministic: each missing value is replaced by the value predicted from a regression model fitted on the data, so the imputed values lie exactly on the fitted regression surface and carry none of the residual scatter of the observed values.
Stochastic: an improvement on the deterministic method that aims to preserve the variability of the data by adding a random error (residual) term to each predicted value.
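The two definitions can be sketched in base R on a small simulated data set. This is a minimal illustration that assumes a single hypothetical predictor `x` and one incomplete variable `y`, both generated with `rnorm()`. It is not how `mice` implements `norm.predict` or `norm.nob` internally, since `mice` cycles over all incomplete variables.

```r
set.seed(42)                                   # reproducible simulation
n <- 200
x <- rnorm(n, mean = 120, sd = 30)             # hypothetical predictor
y <- 5 + 0.8 * x + rnorm(n, sd = 15)           # hypothetical outcome
miss <- sample(n, 50)                          # indices that will be made missing
y_obs <- y
y_obs[miss] <- NA

obs  <- data.frame(x = x, y = y_obs)
fit  <- lm(y ~ x, data = obs)                  # rows with NA in y are dropped (na.omit)
pred <- predict(fit, newdata = obs[miss, ])    # regression predictions for the missing rows

# Deterministic: the imputed value is the prediction itself
y_det <- y_obs
y_det[miss] <- pred

# Stochastic: prediction plus a random residual drawn from N(0, sigma^2),
# where sigma is the residual standard error of the fitted model
y_sto <- y_obs
y_sto[miss] <- pred + rnorm(length(miss), mean = 0, sd = sigma(fit))
```

The deterministic imputations sit exactly on the fitted line, while the stochastic ones scatter around it with roughly the residual spread. Like `mice`'s `norm.nob`, this sketch ignores uncertainty in the regression coefficients themselves.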
# Install and load the R package mice
install.packages("mice") # Needs to be done only once
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/anilp/OneDrive/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/mice_3.6.0.zip'
Content type 'application/zip' length 1855619 bytes (1.8 MB)
downloaded 1.8 MB
package ‘mice’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\anilp\AppData\Local\Temp\RtmpuYzw6r\downloaded_packages
library("mice") # Load package
Loading required package: lattice
Attaching package: ‘mice’
The following objects are masked from ‘package:base’:
cbind, rbind
# Read the semicolon-separated test file (read.csv already returns a data frame); mice only imputes cells that are NA
data <- read.csv(file = "C:/Development/diabetes_test.csv", header = TRUE, sep = ";")
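Because `mice` only imputes cells that are `NA`, it helps to count the missing values per column before imputing. If a version of the file codes missing measurements as 0 instead of leaving the cells empty, those zeros would have to be converted to `NA` first, since `mice` treats any non-`NA` value as observed. A minimal sketch on a hypothetical toy data frame (the column names echo the diabetes file; the values are made up for illustration):

```r
# Hypothetical toy data, not taken from diabetes_test.csv
toy <- data.frame(
  Glucose = c(148, NA, 183, 89),
  Insulin = c(NA, NA, 94, 168),
  BMI     = c(33.6, 26.6, NA, 28.1)
)

na_counts <- colSums(is.na(toy))   # number of NA cells in each column
print(na_counts)
```

On the real data, `colSums(is.na(data))` gives the same summary, and `md.pattern(data)` from mice tabulates which combinations of variables are missing together.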
# Deterministic regression imputation
imp <- mice(data, method = "norm.predict", m = 1) # norm.predict: impute the linear-regression prediction (single imputation, m = 1)
iter imp variable
1 1 Glucose BloodPressure SkinThickness Insulin BMI
2 1 Glucose BloodPressure SkinThickness Insulin BMI
3 1 Glucose BloodPressure SkinThickness Insulin BMI
4 1 Glucose BloodPressure SkinThickness Insulin BMI
5 1 Glucose BloodPressure SkinThickness Insulin BMI
data_det <- complete(imp) # Store data
# Stochastic regression imputation
imp <- mice(data, method = "norm.nob", m = 1) # norm.nob: regression prediction plus random residual noise (coefficient uncertainty is ignored)
iter imp variable
1 1 Glucose BloodPressure SkinThickness Insulin BMI
2 1 Glucose BloodPressure SkinThickness Insulin BMI
3 1 Glucose BloodPressure SkinThickness Insulin BMI
4 1 Glucose BloodPressure SkinThickness Insulin BMI
5 1 Glucose BloodPressure SkinThickness Insulin BMI
data_sto <- complete(imp) # Store data
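The claim that stochastic imputation preserves variability can be checked numerically. A minimal sketch on simulated data (hypothetical variables `x` and `y`, regenerated here so the block runs on its own): deterministic imputation shrinks the variance of the completed variable and inflates its correlation with the predictor, while adding residual noise keeps both close to the full-data values. The exact numbers depend on the seed, so none are quoted here.

```r
set.seed(1)
n <- 2000
x <- rnorm(n, mean = 120, sd = 30)             # hypothetical predictor
y <- 5 + 0.8 * x + rnorm(n, sd = 15)           # hypothetical complete outcome
miss <- sample(n, 600)                         # 30% missing completely at random
d <- data.frame(x = x, y = y)
d$y[miss] <- NA

fit  <- lm(y ~ x, data = d)                    # complete-case regression
pred <- predict(fit, newdata = d[miss, ])

y_det <- d$y                                   # deterministic: prediction only
y_det[miss] <- pred
y_sto <- d$y                                   # stochastic: prediction + residual noise
y_sto[miss] <- pred + rnorm(length(miss), sd = sigma(fit))

# Compare spread and association against the full (never-missing) data
summary_tab <- data.frame(
  data     = c("full", "deterministic", "stochastic"),
  variance = c(var(y), var(y_det), var(y_sto)),
  cor_xy   = c(cor(x, y), cor(x, y_det), cor(x, y_sto))
)
print(summary_tab)
```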
# Graphical comparison of deterministic and stochastic regression imputation
par(mfrow = c(1, 2)) # Both plots in one graphic
# Deterministic regression imputation plot
plot(data_det$Glucose[!is.na(data$Insulin)], data_det$Insulin[!is.na(data$Insulin)], # Observed Insulin (x uses completed Glucose, which also had NAs)
     xlim = c(40, 200), ylim = c(10, 900),
     main = "Deterministic Regression",
     xlab = "Glucose", ylab = "Insulin")
points(data_det$Glucose[is.na(data$Insulin)], data_det$Insulin[is.na(data$Insulin)], # Imputed Insulin values
       col = "red")
abline(lm(Insulin ~ Glucose, data = data_det), col = "#1b98e0", lwd = 1.5) # Regression line fitted to the completed data
# Stochastic regression imputation plot
plot(data_sto$Glucose[!is.na(data$Insulin)], data_sto$Insulin[!is.na(data$Insulin)], # Observed Insulin (x uses completed Glucose, which also had NAs)
     xlim = c(40, 200), ylim = c(10, 900),
     main = "Stochastic Regression",
     xlab = "Glucose", ylab = "Insulin")
points(data_sto$Glucose[is.na(data$Insulin)], data_sto$Insulin[is.na(data$Insulin)], # Imputed Insulin values
       col = "red")
abline(lm(Insulin ~ Glucose, data = data_sto), col = "#1b98e0", lwd = 1.5) # Regression line fitted to the completed data
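A detail worth knowing when drawing a regression line for imputed data: `lm()` looks up formula variables in its `data` argument first, but an expression such as `data$Insulin` is evaluated by finding an object called `data`. Unless the data frame has a column with that name, the lookup falls back to the formula's environment, where `data` is the original data frame, so the `data` argument has no effect. A minimal sketch with two hypothetical data frames `a` and `b`:

```r
a <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))   # y = 2x
b <- data.frame(x = c(1, 2, 3, 4), y = c(3, 6, 9, 12))  # y = 3x

fit_prefixed <- lm(a$y ~ a$x, data = b)  # `a$...` is evaluated from `a`; `data = b` is effectively ignored
fit_bare     <- lm(y ~ x, data = b)      # bare names are looked up in `b`

coef(fit_prefixed)  # slope 2: fitted to a
coef(fit_bare)      # slope 3: fitted to b
```

Applied to the plots above, `lm(Insulin ~ Glucose, data = data_det)` fits the line to the completed data, whereas prefixing the columns with `data$` fits it to the complete cases of the original data.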