Demonstration of Mean Imputation with R

1. Create a “True” Data set

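# 100 draws from a normal distribution with mean 100 and sd 15, rounded to integers.
# No seed is set, so rerunning this gives different values than those printed below.
TrueX <- rnorm(100,100,15)
TrueX <- round(TrueX)
TrueX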
##   [1] 123  92  98 107  95 122 124 105 120  86  90 115 102 103 103  72 121  93
##  [19] 117  96 113 123 104 105 105 109  92 129 110 103 134 104  95  97 118 107
##  [37] 124  93 105  82 100  77 117  91 114  96 100 107  85  86  93 108  96 135
##  [55]  85 123  76  82  99 119 102  89  76  93  89  97 107  98  76  93  91 107
##  [73]  64 103  95 120 132  94  94  85 108 130 106  82 102  88 102  88 110  91
##  [91]  86  89  80 124  79 103  89 115  95  64

2. Summary and Confidence Interval (True Data)

summary(TrueX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   64.00   90.75  100.00  100.61  109.25  135.00
t.test(TrueX,mu=100)
## 
##  One Sample t-test
## 
## data:  TrueX
## t = 0.39547, df = 99, p-value = 0.6933
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
##   97.54943 103.67057
## sample estimates:
## mean of x 
##    100.61

3. Create a data set with some values missing

# "part" for partial: copy TrueX and set 10 randomly chosen positions to NA
index <- sample(1:100,10)
partX <- TrueX
partX[index] <- NA

partX
##   [1] 123  92  98 107  95 122 124 105 120  86  90  NA 102 103 103  NA 121  93
##  [19] 117  NA 113 123 104  NA 105 109  92 129 110 103 134 104  NA  97  NA 107
##  [37] 124  93 105  82 100  77 117  91 114  96 100 107  85  86  93 108  96 135
##  [55]  85 123  76  82  99 119 102  89  76  93  89  97 107  98  76  93  91 107
##  [73]  64 103  95 120 132  94  94  85 108 130 106  82 102  NA 102  88 110  91
##  [91]  86  89  80 124  79  NA  NA 115  NA  64

4. Summary and Confidence Interval (Partial Data)

summary(partX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    64.0    91.0   101.0   100.9   109.8   135.0      10
t.test(partX,mu=100)
## 
##  One Sample t-test
## 
## data:  partX
## t = 0.57232, df = 89, p-value = 0.5685
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
##   97.66552 104.22337
## sample estimates:
## mean of x 
##  100.9444
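
The df = 89 above reflects that t.test() drops the NA values before testing, so only the 90 observed values are used. A minimal sketch of that behavior on a hypothetical toy vector (z is made up for illustration and is not the partX above):

```r
# Hypothetical toy vector with 5 observed values and 2 NAs
z <- c(98, 105, NA, 110, 93, NA, 101)

a <- t.test(z, mu = 100)              # NAs are removed internally
b <- t.test(z[!is.na(z)], mu = 100)   # NAs removed by hand

a$parameter                           # df = 4, i.e. 5 observed values minus 1
all.equal(a$statistic, b$statistic)   # TRUE: both calls test the same data
```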

5. Create an Imputation Value (mean or median)

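# Wrong: partX contains NAs, so without na.rm=TRUE the result is NA
imputeValue <- median(partX)
# Right: drop the NAs before computing the mean
imputeValue <- mean(partX,na.rm=TRUE)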
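
The first assignment in this step fails because of the missing na.rm argument, not because the median is used. A short sketch on a hypothetical toy vector (x is made up for illustration) shows the difference:

```r
# Hypothetical toy vector, not the partX from above
x <- c(10, 20, NA, 40)

median(x)                  # NA: the missing value propagates
median(x, na.rm = TRUE)    # 20
mean(x, na.rm = TRUE)      # 23.33333
```

Either the mean or the median can serve as the imputation value, as long as na.rm = TRUE is supplied.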

6. Impute this value into the data

# Work on a copy so partX keeps its NAs
imputedX <- partX

imputedX[is.na(imputedX)] <- imputeValue
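
One way to check an imputation like this, sketched on a hypothetical toy vector (y, fill, and filled are illustrative names, not part of the demonstration): no NAs should remain, and the mean should be unchanged. The sketch uses base R's replace(), which does the same job as the logical-index assignment above.

```r
# Hypothetical toy vector with two missing values
y <- c(10, 20, NA, 40, NA)

fill   <- mean(y, na.rm = TRUE)        # mean of the observed values
filled <- replace(y, is.na(y), fill)   # same effect as y[is.na(y)] <- fill

sum(is.na(filled))                     # 0: nothing is missing any more
all.equal(mean(filled), fill)          # TRUE: mean imputation keeps the mean
```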

7. Summary and Confidence Interval (Imputed Data)

summary(imputedX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   64.00   91.75  100.94  100.94  108.00  135.00
t.test(imputedX,mu=100)
## 
##  One Sample t-test
## 
## data:  imputedX
## t = 0.63627, df = 99, p-value = 0.5261
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
##   97.99917 103.88972
## sample estimates:
## mean of x 
##  100.9444
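
Comparing steps 4 and 7: the mean is the same (100.9444), but the confidence interval narrows from a width of about 6.56 (partial data) to about 5.89 (imputed data), and the degrees of freedom return to 99 even though only 90 values were actually observed. The sketch below reproduces this comparison on freshly simulated data. The seed and the names full, part, imputed, and ci_width are assumptions made for illustration, so the numbers will not match the output above.

```r
set.seed(1)   # arbitrary seed, only to make this sketch reproducible

full <- round(rnorm(100, 100, 15))
part <- full
part[sample(1:100, 10)] <- NA
imputed <- replace(part, is.na(part), mean(part, na.rm = TRUE))

# Width of the 95% confidence interval reported by t.test()
ci_width <- function(v) diff(t.test(v)$conf.int)

# Spread and interval width before and after mean imputation
c(sd_observed = sd(part, na.rm = TRUE), sd_imputed = sd(imputed))
c(width_observed = ci_width(part), width_imputed = ci_width(imputed))
```

The imputed values sit exactly at the mean, so they add no squared deviation while increasing the sample size. The standard deviation and the interval width both shrink, which overstates how precisely the mean is known.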