Demonstration of Mean Imputation with R
1. Create a “True” Data set
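# 100 draws from a normal distribution with mean 100 and sd 15,
# rounded to whole numbers
TrueX <- rnorm(100, mean = 100, sd = 15)
TrueX <- round(TrueX)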
TrueX
## [1] 123 92 98 107 95 122 124 105 120 86 90 115 102 103 103 72 121 93
## [19] 117 96 113 123 104 105 105 109 92 129 110 103 134 104 95 97 118 107
## [37] 124 93 105 82 100 77 117 91 114 96 100 107 85 86 93 108 96 135
## [55] 85 123 76 82 99 119 102 89 76 93 89 97 107 98 76 93 91 107
## [73] 64 103 95 120 132 94 94 85 108 130 106 82 102 88 102 88 110 91
## [91] 86 89 80 124 79 103 89 115 95 64
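Because `rnorm()` draws fresh random numbers on every run, the values printed above will not reappear when the code is rerun. A minimal sketch of a reproducible variant of step 1, assuming a fixed seed (the value 2024 and the names `seededX` and `againX` are arbitrary choices, not from the original):

```r
# Fix the random number generator state so the draw can be repeated.
# The seed value is an arbitrary assumption, not part of the original demo.
set.seed(2024)

# 100 draws from N(mean = 100, sd = 15), rounded to whole numbers
seededX <- round(rnorm(100, mean = 100, sd = 15))

# Resetting the same seed reproduces exactly the same vector
set.seed(2024)
againX <- round(rnorm(100, mean = 100, sd = 15))
identical(seededX, againX)   # TRUE
```

Calling `set.seed()` once at the top of the script would also make the `sample()` call in step 3 repeatable.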
2. Summary and Confidence Interval
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 64.00 90.75 100.00 100.61 109.25 135.00
##
## One Sample t-test
##
## data: TrueX
## t = 0.39547, df = 99, p-value = 0.6933
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
## 97.54943 103.67057
## sample estimates:
## mean of x
## 100.61
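The code behind this output is not shown. The output is consistent with `summary()` followed by a one-sample `t.test()` with `mu = 100`, the null value named in the output. A sketch on a short stand-in vector (`toyX`, the first ten values printed in step 1), which also rebuilds the 95 percent interval by hand:

```r
# Stand-in data: the first ten values printed in step 1
toyX <- c(123, 92, 98, 107, 95, 122, 124, 105, 120, 86)

summary(toyX)                   # quartiles, min, max, and mean
res <- t.test(toyX, mu = 100)   # one-sample t-test against a mean of 100
res$conf.int                    # 95 percent confidence interval for the mean

# The same interval by hand: mean +/- t quantile * standard error
n  <- length(toyX)
se <- sd(toyX) / sqrt(n)
manual <- mean(toyX) + c(-1, 1) * qt(0.975, df = n - 1) * se
all.equal(manual, as.numeric(res$conf.int))   # TRUE
```

For the full data the calls would presumably be `summary(TrueX)` and `t.test(TrueX, mu = 100)`.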
3. Create a data set with some values missing
# "part" for partial
index <- sample(1:100, 10)   # 10 distinct positions, drawn without replacement
partX <- TrueX
partX[index] <- NA           # blank out the values at those positions
partX
## [1] 123 92 98 107 95 122 124 105 120 86 90 NA 102 103 103 NA 121 93
## [19] 117 NA 113 123 104 NA 105 109 92 129 110 103 134 104 NA 97 NA 107
## [37] 124 93 105 82 100 77 117 91 114 96 100 107 85 86 93 108 96 135
## [55] 85 123 76 82 99 119 102 89 76 93 89 97 107 98 76 93 91 107
## [73] 64 103 95 120 132 94 94 85 108 130 106 82 102 NA 102 88 110 91
## [91] 86 89 80 124 79 NA NA 115 NA 64
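`sample(1:100, 10)` draws ten distinct positions without replacement, so exactly ten values are blanked, and which ones are lost does not depend on the values themselves. A small check of that behavior on a stand-in vector (`toyX`, `toyIndex`, and `toyPart` are illustrative names, and the seed is arbitrary):

```r
# Stand-in data: the first ten values printed in step 1
toyX <- c(123, 92, 98, 107, 95, 122, 124, 105, 120, 86)

set.seed(1)                               # arbitrary seed, for repeatability only
toyIndex <- sample(seq_along(toyX), 3)    # 3 distinct positions
toyPart <- toyX
toyPart[toyIndex] <- NA

sum(is.na(toyPart))                          # 3
setequal(which(is.na(toyPart)), toyIndex)    # TRUE: NAs sit exactly at the sampled positions
all(toyPart[-toyIndex] == toyX[-toyIndex])   # TRUE: every other value is untouched
```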
4. Summary and Confidence Interval
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 64.0 91.0 101.0 100.9 109.8 135.0 10
##
## One Sample t-test
##
## data: partX
## t = 0.57232, df = 89, p-value = 0.5685
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
## 97.66552 104.22337
## sample estimates:
## mean of x
## 100.9444
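Two details are worth spelling out here. First, the test above reports df = 89, which indicates that `t.test()` dropped the ten missing values. Second, the step that defines `imputeValue` is not shown; given the title and the fact that the imputed data in step 7 have the same mean (100.9444), it is presumably the mean of the observed values. A sketch of both points on a stand-in vector (`toyPart` and `toyValue` are illustrative names):

```r
# Stand-in data with two missing values
toyPart <- c(123, 92, NA, 107, 95, NA, 124)

# mean() propagates NA unless told to remove missing values
mean(toyPart)                             # NA
toyValue <- mean(toyPart, na.rm = TRUE)   # 108.2, the mean of the five observed values

# t.test() drops NA on its own, so df is (number observed) - 1
t.test(toyPart, mu = 100)$parameter       # df = 4
```

Applied to the demonstration's data, the missing step would presumably read `imputeValue <- mean(partX, na.rm = TRUE)`.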
6. Impute this value into the data
imputedX <- partX
# replace every missing value with imputeValue
imputedX[is.na(imputedX)] <- imputeValue
7. Summary and Confidence Interval
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 64.00 91.75 100.94 100.94 108.00 135.00
##
## One Sample t-test
##
## data: imputedX
## t = 0.63627, df = 99, p-value = 0.5261
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
## 97.99917 103.88972
## sample estimates:
## mean of x
## 100.9444
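Comparing the three intervals: the imputed data keep the mean of the observed values (100.9444), but their interval is narrower (width about 5.89) than the one for the partial data (about 6.56) and even the one for the complete data (about 6.12). The filled-in values add no spread, while the degrees of freedom go back up to 99. A sketch of the same effect on stand-in data (`toyPart`, `toyImputed`, and `ciWidth` are illustrative names):

```r
# Stand-in data with two missing values
toyPart <- c(123, 92, NA, 107, 95, NA, 124, 105, 120, 86)

# Mean imputation: fill each NA with the mean of the observed values
toyImputed <- toyPart
toyImputed[is.na(toyImputed)] <- mean(toyPart, na.rm = TRUE)

# The mean is unchanged by the imputation
all.equal(mean(toyImputed), mean(toyPart, na.rm = TRUE))   # TRUE

# The standard deviation shrinks, because the filled-in values
# sit exactly at the mean and contribute no squared deviation
sd(toyPart, na.rm = TRUE)
sd(toyImputed)

# Width of the 95 percent confidence interval from a one-sample t-test
ciWidth <- function(x) diff(as.numeric(t.test(x, mu = 100)$conf.int))
ciWidth(toyPart)
ciWidth(toyImputed)   # smaller than ciWidth(toyPart)
```

The narrower interval reflects the imputation procedure rather than extra information, which is why it should not be read as a gain in precision.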