Mean Imputation with R

Demonstration of Mean Imputation with R

1. Create a “True” Data set

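# Draw 100 values from a normal distribution with mean 100 and standard deviation 15
TrueX <- rnorm(100,100,15)
# Round to whole numbers
TrueX <- round(TrueX)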
TrueX

##   [1] 123  92  98 107  95 122 124 105 120  86  90 115 102 103 103  72 121  93
##  [19] 117  96 113 123 104 105 105 109  92 129 110 103 134 104  95  97 118 107
##  [37] 124  93 105  82 100  77 117  91 114  96 100 107  85  86  93 108  96 135
##  [55]  85 123  76  82  99 119 102  89  76  93  89  97 107  98  76  93  91 107
##  [73]  64 103  95 120 132  94  94  85 108 130 106  82 102  88 102  88 110  91
##  [91]  86  89  80 124  79 103  89 115  95  64
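
Because rnorm draws fresh random numbers on every run, the values printed above will not reappear when the code is rerun. A minimal sketch of one way to make the simulation repeatable, assuming an arbitrary seed (the value 2024 is an illustration and is not from the original):

```r
# Fix the random number generator state so the draw can be repeated.
# The seed value is an arbitrary choice made for this sketch.
set.seed(2024)

# Same simulation as above: 100 draws from N(100, 15), rounded to whole numbers
TrueX <- round(rnorm(n = 100, mean = 100, sd = 15))

length(TrueX)   # 100
```

Rerunning both lines with the same seed should give the same vector, which makes the later summaries and t-tests comparable between runs.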

2. Summary and Confidence Interval

summary(TrueX)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   64.00   90.75  100.00  100.61  109.25  135.00

t.test(TrueX,mu=100)

## 
##  One Sample t-test
## 
## data:  TrueX
## t = 0.39547, df = 99, p-value = 0.6933
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
##   97.54943 103.67057
## sample estimates:
## mean of x 
##    100.61

3. Create a data set with some values missing

# "part" for partial

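# Pick 10 of the 100 positions at random (without replacement)
index <- sample(1:100,10)
# Copy the true data, then blank out the chosen positions
partX <- TrueX
partX[index] <- NA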

partX

##   [1] 123  92  98 107  95 122 124 105 120  86  90  NA 102 103 103  NA 121  93
##  [19] 117  NA 113 123 104  NA 105 109  92 129 110 103 134 104  NA  97  NA 107
##  [37] 124  93 105  82 100  77 117  91 114  96 100 107  85  86  93 108  96 135
##  [55]  85 123  76  82  99 119 102  89  76  93  89  97 107  98  76  93  91 107
##  [73]  64 103  95 120 132  94  94  85 108 130 106  82 102  NA 102  88 110  91
##  [91]  86  89  80 124  79  NA  NA 115  NA  64
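
It can be worth confirming that the expected number of values was removed and where they sit. A short sketch, rebuilding the data with an arbitrary seed so that it runs on its own (the seed is an assumption, not part of the original):

```r
set.seed(2024)                        # arbitrary seed, for illustration only
TrueX <- round(rnorm(100, 100, 15))

index <- sample(1:100, 10)            # 10 distinct positions
partX <- TrueX
partX[index] <- NA

sum(is.na(partX))                     # count of missing values: 10
which(is.na(partX))                   # positions of the missing values

# The missing positions should be exactly the sampled ones
all(sort(index) == which(is.na(partX)))
```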

4. Summary and Confidence Interval

summary(partX)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    64.0    91.0   101.0   100.9   109.8   135.0      10

t.test(partX,mu=100)

## 
##  One Sample t-test
## 
## data:  partX
## t = 0.57232, df = 89, p-value = 0.5685
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
##   97.66552 104.22337
## sample estimates:
## mean of x 
##  100.9444

5. Create an Imputation Value (mean or median)

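# Wrong: partX contains NAs, so without na.rm=TRUE the result is NA
imputeValue <- median(partX)

# Right: drop the NAs before computing the imputation value
imputeValue <- mean(partX,na.rm=TRUE)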
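
The effect of na.rm can be checked on a small vector. The sketch below uses a made-up four-element vector (not data from this demonstration) to show that both mean and median return NA when a missing value is present, unless na.rm=TRUE is supplied:

```r
# Toy vector with one missing value (illustrative, not from the demonstration)
x <- c(10, 20, NA, 40)

mean(x)                     # NA
median(x)                   # NA

mean(x, na.rm = TRUE)       # 23.33333
median(x, na.rm = TRUE)     # 20
```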

6. Impute this value into the data

imputedX <- partX

imputedX[is.na(imputedX)] <- imputeValue
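
The compute-then-replace steps can also be wrapped in a small helper. This is only a sketch: imputeWith is a hypothetical name introduced here, and it assumes the summary function accepts an na.rm argument, as mean and median do.

```r
# Hypothetical helper: replace every NA in x with a summary of the observed values.
# fun must accept na.rm = TRUE (mean and median both do).
imputeWith <- function(x, fun = mean) {
  value <- fun(x, na.rm = TRUE)   # imputation value from the non-missing data
  x[is.na(x)] <- value            # fill the gaps
  x
}

imputeWith(c(10, 20, NA, 40))           # 10.00000 20.00000 23.33333 40.00000
imputeWith(c(10, 20, NA, 40), median)   # 10 20 20 40
```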

7. Summary and Confidence Interval

summary(imputedX)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   64.00   91.75  100.94  100.94  108.00  135.00

t.test(imputedX,mu=100)

## 
##  One Sample t-test
## 
## data:  imputedX
## t = 0.63627, df = 99, p-value = 0.5261
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
##   97.99917 103.88972
## sample estimates:
## mean of x 
##  100.9444
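
In the outputs above, partX and imputedX share the same mean (100.9444), but the confidence interval shrinks from (97.67, 104.22) to (98.00, 103.89). The sketch below makes that comparison explicit by putting the standard deviations and interval widths side by side. It rebuilds the data with an arbitrary seed so that it runs on its own, so the numbers will differ from those printed above; ciWidth is a hypothetical helper introduced for this sketch.

```r
set.seed(2024)                               # arbitrary seed, for illustration only
TrueX <- round(rnorm(100, 100, 15))

partX <- TrueX
partX[sample(1:100, 10)] <- NA               # remove 10 values at random

imputedX <- partX
imputedX[is.na(imputedX)] <- mean(partX, na.rm = TRUE)

# Hypothetical helper: width of the 95% confidence interval from t.test
# (t.test drops NA values itself)
ciWidth <- function(x) diff(t.test(x, mu = 100)$conf.int)

# The mean is unchanged by mean imputation
c(mean_part = mean(partX, na.rm = TRUE), mean_imputed = mean(imputedX))

# The spread and the interval width both shrink after imputation
c(sd_part = sd(partX, na.rm = TRUE), sd_imputed = sd(imputedX))
c(width_part = ciWidth(partX), width_imputed = ciWidth(imputedX))
```

The imputed values sit exactly at the mean, so they add nothing to the sum of squared deviations while raising the count from 90 to 100. That lowers the standard deviation and raises the degrees of freedom, which is why the interval for imputedX comes out narrower even though no new information was added.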

Data Science with R

DragonflyStats.github.io
