require(mice)
## Loading required package: mice
## Warning: package 'mice' was built under R version 3.3.3
set.seed(1234)
sleep<-mammalsleep
imp<-mice(sleep[,-1],m=5)
##
## iter imp variable
## 1 1 sws ps ts mls gt
## 1 2 sws ps ts mls gt
## 1 3 sws ps ts mls gt
## 1 4 sws ps ts mls gt
## 1 5 sws ps ts mls gt
## 2 1 sws ps ts mls gt
## 2 2 sws ps ts mls gt
## 2 3 sws ps ts mls gt
## 2 4 sws ps ts mls gt
## 2 5 sws ps ts mls gt
## 3 1 sws ps ts mls gt
## 3 2 sws ps ts mls gt
## 3 3 sws ps ts mls gt
## 3 4 sws ps ts mls gt
## 3 5 sws ps ts mls gt
## 4 1 sws ps ts mls gt
## 4 2 sws ps ts mls gt
## 4 3 sws ps ts mls gt
## 4 4 sws ps ts mls gt
## 4 5 sws ps ts mls gt
## 5 1 sws ps ts mls gt
## 5 2 sws ps ts mls gt
## 5 3 sws ps ts mls gt
## 5 4 sws ps ts mls gt
## 5 5 sws ps ts mls gt
mult_imp_fit<-with(data = imp, expr = lm(ps~bw))
combined_fit<-pool(mult_imp_fit)
summary(combined_fit)
## est se t df Pr(>|t|)
## (Intercept) 1.9676140435 0.1967887440 9.998611 39.59701 2.170042e-12
## bw -0.0002395193 0.0002039416 -1.174450 53.96416 2.453714e-01
## lo 95 hi 95 nmis fmi lambda
## (Intercept) 1.5697630270 2.3654650600 NA 0.18380748 0.14359799
## bw -0.0006484039 0.0001693653 0 0.07853831 0.04500877
When we use mice (multiple imputation chained equations) on the model under the MAR assumption, the estimated intercept is 1.9676140435 and the estimated slope of bw is -0.0002395193.
M=5
kajal<-list()
set.seed(1234)
delta<-c(-5,-3,-1.5,0,1.5,3,5)
for (d in 1:length(delta)){
mids<-mice(sleep[,-1],m=M, print=FALSE)
for (m in 1:M){
mids$imp$ps[[m]] <- mids$imp$ps[[m]] + delta[d]
}
kajal[[d]]<-mids
}
B0<- list()
B1 <- list()
for (d in 1:length(delta)){
wit<-with(kajal[[d]],lm(ps~bw))
B0[d]<-summary(pool(wit))[1,1:2]
B1[d]<-summary(pool(wit))[2,1:2]
}
## Warning in B0[d] <- summary(pool(wit))[1, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B1[d] <- summary(pool(wit))[2, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B0[d] <- summary(pool(wit))[1, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B1[d] <- summary(pool(wit))[2, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B0[d] <- summary(pool(wit))[1, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B1[d] <- summary(pool(wit))[2, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B0[d] <- summary(pool(wit))[1, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B1[d] <- summary(pool(wit))[2, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B0[d] <- summary(pool(wit))[1, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B1[d] <- summary(pool(wit))[2, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B0[d] <- summary(pool(wit))[1, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B1[d] <- summary(pool(wit))[2, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B0[d] <- summary(pool(wit))[1, 1:2]: number of items to replace
## is not a multiple of replacement length
## Warning in B1[d] <- summary(pool(wit))[2, 1:2]: number of items to replace
## is not a multiple of replacement length
cbind(delta,do.call(rbind, B0))
## delta
## [1,] -5.0 1.097878
## [2,] -3.0 1.416747
## [3,] -1.5 1.710745
## [4,] 0.0 1.945853
## [5,] 1.5 2.247662
## [6,] 3.0 2.508167
## [7,] 5.0 2.848550
cbind(delta,do.call(rbind, B1))
## delta
## [1,] -5.0 -0.0007325303
## [2,] -3.0 -0.0004704532
## [3,] -1.5 -0.0003867074
## [4,] 0.0 -0.0002550001
## [5,] 1.5 -0.0001050738
## [6,] 3.0 0.0001163221
## [7,] 5.0 0.0003058731
Here, we assume the missingness mechanism is MNAR and use pattern mixture models to perform a sensitivity analysis. We notice several things. First, from the \(\beta_0\) list, we see that as the \(\delta\) values increase, so did the \(\beta_0\) values. All of these values were positive and ranged from 1.098 to 2.84. Secondly, from the \(\beta_1\) list, we see that as the \(\delta\) values increase, the \(\beta_1\) values also increased. The values shifted from negative to positive. At \(\delta\) = 3, we see the first positive value for \(\beta_1\), and this change in sign and increase in values would suggest that the data is sensitive to changes in the delta values, even if the change is small.
The predictive mean matching (PMM) algorithm follows the following steps.
First, we create a linear regression model to get a set of regression coefficients. This model is created with the fully observed values.
Second, we sample from a multivariate normal posterior predictive distribution centered at \(\hat{\beta}\). We use this to get another set of regression coefficients.
Third, we use this new set of regression coefficients to generate another regression model. We use this model to generate expected values for both the fully observed x values and not fully observed x values.
Fourth, for each x value that is not fully observed, we look at its predicted value. We check for all the predicted values from the completed cases that are close to the predicted values.
Fifth, from all values that are close to the predicted values, we will randomly choose one of the values and replace the missing value with its observed value–not the model value.
Sixthy, we repeat and terate until all the datasets and complete.
PMM Pros and Cons:
Pro: PMM is a useful and accurate tool for imputation as it preserves the original data distributions, as it does not allow imputations to be outside the observed values and does not have predictive power.
Con: With outliers in the dataset, PMM is not a strong imputation tool.