This dataset is a snapshot of the OpenPowerlifting database from February 2018, downloaded from Kaggle. The data contain competitor results in three different categories. Some variables have missing cases, for example “Best Squat Kg”. I will use the Amelia package to handle these missing cases through multiple imputation. The variables selected for each competitor are: Age, Weight in Kg, Best squat in Kg, Best bench in Kg, Best deadlift in Kg, Total Kg, and Sex.
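Before imputing anything, it helps to see how much is missing in each variable. The following is a minimal sketch in base R. The data frame `toy` and its values are made up for illustration and only stand in for the real `powerlifting` data.

```r
# Toy data frame standing in for the real `powerlifting` data (values are made up).
toy <- data.frame(
  Age         = c(23, NA, 31, 45),
  BestSquatKg = c(150, 182.5, NA, NA),
  BestBenchKg = c(100, 120, 95, NA),
  TotalKg     = c(450, 520, 400, 380)
)

# Number of missing values in each column.
missing_count <- colSums(is.na(toy))

# Share of missing values in each column (between 0 and 1).
missing_share <- colMeans(is.na(toy))

print(missing_count)
print(round(missing_share, 2))
```

Running the same two calls on the real data frame should show which lifts are most affected by missingness.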
powerlifting$BestBenchKg <- as.integer(powerlifting$BestBenchKg)
head(powerlifting)
summary(lm(BestSquatKg ~ WeightClassKg + Age + TotalKg + BestDeadliftKg + BestBenchKg, data = powerlifting, na.action = na.omit))
Call:
lm(formula = BestSquatKg ~ WeightClassKg + Age + TotalKg + BestDeadliftKg + 
    BestBenchKg, data = powerlifting, na.action = na.omit)
Residuals:
   Min     1Q Median     3Q    Max 
-2.522 -0.531  0.415  0.464  1.525 
Coefficients:
                Estimate Std. Error  t value Pr(>|t|)
(Intercept)    -6.09e-01   1.15e-02   -52.97  < 2e-16
WeightClassKg  -1.27e-04   1.46e-04    -0.87     0.38
Age             1.38e-03   1.90e-04     7.27  3.7e-13
TotalKg         1.00e+00   8.72e-05 11466.51  < 2e-16
BestDeadliftKg -9.99e-01   1.51e-04 -6613.10  < 2e-16
BestBenchKg    -9.99e-01   1.71e-04 -5846.53  < 2e-16
Residual standard error: 0.528 on 61001 degrees of freedom
  (325407 observations deleted due to missingness)
Multiple R-squared:     1,  Adjusted R-squared:     1 
F-statistic: 2.03e+08 on 5 and 61001 DF,  p-value: <2e-16
Listwise deletion removed 325,407 cases due to missingness. These observations contain valuable information about the relationships among the observed variables and should not be discarded. Multiple imputation with Amelia helps recover that information and supports better inferences.
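To make the cost of listwise deletion concrete, here is a small base R sketch of what `na.action = na.omit` does. It drops every row that has a missing value in any model variable. The data frame `toy` is a hypothetical stand-in with made-up values, not the real data.

```r
# Toy data (made-up values); `toy` is a hypothetical stand-in for `powerlifting`.
toy <- data.frame(
  BestSquatKg = c(150, NA, 200, 180, NA, 165, 210, 140),
  Age         = c(25, 30, NA, 41, 22, 35, 28, 50),
  TotalKg     = c(400, 450, 520, 470, 380, 430, 560, 390)
)

# TRUE for rows with no missing value in any column.
complete_rows <- complete.cases(toy)

n_total   <- nrow(toy)
n_kept    <- sum(complete_rows)
n_dropped <- n_total - n_kept

# lm() with na.action = na.omit fits on the complete rows only.
fit <- lm(BestSquatKg ~ Age + TotalKg, data = toy, na.action = na.omit)

cat("rows:", n_total, "kept:", n_kept, "dropped:", n_dropped, "\n")
cat("rows used by lm():", nobs(fit), "\n")
```

A single missing value is enough to remove the whole row, which is why the complete-case model above runs on only a fraction of the competitors.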
data(powerlifting)
data set ‘powerlifting’ not found
p.out <- amelia(x = powerlifting, m = 15)
-- Imputation 1 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67
-- Imputation 2 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68
-- Imputation 3 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68
-- Imputation 4 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67
-- Imputation 5 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67
-- Imputation 6 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69 70 71 72
-- Imputation 7 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67
-- Imputation 8 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69
-- Imputation 9 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67
-- Imputation 10 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68
-- Imputation 11 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69
-- Imputation 12 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66
-- Imputation 13 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69
-- Imputation 14 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68 69 70
-- Imputation 15 --
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
 61 62 63 64 65 66 67 68
table(p.out$imputations[[1]]$Sex)
This is an example of how to change a variable to nominal (or ordinal) if needed. Here I used Sex to demonstrate a nominal variable; however, there were no missing cases in this variable.
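As a rough illustration of the nominal versus ordinal distinction, the sketch below converts columns to factors in base R and tabulates one of them. The data frame `toy` and the `AgeGroup` column are hypothetical and exist only for this example. In Amelia itself, such variables are normally declared through the `noms` and `ords` arguments of `amelia()` rather than converted by hand.

```r
# Made-up values; `toy` is a hypothetical stand-in for one imputed data set.
toy <- data.frame(
  Sex      = c("M", "F", "M", "M", "F"),
  AgeGroup = c("junior", "open", "master", "open", "junior"),  # hypothetical column
  stringsAsFactors = FALSE
)

# Nominal: categories with no inherent order.
toy$Sex <- factor(toy$Sex)

# Ordinal: categories with an explicit order (the ordering is illustrative).
toy$AgeGroup <- factor(
  toy$AgeGroup,
  levels  = c("junior", "open", "master"),
  ordered = TRUE
)

# Frequency table of the nominal variable, as in the table() call above.
sex_counts <- table(toy$Sex)
print(sex_counts)
```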
View(p.out$imputations$imp1)
hist(p.out$imputations[[4]]$BestSquatKg, col = "grey", border = "white")
z.out <- zelig(BestSquatKg ~ WeightClassKg + Age + TotalKg + BestDeadliftKg + BestBenchKg, model = "ls", data = p.out, cite = FALSE)
summary(z.out, subset = 2)
Imputed Dataset 2
Call:
z5$zelig(formula = BestSquatKg ~ WeightClassKg + Age + TotalKg + 
    BestDeadliftKg + BestBenchKg, data = p.out)
Residuals:
    Min      1Q  Median      3Q     Max 
-2.7532 -0.4365 -0.0327  0.5330  2.4118 
Coefficients:
                Estimate Std. Error  t value Pr(>|t|)
(Intercept)    -6.70e-01   4.74e-03   -141.4   <2e-16
WeightClassKg  -2.38e-03   5.78e-05    -41.1   <2e-16
Age             1.19e-03   7.62e-05     15.6   <2e-16
TotalKg         9.97e-01   8.49e-06 117363.2   <2e-16
BestDeadliftKg -9.94e-01   3.11e-05 -31950.2   <2e-16
BestBenchKg    -9.94e-01   2.55e-05 -38941.9   <2e-16
Residual standard error: 0.584 on 386408 degrees of freedom
Multiple R-squared:     1,  Adjusted R-squared:     1 
F-statistic: 4.29e+09 on 5 and 386408 DF,  p-value: <2e-16
Next step: Use 'setx' method
z.out$setx()
z.out$sim()
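plot(z.out)
tmp <- amelia(powerlifting, idvars = c("Age", "WeightClassKg", "Sex"))
Comparing the listwise deletion method with multiple imputation, we can see that the results vary. Under listwise deletion, the WeightClassKg coefficient was about -1.27e-04 (p = 0.38): each additional kilogram of weight class was associated with a 0.000127 kg lower best squat, holding the other predictors constant, and the effect was not statistically significant. In imputed dataset 2 from Amelia, the coefficient is -2.38e-03 and is statistically significant (p < 2e-16).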
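The comparison above looks at a single imputed dataset. A common next step is to pool a coefficient across all imputations using Rubin's rules. The sketch below implements that pooling in base R. The helper `pool_rubin` is a hypothetical name, and the estimates and standard errors are made-up numbers, not results from this analysis. Amelia also ships a `mi.meld()` function for this purpose.

```r
# Pool one coefficient across m imputed data sets using Rubin's rules.
# `pool_rubin` is a hypothetical helper written for this illustration.
pool_rubin <- function(estimates, std_errors) {
  m <- length(estimates)
  stopifnot(m > 1, length(std_errors) == m)

  pooled_estimate  <- mean(estimates)     # average of the per-imputation estimates
  within_variance  <- mean(std_errors^2)  # average sampling variance
  between_variance <- var(estimates)      # variance across imputations
  total_variance   <- within_variance + (1 + 1 / m) * between_variance

  list(estimate = pooled_estimate, std_error = sqrt(total_variance))
}

# Made-up per-imputation estimates and standard errors (NOT from the models above).
est <- c(-0.0020, -0.0024, -0.0022)
se  <- c(0.00006, 0.00006, 0.00006)

pooled <- pool_rubin(est, se)
print(pooled)
```

The pooled standard error is larger than any single imputation's standard error because it also reflects the disagreement between imputations.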