Powerlifting has become a popular sport. Its goal is not to gain muscle size but to build the strength to lift as much weight as possible. The three powerlifting events are the squat, the bench press, and the deadlift. This analysis looks at how a lifter's squat and bench press weights relate to their deadlift weight. Powerlifters want a high total across all three lifts, so it is worth asking whether a lifter's deadlift weight can be predicted from their bench press and squat weights.
The data come from kaggle.com and were collected from about 3,000 meets, covering over 300,000 lifter entries. The data were cleaned by dropping the fourth-attempt columns, so the lift variables used are the best squat, the best bench press, and the best deadlift. The full data set was too large to run the imputations on, so a random sample of 200,000 rows was drawn for that step.
library(Amelia)
library(Zelig)
library(ZeligChoice)
library(texreg)
library(readr)
library(dplyr)
power <- read_csv("/Users/paulkim/Downloads/openpowerlifting.csv")
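# Drop the fourth-attempt columns and convert the categorical columns to factors
power2 <- power %>%
  select(-Squat4Kg, -Bench4Kg, -Deadlift4Kg) %>%
  mutate(Sex = as.factor(Sex),
         Equipment = as.factor(Equipment),
         Division = as.factor(Division),
         WeightClassKg = as.factor(WeightClassKg),
         Place = as.factor(Place),
         Name = as.factor(Name))
# Random sample of 200,000 rows, used for the multiple imputation
power3 <- sample_n(power2, size = 200000, replace = FALSE)
# Listwise deletion: keep only rows of the full data set with no missing values
power4 <- na.omit(power2)
head(power)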
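The data frame power3 is a random sample of 200,000 rows from the full data set, drawn so that the imputations run in a reasonable time. The data frame power4 is the listwise-deletion data set: every row with a missing value in any column is dropped. Note that power4 is built from the full data set (power2), not from the 200,000-row sample, so the two analyses below are not run on the same rows.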
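To make the effect of listwise deletion concrete, here is a minimal sketch on a made-up data frame. The column names mirror the lifting variables, but the values are invented for illustration and are not taken from the powerlifting data.

```r
# Toy data frame standing in for the lifting columns (values are made up).
toy <- data.frame(
  BestSquatKg    = c(150, NA, 200, 180, NA, 120),
  BestBenchKg    = c(100, 90, NA, 110, 95, 70),
  BestDeadliftKg = c(190, 170, 230, NA, 185, 150)
)

# Share of missing values in each column.
missing_share <- colMeans(is.na(toy))
print(round(missing_share, 2))

# Listwise deletion keeps only the rows with no missing value in any column.
complete_rows <- na.omit(toy)
cat("Rows before:", nrow(toy), " rows after na.omit():", nrow(complete_rows), "\n")
```

Running the same two checks on power2 before calling na.omit() would show how many rows the listwise-deletion model gives up.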
z.out1 <- zelig(BestDeadliftKg ~ BestSquatKg + BestBenchKg, model="ls", data=power4, cite = FALSE)
summary(z.out1)
## Model:
##
## Call:
## z5$zelig(formula = BestDeadliftKg ~ BestSquatKg + BestBenchKg,
## data = power4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -330.27 -14.28 -0.18 15.17 263.56
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 58.691973 0.283386 207.11 <2e-16
## BestSquatKg 0.546946 0.003369 162.37 <2e-16
## BestBenchKg 0.389667 0.004587 84.96 <2e-16
##
## Residual standard error: 25.41 on 63566 degrees of freedom
## Multiple R-squared: 0.8319, Adjusted R-squared: 0.8319
## F-statistic: 1.573e+05 on 2 and 63566 DF, p-value: < 2.2e-16
##
## Statistical Warning: The GIM test suggests this model is misspecified
## (based on comparisons between classical and robust SE's; see http://j.mp/GIMtest).
## We suggest you run diagnostics to ascertain the cause, respecify the model
## and run it again.
##
## Next step: Use 'setx' method
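The intercept is the predicted deadlift for a lifter whose best squat and best bench press are both 0 kg: 58.69 kg. This is an extrapolation outside the range of realistic lifts, so it should not be read as a real lifter's starting weight. The coefficients show that, holding bench press constant, each additional kilogram of squat is associated with a 0.547 kg higher deadlift, and, holding squat constant, each additional kilogram of bench press is associated with a 0.390 kg higher deadlift. Both coefficients are positive and statistically significant, and the model explains about 83% of the variance in deadlift weight (R-squared = 0.8319). The summary also warns that the GIM test suggests the model is misspecified, so the standard errors should be treated with caution.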
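As a worked example of reading these coefficients, the sketch below plugs the estimates from the summary above into the regression equation. The helper predict_deadlift and the example lifter (150 kg squat, 100 kg bench press) are hypothetical and only serve the illustration.

```r
# Coefficients copied from the listwise-deletion summary above.
coefs <- c(intercept = 58.691973, squat = 0.546946, bench = 0.389667)

# Hypothetical helper: predicted deadlift (kg) for given squat and bench (kg).
predict_deadlift <- function(squat_kg, bench_kg, b = coefs) {
  unname(b["intercept"] + b["squat"] * squat_kg + b["bench"] * bench_kg)
}

# Example lifter: 150 kg squat, 100 kg bench press.
base <- predict_deadlift(150, 100)

# Same lifter with 10 kg more on the squat, bench press held constant.
plus10 <- predict_deadlift(160, 100)

cat("Predicted deadlift:", round(base, 2), "kg\n")
cat("Change for +10 kg squat:", round(plus10 - base, 2), "kg\n")
```

The second number is simply ten times the squat coefficient, which is what "holding bench press constant" means in practice.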
z.out1$setx()
z.out1$sim()
plot(z.out1)
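The GIM warning in the summary above is described as being based on comparisons between classical and robust standard errors. The sketch below is not the GIM test itself. It only shows that kind of comparison on simulated data where the error spread grows with the predictor. The simulated variables, the seed, and the choice of the HC0 sandwich estimator are assumptions made for this illustration.

```r
# Simulated data (not the powerlifting data): noise grows with x.
set.seed(1)
n <- 500
x <- runif(n, 50, 250)
y <- 60 + 0.9 * x + rnorm(n, sd = 0.1 * x)

fit <- lm(y ~ x)

# HC0 sandwich estimator computed by hand:
# (X'X)^-1  [sum of e_i^2 * x_i x_i']  (X'X)^-1
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)   # each row of X is scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

classical_se <- sqrt(diag(vcov(fit)))
robust_se <- sqrt(diag(vcov_hc0))
print(rbind(classical = classical_se, robust = robust_se))
```

A large gap between the two rows is a sign that the constant-variance assumption behind the classical standard errors does not hold for the model as specified.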
a.out <- amelia(x = power3, cs = "Name", idvars = c("Sex", "Equipment", "Division", "WeightClassKg", "Place"))
## -- Imputation 1 --
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 61 62 63 64 65 66 67 68 69 70 71 72 73 74
##
## -- Imputation 2 --
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
##
## -- Imputation 3 --
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
##
## -- Imputation 4 --
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
##
## -- Imputation 5 --
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 61 62 63 64 65 66 67 68 69 70 71
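The Amelia package is used here to impute the missing values. The call treats Name as the cross-section identifier (cs) and lists Sex, Equipment, Division, WeightClassKg, and Place as idvars, which are kept in the output but left out of the imputation model. The imputations are therefore based on the remaining numeric variables, including age, best bench press, best squat, and best deadlift. Amelia produces five completed data sets here, each with the missing values filled in by a different draw. The numbers printed under each imputation above are the EM iterations.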
a.out
##
## Amelia output with 5 imputed datasets.
## Return code: 1
## Message: Normal EM convergence.
##
## Chain Lengths:
## --------------
## Imputation 1: 74
## Imputation 2: 75
## Imputation 3: 75
## Imputation 4: 78
## Imputation 5: 71
The EM algorithm converged normally for all five imputations, with chain lengths between 71 and 78 iterations.
z.out <- zelig(BestDeadliftKg ~ BestSquatKg + BestBenchKg, model="ls", data=a.out, cite = FALSE)
summary(z.out)
## Model: Combined Imputations
##
## Estimate Std.Error z value Pr(>|z|)
## (Intercept) 71.94149 0.59141 121.6 <2e-16
## BestSquatKg 0.19664 0.00275 71.4 <2e-16
## BestBenchKg 0.77902 0.00814 95.7 <2e-16
##
## For results from individual imputed datasets, use summary(x, subset = i:j)
## Statistical Warning: The GIM test suggests this model is misspecified
## (based on comparisons between classical and robust SE's; see http://j.mp/GIMtest).
## We suggest you run diagnostics to ascertain the cause, respecify the model
## and run it again.
##
## Next step: Use 'setx' method
The regression on the multiply imputed data gives a higher intercept (71.94) than the regression on the listwise-deletion data (58.69). The bench press and squat coefficients also differ substantially. The bench press coefficient is 0.390 under listwise deletion but 0.779 under multiple imputation, roughly double. The squat coefficient moves the other way: 0.547 under listwise deletion but 0.197 under multiple imputation. So the listwise-deletion model attributes more of the deadlift to the squat, while the multiple-imputation model attributes more of it to the bench press. The multiple-imputation analysis uses more of the data, because it fills in the missing values rather than discarding incomplete rows; the listwise-deletion model was fit to only 63,569 complete rows (63,566 residual degrees of freedom plus three estimated coefficients). The two models were also fit to different sets of rows, and both trigger the GIM misspecification warning, so the differences between them should be interpreted with caution.
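The "Combined Imputations" table pools the fits from the five imputed data sets into one estimate per coefficient. The usual way to pool multiply imputed results is Rubin's rules, which is also what Amelia's mi.meld function implements; the sketch below assumes that is the kind of pooling at work here. The per-imputation estimates and standard errors are made up for illustration and are not the values behind the table above, and pool_rubin is a hypothetical helper.

```r
# Hypothetical per-imputation slope estimates and standard errors (m = 5).
est <- c(0.195, 0.198, 0.196, 0.199, 0.195)
se  <- c(0.0021, 0.0022, 0.0021, 0.0023, 0.0022)

# Rubin's rules for a single coefficient.
pool_rubin <- function(est, se) {
  m <- length(est)
  q_bar <- mean(est)     # pooled point estimate
  within <- mean(se^2)   # average within-imputation variance
  between <- var(est)    # between-imputation variance
  total <- within + (1 + 1 / m) * between
  c(estimate = q_bar, std_error = sqrt(total))
}

pooled <- pool_rubin(est, se)
print(round(pooled, 5))
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term adds the uncertainty that comes from not knowing the missing values.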
z.out$setx()
z.out$sim()
plot(z.out)