Introduction

  • Olympic Weightlifting has been a part of the Summer Olympic Games since 1896. Athletes perform two lifts, the Snatch and the Clean-and-Jerk, and have three attempts at each to lift the heaviest weight they can. Prizes are given for the heaviest weights lifted in each lift and in the overall. The overall, or total, is the sum of the athlete's best Snatch and best Clean-and-Jerk. Women and men compete separately, and they are split into different weight classes. In this report, we use statistical methods to examine the correlation between Snatch and Clean-and-Jerk, improvements in Olympic Weightlifting over a 20-year span, and whether bodyweight affects the maximum weight an athlete lifts.

The Data

  • The dataset was collected from Kaggle and covers Olympic Weightlifting at the Summer Olympics from 2000 to 2020. After cleaning, it includes 10 columns and 1233 rows, and each row represents one athlete's result.
Weightlifting.data = read.csv("weightlifting.csv") # read the raw data from the CSV file
attach(Weightlifting.data)

Cleaning up the data

# Drop the URL column (column 8) and the rows without a valid result:
# a Total of -1 marks a missing total, and a bodyweight of 10 kg or less is not plausible.

Lifting.data <- Weightlifting.data[ , -8] %>% 
  filter(Total..kg. != -1) %>% 
  filter(Bodyweight..kg. > 10)
str(Lifting.data)
## 'data.frame':    1233 obs. of  10 variables:
##  $ X                : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Athlete          : chr  "Halil Mutlu (TUR)" "Wu Wenxiong (CHN)" "Zhang Xiangxiang (CHN)" "Wang Shin-yuan (TPE)" ...
##  $ Bodyweight..kg.  : num  55.6 55.5 55.9 55.4 55.7 ...
##  $ Snatch..kg.      : num  138 125 125 125 120 ...
##  $ Clean...Jerk..kg.: num  168 162 162 160 155 ...
##  $ Total..kg.       : num  305 288 288 285 275 ...
##  $ Ranking          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Title            : chr  "Weightlifting at the 2000 Summer Olympics – Men's 56 kg" "Weightlifting at the 2000 Summer Olympics – Men's 56 kg" "Weightlifting at the 2000 Summer Olympics – Men's 56 kg" "Weightlifting at the 2000 Summer Olympics – Men's 56 kg" ...
##  $ Year             : int  2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
##  $ Gender           : chr  "Men" "Men" "Men" "Men" ...
detach(Weightlifting.data)
attach(Lifting.data)
Lifting.data <- Lifting.data %>% mutate(Title = as.factor(Title), Year = as.factor(Year), Gender = as.factor(Gender))
  • Lifting.data will be the data frame used for this project.
head(Lifting.data)
Des <- data.frame(
  Column = c("X","Athlete", "Bodyweight", "Snatch", "Clean & Jerk", "Total", "Ranking", "Title", "Year", "Gender"),
  Type = c("Int", "Chr", "Num", "Num", "Num", "Num", "Int", "Fct", "Fct", "Fct"),
  Description = c(
    "Athletes id",
    "Name",
    "Athletes' bodyweight",
    "Maximum weight of Snatch",
    "Maximum weight of Clean&Jerk",
    "Maximum lifts of both added",
    "Athletes' placement",
    "Athletes' bodyweight class/category",
    "Olympic Year",
    "Men or Women"))

datatable(Des, rownames = FALSE, caption = 'Lifting.data description')
# Separating genders into two different data frames for further research

LiftingWomen <- Lifting.data %>% 
  filter(Gender == "Women")

LiftingMen <- Lifting.data %>% 
  filter(Gender == "Men")
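
The cleaning step above relies on -1 being used as a placeholder for a missing result. As a minimal sketch of the same idea, the example below recodes the placeholder to NA on a small invented data frame and keeps only complete rows. The toy values and the names `raw`, `num_cols` and `cleaned` are assumptions made for illustration, and this variant is slightly stricter than the filter used in the report because it checks every numeric column.

```r
# Toy data standing in for the raw CSV; column names mirror the report,
# the values are invented for illustration only.
raw <- data.frame(
  Athlete         = c("A", "B", "C", "D"),
  Bodyweight..kg. = c(55.6, -1, 61.2, 68.9),
  Snatch..kg.     = c(120, 110, -1, 140),
  Total..kg.      = c(270, 250, -1, 310)
)

# Columns where -1 is assumed to mean "no valid result recorded".
num_cols <- c("Bodyweight..kg.", "Snatch..kg.", "Total..kg.")

# Recode the placeholder to NA so it cannot leak into a mean or a model fit.
cleaned <- raw
cleaned[num_cols] <- lapply(cleaned[num_cols], function(v) replace(v, v == -1, NA))

# Keep only the rows that are complete in the numeric columns.
cleaned <- cleaned[complete.cases(cleaned[num_cols]), ]
print(cleaned)
```

Recoding to NA first makes the missing values visible in `summary()` before any rows are dropped.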

EDA of the data

  • Filter the table as you wish
datatable(Lifting.data, 
          colnames = c('Bodyweight' = 3, 'Snatch' = 4, 'Clean&Jerk' = 5, 'Total' = 6), 
          rownames = FALSE, 
          filter="top", 
          options = list(pageLength = 5, scrollX=T, autoWidth = TRUE) )
  • Summary table separated by genders
Lifting.data %>% 
  group_by(Gender) %>% 
  summarise(count = n(), min = min(Total..kg.), 
            mean = mean(Total..kg.), sd = sd(Total..kg.), median = median(Total..kg.),
            max = max(Total..kg.))
  • Summary table separated by years for men only
Lifting.data %>% 
  group_by(Year) %>% 
  filter(Gender == "Men") %>% 
  summarise(count = n(), min = min(Total..kg.), 
            mean = mean(Total..kg.), sd = sd(Total..kg.), median = median(Total..kg.),
            max = max(Total..kg.))
  • Summary table separated by years for women only
Lifting.data %>% 
  group_by(Year) %>% 
  filter(Gender == "Women") %>% 
  summarise(count = n(), min = min(Total..kg.), 
            mean = mean(Total..kg.), sd = sd(Total..kg.), median = median(Total..kg.),
            max = max(Total..kg.))

Methods

  1. Testing for correlation between two variables
  • Linear Regression

    • Fits a linear relationship between variables, which can be used to predict unknown values.
  • Variance Inflation Factor (VIF)

    • A measure used to detect whether multicollinearity is a problem in a model. A VIF of 5 or higher indicates that multicollinearity could be a problem (Multicollinearity).
  2. Predicting total weight
  • Linear Regression

    • Estimates the coefficients of a linear combination of predictors.
  3. Comparing variables in different groups
  • T-test

    • One-sided hypothesis testing

    • Compares the means of two groups

  4. Contrast and Linear Combination Comparison
  • Analysis of Variance (ANOVA)

    • Compares the means of several groups defined by a factor
  • Contrast

    • Computes and tests arbitrary contrasts for regression objects

Section 1

Testing for correlation between two variables

  • Are Clean-and-Jerk and Snatch correlated with one another?

Graphical Analysis

poinpl.w.20 <- LiftingWomen %>% 
  ggplot(aes(x = Clean...Jerk..kg., y = Snatch..kg., color = Total..kg.)) +
  geom_point() + geom_smooth(method = "lm") +
  scale_color_gradient(low="blue", high="red") +
  labs(color = "Total Weight") +
  xlab("Clean & Jerk") +
  ylab("Snatch") +
  labs(subtitle = "Women") +
  theme_ipsum()

poinpl.w.00 <- LiftingMen %>% 
  ggplot(aes(x = Clean...Jerk..kg., y = Snatch..kg., color = Total..kg.)) +
  geom_point() + geom_smooth(method = "lm") +
  scale_color_gradient(low="blue", high="red") +
  labs(color = "Total Weight") +
  xlab("Clean & Jerk") +
  ylab("Snatch") +
  labs(subtitle = "Men") +
  theme_ipsum()

plot <- ggarrange(poinpl.w.00, poinpl.w.20,
                  common.legend = TRUE,
                  ncol = 2, nrow = 1, legend = "bottom")
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
annotate_figure(plot, top = text_grob("Clean & Jerk and Snatch Relationship", 
               color = "black", face = "bold", size = 14))

  • The scatterplots show the relationship between Clean-and-Jerk and Snatch for men and for women across the Olympics in the dataset. For both men and women, the scatterplots show a strong positive relationship with barely any outliers: athletes with a heavier Clean-and-Jerk also tend to have a heavier Snatch.
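
A direct way to put a number on the relationship seen in a scatterplot is the Pearson correlation. The sketch below uses simulated lifts rather than the Olympic data (the variable names and the simulation parameters are assumptions) and shows that, in a regression with a single predictor, the R-squared equals the squared correlation coefficient.

```r
set.seed(1)
# Simulated lifts in kg; invented numbers, not the Olympic data.
snatch     <- runif(200, min = 60, max = 200)
clean_jerk <- 10 + 1.15 * snatch + rnorm(200, sd = 7)

# Pearson correlation with a t-based significance test.
ct <- cor.test(snatch, clean_jerk)
r  <- unname(ct$estimate)

# Simple linear regression on the same two variables.
fit <- lm(clean_jerk ~ snatch)
r2  <- summary(fit)$r.squared

# With one predictor, R-squared is the square of the Pearson correlation.
cat("r =", round(r, 3), " r^2 =", round(r^2, 3), " R-squared =", round(r2, 3), "\n")
```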

Testing – Linear Regression and Multicollinearity

corr.lm = lm(Clean...Jerk..kg. ~ Snatch..kg., data = Lifting.data)
summary(corr.lm)
## 
## Call:
## lm(formula = Clean...Jerk..kg. ~ Snatch..kg., data = Lifting.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.4084  -4.5620  -0.1226   4.1845  31.1899 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.997528   0.724687    13.8   <2e-16 ***
## Snatch..kg. 1.153572   0.005326   216.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.792 on 1231 degrees of freedom
## Multiple R-squared:  0.9744, Adjusted R-squared:  0.9744 
## F-statistic: 4.691e+04 on 1 and 1231 DF,  p-value: < 2.2e-16
vif = 1/(1-0.9744)
vif
## [1] 39.0625

Analysis

  • With an R-squared of 0.9744 and a corresponding VIF of about 39, well above the threshold of 5, we conclude that Clean-and-Jerk and Snatch are strongly correlated with each other. An athlete with a heavy Clean-and-Jerk can be expected to have a heavy Snatch as well.

  • These results also indicate that, for further testing, we do not want both variables together in the same regression model.
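
The VIF used here follows from the R-squared of the regression of one lift on the other, VIF = 1 / (1 - R²). As a small sketch, the code below reads the R-squared from the fitted model object instead of typing a rounded value by hand, and then checks the arithmetic against the 0.9744 reported above. The data are simulated and the names `aux_fit`, `vif` and `vif_report` are assumptions for illustration.

```r
set.seed(2)
# Simulated lifts in kg; invented numbers, not the Olympic data.
snatch     <- runif(300, min = 60, max = 200)
clean_jerk <- 10 + 1.15 * snatch + rnorm(300, sd = 7)

# Auxiliary regression of one lift on the other.
aux_fit <- lm(clean_jerk ~ snatch)

# Take R-squared from the model object to avoid rounding by hand.
r2  <- summary(aux_fit)$r.squared
vif <- 1 / (1 - r2)
cat("R-squared:", round(r2, 4), " VIF:", round(vif, 2), "\n")

# The same formula applied to the rounded R-squared printed in the report.
vif_report <- 1 / (1 - 0.9744)
cat("VIF from the reported R-squared:", vif_report, "\n")
```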

Further Research

  • The two lifts are very different from one another, and weightlifters must get comfortable with both techniques if they aspire to compete. Clean-and-Jerk is about pure power while Snatch is about finesse. One of the biggest differences is how the bar is pulled off the ground: in the Snatch, the bar has to be elevated much higher than in the Clean. As a result, an athlete should be able to load significantly more weight for the Clean-and-Jerk than for the Snatch. Even though they are very different, both lifts have the same goal, which is to lift a loaded barbell from the ground to overhead. The two lifts also train similar muscle groups, which could explain why athletes who lift heavy in one also lift heavy in the other (Dickson, The snatch vs. the Clean & Jerk: Pros and cons of the two Olympic lifts 2023).

Section 2

Predicting total weight

  • What is the linear combination that predicts an athlete's total weight from Bodyweight, Gender, and Clean-and-Jerk or Snatch?

    • In the final regression model, does Clean-and-Jerk or Snatch fit better?

    • The response variable is Total Weight and the explanatory variables are Bodyweight, Gender, and either Clean-and-Jerk or Snatch.

Graphical analysis

Lifting.data %>% ggpairs(columns = 3:6, aes(color = Gender, alpha = 0.5), upper = list(continuous = wrap("cor", size = 2.5)))

  • As shown in the first section, Clean-and-Jerk and Snatch are highly correlated. The total also has a high correlation with both variables because it is their sum.

Testing - Linear Regression

Model with clean-and-Jerk

  • Testing the model with Clean-and-Jerk variable
clean.lm = lm(Total..kg. ~ Clean...Jerk..kg. + Bodyweight..kg. + Gender, data = Lifting.data)
summary(clean.lm)
## 
## Call:
## lm(formula = Total..kg. ~ Clean...Jerk..kg. + Bodyweight..kg. + 
##     Gender, data = Lifting.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.4820  -3.5256  -0.1228   3.8385  21.0726 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -2.4400377  1.3364284  -1.826   0.0681 .  
## Clean...Jerk..kg.  1.8313704  0.0099139 184.728   <2e-16 ***
## Bodyweight..kg.    0.0008984  0.0104507   0.086   0.9315    
## GenderWomen       -1.4104495  0.6402219  -2.203   0.0278 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.809 on 1229 degrees of freedom
## Multiple R-squared:  0.9945, Adjusted R-squared:  0.9945 
## F-statistic: 7.466e+04 on 3 and 1229 DF,  p-value: < 2.2e-16
plot(clean.lm, which = 1)

Model with Snatch

  • Testing the model with Snatch variable
snatch.lm = lm(Total..kg. ~ Snatch..kg. + Bodyweight..kg. + Gender, data = Lifting.data)
summary(snatch.lm)
## 
## Call:
## lm(formula = Total..kg. ~ Snatch..kg. + Bodyweight..kg. + Gender, 
##     data = Lifting.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.0683  -4.2852   0.0374   4.3012  29.0767 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     20.13082    1.36685  14.728   <2e-16 ***
## Snatch..kg.      2.02465    0.01220 165.995   <2e-16 ***
## Bodyweight..kg.  0.11906    0.01107  10.752   <2e-16 ***
## GenderWomen     -6.45724    0.68654  -9.406   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.437 on 1229 degrees of freedom
## Multiple R-squared:  0.9933, Adjusted R-squared:  0.9933 
## F-statistic: 6.071e+04 on 3 and 1229 DF,  p-value: < 2.2e-16
plot(snatch.lm, which = 1)

Transformed model

  • Testing the snatch model with transformed variables
log.snatch.lm = lm(log(Total..kg.) ~ log(Snatch..kg.) + Bodyweight..kg. + Gender, data = Lifting.data)
summary(log.snatch.lm)
## 
## Call:
## lm(formula = log(Total..kg.) ~ log(Snatch..kg.) + Bodyweight..kg. + 
##     Gender, data = Lifting.data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.090294 -0.014899  0.000236  0.015333  0.103899 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.1819436  0.0245252   48.19  < 2e-16 ***
## log(Snatch..kg.)  0.9181441  0.0052970  173.33  < 2e-16 ***
## Bodyweight..kg.   0.0003341  0.0000370    9.03  < 2e-16 ***
## GenderWomen      -0.0200015  0.0024662   -8.11 1.21e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0229 on 1229 degrees of freedom
## Multiple R-squared:  0.9933, Adjusted R-squared:  0.9933 
## F-statistic: 6.086e+04 on 3 and 1229 DF,  p-value: < 2.2e-16
plot(log.snatch.lm, which = 1)

Analysis

  1. In the model with the Clean-and-Jerk variable, the R-squared is .99, which is extremely good. The residual standard error is 5.8 kg, so the residual plot still shows noticeable scatter around the fitted line. An interesting feature of this model is that the p-value for the bodyweight coefficient is .93. With a p-value that high, bodyweight adds nothing significant once Clean-and-Jerk and Gender are in the model.

  2. In the model with the Snatch, the R-squared stays at .99 and the residual standard error is similar (6.4 kg), but all p-values for the variables are very low. That means all variables are relevant to the response variable, which makes this model better than the previous one. However, we want to make it even better, and we do so by transforming some of the variables.

  3. Taking the logarithm of the response variable and of the Snatch variable gives our preferred regression model. The R-squared stays at .99 and the residual standard error is .02, although that value is on the log scale and so is not directly comparable to the 6.4 kg of the untransformed model. This is our final model.

  • The linear combination for Women is:
    • log(Total) = 1.16 + .92 log(Snatch) + .0003 Bodyweight
  • The linear combination for Men is:
    • log(Total) = 1.18 + .92 log(Snatch) + .0003 Bodyweight
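
To illustrate how the final model is used, the sketch below plugs the coefficients printed by summary(log.snatch.lm) into a small helper and back-transforms the result to kilograms. The helper `predict_total` and the example athlete (150 kg Snatch, 80 kg bodyweight) are hypothetical and only serve the illustration; the simple exp() back-transformation ignores any retransformation bias.

```r
# Coefficients copied from the summary(log.snatch.lm) output in the report.
b0       <- 1.1819436    # intercept (reference level: Men)
b_snatch <- 0.9181441    # slope on log(Snatch)
b_bw     <- 0.0003341    # slope on Bodyweight
b_women  <- -0.0200015   # shift applied when Gender is "Women"

# Hypothetical helper, not part of the original analysis.
predict_total <- function(snatch_kg, bodyweight_kg, women = FALSE) {
  gender_shift <- if (women) b_women else 0
  log_total <- b0 + gender_shift + b_snatch * log(snatch_kg) + b_bw * bodyweight_kg
  exp(log_total)   # back-transform from the log scale to kilograms
}

# Example athlete: 150 kg Snatch at 80 kg bodyweight (invented values).
men_pred   <- predict_total(150, 80)
women_pred <- predict_total(150, 80, women = TRUE)
print(round(c(men = men_pred, women = women_pred), 1))
```

For this example the model predicts a total of roughly 333 kg for a man, and a total about 2% lower for a woman with the same Snatch and bodyweight.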

Section 3

Comparing variables in different groups

  • Have Men and Women improved significantly in Olympic Weightlifting from the year 2000 to 2020?

  • To do this test we are going to compare men and women separately and only use the data from the 2000 Summer Olympics and the 2020 Summer Olympics.

Graphical Analysis

LiftingMen$Year <- as.character(LiftingMen$Year)
LiftingWomen$Year <- as.character(LiftingWomen$Year)

box.men <- LiftingMen %>% 
  group_by(Year) %>% 
  filter(Year == 2000 | Year == 2020) %>% 
  ggplot(aes(x = Year, y = Total..kg., fill = Year)) +
  geom_boxplot() +
  scale_fill_brewer(palette="Pastel2") +
  ylab("Total Weight in Kg") +
  labs(subtitle = "Men") +
  theme_ipsum()

box.women <- LiftingWomen %>% 
  group_by(Year) %>% 
  filter(Year == 2000 | Year == 2020) %>% 
  ggplot(aes(x = Year, y = Total..kg., fill = Year)) +
  geom_boxplot() +
  scale_fill_brewer(palette="Pastel2") +
  ylab("Total Weight in Kg") +
  labs(subtitle = "Women") +
  theme_ipsum()

box.plot <- ggarrange(box.men, box.women,
                  common.legend = TRUE,
                  ncol = 2, nrow = 1, legend = "bottom")
annotate_figure(box.plot, top = text_grob("Total Weight difference between 2000 and 2020 Summer Olympics", 
               color = "black", face = "bold", size = 14))

  • The box plots show different things for the women's and the men's categories. Our research question asks whether the total weight is higher in 2020 than in 2000. In the men's category it looks like the athletes lifted heavier on average in 2000: the spread is larger in 2000, but the median is also higher. In the women's category it seems that the women got stronger over the twenty-year span. The spread is bigger in 2000, but the median and the third quartile are higher in 2020. However, there is one outlier in 2020, representing one athlete who lifted significantly heavier than the rest. For further research, we could remove that outlier from the data and see whether the results change.

Men

Additional graphical analysis

density.men <- LiftingMen %>% 
  filter(Year == 2000 | Year == 2020) %>% 
  ggplot(aes(x = Total..kg., y = Year, fill = after_stat(x), alpha = 0.7)) +
  geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01, alpha = 0.5) +
  scale_fill_viridis(name = "Total Weight", option = "C") +
  labs(title = 'Difference in Mens total weight') +
  xlab("Total Weight in Kg") +
  theme_ipsum()
density.men
## Picking joint bandwidth of 19.9

  • By the look of the density plot, the lightest total in 2000 is lower than in 2020, but more of the density lies on the right side of the plot. The opposite holds for 2020: the heaviest total looks a little heavier than in 2000, but most of the density is on the left side of the plot.

Hypothesis:

  • H0: µ2000 = µ2020

  • Hα: µ2000 < µ2020

One-Sided T-Test

# Separating the men's 2000 and 2020 Summer Olympics

Men2000 <- LiftingMen %>% 
  filter(Year == 2000)

Men2020 <- LiftingMen %>% 
  filter(Year == 2020)
t.test(Men2020$Total..kg., Men2000$Total..kg., paired = FALSE, alternative = "greater")
## 
##  Welch Two Sample t-test
## 
## data:  Men2020$Total..kg. and Men2000$Total..kg.
## t = -0.15364, df = 159.46, p-value = 0.561
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -14.33783       Inf
## sample estimates:
## mean of x mean of y 
##  344.3803  345.5986

Analysis

  • With a p-value of 0.561 we fail to reject H0 at the 5% significance level. The t value is negative because the 2020 sample mean (344.4 kg) is slightly lower than the 2000 sample mean (345.6 kg), but the two means are almost equal. Consistent with the graphical analysis, there is no evidence that the men in 2020 lifted heavier than the men in 2000.
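
The sign of the t statistic depends only on the order in which the two samples are passed to t.test(): it is the sign of the first sample mean minus the second. The sketch below demonstrates this on simulated totals; the numbers and the names `total_2000` and `total_2020` are assumptions, not the Olympic data.

```r
set.seed(3)
# Simulated totals in kg; invented numbers, not the Olympic data.
total_2000 <- rnorm(80, mean = 346, sd = 50)
total_2020 <- rnorm(70, mean = 344, sd = 45)

# One-sided Welch test of H0: mu_2020 = mu_2000 against Ha: mu_2020 > mu_2000.
res <- t.test(total_2020, total_2000, alternative = "greater")

# The t statistic carries the sign of (mean of first sample - mean of second sample).
diff_means <- mean(total_2020) - mean(total_2000)
cat("difference in means:", round(diff_means, 2), "\n")
cat("t statistic:        ", round(unname(res$statistic), 3), "\n")
cat("one-sided p-value:  ", round(res$p.value, 3), "\n")
```

With alternative = "greater", a negative t statistic always gives a p-value above 0.5, which matches the pattern in the men's result.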

Women

Additional graphical analysis

density.women <- LiftingWomen %>% 
  filter(Year == 2000 | Year == 2020) %>% 
  ggplot(aes(x = Total..kg., y = Year, fill = after_stat(x), alpha = 0.7)) +
  geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01, alpha = 0.5) +
  scale_fill_viridis(name = "Total Weight", option = "C") +
  labs(title = 'Difference in Womens total weight') +
  xlab("Total Weight in Kg") +
  theme_ipsum()
density.women
## Picking joint bandwidth of 11.3

  • The density for 2020 sits further to the right than the one for 2000. Both the lightest and the heaviest totals in 2020 were higher than in 2000. The 2020 data look roughly normally distributed, with the highest density in the middle. For 2000, the density peaks in two places, which could indicate that about half of the athletes lifted lighter totals while the other half lifted considerably heavier ones, with less balance than in 2020.

Hypothesis

  • H0: µ2000 = µ2020

  • Hα: µ2000 < µ2020

One-Sided T-Test

# Separating the women's 2000 and 2020 Summer Olympics

Women2000 <- LiftingWomen %>% 
  filter(Year == 2000)

Women2020 <- LiftingWomen %>% 
  filter(Year == 2020)
t.test(Women2020$Total..kg., Women2000$Total..kg., paired = FALSE, alternative = "greater")
## 
##  Welch Two Sample t-test
## 
## data:  Women2020$Total..kg. and Women2000$Total..kg.
## t = 2.0306, df = 152.79, p-value = 0.02201
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  1.926703      Inf
## sample estimates:
## mean of x mean of y 
##  217.5595  207.1474

Analysis

  • With a p-value of 0.022 we reject H0 at the 5% significance level. The t value is positive because the 2020 sample mean (217.6 kg) is higher than the 2000 sample mean (207.1 kg). The women in 2020 lifted significantly heavier in total than the women in 2000.

Section 4

Contrast and Linear Combination Comparisons

  • Does a higher bodyweight increase your total weight in Olympic Weightlifting?

  • We will be testing men and women separately, using data from the 2000 and 2020 Summer Olympics. Across the two Games there are 14 bodyweight categories for each gender.

Graphical Analysis

# Separating the men's light and heavy bodyweight categories from 2000 and 2020 into two data frames.
# Each Title already encodes the year and gender, so matching on Title with %in% is enough.
# (Chaining && and | on whole columns does not filter row by row and fails in recent versions of R.)

Men.low.bodyweight <- Lifting.data %>% 
  filter(Gender == "Men",
         Title %in% c(
           "Weightlifting at the 2000 Summer Olympics – Men's 56 kg",
           "Weightlifting at the 2000 Summer Olympics – Men's 62 kg",
           "Weightlifting at the 2000 Summer Olympics – Men's 69 kg",
           "Weightlifting at the 2000 Summer Olympics – Men's 77 kg",
           "Weightlifting at the 2020 Summer Olympics – Men's 61 kg",
           "Weightlifting at the 2020 Summer Olympics – Men's 67 kg",
           "Weightlifting at the 2020 Summer Olympics – Men's 73 kg"))

Men.high.bodyweight <- Lifting.data %>% 
  filter(Gender == "Men",
         Title %in% c(
           "Weightlifting at the 2000 Summer Olympics – Men's 85 kg",
           "Weightlifting at the 2000 Summer Olympics – Men's 94 kg",
           "Weightlifting at the 2000 Summer Olympics – Men's 105 kg",
           "Weightlifting at the 2000 Summer Olympics – Men's +105 kg",
           "Weightlifting at the 2020 Summer Olympics – Men's 81 kg",
           "Weightlifting at the 2020 Summer Olympics – Men's 109 kg",
           "Weightlifting at the 2020 Summer Olympics – Men's +109 kg"))
men.2000.plot <- Men2000 %>% 
  group_by(Title) %>% 
  ggplot(aes(x = Title, y = Total..kg., fill = Title)) +
  geom_boxplot() + 
  ylim(150, 500) +
  ylab("Total Weight in Kg") +
  labs(subtitle = "2000 Summer Olympics") +
  scale_x_discrete(
    name = "Category",
    limits = c("Weightlifting at the 2000 Summer Olympics – Men's 56 kg", 
               "Weightlifting at the 2000 Summer Olympics – Men's 62 kg", 
               "Weightlifting at the 2000 Summer Olympics – Men's 69 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's 77 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's 85 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's 94 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's 105 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's +105 kg"),
    labels = rep(" ", 8)) +  # one blank label per category listed in limits
  scale_fill_discrete(
    breaks = c("Weightlifting at the 2000 Summer Olympics – Men's 56 kg", 
               "Weightlifting at the 2000 Summer Olympics – Men's 62 kg", 
               "Weightlifting at the 2000 Summer Olympics – Men's 69 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's 77 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's 85 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's 94 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's 105 kg",
               "Weightlifting at the 2000 Summer Olympics – Men's +105 kg"),
    labels = c("56 kg", 
               "62 kg", 
               "69 kg", 
               "77 kg", 
               "85 kg",
               "94 kg",
               "105 kg",
               "+105 kg"),
    name = " ") + 
  theme_ipsum()

men.2020.plot <- Men2020 %>% 
  group_by(Title) %>% 
  ggplot(aes(x = Title, y = Total..kg., fill = Title)) +
  geom_boxplot() + 
  ylim(150, 500) +
  ylab("Total Weight in Kg") +
  labs(subtitle = "2020 Summer Olympics") +
  scale_x_discrete(
    name = "Category",
    limits = c("Weightlifting at the 2020 Summer Olympics – Men's 61 kg", 
               "Weightlifting at the 2020 Summer Olympics – Men's 67 kg", 
               "Weightlifting at the 2020 Summer Olympics – Men's 73 kg",
               "Weightlifting at the 2020 Summer Olympics – Men's 81 kg",
               "Weightlifting at the 2020 Summer Olympics – Men's 109 kg",
               "Weightlifting at the 2020 Summer Olympics – Men's +109 kg"),
    labels = rep(" ", 6)) +  # one blank label per category listed in limits
  scale_fill_discrete(
    breaks = c("Weightlifting at the 2020 Summer Olympics – Men's 61 kg", 
               "Weightlifting at the 2020 Summer Olympics – Men's 67 kg", 
               "Weightlifting at the 2020 Summer Olympics – Men's 73 kg",
               "Weightlifting at the 2020 Summer Olympics – Men's 81 kg",
               "Weightlifting at the 2020 Summer Olympics – Men's 109 kg",
               "Weightlifting at the 2020 Summer Olympics – Men's +109 kg"),
    labels = c("61 kg", 
               "67 kg", 
               "73 kg", 
               "81 kg", 
               "109 kg",
               "+109 kg"),
    name = " ") + 
  theme_ipsum()

ggarrange(men.2000.plot, men.2020.plot, 
          labels = c("Total Weight between Men's Categories"),
          ncol = 2, nrow = 1, legend = "bottom")

women.2000.plot <- Women2000 %>% 
  group_by(Title) %>% 
  ggplot(aes(x = Title, y = Total..kg., fill = Title)) +
  geom_boxplot() + 
  ylim(110, 325) +
  ylab("Total Weight in Kg") +
  labs(subtitle = "2000 Summer Olympics") +
  scale_x_discrete(
    name = "Category",
    limits = c("Weightlifting at the 2000 Summer Olympics – Women's 48 kg", 
               "Weightlifting at the 2000 Summer Olympics – Women's 53 kg", 
               "Weightlifting at the 2000 Summer Olympics – Women's 58 kg",
               "Weightlifting at the 2000 Summer Olympics – Women's 63 kg",
               "Weightlifting at the 2000 Summer Olympics – Women's 69 kg",
               "Weightlifting at the 2000 Summer Olympics – Women's 75 kg",
               "Weightlifting at the 2000 Summer Olympics – Women's +75 kg"),
    labels = rep(" ", 7)) +  # one blank label per category listed in limits
  scale_fill_discrete(
    breaks = c("Weightlifting at the 2000 Summer Olympics – Women's 48 kg", 
               "Weightlifting at the 2000 Summer Olympics – Women's 53 kg", 
               "Weightlifting at the 2000 Summer Olympics – Women's 58 kg",
               "Weightlifting at the 2000 Summer Olympics – Women's 63 kg",
               "Weightlifting at the 2000 Summer Olympics – Women's 69 kg",
               "Weightlifting at the 2000 Summer Olympics – Women's 75 kg",
               "Weightlifting at the 2000 Summer Olympics – Women's +75 kg"),
    labels = c("48 kg", 
               "53 kg", 
               "58 kg", 
               "63 kg", 
               "69 kg",
               "75 kg",
               "+75 kg"),
    name = " ") + 
  theme_ipsum(base_family = "Arial Narrow")


women.2020.plot <- Women2020 %>% 
  group_by(Title) %>% 
  ggplot(aes(x = Title, y = Total..kg., fill = Title)) +
  geom_boxplot() + 
  ylim(110, 325) +
  ylab("Total Weight in Kg") +
  labs(subtitle = "2020 Summer Olympics") +
  scale_x_discrete(
    name = "Category",
    limits = c("Weightlifting at the 2020 Summer Olympics – Women's 49 kg", 
               "Weightlifting at the 2020 Summer Olympics – Women's 55 kg", 
               "Weightlifting at the 2020 Summer Olympics – Women's 59 kg",
               "Weightlifting at the 2020 Summer Olympics – Women's 64 kg",
               "Weightlifting at the 2020 Summer Olympics – Women's 76 kg",
               "Weightlifting at the 2020 Summer Olympics – Women's 87 kg",
               "Weightlifting at the 2020 Summer Olympics – Women's +87 kg"),
    labels = rep(" ", 7)) +  # one blank label per category listed in limits
  scale_fill_discrete(
    breaks = c("Weightlifting at the 2020 Summer Olympics – Women's 49 kg", 
               "Weightlifting at the 2020 Summer Olympics – Women's 55 kg", 
               "Weightlifting at the 2020 Summer Olympics – Women's 59 kg",
               "Weightlifting at the 2020 Summer Olympics – Women's 64 kg",
               "Weightlifting at the 2020 Summer Olympics – Women's 76 kg",
               "Weightlifting at the 2020 Summer Olympics – Women's 87 kg",
               "Weightlifting at the 2020 Summer Olympics – Women's +87 kg"),
    labels = c("49 kg", 
               "55 kg", 
               "59 kg", 
               "64 kg", 
               "76 kg",
               "87 kg",
               "+87 kg"),
    name = " ") + 
  theme_ipsum(base_family = "Arial Narrow")

ggarrange(women.2000.plot, women.2020.plot,
          labels = c("Total Weight between Women's Categories"),
          ncol = 2, nrow = 1, legend = "bottom")

  • All four box plots show the same trend: total weight rises with bodyweight category. There appear to be clear differences between the weight categories for both genders in both Olympics. The legend values show that the bodyweight limits of the categories increased for the 2020 Olympics. For women, that did not change the trend of the box plot. For men, however, the 2020 box plot shows little difference in total weight between the top two categories. Does that mean the change in categories was unnecessary? For men, the interquartile ranges in 2000 are much larger than in 2020, and the median is higher in 2020, so the change of bodyweight categories looks successful in the men's category. For women it is the opposite: in the heaviest category, the interquartile range is larger in 2020 and the median is higher in 2000, which suggests that the category change may have been unnecessary. Some could argue that the women were simply stronger in 2000, but the t-test in Section 3 indicates otherwise.

Methods & Testing

  • Step 1: we test whether all bodyweight categories have the same mean total weight. We do this by fitting an analysis of variance model with aov() and testing it with anova().

  • Step 2: we compare the average total weight of the heaviest bodyweight categories with that of the lightest bodyweight categories. We do this by computing and testing a contrast on the fitted model with fit.contrast().

Step 1

Hypothesis

  • Men

    • H0: µ56kg = µ61kg = µ62kg = µ67kg = µ69kg = µ73kg = µ77kg = µ81kg = µ85kg = µ94kg = µ105kg = µ+105kg = µ109kg = µ+109kg

    • Hα: At least one mean is different

  • Women

    • H0: µ48kg = µ49kg = µ53kg = µ55kg = µ58kg = µ59kg = µ63kg = µ64kg = µ69kg = µ75kg = µ+75kg = µ76kg = µ87kg = µ+87kg

    • Hα: At least one mean is different

Analysis of Variance

  • Men
# Data frame with only male athletes from 2000 and 2020
men.2000.2020 <- LiftingMen %>% filter(Year == 2000 | Year == 2020) %>% mutate(Title = as.factor(Title))
men.aov <- aov(men.2000.2020$Total..kg. ~ men.2000.2020$Title)
anova(men.aov)
  • Women
# Data frame with only female athletes from 2000 and 2020
women.2000.2020 <- LiftingWomen %>% filter(Year == 2000 | Year == 2020) %>% mutate(Title = as.factor(Title))
women.aov <- aov(women.2000.2020$Total..kg. ~ women.2000.2020$Title)
anova(women.aov)

Analysis and conclusion

  • Both genders have a p-value of 2.2e-16, which is less than 0.05, so we reject both null hypotheses at the 5% significance level. For each gender, at least one bodyweight category has a mean total weight that differs from the others.
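
As a minimal sketch of the Step 1 test, the code below fits a one-way analysis of variance on simulated totals for three made-up categories and reads the p-value from the ANOVA table. The data frame `toy` and its values are assumptions for illustration. Passing `data =` to aov() keeps the term labels in the output short.

```r
set.seed(4)
# Simulated totals in kg for three hypothetical bodyweight categories.
toy <- data.frame(
  Title = factor(rep(c("56 kg", "77 kg", "105 kg"), each = 12),
                 levels = c("56 kg", "77 kg", "105 kg")),
  Total = c(rnorm(12, mean = 270, sd = 15),
            rnorm(12, mean = 340, sd = 15),
            rnorm(12, mean = 400, sd = 15))
)

# One-way ANOVA: H0 says all category means are equal.
fit <- aov(Total ~ Title, data = toy)
tab <- anova(fit)
print(tab)

# The p-value of the F test sits in the first row of the table.
p_value <- tab[["Pr(>F)"]][1]
cat("reject H0 at the 5% level:", p_value < 0.05, "\n")
```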

Step 2

Linear Combination

  • Men

    • (µ56kg + µ61kg + µ62kg + µ67kg + µ69kg + µ73kg + µ77kg)/7 - (µ81kg + µ85kg + µ94kg + µ105kg + µ+105kg + µ109kg + µ+109kg)/7 = 0
  • Women

    • (µ48kg + µ49kg + µ53kg + µ55kg + µ58kg + µ59kg + µ63kg)/7 - (µ64kg+ µ69kg + µ75kg + µ+75kg + µ76kg + µ87kg + µ+87kg)/7 = 0

Hypothesis (both genders)

  • H0: µlower = µhigher

  • Hα: µlower < µhigher

Testing

  • Men:
fit.contrast(men.aov, men.2000.2020$Title, c(-(1/7), -(1/7), (1/7), (1/7), (1/7), (1/7), -(1/7), -(1/7), -(1/7), -(1/7), (1/7), (1/7), (1/7), -(1/7)), conf.int = 0.95)
##                                Estimate Std. Error   t value     Pr(>|t|)  lower CI  upper CI
## men.2000.2020$Title c=( ... ) -82.96056    4.18016 -19.84626 4.632297e-49 -91.20366 -74.71747
## attr(,"class")
## [1] "fit_contrast"
  • Women
fit.contrast(women.aov, women.2000.2020$Title, c(-(1/7), (1/7), (1/7), (1/7), (1/7), -(1/7), -(1/7), -(1/7), (1/7), (1/7), (1/7), -(1/7), -(1/7), -(1/7)), conf.int = 0.95)
##                                  Estimate Std. Error   t value     Pr(>|t|)  lower CI  upper CI
## women.2000.2020$Title c=( ... ) -44.56471   3.434913 -12.97404 3.467122e-26 -51.35251  -37.7769
## attr(,"class")
## [1] "fit_contrast"
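The coefficient vectors in the calls above are typed by hand, so each sign has to match the position of its category in the factor levels of Title. The following sketch shows one way to build such a vector from category names and to compute the contrast estimate, standard error and t value with base R only. It uses simulated data: the data frame `toy`, the four classes and the vector `lower` are assumptions for illustration, and this is not the internal code of `fit.contrast`. Since the Pr(>|t|) label refers to a two-sided test and Hα is one-sided, the sketch also computes a one-sided p-value.

```r
# Hypothetical toy data: four weight classes, two "lower" and two "higher"
set.seed(2)
toy <- data.frame(
  Title = factor(rep(c("+105 kg", "105 kg", "56 kg", "62 kg"), each = 8)),
  Total = c(rnorm(8, 430, 12), rnorm(8, 400, 12),
            rnorm(8, 270, 12), rnorm(8, 300, 12))
)
toy.aov <- aov(Total ~ Title, data = toy)

# Name the lower classes; every other level is treated as "higher"
lower <- c("56 kg", "62 kg")
lev <- levels(toy$Title)
is.lower <- lev %in% lower

# Weights: +1/k for the k lower classes, -1/m for the m higher classes
w <- ifelse(is.lower, 1 / sum(is.lower), -1 / sum(!is.lower))
names(w) <- lev

# Contrast estimate: mean of lower-class means minus mean of higher-class means
group.means <- tapply(toy$Total, toy$Title, mean)
group.n <- as.numeric(table(toy$Title)[lev])
estimate <- sum(w * group.means[lev])

# Standard error from the pooled residual variance of the ANOVA
sigma2 <- deviance(toy.aov) / df.residual(toy.aov)
std.error <- sqrt(sigma2 * sum(w^2 / group.n))
t.value <- estimate / std.error

# One-sided p-value for Ha: mu_lower < mu_higher (lower tail of the t distribution)
p.one.sided <- pt(t.value, df.residual(toy.aov))
```

Building the weights from names keeps the signs correct even if the level order changes, for example when categories are added or the factor is releveled.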

Analysis

Step 1

  • Both genders have a p-value of 2.2e-16, so we reject both null hypotheses at the 5% significance level. For each gender, at least one bodyweight category has a mean total that differs from the others.

Step 2

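  • For both genders the contrast estimate is negative (-82.96 kg for men and -44.56 kg for women) and the p-value is far below 0.05 (4.63e-49 and 3.47e-26), so we reject both null hypotheses at the 5% significance level. On average, athletes in the heavier bodyweight categories lift a higher total than athletes in the lighter categories.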

Further Research and Analysis

  • As the data in this report show, the weight categories for men and women change between Olympic Games. The adjustments are made to improve the sport and keep it fair for the athletes (Why change weightlifting’s bodyweight categories … again?).

  • The bodyweight categories will change for the Olympics in 2024. To reach full gender equality, there will be 5 weight categories for each gender. Men will compete in 61kg, 73kg, 89kg, 102kg and +102kg. Women will compete in 49kg, 59kg, 71kg, 81kg and +81kg (Paris 2024: Weight categories for the Olympic Weightlifting Competition).

  • Those categories are very different from the ones in our data. The gaps between the men’s categories are wider, and the limit for the heaviest category will be 7kg lower than the heaviest one in our data. The women’s future plus category lies between the plus categories from 2000 and 2020. In our graphical analysis we concluded that the women in the plus category in 2000 were stronger than those in the plus category in 2020. Since the plus category in 2024 will lie between them, it will be interesting to see whether the results improve on previous Olympic Games.

Conclusion

  • The statistical results from these observational studies can be applied to Olympic Weightlifting in general with 95 percent confidence. An athlete who is good at one Olympic lift is most likely good at the other as well, which reflects the athlete’s general strength.
  • The result presented in section 2 is an equation that predicts an athlete’s total (the sum of Clean-and-Jerk and Snatch) from bodyweight and Snatch. The predicted Clean-and-Jerk can then be calculated by subtracting the Snatch from the predicted total.
  • The evidence collected in section 3 cannot be applied to a broader population than elite athletes, because it was only tested with data from the Olympic Games. Male elite athletes have not improved so far in the 21st century, while female elite athletes have.
  • We also found statistically significant evidence that athletes in the heavier bodyweight categories lift heavier weights than athletes in the lighter categories.
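The step from a predicted total to a predicted Clean-and-Jerk can be sketched as follows. This is an illustration on simulated data only: the data frame `toy`, the coefficients used to generate it and the example athlete are all made up, and the fitted model is not the equation from section 2.

```r
# Simulated athletes; the numbers used to generate them are hypothetical
set.seed(3)
n <- 60
Bodyweight <- runif(n, 50, 110)
Snatch <- 40 + 1.2 * Bodyweight + rnorm(n, 0, 8)
CleanJerk <- 15 + 1.2 * Snatch + rnorm(n, 0, 6)
toy <- data.frame(Bodyweight, Snatch, Total = Snatch + CleanJerk)

# Linear model of the total on bodyweight and Snatch
toy.lm <- lm(Total ~ Bodyweight + Snatch, data = toy)

# Predict the total for a hypothetical 77 kg athlete who snatches 150 kg
new.athlete <- data.frame(Bodyweight = 77, Snatch = 150)
total.hat <- predict(toy.lm, newdata = new.athlete)

# Predicted Clean-and-Jerk = predicted total minus the known Snatch
cleanjerk.hat <- unname(total.hat - new.athlete$Snatch)
```

Because the Snatch is an input to the model, the subtraction removes a known quantity, so the uncertainty in the Clean-and-Jerk prediction is the same as the uncertainty in the predicted total.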

References

