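# Packages used in this report: dplyr (%>%, filter, mutate), ggplot2, DT (datatable), hrbrthemes (theme_ipsum),
# ggpubr (ggarrange), GGally (ggpairs), ggridges, viridis and gmodels (fit.contrast)
invisible(lapply(c("dplyr", "ggplot2", "DT", "hrbrthemes", "ggpubr", "GGally", "ggridges", "viridis", "gmodels"),
library, character.only = TRUE))
Weightlifting.data = read.csv("weightlifting.csv") # reading the data from the CSV file
attach(Weightlifting.data)
# Dropping the URL column (column 8) and removing rows where the total is recorded as -1 or the bodyweight is 10 kg or less.
Lifting.data <- Weightlifting.data[ , -8] %>%
filter(Total..kg. != -1) %>%
filter(Bodyweight..kg. > 10)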
str(Lifting.data)
## 'data.frame': 1233 obs. of 10 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Athlete : chr "Halil Mutlu (TUR)" "Wu Wenxiong (CHN)" "Zhang Xiangxiang (CHN)" "Wang Shin-yuan (TPE)" ...
## $ Bodyweight..kg. : num 55.6 55.5 55.9 55.4 55.7 ...
## $ Snatch..kg. : num 138 125 125 125 120 ...
## $ Clean...Jerk..kg.: num 168 162 162 160 155 ...
## $ Total..kg. : num 305 288 288 285 275 ...
## $ Ranking : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Title : chr "Weightlifting at the 2000 Summer Olympics – Men's 56 kg" "Weightlifting at the 2000 Summer Olympics – Men's 56 kg" "Weightlifting at the 2000 Summer Olympics – Men's 56 kg" "Weightlifting at the 2000 Summer Olympics – Men's 56 kg" ...
## $ Year : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
## $ Gender : chr "Men" "Men" "Men" "Men" ...
detach(Weightlifting.data)
attach(Lifting.data)
Lifting.data <- Lifting.data %>%
mutate(Title = as.factor(Title), Year = as.factor(Year), Gender = as.factor(Gender))
head(Lifting.data)
Des <- data.frame(
Column = c("X","Athlete", "Bodyweight", "Snatch", "Clean & Jerk", "Total", "Ranking", "Title", "Year", "Gender"),
Type = c("Int", "Chr", "Num", "Num", "Num", "Num", "Int", "Fct", "Fct", "Fct"),
Description = c(
"Athletes id",
"Name",
"Athletes' bodyweight",
"Maximum weight of Snatch",
"Maximum weight of Clean&Jerk",
"Maximum lifts of both added",
"Athletes' placement",
"Athletes' bodyweight class/category",
"Olympic Year",
"Men or Women"))
datatable(Des, rownames = FALSE, caption = 'Lifting.data description')
# Separating genders into two different data frames for further research
LiftingWomen <- Lifting.data %>%
filter(Gender == "Women")
LiftingMen <- Lifting.data %>%
filter(Gender == "Men")
datatable(Lifting.data,
colnames = c('Bodyweight' = 3, 'Snatch' = 4, 'Clean&Jerk' = 5, 'Total' = 6),
rownames = FALSE,
filter="top",
options = list(pageLength = 5, scrollX=T, autoWidth = TRUE) )
Lifting.data %>%
group_by(Gender) %>%
summarise(count = n(), min = min(Total..kg.),
mean = mean(Total..kg.), sd = sd(Total..kg.), median = median(Total..kg.),
max = max(Total..kg.))
Lifting.data %>%
group_by(Year) %>%
filter(Gender == "Men") %>%
summarise(count = n(), min = min(Total..kg.),
mean = mean(Total..kg.), sd = sd(Total..kg.), median = median(Total..kg.),
max = max(Total..kg.))
Lifting.data %>%
group_by(Year) %>%
filter(Gender == "Women") %>%
summarise(count = n(), min = min(Total..kg.),
mean = mean(Total..kg.), sd = sd(Total..kg.), median = median(Total..kg.),
max = max(Total..kg.))
Linear Regression
Variance Inflation Factor (VIF)
Linear Regression
T-test
One sided Hypothesis testing
Compares the mean in two groups
Analysis of Variance and Anova
Contrast
poinpl.w.20 <- LiftingWomen %>%
ggplot(aes(x = Clean...Jerk..kg., y = Snatch..kg., color = Total..kg.)) +
geom_point() + geom_smooth(method = "lm") +
scale_color_gradient(low="blue", high="red") +
labs(color = "Total Weight") +
xlab("Clean & Jerk") +
ylab("Snatch") +
labs(subtitle = "Women") +
theme_ipsum()
poinpl.w.00 <- LiftingMen %>%
ggplot(aes(x = Clean...Jerk..kg., y = Snatch..kg., color = Total..kg.)) +
geom_point() + geom_smooth(method = "lm") +
scale_color_gradient(low="blue", high="red") +
labs(color = "Total Weight") +
xlab("Clean & Jerk") +
ylab("Snatch") +
labs(subtitle = "Men") +
theme_ipsum()
plot <- ggarrange(poinpl.w.00, poinpl.w.20,
common.legend = TRUE,
ncol = 2, nrow = 1, legend = "bottom")
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
annotate_figure(plot, top = text_grob("Clean & Jerk and Snatch Relationship",
color = "black", face = "bold", size = 14))
corr.lm = lm(Clean...Jerk..kg. ~ Snatch..kg., data = Lifting.data)
summary(corr.lm)
##
## Call:
## lm(formula = Clean...Jerk..kg. ~ Snatch..kg., data = Lifting.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.4084 -4.5620 -0.1226 4.1845 31.1899
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.997528 0.724687 13.8 <2e-16 ***
## Snatch..kg. 1.153572 0.005326 216.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.792 on 1231 degrees of freedom
## Multiple R-squared: 0.9744, Adjusted R-squared: 0.9744
## F-statistic: 4.691e+04 on 1 and 1231 DF, p-value: < 2.2e-16
vif = 1/(1-0.9744)
vif
## [1] 39.0625
With a VIF of about 39 and an R-squared of 0.97, Clean-and-Jerk and Snatch are strongly correlated with each other. If an athlete has a heavy Clean-and-Jerk, we can expect that athlete to have a heavy Snatch as well.
These results also indicate that if we want to do further testing, we do not want to have both variables together in a regression model.
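The VIF above was computed by typing the rounded R-squared by hand. A minimal sketch of the same calculation taken directly from a fitted model is shown below; the data are synthetic stand-ins generated for illustration (not the Olympic results), and the names `aux.lm`, `vif.value` and `vif.reported` are hypothetical.

```r
# Synthetic stand-in data (NOT the Olympic results): two strongly related lifts.
set.seed(1)
snatch <- rnorm(200, mean = 130, sd = 30)
clean_jerk <- 10 + 1.15 * snatch + rnorm(200, sd = 7)

# Auxiliary regression of one predictor on the other.
aux.lm <- lm(clean_jerk ~ snatch)
r2 <- summary(aux.lm)$r.squared

# VIF = 1 / (1 - R^2), read from the fitted object instead of typed by hand.
vif.value <- 1 / (1 - r2)
vif.value

# The same formula applied to the R-squared reported in the summary above.
vif.reported <- 1 / (1 - 0.9744)
vif.reported
```

Reading R-squared from `summary()` avoids rounding and keeps the VIF in sync with the model if the data change.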
Which linear model best predicts an athlete's total weight from bodyweight, gender, and either Clean-and-Jerk or Snatch?
In the final regression model, does Clean-and-Jerk or Snatch fit better?
The response variable is Total Weight and the explanatory variables are Bodyweight, Gender and either Clean-and-Jerk or Snatch.
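Written out, both candidate models share one form. The notation below (the coefficients, the error term, and the placeholder "Lift" standing for either Clean-and-Jerk or Snatch) is introduced here for illustration and does not come from the report itself; Men is taken as the reference level because the fitted output reports a GenderWomen coefficient.

```latex
\[
\text{Total}_i = \beta_0 + \beta_1\,\text{Lift}_i + \beta_2\,\text{Bodyweight}_i
  + \beta_3\,\mathbf{1}[\text{Gender}_i = \text{Women}] + \varepsilon_i
\]
```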
Lifting.data %>% ggpairs(columns = 3:6, aes(color = Gender, alpha = 0.5), upper = list(continuous = wrap("cor", size = 2.5)))
clean.lm = lm(Total..kg. ~ Clean...Jerk..kg. + Bodyweight..kg. + Gender, data = Lifting.data)
summary(clean.lm)
##
## Call:
## lm(formula = Total..kg. ~ Clean...Jerk..kg. + Bodyweight..kg. +
## Gender, data = Lifting.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.4820 -3.5256 -0.1228 3.8385 21.0726
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.4400377 1.3364284 -1.826 0.0681 .
## Clean...Jerk..kg. 1.8313704 0.0099139 184.728 <2e-16 ***
## Bodyweight..kg. 0.0008984 0.0104507 0.086 0.9315
## GenderWomen -1.4104495 0.6402219 -2.203 0.0278 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.809 on 1229 degrees of freedom
## Multiple R-squared: 0.9945, Adjusted R-squared: 0.9945
## F-statistic: 7.466e+04 on 3 and 1229 DF, p-value: < 2.2e-16
plot(clean.lm, which = 1)
snatch.lm = lm(Total..kg. ~ Snatch..kg. + Bodyweight..kg. + Gender, data = Lifting.data)
summary(snatch.lm)
##
## Call:
## lm(formula = Total..kg. ~ Snatch..kg. + Bodyweight..kg. + Gender,
## data = Lifting.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.0683 -4.2852 0.0374 4.3012 29.0767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.13082 1.36685 14.728 <2e-16 ***
## Snatch..kg. 2.02465 0.01220 165.995 <2e-16 ***
## Bodyweight..kg. 0.11906 0.01107 10.752 <2e-16 ***
## GenderWomen -6.45724 0.68654 -9.406 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.437 on 1229 degrees of freedom
## Multiple R-squared: 0.9933, Adjusted R-squared: 0.9933
## F-statistic: 6.071e+04 on 3 and 1229 DF, p-value: < 2.2e-16
plot(snatch.lm, which = 1)
log.snatch.lm = lm(log(Total..kg.) ~ log(Snatch..kg.) + Bodyweight..kg. + Gender, data = Lifting.data)
summary(log.snatch.lm)
##
## Call:
## lm(formula = log(Total..kg.) ~ log(Snatch..kg.) + Bodyweight..kg. +
## Gender, data = Lifting.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.090294 -0.014899 0.000236 0.015333 0.103899
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.1819436 0.0245252 48.19 < 2e-16 ***
## log(Snatch..kg.) 0.9181441 0.0052970 173.33 < 2e-16 ***
## Bodyweight..kg. 0.0003341 0.0000370 9.03 < 2e-16 ***
## GenderWomen -0.0200015 0.0024662 -8.11 1.21e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0229 on 1229 degrees of freedom
## Multiple R-squared: 0.9933, Adjusted R-squared: 0.9933
## F-statistic: 6.086e+04 on 3 and 1229 DF, p-value: < 2.2e-16
plot(log.snatch.lm, which = 1)
In the model with the Clean-and-Jerk variable, the R-squared is 0.99, which is extremely high. This is expected, because the total is the sum of the two lifts, so either lift on its own already explains most of it. The residual standard error is 5.8 kg. One interesting detail is that the p-value for the bodyweight coefficient is 0.93, so bodyweight adds no detectable information once Clean-and-Jerk and gender are in the model.
In the model with the Snatch, the R-squared stays at 0.99 and the residual standard error is similar (6.4 kg), but all p-values are very low. That means every variable contributes to explaining the response variable, which makes this model preferable to the previous one. However, we want to improve it further, and we do so by transforming some of the variables.
Taking the logarithm of the response variable and of the snatch variable gives our final regression model. The R-squared stays at 0.99 and the residual standard error is 0.023, but that value is on the log scale (roughly a 2% relative error), so it is not directly comparable with the 6.4 kg of the untransformed model. This is our final model.
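Because the log model reports its residual standard error in log units, one way to compare it with the untransformed model is to back-transform its fitted values to kilograms and compute the error there. The sketch below does this on synthetic stand-in data (not the Olympic results); `rmse`, `lin.lm` and `log.lm` are hypothetical names, and `exp(fitted(...))` is a naive back-transform that ignores retransformation bias.

```r
# Synthetic stand-in data (NOT the Olympic results).
set.seed(2)
n <- 300
snatch <- runif(n, 60, 200)
total <- 2.2 * snatch * exp(rnorm(n, sd = 0.03))
dat <- data.frame(snatch = snatch, total = total)

# One model on the kg scale, one on the log scale.
lin.lm <- lm(total ~ snatch, data = dat)
log.lm <- lm(log(total) ~ log(snatch), data = dat)

# The two residual standard errors are in different units (kg vs. log kg).
c(linear = sigma(lin.lm), log = sigma(log.lm))

# Root mean squared error, always measured in kg.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse.lin <- rmse(dat$total, fitted(lin.lm))
rmse.log <- rmse(dat$total, exp(fitted(log.lm)))  # back-transform to kg first
c(linear = rmse.lin, log = rmse.log)
```

With both errors in kilograms, the models can be ranked on equal footing; the same two `rmse()` calls would apply to `snatch.lm` and `log.snatch.lm`.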
Have Men and Women improved significantly in Olympic Weightlifting from the year 2000 to 2020?
To do this test we are going to compare men and women separately and only use the data from the 2000 Summer Olympics and the 2020 Summer Olympics.
LiftingMen$Year <- as.character(LiftingMen$Year)
LiftingWomen$Year <- as.character(LiftingWomen$Year)
box.men <- LiftingMen %>%
group_by(Year) %>%
filter(Year == 2000 | Year == 2020) %>%
ggplot(aes(x = Year, y = Total..kg., fill = Year)) +
geom_boxplot() +
scale_fill_brewer(palette="Pastel2") +
ylab("Total Weight in Kg") +
labs(subtitle = "Men") +
theme_ipsum()
box.women <- LiftingWomen %>%
group_by(Year) %>%
filter(Year == 2000 | Year == 2020) %>%
ggplot(aes(x = Year, y = Total..kg., fill = Year)) +
geom_boxplot() +
scale_fill_brewer(palette="Pastel2") +
ylab("Total Weight in Kg") +
labs(subtitle = "Women") +
theme_ipsum()
box.plot <- ggarrange(box.men, box.women,
common.legend = TRUE,
ncol = 2, nrow = 1, legend = "bottom")
annotate_figure(box.plot, top = text_grob("Total Weight difference between 2000 and 2020 Summer Olympics",
color = "black", face = "bold", size = 14))
density.men <- LiftingMen %>%
filter(Year == 2000 | Year == 2020) %>%
ggplot(aes(x = Total..kg., y = Year, fill = stat(x), alpha = 0.7)) +
geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01, alpha = 0.5) +
scale_fill_viridis(name = "Total Weight", option = "C") +
labs(title = 'Difference in Mens total weight') +
xlab("Total Weight in Kg") +
theme_ipsum()
density.men
## Picking joint bandwidth of 19.9
H0: µ2000 = µ2020
Hα: µ2000 < µ2020
# Separating the men's results from the 2000 and 2020 Summer Olympics
Men2000 <- LiftingMen %>%
filter(Year == 2000)
Men2020 <- LiftingMen %>%
filter(Year == 2020)
t.test(Men2020$Total..kg., Men2000$Total..kg., paired = FALSE, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: Men2020$Total..kg. and Men2000$Total..kg.
## t = -0.15364, df = 159.46, p-value = 0.561
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -14.33783 Inf
## sample estimates:
## mean of x mean of y
## 344.3803 345.5986
density.women <- LiftingWomen %>%
filter(Year == 2000 | Year == 2020) %>%
ggplot(aes(x = Total..kg., y = Year, fill = stat(x), alpha = 0.7)) +
geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01, alpha = 0.5) +
scale_fill_viridis(name = "Total Weight", option = "C") +
labs(title = 'Difference in Womens total weight') +
xlab("Total Weight in Kg") +
theme_ipsum()
density.women
## Picking joint bandwidth of 11.3
H0: µ2000 = µ2020
Hα: µ2000 < µ2020
# Separating the women's results from the 2000 and 2020 Summer Olympics
Women2000 <- LiftingWomen %>%
filter(Year == 2000)
Women2020 <- LiftingWomen %>%
filter(Year == 2020)
t.test(Women2020$Total..kg., Women2000$Total..kg., paired = FALSE, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: Women2020$Total..kg. and Women2000$Total..kg.
## t = 2.0306, df = 152.79, p-value = 0.02201
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 1.926703 Inf
## sample estimates:
## mean of x mean of y
## 217.5595 207.1474
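The one-sided Welch tests above can be reproduced by hand, which makes clear where the t statistic, the fractional degrees of freedom, and the p-value come from. The sketch below uses synthetic stand-in samples (not the Olympic results); `x` plays the role of the later Games and `y` the earlier one, and all variable names are hypothetical.

```r
# Synthetic stand-in samples (NOT the Olympic results).
set.seed(3)
x <- rnorm(80, mean = 217, sd = 35)  # stand-in for the later Games
y <- rnorm(75, mean = 207, sd = 30)  # stand-in for the earlier Games

# Squared standard errors of the two sample means.
se2.x <- var(x) / length(x)
se2.y <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t.stat <- (mean(x) - mean(y)) / sqrt(se2.x + se2.y)
df <- (se2.x + se2.y)^2 /
  (se2.x^2 / (length(x) - 1) + se2.y^2 / (length(y) - 1))

# alternative = "greater" uses the upper tail of the t distribution.
p.value <- pt(t.stat, df, lower.tail = FALSE)

# Cross-check against t.test(), which defaults to the Welch variant.
res <- t.test(x, y, paired = FALSE, alternative = "greater")
all.equal(unname(res$statistic), t.stat)
all.equal(unname(res$parameter), df)
all.equal(res$p.value, p.value)
```

The order of the arguments matters: with `alternative = "greater"`, the alternative hypothesis is that the mean of the first sample exceeds the mean of the second.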
Does a higher bodyweight increase your total weight in Olympic Weightlifting?
We will be testing men and women separately and using data from the 2000 and 2020 Summer Olympics. Across the two Games, our data contain 14 bodyweight categories for each gender.
# Splitting the men's 2000 and 2020 bodyweight categories into a light and a heavy group.
# Each Title already encodes the year and the gender, so matching on Title with %in% is sufficient.
Men.low.bodyweight <- Lifting.data %>%
filter(Title %in% c(
"Weightlifting at the 2000 Summer Olympics – Men's 56 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 62 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 69 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 77 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 61 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 67 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 73 kg"))
Men.high.bodyweight <- Lifting.data %>%
filter(Title %in% c(
"Weightlifting at the 2000 Summer Olympics – Men's 85 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 94 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 105 kg",
"Weightlifting at the 2000 Summer Olympics – Men's +105 kg",
"Weightlifting at the 2020 Summer Olympics – Men's +109 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 109 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 81 kg"))
men.2000.plot <- Men2000 %>%
group_by(Title) %>%
ggplot(aes(x = Title, y = Total..kg., fill = Title)) +
geom_boxplot() +
ylim(150, 500) +
ylab("Total Weight in Kg") +
labs(subtitle = "2000 Summer Olympics") +
scale_x_discrete(
name = "Category",
limits = c("Weightlifting at the 2000 Summer Olympics – Men's 56 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 62 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 69 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 77 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 85 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 94 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 105 kg",
"Weightlifting at the 2000 Summer Olympics – Men's +105 kg"),
labels=c(' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ')) +
scale_fill_discrete(
breaks = c("Weightlifting at the 2000 Summer Olympics – Men's 56 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 62 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 69 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 77 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 85 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 94 kg",
"Weightlifting at the 2000 Summer Olympics – Men's 105 kg",
"Weightlifting at the 2000 Summer Olympics – Men's +105 kg"),
labels = c("56 kg",
"62 kg",
"69 kg",
"77 kg",
"85 kg",
"94 kg",
"105 kg",
"+105 kg"),
name = " ") +
theme_ipsum()
men.2020.plot <- Men2020 %>%
group_by(Title) %>%
ggplot(aes(x = Title, y = Total..kg., fill = Title)) +
geom_boxplot() +
ylim(150, 500) +
ylab("Total Weight in Kg") +
labs(subtitle = "2020 Summer Olympics") +
scale_x_discrete(
name = "Category",
limits = c("Weightlifting at the 2020 Summer Olympics – Men's 61 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 67 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 73 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 81 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 109 kg",
"Weightlifting at the 2020 Summer Olympics – Men's +109 kg"),
labels=c(' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ')) +
scale_fill_discrete(
breaks = c("Weightlifting at the 2020 Summer Olympics – Men's 61 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 67 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 73 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 81 kg",
"Weightlifting at the 2020 Summer Olympics – Men's 109 kg",
"Weightlifting at the 2020 Summer Olympics – Men's +109 kg"),
labels = c("61 kg",
"67 kg",
"73 kg",
"81 kg",
"109 kg",
"+109 kg"),
name = " ") +
theme_ipsum()
ggarrange(men.2000.plot, men.2020.plot,
labels = c("Total Weight between Men's Categories"),
ncol = 2, nrow = 1, legend = "bottom")
women.2000.plot <- Women2000 %>%
group_by(Title) %>%
ggplot(aes(x = Title, y = Total..kg., fill = Title)) +
geom_boxplot() +
ylim(110, 325) +
ylab("Total Weight in Kg") +
labs(subtitle = "2000 Summer Olympics") +
scale_x_discrete(
name = "Category",
limits = c("Weightlifting at the 2000 Summer Olympics – Women's 48 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 53 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 58 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 63 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 69 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 75 kg",
"Weightlifting at the 2000 Summer Olympics – Women's +75 kg"),
labels=c(' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ')) +
scale_fill_discrete(
breaks = c("Weightlifting at the 2000 Summer Olympics – Women's 48 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 53 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 58 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 63 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 69 kg",
"Weightlifting at the 2000 Summer Olympics – Women's 75 kg",
"Weightlifting at the 2000 Summer Olympics – Women's +75 kg"),
labels = c("48 kg",
"53 kg",
"58 kg",
"63 kg",
"69 kg",
"75 kg",
"+75 kg"),
name = " ") +
theme_ipsum(base_family = "Arial Narrow")
women.2020.plot <- Women2020 %>%
group_by(Title) %>%
ggplot(aes(x = Title, y = Total..kg., fill = Title)) +
geom_boxplot() +
ylim(110, 325) +
ylab("Total Weight in Kg") +
labs(subtitle = "2020 Summer Olympics") +
scale_x_discrete(
name = "Category",
limits = c("Weightlifting at the 2020 Summer Olympics – Women's 49 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 55 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 59 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 64 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 76 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 87 kg",
"Weightlifting at the 2020 Summer Olympics – Women's +87 kg"),
labels=c(' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ')) +
scale_fill_discrete(
breaks = c("Weightlifting at the 2020 Summer Olympics – Women's 49 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 55 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 59 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 64 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 76 kg",
"Weightlifting at the 2020 Summer Olympics – Women's 87 kg",
"Weightlifting at the 2020 Summer Olympics – Women's +87 kg"),
labels = c("49 kg",
"55 kg",
"59 kg",
"64 kg",
"76 kg",
"87 kg",
"+87 kg"),
name = " ") +
theme_ipsum(base_family = "Arial Narrow")
ggarrange(women.2000.plot, women.2020.plot,
labels = c("Total Weight between Women's Categories"),
ncol = 2, nrow = 1, legend = "bottom")
Step 1: we test whether all bodyweight categories have the same mean total weight. We do this by fitting an analysis of variance model with aov() and reading the F-test from anova().
Step 2: we compare the average total weight of the heaviest bodyweight categories with that of the lightest ones. We do this by computing and testing a contrast on the fitted model with fit.contrast().
Men
H0: µ56kg = µ61kg = µ62kg = µ67kg = µ69kg = µ73kg = µ77kg = µ81kg = µ85kg = µ94kg = µ105kg = µ105+kg = µ109kg = µ109+kg
Hα: At least one mean is different
Women
H0: µ48kg = µ49kg = µ53kg = µ55kg = µ58kg = µ59kg = µ63kg = µ64kg = µ69kg = µ75kg = µ75+kg = µ76kg = µ87kg = µ87+kg
Hα: At least one mean is different
# Data frame with only male athletes from 2000 and 2020
men.2000.2020 <- LiftingMen %>% filter(Year == 2000 | Year == 2020) %>% mutate(Title = as.factor(Title))
men.aov <- aov(men.2000.2020$Total..kg. ~ men.2000.2020$Title)
anova(men.aov)
# Data frame with only female athletes from 2000 and 2020
women.2000.2020 <- LiftingWomen %>% filter(Year == 2000 | Year == 2020) %>% mutate(Title = as.factor(Title))
women.aov <- aov(women.2000.2020$Total..kg. ~ women.2000.2020$Title)
anova(women.aov)
For both genders the p-value is below 2.2e-16, so we reject both null hypotheses at the 5% significance level. For each gender, at least one bodyweight category has a mean total weight that differs from the others.
The p-value is less than 0.05 so we can reject the null hypothesis on a 5% significance level.
Men
Women
H0: µlower = µhigher
Hα: µlower < µhigher
fit.contrast(men.aov, men.2000.2020$Title, c(-(1/7), -(1/7), (1/7), (1/7), (1/7), (1/7), -(1/7), -(1/7), -(1/7), -(1/7), (1/7), (1/7), (1/7), -(1/7)), conf.int = 0.95)
##                      Estimate Std. Error   t value     Pr(>|t|)  lower CI  upper CI
## men.2000.2020$Title -82.96056    4.18016 -19.84626 4.632297e-49 -91.20366 -74.71747
## attr(,"class")
## [1] "fit_contrast"
fit.contrast(women.aov, women.2000.2020$Title, c(-(1/7), (1/7), (1/7), (1/7), (1/7), -(1/7), -(1/7), -(1/7), (1/7), (1/7), (1/7), -(1/7), -(1/7), -(1/7)), conf.int = 0.95)
##                        Estimate Std. Error   t value     Pr(>|t|)  lower CI upper CI
## women.2000.2020$Title -44.56471   3.434913 -12.97404 3.467122e-26 -51.35251 -37.7769
## attr(,"class")
## [1] "fit_contrast"
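A contrast estimate like the ones above is a weighted sum of the group means, with weights that add up to zero. The sketch below recomputes such a contrast in base R on a synthetic four-group example (not the Olympic data); the groups `A` to `D`, the weights `w`, and all variable names are hypothetical. The weights are assumed to be matched to the factor levels in the order given by `levels()`, which is why the coefficient vector in the calls above has to follow the level order of `Title`.

```r
# Synthetic stand-in data (NOT the Olympic results): two light and two heavy groups.
set.seed(4)
grp <- factor(rep(c("A", "B", "C", "D"), each = 10))
y <- rnorm(40, mean = c(200, 220, 300, 330)[as.integer(grp)], sd = 15)

fit <- aov(y ~ grp)

# Weights in the order of levels(grp): light groups positive, heavy groups negative.
levels(grp)
w <- c(1/2, 1/2, -1/2, -1/2)

# Contrast estimate: mean of the light groups minus mean of the heavy groups.
means <- tapply(y, grp, mean)
n.i <- tapply(y, grp, length)
est <- sum(w * means)

# Standard error from the pooled residual variance of the ANOVA fit.
mse <- deviance(fit) / df.residual(fit)
se <- sqrt(mse * sum(w^2 / n.i))
t.val <- est / se

# Two-sided p-value, and the one-sided version for "light < heavy".
p.two <- 2 * pt(-abs(t.val), df.residual(fit))
p.one <- pt(t.val, df.residual(fit))
c(estimate = est, se = se, t = t.val, p.two.sided = p.two, p.one.sided = p.one)
```

With this sign convention, a negative estimate means that the light groups lift less on average than the heavy groups.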
As you can tell by the data presented in the research, the weight categories for men and women change between Olympics. The adjustment is made to improve the sport and make it fair between the athletes (Why change weightlifting’s bodyweight categories … again?).
The bodyweight categories will change for the Olympics in 2024. To reach full gender equality, there will be 5 weight categories for each gender. Men will compete in 61kg, 73kg, 89kg, 102kg and +102kg. Women will compete in 49kg, 59kg, 71kg, 81kg and +81kg (Paris 2024: Weight categories for the Olympic Weightlifting Competition).
Those categories are very different from the ones in our data. There is a bigger range between each men's category, and the heaviest one will be 7kg lighter than the heaviest one in our data. The women's future plus category lies between the plus categories from 2000 and 2020. In our graphical analysis we concluded that the women in the plus category in 2000 were stronger than those in the plus category in 2020. Since the plus category in 2024 will be in between them, it will be interesting to see whether the results improve on previous Olympic Games.
Dickson, J. (2023b, February 10). The snatch vs. the Clean & Jerk: Pros and cons of the two olympic lifts. BarBend. https://barbend.com/snatch-vs-clean-and-jerk/
Multicollinearity. JMP. (n.d.). https://www.jmp.com/en_us/statistics-knowledge-portal/what-is-multiple-regression/multicollinearity.html
Paris 2024: Weight categories for the Olympic Weightlifting Competition. (n.d.). https://olympics.com/en/news/paris-2024-weight-categories-olympic-weightlifting-competition
Why change weightlifting’s bodyweight categories … again? SportsEdTV. (n.d.). https://sportsedtv.com/blog/why-change-weightliftings-bodyweight-categories-again