DATA 606 Project
DATA 606 Project
Load Libraries
Introduction
Research question
My research question is to investigate the relationship between the amount that someone lifts and their age, bodyweight, and the other factors that are involved in this dataset. I want to detemine what variables are the most important determining factors when it comes to total Olympic lifting weight.
Cases
What are the cases, and how many are there? There are 52347 cases of lifting performances and these span many years and also many different locations,countries, indivividuals. They include the weight of the individual and what their lifting numbers where along with plenty of other attributes about the person
Data collection
Describe the method of data collection. I found this data from a research paper that was conducted on the sinclair coefficient https://www.reddit.com/r/weightlifting/comments/6wjdxm/analysis_of_weightlifting_data/
Type of study
This is an observational study
Dependent Variable
The response variable is the lifting numbers so both the snatch result and the clean and jerk result, the result that I will pay attention to the mostis the Total variable which is the two lift results added together.
Independent Variable
Some of the quantitative variables are both bodyweight and age which can both have a tremendous effect on the lifting numbers of the individual, one qualitative variable is the location or country that the individual is from. Another variable that I am paying attention to is the gender of the lifters.
Independence of Cases
I am assuming that each lifting record are independent of each other I think this is a fair assumption.
Data
Data Preparation
Reorganizing the Locations to currently display where the competition took place, also cleaned up the age variable to remove faulty values I use the separate function to turn the event locaiton into Cities, Countries and also States where applicable
data <- read.csv("https://raw.githubusercontent.com/crarnouts/CUNY-MSDS/master/lifting_data.csv", header = TRUE)
data$Event.Date.1 <- NULL
data$Event.Year.1 <- NULL
data <- data %>% filter(AGE<75)
data <- data %>% separate(Event.Location, c("Country","City"),sep ="-")
data$Country_2 <- data$Country
data <- data %>% separate(Country_2, c("City_2","State"),sep =",")
data$State[which(is.na(data$State))]<-0
data$Country <- ifelse(data$State != 0, "USA",data$Country)
data$City <- ifelse(data$State != 0, data$City_2,data$City)
data$City_2 <- NULL
data <- data[c(1,2,3,4,22,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21)]
data$State <- ifelse(data$State == 0, NA,data$State)
data$AGE <- as.numeric(data$AGE)
data$Body.Weight_1 <- as.character(data$Body.Weight)
data$Body.Weight_2 <- as.numeric(data$Body.Weight_1)
data$Sinclair.Total <- as.numeric(data$Sinclair.Total)
data$DOB <- as.Date(data$DOB)
data <- filter(data, data$Total >0)
dataset <- data
data <- filter(data,data$Total > 0)
data$bw <- data$Body.Weight_2*2.2
data <- filter(data, data$bw > 0)
data$Nation <- as.factor(data$Nation)
data$Country <- as.factor(data$Country)
data$Total <- data$Total*2.2
data <- sample_n(data,10000)
Exploratory Analysis
One of my first throughts is that Body weight and gender were the biggest contributing factors so I wanted to do a regression of those variables and the result was relatively similar to what I expected Males have a higher lifting total and as Bodyweight goes up the Total that someone lifts also goes up. In the graphs below I fit regression lines to age, bodyweight and date of birth
lift_pred<- lm(Total ~ Body.Weight_2 + Gender, data = data)
summary(lift_pred)
##
## Call:
## lm(formula = Total ~ Body.Weight_2 + Gender, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -544.79 -66.60 12.46 79.05 276.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 102.00465 4.00633 25.46 <2e-16 ***
## Body.Weight_2 4.15826 0.05656 73.52 <2e-16 ***
## GenderMale 148.78476 2.50672 59.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 113.7 on 9997 degrees of freedom
## Multiple R-squared: 0.5881, Adjusted R-squared: 0.588
## F-statistic: 7135 on 2 and 9997 DF, p-value: < 2.2e-16
#multiLines(lift_pred)
data_mult <- data[c(9,10,16,25)]
data_mult2 = melt(data_mult, id.vars='Total')
ggplot(data_mult2) +
geom_jitter(aes(value,Total, colour=variable),) + geom_smooth(aes(value,Total, colour=variable), method=lm, se=FALSE) + facet_wrap(~variable, scales="free_x") +
labs(x = "Variable", y = "Total")
data_2 <- data
ggplot(data_2, aes(x=data$bw, y=data_2$Snatch.Result)) +
geom_point()
ggplot(data_2, aes(x=data$bw, y=data_2$Clean...Jerk.Result)) +
geom_point()
Summary Statistics
The Graphs beow show the relationship between age and total and they color the dots by Country and then by Gender respectively.
ggplot(data, aes(x=data$AGE, y=data$Total, color = data$Country)) +
geom_point(shape=1)
ggplot(data, aes(x=data$AGE, y=data$Total, color = data$Gender)) +
geom_point(shape=1)+
scale_colour_hue(l=50) + # Use a slightly darker palette than normal
geom_smooth(method=lm, # Add linear regression lines
se=FALSE, # Don't add shaded confidence region
fullrange=TRUE) # Extend regression lines
Bodyweight and Gender are probably the two most significant attributes to estimating the total of someone’s lifts
Combined Bodyweight and Gender explain roughly 57% of the variance in the data. The R squared value for the linear model that contained both of these variabels was .57.
lift_pred<- lm(Total ~ bw + Gender, data = data)
summary(lift_pred)
##
## Call:
## lm(formula = Total ~ bw + Gender, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -544.79 -66.60 12.46 79.05 276.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 102.00465 4.00633 25.46 <2e-16 ***
## bw 1.89012 0.02571 73.52 <2e-16 ***
## GenderMale 148.78476 2.50672 59.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 113.7 on 9997 degrees of freedom
## Multiple R-squared: 0.5881, Adjusted R-squared: 0.588
## F-statistic: 7135 on 2 and 9997 DF, p-value: < 2.2e-16
Decision Trees
The decision tree displays the different factors that determine a personal overall lifting numbers. It shows specific groupings and displays the lifting metrics for those specific buckets.
tree_data <- sample_n(data, 10000)
tree_data$Event.Date <- NULL
tree_data$Event.Year<- NULL
tree_data$Event.Name<- NULL
tree_data$Country<- NULL
tree_data$State<- NULL
tree_data$Name<- NULL
tree_data$City<- NULL
tree_data$Body.Weight<- NULL
tree_data$Body.Weight_1<- NULL
tree_data$Sinclair.Total<- NULL
tree_data$Sinclair.Coefficient<- NULL
tree_data$Sinclair.A<- NULL
tree_data$Sinclair.b<- NULL
tree_data$AX2<- NULL
tree_data$X<- NULL
tree_data$Clean...Jerk.Result<- NULL
tree_data$Snatch.Result<- NULL
tree_data$Nation<- NULL
tree_data$Category<- NULL
tree_data$Body.Weight_2 <- NULL
form <- as.formula(Total ~ .)
# A fast plot
decision_tree <- fancyRpartPlot(rpart(form,tree_data),caption ="Olympic Lifting Totals in lbs by Attributes")
###The decision tree above is measuring how much weight someone’s combined total is based off all of the attributes. As you can see some of the most important variables are bodyweight, gender, and age.
Boxplot of the different categorical variables
Looking at the strenght of men compared to women and also relative strenght of men vs women, to take the bodyweight component out of it
p <- ggplot(data, aes(x=Gender, y=Total)) +
geom_boxplot()
p
#look at strenght to weight ration for males vs females
data$rel_str <- (data$Total/data$bw)/2
p <- ggplot(data, aes(x=Gender, y=rel_str)) +
geom_boxplot()
p
###Looking at the relative strenght of males vs females is a little more fair because most males are significantly heavier than females. There is some overlap in relative strength. When calculating the relative strenght varibale I divided it by two so that it would be more representative of what they are actually lifting at any one time, because the raw total is actually two lifts combined together.
Inference
signifigance of Age Variable
\[ H_0:\] There is no relationship between age and the amount that someone lifts \[ H_A:\] There is a relationship between age and the amount that someone lifts
#Look at the Relationship Between Age and Total
Age_model <- lm(Total ~ AGE, data = data)
summary(Age_model)
##
## Call:
## lm(formula = Total ~ AGE, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -728.03 -123.97 -14.05 121.18 608.15
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 210.6617 6.7390 31.26 <2e-16 ***
## AGE 13.7993 0.3162 43.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 162.4 on 9998 degrees of freedom
## Multiple R-squared: 0.16, Adjusted R-squared: 0.1599
## F-statistic: 1905 on 1 and 9998 DF, p-value: < 2.2e-16
p <- ggplot(data = data, aes(x=AGE, y=Total)) +
geom_point(shape=1) + # Use hollow circles
geom_smooth(method=lm) # Add linear regression line
# (by default includes 95% confidence region)
p
Since the p-value is virtually zero we can conclude that the Relationship is indeed significant
Linear Regression Relationsip
\[ \hat{total} = 236.7 + 12.3844 * Age(years) \]
Signifigance of Body Weight Variable
\[ H_0:\] There is no relationship between Body Weight and the amount that someone lifts \[ H_A:\] There is a relationship between Body Weight and the amount that someone lifts
#Look at the Relationship Between Age and Total
bw_model <- lm(Total ~ bw, data = data)
summary(Age_model)
##
## Call:
## lm(formula = Total ~ AGE, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -728.03 -123.97 -14.05 121.18 608.15
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 210.6617 6.7390 31.26 <2e-16 ***
## AGE 13.7993 0.3162 43.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 162.4 on 9998 degrees of freedom
## Multiple R-squared: 0.16, Adjusted R-squared: 0.1599
## F-statistic: 1905 on 1 and 9998 DF, p-value: < 2.2e-16
p <- ggplot(data = data, aes(x=bw, y=Total)) +
geom_point(shape=1) + # Use hollow circles
geom_smooth(method=lm) # Add linear regression line
# (by default includes 95% confidence region)
p
Since the p-value is virtually zero we can conclude that the Relationship is indeed significant
Linear Regression Relationsip
\[ \hat{total} = 105.66 + 2.414 * Bodyweight(lbs) \]
Polynomial Regresssion
Polynomial Regression may be potentially more accurate because the line that best fits the data no longer has to be a straight line.
Looking at the age variable with polynomial regression
#Logistic Regression using Age as the Primary variable
fit <- lm(Total ~ AGE + I(AGE^2) + I(AGE^3), data = data)
prd <- data.frame(AGE = seq(from = range(data$AGE)[1], to = range(data$AGE)[2], length.out = 100))
err <- predict(fit, newdata = prd, se.fit = TRUE)
prd$lci <- err$fit - 1.96 * err$se.fit
prd$fit <- err$fit
prd$uci <- err$fit + 1.96 * err$se.fit
p <- ggplot(prd, aes(x = AGE, y = fit)) +
theme_bw() +
geom_line() +
geom_smooth(aes(ymin = lci, ymax = uci), stat = "identity") +
geom_point(data = data, aes(x = AGE, y = Total))
ggplotly(p)
Looking at the bodyweight variable with polynomial regression
fit <- lm(Total ~ bw + I(bw^2) + I(bw^3), data = data)
prd <- data.frame(bw = seq(from = range(data$bw)[1], to = range(data$bw)[2], length.out = 100))
err <- predict(fit, newdata = prd, se.fit = TRUE)
prd$lci <- err$fit - 1.96 * err$se.fit
prd$fit <- err$fit
prd$uci <- err$fit + 1.96 * err$se.fit
p1 <-
ggplot(prd, aes(x = bw, y = fit)) +
theme_bw() +
geom_line() +
geom_smooth(aes(ymin = lci, ymax = uci), stat = "identity") +
geom_point(data = data, aes(x = bw, y = Total))
ggplotly(p1)
Linear vs Polynomial Regression
BodyWeight Models
summary(fit)
##
## Call:
## lm(formula = Total ~ bw + I(bw^2) + I(bw^3), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -536.50 -79.73 4.81 92.61 305.43
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.332e+02 3.248e+01 -19.50 <2e-16 ***
## bw 1.280e+01 5.226e-01 24.50 <2e-16 ***
## I(bw^2) -4.292e-02 2.655e-03 -16.17 <2e-16 ***
## I(bw^3) 5.129e-05 4.248e-06 12.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 125.5 on 9996 degrees of freedom
## Multiple R-squared: 0.4986, Adjusted R-squared: 0.4984
## F-statistic: 3313 on 3 and 9996 DF, p-value: < 2.2e-16
summary(bw_model)
##
## Call:
## lm(formula = Total ~ bw, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -635.02 -88.07 -2.17 95.62 322.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 97.92688 4.65815 21.02 <2e-16 ***
## bw 2.46724 0.02767 89.15 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 132.3 on 9998 degrees of freedom
## Multiple R-squared: 0.4429, Adjusted R-squared: 0.4428
## F-statistic: 7948 on 1 and 9998 DF, p-value: < 2.2e-16
plot(fit$residuals ~ data$bw)
abline(h = 0, lty = 3)
plot(bw_model$residuals ~ data$bw)
abline(h = 0, lty = 3)
Linear vs Polynomial Models
I did a comparsion of the linear regression compared to the polynomial regression for the BodyWeight variable and the standard error went down from 131.2 to 125.2 and the R squared went up from .42 to .48, Meaning that the polynomial regression better described the data
Combining the BodyWeight and Age Features to Make a better Predictor
Feature Scaling and Combine Variables
#find the average for each of the variables
mean_age <- mean(data$AGE)
mean_bw <- mean(data$bw)
min_age <- min(data$AGE)
min_bw <- min(data$bw)
max_age <- max(data$AGE)
max_bw <- max(data$bw)
#create adjusted features
data$bw_adj <- (data$bw - mean_bw)/(max_bw - min_bw)
data$age_adj <- (data$AGE - mean_age)/(max_age - min_age)
#combine the features into a sinlge variable
data$bw_age <- data$bw_adj + data$age_adj
bw_age_model <- lm(Total ~ bw_age, data = data)
summary(bw_age_model)
##
## Call:
## lm(formula = Total ~ bw_age, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -634.01 -88.31 -5.40 95.92 374.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 496.111 1.277 388.36 <2e-16 ***
## bw_age 656.028 6.824 96.14 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 127.7 on 9998 degrees of freedom
## Multiple R-squared: 0.4804, Adjusted R-squared: 0.4803
## F-statistic: 9243 on 1 and 9998 DF, p-value: < 2.2e-16
p <- ggplot(data = data, aes(x=bw_age, y=Total)) +
geom_point(shape=1) + # Use hollow circles
geom_smooth(method=lm) # Add linear regression line
# (by default includes 95% confidence region)
p
Polynomial Model for combined Variable
fit <- lm(Total ~ bw_age + I(bw_age^2) + I(bw_age^3), data = data)
prd <- data.frame(bw_age = seq(from = range(data$bw_age)[1], to = range(data$bw_age)[2], length.out = 100))
err <- predict(fit, newdata = prd, se.fit = TRUE)
prd$lci <- err$fit - 1.96 * err$se.fit
prd$fit <- err$fit
prd$uci <- err$fit + 1.96 * err$se.fit
p1 <-
ggplot(prd, aes(x = bw_age, y = fit)) +
theme_bw() +
geom_line() +
geom_smooth(aes(ymin = lci, ymax = uci), stat = "identity") +
geom_point(data = data, aes(x = bw_age, y = Total))
ggplotly(p1)
summary(fit)
##
## Call:
## lm(formula = Total ~ bw_age + I(bw_age^2) + I(bw_age^3), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -551.39 -83.90 -1.26 93.63 371.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 515.675 1.548 333.219 < 2e-16 ***
## bw_age 697.294 9.489 73.486 < 2e-16 ***
## I(bw_age^2) -587.805 30.709 -19.141 < 2e-16 ***
## I(bw_age^3) 238.676 64.920 3.676 0.000238 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 124.7 on 9996 degrees of freedom
## Multiple R-squared: 0.5053, Adjusted R-squared: 0.5051
## F-statistic: 3403 on 3 and 9996 DF, p-value: < 2.2e-16
plot(fit$residuals ~ data$bw)
abline(h = 0, lty = 3)
What if we just look at Males
male_data <- filter(data, data$Gender == "Male")
fit <- lm(Total ~ bw_age + I(bw_age^2) + I(bw_age^3), data = male_data)
prd <- data.frame(bw_age = seq(from = range(male_data$bw_age)[1], to = range(male_data$bw_age)[2], length.out = 100))
err <- predict(fit, newdata = prd, se.fit = TRUE)
prd$lci <- err$fit - 1.96 * err$se.fit
prd$fit <- err$fit
prd$uci <- err$fit + 1.96 * err$se.fit
p1 <-
ggplot(prd, aes(x = bw_age, y = fit)) +
theme_bw() +
geom_line() +
geom_smooth(aes(ymin = lci, ymax = uci), stat = "identity") +
geom_point(data = male_data, aes(x = bw_age, y = Total))
ggplotly(p1)
summary(fit)
##
## Call:
## lm(formula = Total ~ bw_age + I(bw_age^2) + I(bw_age^3), data = male_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -442.68 -72.04 10.10 78.54 312.16
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 584.469 1.713 341.099 <2e-16 ***
## bw_age 710.215 10.030 70.810 <2e-16 ***
## I(bw_age^2) -937.925 36.895 -25.422 <2e-16 ***
## I(bw_age^3) 615.688 69.666 8.838 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 105.2 on 5982 degrees of freedom
## Multiple R-squared: 0.5883, Adjusted R-squared: 0.5881
## F-statistic: 2849 on 3 and 5982 DF, p-value: < 2.2e-16
plot(fit$residuals ~ male_data$bw_age)
abline(h = 0, lty = 3)
cor(male_data$bw_age,male_data$Total)
## [1] 0.7262144
What if we just look at females
female_data <- filter(data, data$Gender == "Female")
fit <- lm(Total ~ bw_age + I(bw_age^2) + I(bw_age^3), data = female_data)
prd <- data.frame(bw_age = seq(from = range(female_data$bw_age)[1], to = range(female_data$bw_age)[2], length.out = 100))
err <- predict(fit, newdata = prd, se.fit = TRUE)
prd$lci <- err$fit - 1.96 * err$se.fit
prd$fit <- err$fit
prd$uci <- err$fit + 1.96 * err$se.fit
p1 <-
ggplot(prd, aes(x = bw_age, y = fit)) +
theme_bw() +
geom_line() +
geom_smooth(aes(ymin = lci, ymax = uci), stat = "identity") +
geom_point(data = female_data, aes(x = bw_age, y = Total))
ggplotly(p1)
How well does the bw_age polynomial regression predictor work for females?
The results below are interesting because the R squared for this model is only .42 which means that only 42% of variance in lifting numbers can be explained by age and bodyweight for women, while this number was 60% for men, meaning that men. Comparatively also the correlation between bw_age and total lifts for women is .6 while for men it is .74,
summary(fit)
##
## Call:
## lm(formula = Total ~ bw_age + I(bw_age^2) + I(bw_age^3), data = female_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -302.271 -51.559 2.013 51.752 315.808
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 404.467 1.498 270.047 < 2e-16 ***
## bw_age 307.128 10.492 29.274 < 2e-16 ***
## I(bw_age^2) -622.422 28.971 -21.485 < 2e-16 ***
## I(bw_age^3) 507.735 77.084 6.587 5.08e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 74.19 on 4010 degrees of freedom
## Multiple R-squared: 0.4278, Adjusted R-squared: 0.4274
## F-statistic: 999.5 on 3 and 4010 DF, p-value: < 2.2e-16
plot(fit$residuals ~ female_data$bw_age)
abline(h = 0, lty = 3)
cor(female_data$bw_age, female_data$Total)
## [1] 0.6014351
Run a Model with all the Variables
#Run a Full Model
lift_full <- lm(Total ~ Gender + bw_age + Category + DOB, data = data)
summary(lift_full)
##
## Call:
## lm(formula = Total ~ Gender + bw_age + Category + DOB, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -425.74 -51.21 4.88 57.35 312.04
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.378e+02 2.081e+01 16.234 < 2e-16 ***
## GenderMale 1.671e+02 1.871e+00 89.327 < 2e-16 ***
## bw_age 4.808e+02 5.957e+00 80.711 < 2e-16 ***
## CategoryJUNIOR -8.843e-01 2.095e+01 -0.042 0.96633
## CategoryMASTER -1.614e+02 2.461e+01 -6.559 5.67e-11 ***
## CategorySENIOR -1.344e+01 2.095e+01 -0.642 0.52110
## CategoryYOUTH -6.514e+01 2.091e+01 -3.114 0.00185 **
## CategoryYOUTH WORLDS 2.626e+01 4.360e+01 0.602 0.54693
## DOB 3.323e-03 6.443e-05 51.567 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 85.69 on 9955 degrees of freedom
## (36 observations deleted due to missingness)
## Multiple R-squared: 0.7663, Adjusted R-squared: 0.7661
## F-statistic: 4080 on 8 and 9955 DF, p-value: < 2.2e-16
Conclusion
In Conclusion the most important variables were very similar to what I predicted, Bodyweight, Gender, and Age are the most important factors in determining what someone’s lifting capabilities are.
I discovered that Polynomial models are better at predicting outcomes than Linear Models
I also did some backward elimination to find that that Gender, age, date of birth, and sometimes Category/Nation are the only relevant or significant variables in my dataset that can accurately predict how much a certain lifter can lift