## `geom_smooth()` using formula = 'y ~ x'
## [1] 0
In the following regression, the dependent variable is wins and the independent variable is turnover difference.
Reg1 <- lm(data = merged_data, wins ~ Turnover_Difference)
summary(Reg1)
##
## Call:
## lm(formula = wins ~ Turnover_Difference, data = merged_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2210 -1.2132 0.1421 2.0788 4.0633
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.50000 0.44631 19.045 <2e-16 ***
## Turnover_Difference 0.14212 0.05486 2.591 0.0147 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.525 on 30 degrees of freedom
## Multiple R-squared: 0.1828, Adjusted R-squared: 0.1556
## F-statistic: 6.711 on 1 and 30 DF, p-value: 0.01465
The results indicate that, on average, each one-unit increase in turnover difference is associated with 0.142 more wins, ceteris paribus. For example, if a team's turnover difference increased by ten, the team would be expected to win about 1.4 more games. The coefficient is statistically significant at the 5% level (p = 0.0147), but the model explains only about 18% of the variation in wins (R-squared = 0.1828).
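The arithmetic behind this interpretation can be checked directly. The snippet below is a minimal sketch that copies the two coefficients from the `summary(Reg1)` output rather than refitting the model; the helper name `predicted_wins` is made up for illustration.

```r
# Coefficients copied from the summary(Reg1) output above
intercept <- 8.50000
slope     <- 0.14212

# Hypothetical helper: fitted wins for a given turnover difference
predicted_wins <- function(turnover_diff) {
  intercept + slope * turnover_diff
}

# Effect of improving turnover difference by ten, everything else unchanged
extra_wins <- predicted_wins(10) - predicted_wins(0)
extra_wins  # about 1.42 (= 10 * 0.14212)
```

The intercept cancels in the difference, so the change in expected wins depends only on the slope.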
The following regression uses wins as the dependent variable, with turnover difference, average yards allowed per game, and average yards gained per game as independent variables.
Reg3 <- lm(data = merged_result, wins~Turnover_Difference+avgydsdef+avgydsoff)
summary(Reg3)
##
## Call:
## lm(formula = wins ~ Turnover_Difference + avgydsdef + avgydsoff,
## data = merged_result)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9703 -0.6275 0.0983 0.7858 2.9393
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.656649 5.242005 -1.461 0.1571
## Turnover_Difference 0.096112 0.035745 2.689 0.0128 *
## avgydsdef -0.009387 0.011704 -0.802 0.4304
## avgydsoff 0.055177 0.008634 6.391 1.31e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.595 on 24 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.7106, Adjusted R-squared: 0.6745
## F-statistic: 19.65 on 3 and 24 DF, p-value: 1.189e-06
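The results indicate that offensive yardage has the strongest statistical relationship with wins. On average, each additional yard gained per game is associated with 0.055 more wins in a season, ceteris paribus. This means that a team averaging 200 more yards of offense per game than an otherwise identical team would be expected to win about 11 more games. Turnover difference remains significant at the 5% level (p = 0.0128), while average yards allowed is not statistically significant (p = 0.4304). The R-squared of the second model is 0.71, an increase of 0.53 over the first model, meaning that 71% of the variation in the dependent variable (wins) can be explained by the explanatory variables in the model. The comparison is approximate, because the second model is fitted on 28 observations rather than 32 (4 observations were dropped due to missing values).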
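A coefficient describes a difference in expected wins, not a team's total, which also depends on the intercept and the other variables. The sketch below illustrates this with the coefficients copied from the `summary(Reg3)` output. The two teams and their input values (0 turnover difference, 340 yards allowed, 200 versus 400 yards gained) are hypothetical and chosen only for illustration, as is the helper name `predict_wins`.

```r
# Coefficients copied from the summary(Reg3) output above
b <- c(intercept = -7.656649, turnover = 0.096112,
       yds_def = -0.009387, yds_off = 0.055177)

# Hypothetical helper: fitted wins from the three-variable model
predict_wins <- function(turnover, yds_def, yds_off) {
  unname(b["intercept"] + b["turnover"] * turnover +
           b["yds_def"] * yds_def + b["yds_off"] * yds_off)
}

# Two made-up teams that differ only in offensive yards per game
team_a <- predict_wins(turnover = 0, yds_def = 340, yds_off = 200)
team_b <- predict_wins(turnover = 0, yds_def = 340, yds_off = 400)

team_b - team_a          # about 11.04 (= 200 * 0.055177)

# Gain in R-squared between the two models, taken from the two summaries
r2_gain <- 0.7106 - 0.1828
r2_gain                  # 0.5278
```

The 200-yard gap translates into roughly 11 wins of difference between the two teams, whatever values the other variables are held at.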
The second example I decided to investigate is the impact that physical build has on a UFC fighter's likelihood of victory. I have acquired a dataset which contains 4111 fighters of all weight classes. To avoid mixing weight classes, I will restrict this dataset to the Light Heavyweight division only (fighters above 84 kg and below 93.5 kg, roughly 185 to 206 lbs). This restriction reduces the dataset to 423 observations. My main point of interest is fighter reach, or wingspan. Theoretically, the longer a fighter's reach, the better he should be able to control distance and his opponent. I will also observe other physical characteristics, such as weight and stance.
# Keep only fighters heavier than 84 kg and lighter than 93.5 kg
DataUFC <- DataUFC %>%
  filter(weight_in_kg > 84, weight_in_kg < 93.5)
Surprisingly, there is no obvious relationship between wins and reach, as the graphs below show. The trend line even tilts slightly downward, suggesting that fighters with longer reach have fewer wins, which runs against conventional MMA wisdom.
## Warning: Removed 249 rows containing missing values or values outside the scale range
## (`geom_point()`).
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 249 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 249 rows containing missing values or values outside the scale range
## (`geom_point()`).
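The warnings above arise because 249 fighters have no recorded reach. As a minimal sketch, assuming the graphs are scatter plots with a linear trend line, the missing rows can be dropped explicitly before plotting so that nothing is removed silently. The data frame `toy` is simulated and only stands in for `DataUFC`; its values are invented.

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for DataUFC (values invented for illustration)
toy <- data.frame(
  reach_in_cm = c(rnorm(80, mean = 193, sd = 6), rep(NA, 20)),
  wins        = rpois(100, lambda = 12)
)

# Drop rows with missing reach up front instead of letting ggplot2 do it
plot_data <- toy[!is.na(toy$reach_in_cm), ]

p <- ggplot(plot_data, aes(x = reach_in_cm, y = wins)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Reach (cm)", y = "Wins")

print(p)

# Number of rows excluded for missing reach
nrow(toy) - nrow(plot_data)
```

Passing `formula = y ~ x` explicitly also silences the `geom_smooth()` message about the default formula.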
DataUFC_clean <- DataUFC %>% filter(!is.na(stance))
DataUFC_with_dummies <- dummy_cols(DataUFC_clean, select_columns = "stance", remove_first_dummy = FALSE, remove_selected_columns = FALSE)
UFC1 <- lm(data = DataUFC, wins ~ reach_in_cm)
summary(UFC1)
##
## Call:
## lm(formula = wins ~ reach_in_cm, data = DataUFC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.251 -5.101 -1.139 3.824 21.861
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.98021 15.96874 1.126 0.262
## reach_in_cm -0.01472 0.08282 -0.178 0.859
##
## Residual standard error: 7.109 on 172 degrees of freedom
## (249 observations deleted due to missingness)
## Multiple R-squared: 0.0001836, Adjusted R-squared: -0.005629
## F-statistic: 0.03159 on 1 and 172 DF, p-value: 0.8591
UFC2 <- lm(data = DataUFC_with_dummies, wins ~ reach_in_cm + weight_in_kg + DataUFC_with_dummies$"stance_Open Stance" + stance_Sideways + stance_Southpaw+ stance_Switch)
summary(UFC2)
##
## Call:
## lm(formula = wins ~ reach_in_cm + weight_in_kg + DataUFC_with_dummies$"stance_Open Stance" +
## stance_Sideways + stance_Southpaw + stance_Switch, data = DataUFC_with_dummies)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.162 -4.970 -1.018 3.886 21.982
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.67975 16.35175 1.142 0.255
## reach_in_cm -0.01897 0.08488 -0.223 0.823
## weight_in_kg NA NA NA NA
## DataUFC_with_dummies$"stance_Open Stance" 11.03045 7.18470 1.535 0.127
## stance_Sideways NA NA NA NA
## stance_Southpaw 0.49404 1.61847 0.305 0.761
## stance_Switch 0.62385 3.65355 0.171 0.865
##
## Residual standard error: 7.155 on 167 degrees of freedom
## (177 observations deleted due to missingness)
## Multiple R-squared: 0.01456, Adjusted R-squared: -0.009047
## F-statistic: 0.6167 on 4 and 167 DF, p-value: 0.6512
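Surprisingly, neither the simple nor the multiple regression shows any statistically significant coefficient, and both have a negative adjusted R-squared (-0.0056 and -0.0090). R reports the weight and Sideways stance coefficients as NA ("not defined because of singularities"), which means these variables are perfectly collinear with other terms in the model. The most likely cause is a lack of variation in the estimation sample: for example, every remaining fighter being listed at the same weight, and no remaining fighter using a Sideways stance. Taken together, the results give no evidence that reach or stance predicts wins in this division. That is consistent with the idea that style and skill set matter more in MMA than size, which is quite surprising, although the models do not measure skill directly.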
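The NA estimates can be reproduced on toy data. The following is a minimal sketch, assuming the cause is a predictor with no variation in the estimation sample; the data frame `toy` is simulated and is not the UFC dataset. It also passes stance as a factor, which lets `lm()` build the dummy variables itself and drop the reference level automatically.

```r
set.seed(1)
n <- 60

# Simulated data (values invented): weight is identical for every fighter
toy <- data.frame(
  reach_in_cm  = rnorm(n, mean = 193, sd = 6),
  weight_in_kg = rep(92.99, n),
  stance       = factor(sample(c("Orthodox", "Southpaw", "Switch"),
                               n, replace = TRUE))
)
toy$wins <- rpois(n, lambda = 12)

# Count distinct values per numeric predictor; 1 means no variation
sapply(toy[c("reach_in_cm", "weight_in_kg")],
       function(x) length(unique(x)))

fit <- lm(wins ~ reach_in_cm + weight_in_kg + stance, data = toy)

# The constant column is collinear with the intercept, so its slope is NA
coef(fit)
is.na(coef(fit)[["weight_in_kg"]])  # TRUE
```

Running the same distinct-value count on the real data, together with `table()` on the stance column, would show whether this explanation holds for the UFC sample.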