NFL Example

Graphs

Graph 1

Graph 2

## `geom_smooth()` using formula = 'y ~ x'

Regressions

OLS Regression

In the following regression, the dependent variable is wins and the independent variable is the Turnover Difference variable.

Reg1 <- lm(data = merged_data, wins ~ Turnover_Difference)
summary(Reg1)

## 
## Call:
## lm(formula = wins ~ Turnover_Difference, data = merged_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2210 -1.2132  0.1421  2.0788  4.0633 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          8.50000    0.44631  19.045   <2e-16 ***
## Turnover_Difference  0.14212    0.05486   2.591   0.0147 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.525 on 30 degrees of freedom
## Multiple R-squared:  0.1828, Adjusted R-squared:  0.1556 
## F-statistic: 6.711 on 1 and 30 DF,  p-value: 0.01465

The results indicate that, on average, each one-unit increase in turnover difference is associated with 0.142 more wins per season, ceteris paribus (p = 0.015). This means that if a team's turnover difference increased by ten, the team would be expected to win about 1.4 more games.
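The arithmetic behind this interpretation can be checked from the printed coefficients. The sketch below hard-codes the estimates from the summary output rather than refitting the model, since merged_data is not reproduced here; `predict_wins` is a hypothetical helper name used only for illustration.

```r
# Coefficients copied from the summary(Reg1) output above
intercept <- 8.50000
slope     <- 0.14212

# Hypothetical helper: expected wins for a given turnover difference
predict_wins <- function(turnover_difference) {
  intercept + slope * turnover_difference
}

baseline <- predict_wins(0)    # team with an even turnover difference
improved <- predict_wins(10)   # same team with a +10 turnover difference

improved - baseline            # about 1.42 additional expected wins
```

With the fitted object available, `predict(Reg1, newdata = data.frame(Turnover_Difference = c(0, 10)))` should give the same two fitted values.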

Multivariable Regression

The following regression uses the wins variable as the dependent variable, and turnover difference, average yards allowed per game, and average yards gained per game, as independent variables.

Reg3 <- lm(data = merged_result, wins ~ Turnover_Difference + avgydsdef + avgydsoff)
summary(Reg3)

## 
## Call:
## lm(formula = wins ~ Turnover_Difference + avgydsdef + avgydsoff, 
##     data = merged_result)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9703 -0.6275  0.0983  0.7858  2.9393 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -7.656649   5.242005  -1.461   0.1571    
## Turnover_Difference  0.096112   0.035745   2.689   0.0128 *  
## avgydsdef           -0.009387   0.011704  -0.802   0.4304    
## avgydsoff            0.055177   0.008634   6.391 1.31e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.595 on 24 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.7106, Adjusted R-squared:  0.6745 
## F-statistic: 19.65 on 3 and 24 DF,  p-value: 1.189e-06

The results indicate that offense has the clearest impact on victory: average yards gained per game is highly significant (p < 0.001), while average yards allowed per game is not (p = 0.43). On average, each additional yard gained per game is associated with 0.055 more wins in a season, ceteris paribus. This means that a team averaging 200 more yards of offense per game would be expected to win about 11 more games, holding all else constant. The R-squared of the second model is 0.71, a 0.53 increase over the first model, so 71% of the variation in the dependent variable (wins) can be explained by the explanatory variables in the model. Note that the two models are fit to different samples (32 versus 28 observations), so this comparison is approximate.
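The coefficient on avgydsoff describes a change in expected wins, not a total, because the intercept and the other predictors also enter the fitted value. As a rough check of the numbers quoted here, the sketch below uses the estimates copied from the summary output; the variable names are illustrative only.

```r
# Estimates copied from the summary(Reg3) output above
b <- c(intercept           = -7.656649,
       Turnover_Difference =  0.096112,
       avgydsdef           = -0.009387,
       avgydsoff           =  0.055177)

# Change in expected wins from 200 more offensive yards per game,
# with turnover difference and yards allowed held fixed
delta_wins_200 <- 200 * b[["avgydsoff"]]   # about 11.04

# A smaller, 20-yard improvement for comparison
delta_wins_20 <- 20 * b[["avgydsoff"]]     # about 1.10

# Gain in R-squared over the single-variable model
r2_gain <- 0.7106 - 0.1828                 # about 0.53
```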

UFC Example

The second example investigates the impact of physical build on a UFC fighter's likelihood of victory. I have acquired a dataset of 4111 fighters across all weight classes. To avoid confusion, I restrict this dataset to the Light-Heavyweight division (206 lbs), which reduces it to 423 observations. My main point of interest is fighter reach, or wingspan. In theory, the longer a fighter's reach, the better he should be able to control distance and his opponent. I will also look at other physical characteristics, such as weight and stance.

# Keep only fighters between 84 kg and 93.5 kg (both bounds exclusive)
DataUFC <- DataUFC %>%
  filter(weight_in_kg > 84, weight_in_kg < 93.5)

Graphs

Surprisingly, there appears to be no obvious relationship between wins and reach, as demonstrated by the graphs below. The trend line even tilts slightly downward, meaning that fighters with a longer reach have slightly fewer wins, which goes against most conventional MMA wisdom.

Graph 1

## Warning: Removed 249 rows containing missing values or values outside the scale range
## (`geom_point()`).

Graph 2

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 249 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 249 rows containing missing values or values outside the scale range
## (`geom_point()`).
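The warnings above report 249 dropped rows out of 423, the same count that lm() later reports as deleted due to missingness, which most likely reflects missing reach values. A quick way to confirm this kind of count is sketched below on a made-up toy data frame, since the real DataUFC is not reproduced here.

```r
# Toy data standing in for DataUFC; the values are invented for illustration
toy <- data.frame(
  wins        = c(10, 4, 7, 12, 3),
  reach_in_cm = c(193, NA, 188, NA, 201)
)

# Rows with a missing reach, which a scatter plot would drop
n_missing <- sum(is.na(toy$reach_in_cm))

# Rows with both variables present, which lm() would keep
n_complete <- sum(complete.cases(toy[, c("wins", "reach_in_cm")]))

n_missing    # 2
n_complete   # 3
```

On the real data, the analogous check would be `sum(is.na(DataUFC$reach_in_cm))`.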

Regressions

OLS Regression

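# Drop fighters with no recorded stance
DataUFC_clean <- DataUFC %>% filter(!is.na(stance))

# Create one 0/1 indicator column per stance (dummy_cols() from the fastDummies package)
DataUFC_with_dummies <- dummy_cols(DataUFC_clean, select_columns = "stance", remove_first_dummy = FALSE, remove_selected_columns = FALSE)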

UFC1 <- lm(data = DataUFC, wins ~ reach_in_cm)
summary(UFC1)

## 
## Call:
## lm(formula = wins ~ reach_in_cm, data = DataUFC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.251  -5.101  -1.139   3.824  21.861 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.98021   15.96874   1.126    0.262
## reach_in_cm -0.01472    0.08282  -0.178    0.859
## 
## Residual standard error: 7.109 on 172 degrees of freedom
##   (249 observations deleted due to missingness)
## Multiple R-squared:  0.0001836,  Adjusted R-squared:  -0.005629 
## F-statistic: 0.03159 on 1 and 172 DF,  p-value: 0.8591
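The negative adjusted R-squared in this output is not an error: the adjustment penalises R-squared for the number of predictors, and when R-squared is close to zero the penalty pushes the value below zero. A minimal check using the figures printed above, with the sample size inferred from the residual degrees of freedom:

```r
# Values copied from the summary(UFC1) output above
r2 <- 0.0001836   # multiple R-squared
n  <- 174         # 172 residual degrees of freedom + 2 estimated coefficients
p  <- 1           # number of predictors (reach_in_cm)

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2            # about -0.00563, matching the printed -0.005629
```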

Multivariable Regression

UFC2 <- lm(data = DataUFC_with_dummies, wins ~ reach_in_cm + weight_in_kg + DataUFC_with_dummies$"stance_Open Stance" + stance_Sideways + stance_Southpaw+ stance_Switch)
summary(UFC2)

## 
## Call:
## lm(formula = wins ~ reach_in_cm + weight_in_kg + DataUFC_with_dummies$"stance_Open Stance" + 
##     stance_Sideways + stance_Southpaw + stance_Switch, data = DataUFC_with_dummies)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.162  -4.970  -1.018   3.886  21.982 
## 
## Coefficients: (2 not defined because of singularities)
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                               18.67975   16.35175   1.142    0.255
## reach_in_cm                               -0.01897    0.08488  -0.223    0.823
## weight_in_kg                                    NA         NA      NA       NA
## DataUFC_with_dummies$"stance_Open Stance" 11.03045    7.18470   1.535    0.127
## stance_Sideways                                 NA         NA      NA       NA
## stance_Southpaw                            0.49404    1.61847   0.305    0.761
## stance_Switch                              0.62385    3.65355   0.171    0.865
## 
## Residual standard error: 7.155 on 167 degrees of freedom
##   (177 observations deleted due to missingness)
## Multiple R-squared:  0.01456,    Adjusted R-squared:  -0.009047 
## F-statistic: 0.6167 on 4 and 167 DF,  p-value: 0.6512

Surprisingly, neither the single-variable nor the multivariable regression shows any statistically significant result, and the adjusted R-squared is negative in both (-0.006 and -0.009). The weight and sideways stance variables generated no beta estimates because there was insufficient variation within these variables in the estimation sample, which R reports as singularities. These results suggest that, within the Light-Heavyweight division, style and skill set may be much more important than size, which is quite surprising.
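The NA coefficients can be reproduced on a small scale. The sketch below uses made-up toy data in which weight is constant, which is presumably what happens once the sample is restricted to a single weight class. It also shows that passing stance as a factor lets lm() build the indicator columns and choose a reference level itself. All values and names here are illustrative assumptions, not the author's data.

```r
# Toy data, invented for illustration: weight is identical for every fighter
toy <- data.frame(
  wins   = c(5, 9, 12, 7, 3, 10, 8, 6),
  reach  = c(188, 193, 201, 190, 185, 198, 196, 191),
  weight = rep(93, 8),
  stance = c("Orthodox", "Southpaw", "Orthodox", "Switch",
             "Orthodox", "Southpaw", "Switch", "Orthodox")
)

# A constant column is collinear with the intercept, so lm() returns NA for it
fit_const <- lm(wins ~ reach + weight, data = toy)
coef(fit_const)[["weight"]]   # NA

# With a factor, lm() uses the first level (Orthodox) as the reference
# and creates indicator columns for the remaining stances
fit_factor <- lm(wins ~ reach + factor(stance), data = toy)
names(coef(fit_factor))
```

An indicator column that is zero for every row in the estimation sample behaves the same way as the constant weight column, which may explain the NA for the sideways stance.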

Discussion 2

Samuel C. Singer

2024-09-07
