class: center, middle, inverse, title-slide

.title[
# Factors Influencing Golf Earnings
]
.subtitle[
## Linear Regression
]
.author[
### Tyler Battaglini & Ryan Lebo
]
.date[
### 2025-02-16
]

---

<!-- every new slide is created under three dashes (---) -->
<!-- (<h1) makes the title for the slide -->

<h1 align="center"> Table of Contents</h1>
<BR>

.pull-left[
- Introduction
- Variables
- Research Question
- Exploratory Data Analysis
- Linear Model
- Log Model
- Bootstrapping
- Model Selection
- Conclusion
]

---

<h1 align = "center"> Introduction <font color="orange"></font></h1>
<BR>

.pull-left[
- PGA 2004 data set (196 participants)
- What is the PGA?
- Data set provides earnings and player stats
]

---

<h1 align = "center"> Variables <font color="orange"></font></h1>
<BR>

.pull-left[
- Name
- Age
- Avg Drive
- Driving Accuracy
- Greens in Regulation
- Avg Number of Putts
]

.pull-right[
- Save Percentage
- Money Rank
- Number of Events
- Total Winnings
- Average Winnings
]

---

<h1 align = "center"> Research Question <font color="orange"></font></h1>
<BR>

- Which variables affect players' winnings in this season?
- Looking at average drive vs. earnings

---

## Exploratory Data Analysis

- Check for high correlation
- Remove missing observations
- Remove some variables

<img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-2-1.png" width="100%" />

---

<h1 align = "center"> Linear Model <font color="orange"></font></h1>
<BR>

- Non-normal distribution
- Several outliers
- Non-constant variance

<img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-4-1.png" width="100%" /><img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-4-2.png" width="100%" />

---

<h1 align = "center"> Linear Model Cont.
<font color="orange"></font></h1>
<BR>

.pull-left[
- All VIF values are below 5
- Little to no multicollinearity
]

---

## <div style="display: flex; justify-content: center; align-items: center; height: 80vh; font-size: 50px;"> Box-Cox Transformation </div>

---

## Box-Cox Transformation

- All the lambda values are close to 0
- Proceed with log transformation

<img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-6-1.png" width="100%" />

---

## Log Transformation Model

- Response variable: log(Average Winnings)
- Explanatory variables: Average Drive, Greens in Regulation, Save Percentage, Number of Events, and Age Above 30

|                 | Estimate| Std. Error| t value| Pr(>\|t\|)|
|:----------------|--------:|----------:|-------:|----------:|
|(Intercept)      |   -5.451|      2.383|  -2.287|      0.023|
|Average_Drive    |    0.006|      0.007|   0.819|      0.414|
|Greens_on_reg    |    0.192|      0.019|   9.875|      0.000|
|Save_Percent     |    0.056|      0.010|   5.562|      0.000|
|Number_events    |   -0.045|      0.012|  -3.857|      0.000|
|Age_Above_30TRUE |    0.033|      0.139|   0.240|      0.811|

---

## Goodness of Fit Measures

- Improvement in constant variance
- Improvement in normality

<img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-8-1.png" width="100%" /><img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-8-2.png" width="100%" />

---

## Comparison of Models

- Log model outperforms linear model
- Lower SSE signals a better fit
- R-squared and adjusted R-squared are higher in the log model

|             |     SSE|  R.sq| R.adj| Cp|      AIC|     SBC|   PRESS|
|:------------|-------:|-----:|-----:|--:|--------:|-------:|-------:|
|full.model   | 123.038| 0.335| 0.316|  6|  -64.865| -45.510| 135.110|
|log.winnings | 100.246| 0.455| 0.440|  6| -102.970| -83.615| 107.534|

---

## Comparison of Models Cont.

- Vast improvement in log transformation model
- Normality?

<img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-11-1.png" width="100%" />

---

## Comparison of Models Cont.

- Residuals vs.
Fitted plot improves under the log transformation
- Can assume constant variance

<img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-12-1.png" width="100%" />

---

## <div style="display: flex; justify-content: center; align-items: center; height: 80vh; font-size: 50px;"> Bootstrapping </div>

---

## Bootstrapping

<div style="font-size: 26px; line-height: 2;">
<ul style="margin-top: 15px; margin-bottom: 15px;">
<li>We could not confirm normality from the Q-Q plot.</li>
<li>Bootstrapping gives a nonparametric basis for comparison.</li>
<li>Used to estimate confidence intervals.</li>
</ul>
</div>

---

## Bootstrapping Cont. with Cases

- The red curve is built from the estimated regression coefficients and their standard errors in the regression output; the reported p-values are based on it
- The blue curve is a nonparametric, data-driven estimate of the density of the bootstrap sampling distribution; the bootstrap CIs are based on it
- All sampling distributions appear approximately normal

<img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-13-1.png" width="100%" />

---

## Bootstrapping Cont. with Residuals

- The red curve is built from the estimated regression coefficients and their standard errors in the regression output; the reported p-values are based on it
- The blue curve is a nonparametric, data-driven estimate of the density of the bootstrap sampling distribution; the bootstrap CIs are based on it
- Results are consistent with bootstrapping cases

<img src="Presentation-1-Capstone-_files/figure-html/unnamed-chunk-15-1.png" width="100%" />

---

## Bootstrapping Cont.
CIs

- Bootstrap CIs agree with the p-values
- Two variables (Average Drive and Age Above 30) have CIs that include 0

Table: Final Combined Inferential Statistics: Coefficients, p-values, and Bootstrap CIs

|Term             |Coefficients |95% CI (Bootstrap t)  |95% CI (Bootstrap r)    |p-values |
|:----------------|:------------|:---------------------|:-----------------------|:--------|
|(Intercept)      |-5.4513      |[ -10.21 , -0.763 ]   |[ -10.3252 , -0.6365 ]  |0.0233   |
|Average_Drive    |0.0058       |[ -0.0067 , 0.0186 ]  |[ -0.0075 , 0.0198 ]    |0.4139   |
|Greens_on_reg    |0.192        |[ 0.148 , 0.2356 ]    |[ 0.1546 , 0.2302 ]     |0.0000   |
|Save_Percent     |0.0563       |[ 0.0363 , 0.0748 ]   |[ 0.0368 , 0.0747 ]     |0.0000   |
|Number_events    |-0.0450      |[ -0.068 , -0.0206 ]  |[ -0.0676 , -0.0226 ]   |0.0002   |
|Age_Above_30TRUE |0.0334       |[ -0.2485 , 0.3697 ]  |[ -0.2338 , 0.3216 ]    |0.8108   |

---

## Model Selection

<div style="font-size: 26px; line-height: 1.5;">
<ul style="margin-top: 15px; margin-bottom: 15px;">
<li><u>Log Transformation</u> <br> - Positives: Can assume normality and constant variance; best R-squared and adjusted R-squared values. <br> - Negatives: Age and Average Drive are not statistically significant.</li>
<li><u>Linear Model</u> <br> - Positives: Only Age is not statistically significant, and the model is simple. <br> - Negatives: Cannot assume constant variance or normality.</li>
</ul>
</div>

---

## <div style="display: flex; justify-content: center; align-items: center; height: 80vh; font-size: 50px;"> Conclusion </div>

---

## Conclusion

<div style="font-size: 26px; line-height: 2;">
<ul style="margin-top: 15px; margin-bottom: 15px;">
<li>Greens in Regulation is the strongest predictor: a one-unit increase is associated with a 21.18% increase in Average Winnings.</li>
<li>Holding all other variables constant, a one-unit increase in Average Drive is associated with a 0.58% increase in Average Winnings.</li>
<li>Short game (e.g., Greens in Regulation or Save Percentage) has a larger impact on winnings than the long game.
<br> - Short game (Save Percentage) impact: 5.8% <br> - Long game (Average Drive) impact: 0.58%</li>
<li>Each additional event played is associated with a 4.4% decrease in Average Winnings.</li>
</ul>
</div>

---

## Limitations

<div style="font-size: 26px; line-height: 2;">
<ul style="margin-top: 15px; margin-bottom: 15px;">
<li>Average Drive and Age are not statistically significant.</li>
<li>The log transformation is sensitive to outliers and can amplify differences among small values.</li>
<li>The model assumes a linear relationship.</li>
<li>Factors such as injury, weather, start time, and mental focus are not included in this data set.</li>
</ul>
</div>

---

## <div style="display: flex; justify-content: center; align-items: center; height: 80vh; font-size: 50px;"> Questions? </div>

---

## Works Cited

<div style="font-size: 35px; line-height: 1.5;">
<ul style="margin-top: 15px; margin-bottom: 15px;">
<li>Datasets. (n.d.). <a href="https://users.stat.ufl.edu/~winner/datasets.html">https://users.stat.ufl.edu/~winner/datasets.html</a></li>
</ul>
</div>

---

## <style> section { background-color: #A9D1D6; height: 100%; } </style> <div style="display: flex; justify-content: center; align-items: center; height: 100%; font-size: 50px;"> Thank You! </div>

---

## Slide Contributors

<style> section { background-color: #D1E2FF; height: 100%; } </style>
<div style="font-size: 40px; line-height: 2;">
<ul>
<li>Ryan Lebo prepared the slides from the Introduction through the Linear Regression Model</li>
<li>Tyler Battaglini prepared the slides from the Box-Cox Transformation through the Conclusion</li>
</ul>
</div>
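
---

## Appendix: Percent Change from Log-Model Coefficients

The percentage effects quoted in the Conclusion appear to follow the usual rule for a log-transformed response: a one-unit increase in a predictor multiplies the response by exp(β), a change of (exp(β) − 1) × 100%. The sketch below is illustrative only and is not the code behind the slides. The function name `percent_change` is hypothetical, and the coefficients are copied from the tables in this deck. Greens_on_reg is reported to only three decimals, so its result may differ from the slide in the last digit.

```python
import math

# Coefficients copied from this deck's tables (response: log of Average Winnings).
# Greens_on_reg is only reported to 3 decimals, so its percent change is approximate.
coefficients = {
    "Average_Drive": 0.0058,
    "Greens_on_reg": 0.192,
    "Save_Percent": 0.0563,
    "Number_events": -0.0450,
}


def percent_change(beta: float) -> float:
    """Percent change in the response for a one-unit increase in a predictor
    when the response was log-transformed: (exp(beta) - 1) * 100."""
    return (math.exp(beta) - 1.0) * 100.0


# Print the implied percent change for each predictor, holding the others constant.
for name, beta in coefficients.items():
    print(f"{name}: {percent_change(beta):+.2f}%")
```

Run as written, this gives roughly +0.58% for Average Drive, about +21.2% for Greens in Regulation, about +5.8% for Save Percentage, and about -4.4% for Number of Events, in line with the figures on the Conclusion slide.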