Activity 2: Football Punting

 

Investigators studied physical characteristics and ability in 13 football punters. Each volunteer punted a
football ten times. The investigators recorded the average distance for the ten punts, in feet. They also
recorded the average hang time (time the ball is in the air before the receiver catches it) for the ten punts, in seconds. This means that there are two possible response variables, distance and hang time. In addition, the investigators recorded five measures of strength and flexibility for each punter: right leg strength (pounds), left leg strength (pounds), right hamstring muscle flexibility (degrees), left hamstring muscle flexibility (degrees), and overall leg strength (foot-pounds). The data comes from the study “The relationship between selected physical performance variables and football punting ability” by the Department of Health, Physical Education and Recreation at the Virginia Polytechnic Institute and State University, 1983. The dataset file is punting.csv.

Variables:

Variable        Description
Distance        Distance travelled in feet
Hang            Time in air in seconds
R_Strength      Right leg strength in pounds
L_Strength      Left leg strength in pounds
R_Flexibility   Right leg flexibility in degrees
L_Flexibility   Left leg flexibility in degrees
O_Strength      Overall leg strength in pounds

This is a very small dataset, with only 13 observations, so in relative terms the number of variables, 7, is quite
high (p ≈ n). In the era of Big Data, people may easily be misled into believing that only problems involving
very large volumes of data are difficult to tackle. While there are various ways to handle large volumes of
data (e.g. parallel and distributed computation, among many others), it is important to understand that
a lack of data can also pose very difficult challenges in statistics.
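
To make the small-sample difficulty concrete, here is a minimal sketch on simulated data (not the punting data). It fits the same 5-predictor linear model once with 13 observations and once with 130, then compares the residual degrees of freedom and the average coefficient standard error. The helper `sim_se` and the simulated coefficients are assumptions made for illustration only.

```r
set.seed(1)

# Simulate n observations with p predictors, fit least squares, and
# return the residual degrees of freedom and the mean coefficient standard error.
sim_se <- function(n, p = 5) {
  X <- matrix(rnorm(n * p), n, p)
  y <- drop(X %*% rep(1, p)) + rnorm(n)
  fit <- lm(y ~ X)
  c(df = df.residual(fit),
    mean_se = mean(summary(fit)$coefficients[-1, "Std. Error"]))
}

small <- sim_se(13)    # same shape as the punting data: 13 rows, 5 predictors
large <- sim_se(130)   # ten times as many rows

print(rbind(small, large))
```

With 13 rows and 5 predictors plus an intercept, only 7 degrees of freedom remain for estimating the error variance, so the coefficient estimates are much less precise than with the larger sample.
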
Your Task: Perform a linear regression analysis on this dataset, as follows: (a) Standard least squares; (b)
Ridge Regression; (c) Lasso. First perform the analysis with (i) Distance as a function of all variables
except Hang, then repeat the analysis with (ii) Hang as a function of all variables except Distance.
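
The three methods differ only in the penalty added to the least-squares criterion. As a sketch, the function below evaluates the penalised objective in the parameterisation documented for glmnet, where `alpha = 0` gives Ridge, `alpha = 1` gives Lasso, and `lambda = 0` reduces to ordinary least squares. The function name `enet_objective` and the small data set are hypothetical and serve only to illustrate the formula; note that glmnet also standardises the predictors by default, which this sketch does not do.

```r
# Penalised least-squares objective (intercept beta0 is not penalised):
#   RSS / (2n) + lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
enet_objective <- function(beta0, beta, X, y, lambda, alpha) {
  rss <- sum((y - beta0 - drop(X %*% beta))^2)
  penalty <- (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
  rss / (2 * length(y)) + lambda * penalty
}

# Tiny hand-made example (hypothetical numbers, not the punting data)
X <- matrix(c(1, 2, 3, 4,
              2, 1, 0, 1), ncol = 2)
y <- c(1, 2, 3, 4)
beta <- c(1, -0.5)

ols   <- enet_objective(0, beta, X, y, lambda = 0, alpha = 0)  # no penalty
ridge <- enet_objective(0, beta, X, y, lambda = 1, alpha = 0)  # squared (L2) penalty
lasso <- enet_objective(0, beta, X, y, lambda = 1, alpha = 1)  # absolute (L1) penalty

print(c(ols = ols, ridge = ridge, lasso = lasso))
```
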

rm(list = ls())
# Set the working directory to the folder of this document (works only inside RStudio)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
library(ggplot2)
# install.packages('glmnet')
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-4
punt <- read.csv('punting.csv')

summary(punt)
##     Distance          Hang         R_Strength      L_Strength   
##  Min.   :104.9   Min.   :3.020   Min.   :110.0   Min.   :110.0  
##  1st Qu.:140.2   1st Qu.:3.640   1st Qu.:130.0   1st Qu.:130.0  
##  Median :150.2   Median :4.040   Median :150.0   Median :150.0  
##  Mean   :148.2   Mean   :3.921   Mean   :147.7   Mean   :143.8  
##  3rd Qu.:163.5   3rd Qu.:4.180   3rd Qu.:170.0   3rd Qu.:160.0  
##  Max.   :192.0   Max.   :4.750   Max.   :180.0   Max.   :180.0  
##  R_Flexibility    L_Flexibility      O_Strength   
##  Min.   : 85.00   Min.   : 78.00   Min.   :130.2  
##  1st Qu.: 90.00   1st Qu.: 86.00   1st Qu.:153.9  
##  Median : 93.00   Median : 93.00   Median :197.1  
##  Mean   : 95.69   Mean   : 91.23   Mean   :196.2  
##  3rd Qu.:103.00   3rd Qu.: 94.00   3rd Qu.:240.6  
##  Max.   :108.00   Max.   :106.00   Max.   :266.6
names(punt)
## [1] "Distance"      "Hang"          "R_Strength"    "L_Strength"   
## [5] "R_Flexibility" "L_Flexibility" "O_Strength"

Set up functions

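# Set the seed once so the cross-validation results can be reproduced.
# The seed is not reset between calls, so repeating a call draws new folds.
set.seed(8)

# Fit and summarise a standard least-squares model for the chosen response
week3_lm <- function(output='Distance'){
      punt_lm <- lm(data=punt, get(output,punt) ~ R_Strength + L_Strength + R_Flexibility + L_Flexibility + O_Strength)
      cat('\n\n*** Standard lm for ',output)
      print(summary(punt_lm))
      plot(punt_lm, which = 1, pch='🏈',sub=paste('standard least squares lm of ',output))
}


# Fit glmnet for the chosen response: a = 0 is Ridge, a = 1 is Lasso,
# and values in between give the elastic net
week3 <- function (a=1, output='Distance'){
      # Predictor matrix without the intercept column, and the response vector
      X <- model.matrix(data=punt, get(output,punt) ~ R_Strength + L_Strength + R_Flexibility + L_Flexibility + O_Strength)[,-1]
      Y <- get(output,punt)
      
      plot(glmnet(X,Y,alpha=a),sub=paste('Function of ',output, ', alpha= ',a))
      punt_cv <- cv.glmnet(X,Y,alpha=a, grouped = FALSE)
      plot(punt_cv,sub=paste('k-fold function of ',output, ', alpha= ',a))
      best_lambda <- punt_cv$lambda.min   # stored for reference; not used below
      cat('\n\n*** Coefficients for model with ', output, 'as repsonse variable and alpha = ', a ,'\n')
      # NOTE: coef() on a cv.glmnet object selects lambda through `s`, not `lambda`.
      # The `lambda=` argument is ignored here, so the printed coefficients are those at
      # the default s = "lambda.1se"; use s = "lambda.min" for the minimum-CV-error fit.
      print(coef(punt_cv,lambda=punt_cv$lambda.min))
      # Average CV error over the whole lambda path; min(punt_cv$cvm) is the error at lambda.min
      cat('\nMean of MSE with alpha = ', a, ' is: ' ,mean(punt_cv$cvm))
}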
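
With only 13 observations, the random fold assignment in `cv.glmnet` can change the results noticeably from one call to the next. One possible way to remove that randomness is leave-one-out cross-validation, which puts each observation in its own fold through the `foldid` argument. The sketch below uses simulated data with the same shape as the punting data (the data and the helper `loo_fit` are assumptions for illustration). It also shows selecting the coefficients through the `s` argument of `coef()` and reading the cross-validated error at the chosen lambda.

```r
library(glmnet)

set.seed(8)
n <- 13; p <- 5                       # same shape as the punting data, but simulated
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
Y <- drop(X %*% c(2, 0, 1, 0, 0.5)) + rnorm(n)

loo_fit <- function(a) {
  # One fold per observation, so there is no random fold assignment
  cv.glmnet(X, Y, alpha = a, foldid = seq_len(n), grouped = FALSE)
}

cv1 <- loo_fit(1)                     # Lasso
cv2 <- loo_fit(1)                     # repeated call gives the same result

print(coef(cv1, s = "lambda.min"))    # coefficients at the lambda with minimum CV error
print(coef(cv1, s = "lambda.1se"))    # default choice: a more heavily penalised fit
print(min(cv1$cvm))                   # cross-validated MSE at lambda.min
```

Reporting `min(cv1$cvm)` gives the error of the model that would actually be selected, whereas averaging `cvm` mixes in the error of every lambda on the path.
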
# Distance as response variable
week3_lm() 
## 
## 
## *** Standard lm for  Distance
## Call:
## lm(formula = get(output, punt) ~ R_Strength + L_Strength + R_Flexibility + 
##     L_Flexibility + O_Strength, data = punt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.3829  -9.5711  -0.2166   5.4988  20.0188 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -29.58047   65.70042  -0.450    0.666
## R_Strength      0.27877    0.45638   0.611    0.561
## L_Strength      0.06971    0.48388   0.144    0.890
## R_Flexibility   1.24146    1.44927   0.857    0.420
## L_Flexibility  -0.39535    0.74472  -0.531    0.612
## O_Strength      0.22369    0.13053   1.714    0.130
## 
## Residual standard error: 14.65 on 7 degrees of freedom
## Multiple R-squared:  0.8144, Adjusted R-squared:  0.6818 
## F-statistic: 6.142 on 5 and 7 DF,  p-value: 0.01694

week3(0)

## 
## 
## *** Coefficients for model with  Distance as repsonse variable and alpha =  0 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)   23.63545797
## R_Strength     0.17131996
## L_Strength     0.14701238
## R_Flexibility  0.49525762
## L_Flexibility  0.12927353
## O_Strength     0.09665104
## 
## Mean of MSE with alpha =  0  is:  469.6738
week3(1)

## 
## 
## *** Coefficients for model with  Distance as repsonse variable and alpha =  1 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                       s1
## (Intercept)   23.3327076
## R_Strength     0.2498696
## L_Strength     .        
## R_Flexibility  0.6174439
## L_Flexibility  .        
## O_Strength     0.1473687
## 
## Mean of MSE with alpha =  1  is:  429.1399
# now Hang
week3_lm('Hang')
## 
## 
## *** Standard lm for  Hang
## Call:
## lm(formula = get(output, punt) ~ R_Strength + L_Strength + R_Flexibility + 
##     L_Flexibility + O_Strength, data = punt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35802 -0.05761  0.02460  0.05586  0.31529 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    0.6449613  0.9856374   0.654   0.5338  
## R_Strength     0.0011040  0.0068466   0.161   0.8764  
## L_Strength     0.0121468  0.0072592   1.673   0.1382  
## R_Flexibility -0.0002985  0.0217420  -0.014   0.9894  
## L_Flexibility  0.0069159  0.0111723   0.619   0.5555  
## O_Strength     0.0038897  0.0019581   1.986   0.0873 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2198 on 7 degrees of freedom
## Multiple R-squared:  0.8821, Adjusted R-squared:  0.7979 
## F-statistic: 10.47 on 5 and 7 DF,  p-value: 0.003782

week3(0,'Hang')

## 
## 
## *** Coefficients for model with  Hang as repsonse variable and alpha =  0 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)   1.196343186
## R_Strength    0.003418564
## L_Strength    0.003773457
## R_Flexibility 0.009525175
## L_Flexibility 0.004903473
## O_Strength    0.001620386
## 
## Mean of MSE with alpha =  0  is:  0.1581544
week3(1,'Hang')

## 
## 
## *** Coefficients for model with  Hang as repsonse variable and alpha =  1 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                         s1
## (Intercept)   1.278409e+00
## R_Strength    2.977796e-05
## L_Strength    9.537705e-03
## R_Flexibility 7.831749e-03
## L_Flexibility .           
## O_Strength    2.632997e-03
## 
## Mean of MSE with alpha =  1  is:  0.07776399
# something in the middle: elastic net
week3(0.5,'Hang')

## 
## 
## *** Coefficients for model with  Hang as repsonse variable and alpha =  0.5 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)   1.563684131
## R_Strength    0.002638934
## L_Strength    0.004788556
## R_Flexibility 0.010104490
## L_Flexibility .          
## O_Strength    0.001588263
## 
## Mean of MSE with alpha =  0.5  is:  0.09111192
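# Distance as response variable (second run of the same calls; the lm output is unchanged,
# while the glmnet results differ, presumably because the CV folds are redrawn without re-seeding)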
week3_lm() 
## 
## 
## *** Standard lm for  Distance
## Call:
## lm(formula = get(output, punt) ~ R_Strength + L_Strength + R_Flexibility + 
##     L_Flexibility + O_Strength, data = punt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.3829  -9.5711  -0.2166   5.4988  20.0188 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -29.58047   65.70042  -0.450    0.666
## R_Strength      0.27877    0.45638   0.611    0.561
## L_Strength      0.06971    0.48388   0.144    0.890
## R_Flexibility   1.24146    1.44927   0.857    0.420
## L_Flexibility  -0.39535    0.74472  -0.531    0.612
## O_Strength      0.22369    0.13053   1.714    0.130
## 
## Residual standard error: 14.65 on 7 degrees of freedom
## Multiple R-squared:  0.8144, Adjusted R-squared:  0.6818 
## F-statistic: 6.142 on 5 and 7 DF,  p-value: 0.01694

week3(0)

## 
## 
## *** Coefficients for model with  Distance as repsonse variable and alpha =  0 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)   31.25399197
## R_Strength     0.15932951
## L_Strength     0.13839581
## R_Flexibility  0.46148911
## L_Flexibility  0.13278278
## O_Strength     0.08800146
## 
## Mean of MSE with alpha =  0  is:  464.363
week3(1)

## 
## 
## *** Coefficients for model with  Distance as repsonse variable and alpha =  1 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                       s1
## (Intercept)   18.1282781
## R_Strength     0.2624381
## L_Strength     .        
## R_Flexibility  0.6386499
## L_Flexibility  .        
## O_Strength     0.1540913
## 
## Mean of MSE with alpha =  1  is:  440.2299
# now Hang
week3_lm('Hang')
## 
## 
## *** Standard lm for  Hang
## Call:
## lm(formula = get(output, punt) ~ R_Strength + L_Strength + R_Flexibility + 
##     L_Flexibility + O_Strength, data = punt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35802 -0.05761  0.02460  0.05586  0.31529 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    0.6449613  0.9856374   0.654   0.5338  
## R_Strength     0.0011040  0.0068466   0.161   0.8764  
## L_Strength     0.0121468  0.0072592   1.673   0.1382  
## R_Flexibility -0.0002985  0.0217420  -0.014   0.9894  
## L_Flexibility  0.0069159  0.0111723   0.619   0.5555  
## O_Strength     0.0038897  0.0019581   1.986   0.0873 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2198 on 7 degrees of freedom
## Multiple R-squared:  0.8821, Adjusted R-squared:  0.7979 
## F-statistic: 10.47 on 5 and 7 DF,  p-value: 0.003782

week3(0,'Hang')

## 
## 
## *** Coefficients for model with  Hang as repsonse variable and alpha =  0 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)   1.274348721
## R_Strength    0.003317280
## L_Strength    0.003638719
## R_Flexibility 0.009281820
## L_Flexibility 0.004807993
## O_Strength    0.001560915
## 
## Mean of MSE with alpha =  0  is:  0.1526909
week3(1,'Hang')

## 
## 
## *** Coefficients for model with  Hang as repsonse variable and alpha =  1 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                         s1
## (Intercept)   1.278409e+00
## R_Strength    2.977796e-05
## L_Strength    9.537705e-03
## R_Flexibility 7.831749e-03
## L_Flexibility .           
## O_Strength    2.632997e-03
## 
## Mean of MSE with alpha =  1  is:  0.09410517
week3(0.5,'Hang')

## 
## 
## *** Coefficients for model with  Hang as repsonse variable and alpha =  0.5 
## 6 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)   1.372475483
## R_Strength    0.002750878
## L_Strength    0.005302828
## R_Flexibility 0.010664087
## L_Flexibility .          
## O_Strength    0.001828595
## 
## Mean of MSE with alpha =  0.5  is:  0.08429597

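The model for Hang fits better than the model for Distance: under standard least squares, R-squared is 0.88 for Hang against 0.81 for Distance. Ridge and Lasso improve on the standard least-squares model. The Ridge (alpha = 0) and Lasso (alpha = 1) fits do not differ much; however, the Lasso shrinks some coefficients exactly to zero (L_Strength and L_Flexibility for Distance, L_Flexibility for Hang), dropping those predictors from the model.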
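
The output above reports a cross-validated MSE only for the glmnet fits, so a like-for-like comparison with standard least squares needs a cross-validated error for `lm` as well. A minimal sketch, assuming leave-one-out cross-validation is an acceptable yardstick: for least squares, the leave-one-out residual of observation i equals its ordinary residual divided by (1 - h_ii), where h_ii is the leverage, so no refitting is needed. The data below are simulated stand-ins, not the punting data.

```r
set.seed(8)
n <- 13
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))   # simulated stand-in data
dat$y <- 1 + 2 * dat$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = dat)

# Shortcut: leave-one-out residual = e_i / (1 - h_ii) for least squares
loo_mse_shortcut <- mean((residuals(fit) / (1 - hatvalues(fit)))^2)

# Brute-force check: refit n times, each time predicting the held-out row
loo_mse_loop <- mean(sapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x1 + x2, data = dat[-i, ])
  (dat$y[i] - predict(fit_i, newdata = dat[i, ]))^2
}))

print(c(shortcut = loo_mse_shortcut, loop = loo_mse_loop))
```

Applied to the punting models, this value could be set beside the leave-one-out error of the Ridge and Lasso fits to check whether the penalised models really predict better.
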