Data Description

The lineman data comes from the NFL combine, which is essentially a tryout where college players try to convince NFL teams to draft them.

The players are asked for their height and weight (weight), and then they can participate in different events. The most common event is the 40-yard dash (dash40). We want to see if there is an association between weight and dash40.

There is an association between weight and dash40, but it doesn’t appear to be a constant, linear association.

What we can do is create a ‘broken-stick’ style model:

\[\hat{y} = b_0 + b_1x + b_2(x - k)^+\]

where \((x - k)^+\) is the difference between \(x\) and \(k\) if it is positive and 0 otherwise.

For example, if \(k = 5\) and \(x = 7\), then \((7 - 5)^+ = 2\), but if \(x = 3\) then \((3 - 5)^+ = 0\)

The code chunk below shows how to fit a broken-stick model with \(k = 300\):
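A minimal sketch of such a chunk, assuming the data sit in a data frame named lineman with columns weight and dash40:

library(tidyverse)
library(broom)

# Create the (x - k)^+ term with k = 300, then fit the model
lineman_b <- lineman |>
  mutate(bend = (weight - 300) * (weight > 300))

bend_fit <- lm(dash40 ~ weight + bend, data = lineman_b)
tidy(bend_fit)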

## # A tibble: 3 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)  2.39     0.0994       24.0  1.55e-107
## 2 weight       0.00925  0.000346     26.7  5.20e-128
## 3 bend        -0.00375  0.000637     -5.89 4.75e-  9

And we can see the model that it fits:
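One way to draw that picture, assuming the lineman_b and bend_fit objects from the sketch above (and that no rows were dropped for missing values):

ggplot(lineman_b, aes(x = weight, y = dash40)) +
  geom_point(alpha = 0.3) +
  geom_line(aes(y = fitted(bend_fit)), color = "red")

The fitted line is straight up to \(k = 300\) and then bends to a shallower slope, since the bend coefficient is negative.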

The goal of this homework is to find the best ‘bending’ point.

Question 1: Creating the needed function

Write a function named bend_lm() with one argument, k, that has a default value of 0. The function should take the lineman data set, fit the broken-stick model using the specified value of k, and return two separate objects (see the sketch after this list):

  1. model: the fitted linear model
  2. sse: the sum of the squared residuals
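A minimal sketch of such a function, again assuming the lineman data frame from above:

bend_lm <- function(k = 0) {
  d <- lineman
  d$bend <- (d$weight - k) * (d$weight > k)   # the (x - k)^+ term
  model <- lm(dash40 ~ weight + bend, data = d)
  list(model = model,
       sse = sum(resid(model)^2))             # sum of the squared residuals
}

Note that with the default k = 0, bend equals weight for every player, so lm() drops the redundant term and reports NA for its coefficient; that is exactly what the first check below should show.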

The code chunk below will check if you’ve done it correctly:

## # A tibble: 3 × 2
##   term        estimate
##   <chr>          <dbl>
## 1 (Intercept)   2.86  
## 2 weight        0.0076
## 3 bend         NA
## # A tibble: 3 × 2
##   term        estimate
##   <chr>          <dbl>
## 1 (Intercept)   2.70  
## 2 weight        0.0081
## 3 bend         -0.0007
## # A tibble: 3 × 2
##   term        estimate
##   <chr>          <dbl>
## 1 (Intercept)   2.77  
## 2 weight        0.0079
## 3 bend         -0.0141

Question 3: Gradient descent

Gradient descent description

A quicker alternative to a grid search is gradient descent, which uses the value of the derivative to find a better guess of the bend point than the current one.

Gradient descent works by updating the current value (\(k_0\)) to a new value (\(k_1\)). The formula to update the value of the bend point is:

\[k_1 = k_0 - \alpha \times f'(k_0)\]

where \(\alpha\) is some predetermined value (we'll use \(\alpha = 10^{-4}\), matching the setup code in Part 3b) and \(f'(k_0)\) is the value of the derivative evaluated at \(k_0\) (the derivative with the current value plugged in).

For example, let’s try to find the minimum of \(f(x) = x^2\). Let’s say our current value of \(x\) is \(x_0 = 0.5\) and the derivative is \(f'(x) = 2x\). Using \(\alpha = 0.1\), we can find a better guess by:

\[x_1 = x_0 - \alpha \times 2x_0 = 0.5 - (0.1)(2)(0.5) = 0.4\]

which is closer to the actual minimum of 0. We'd repeat this process until the value of the objective function changes by only a very small amount. Our stopping criterion will be:

\[\left|\frac{f(x_1) - f(x_0)}{f(x_0)} \right| < c\]

where \(c\) is a number chosen beforehand. If we choose \(c = 0.001\), we'd check whether to stop by computing:

\[\left|\frac{f(0.4) - f(0.5)}{f(0.5)}\right| = \left|\frac{0.4^2 - 0.5^2}{0.5^2}\right| = 0.36 > 0.001\]

Since \(0.36 > 0.001\), we keep going: we replace \(x_0\) with \(x_1\) and compute the next guess, \(x_2\):

\[x_2 = 0.4 - 0.1(2)(0.4) = 0.32\]
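Here is that toy loop in R. One caveat: for \(f(x) = x^2\) with a fixed \(\alpha = 0.1\), the relative change in \(f\) is exactly 0.36 on every step, so the stopping rule never triggers and this sketch caps the number of iterations instead:

x <- 0.5       # current guess x_0
alpha <- 0.1
c <- 0.001

for (i in 1:100) {
  f_old <- x^2
  x <- x - alpha * 2 * x                      # update: x - alpha * f'(x)
  if (abs((x^2 - f_old) / f_old) < c) break   # stopping criterion
}
x  # essentially 0, the true minimum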

Part 3a) Function of the derivative

Create a function for the derivative that takes a value of \(k\) and returns the value of the derivative evaluated at that \(k\).

The derivative of the objective function with respect to \(k\) is:

\[f'(k) = b_2 \sum(x - k)^+ - n_k b_2 (\bar{y}_k - b_0 - b_1 \bar{x}_k) \]

where:

  • \(n_k\) is the number of players with weight above the specified \(k\)

  • \(\bar{y}_k\) is the average 40-yard dash time of players with weight above \(k\)

  • \(\bar{x}_k\) is the average weight of players with weight above \(k\)

and \(b_0\), \(b_1\), \(b_2\) are the coefficients from the fitted model. You can get these values from bend_lm(k = ...)$model$coef
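For example, the coefficients of the \(k = 300\) fit from earlier:

bend_lm(k = 300)$model$coef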

## (Intercept)      weight        bend 
##    2.386428 0.009247701 -0.003750159

The function should:

  1. Fit the model for the specified value of k
  2. Calculate the derivative (specified above)
  • Hint: It helps to break the derivative up into different pieces
  3. Return the value of the derivative (it doesn't need to be in a list)

Call the function bend_lm_der(k).

To make this a little easier (a full sketch follows these steps):

  1. Calculate a vector of \((x - k)^+\) by using (x - k)*(x > k)

  2. Fit the model and get the coefficients using bend_lm(k)$model$coef

  3. Calculate \(n_k\), \(\bar{y}_k = \sum(y*(x > k))/n_k\), and \(\bar{x}_k = \sum(x - k)^+ / n_k\)

  4. Calculate the value of the derivative. This can be easier if you break it up into two or more pieces.

  5. Return the derivative
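Putting those steps together, a minimal sketch (it follows the hint's formula \(\bar{x}_k = \sum(x - k)^+ / n_k\) and assumes the lineman data frame and the bend_lm() function from Question 1):

bend_lm_der <- function(k) {
  x <- lineman$weight
  y <- lineman$dash40

  b <- bend_lm(k)$model$coef        # b[1] = b0, b[2] = b1, b[3] = b2
  xk_plus <- (x - k) * (x > k)      # vector of (x - k)^+

  n_k <- sum(x > k)                 # players with weight above k
  ybar_k <- sum(y * (x > k)) / n_k
  xbar_k <- sum(xk_plus) / n_k

  # The derivative from above, split into two pieces
  b[3] * sum(xk_plus) - n_k * b[3] * (ybar_k - b[1] - b[2] * xbar_k)
}

Because b[3] carries the name bend, the returned value does too, which is why the check below prints a value labeled bend.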

The code chunk below will check if you created the function correctly

##      bend 
## -42.68175

Part 3b) Performing Gradient Descent

Write the code to perform gradient descent below; a sketch of one possible loop follows the setup chunk. For each loop, it should:

  1. update the number of iterations (number of loops). Call it iters. On the 5th loop, iters = 5, on the 10th loop, iters = 10, etc…

  2. calculate the value of the objective function with the current k, bend_lm(k = k)$sse (call it of_curr)

  3. calculate the value of the derivative using the current value of k (call it gradient)

  4. update the value of k using gradient descent

  5. calculate the value of the objective function using the new value of k (call it of_new)

You’ll be using alpha = 1e-4.

# objects need to perform gradient descent
iters <- 0 # Number of iterations for gradient descent
k <- 300   # Initial value of the bend point

# Finding the value of the objective function with the current k
of_new <- bend_lm(k = k)$sse

of_curr <- 1   # Needed to get the loop started
alpha <- 1e-4  # how much the value of k changes based on the derivative
c <- 1e-8      # how much the derivative needs to change to stop the algorithm
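One way the loop itself might look, following steps 1 through 5 above (a sketch, not the only valid answer):

while (abs((of_new - of_curr) / of_curr) > c) {
  iters <- iters + 1                # 1) update the iteration count
  of_curr <- bend_lm(k = k)$sse     # 2) objective function at the current k
  gradient <- bend_lm_der(k)        # 3) derivative at the current k
  k <- k - alpha * gradient         # 4) gradient descent update
  of_new <- bend_lm(k = k)$sse      # 5) objective function at the new k
}

k  # the final bend point (it picks up the name "bend" from the gradient)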
##    bend 
## 320.904

The code chunk below will check that the results match what is in Brightspace:

##     k.bend        sse iterations 
##    320.904     43.249   8876.000

While the results aren’t 100% the same between Question 2 (the grid search) and Question 3, they are very similar!