The `lineman` data comes from the NFL combine, which is essentially the tryout where college players try to get NFL teams to draft them. Each player's height and weight are recorded (`weight`), and then the players can participate in different events. The most common event is the 40-yard dash (`dash40`).
We want to see if there is an association between `weight` and `dash40`. There is an association, but it doesn't appear to be a constant, linear association.
What we can do is create a ‘broken-stick’ style model:
\[\hat{y} = b_0 + b_1x + b_2(x - k)^+\]
where \((x - k)^+\) is the difference between \(x\) and \(k\) if it is positive and 0 otherwise.
For example, if \(k = 5\) and \(x = 7\), then \((7 - 5)^+ = 2\), but if \(x = 3\), then \((3 - 5)^+ = 0\).
The code chunk below shows how to fit a broken-stick model with \(k = 300\):
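One way the chunk could look, assuming the `lineman` data frame is loaded and the `dplyr` and `broom` packages are available:

```r
library(dplyr)
library(broom)

# Add the (x - k)^+ term with k = 300, then fit the linear model
lineman_bent <- lineman |>
  mutate(bend = (weight - 300) * (weight > 300))

fit <- lm(dash40 ~ weight + bend, data = lineman_bent)
tidy(fit)
```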
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.39 0.0994 24.0 1.55e-107
## 2 weight 0.00925 0.000346 26.7 5.20e-128
## 3 bend -0.00375 0.000637 -5.89 4.75e- 9
And we can see the model that it fits:
The goal of this homework is to find the best ‘bending’ point.
Write a function named `bend_lm()` with one argument, `k`, which has a default value of 0. The function should take the `lineman` data set, fit the broken-stick model using the specified value of `k`, and return two separate objects:

- `model`: the fitted linear model
- `sse`: the sum of squared residuals

## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.39 0.0994 24.0 1.55e-107
## 2 weight 0.00925 0.000346 26.7 5.20e-128
## 3 bend -0.00375 0.000637 -5.89 4.75e- 9
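One possible implementation, assuming the `lineman` data frame as above:

```r
# Fit the broken-stick model at a given bend point k and return
# both the fitted model and the sum of squared residuals
bend_lm <- function(k = 0) {
  bend  <- (lineman$weight - k) * (lineman$weight > k)  # (x - k)^+
  model <- lm(dash40 ~ weight + bend, data = lineman)
  list(
    model = model,
    sse   = sum(resid(model)^2)
  )
}
```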
The code chunk below will check if you’ve done it correctly:
## # A tibble: 3 × 2
## term estimate
## <chr> <dbl>
## 1 (Intercept) 2.86
## 2 weight 0.0076
## 3 bend NA
## # A tibble: 3 × 2
## term estimate
## <chr> <dbl>
## 1 (Intercept) 2.70
## 2 weight 0.0081
## 3 bend -0.0007
## # A tibble: 3 × 2
## term estimate
## <chr> <dbl>
## 1 (Intercept) 2.77
## 2 weight 0.0079
## 3 bend -0.0141
Create a data frame named `bend_df` that has two columns:

- `k` = 275 to 350 by increments of 0.01. These will be the different values of `k` to check where the model should be 'bent'.
- `sse` = a column of 0s. These values will be written over during the grid search.

## # A tibble: 7,501 × 2
## k sse
## <dbl> <dbl>
## 1 275 0
## 2 275. 0
## 3 275. 0
## 4 275. 0
## 5 275. 0
## 6 275. 0
## 7 275. 0
## 8 275. 0
## 9 275. 0
## 10 275. 0
## # ℹ 7,491 more rows
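One way to build it, assuming the `tibble` package is loaded:

```r
# Grid of candidate bend points with a placeholder sse column
bend_df <- tibble(
  k   = seq(275, 350, by = 0.01),
  sse = 0
)
bend_df
```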
Note: Start by using an increment of 1 instead of 0.01 until you get part 2b working properly, then change the increment to the specified 0.01.
Use the function you created in question 1 to perform the grid search, searching for `k` across the sequence specified in part 2a). Write the grid search in the code chunk below:
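A sketch of the loop, assuming `bend_lm()` and `bend_df` from above:

```r
# Grid search: fill in the sse for every candidate bend point
for (i in seq_len(nrow(bend_df))) {
  bend_df$sse[i] <- bend_lm(k = bend_df$k[i])$sse
}
bend_df
```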
## # A tibble: 7,501 × 2
## k sse
## <dbl> <dbl>
## 1 275 45.1
## 2 275. 45.1
## 3 275. 45.1
## 4 275. 45.1
## 5 275. 45.1
## 6 275. 45.1
## 7 275. 45.1
## 8 275. 45.1
## 9 275. 45.1
## 10 275. 45.1
## # ℹ 7,491 more rows
Now find the row of `bend_df` that has the lowest value of `sse`:
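One way to do it, assuming `dplyr` is loaded:

```r
# The ten candidate bend points with the smallest sse
bend_df |>
  arrange(sse) |>
  slice_head(n = 10)
```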
## k sse
## 1 322.00 43.24199
## 2 321.99 43.24203
## 3 321.98 43.24206
## 4 322.01 43.24207
## 5 321.97 43.24210
## 6 321.96 43.24214
## 7 322.02 43.24214
## 8 321.95 43.24218
## 9 321.94 43.24222
## 10 322.03 43.24222
`sse` vs `k`

Create a line graph with the different values of `k` on the x-axis and `sse` on the y-axis. Does it appear that the grid search was 'wide' enough?
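A sketch with ggplot2, assuming `bend_df` now holds the grid-search results:

```r
library(ggplot2)

# sse across the grid of candidate bend points
ggplot(bend_df, aes(x = k, y = sse)) +
  geom_line()
```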
An alternative to a grid search is gradient descent, which uses the value of the derivative to find a better guess of the bend point than the current one. When tuned well, it is much quicker than a grid search.
Gradient descent works by repeatedly updating the current value (\(k_0\)) into a new value (\(k_1\)). The formula to update the value of the bend point is:
\[k_1 = k_0 - \alpha \times f'(k_0)\]
where \(\alpha\) is some predetermined value (we'll use \(10^{-4}\) for this assignment) and \(f'(k_0)\) is the value of the derivative evaluated at \(k_0\) (the derivative with the current value plugged in).
For example, let’s try to find the minimum of \(f(x) = x^2\). Let’s say our current value of \(x\) is \(x_0 = 0.5\) and the derivative is \(f'(x) = 2x\). Using \(\alpha = 0.1\), we can find a better guess by:
\[x_1 = x_0 - \alpha \times 2x_0 = 0.5 - (0.1)(2)(0.5) = 0.4\]
which is closer to the actual minimum of 0. We’d repeat this process until the value of the objective function changes by a very small amount. Our stopping criteria will be:
\[\left|\frac{f(x_1) - f(x_0)}{f(x_0)} \right| < c\]
where \(c\) is a number chosen beforehand. If we choose \(c = 0.001\), we'd check whether to stop by computing

\[\left|\frac{f(0.4) - f(0.5)}{f(0.5)}\right| = \left|\frac{0.4^2 - 0.5^2}{0.5^2}\right| = 0.36 > 0.001\]
Since \(0.36 > 0.001\), we keep going: we replace \(x_0\) with \(x_1\) and apply the update again to get our next guess, \(x_2\):
\[x_2 = 0.4 - 0.1(2)(0.4) = 0.32\]
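The first couple of updates, written out in R as a toy illustration:

```r
# Gradient descent on f(x) = x^2, whose derivative is f'(x) = 2 * x
alpha <- 0.1
x0 <- 0.5
x1 <- x0 - alpha * 2 * x0  # 0.4
abs((x1^2 - x0^2) / x0^2)  # 0.36, above c = 0.001, so keep going
x2 <- x1 - alpha * 2 * x1  # 0.32
```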
Create a function for the derivative that takes a value of \(k\) and returns the derivative evaluated at \(k\).
The derivative of the objective function with respect to \(k\) is:
\[f'(k) = b_2 \sum(x - k)^+ - n_k b_2 (\bar{y}_k - b_0 - b_1 \bar{x}_k) \]
where:
\[n_k = \text{the number of players with weight above the specified k}\]
\[\bar{y}_k = \text{the average 40 yard dash time of players with weight above k}\]
\[\bar{x}_k = \text{the average weight of players with weight above k}\]
and \(b_0\), \(b_1\), \(b_2\) are the coefficients from the fitted model. You can get these values from `bend_lm(k = ...)$model$coef`:
## (Intercept)      weight        bend 
## 2.386428000 0.009247701 -0.003750159
The function should be named `bend_lm_der()` and take `k` as its argument. To make this a little easier:

- Calculate a vector of \((x - k)^+\) by using `(x - k)*(x > k)`
- Fit the model and get the coefficients using `bend_lm(k)$model$coef`
- Calculate \(n_k\), \(\bar{y}_k = \sum(y*(x > k))/n_k\), and \(\bar{x}_k = \sum(x - k)^+ / n_k\)
- Calculate the value of the derivative. This can be easier if you break it up into two or more pieces.
- Return the derivative (see the sketch after this list)
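A sketch following these hints, assuming `x` is `weight` and `y` is `dash40` from the `lineman` data:

```r
# Derivative of the objective function (sse) with respect to the bend point k
bend_lm_der <- function(k) {
  x <- lineman$weight
  y <- lineman$dash40

  xk <- (x - k) * (x > k)      # (x - k)^+
  b  <- bend_lm(k)$model$coef  # b0, b1, b2

  n_k  <- sum(x > k)           # number of players with weight above k
  ybar <- sum(y * (x > k)) / n_k
  xbar <- sum(xk) / n_k        # as defined in the hint above

  # f'(k) = b2 * sum((x - k)^+) - n_k * b2 * (ybar_k - b0 - b1 * xbar_k)
  b[3] * sum(xk) - n_k * b[3] * (ybar - b[1] - b[2] * xbar)
}
```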
The code chunk below will check if you created the function correctly:
## bend
## -42.68175
Write the code to perform gradient descent below. For each loop, it should:

- update the number of iterations (number of loops). Call it `iters`. On the 5th loop, `iters = 5`; on the 10th loop, `iters = 10`; etc.
- calculate the value of the objective function with the current `k`, `bend_lm(k = k)$sse` (call it `of_curr`)
- calculate the value of the derivative using the current value of `k` (call it `gradient`)
- update the value of `k` using gradient descent
- calculate the value of the objective function using the new value of `k` (call it `of_new`)

You'll be using `alpha = 1e-4`. A sketch of the loop follows the starter objects below.
```r
# Objects needed to perform gradient descent
iters <- 0    # number of iterations for gradient descent
k     <- 300  # initial value of the bend point

# Finding the value of the objective function with the current k
of_new  <- bend_lm(k = k)$sse
of_curr <- 1  # needed to get the loop started

alpha <- 1e-4 # how much the value of k changes based on the derivative
c     <- 1e-8 # how small the relative change in the objective must be to stop
```
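A sketch of the loop itself, using the stopping rule from the example above:

```r
# Repeat until the objective function changes by a relatively small amount
while (abs((of_new - of_curr) / of_curr) > c) {
  iters    <- iters + 1
  of_curr  <- bend_lm(k = k)$sse    # objective at the current k
  gradient <- bend_lm_der(k)        # derivative at the current k
  k        <- k - alpha * gradient  # gradient descent update
  of_new   <- bend_lm(k = k)$sse    # objective at the new k
}
k
```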
## bend
## 320.904
The code chunk below will check the results and that they match what is in Brightspace:
## k.bend sse iterations
## 320.904 43.249 8876.000
While the results aren’t 100% the same between question 2 and question 3, they are very similar!