head(lineman)
##              name weight dash40
## 1    Oday Aboushi    308   5.45
## 2      Zach Allen    332   5.43
## 3   Ezekial Ansah    271   4.63
## 4 Terron Armstead    306   4.71
## 5       Jeff Baca    302   5.03
## 6    Alvin Bailey    312   4.95
The lineman data comes from the NFL combine, which is essentially a tryout where college players showcase their skills in the hopes that an NFL team will draft them. The players report their height and weight (weight) and can then participate in different events. The most common event is the 40 yard dash (dash40). We want to see if there is an association between weight and dash40.
gg_dash40_weight <-
  ggplot(
    data = lineman,
    mapping = aes(
      x = weight,
      y = dash40
    )
  ) +
  geom_point() +
  labs(
    x = 'Weight (lbs)',
    y = '40 yard dash (seconds)'
  ) +
  theme_bw()
gg_dash40_weight +
  geom_smooth(
    color = 'red',
    method = 'loess',
    se = FALSE,
    formula = y ~ x
  )
There is an association between weight and
dash40, but it doesn’t appear to be a constant, linear
association.
What we can do is create a ‘broken-stick’ style model:
\[\hat{y} = b_0 + b_1x + b_2(x - k)^+\]
where \((x - k)^+\) is the difference between \(x\) and \(k\) if it is positive and 0 otherwise.
For example, if \(k = 5\) and \(x = 7\), then \((7 - 5)^+ = 2\), but if \(x = 3\) then \((3 - 5)^+ = 0\).
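In R, the positive part \((x - k)^+\) can be computed with the built-in pmax() function; a quick check of the two examples above:

# (x - k)^+ for x = 7 and x = 3, with k = 5
pmax(c(7, 3) - 5, 0)

## [1] 2 0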
The code chunk below shows how to fit a broken-stick model with \(k = 300\):
example_lm <-
  lineman |>
  # Adding a column for (x - k)^+
  # rowwise() makes mutate() compute max() row by row instead of over the whole column
  rowwise() |>
  mutate(
    bend = max(0, weight - 300)
  ) |>
  # Undoing rowwise()
  ungroup() |>
  # Fitting the model:
  lm(formula = dash40 ~ weight + bend)
# Displaying the results of the linear model
tidy(example_lm)
## # A tibble: 3 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)  2.39     0.0994       24.0  1.55e-107
## 2 weight       0.00925  0.000346     26.7  5.20e-128
## 3 bend        -0.00375  0.000637     -5.89 4.75e-  9
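One way to read this table: below the bend the estimated dash time increases by about \(b_1 \approx 0.00925\) seconds per pound, while above 300 lbs the slope becomes \(b_1 + b_2 \approx 0.0055\) seconds per pound. A quick sketch to compute the slope above the bend:

# Estimated slope above the bend point: b1 + b2
coef(example_lm)['weight'] + coef(example_lm)['bend']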
And we can plot the model that it fits:
gg_dash40_weight +
  geom_line(
    data = lineman |>
      mutate(
        # Adding the predicted time
        time_hat = example_lm$fitted.values
      ),
    mapping = aes(y = time_hat),
    color = 'red',
    linewidth = 1
  ) +
  theme_bw()
The goal of this homework is to find the best ‘bending’ point.
Write a function named bend_lm() with 3 arguments:

- y = the response variable
- x = the explanatory variable
- k = the 'bend' in the explanatory variable

It should return two separate objects:

- model: the fitted linear model from lm()
- sse: the sum of squared residuals from lm(), calculated using $residuals

bend_lm <- function(x, y, k = 0){
# Placing x & y in the same data frame to use in the lm function
func_df <-
data.frame(y = y, x = x) |>
# Calculating (x - k)+
rowwise() |>
mutate(bend = max(0, x - k)) |>
ungroup()
func_lm <-
# Fitting the model:
lm(formula = y ~ x + bend, data = func_df)
# Calculating the SSE
sse <- sum((func_lm$residuals)^2)
return(
list(
'model' = func_lm,
'sse' = sse
)
)
}
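As an aside, the rowwise()/ungroup() pair isn't strictly necessary: pmax() is vectorized, so \((x - k)^+\) can be computed in one shot. A minimal equivalent sketch (bend_lm2 is just an illustrative name, not part of the assignment):

bend_lm2 <- function(x, y, k = 0){
  # pmax() takes the element-wise max of (x - k) and 0
  func_df <- data.frame(y = y, x = x, bend = pmax(x - k, 0))
  func_lm <- lm(y ~ x + bend, data = func_df)
  list('model' = func_lm, 'sse' = sum(func_lm$residuals^2))
}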
The code chunks below will test out your function. If it works correctly, the output should match what is in Brightspace.
# Testing the function. Should return the same results as in Brightspace
tidy(
  bend_lm(
    y = iris$Petal.Length,
    x = iris$Petal.Width,
    k = 2
  )$model
)
## # A tibble: 3 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    0.926    0.0654     14.2  2.90e-29
## 2 x              2.46     0.0529     46.4  1.06e-89
## 3 bend          -2.74     0.361      -7.60 3.20e-12
# Testing the function. Should return the same results as in Brightspace
tidy(
  bend_lm(
    y = MASS::cats$Hwt,
    x = MASS::cats$Bwt - mean(MASS::cats$Bwt)
  )$model
)
## # A tibble: 3 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    10.3      0.228      45.2  9.89e-86
## 2 x               3.05     0.603       5.05 1.32e- 6
## 3 bend            1.71     0.953       1.80 7.42e- 2
Conduct a grid search for the value of \(k\) in the model:
\[\hat{y} = b_0 + b_1x + b_2(x - k)^+\]
over the values from 275 to 350 in increments of 0.01.
Save the results in a data frame named bend_df with two
columns named k and sse.
Write the grid search in the code chunk below.
I recommend searching from 275 to 350 by increments of 1 until you get the code to run. Then change it from 1 to 0.01.
bend_df <-
  data.frame(
    k = seq(275, 350, 0.01),
    # Placeholder value; it gets overwritten inside the loop
    sse = -1
  )

for (i in 1:nrow(bend_df)){
  bend_df[i, 'sse'] <-
    bend_lm(
      x = lineman$weight,
      y = lineman$dash40,
      k = bend_df$k[i]
    )$sse
}
as_tibble(bend_df)
## # A tibble: 7,501 × 2
##        k   sse
##    <dbl> <dbl>
##  1  275   45.1
##  2  275.  45.1
##  3  275.  45.1
##  4  275.  45.1
##  5  275.  45.1
##  6  275.  45.1
##  7  275.  45.1
##  8  275.  45.1
##  9  275.  45.1
## 10  275.  45.1
## # ℹ 7,491 more rows
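As an aside, the same grid search can be written without an explicit for loop; a sketch using base R's sapply():

# Equivalent grid search: apply bend_lm() to each candidate k
bend_df$sse <-
  sapply(
    bend_df$k,
    function(k) bend_lm(x = lineman$weight, y = lineman$dash40, k = k)$sse
  )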
If done correctly, the code chunk below should create the same graph seen in Brightspace.
ggplot(
  data = bend_df,
  mapping = aes(
    x = k,
    y = sse
  )
) +
  # Adding the line for the function
  geom_line(
    linewidth = 1
  ) +
  # Adding a vertical line at the k that minimizes sse
  geom_vline(
    color = 'red',
    linetype = 'dashed',
    xintercept = 322,
    linewidth = 1
  ) +
  labs(x = 'Bend Point') +
  theme_bw()
What should the chosen value of \(k\) be?
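One way to answer this directly from bend_df is to pull the row with the smallest SSE, e.g. with dplyr's slice_min(); a minimal sketch:

# The grid value of k with the smallest SSE
bend_df |>
  slice_min(sse, n = 1)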
A quicker alternative to a grid search, especially when the model has many parameters, is gradient descent. It uses the value of the derivative to find a better guess of the bend point than the current one. When tuned well, it is much quicker than a grid search.
Gradient descent works by updating the current value (\(k_0\)) into a new value (\(k_1\)). The formula to update the value of the bend point is:
\[k_1 = k_0 - \alpha \times f'(k_0)\]
where \(\alpha\) is some predetermined value (we'll use 0.0001) and \(f'(k_0)\) is the value of the derivative evaluated at \(k_0\) (the derivative with the current value plugged in).
For example, let’s try to find the minimum of \(f(x) = x^2\). Let’s say our current value of \(x\) is \(x_0 = 0.5\) and the derivative is \(f'(x) = 2x\). Using \(\alpha = 0.1\), we can find a better guess by:
\[x_1 = x_0 - \alpha \times 2x_0 = 0.5 - (0.1)(2)(0.5) = 0.4\]
which is closer to the actual minimum of 0. We'd repeat this process until the value of the objective function changes by only a very small amount. Our stopping criterion will be:
\[\left|\frac{f(x_1) - f(x_0)}{f(x_0)} \right| < c\]
where \(c\) is a number chosen beforehand. If we choose \(c = 0.001\), we'd check whether to stop by computing
\[\left|\frac{f(0.4) - f(0.5)}{f(0.5)}\right| = \left|\frac{0.4^2 - 0.5^2}{0.5^2}\right| = 0.36\]
Since \(0.36 > 0.001\), we keep going: we replace \(x_0\) with \(x_1\) and update to get our next guess, \(x_2\):
\[x_2 = 0.4 - 0.1(2)(0.4) = 0.32\]
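As a quick check, the two updates above can be reproduced in a few lines of R:

# Gradient descent on f(x) = x^2, where f'(x) = 2 * x
alpha <- 0.1
x0 <- 0.5
x1 <- x0 - alpha * 2 * x0   # 0.4
x2 <- x1 - alpha * 2 * x1   # 0.32

# Stopping check after the first update: 0.36, which is > c = 0.001
abs((x1^2 - x0^2) / x0^2)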
Create a function of the derivative: it should take a value of \(k\) and return the value of the derivative evaluated at that \(k\).
The derivative of the objective function with respect to \(k\) is:
\[f'(k) = -b_2 n_k\left[ \bar{y}_k - b_0 - (b_1 + 1) \bar{x}_k \right]\]
where:
\[n_k = \text{the number of players with weight above the specified k}\]
\[\bar{y}_k = \text{the average 40 yard dash time of players with weight above k}\]
\[\bar{x}_k = \text{the average weight of players with weight above k} = \frac{1}{n_k}\sum(x-k)^+\]
and \(b_0\), \(b_1\), \(b_2\) are the coefficients from the fitted
model. You can get these values from
bend_lm(...)$model$coef
##  (Intercept)            x         bend 
##     2.386428  0.009247701 -0.003750159
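For example, the three coefficients can be unpacked like so (a small sketch; coefs, b0, b1, and b2 are my own object names):

coefs <- bend_lm(y = lineman$dash40, x = lineman$weight, k = 300)$model$coef
b0 <- coefs[1]  # intercept
b1 <- coefs[2]  # slope on x
b2 <- coefs[3]  # coefficient on the bend term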
The function should:

- Fit the model for the specified value of \(k\)
- Calculate the derivative (specified above)
- Call the function bend_lm_der(y, x, k)

To make this a little easier:

- Calculate a vector of \((x - k)^+\) by using (x - k)*(x > k) (see the quick demo after this list)
- Fit the model and get the coefficients using bend_lm(k)$model$coef
- Calculate \(n_k\), \(\bar{y}_k = \sum(y*(x > k))/n_k\), and \(\bar{x}_k = \sum(x - k)^+ / n_k\)
- Calculate the value of the derivative. This can be easier if you break it up into two or more pieces.
- Return the derivative
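The (x - k)*(x > k) trick works because R coerces TRUE/FALSE to 1/0; a quick demo with throwaway values:

x <- c(3, 7)
k <- 5
(x - k) * (x > k)

## [1] 0 2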
bend_lm_der <- function(y, x, k){
  # Step 1: Calculating (x - k)^+
  x_k_pos <- (x - k) * (x > k)

  # Step 2: Getting the model coefficients for the specified value of k
  bend_coef <- bend_lm(y = y, x = x, k = k)$model$coef

  # Step 3: Getting n_k, x_bar_k, and y_bar_k
  nk <- sum(x > k)
  x_bar_k <- sum(x_k_pos) / nk
  y_bar_k <- sum(y * (x > k)) / nk

  ## Step 4: Calculating the derivative by breaking it into pieces
  # Part 1: -b2 * n_k
  part1 <- -1 * nk * bend_coef[3]

  # Part 2: the bracketed piece, y_bar_k - b0 - (b1 + 1) * x_bar_k
  part2 <- y_bar_k - bend_coef[1] - (bend_coef[2] + 1) * x_bar_k

  # Returning the derivative: part1 * part2
  return(part1 * part2)
}
The code chunks below will check if you created the function correctly.
# Check 1: lineman data
bend_lm_der(
  x = lineman$weight,
  y = lineman$dash40,
  k = 300
)

##      bend 
## -42.68175

# Check 2: iris data
bend_lm_der(
  x = iris$Petal.Length,
  y = iris$Petal.Width,
  k = 2
)

##     bend 
## 26.60027

# Check 3: cats data
bend_lm_der(
  x = MASS::cats$Bwt,
  y = MASS::cats$Hwt,
  k = 3
)

##     bend 
## -563.769
Write the code to perform gradient descent below. For each loop, it should:

- update the number of iterations (number of loops); call it iters. On the 5th loop, iters = 5; on the 10th loop, iters = 10; etc.
- calculate the value of the objective function with the current k, bend_lm(y = ..., x = ..., k = k)$sse (call it of_curr)
- calculate the value of the derivative using the current value of k (call it gradient)
- update the value of k using gradient descent
- calculate the value of the objective function using the new value of k (call it of_new)

You'll be using alpha = 1e-4.
Set up gradient descent in the code chunk below using the following initial values:

- k = 300
- alpha = 0.0001
- c = \(10^{-8}\)
# Objects needed to perform gradient descent
iters <- 0    # Number of iterations for gradient descent
k     <- 300  # Initial value of the bend point

# Finding the value of the objective function at the current k
of_new <- bend_lm(y = lineman$dash40,
                  x = lineman$weight,
                  k = k)$sse

of_curr <- 1    # Needed to get the loop started
alpha   <- 1e-4 # How much the value of k changes based on the derivative
c       <- 1e-8 # Relative change in the objective function needed to stop the algorithm
When writing the code below for gradient descent, use c = 1e-5 until you get it working. Then change it to c = 1e-8.
while (abs(of_new - of_curr) / of_curr > c) {
  # Step 1) Update iters
  iters <- iters + 1

  # Step 2) Calculate the value of the objective function at the current k
  of_curr <-
    bend_lm(
      y = lineman$dash40,
      x = lineman$weight,
      k = k
    )$sse

  # Step 3) Calculate the value of the derivative at the current k
  gradient <-
    bend_lm_der(
      y = lineman$dash40,
      x = lineman$weight,
      k = k
    )

  # Step 4) Update the value of k using gradient descent
  k <- k - alpha * gradient

  # Step 5) Calculate the objective function at the new k
  of_new <-
    bend_lm(
      y = lineman$dash40,
      x = lineman$weight,
      k = k
    )$sse
}
The code chunk below will check that the results match what is in Brightspace.
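(The check chunk itself isn't shown here; a minimal sketch that would produce output in this format, using the object names from above, is below. The k.bend name appears because k inherits the 'bend' name from the coefficient vector during the update step.)

# A sketch of the check (assumed; the original chunk is hidden)
round(c(k = k, sse = of_new, iterations = iters), 3)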
##  k.bend     sse iterations 
## 320.904  43.249   8876.000
While the results aren’t 100% the same between question 2 and question 3, they are very similar!