Set up
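
A minimal setup sketch, assuming the tidyverse is used (the data frame previews later in this assignment are tibbles) and that the data file is saved as field goals.csv in your working directory:

library(tidyverse)

# Read in the field goal data (file name taken from the data description below)
fg <- read_csv("field goals.csv")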

Logistic regression

For our example of objective functions, we looked at simple linear regression since it is a very common machine learning method. Another common machine learning method is logistic regression, which attempts to estimate the probability of success of a binary (categorical with two outcomes) variable.

While we won’t be looking at logistic regression in this class, we’ll use it as our example for this homework assignment!

The logistic regression model is:

\[\log\left(\frac{p}{1-p}\right) = a + bx\]

where \(\log()\) is the natural log, \(a\) is the intercept, and \(b\) is the slope (like with linear regression!).

Note: The logistic regression model is shown above for reference only; you won’t actually be using it directly in this assignment.
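
Just to illustrate the formula (you won’t need this for the assignment), solving the model for \(p\) gives \(p = e^{a + bx}/(1 + e^{a + bx})\). A tiny sketch with made-up values of \(a\), \(b\), and \(x\):

# Illustration only: turn the log-odds a + b*x into a probability
a <- 6.25      # made-up intercept
b <- -0.1      # made-up slope
x <- 40        # made-up distance
log_odds <- a + b * x
exp(log_odds) / (1 + exp(log_odds))  # about 0.90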

Data description

The field goals.csv data set contains the 4303 NFL field goal attempts from the 2020 - 2023 seasons. It has 2 columns:

  1. distance: The distance the kicker was from the goal

  2. result: A dummy variable indicating whether the kick was successful (1) or missed (0)

Our goal is to make a logistic regression model where we estimate the probability a field goal attempt is successful based on the distance of the attempt.

Question 1: Logistic regression objective function

Write a function named logit_of that calculates the objective function for logistic regression. The objective function is:

\[h(a, b) = \sum\left( \log\left(1 + e^{a + bx} \right) - y(a + bx) \right)\]

The function will need 4 arguments:

  1. x: A vector of the explanatory variable (predictor)
  2. y: A vector of 1s and 0s representing the response variable
  3. a: The chosen value of the intercept (default to 6.25 for later purposes)
  4. b: The chosen value of the slope (default to 0)

How to calculate \(e^x\) in R: exp(x)
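
One possible sketch of logit_of, assuming the sum above is computed directly (your own version may look different):

# Objective function: h(a, b) = sum( log(1 + e^(a + bx)) - y*(a + bx) )
logit_of <- function(x, y, a = 6.25, b = 0){
  eta <- a + b * x                         # linear predictor for each kick
  return(sum(log(1 + exp(eta)) - y * eta))
}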

Run the code chunks below and check that the results match what is in Brightspace

# a = 6.25, b = 0
logit_of(x = fg$distance, y = fg$result)
## [1] 3958.299
# a = 0, b = 0
logit_of(x = fg$distance, y = fg$result, a = 0)
## [1] 2982.612
# a = 6.25, b = -0.1
logit_of(x = fg$distance, y = fg$result, b = -0.1)
## [1] 1556.397

Question 2: Grid search for the slope

For question 2, you’ll perform a grid search to find the best value of the slope, \(b\), while keeping the intercept fixed at \(a = 6.25\).

Part 2a: Data frame to save the results

Create a data frame named logit_search that has 2 columns:

  1. b_val: The different values of \(b\) to be searched across. Start at -1, end at 1, and change by increments of 0.0001

  2. of_val: The value of the objective function for the corresponding version of \(b\)

Note: While you’ll want to search over the range of -1 to +1 in increments of 0.0001 for your solutions, start just by searching over -1 to +1 by increments of 0.01 until you get your loop in question 2b working.

## # A tibble: 20,001 × 2
##     b_val of_val
##     <dbl>  <dbl>
##  1 -1         -1
##  2 -1.00      -1
##  3 -1.00      -1
##  4 -1.00      -1
##  5 -1.00      -1
##  6 -1.00      -1
##  7 -0.999     -1
##  8 -0.999     -1
##  9 -0.999     -1
## 10 -0.999     -1
## # ℹ 19,991 more rows
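
One way to create logit_search, assuming a tibble (to match the preview above) and a placeholder of_val of -1 that gets overwritten in part 2b:

# Candidate slopes from -1 to 1 with a placeholder objective function value
logit_search <- tibble(
  b_val  = seq(from = -1, to = 1, by = 0.0001),  # use by = 0.01 while testing
  of_val = -1                                    # placeholder, filled in part 2b
)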

Part 2b: Grid search for the values of \(b\)

Using the data frame created in part 2a, conduct a grid search for the logistic regression slope.
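
A sketch of one way to write the grid search loop, assuming the logit_search data frame and the logit_of() function from earlier:

# Evaluate the objective function at every candidate slope, keeping a = 6.25
for (i in 1:nrow(logit_search)) {
  logit_search$of_val[i] <- logit_of(x = fg$distance,
                                     y = fg$result,
                                     a = 6.25,
                                     b = logit_search$b_val[i])
}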

Use the code chunk below to check that it worked by comparing the results to what is in Brightspace

##      b_val    of_val
## 1  -0.6341 64361.326
## 2  -0.4769 42725.551
## 3  -0.4286 36090.728
## 4   0.1461  8312.914
## 5   0.2940 12729.430
## 6   0.3426 14180.722
## 7   0.7321 25811.970
## 8   0.8750 30079.250
## 9   0.9205 31437.971
## 10  0.9334 31823.191

Question 3: Gradient Descent

Gradient descent description

A quicker alternative to a grid search is gradient descent. It uses the value of the derivative to find a better guess of the slope than the current one; when tuned well, it is much faster than a grid search.

Gradient descent works by updating the current value (\(b_0\)) to a new value (\(b_1\)). The formula to update the value of the slope is:

\[b_1 = b_0 - \alpha \times f'(b_0)\]

where \(\alpha\) is some predetermined value (we’ll use 0.000001) and \(f'(b_0)\) is the value of the derivative evaluated at \(b_0\) (the derivative with the current value plugged in).

For example, let’s try to find the minimum of \(f(x) = x^2\). Let’s say our current value of \(x\) is \(x_0 = 0.5\) and the derivative is \(f'(x) = 2x\). Using \(\alpha = 0.1\), we can find a better guess by:

\[x_1 = x_0 - \alpha \times 2x_0 = 0.5 - (0.1)(2)(0.5) = 0.4\]

which is closer to the actual minimum at \(x = 0\). We’d repeat this process until the value of the objective function changes by only a very small amount. Our stopping criterion will be:

\[\left|\frac{f(x_1) - f(x_0)}{f(x_0)} \right| < c\]

where \(c\) is a number chosen beforehand. If we choose \(c = 0.001\), we’d check whether to stop by computing

\[\left|\frac{f(0.4) - f(0.5)}{f(0.5)}\right| = \left|\frac{0.4^2 - 0.5^2}{0.5^2}\right| = 0.36\]

Since \(0.36 > 0.001\), we keep going: the new value \(x_1 = 0.4\) becomes the current value, and we update the guess again:

\[x_2 = 0.4 - 0.1(2)(0.4) = 0.32\]

We keep going until the stopping condition is met.
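
A short sketch of the first two updates of this toy example in R, just to make the arithmetic concrete (this is not the loop you’ll write for the field goal data):

# Toy example: f(x) = x^2, f'(x) = 2x, alpha = 0.1, c = 0.001
alpha <- 0.1
x0 <- 0.5
x1 <- x0 - alpha * (2 * x0)        # 0.4
abs((x1^2 - x0^2) / x0^2)          # 0.36, which is > 0.001, so keep going
x2 <- x1 - alpha * (2 * x1)        # 0.32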

The code chunk below creates the function logit_der(), which calculates the derivative that you’ll use to answer this question:

# Derivative of the objective function with respect to the slope b
logit_der <- function(x, y, a = 6.25, b){
  return(-1 * sum(x * y - (x * exp(a + b * x))/(1 + exp(a + b * x))))
}

# Using the function:
logit_der(x = fg$distance, y = fg$result, a = 6.25, b = 0.5)
## [1] 29862

Writing the gradient descent code

Write the code to perform gradient descent below. For each loop, it should

  • update the number of iterations (number of loops). Call it iters. On the 5th loop, iters = 5, on the 10th loop, iters = 10, etc…

  • calculate the value of the objective function with the current slope (call it of_curr)

  • calculate the value of the derivative (call it gradient)

  • update the value of the slope using gradient descent

  • calculate the new value of the objective function (call it of_new)

You’ll be using alpha = 1e-6 and keeping the intercept the same as in question 2 (\(a = 6.25\)).

# objects needed to perform gradient descent
iters <- 0 # Number of iterations for gradient descent
b <- 0.5   # Initial value of the slope

# Finding the value of the objective function with the current b
of_new <- logit_of(x = fg$distance, y = fg$result, a = 6.25, b = b)

of_curr <- 1  # Needed to get the loop started
alpha <- 1e-6 # how much the value of the slope changes based on the derivative
c <- 1e-5     # relative change in the objective function needed to stop the algorithm

Perform gradient descent in the code chunk below.
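
One possible sketch of the gradient descent loop, using the objects set up above and the relative-change stopping rule from the description (your loop may be structured differently):

# Repeat until the objective function changes by less than c (relative change)
while (abs((of_new - of_curr) / of_curr) >= c) {
  iters    <- iters + 1                                                   # count the iteration
  of_curr  <- logit_of(x = fg$distance, y = fg$result, a = 6.25, b = b)   # current objective
  gradient <- logit_der(x = fg$distance, y = fg$result, a = 6.25, b = b)  # derivative at b
  b        <- b - alpha * gradient                                        # gradient descent update
  of_new   <- logit_of(x = fg$distance, y = fg$result, a = 6.25, b = b)   # new objective
}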

The code chunk below will check that the results match what is in Brightspace

##              slope objective function         iterations 
##            -0.1053          1543.0000            23.0000