For our example of objective functions, we looked at simple linear regression since it is a very common machine learning method. Another common machine learning method is logistic regression, which attempts to estimate the probability of success of a binary (categorical with two outcomes) variable.
While we won’t be looking at logistic regression in this class, we’ll use it as our example for this homework assignment!
The logistic regression model is:
\[\log\left(\frac{p}{1-p}\right) = a + bx\]
where \(\log()\) is the natural log, \(a\) is the intercept, and \(b\) is the slope (like with linear regression!).
Note: The logistic regression model is shown above only so you can see what it looks like; you won’t actually be using it in this assignment.
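For background only, solving the model for \(p\) gives \(p = \frac{e^{a + bx}}{1 + e^{a + bx}}\). A quick R sketch of that conversion, using made-up values of \(a\), \(b\), and \(x\) (not estimates from the data):

# Converting the log-odds a + bx into a probability (illustration values only)
a <- 6.25
b <- -0.1
x <- 40      # a hypothetical 40-yard attempt
exp(a + b * x) / (1 + exp(a + b * x))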
The field goals.csv data set contains the 4303 NFL field goal attempts from the 2020 - 2023 seasons. It has 2 columns:
distance: The distance the kicker was from the goal
result: A dummy variable indicating whether the kick was successful (1) or missed (0)
Our goal is to make a logistic regression model where we estimate the probability a field goal attempt is successful based on the distance of the attempt.
Write a function for the objective function for logistic regression named logit_of. The objective function is:
\[h(a, b) = \sum\left( \log\left(1 + e^{a + bx} \right) - y(a + bx) \right)\]
The function will need 4 arguments: x, y, a (with a default of 6.25), and b (with a default of 0).
How to calculate \(e^x\) in R: exp(x)
logit_of <- 
  function(x, y, a = 6.25, b = 0){
    # Calculating a + bx first
    linear_comp <- a + b * x
    
    # The unsummed vector of the OF values
    of_vec <- log(1 + exp(linear_comp)) - y * linear_comp
    
    # Returning the OF value
    return(sum(of_vec))
  }
Run the code chunks below and see if they match what is in Brightspace
# a = 6.25, b = 0
logit_of(x = fg$distance, y = fg$result)
## [1] 3958.299
# a = 0, b = 0
logit_of(x = fg$distance, y = fg$result, a = 0)
## [1] 2982.612
# a = 6.25, b = -0.1
logit_of(x = fg$distance, y = fg$result, b = -0.1)
## [1] 1556.397
For question 2, you’ll perform a grid search to find the best value of the slope, \(b\), when we keep the intercept the same at \(a = 6.25\)
Create a data frame named logit_search that has 2 columns:
b_val: The different values of \(b\) to be searched across. Start at -1, end at 1, and change by increments of 0.0001
of_val: The value of the objective function for the corresponding version of \(b\)
Note: While you’ll want to search over the range of -1 to +1 in increments of 0.0001 for your solutions, start just by searching over -1 to +1 by increments of 0.01 until you get your loop in question 2b working.
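While getting your loop in question 2b working, a coarser version of the grid (a sketch; the name logit_search_test is just for illustration) keeps each test run fast:

# Coarser test grid: steps of 0.01 give 201 rows instead of 20,001
logit_search_test <- 
  data.frame(
    b_val = seq(-1, 1, by = 0.01),
    of_val = -1
  )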
logit_search <- 
  data.frame(
    b_val = seq(-1, 1, by = 0.0001),
    of_val = -1
  )
tibble(logit_search)
## # A tibble: 20,001 × 2
## b_val of_val
## <dbl> <dbl>
## 1 -1 -1
## 2 -1.00 -1
## 3 -1.00 -1
## 4 -1.00 -1
## 5 -1.00 -1
## 6 -1.00 -1
## 7 -0.999 -1
## 8 -0.999 -1
## 9 -0.999 -1
## 10 -0.999 -1
## # ℹ 19,991 more rows
Conduct a grid search for logistic regression using the data frame created in part 2a.
for (i in 1:nrow(logit_search)){
  # Looping through the different values of the slope and finding the OF val
  of_loop <- 
    logit_of(x = fg$distance, y = fg$result, a = 6.25, b = logit_search[i, 1])
  
  # Saving the results of the ith iteration in the ith row and second column
  # of the logit_search data frame
  logit_search[i, 2] <- of_loop
}
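As a side note, the same column can be filled without an explicit loop; a loop-free sketch (assuming the same fg data and the logit_of() function from question 1) is:

# Vectorized alternative: evaluate logit_of() at every candidate slope
logit_search$of_val <- 
  sapply(logit_search$b_val,
         function(b) logit_of(x = fg$distance, y = fg$result, a = 6.25, b = b))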
Use the code chunk below to check that it worked by looking at the results in Brightspace
RNGversion("4.1.0"); set.seed(2870)
logit_search |>
slice_sample(n = 10) |>
arrange(b_val)
## b_val of_val
## 1 -0.6341 64361.326
## 2 -0.4769 42725.551
## 3 -0.4286 36090.728
## 4 0.1461 8312.914
## 5 0.2940 12729.430
## 6 0.3426 14180.722
## 7 0.7321 25811.970
## 8 0.8750 30079.250
## 9 0.9205 31437.971
## 10 0.9334 31823.191
To ensure that we searched over enough values of the slope to find a true minimum, we graph the results. Create a line graph with the values of \(b\) on the x-axis and the values of the objective function on the y-axis. Add a vertical red line (geom_vline()) at the minimum and label it with the value of the slope.
ggplot(
  data = logit_search,
  mapping = aes(
    x = b_val,
    y = of_val
  )
) + 
  geom_line() + 
  
  # Adding the vertical red line at the minimum
  geom_vline(
    data = logit_search |> slice_min(of_val, n = 1),
    mapping = aes(xintercept = b_val),
    color = "red",
    linetype = "dashed"
  ) + 
  
  # Adding the value of the slope that minimizes the OF
  geom_text(
    data = logit_search |> slice_min(of_val, n = 1),
    mapping = aes(label = paste("slope =", b_val)),
    y = 30000,
    nudge_x = 0.2,
    color = "red"
  ) + 
  
  # Changing the x and y-axis labels
  labs(
    x = "Slope",
    y = "Objective Function"
  ) + 
  
  # Changing the theme
  theme_bw()
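To read the answer off directly instead of from the graph, the minimizing row can also be pulled out of logit_search (an optional check, not part of the question):

# The slope (and OF value) at the grid-search minimum
logit_search |> 
  slice_min(of_val, n = 1)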
An alternative to a grid search is gradient descent. It uses the value of the derivative to help find a better guess of the slope than the current one and, when run well, it is much quicker than a grid search.
Gradient descent works by repeatedly updating the current value (\(b_0\)) into a new value (\(b_1\)). The formula to update the value of the slope is:
\[b_1 = b_0 - \alpha \times f'(b_0)\]
where \(\alpha\) is some predetermined value (we’ll use 0.000001) and \(f'(b_0)\) is the value of the derivative evaluated at \(b_0\) (the derivative with the current value plugged in).
For example, let’s try to find the minimum of \(f(x) = x^2\). Let’s say our current value of \(x\) is \(x_0 = 0.5\) and the derivative is \(f'(x) = 2x\). Using \(\alpha = 0.1\), we can find a better guess by:
\[x_1 = x_0 - \alpha \times 2x_0 = 0.5 - (0.1)(2)(0.5) = 0.4\]
which is closer to the actual minimum of 0. We’d repeat this process until the value of the objective function changes by a very small amount. Our stopping criteria will be:
\[\left|\frac{f(x_1) - f(x_0)}{f(x_0)} \right| < c\]
where \(c\) is a number chosen beforehand. If we choose c = 0.001, we’d check whether to stop by computing
\[\left|\frac{f(0.4) - f(0.5)}{f(0.5)}\right| = \left|\frac{0.4^2 - 0.5^2}{0.5^2}\right| = 0.36\]
Since \(0.36 > 0.001\), we keep going and repeat the update, now replacing \(x_0\) with \(x_1\):
\[x_2 = x_1 - \alpha \times 2x_1 = 0.4 - (0.1)(2)(0.4) = 0.32\]
We keep going until the stopping condition is met.
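To see the update rule in action, here is a minimal R sketch of the \(f(x) = x^2\) example (illustration only; the object names here are ours, not the ones required in question 3):

# Repeating the update x1 = x0 - alpha * f'(x0) with f'(x) = 2x
x <- 0.5
alpha <- 0.1
for (i in 1:5){
  x <- x - alpha * 2 * x   # gradient descent update
  print(x)                 # 0.4, 0.32, 0.256, ... marching toward the minimum at 0
}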
The code chunk below will create the function called logit_der() to calculate the derivative that you’ll use to answer question 3:
logit_der <- function(x, y, a = 6.25, b){
  return(-1 * sum(x * y - (x * exp(a + b * x)) / (1 + exp(a + b * x))))
}
# Using the function:
logit_der(x = fg$distance, y = fg$result, a = 6.25, b = 0.5)
## [1] 29862
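As an optional sanity check (a sketch that assumes the grid-search results from question 2 are still in logit_search), the derivative should be close to zero at the slope that minimized the objective function:

# The derivative evaluated at the grid-search minimum should be near 0
best_b <- logit_search |> slice_min(of_val, n = 1) |> pull(b_val)
logit_der(x = fg$distance, y = fg$result, a = 6.25, b = best_b)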
Write the code to perform gradient descent below. For each loop, it should:
update the number of iterations (number of loops), called iters. On the 5th loop, iters = 5; on the 10th loop, iters = 10; etc.
calculate the value of the objective function with the current slope (call it of_curr)
calculate the value of the derivative (call it gradient)
update the value of the slope using gradient descent
calculate the new value of the objective function (call it of_new)
You’ll be using alpha = 1e-6 and keep the intercept the same as in question 2 (\(a = 6.25\)).
# objects needed to perform gradient descent
iters <- 0    # Number of iterations for gradient descent
b <- 0.5      # Initial value of the slope

# Finding the value of the objective function with the current b
of_new <- logit_of(x = fg$distance, y = fg$result, a = 6.25, b = b)
of_curr <- 1  # Needed to get the loop started

alpha <- 1e-6 # how much the value of the slope changes based on the derivative
c <- 1e-5     # how small the relative change in the OF must be to stop the algorithm
With the code chunk below, perform gradient descent
while(abs((of_new - of_curr)/of_curr) > c){
  # 1) Updating the number of iterations
  iters <- iters + 1
  
  # 2) Calculating the value of the objective function with the current slope
  of_curr <- logit_of(x = fg$distance, y = fg$result, a = 6.25, b = b)
  
  # 3) Calculating the value of the derivative
  gradient <- logit_der(x = fg$distance, y = fg$result, a = 6.25, b = b)
  
  # 4) Updating the value of the slope
  b <- b - alpha * gradient
  
  # 5) Finding the objective function for the new value of the slope
  of_new <- logit_of(x = fg$distance, y = fg$result, a = 6.25, b = b)
}
The code chunk below will check that the results match what is in Brightspace.
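One way that check might look (a sketch, assuming the objects b, of_new, and iters created by the loop above):

# Gathering the final slope, objective function value, and iteration count
round(c("slope" = b, "objective function" = of_new, "iterations" = iters), 4)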
## slope objective function iterations
## -0.1053 1543.0000 23.0000