Housekeeping

Today’s plan 📋

  • Review of Log and Natural Log

  • Non-Linear Models

  • Example 1: Old Faithful Eruption Intervals

  • Example 2: Ferrari Acceleration Time

  • Example 3: BMW Fuel Efficiency

In-class Polling (Session ID: bua345s25)

💥 Lecture 22 In-class Exercises - Q1 💥

Review Question from Lecture 19:

If the estimated log odds (Y’) of you making a late payment on your credit card is -0.257, what is the PERCENT CHANCE you will submit a late payment? Round the percentage to one decimal place.

Recall:

  • If estimated log odds = Y’, then probability = exp(Y’)/(1+exp(Y’))
  • Percent = Probability × 100%
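
The two Recall bullets can be sketched in R as follows. This is a minimal illustration, not the official solution; the variable names (`log_odds`, `probability`, `percent`) are chosen here for readability and do not come from the slides.

```r
# Convert an estimated log odds value to a probability and a percent chance.
log_odds <- -0.257                                  # Y' from the review question
probability <- exp(log_odds) / (1 + exp(log_odds))  # formula from the Recall bullets
percent <- round(probability * 100, 1)              # rounded to one decimal place
percent

# plogis() is base R's built-in inverse-logit and returns the same probability.
all.equal(probability, plogis(log_odds))
```

In Excel the same calculation would be typed as `=EXP(-0.257)/(1+EXP(-0.257))`, then multiplied by 100.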

Logs, Natural Logs, Exponential (exp) Functions

  • Why do these matter?

  • Up until now:

    • We have assumed that the relationship between each X (predictor) variable and Y was a straight line when Y is quantitative.
  • OR

    • We used a transformation to convert a curvilinear relationship into a straight-line relationship (MAS 261)

    • Transformations like LN(Y) are common and used effectively in Finance, Accounting, etc.

    • An alternative is to model the data as non-linear or curvilinear.
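
As a rough sketch of the transform and back-transform workflow mentioned above, the R code below fits a straight line to LN(Y) and then converts an estimate back to the original Y units. The data are made up purely for illustration and are not from this lecture; the names `fit_ln`, `pred_y`, `a`, and `b` are also assumptions of this sketch.

```r
# Made-up illustration data (NOT from the lecture): Y grows roughly exponentially in X.
x <- 1:10
y <- 2 * exp(0.3 * x) * c(1.05, 0.97, 1.02, 0.95, 1.04, 0.99, 1.03, 0.96, 1.01, 0.98)

# Transform Y with the natural log, fit a straight line, then back-transform.
fit_ln  <- lm(log(y) ~ x)                                 # log() is the natural log in R
pred_ln <- predict(fit_ln, newdata = data.frame(x = 5))   # estimate on the LN(Y) scale
pred_y  <- unname(exp(pred_ln))                           # back-transform to Y units

# The same fit written directly as a curve, Y = a * exp(b * X).
a <- exp(coef(fit_ln)[["(Intercept)"]])
b <- coef(fit_ln)[["x"]]
curve_y <- a * exp(b * 5)

c(back_transformed = pred_y, curve = curve_y)
```

Both numbers agree, which shows that the curve form simply packages the back-transformation into the model equation.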

Why use non-linear models?

Pros:

  • No transformation and back-transforming of estimates

  • Model fits the data as shown

  • Common Simple Linear Regression models can be done in Excel (or R)

  • R functions can expedite process (next week)

Cons:

  • Requires trial and error (like transformation) to determine model

  • Interpretation must account for non-linear relationship

  • With more than one predictor (Multiple Linear Regression), the model would have to be fit in R

Model for Old Faithful

  • Suppose you are examining thermal energy for a new start-up company.

  • As part of your research, you take a trip to the most famous geyser in the US – Old Faithful

  • The park ranger explains that it has a highly predictable geothermal output

    • AND the duration of each eruption in minutes is related to how long it will be until the next eruption.
  • You decide to fit a model to this relationship based on one month of data.

Scatterplot of Old Faithful Data

  • How do we model this relationship?

  • Is it linear?

  • If not, what is the best model?

    • Relationship appears slightly concave down.
  • How do we interpret the results?

Model for Old Faithful in Yellowstone NP

Trendlines in Excel

  • Adding a trendline in Excel is very quick.

  • The provided worksheets will allow you to compare linear and non-linear options.

  1. Select points and right-click then click ‘Add Trendline’.

  2. Select one of the five trendline options (Linear, Exponential, Logarithmic, Polynomial, Power).

  3. Scroll down to the bottom of the trendline menu and select these two options.
  • Optional: Rewrite equation so intercept is first.
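
For those working in R instead of Excel, the five trendline types can be approximated with `lm()`, as sketched below. Two assumptions to note: the code uses R's built-in `faithful` dataset as a stand-in, which is not the one-month sample used in these slides, so the coefficients will differ from the ones shown in class; and the exponential and power fits are done on the log scale, which is how Excel's trendlines are understood to work.

```r
# Stand-in data: R's built-in `faithful` dataset (eruption length and waiting time,
# both in minutes). It is NOT the one-month sample used in these slides.
x <- faithful$eruptions
y <- faithful$waiting

# Each trendline type corresponds to a least-squares fit that lm() can do.
fits <- list(
  linear      = lm(y ~ x),              # Y = a + b*X
  exponential = lm(log(y) ~ x),         # Y = a * exp(b*X), fit as LN(Y) on X
  logarithmic = lm(y ~ log(x)),         # Y = a + b*LN(X)
  polynomial  = lm(y ~ x + I(x^2)),     # Y = a + b*X + c*X^2 (order 2)
  power       = lm(log(y) ~ log(x))     # Y = a * X^b, fit as LN(Y) on LN(X)
)

# Adjusted R-squared for each fit. For the exponential and power fits this is
# measured on the LN(Y) scale, so compare across models with some care.
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
round(adj_r2, 3)

# Coefficients of the linear fit, intercept first.
coef(fits$linear)
```

Printing `coef()` for any of the other fits gives the values needed to write out that model's equation with the intercept first.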

Linear Model

Exponential Model

Logarithmic Model

Polynomial Model

Power Model

Old Faithful Model Summary

💥 Lecture 22 In-class Exercises - Q2 💥

Based on our data, for an Old Faithful eruption of 2 minutes (X = 2), the average duration until the next eruption is 57 minutes, Y = 57.


Use this average to find the residual for the linear model:

Linear Model: \(Y = 34.347 + 10.537\times X\)


In Excel: = 57 - (34.347 + 10.537*X)

In R:

X <- 2

57 - (34.347 + 10.537*X)


Answer will be decimal minutes, not minutes and seconds.

💥 Lecture 22 In-class Exercises - Q3 💥

Based on our data, for an Old Faithful eruption of 2 minutes (X = 2), the average duration until the next eruption is 57 minutes, Y = 57.


Use this average to find the residual for the power model:

Power Model: \(Y = 39.144 \times X^{0.487}\)


In Excel: = 57 - (39.144*X^(0.487))

In R:

X <- 2

57 - (39.144*X^(0.487))


Answer will be decimal minutes, not minutes and seconds.

💥 Lecture 22 In-class Exercises - Q4 💥

From a tourism point of view for Old Faithful, a prediction error of less than 5 minutes does not matter.


Based on the Adjusted \(R^2\) for the Linear and Power models (Slide 15) and the residuals found in the two previous questions, which model would you choose?


A. Power model because it is more accurate.

B. Linear model because the difference in accuracy is negligible and the linear model is simpler.

Ferrari Acceleration Time

  • For marketing purposes, you want to predict the acceleration time of the new Ferrari

  • You collect data on speed (mph) and acceleration time (seconds) for a number of vehicles

  • You notice the data aren’t linear, but want to fit the model as accurately as possible.

  • Which model provides the best fit?

Some Possible Ferrari Models

More Possible Ferrari Models

Ferrari Model Summary

Thinking Question

Use R or Excel to calculate the estimated time in seconds it takes for the Ferrari to go from 0 to 100 mph (X = 100) for both the Exponential model and the Polynomial Model (shown below).

  • Exponential Model: \(\hat{Y} = 0.9936 \times e^{0.0154X}\)

  • Polynomial Model: \(\hat{Y} = 1.8123 - 0.0165X + 0.0005X^2\)

Select which statement(s) is/are true:

A. The polynomial model estimates a longer time for 0 to 100 mph acceleration than the exponential model.

B. The exponential model estimates a longer time for 0 to 100 mph acceleration than the polynomial model.

C. The two model estimates are within half a second of each other.

D. The two model estimates are within 1 second of each other.
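
One possible way to set up this calculation in R is sketched below. It only evaluates the two equations given in the question at X = 100; the names `exp_est` and `poly_est` are chosen for this sketch and are not from the slides.

```r
X <- 100   # speed in mph

exp_est  <- 0.9936 * exp(0.0154 * X)              # Exponential model
poly_est <- 1.8123 - 0.0165 * X + 0.0005 * X^2    # Polynomial model

c(exponential = exp_est, polynomial = poly_est)   # estimated times in seconds
abs(poly_est - exp_est)                           # gap between the two estimates
```

In Excel the equivalents would be `=0.9936*EXP(0.0154*100)` and `=1.8123-0.0165*100+0.0005*100^2`.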

Conceptual Question

Given that the difference in Adjusted \(R^2\) between the Polynomial and Exponential models for the Ferrari data is negligible, you opt for the model that is easier to explain to someone without any quantitative analytical training.


Which model do you choose and why?


Note this is subjective and the answer depends on discipline.

BMW Fuel Economy

  • As part of a new sales campaign for BMW, you want to model the fuel economy (MPG) of the BMW 430i based on speed.

  • Although BMW sells electric cars, you also have customers that want gas vehicles.

  • You have a small data set examining average fuel economy at 8 different speeds.

  • You notice the data definitely aren’t linear but want to fit the model as accurately as possible.

  • Which model provides the best fit?

BMW Fuel Economy Model Options

💥 Lecture 22 In-class Exercises - Q5 💥

It is clear from the provided plot that a linear model would be inappropriate for these data.

Other than the linear model, which model choice is ALWAYS inappropriate for concave down relationships like the BMW data?

Hint: If you are unsure, examine the trendlines in Excel or the R html file.

A. Exponential

B. Logarithmic

C. Polynomial

D. Power

💥 Lecture 22 In-class Exercises - Q6 💥

Based on the plots and Adjusted \(R^2\) values (see below), which model fits this relationship the best for the BMW fuel economy data?