BUA 345 - Lecture 22
Introduction to Non-Linear Models
Housekeeping
Today’s plan
Review of Log and Natural Log
Non-Linear Models
Example 1: Old Faithful Eruption Intervals
Example 2: Ferrari Acceleration Time
Example 3: BMW Fuel Efficiency
In-class Polling (Session ID: bua345s25)
Lecture 22 In-class Exercises - Q1
Review Question from Lecture 19:
If the estimated log odds (Y’) of you making a late payment on your credit card is -0.257, what is the PERCENT CHANCE you will submit a late payment? Round the percentage to one decimal place.
Recall:
- If estimated log odds = Y’, then probability = exp(Y’)/(1+exp(Y’))
- Percent = Probability × 100%
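The conversion recalled above can be sketched in R as follows. This is a minimal illustration, not the official solution; the helper name `log_odds_to_prob` is an assumption introduced here and does not come from the lecture.

```r
# Convert estimated log odds to a probability, then to a percent.
# `log_odds_to_prob` is a hypothetical helper name used only for illustration.
log_odds_to_prob <- function(log_odds) {
  exp(log_odds) / (1 + exp(log_odds))
}

y_prime <- -0.257                      # estimated log odds from the question
prob    <- log_odds_to_prob(y_prime)   # probability between 0 and 1
pct     <- round(prob * 100, 1)        # percent, rounded to one decimal place
pct
```

Base R's `plogis()` performs the same log-odds-to-probability conversion, so `plogis(y_prime)` should agree with `prob`.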
Logs, Natural Logs, Exponential (exp) Functions
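As a quick refresher, the sketch below shows how these functions behave in R. The values are arbitrary and chosen only for illustration. In R, `log()` is the natural log by default, and `exp()` is its inverse.

```r
# Natural log and exp are inverses of each other.
x <- 5
ln_of_exp <- log(exp(x))   # log() is the natural log (LN in Excel)
exp_of_ln <- exp(log(x))

# Base-10 logs: two equivalent ways to write them in R.
log10_a <- log10(1000)
log10_b <- log(1000, base = 10)

c(ln_of_exp, exp_of_ln, log10_a, log10_b)
```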
Why do these matter?
Up until now:
- We have assumed that the relationship between each X (predictor) variable and Y is a straight line when Y is quantitative.
OR
- We used a transformation to convert a curvilinear relationship into a straight-line relationship (MAS 261).
Transformations like LN(Y) are common and used effectively in Finance, Accounting, etc.
An alternative is to model the data as non-linear or curvilinear.
Why use non-linear models?
Pros:
No transformation and back-transforming of estimates
Model fits the data as shown
Common Simple Linear Regression models can be done in Excel (or R)
R functions can expedite process (next week)
Cons:
Requires trial and error (like transformation) to determine model
Interpretation must account for non-linear relationship
For Multiple Linear Regression, non-linear models would have to be fit in R
Model for Old Faithful
Suppose you are examining thermal energy for a new start-up company.
As part of your research, you take a trip to the most famous geyser in the US – Old Faithful
The park ranger explains that it has a highly predictable geothermal output
- AND the duration of each eruption in minutes is related to how long it will be until the next eruption.
You decide to fit a model to this relationship based on one month of data.
Scatterplot of Old Faithful Data
Model for Old Faithful in Yellowstone NP
Trendlines in Excel
Adding a trendline in Excel is very quick.
The provided worksheets will allow you to compare linear and non-linear options.
Linear Model
Exponential Model
Logarithmic Model
Polynomial Model
Power Model
Old Faithful Model Summary
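The five trendline types compared above can also be fit in R. The sketch below shows one common way to specify each with `lm()`. It uses R's built-in `faithful` data set as a stand-in, which is an assumption: the lecture's one-month data set may differ, so the coefficients need not match the slides. The exponential and power fits here use least squares on the log scale.

```r
# ASSUMPTION: R's built-in `faithful` data stands in for the lecture data.
# eruptions = eruption duration (min); waiting = time until next eruption (min).
x <- faithful$eruptions
y <- faithful$waiting

fits <- list(
  linear      = lm(y ~ x),             # Y = b0 + b1*X
  exponential = lm(log(y) ~ x),        # ln(Y) = ln(a) + b*X
  logarithmic = lm(y ~ log(x)),        # Y = b0 + b1*ln(X)
  polynomial  = lm(y ~ x + I(x^2)),    # Y = b0 + b1*X + b2*X^2
  power       = lm(log(y) ~ log(x))    # ln(Y) = ln(a) + b*ln(X)
)

# Coefficients of each fit.
lapply(fits, coef)

# Back-transform the power fit to the form Y = a * X^b.
a <- unname(exp(coef(fits$power)[1]))
b <- unname(coef(fits$power)[2])
c(a = a, b = b)
```

Note that `summary()` on the exponential and power fits reports fit statistics on the log scale, so they are not directly comparable with those of the other three models without back-transforming.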
Lecture 22 In-class Exercises - Q2
Based on our data, for an Old Faithful eruption of 2 minutes (X = 2), the average duration until the next eruption is 57 minutes (Y = 57).
Use this average to find the residual for the linear model:
Linear Model: \(Y = 34.347 + 10.537\times X\)
In Excel: = 57 - (34.347 + 10.537*X)
In R:
X <- 2
57 - (34.347 + 10.537*X)
Answer will be decimal minutes, not minutes and seconds.
Lecture 22 In-class Exercises - Q3
Based on our data, for an Old Faithful eruption of 2 minutes (X = 2), the average duration until the next eruption is 57 minutes (Y = 57).
Use this average to find the residual for the power model:
Power Model: \(Y = 39.144 \times X^{0.487}\)
In Excel: = 57 - (39.144*X^(0.487))
In R:
X <- 2
57 - (39.144*X^(0.487))
Answer will be decimal minutes, not minutes and seconds.
Lecture 22 In-class Exercises - Q4
From a tourism point of view for Old Faithful, a prediction error of less than 5 minutes does not matter.
Based on the Adjusted \(R^2\) for the Linear and Power models (Slide 15) and the residuals found in the two previous questions, which model would you choose?
A. Power model because it is more accurate.
B. Linear model because the difference in accuracy is negligible and the linear model is simpler.
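One way to weigh the two options is to put both residuals next to the 5-minute tolerance. The sketch below reuses the model equations from the two previous questions. The variable names, including `tolerance`, are assumptions introduced here for illustration.

```r
# Observed average at X = 2 (from the previous two questions).
x_val    <- 2
y_actual <- 57

# Estimates from the two fitted models given in the lecture.
linear_hat <- 34.347 + 10.537 * x_val
power_hat  <- 39.144 * x_val^0.487

# Residual = observed - estimated (decimal minutes).
resid_linear <- y_actual - linear_hat
resid_power  <- y_actual - power_hat

# Hypothetical name for the 5-minute threshold stated in the question.
tolerance <- 5
residuals_both <- c(linear = resid_linear, power = resid_power)
residuals_both
abs(residuals_both) < tolerance   # TRUE where the error is within tolerance
```

This checks a single point (X = 2) only; the Adjusted \(R^2\) values on the summary slide describe fit across the whole data set.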
Ferrari Acceleration Time
Some Possible Ferrari Models
More Possible Ferrari Models
Ferrari Model Summary
Thinking Question
Use R or Excel to calculate the estimated time in seconds it takes for the Ferrari to go from 0 to 100 mph (X = 100) for both the Exponential model and the Polynomial Model (shown below).
Exponential Model:
\(\hat{Y} = 0.9936 \times e^{0.0154X}\)
Polynomial Model:
\(\hat{Y} = 1.8123 - 0.0165X + 0.0005X^2\)
Select which statement(s) is/are true:
A. The polynomial model estimates a longer time for 0 to 100 mph acceleration than the exponential model.
B. The exponential model estimates a longer time for 0 to 100 mph acceleration than the polynomial model.
C. The two model estimates are within half a second of each other.
D. The two model estimates are within 1 second of each other
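A minimal R sketch for the calculation is shown below. The equations come from the question above; the variable names are assumptions chosen for illustration.

```r
# Speed of interest: 0 to 100 mph.
x_mph <- 100

# Estimated time in seconds from each model in the question.
exp_hat  <- 0.9936 * exp(0.0154 * x_mph)
poly_hat <- 1.8123 - 0.0165 * x_mph + 0.0005 * x_mph^2

# Absolute gap between the two estimates, in seconds.
gap <- abs(poly_hat - exp_hat)

c(exponential = exp_hat, polynomial = poly_hat, gap = gap)
```

Comparing `gap` with 0.5 and 1, and `exp_hat` with `poly_hat`, gives what is needed to evaluate each statement.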
Conceptual Question
Given that the difference in Adjusted \(R^2\) between the Polynomial and Exponential models for the Ferrari data is negligible, you opt for the model that is easier to explain to someone without any quantitative analytical training.
Which model do you choose and why?
Note this is subjective and the answer depends on discipline.
BMW Fuel Economy
As part of a new sales campaign for BMW, you want to model the fuel economy (MPG) of the BMW 430i based on speed.
Although BMW sells electric cars, you also have customers that want gas vehicles.
You have a small data set examining average fuel economy at 8 different speeds.
You notice the data definitely aren’t linear but want to fit the model as accurately as possible.
Which model provides the best fit?
BMW Fuel Economy Model Options
Lecture 22 In-class Exercises - Q5
It is clear from the provided plot that a linear model would be inappropriate for these data.
Other than the linear model, which model choice is ALWAYS inappropriate for concave down relationships like the BMW data?
Hint: If you are unsure, examine the trendlines in Excel or the R html file.
A. Exponential
B. Logarithmic
C. Polynomial
D. Power
Lecture 22 In-class Exercises - Q6
Based on the plots and Adjusted \(R^2\) values (see below), which model fits this relationship the best for the BMW fuel economy data?
Key Points from Today
Non-linear models are a useful and flexible alternative to linearizing transformations.
In R, models are specified with the transformations.
Excel is great for comparing multiple trendline options quickly
For better information on model fit and residuals, software such as R is required.
Important to understand for BUA 345:
- How each model option is structured
- How to calculate a regression estimate using each model option
- Why \(R^2\) is inappropriate for comparing these models (use Adjusted \(R^2\) instead): polynomial models include more than one X term (\(X\) and \(X^2\))
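For reference, the general structure of each model option can be written as below. The coefficient symbols \(a\), \(b\), \(b_0\), \(b_1\), \(b_2\) are notation assumed here, chosen to match the fitted equations shown earlier in the lecture. The last line is the standard Adjusted \(R^2\) formula, where \(n\) is the number of observations and \(k\) is the number of X terms in the model.

```latex
\begin{align*}
\text{Linear:}      \quad & \hat{Y} = b_0 + b_1 X \\
\text{Exponential:} \quad & \hat{Y} = a \times e^{bX} \\
\text{Logarithmic:} \quad & \hat{Y} = b_0 + b_1 \ln(X) \\
\text{Polynomial:}  \quad & \hat{Y} = b_0 + b_1 X + b_2 X^2 \\
\text{Power:}       \quad & \hat{Y} = a \times X^{b} \\[6pt]
\text{Adjusted } R^2 \quad & = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
\end{align*}
```

For the polynomial model \(k = 2\), while \(k = 1\) for the other four, which is why the adjustment matters when comparing them.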
Lecture 23 will look at unconstrained optimization using data like the BMW data set.
Including today, there are six lectures and engagement questions remaining.
To submit an Engagement Question or Comment about material from Lecture 22: Submit it by midnight today (day of lecture).