BUA 345 - Lecture 21
Introduction to Non-Linear Models
Housekeeping
NO CLASS on Thursday, April 2nd.
- No Office Hours on Wed., April 1st, or Thu., April 2nd.
Quiz 2 scores and solutions are now posted.
You can also now see your quiz average on Blackboard.
Please look over your quiz.
Blackboard is not perfect
I am happy to go over any questions you have after class, during office hours, or in a meeting by appointment.
Today’s plan
Review of Log and Natural Log
Non-Linear Models
Example 1: Old Faithful Eruption Intervals
Example 2: Ferrari Acceleration Time
Example 3: BMW Fuel Efficiency
Lecture 21 In-class Exercises - Q1
Poll Everywhere - My User Name: penelopepoolereisenbies685
If the estimated log odds (Y’) of you making a late payment on your credit card is -0.257, what is the PERCENT CHANCE you will submit a late payment? Round the percentage to one decimal place.
Recall:
- If estimated log odds = Y’ then probability = exp(Y’)/(1+exp(Y’))
- Percent = Probability × 100%
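The conversion in the Recall bullets can be sketched in R as a small helper. The function name `log_odds_to_percent` and the input values 0 and 1.2 are illustrative assumptions, not part of the exercise.

```r
# Convert estimated log odds (Y') into a percent chance.
# log_odds_to_percent is a hypothetical helper name used only for this sketch.
log_odds_to_percent <- function(log_odds) {
  probability <- exp(log_odds) / (1 + exp(log_odds))
  100 * probability
}

# Illustrative check: log odds of 0 means even odds.
log_odds_to_percent(0)   # 50

# Base R's plogis() computes the same probability from log odds.
all.equal(log_odds_to_percent(1.2), 100 * plogis(1.2))
```

Calling the helper with the log odds from the question, then rounding with `round(..., 1)`, gives the value to enter.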
Logs, Natural Logs, Exponential (exp) Functions
Why do these matter?
Up until now:
- We have assumed that the relationship between each X (predictor) variable and Y was a straight line when Y is quantitative.
OR
- We used a linear transformation to turn a curvilinear relationship into a straight-line relationship (MAS 261).
Transformations like LN(Y) are common and are used effectively in Finance, Accounting, etc.
An alternative is to model the data as non-linear or curvilinear.
Why use non-linear models?
Pros:
No transformation and back-transforming of estimates
Model fits the data as shown
Common Simple Linear Regression models can be done in Excel (or R)
R functions can expedite process
Cons
Requires trial and error (like transformation) to determine model
Interpretation must account for non-linear relationship
For Multiple Linear Regression, this would have to be done in R
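To make "transformation and back-transforming of estimates" concrete, here is a minimal sketch of the transformation approach. It assumes R's built-in `faithful` data set as a stand-in for the lecture data, so the numbers will not match the slides.

```r
# Built-in Old Faithful data (an assumed stand-in for the lecture data set).
data(faithful)

# Fit a straight line to LN(Y): log(waiting) = b0 + b1 * eruptions.
log_fit <- lm(log(waiting) ~ eruptions, data = faithful)

# Estimate on the transformed (log) scale for a 2-minute eruption.
log_estimate <- predict(log_fit, newdata = data.frame(eruptions = 2))

# Back-transform with exp() to return to minutes.
minutes_estimate <- exp(log_estimate)
minutes_estimate
```

The extra `exp()` step is the back-transformation that the first "Pros" bullet says a non-linear trendline equation avoids.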
Model for Old Faithful
Suppose you are examining thermal energy for a new start-up company.
As part of your research, you take a trip to the most famous geyser in the US – Old Faithful
The park ranger explains that it has a highly predictable geothermal output
- AND the duration of each eruption in minutes is related to how long it will be until the next eruption.
You decide to fit a model to this relationship based on one month of data.
Scatterplot of Old Faithful Data
Model for Old Faithful in Yellowstone NP
Trendlines in Excel
Adding a trendline in Excel is very quick.
The provided worksheets will allow you to compare linear and non-linear options.
Linear Model
Exponential Model
Logarithmic Model
Polynomial Model
Power Model
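The five trendline options above can be approximated in R with `lm()`. This is a sketch under two assumptions: it uses R's built-in `faithful` data rather than the lecture data, and the polynomial is second order.

```r
# Built-in Old Faithful data (an assumed stand-in for the lecture data set).
data(faithful)
x <- faithful$eruptions   # eruption duration in minutes
y <- faithful$waiting     # minutes until the next eruption

# One lm() specification per trendline type.
models <- list(
  linear      = lm(y ~ x),            # Y = b0 + b1*X
  exponential = lm(log(y) ~ x),       # Y = a * exp(b*X), fit on the log(Y) scale
  logarithmic = lm(y ~ log(x)),       # Y = b0 + b1*ln(X)
  polynomial  = lm(y ~ x + I(x^2)),   # Y = b0 + b1*X + b2*X^2
  power       = lm(log(y) ~ log(x))   # Y = a * X^b, fit on the log-log scale
)

# Adjusted R-squared for each specification.
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
round(adj_r2, 3)
```

For the exponential and power fits, the Adjusted \(R^2\) is measured on the log scale of Y, so treat comparisons with the other three as approximate.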
Old Faithful Model Summary
Lecture 21 In-class Exercises - Q2
Based on our data, for an Old Faithful eruption of 2 minutes (X = 2), the average duration until the next eruption is 57 minutes, Y = 57.
Use this average to find the residual for the linear model:
Linear Model: \(Y = 34.347 + 10.537\times X\)
In Excel: = 57 - (34.347 + 10.537*X)
In R:
X <- 2
57 - (34.347 + 10.537*X)
Answer will be decimal minutes, not minutes and seconds.
Lecture 21 In-class Exercises - Q3
Based on our data, for an Old Faithful eruption of 2 minutes (X = 2), the average duration until the next eruption is 57 minutes, Y = 57.
Use this average to find the residual for the power model:
Power Model: \(Y = 39.144 \times X^{0.487}\)
In Excel: = 57 - (39.144*X^(0.487))
In R:
X <- 2
57 - (39.144*X^(0.487))
Answer will be decimal minutes, not minutes and seconds.
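To see how the two fitted equations compare beyond a single X value, the sketch below evaluates both over a range of eruption durations. The function names and the grid of durations are illustrative assumptions. The coefficients are the ones given in the questions above.

```r
# Fitted equations from the lecture, wrapped as functions (names are hypothetical).
linear_estimate <- function(x) 34.347 + 10.537 * x
power_estimate  <- function(x) 39.144 * x^0.487

# Illustrative grid of eruption durations in minutes.
eruption_minutes <- seq(1.5, 5, by = 0.5)

# Side-by-side estimates of minutes until the next eruption.
comparison <- data.frame(
  eruption = eruption_minutes,
  linear   = linear_estimate(eruption_minutes),
  power    = power_estimate(eruption_minutes)
)
comparison$difference <- comparison$linear - comparison$power
comparison
```

A residual is then the observed Y minus the estimate in the relevant column.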
Lecture 21 In-class Exercises - Q4
From a tourism point of view for Old Faithful, a prediction error of less than 5 minutes does not matter.
Based on the Adjusted \(R^2\) for the Linear and Power models (Slide 16) and the residuals found in the two previous questions, which model would you choose?
Power model because the Adjusted \(R^2\) is slightly smaller.
Linear model because the difference in the Adjusted \(R^2\) is negligible and linear model is simpler.
Ferrari Acceleration Time
Some Possible Ferrari Models
More Possible Ferrari Models
Ferrari Model Summary
Thinking Question
Use R or Excel to calculate the estimated time in seconds it takes for the Ferrari to go from 0 to 100 mph (X = 100) for both the Exponential model and the Polynomial Model (shown below).
Exponential Model: \(\hat{Y} = 0.9936 \times e^{0.0154X}\)
Polynomial Model: \(\hat{Y} = 1.8123 - 0.0165X + 0.0005X^2\)
Select which statement(s) is/are true:
A. The polynomial model estimates a longer time for 0 to 100 mph acceleration than the exponential model.
B. The exponential model estimates a longer time for 0 to 100 mph acceleration than the polynomial model.
C. The two model estimates are within half a second of each other.
D. The two model estimates are within 1 second of each other.
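One way to set up the calculation in R is to wrap each fitted equation in a function. The function names and the 60 mph input are assumptions made for illustration. Swap in `X = 100` to answer the question.

```r
# Fitted Ferrari equations from the lecture (function names are hypothetical).
exponential_time <- function(mph) 0.9936 * exp(0.0154 * mph)
polynomial_time  <- function(mph) 1.8123 - 0.0165 * mph + 0.0005 * mph^2

# Illustrative speed only; the question asks about 100 mph.
speed <- 60

estimates <- c(
  exponential = exponential_time(speed),
  polynomial  = polynomial_time(speed)
)
estimates

# Absolute gap between the two estimates, in seconds.
abs(diff(estimates))
```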
Conceptual Question
Given that the difference in Adjusted \(R^2\) between the Polynomial and Exponential models for the Ferrari data is negligible, you opt for the model that is easier to explain to someone without any quantitative analytical training.
Which model do you choose and why?
Note this is subjective and the answer depends on discipline.
BMW Fuel Economy
As part of a new sales campaign for BMW, you want to model the fuel economy (MPG) of the BMW 430i based on speed.
Although BMW sells electric cars, you also have customers that want gas vehicles.
You have a small data set examining average fuel economy at 8 different speeds.
You notice the data definitely aren’t linear but want to fit the model as accurately as possible.
Which model provides the best fit?
BMW Fuel Economy Model Options
Lecture 21 In-class Exercises - Q5
It is clear from the provided plot that a linear model would be inappropriate for these data.
Other than the linear model, which model choice is ALWAYS inappropriate for concave down relationships like the BMW data?
Hint: If you are unsure, examine the trendlines in Excel or the R html file.
Exponential
Logarithmic
Polynomial
Power
Lecture 21 In-class Exercises - Q6
Based on the plots and Adjusted \(R^2\) values (see below), which model fits this relationship the best for the BMW fuel economy data?
Key Points from Today
Non-linear models are a useful and flexible alternative to linear transformations.
In R, models are specified with the transformations.
Excel is great for comparing multiple trendline options quickly
For better information on model fit and residuals, software such as R is required.
Important to understand for BUA 345:
- How each model option is structured
- How to calculate a regression estimate using each model option
- Why \(R^2\) is inappropriate for comparing these models: polynomial models include more than one X variable (\(X\) and \(X^2\)), so Adjusted \(R^2\) is used instead
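The last point can be checked numerically. The sketch below computes Adjusted \(R^2\) by hand for a second-order polynomial model and compares it with the value R reports. It assumes R's built-in `faithful` data as a stand-in, since the lecture data sets are not reproduced here.

```r
# Built-in Old Faithful data (an assumed stand-in for the lecture data).
data(faithful)

# Polynomial model with two X terms: X and X^2.
poly_fit <- lm(waiting ~ eruptions + I(eruptions^2), data = faithful)

n  <- nrow(faithful)               # number of observations
k  <- 2                            # number of X terms in the model
r2 <- summary(poly_fit)$r.squared

# Adjusted R-squared penalizes each additional X term.
adj_r2_by_hand <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

c(r_squared    = r2,
  adjusted     = adj_r2_by_hand,
  from_summary = summary(poly_fit)$adj.r.squared)
```

Because \(R^2\) never decreases when a term is added, the adjustment is what makes a two-term polynomial comparable with the one-term models.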
Lecture 23 will look at unconstrained optimization using data like the BMW data set.
Including today, there are seven lectures and engagement questions remaining.
To submit an Engagement Question or Comment about material from Lecture 21: Submit it by midnight today (day of lecture).