BUA 345 - Lecture 22

Introduction to Non-Linear Models

Author

Penelope Pooler Eisenbies

Published

April 3, 2025

Housekeeping

Today’s plan

Review of Log and Natural Log
Non-Linear Models
Example 1: Old Faithful Eruption Intervals
Example 2: Ferrari Acceleration Time
Example 3: BMW Fuel Efficiency

In-class Polling (Session ID: bua345s25)

Lecture 22 In-class Exercises - Q1

Session ID: bua345s25

Review Question from Lecture 19:

If the estimated log odds (Y’) of you making a late payment on your credit card is -0.257, what is the PERCENT CHANCE you will submit a late payment? Round the percentage to one decimal place.

Recall:

If estimated log odds = Y’, then probability = exp(Y’)/(1 + exp(Y’))
Percent = Probability × 100%
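As a minimal sketch of the two recall formulas, the conversion from log odds to a percent can be written in R. The variable names `log_odds`, `prob`, and `pct` are illustrative and not part of the lecture materials.

```r
# Estimated log odds (Y') from the review question
log_odds <- -0.257

# probability = exp(Y') / (1 + exp(Y'))
prob <- exp(log_odds) / (1 + exp(log_odds))

# Percent = Probability x 100%, rounded to one decimal place
pct <- round(prob * 100, 1)
pct
```

Base R's `plogis(log_odds)` computes the same probability, which makes a quick cross-check.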

Logs, Natural Logs, Exponential (exp) Functions

Why do these matter?
Up until now:
- We have assumed that the relationship between each X (predictor) variable and Y was a straight line when Y is quantitative.
OR
- We used a transformation to turn a curvilinear relationship into a straight-line relationship (MAS 261)
- Transformations like LN(Y) are common and used effectively in Finance, Accounting, etc.
- An alternative is to model the data as non-linear or curvilinear.
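To make the transformation approach concrete, here is a small sketch on made-up data (not lecture data) in which Y follows an exponential curve exactly. LN(Y) is then a straight-line function of X, and any estimate has to be back-transformed with `exp()` to return to the original units.

```r
# Hypothetical data (not from the lecture): Y = 2 * exp(0.1 * X) exactly
x <- 1:20
y <- 2 * exp(0.1 * x)

# Transformation approach: LN(Y) is a straight-line function of X
fit_ln <- lm(log(y) ~ x)
coef(fit_ln)  # intercept is LN(2), slope is 0.1

# Estimates come out on the LN scale and must be back-transformed
est_ln <- predict(fit_ln, newdata = data.frame(x = 10))
est_y  <- exp(est_ln)
est_y
```

Modeling the curve directly avoids the extra back-transformation step.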

Why use non-linear models?

Pros:

No transformation and back-transforming of estimates
Model fits the data as shown
Common models with one X variable (Simple Linear Regression) can be fit in Excel (or R)
R functions can expedite the process (next week)

Cons:

Requires trial and error (like transformation) to determine the model
Interpretation must account for the non-linear relationship
Models with more than one X variable (Multiple Linear Regression) would have to be done in R

Model for Old Faithful

Suppose you are examining thermal energy for a new start-up company.
As part of your research, you take a trip to the most famous geyser in the US – Old Faithful.
The park ranger explains that it has a highly predictable geothermal output
- AND the duration of each eruption in minutes is related to how long it will be until the next eruption.
You decide to fit a model to this relationship based on one month of data.

Scatterplot of Old Faithful Data

How do we model this relationship?
Is it linear?
If not, what is the best model?
- Relationship appears slightly concave down.
How do we interpret the results?

Model for Old Faithful in Yellowstone NP

Trendlines in Excel

Adding a trendline in Excel is very quick.
The provided worksheets will allow you to compare linear and non-linear options.

1. Select the points, right-click, then click ‘Add Trendline’.

2. Select one of these five trendline options.

3. Scroll down to the bottom of the trendline menu and select these two options.

- Optional: Rewrite the equation so the intercept is first.

Linear Model

Exponential Model

Logarithmic Model

Polynomial Model

Power Model

Old Faithful Model Summary
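For readers working in R rather than Excel, the sketch below shows one common way to specify the five model forms. It uses R's built-in `faithful` data set as a stand-in, which is an assumption: it is not the one-month data set used in the lecture, so the coefficients and Adjusted $R^2$ values will differ from the slides. The polynomial is assumed to be second order ($X$ and $X^2$).

```r
# Stand-in data: R's built-in 'faithful' data set (NOT the lecture's data)
d <- data.frame(x = faithful$eruptions,  # eruption duration (minutes)
                y = faithful$waiting)    # wait until next eruption (minutes)

# Five model forms, each specified with its transformation
fits <- list(
  linear      = lm(y ~ x, data = d),            # Y = a + b*X
  exponential = lm(log(y) ~ x, data = d),       # LN(Y) = LN(a) + b*X
  logarithmic = lm(y ~ log(x), data = d),       # Y = a + b*LN(X)
  polynomial  = lm(y ~ x + I(x^2), data = d),   # Y = a + b*X + c*X^2
  power       = lm(log(y) ~ log(x), data = d)   # LN(Y) = LN(a) + b*LN(X)
)

# Adjusted R-squared for each fit
adj_r2 <- sapply(fits, function(m) summary(m)$adj.r.squared)
round(adj_r2, 3)
```

In this sketch the exponential and power fits use LN(Y) as the response, so their Adjusted $R^2$ values are computed on the log scale.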

Lecture 22 In-class Exercises - Q2

Based on our data, for an Old Faithful eruption of 2 minutes (X = 2), the average duration until the next eruption is 57 minutes, Y = 57.

Use this average to find the residual (observed Y minus estimated Y) for the linear model:

Linear Model: $\hat{Y} = 34.347 + 10.537\times X$

In Excel: = 57 - (34.347 + 10.537*X)

In R:

X <- 2

57 - (34.347 + 10.537*X)

Answer will be decimal minutes, not minutes and seconds.

Lecture 22 In-class Exercises - Q3

Based on our data, for an Old Faithful eruption of 2 minutes (X = 2), the average duration until the next eruption is 57 minutes, Y = 57.

Use this average to find the residual (observed Y minus estimated Y) for the power model:

Power Model: $\hat{Y} = 39.144 \times X^{0.487}$

In Excel: = 57 - (39.144*X^(0.487))

In R:

X <- 2

57 - (39.144*X^(0.487))

Answer will be decimal minutes, not minutes and seconds.

Lecture 22 In-class Exercises - Q4

From a tourism point of view for Old Faithful, a prediction error of less than 5 minutes does not matter.

Based on the Adjusted $R^2$ for the Linear and Power models (Slide 15) and the residuals found in the two previous questions, which model would you choose?

A. Power model because it is more accurate.

B. Linear model because the difference in accuracy is negligible and the linear model is simpler.
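The two residuals from Q2 and Q3 can be placed side by side and checked against the 5-minute tolerance. This is a sketch only: the names `est_linear`, `est_power`, and `tolerance` are illustrative, and the comparison covers the single point X = 2.

```r
# Observed average wait (minutes) after a 2-minute eruption
x     <- 2
y_obs <- 57

# Estimates from the two fitted equations in Q2 and Q3
est_linear <- 34.347 + 10.537 * x
est_power  <- 39.144 * x^0.487

# Residual = observed Y - estimated Y
res_linear <- y_obs - est_linear
res_power  <- y_obs - est_power

# Is each prediction error smaller than the 5-minute tolerance?
tolerance <- 5
within_tol <- abs(c(linear = res_linear, power = res_power)) < tolerance
within_tol
```

Because this checks only one X value, it is one piece of evidence to weigh alongside the Adjusted $R^2$ values, not a full comparison of the models.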

Ferrari Acceleration Time

For marketing purposes, you want to predict the acceleration time of the new Ferrari.
You collect data on speed (mph) and acceleration time (seconds) for a number of vehicles.
You notice the data aren’t linear, but want to fit the model as accurately as possible.
Which model provides the best fit?

Some Possible Ferrari Models

More Possible Ferrari Models

Ferrari Model Summary

Thinking Question

Use R or Excel to calculate the estimated time in seconds it takes for the Ferrari to go from 0 to 100 mph (X = 100) for both the Exponential model and the Polynomial Model (shown below).

Exponential Model: $\hat{Y} = 0.9936 \times e^{0.0154X}$
Polynomial Model: $\hat{Y} = 1.8123 - 0.0165X + 0.0005X^2$

Select which statement(s) is/are true:

A. The polynomial model estimates a longer time for 0 to 100 mph acceleration than the exponential model.

B. The exponential model estimates a longer time for 0 to 100 mph acceleration than the polynomial model.

C. The two model estimates are within half a second of each other.

D. The two model estimates are within 1 second of each other
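One way to set up the calculation in R is sketched below. The names `est_exp`, `est_poly`, and `diff_sec` are illustrative, and the coefficients are copied from the two equations above.

```r
# Speed of interest: 0 to 100 mph
x <- 100

# Exponential model: Y-hat = 0.9936 * e^(0.0154 * X)
est_exp <- 0.9936 * exp(0.0154 * x)

# Polynomial model: Y-hat = 1.8123 - 0.0165 * X + 0.0005 * X^2
est_poly <- 1.8123 - 0.0165 * x + 0.0005 * x^2

# Absolute difference between the two estimates (seconds)
diff_sec <- abs(est_poly - est_exp)

c(exponential = est_exp, polynomial = est_poly, difference = diff_sec)
```

Comparing the printed values against statements A through D answers the question.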

Conceptual Question

Given that the difference in Adjusted $R^2$ between the Polynomial and Exponential models for the Ferrari data is negligible, you opt for the model that is easier to explain to someone without any quantitative analytical training.

Which model do you choose and why?

Note this is subjective and the answer depends on discipline.

BMW Fuel Economy

As part of a new sales campaign for BMW, you want to model the fuel economy (MPG) of the BMW 430i based on speed.
Although BMW sells electric cars, you also have customers that want gas vehicles.
You have a small data set examining average fuel economy at 8 different speeds.
You notice the data definitely aren’t linear but want to fit the model as accurately as possible.
Which model provides the best fit?

BMW Fuel Economy Model Options

Lecture 22 In-class Exercises - Q5

It is clear from the provided plot that a linear model would be inappropriate for these data.

Other than the linear model, which model choice is ALWAYS inappropriate for concave down relationships like the BMW data?

Hint: If you are unsure, examine the trendlines in Excel or the R html file.

A. Exponential

B. Logarithmic

C. Polynomial

D. Power

Lecture 22 In-class Exercises - Q6

Based on the plots and Adjusted $R^2$ values (see below), which model fits this relationship the best for the BMW fuel economy data?

Key Points from Today

Non-linear models are a useful and flexible alternative to transforming the data.
In R, models are specified with the transformations.
Excel is great for comparing multiple trendline options quickly.
For better information on model fit and residuals, software such as R is required.
Important to understand for BUA 345:
- How each model option is structured
- How to calculate a regression estimate using each model option
- Why Adjusted $R^2$ is used instead of $R^2$ to compare models: polynomial models include more than one X variable ($X$ and $X^2$)
Lecture 23 will look at unconstrained optimization using data like the BMW data set.
Including today, there are six lectures and engagement questions remaining.
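The point about $R^2$ and polynomial models can be illustrated with the standard Adjusted $R^2$ formula, $1 - (1 - R^2)(n - 1)/(n - k - 1)$, where $n$ is the number of observations and $k$ is the number of X variables. The sketch below uses hypothetical inputs: an $R^2$ of 0.90 is made up, and $n = 8$ is chosen only to mirror the size of the BMW data set.

```r
# Adjusted R-squared from R-squared, sample size n, and number of X variables k
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# Same hypothetical R-squared, but different numbers of X variables
one_x <- adj_r2(r2 = 0.90, n = 8, k = 1)  # e.g. a model with X only
two_x <- adj_r2(r2 = 0.90, n = 8, k = 2)  # e.g. polynomial with X and X^2

c(one_x = one_x, two_x = two_x)
```

With the same $R^2$, the model with two X variables gets the lower Adjusted $R^2$, which is why the adjusted value is the fairer basis for comparing a polynomial model to the single-X options.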

To submit an Engagement Question or Comment about material from Lecture 22: Submit it by midnight today (day of lecture).
