Bank Revenue Regression Analysis

Consider the model:

m6 <- lm(lrev_total ~ lbal_total + offer + age + age2 + card + loan + insur + check, data = tb.bankrev)

  1. Write down the population model.

  2. Write down the estimated model.

  3. Interpret the magnitude and statistical significance of โ€˜lbal_totalโ€™.

  4. Interpret the magnitude and statistical significance of โ€˜loanyesโ€™.

  5. Expand your model by adding an interaction term between โ€˜loanโ€™ and โ€˜lbal_totalโ€™. Interpret the interaction term.

  6. Assess the overall quality of the model using the available metrics.

  7. Consider a 48 year old customer with a total balance of $25,736 who received the promotional offer, has a credit card, does not have a loan or insurance, and has a high checking account activity. Predict the total revenue this customer is likely to generate based on the selected model.


Get started by loading libraries, reading and preprocessing data.

library(tidyverse)
library(stargazer)
library(ggthemes)
library(GGally)
library(ggfortify)

load("data/bankrev.RData")

# Create log(rev_total) and log(bal_total)
tb.bankrev <- tb.bankrev %>% 
  mutate(lrev_total = log(rev_total), 
         lbal_total = log(bal_total))

# Format factor variables
tb.bankrev <- tb.bankrev %>% mutate(
  offer   = factor(offer  , levels = c(0,1), labels = c("no", "yes"))
  , chq     = factor(chq    , levels = c(0,1), labels = c("no", "yes"))
  , card    = factor(card   , levels = c(0,1), labels = c("no", "yes"))
  , sav1    = factor(sav1   , levels = c(0,1), labels = c("no", "yes"))
  , loan    = factor(loan   , levels = c(0,1), labels = c("no", "yes"))
  , mort    = factor(mort   , levels = c(0,1), labels = c("no", "yes"))
  , insur   = factor(insur  , levels = c(0,1), labels = c("no", "yes"))
  , pens    = factor(pens   , levels = c(0,1), labels = c("no", "yes"))
  , check   = factor(check  , levels = c(0,1), labels = c("no", "yes"))
  , cd      = factor(cd     , levels = c(0,1), labels = c("no", "yes"))
  , mm      = factor(mm     , levels = c(0,1), labels = c("no", "yes"))
  , savings = factor(savings, levels = c(0,1), labels = c("no", "yes"))
)

# Create age2
tb.bankrev <- tb.bankrev %>% mutate(age2 = age^2)


Consider the model:

m6 <- lm(lrev_total ~ lbal_total + offer + age + age2 + card + loan + insur + check, data = tb.bankrev)


  1. Write down the population model.

\(log(rev\_total) = \beta_0 + \beta_1log(bal\_total) + \beta_2offer + \beta_3age + \beta_4 age^2 + \beta_5card + \beta_6loan + \beta_7insur + \beta_8 check + \epsilon_i\)


  1. Write down the estimated model.
stargazer(m6, type = "text", no.space = TRUE)
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                             lrev_total         
## -----------------------------------------------
## lbal_total                   0.440***          
##                               (0.005)          
## offeryes                     0.122***          
##                               (0.039)          
## age                           0.004**          
##                               (0.002)          
## age2                        -0.00005**         
##                              (0.00002)         
## cardyes                      1.590***          
##                               (0.048)          
## loanyes                      -0.120***         
##                               (0.035)          
## insuryes                     -0.087***         
##                               (0.027)          
## checkyes                     -1.391***         
##                               (0.046)          
## Constant                     -2.649***         
##                               (0.055)          
## -----------------------------------------------
## Observations                   7,398           
## R2                             0.588           
## Adjusted R2                    0.588           
## Residual Std. Error      0.810 (df = 7389)     
## F Statistic         1,319.844*** (df = 8; 7389)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01


\(\hat{log(rev\_total)} = -2.649 + 0.440log(bal\_total) + 0.122offer + 0.004age - 0.00005age^2 + 1.590card - 0.120loan - 0.087insur - 1.391check\)


  1. Interpret the magnitude and statistical significance of lbal_total.

This is a log-log (aka constant elasticity model). Thus, on average, all else constant, a one percent increase in the account balance is associated with a 0.440% increase in revenue. This effect is statistically significant (p<0.01).


  1. Interpret the magnitude and statistical significance of loanyes.
100*(exp(-0.120)-1)
## [1] -11.30796

This is a log-level case. Thus, on average, all else constant, customers who have a loan yield -11.308% less in revenue that customers who do not.


  1. Expand your model by adding an interaction term between loan and lbal_total. Interpret the interaction term.
# Regression Analysis
m7 <- lm(lrev_total ~ lbal_total + offer + age + age2 + card + loan*lbal_total + insur + check, data = tb.bankrev)

stargazer(m7, type = "html")
Dependent variable:
lrev_total
lbal_total 0.440***
(0.005)
offeryes 0.115***
(0.039)
age 0.004**
(0.002)
age2 -0.00004**
(0.00002)
cardyes 1.604***
(0.048)
loanyes -0.892***
(0.245)
insuryes -0.104***
(0.027)
checkyes -1.422***
(0.047)
lbal_total:loanyes 0.092***
(0.029)
Constant -2.622***
(0.056)
Observations 7,398
R2 0.589
Adjusted R2 0.588
Residual Std. Error 0.810 (df = 7388)
F Statistic 1,175.767*** (df = 9; 7388)
Note: p<0.1; p<0.05; p<0.01


On average, all else constant, when total balance increases by one percent, the increase in total revenue is 0.09% higher for customers with a loan relative to those without a loan. The difference is statistically significant (p<0.01).


  1. Assess the overall quality of the model using the available metrics.
autoplot(m7)



  1. Consider a 48 year old customer with a total balance of $25,736 who received the promotional offer, has a credit card, does not have a loan or insurance, and has a checking account. Predict the total revenue this customer is likely to generate based on the selected model.

Remember that whenever the dependent variable is in log form, our predictions will be logged values. To convert these values to the original variable (without the log) we use the following formula:

\[e^{\hat{ln(y)}} \times e^{\frac{\hat{\sigma}^2}{2}}\]

# Create a tibble with the new customer data:
new_cust <- tibble(lbal_total = log(25736),
                   offer = "yes",
                   age = 48,
                   age2 = 48^2,
                   card  = "yes",
                   loan = "no",
                   insur = "no",
                   check = "yes")

# Predict log(revenue)
predict(m7, newdata = new_cust) # output: 2.2249 
##        1 
## 2.224856
# Predicted revenue using the formula
exp(2.2249)*exp((0.810^2)/2) # 12.845 dollars
## [1] 12.84494