Bill Collector Problem

The problem is as stated:

We are a bill collector collecting small amounts whos business model is based on collecting delinquent accounts quickly.

The marketing department has come up with a slogan “under 60 days or your money back.”

We will investigate under which conditions this promotion will be profitable.

DATA:

random sample of accounts over a 6 months.

variables:

initial size of account

total days to collect in full

first 48 entries are residential

second 48 are commercial

Tidying Data

Adding the factor column

file<-here('data','overdue.txt')
dat<-read.table(file, header=TRUE)
#create residential vs commercial factor

dat_res<-dat[1:48,]%>%mutate(TYPE='RESIDENTIAL')
dat_com<-dat[-(1:48),]%>%mutate(TYPE='COMMERCIAL')
dat<-rbind(dat_res, dat_com)

Examining the data

library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
ggpairs(dat, mapping=(aes(color=TYPE)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There are clearly two unrelated regression candidates in this model. It seems that commercial businesses will wait a long time to pay small debts, but will pay large debts more quickly, and residential businesses will do the opposite.

Regression models

Regression without using a dummy variable (no correlation)

summary(lm.full<-lm(LATE~BILL, data=dat))
## 
## Call:
## lm(formula = LATE ~ BILL, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -45.846 -17.212  -0.793  19.007  47.774 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 51.98390    5.96405   8.716 9.84e-14 ***
## BILL        -0.01264    0.03128  -0.404    0.687    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.72 on 94 degrees of freedom
## Multiple R-squared:  0.001734,   Adjusted R-squared:  -0.008885 
## F-statistic: 0.1633 on 1 and 94 DF,  p-value: 0.687

Regression using a dummy variable to catch the group differences

summary(lm.split<-lm(LATE~BILL+TYPE+TYPE:BILL:., data=dat))
## 
## Call:
## lm(formula = LATE ~ BILL + TYPE + TYPE:BILL:., data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.1211  -2.2163   0.0974   1.9556   8.6995 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          101.758184   1.198504   84.90   <2e-16 ***
## BILL                  -0.190961   0.006285  -30.38   <2e-16 ***
## TYPERESIDENTIAL      -99.548561   1.694940  -58.73   <2e-16 ***
## BILL:TYPERESIDENTIAL   0.356644   0.008888   40.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.371 on 92 degrees of freedom
## Multiple R-squared:  0.9803, Adjusted R-squared:  0.9796 
## F-statistic:  1524 on 3 and 92 DF,  p-value: < 2.2e-16

ANOVA test, to decide if the models are significantly different

anova(lm.full,lm.split)

Since the F-statistic is high, and the p-value low. We can reject the null hypothesis that these two models are the same. We will select the split model which captures the group differences

The equations for the linear regression

so when TYPE=0 =commercial…

intercept 101.7 slope =-.19

and when TYPE=1= residential

intercept=101+-99, slope = -.19+.35

Analysis of the Promotion

In order for the promotion to be profitable, we first need to figure out how much profit we are losing by paying back customers who cross our 60 day threshold.

I want to analyze mathematically the impact of the ‘less than 60 days or your money back’ promotion, first piecewise by type and then optimizing for both commercial and residential

Strategy:

calculate profit by calling the money we collect that falls inside our 60 day threshold as INCOME and money that falls outside our threshold as COST \[PROFIT=INCOME-COST\]

Since we have two different models our income and cost will be split by debt type

\[PROFIT=profit_{res}+profit_{com}-cost_{res}-cost_{com}\]

we will calculate the proft and cost by properly segmenting the BILL column of our dataframe and summing the results

\[PROFIT=sum(BILL_{res}<x_{res})+sum(BILL_{com}>x_{com})-(sum(BILL_{res}>x_{res})+sum(BILL_{com}<x_{com}))\]

\(x_{res}\) will be the amount that a residential bill is likely to be paid after the 60 day window \(x_{com}\) will be the amount that a commercial bill is likely to be paid before the 60 day window

these will be calculated from the regression equations that we fit to our data

Since there likely will be a decrease in the total profit (since without the promotion, we are categorizing all of the business as profit), we will calculate what percent of the “null profit” we expect to retain after our promotion

\[\%_{profit}=\frac{promo}{null}\]

Finally we will calculate how much gross profit is required to break even with our ‘null profit’ and determine the percent increase in business required to achieve this

\[C=null*\frac{null}{promo}\]

Percent increase required

\[\frac{C}{null}\]

Residential

what we can tell from the residential data, is that people will wait longer to pay if the bill is higher. However from our data set, it does not look like it ever crosses 60 days.

(bill_60<-solve(.166,58))
## [1] 349.3976

since we dont accept debts above this limit, there will be no cost.

Commercial:

The commercial sector is a bit more complicated, because businesses WILL wait to pay if the debt is small enough.

(bill_60_commercial<-solve(-.19, (60-101.7)))
## [1] 219.4737

so income will be debts collected OVER this limit, and cost will be debts uncollected UNDER this limit

#profit from commmercial debts

cost_com<-dat%>%filter(TYPE=='COMMERCIAL' & BILL<bill_60_commercial)%>%summarize(sum(BILL))
income_com<-dat%>%filter(TYPE=='COMMERCIAL' & BILL>bill_60_commercial)%>%summarize(sum(BILL))
(profit_com=income_com-cost_com)
#profits from residential debts

cost_res<-dat%>%filter(TYPE=='RESIDENTIAL' & BILL>bill_60)%>%summarize(sum(BILL))
income_res<-dat%>%filter(TYPE=='RESIDENTIAL' & BILL<bill_60)%>%summarize(sum(BILL))
(profit_res=income_res-cost_res)

as we can see this promo will decrease our profits to 42% of what we would have gotten without the promo in place

com_profit_pre_promo<-dat%>%filter(TYPE=='COMMERCIAL' )%>%summarize(sum(BILL))
res_profit_pre_promo<-profit_res
profit_promo<-profit_res+profit_com
profit_null<-com_profit_pre_promo+res_profit_pre_promo
(percentage_income<-profit_promo/profit_null)

In order for it to make sense to run this promotion, we need to calculate the increase in business required to break even.

(C=profit_null*1/percentage_income)

This is the gross amount required to run the promotion, which is equal to

(C/profit_null)

a 234% increase in business. This does not account for a potential effect that our promotion does not change the way customers use our business, causing our regression equations to change.