The problem is as stated:
We are a bill collector collecting small amounts whos business model is based on collecting delinquent accounts quickly.
The marketing department has come up with a slogan “under 60 days or your money back.”
We will investigate under which conditions this promotion will be profitable.
DATA:
random sample of accounts over a 6 months.
variables:
initial size of account
total days to collect in full
first 48 entries are residential
second 48 are commercial
Adding the factor column
file<-here('data','overdue.txt')
dat<-read.table(file, header=TRUE)
#create residential vs commercial factor
dat_res<-dat[1:48,]%>%mutate(TYPE='RESIDENTIAL')
dat_com<-dat[-(1:48),]%>%mutate(TYPE='COMMERCIAL')
dat<-rbind(dat_res, dat_com)
Examining the data
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
ggpairs(dat, mapping=(aes(color=TYPE)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
There are clearly two unrelated regression candidates in this model. It seems that commercial businesses will wait a long time to pay small debts, but will pay large debts more quickly, and residential businesses will do the opposite.
Regression without using a dummy variable (no correlation)
summary(lm.full<-lm(LATE~BILL, data=dat))
##
## Call:
## lm(formula = LATE ~ BILL, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -45.846 -17.212 -0.793 19.007 47.774
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.98390 5.96405 8.716 9.84e-14 ***
## BILL -0.01264 0.03128 -0.404 0.687
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.72 on 94 degrees of freedom
## Multiple R-squared: 0.001734, Adjusted R-squared: -0.008885
## F-statistic: 0.1633 on 1 and 94 DF, p-value: 0.687
Regression using a dummy variable to catch the group differences
summary(lm.split<-lm(LATE~BILL+TYPE+TYPE:BILL:., data=dat))
##
## Call:
## lm(formula = LATE ~ BILL + TYPE + TYPE:BILL:., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.1211 -2.2163 0.0974 1.9556 8.6995
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 101.758184 1.198504 84.90 <2e-16 ***
## BILL -0.190961 0.006285 -30.38 <2e-16 ***
## TYPERESIDENTIAL -99.548561 1.694940 -58.73 <2e-16 ***
## BILL:TYPERESIDENTIAL 0.356644 0.008888 40.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.371 on 92 degrees of freedom
## Multiple R-squared: 0.9803, Adjusted R-squared: 0.9796
## F-statistic: 1524 on 3 and 92 DF, p-value: < 2.2e-16
ANOVA test, to decide if the models are significantly different
anova(lm.full,lm.split)
Since the F-statistic is high, and the p-value low. We can reject the null hypothesis that these two models are the same. We will select the split model which captures the group differences
so when TYPE=0 =commercial…
intercept 101.7 slope =-.19
and when TYPE=1= residential
intercept=101+-99, slope = -.19+.35
In order for the promotion to be profitable, we first need to figure out how much profit we are losing by paying back customers who cross our 60 day threshold.
I want to analyze mathematically the impact of the ‘less than 60 days or your money back’ promotion, first piecewise by type and then optimizing for both commercial and residential
Strategy:
calculate profit by calling the money we collect that falls inside our 60 day threshold as INCOME and money that falls outside our threshold as COST \[PROFIT=INCOME-COST\]
Since we have two different models our income and cost will be split by debt type
\[PROFIT=profit_{res}+profit_{com}-cost_{res}-cost_{com}\]
we will calculate the proft and cost by properly segmenting the BILL column of our dataframe and summing the results
\[PROFIT=sum(BILL_{res}<x_{res})+sum(BILL_{com}>x_{com})-(sum(BILL_{res}>x_{res})+sum(BILL_{com}<x_{com}))\]
\(x_{res}\) will be the amount that a residential bill is likely to be paid after the 60 day window \(x_{com}\) will be the amount that a commercial bill is likely to be paid before the 60 day window
these will be calculated from the regression equations that we fit to our data
Since there likely will be a decrease in the total profit (since without the promotion, we are categorizing all of the business as profit), we will calculate what percent of the “null profit” we expect to retain after our promotion
\[\%_{profit}=\frac{promo}{null}\]
Finally we will calculate how much gross profit is required to break even with our ‘null profit’ and determine the percent increase in business required to achieve this
\[C=null*\frac{null}{promo}\]
Percent increase required
\[\frac{C}{null}\]
what we can tell from the residential data, is that people will wait longer to pay if the bill is higher. However from our data set, it does not look like it ever crosses 60 days.
(bill_60<-solve(.166,58))
## [1] 349.3976
since we dont accept debts above this limit, there will be no cost.
The commercial sector is a bit more complicated, because businesses WILL wait to pay if the debt is small enough.
(bill_60_commercial<-solve(-.19, (60-101.7)))
## [1] 219.4737
so income will be debts collected OVER this limit, and cost will be debts uncollected UNDER this limit
#profit from commmercial debts
cost_com<-dat%>%filter(TYPE=='COMMERCIAL' & BILL<bill_60_commercial)%>%summarize(sum(BILL))
income_com<-dat%>%filter(TYPE=='COMMERCIAL' & BILL>bill_60_commercial)%>%summarize(sum(BILL))
(profit_com=income_com-cost_com)
#profits from residential debts
cost_res<-dat%>%filter(TYPE=='RESIDENTIAL' & BILL>bill_60)%>%summarize(sum(BILL))
income_res<-dat%>%filter(TYPE=='RESIDENTIAL' & BILL<bill_60)%>%summarize(sum(BILL))
(profit_res=income_res-cost_res)
as we can see this promo will decrease our profits to 42% of what we would have gotten without the promo in place
com_profit_pre_promo<-dat%>%filter(TYPE=='COMMERCIAL' )%>%summarize(sum(BILL))
res_profit_pre_promo<-profit_res
profit_promo<-profit_res+profit_com
profit_null<-com_profit_pre_promo+res_profit_pre_promo
(percentage_income<-profit_promo/profit_null)
In order for it to make sense to run this promotion, we need to calculate the increase in business required to break even.
(C=profit_null*1/percentage_income)
This is the gross amount required to run the promotion, which is equal to
(C/profit_null)
a 234% increase in business. This does not account for a potential effect that our promotion does not change the way customers use our business, causing our regression equations to change.