Prepare a predictive model for a newspaper publisher to forecast the right quantity of newspapers to print daily

Introduction

A newspaper publisher is faced with the problem of deciding how many newspapers to print daily so as to maximize the daily profit.

Daily demand for that particular newspaper is a random variable. In principle, more newspapers could be printed if demand turns out to be high, but because the time span of demand for a new newspaper is very short, this is practically almost impossible logistically.

If the newspaper publisher prints fewer newspapers than customers demand on a particular day, the newspaper will run out. So,

*1. The publisher will lose the opportunity to sell some more newspapers and so will lose some extra profit. This is the short-run effect.

*2. In the long run, he will lose some customers who previously liked that particular newspaper and intended to buy it, but were forced to switch to another newspaper because this one was unavailable.

So, the simplest decision the Inventory Manager generally follows is to print somewhat more papers than the average number sold, so that the newspaper never runs out.

On the surface, this is a good decision: since there is little chance of a stock-out, he will almost always fulfill customer demand. So, he will apparently maximize his profit.

But, practically, this is not a good decision. With this type of inventory decision, there are some unsold newspapers every day. Every day the newspaper vendors return some unsold newspapers, and the publisher is bound to accept them. And the salvage value of those unsold newspapers is very small compared to the price of the newspaper.

So, the publisher directly loses the production cost associated with the returned newspapers.

Loading data

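# stringsAsFactors = TRUE reproduces the Factor columns shown below on R >= 4.0
newsdata=read.csv("newsdata.csv", stringsAsFactors = TRUE)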
str(newsdata)

## 'data.frame':    118 obs. of  6 variables:
##  $ X       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Date    : Factor w/ 59 levels "1/1/2014","1/10/2014",..: 1 12 23 26 27 28 29 30 31 2 ...
##  $ Store   : Factor w/ 2 levels "S0001","S0002": 2 1 1 1 1 1 1 1 1 1 ...
##  $ Supply  : int  1400 1200 1600 1400 1200 1600 1600 1400 1200 1600 ...
##  $ Return  : int  139 132 154 160 219 209 243 230 224 217 ...
##  $ sold.out: int  1261 1068 1446 1240 981 1391 1357 1170 976 1383 ...
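Before summarising, it may be worth confirming that the three quantity columns agree with each other, since every copy supplied should be either returned or sold. The sketch below is only an illustration: `check_newsdata` is a hypothetical helper name, and the three sample rows are copied from the first values shown in the `str()` output above.

```r
# Hypothetical helper: verifies that every copy supplied is either
# returned or sold, i.e. Supply == Return + sold.out on each row.
check_newsdata <- function(df) {
  stopifnot(all(c("Supply", "Return", "sold.out") %in% names(df)))
  gap <- df$Supply - (df$Return + df$sold.out)
  # Row numbers where the identity fails (empty if the data are consistent)
  which(gap != 0)
}

# First three rows as displayed by str(newsdata) above
sample_rows <- data.frame(
  Store    = c("S0002", "S0001", "S0001"),
  Supply   = c(1400, 1200, 1600),
  Return   = c(139, 132, 154),
  sold.out = c(1261, 1068, 1446)
)

bad_rows <- check_newsdata(sample_rows)
length(bad_rows)  # 0 means every row balances
```

On the full data set the same call would be `check_newsdata(newsdata)`.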

Descriptive Analysis

tapply(newsdata$Supply, newsdata$Store, mean)

##    S0001    S0002 
## 1427.586 2230.883

tapply(newsdata$Return, newsdata$Store, mean)

##    S0001    S0002 
## 188.4310 326.3333

mean.sold.out=tapply(newsdata$sold.out, newsdata$Store, mean)
mean.sold.out

##    S0001    S0002 
## 1239.155 1904.550

tapply(newsdata$Supply, newsdata$Store, sd)

##    S0001    S0002 
## 169.4088 139.2291

tapply(newsdata$Return, newsdata$Store, sd)

##    S0001    S0002 
## 35.40201 49.30248

sd.sold.out=tapply(newsdata$sold.out, newsdata$Store, sd)
sd.sold.out

##    S0001    S0002 
## 172.5902 125.5628

From the data I have, the average print quantity is 1428 for Store 1 and 2231 for Store 2. The average number of returned newspapers is 188 for Store 1 and 326 for Store 2. The average sold out is 1239 for Store 1 and 1905 for Store 2.

PercentReturn = (tapply(newsdata$Return, newsdata$Store, mean)/tapply(newsdata$Supply, newsdata$Store, mean))*100
PercentReturn

##    S0001    S0002 
## 13.19928 14.62799

So, for Store 1 the return is 13.20% and for Store 2 the return is 14.63%. That means the publisher is losing roughly 14% of his working capital to copies that come back unsold. It is huge.

If we assume that the publisher makes 25% profit, then he directly loses (14% + 2.5%) or 16.5% of his total revenue.

Just imagine the situation! You are a small publisher with an average yearly revenue of Rs. 1 crore, and because of this strategic decision problem you directly lose Rs. 16.5 lakhs per year!

So, being a newspaper publisher, you must try to reduce the loss. You are not God, so you cannot predict daily demand perfectly, but you should at least bring this return margin below 5%. How? That is the big question!

Being a Data Scientist, I have some tools to reach a solution, if not perfectly, then as nearly as possible. How?

Well, I will explain as simply as possible. I know all of you are non-technical people, so I do not wish to explain all the technical details. Rather, I will tell you the analytical parts and how to approach the solution.

Preparation of the Model

To arrive at the right model, I will use the concept of the classical single-period inventory model. It will serve as the foundation of my model.

To prepare my model, I need a couple of years of historical data, like the data you have supplied to me. From these historical data, I would analyze:

*1. The overall trend of Supply, Return & Sold Out using Time Series Analysis. This analysis would help me understand any seasonal, cyclical, or increasing/decreasing trend. It will also help us understand the pattern of Sales, Return & Sold Out.

*2. After knowing the overall trend and pattern, we have to analyze more deeply. To a newspaper publisher, every day is different. The publisher publishes separate supplementary papers on different days. He also publishes different types of consolidated advertisements on different days. So, naturally, the customers on different days may be different. For example, some newspapers publish matrimonial advertisements only on Sunday, so the parents of unmarried sons and daughters may purchase those newspapers more on that day than on other days. So, we must discover the pattern and trend separately for Sunday, Monday, etc.

*3. Here, we have to consider the fixed production cost.

*4. Though the salvage value of returned newspapers is very small, we still need to include it in our model to make it more accurate.

*5. Another analysis we should do is for the stores. How far apart are the 2 stores? Is it logistically possible to transfer stock between them, if needed?
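The day-of-week analysis described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `weekday_summary` is a hypothetical helper name, the `%m/%d/%Y` date format is inferred from the `Date` levels shown in the `str()` output, and the numbers in `toy` are made up to show the shape of the result.

```r
# Hypothetical helper: average sold-out quantity per store and weekday.
# Assumes Date is stored as month/day/year text, e.g. "1/10/2014".
weekday_summary <- function(df, date_format = "%m/%d/%Y") {
  dates <- as.Date(as.character(df$Date), format = date_format)
  stopifnot(!anyNA(dates))
  # "%u" gives the ISO weekday number: 1 = Monday, ..., 7 = Sunday
  weekday <- format(dates, "%u")
  tapply(df$sold.out, list(Store = df$Store, Weekday = weekday), mean)
}

# Made-up numbers, only to show the shape of the result
toy <- data.frame(
  Date     = c("1/1/2014", "1/8/2014", "1/2/2014"),
  Store    = c("S0001", "S0001", "S0001"),
  sold.out = c(100, 200, 50)
)
by_day <- weekday_summary(toy)
by_day
```

Using the weekday number rather than the weekday name keeps the result independent of the computer's language settings. The same grouping could later supply a separate mean and standard deviation for each weekday.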

The Model

We have to consider the potential loss and profit associated with printing too many or too few newspapers.

Assume that the per-unit price of the newspaper is Rs. 5 and the per-unit production cost is Rs. 4.

So,

+ Cost per unit underestimated (UnderEstimateCost) = Rs. 5
+ Cost per unit overestimated (OverEstimateCost) = Rs. 4

By introducing Probability, the Expected Marginal Cost equation becomes,

Probability * OverEstimateCost = (1 - Probability) * UnderEstimateCost

Here, Probability is the probability that the unit will not be sold and (1 - Probability) is the probability that it will be sold, because one or the other will definitely occur (the unit is either sold or not sold).

Then, solving for Probability, we obtain, Probability = UnderEstimateCost/(UnderEstimateCost + OverEstimateCost)

UnderEstimateCost=5
OverEstimateCost=4
Probability=UnderEstimateCost/(UnderEstimateCost+OverEstimateCost)
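As a quick check of the algebra, the value of Probability can be substituted back into both sides of the Expected Marginal Cost equation. The variable names below are illustrative only; the costs are the Rs. 5 and Rs. 4 assumed in the text.

```r
# Costs assumed in the text above (Rs. per copy)
under_cost <- 5
over_cost  <- 4

probability <- under_cost / (under_cost + over_cost)

# Both sides of the expected marginal cost equation
cost_if_unsold <- probability * over_cost
cost_if_short  <- (1 - probability) * under_cost

round(c(probability    = probability,
        cost_if_unsold = cost_if_unsold,
        cost_if_short  = cost_if_short), 4)
```

With these costs Probability is 5/9, or about 0.556, and both sides of the equation come to about Rs. 2.22, so the two expected costs balance at that point.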

Solution

We all know that the mean is the average value of a distribution. From the calculation, the average sold out is 1239 for store 1 and 1905 for store 2. We could simply print these quantities of newspapers for store 1 and store 2 respectively. But then, if demand is roughly symmetric so that the mean sits at the middle of the distribution, about 50% of the time we will print more than the daily demand and the other 50% of the time we will print less than the daily demand.

So, we must use the concept of standard deviation (sd). Standard deviation measures how far values typically deviate from the average. So, when taking the printing decision, if we add the sd to the mean (i.e. the average), the probability of a stock-out will reduce, although the probability of overstock will rise.

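But this also does not give us the best result. To balance the cost of printing too many against the cost of printing too few, we need to find the point on our sold-out distribution (demand distribution) that corresponds to the cumulative probability Probability calculated above, here assuming that demand is approximately normally distributed.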

some_extra=qnorm(Probability)
store_q=(mean.sold.out + sd.sold.out * some_extra)
store_q

##    S0001    S0002 
## 1263.268 1922.092

So, we get the recommended quantity with “mean.sold.out + sd.sold.out * some_extra”. With the data provided to me, the publisher should print about 1263 newspapers for store 1 and 1922 newspapers for store 2.
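Because `qnorm` assumes that sold-out quantities follow a normal distribution, it may be worth comparing the result with the plain empirical quantile of past sales, which makes no such assumption. The sketch below is not part of the original analysis: `compare_quantities` is a hypothetical helper name and the sales figures in `toy_sales` are made up.

```r
# Hypothetical helper: print quantity under the normal assumption versus
# the plain empirical quantile of past sales, for one store.
compare_quantities <- function(sold, probability) {
  c(
    normal    = mean(sold) + sd(sold) * qnorm(probability),
    empirical = unname(quantile(sold, probability))
  )
}

# Made-up sales figures, only to show the call
toy_sales <- c(1180, 1210, 1250, 1300, 1120, 1275, 1230)
compare_quantities(toy_sales, 5 / 9)
```

On the real data, one possible call is `sapply(split(newsdata$sold.out, newsdata$Store), compare_quantities, probability = Probability)`. If the two rows differ widely for a store, the normal assumption probably deserves a closer look.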

Validity Test of the Model

tapply(newsdata$Supply,newsdata$Store, mean)

##    S0001    S0002 
## 1427.586 2230.883

store_q

##    S0001    S0002 
## 1263.268 1922.092

ReturnNew=tapply(newsdata$Supply,newsdata$Store, mean) - store_q
PercentReturnNew=(ReturnNew/tapply(newsdata$Supply,newsdata$Store, mean))*100
PercentReturnNew

##    S0001    S0002 
## 11.51023 13.84164

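So, by this measure the figure is now 11.51% for Store 1 and 13.84% for Store 2, which is slightly improved from the earlier 13.20% and 14.63%. Keep in mind that this calculation compares the recommended quantity with the current average supply, so it is a rough indicator rather than a measured return rate. With more historical data, actual costing data and detailed analysis, we should be able to bring the return within 5%.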
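Another way to gauge the return rate of the recommended quantities is to compute the expected number of unsold copies directly from the demand distribution. The sketch below is an illustration, not the author's method: `expected_return_pct` is a hypothetical helper name, it treats past sold-out quantities as demand, and it assumes demand is normally distributed, as the `qnorm` step already does.

```r
# Hypothetical helper: expected percentage of printed copies left unsold
# when q copies are printed and demand is Normal(mu, sigma).
expected_return_pct <- function(q, mu, sigma) {
  z <- (q - mu) / sigma
  # Expected leftover copies: E[max(q - D, 0)] = sigma * (z * pnorm(z) + dnorm(z))
  leftover <- sigma * (z * pnorm(z) + dnorm(z))
  100 * leftover / q
}

# Store-level figures reported earlier in this document
mean_sold <- c(S0001 = 1239.155, S0002 = 1904.550)
sd_sold   <- c(S0001 = 172.5902, S0002 = 125.5628)
store_qty <- c(S0001 = 1263.268, S0002 = 1922.092)

expected_return_pct(store_qty, mean_sold, sd_sold)
```

The result is only as reliable as the normal assumption and the premise that past sales reflect true demand, so it should be read as a rough estimate.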

Automated Web Application

It is not practical for a newspaper publisher to hire a Data Scientist and ask him to do this analysis daily, yet the forecast is needed every day.

Then, what is the solution? We can create an automated dashboard application where, with one click, the publisher can see the forecast for tomorrow’s production.

It will help the publisher take strategic decisions about each day’s production.

Here, I have tried to show you such an application. Just double-click the webpage given here; the page will open, and you only have to click the button to get the result we have derived here.

In the actual project, we will automate this so that it updates every day and gives you the desired result.
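One way such a dashboard button could be wired up is to wrap the calculation in a single function that takes the latest data and returns the print quantity per store. The sketch below is only an assumption about how that might look: `recommend_print_quantity` is a hypothetical name, rounding up to whole copies is my own choice, and the history in `toy` is made up.

```r
# Hypothetical function a dashboard button could call: returns the
# recommended print quantity per store, rounded up to whole copies.
recommend_print_quantity <- function(df, under_cost = 5, over_cost = 4) {
  probability <- under_cost / (under_cost + over_cost)
  mean_sold <- tapply(df$sold.out, df$Store, mean)
  sd_sold   <- tapply(df$sold.out, df$Store, sd)
  ceiling(mean_sold + sd_sold * qnorm(probability))
}

# Made-up history, only to show the call; in practice df would be the
# latest data, e.g. read.csv("newsdata.csv")
toy <- data.frame(
  Store    = rep(c("S0001", "S0002"), each = 4),
  sold.out = c(1200, 1250, 1300, 1150, 1900, 1950, 1850, 2000)
)
recommended <- recommend_print_quantity(toy)
recommended
```

Keeping the costs as arguments means the publisher's real costing data can be supplied later without changing the function.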