Group 5 - Debarati | Kanika | Mudit | Anshumaan | Kaushik | Shreya | Pratik
1. In last few years, growth in chocolate market has been tremendous in terms of value and volume.
2. There has been an increase in international brands and extension of product portfolios of the existing players in the market.
3. Sales in this sector is driven by several factors such as type of retail stores, town, availability of refrigerator etc.
4. Objective : Identify how the presence and volume of the refrigerations affects the sales of chocolates
5. For the scope of this project Sales Data for major chocolate manufacturers has been taken
6. Sales data is from 4 major Indian states : Chattisgarh, Gujarat, Madhya Pradesh and Maharashtra
To study the effect of presence of refrigerator on sales ofChocolates in India
To study the relative impact of factors such as fridge Volume, Retailer type, town etc. on sales of Chocolates
OutletCode TownClass Town State
1 1 TITANIUM Pune Maharashtra
2 2 GOLD Gwalior Madhya Pradesh
3 3 TITANIUM Nerul Maharashtra
4 4 SILVER KORBA Chattisgarh
5 5 TITANIUM Kalyan Maharashtra
6 7 SILVER KANKAVLI Maharashtra
RetailerClass Fridge.Volume Has.Fridge Sales
1 OTHERS 340 1 123860.13
2 HIGH END GROCER 50 1 79153.05
3 HIGH END GROCER 35 1 227851.06
4 HIGH END GROCER 35 1 31397.78
5 OTHERS 50 1 24210.18
6 OTHERS 0 0 20226.00
*Outlet Code - Unique Code specific to Outlets *
*Town Class - Titanium , Gold , Silver and Rest of Urban depending on the following two factors: *
*1) Revenue generated on an average from that Town *
*2) Disposable Income of the individuals staying there *
*Titanium being highest, followed by Gold, Silver and Rest of Urban *
*Town- The town where the outlet concerned is located *
*State - The state where the outlet concerned is located *
*RetailerClass- Only the major 4 categories of the retailers are retained and the remaining low frequency categories are clubbed under “OTHERS” *
*Outlet Code - Unique Code specific to Outlets *
*Town Class - Titanium , Gold , Silver and Rest of Urban depending on the following two factors: *
*1) Revenue generated on an average from that Town *
*2) Disposable Income of the individuals staying there *
*Titanium being highest, followed by Gold, Silver and Rest of Urban *
*Town- The town where the outlet concerned is located *
*State - The state where the outlet concerned is located *
*RetailerClass- Only the major 4 categories of the retailers are retained and the remaining low frequency categories are clubbed under “OTHERS” *
*Fridge - The volume of the Fridge in Litres. 0 meaning No Fridge *
*Has Fridge - Binary variable: 1 -Has Fridge, 0 -No Fridge *
*Sales New - Y Variable of concerned - Annual Sales generated by each store *
*Parent Firm - The revenue share of the parent firm in that shop as compared with all the Chocolate firms which supply their products in that shop *
n mean sd median skew kurtosis se
OutletCode 242595 126617.51 72905.95 126633.00 -0.01 -1.20 148.02
TownClass* 242595 2.98 1.23 4.00 -0.69 -1.21 0.00
Town* 242595 116.32 67.58 136.00 -0.24 -1.04 0.14
State* 242595 3.28 0.94 4.00 -0.82 -0.82 0.00
RetailerClass* 242595 3.52 1.15 4.00 -1.26 0.43 0.00
Fridge.Volume 242595 46.15 95.63 0.00 2.40 4.38 0.19
Has.Fridge 242595 0.43 0.50 0.00 0.28 -1.92 0.00
Sales 242595 46288.43 77949.27 18058.44 3.34 12.43 158.26
Manually Removing points with sales less than 1000
After Removing Sales below 1000
We have 232,145 outlets information now
Conclusion:
At higher fridge volmes, sales varies significantly as compared to lower fridge volumes for different town classes
Fridge Vol VS Sales
Fridge Vol VS Average Sales
Given the average sales based on fridge (has or not), we notice that having fridge increases sales significantly to close to Rs. 90,000 from Rs. 15,000
FridgeVolume x
1 0 15622.06
2 1 89334.69
1. Through this we found that food store and high end grocers drive the highest sales
2. It was found that Maharashtra and Madhya Pradesh drive the highest sales
3. It was found that Gold and Titanium town class drive the highest sales
[1] "GOLD" "REST OF URBAN" "SILVER" "TITANIUM"
*Model 1 : Regression Output: We tried to find out Effect of presence of Fridge in sales revenue *
*Conclusion: The p value < .05 , implying that the variables Sales and Fridge.Volume are significantly related. With 1 litre increment in fridge colume, the annual sales of the shop increase by 360 Rs INR. *
*Model 2 : Effect of other factors -Linear Linear *
*Conclusion : The p-value of the model is less that .05, implying the X variables are related with Y variables. All the X variables are related to the Y variables as their individual P values are < .05. The adjusted R square value of the model is 37%. *
*Model Fit3 : With interactions considered, the Adjusted R square value improved fom 37% to 38.92% *
The previous model fit 3 is declared the best model by step function
We have sales data for 232145 outlets currently
966 outliers were detected by the outlierTest
*After removing outliers, data for 231179 outlets remained *
After removing outliers, the adjusted R-squared has improved from 38.92% to 41.67% even for the same model
[1] 231179 9
1. Model does not seem to be linear visually
Data seems to be deviating from the normality at the extreme points
From Anderson Test, it was found that Residuals are not following a normal distribution since p <.05
Anderson-Darling normality test
data: fulldata.or$Sales
A = 30989, p-value < 2.2e-16
Lambda Value is 0, so log transform was taken on Sales
Box-Cox Transformation
231179 data points used to estimate Lambda
Input data summary:
Min. 1st Qu. Median Mean 3rd Qu. Max.
1000 7376 19600 48345 50974 454670
Largest/Smallest: 455
Sample Skewness: 3.28
Estimated Lambda: 0
With fudge factor, Lambda = 0 will be used for transformations
Running regression after transformation
fit5 <- lm(Sales.Log ~ Fridge.Volume + TownClass + RetailerClass + State + TownClass*Fridge.Volume + RetailerClass*Fridge.Volume + State*Fridge.Volume +RetailerClass*Fridge.Volume*TownClass)
Call:
lm(formula = Sales.Log ~ Fridge.Volume + TownClass + RetailerClass +
State + TownClass * Fridge.Volume + RetailerClass * Fridge.Volume +
State * Fridge.Volume + RetailerClass * Fridge.Volume * TownClass)
Residuals:
Min 1Q Median 3Q Max
-5.2837 -0.7553 0.0900 0.8254 3.7807
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.5498057 0.0309218 308.837 < 2e-16 ***
Fridge.Volume 0.0101374 0.0005676 17.860 < 2e-16 ***
TownClassGOLD -0.3058912 0.0361418 -8.464 < 2e-16 ***
TownClassSILVER -0.2673023 0.0339420 -7.875 3.41e-15 ***
TownClassTITANIUM -0.1548498 0.0294502 -5.258 1.46e-07 ***
RetailerClassCHEMIST -0.2818264 0.0356284 -7.910 2.58e-15 ***
RetailerClassFOOD STORE 0.9121945 0.0603217 15.122 < 2e-16 ***
RetailerClassHIGH END GROCER 1.1075684 0.0631348 17.543 < 2e-16 ***
RetailerClassLOW END GROCER -0.1088714 0.0290998 -3.741 0.000183 ***
StateGujarat 0.0027240 0.0153940 0.177 0.859545
StateMadhya Pradesh -0.0380490 0.0156466 -2.432 0.015026 *
StateMaharashtra 0.1697092 0.0148233 11.449 < 2e-16 ***
Fridge.Volume:TownClassGOLD 0.0029747 0.0005358 5.552 2.82e-08 ***
Fridge.Volume:TownClassSILVER 0.0013788 0.0005761 2.393 0.016690 *
Fridge.Volume:TownClassTITANIUM -0.0009665 0.0005024 -1.924 0.054366 .
Fridge.Volume:RetailerClassCHEMIST 0.0004037 0.0007025 0.575 0.565481
Fridge.Volume:RetailerClassFOOD STORE -0.0023177 0.0006299 -3.680 0.000234 ***
Fridge.Volume:RetailerClassHIGH END GROCER -0.0019488 0.0006604 -2.951 0.003170 **
Fridge.Volume:RetailerClassLOW END GROCER -0.0006347 0.0005201 -1.220 0.222352
Fridge.Volume:StateGujarat -0.0041874 0.0002927 -14.309 < 2e-16 ***
Fridge.Volume:StateMadhya Pradesh -0.0037602 0.0002970 -12.661 < 2e-16 ***
Fridge.Volume:StateMaharashtra -0.0048027 0.0002891 -16.614 < 2e-16 ***
TownClassGOLD:RetailerClassCHEMIST 0.5879360 0.0452542 12.992 < 2e-16 ***
TownClassSILVER:RetailerClassCHEMIST 0.3804536 0.0448203 8.488 < 2e-16 ***
TownClassTITANIUM:RetailerClassCHEMIST 0.8097071 0.0385209 21.020 < 2e-16 ***
TownClassGOLD:RetailerClassFOOD STORE 1.0555833 0.0744975 14.169 < 2e-16 ***
TownClassSILVER:RetailerClassFOOD STORE 0.5620115 0.0743777 7.556 4.17e-14 ***
TownClassTITANIUM:RetailerClassFOOD STORE 1.0765208 0.0649934 16.564 < 2e-16 ***
TownClassGOLD:RetailerClassHIGH END GROCER 0.8269123 0.0715753 11.553 < 2e-16 ***
TownClassSILVER:RetailerClassHIGH END GROCER 0.6434566 0.0741395 8.679 < 2e-16 ***
TownClassTITANIUM:RetailerClassHIGH END GROCER 0.8680269 0.0661356 13.125 < 2e-16 ***
TownClassGOLD:RetailerClassLOW END GROCER 0.2057213 0.0379771 5.417 6.07e-08 ***
TownClassSILVER:RetailerClassLOW END GROCER 0.0743876 0.0360851 2.061 0.039261 *
TownClassTITANIUM:RetailerClassLOW END GROCER 0.1718466 0.0313631 5.479 4.28e-08 ***
Fridge.Volume:TownClassGOLD:RetailerClassCHEMIST -0.0043421 0.0007471 -5.812 6.17e-09 ***
Fridge.Volume:TownClassSILVER:RetailerClassCHEMIST -0.0026042 0.0008002 -3.254 0.001137 **
Fridge.Volume:TownClassTITANIUM:RetailerClassCHEMIST -0.0014732 0.0007130 -2.066 0.038827 *
Fridge.Volume:TownClassGOLD:RetailerClassFOOD STORE -0.0045177 0.0006896 -6.551 5.74e-11 ***
Fridge.Volume:TownClassSILVER:RetailerClassFOOD STORE -0.0019426 0.0007295 -2.663 0.007750 **
Fridge.Volume:TownClassTITANIUM:RetailerClassFOOD STORE -0.0011390 0.0006459 -1.763 0.077824 .
Fridge.Volume:TownClassGOLD:RetailerClassHIGH END GROCER -0.0051954 0.0007089 -7.329 2.33e-13 ***
Fridge.Volume:TownClassSILVER:RetailerClassHIGH END GROCER -0.0031665 0.0007599 -4.167 3.09e-05 ***
Fridge.Volume:TownClassTITANIUM:RetailerClassHIGH END GROCER -0.0019335 0.0006740 -2.869 0.004120 **
Fridge.Volume:TownClassGOLD:RetailerClassLOW END GROCER -0.0031040 0.0005686 -5.459 4.80e-08 ***
Fridge.Volume:TownClassSILVER:RetailerClassLOW END GROCER -0.0011692 0.0006135 -1.906 0.056679 .
Fridge.Volume:TownClassTITANIUM:RetailerClassLOW END GROCER -0.0007406 0.0005320 -1.392 0.163870
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.127 on 231133 degrees of freedom
Multiple R-squared: 0.2993, Adjusted R-squared: 0.2992
F-statistic: 2194 on 45 and 231133 DF, p-value: < 2.2e-16
The normality has been solved, however linearity still remains an issue, due to which the adjusted R square has gone down to 29.6%.
Anderson Test
Anderson-Darling normality test
data: fulldata.or$Sales.Log
A = 225.89, p-value < 2.2e-16
Model does not seem to be linear visually
Data seems to be deviating from the normality at the extreme points, however normality has improved by great bounds compared to fit4 model
Does not have multicollinearity
a1<-bptest(fit5)
a1
studentized Breusch-Pagan test
data: fit5
BP = 7878.4, df = 45, p-value < 2.2e-16
b1<-ncvTest(fit5)
b1
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 1798.344, Df = 1, p = < 2.22e-16
Data has heteroskedasticity
## FGLs Log- linear with Intaction
# Step 1:Residuals of linear OLS Model
LogOLSModelRes <- resid(fit5)
# Step 2: Taking square of the residuals of linear OLS Model
LogOLSModelResSq <- LogOLSModelRes^2
# Step 3: Taking natural log of the squared residuals of linear OLS Model
lnOLSResSq <- log(LogOLSModelResSq)
# Step 4: Running auxiliary OLS Model
auxOLSModel <- lm(lnOLSResSq ~ Fridge.Volume + TownClass + RetailerClass + State + TownClass*Fridge.Volume + RetailerClass*Fridge.Volume + State*Fridge.Volume +RetailerClass*Fridge.Volume*TownClass,data = fulldata.or)
# Step 5: Get fitted value of auxiliary OLS Model i.e. 'auxOLSModel'
fittedValue <- fitted(auxOLSModel)
# Step 6: Compute exponential values of fiited value for auxialiary OLS Model
expValue <- exp(fittedValue)
# Step 7: Fit Log-linear FGLS Model
fit5_new <- lm(Sales.Log ~ Fridge.Volume + TownClass + RetailerClass + State + TownClass*Fridge.Volume + RetailerClass*Fridge.Volume + State*Fridge.Volume +RetailerClass*Fridge.Volume*TownClass,weights = 1/expValue,data = fulldata.or)
# summary of linear FGLS model
summary(fit5_new)
Call:
lm(formula = Sales.Log ~ Fridge.Volume + TownClass + RetailerClass +
State + TownClass * Fridge.Volume + RetailerClass * Fridge.Volume +
State * Fridge.Volume + RetailerClass * Fridge.Volume * TownClass,
data = fulldata.or, weights = 1/expValue)
Weighted Residuals:
Min 1Q Median 3Q Max
-11.9895 -1.2284 0.1587 1.3239 9.1044
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.470e+00 2.399e-02 394.729 < 2e-16 ***
Fridge.Volume 1.846e-02 7.855e-04 23.505 < 2e-16 ***
TownClassGOLD -3.152e-01 3.125e-02 -10.086 < 2e-16 ***
TownClassSILVER -3.093e-01 2.689e-02 -11.500 < 2e-16 ***
TownClassTITANIUM -1.478e-01 2.427e-02 -6.089 1.14e-09 ***
RetailerClassCHEMIST -2.513e-01 2.874e-02 -8.744 < 2e-16 ***
RetailerClassFOOD STORE 9.684e-01 5.367e-02 18.043 < 2e-16 ***
RetailerClassHIGH END GROCER 1.168e+00 5.011e-02 23.309 < 2e-16 ***
RetailerClassLOW END GROCER -7.539e-02 2.284e-02 -3.300 0.000966 ***
StateGujarat 3.849e-02 1.244e-02 3.094 0.001972 **
StateMadhya Pradesh -3.353e-02 1.245e-02 -2.694 0.007067 **
StateMaharashtra 2.034e-01 1.187e-02 17.130 < 2e-16 ***
Fridge.Volume:TownClassGOLD -3.952e-04 7.662e-04 -0.516 0.606030
Fridge.Volume:TownClassSILVER 3.124e-03 8.976e-04 3.480 0.000502 ***
Fridge.Volume:TownClassTITANIUM -4.542e-03 7.253e-04 -6.262 3.81e-10 ***
Fridge.Volume:RetailerClassCHEMIST 4.392e-04 1.077e-03 0.408 0.683364
Fridge.Volume:RetailerClassFOOD STORE -7.200e-03 7.987e-04 -9.015 < 2e-16 ***
Fridge.Volume:RetailerClassHIGH END GROCER -6.753e-03 7.826e-04 -8.629 < 2e-16 ***
Fridge.Volume:RetailerClassLOW END GROCER -2.720e-03 7.447e-04 -3.652 0.000260 ***
Fridge.Volume:StateGujarat -7.273e-03 3.537e-04 -20.561 < 2e-16 ***
Fridge.Volume:StateMadhya Pradesh -6.329e-03 3.589e-04 -17.635 < 2e-16 ***
Fridge.Volume:StateMaharashtra -8.743e-03 3.485e-04 -25.085 < 2e-16 ***
TownClassGOLD:RetailerClassCHEMIST 5.718e-01 3.986e-02 14.346 < 2e-16 ***
TownClassSILVER:RetailerClassCHEMIST 4.038e-01 3.681e-02 10.969 < 2e-16 ***
TownClassTITANIUM:RetailerClassCHEMIST 7.925e-01 3.297e-02 24.039 < 2e-16 ***
TownClassGOLD:RetailerClassFOOD STORE 9.541e-01 6.619e-02 14.415 < 2e-16 ***
TownClassSILVER:RetailerClassFOOD STORE 5.226e-01 7.031e-02 7.432 1.07e-13 ***
TownClassTITANIUM:RetailerClassFOOD STORE 9.768e-01 5.832e-02 16.748 < 2e-16 ***
TownClassGOLD:RetailerClassHIGH END GROCER 7.615e-01 5.755e-02 13.233 < 2e-16 ***
TownClassSILVER:RetailerClassHIGH END GROCER 5.826e-01 5.904e-02 9.867 < 2e-16 ***
TownClassTITANIUM:RetailerClassHIGH END GROCER 7.842e-01 5.263e-02 14.901 < 2e-16 ***
TownClassGOLD:RetailerClassLOW END GROCER 1.907e-01 3.300e-02 5.779 7.54e-09 ***
TownClassSILVER:RetailerClassLOW END GROCER 6.787e-02 2.892e-02 2.347 0.018934 *
TownClassTITANIUM:RetailerClassLOW END GROCER 1.654e-01 2.615e-02 6.325 2.54e-10 ***
Fridge.Volume:TownClassGOLD:RetailerClassCHEMIST -4.887e-03 1.128e-03 -4.331 1.48e-05 ***
Fridge.Volume:TownClassSILVER:RetailerClassCHEMIST -7.442e-03 1.245e-03 -5.977 2.28e-09 ***
Fridge.Volume:TownClassTITANIUM:RetailerClassCHEMIST -2.216e-03 1.091e-03 -2.032 0.042163 *
Fridge.Volume:TownClassGOLD:RetailerClassFOOD STORE -4.547e-04 8.687e-04 -0.523 0.600663
Fridge.Volume:TownClassSILVER:RetailerClassFOOD STORE -3.106e-03 1.004e-03 -3.094 0.001972 **
Fridge.Volume:TownClassTITANIUM:RetailerClassFOOD STORE 3.200e-03 8.200e-04 3.902 9.56e-05 ***
Fridge.Volume:TownClassGOLD:RetailerClassHIGH END GROCER -1.273e-03 8.443e-04 -1.508 0.131662
Fridge.Volume:TownClassSILVER:RetailerClassHIGH END GROCER -4.077e-03 9.848e-04 -4.140 3.48e-05 ***
Fridge.Volume:TownClassTITANIUM:RetailerClassHIGH END GROCER 2.420e-03 8.012e-04 3.021 0.002518 **
Fridge.Volume:TownClassGOLD:RetailerClassLOW END GROCER -4.119e-04 8.085e-04 -0.509 0.610475
Fridge.Volume:TownClassSILVER:RetailerClassLOW END GROCER 8.464e-05 9.486e-04 0.089 0.928901
Fridge.Volume:TownClassTITANIUM:RetailerClassLOW END GROCER 7.180e-04 7.626e-04 0.942 0.346412
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.811 on 231133 degrees of freedom
Multiple R-squared: 0.3668, Adjusted R-squared: 0.3667
F-statistic: 2976 on 45 and 231133 DF, p-value: < 2.2e-16
attach(fulldata.or)
A <- fit5_new$coefficients
write.csv(A,"A.csv")
getwd()
[1] "C:/Users/Anshumaan/Desktop/2018 DAM/PROJECT"
Installing Fridges in the “Silver Town Class” for “Others” type of retailers seem most beneficial Installing Fridges in the Tier 3 Towns (Rest of Urban) in Chemists is also beneficial. It is because, Chemist is a moderate Sales driver and will be a good place to invest in. Fridge can also be used to store major drugs The -ve coefficients are not surprising as it implies most of the shops in those categories have fridge / air-conditioner already. Or, the demand for chocolate is saturated, so fridge will not result in any additional sale For Revenue Improvement: Target Gujrat. Invest more on Chemists and Other Retailers over the whole state For Society Welfare: Target Chattisgarh and MP (revenue generation is much low).Invest more on Chemists and Other Retailers over Silver and Gold Town Classes
Some More Graphical Insights
As Fridge Volume Increases, OTHERS Retailers drive more revenue - FRIDGE UPGRADATION TARGET
interact_plot(fit5_new, pred = "Fridge.Volume", modx = "RetailerClass",
main.title= "Interaction of Fridge Volume and Retailer Class")
Some More Graphical Insights
interact_plot(fit5_new, pred = "Fridge.Volume", modx = "TownClass",
main.title= "Interaction of Fridge Volume and Town Class")
Some More Graphical Insights
interact_plot(fit5_new, pred = "Fridge.Volume", modx = "State",
main.title= "Interaction of Fridge Volume and State")
A company has 2 Crores to invest in Fridges. It needs to decide how to invest the 2 Crores wisely, so that it can achieve the following goals:
1) Increase the sales of its chocolates as presence of Fride improves storage condition
2) Serve the society better as presence of a fridge in a grocery can enable retailer in storing other necessary items like -dairy product
How does the manager go about in deciding which State / Retailers / Towns to target?
EXCEL FILE FOR DETAILED EXPLANATION
This model has 37.91% adjusted R-squared which is an improvement - We have also taken interaction with Has.Fridge instead of Fridge.Vol.
fulldata.or$Has.Fridge*fulldata.or$RetailerClass - w/0 47.29
fulldata.or$Has.Fridge*fulldata.or$State - w/o 47.29
fulldata.or$Has.Fridge*fulldata.or$TownClass - w/o all interaction ~28%
all FridgeVol interactions ~30%
fulldata.or$Fridge.Volume:fulldata.or$State w/o - 29.2
fulldata.or$Fridge.Volume*fulldata.or$RetailerClass w/o - 28.2
Conclusion: Last model “Best fit model” with adjusted R-squared 47.29%