R Markdown

data.df <- read.csv("PromotionData.csv")

First linear regression model, with COD depending only on CODCharge (glm with its default gaussian family gives the same fit as lm)

Model1 <- COD ~ CODCharge
fit1 <- glm(Model1, data = data.df)
summary(fit1)
## 
## Call:
## glm(formula = Model1, data = data.df)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -0.463  -0.463   0.000   0.537   0.537  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.630e-01  2.328e-03   198.9   <2e-16 ***
## CODCharge   1.096e-02  8.982e-05   122.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1791)
## 
##     Null deviance: 10886  on 45897  degrees of freedom
## Residual deviance:  8220  on 45896  degrees of freedom
## AIC: 51321
## 
## Number of Fisher Scoring iterations: 2
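
Calling glm without a family argument fits a gaussian model with an identity link, which is ordinary least squares. The following is a minimal sketch on made-up data (the data frame `toy` and its columns `x` and `y` are invented for illustration and are not taken from PromotionData.csv), suggesting that such a fit should agree with lm.

```r
# Toy data, invented for illustration only (not from PromotionData.csv).
toy <- data.frame(
  x = 1:12,
  y = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1)
)

fit_glm <- glm(y ~ x, data = toy)  # family defaults to gaussian()
fit_lm  <- lm(y ~ x, data = toy)

# Both calls should give the same coefficient estimates.
same_coef <- isTRUE(all.equal(coef(fit_glm), coef(fit_lm)))

# The glm "dispersion parameter" is the squared residual standard error of lm.
same_disp <- isTRUE(all.equal(summary(fit_glm)$dispersion,
                              summary(fit_lm)$sigma^2))

print(c(same_coef = same_coef, same_disp = same_disp))
```

Read this way, the dispersion of 0.1791 reported for fit1 corresponds to a residual standard error of roughly sqrt(0.1791), about 0.423.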
Model2 <- COD ~ FinalTotalPrice
fit2 <- lm(Model2, data = data.df)
summary(fit2)
## 
## Call:
## lm(formula = Model2, data = data.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1427 -0.5959  0.3610  0.3998  0.4648 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     5.352e-01  5.472e-03   97.80   <2e-16 ***
## FinalTotalPrice 1.033e-04  6.598e-06   15.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4857 on 45896 degrees of freedom
## Multiple R-squared:  0.005317,   Adjusted R-squared:  0.005295 
## F-statistic: 245.3 on 1 and 45896 DF,  p-value: < 2.2e-16
Model3 <- COD ~ MRP + FinalTotalPrice + CODCharge
fit3 <- lm(Model3, data = data.df)
summary(fit3)
## 
## Call:
## lm(formula = Model3, data = data.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4849 -0.4595 -0.0062  0.5235  0.7786 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      4.922e-01  4.972e-03  98.991  < 2e-16 ***
## MRP             -4.220e-05  7.384e-06  -5.716  1.1e-08 ***
## FinalTotalPrice  1.701e-05  1.009e-05   1.685    0.092 .  
## CODCharge        1.102e-02  9.155e-05 120.351  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4229 on 45894 degrees of freedom
## Multiple R-squared:  0.2459, Adjusted R-squared:  0.2459 
## F-statistic:  4989 on 3 and 45894 DF,  p-value: < 2.2e-16
Model4 <- COD ~ MRP + FinalTotalPrice + CODCharge + VendorDiscount + WebsiteDiscount

fit4 <- lm(Model4, data = data.df)
summary(fit4)
## 
## Call:
## lm(formula = Model4, data = data.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6051 -0.4547 -0.0083  0.5200  0.9059 
## 
## Coefficients: (1 not defined because of singularities)
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      4.943e-01  4.972e-03  99.424  < 2e-16 ***
## MRP              5.787e-05  1.271e-05   4.555 5.26e-06 ***
## FinalTotalPrice -8.768e-05  1.479e-05  -5.928 3.10e-09 ***
## CODCharge        1.124e-02  9.421e-05 119.272  < 2e-16 ***
## VendorDiscount  -1.229e-04  1.271e-05  -9.674  < 2e-16 ***
## WebsiteDiscount         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4225 on 45893 degrees of freedom
## Multiple R-squared:  0.2474, Adjusted R-squared:  0.2474 
## F-statistic:  3772 on 4 and 45893 DF,  p-value: < 2.2e-16
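
The note "(1 not defined because of singularities)" and the NA row for WebsiteDiscount mean that this column is an exact linear combination of the other predictors, so lm cannot estimate it separately. Below is a sketch of how this can arise, on simulated data. The identity final_price = mrp - vendor_disc - website_disc is an assumption made for illustration; the output above does not confirm which relation holds in PromotionData.csv. All names in the block are hypothetical.

```r
set.seed(1)
n <- 200

# Simulated columns; the names and the pricing identity are assumptions.
sim <- data.frame(
  mrp          = runif(n, 200, 2000),
  vendor_disc  = runif(n, 0, 100),
  website_disc = runif(n, 0, 100)
)
sim$final_price <- sim$mrp - sim$vendor_disc - sim$website_disc
sim$cod <- rbinom(n, 1, 0.5)

fit_sim <- lm(cod ~ mrp + final_price + vendor_disc + website_disc, data = sim)

# The last collinear term gets an NA coefficient, as WebsiteDiscount does in fit4.
print(coef(fit_sim))

# alias() lists which terms are linear combinations of the others.
print(alias(fit_sim)$Complete)
```

Running alias(fit4) on the real data would show which combination applies there.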

Logistic Regression

Since the response variable COD is a yes/no variable, a plain linear regression is not appropriate. Instead, we fit a logistic regression using glm with the binomial family and a logit link.
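
The point about a yes/no response can be illustrated on a small made-up data set (the data frame `toy_bin` below is invented and unrelated to PromotionData.csv). In this sketch a straight-line fit predicts values below 0 and above 1, while a logit fit keeps its predictions inside that range.

```r
# Invented data: 10 observations at each x, with a rising share of 1s.
x_levels  <- -4:4
successes <- c(0, 0, 0, 2, 5, 8, 10, 10, 10)

toy_bin <- data.frame(
  x = rep(x_levels, each = 10),
  y = unlist(lapply(successes, function(s) rep(c(1, 0), times = c(s, 10 - s))))
)

fit_linear <- lm(y ~ x, data = toy_bin)
fit_logit  <- glm(y ~ x, family = binomial(link = "logit"), data = toy_bin)

# Linear fitted values are not constrained to the [0, 1] interval.
print(range(fitted(fit_linear)))

# Logistic fitted values are probabilities strictly between 0 and 1.
print(range(fitted(fit_logit)))
```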

fit5 <- glm(Model4, family = binomial(link='logit'), data = data.df)
summary(fit5)
## 
## Call:
## glm(formula = Model4, family = binomial(link = "logit"), data = data.df)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.42914  -1.09575   0.00013   1.19615   2.31390  
## 
## Coefficients: (1 not defined because of singularities)
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      6.993e-02  2.934e-02   2.383   0.0172 *  
## MRP              2.805e-04  6.837e-05   4.103 4.08e-05 ***
## FinalTotalPrice -4.846e-04  8.121e-05  -5.967 2.41e-09 ***
## CODCharge        3.839e-01  1.171e+00   0.328   0.7430    
## VendorDiscount  -7.159e-04  7.068e-05 -10.129  < 2e-16 ***
## WebsiteDiscount         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 61256  on 45897  degrees of freedom
## Residual deviance: 45481  on 45893  degrees of freedom
## AIC: 45491
## 
## Number of Fisher Scoring iterations: 17
plot(fit5)

Logistic Regression Result Interpretation

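The deviances are large in absolute terms: a null deviance of 61256 on 45897 degrees of freedom and a residual deviance of 45481 on 45893 degrees of freedom. With about 45,900 observations, however, the absolute size says little by itself. What matters is the drop from null to residual deviance, 15775 on 4 degrees of freedom, which indicates that the predictors jointly fit the data much better than the intercept-only model. Note also that CODCharge, the strongest term in the linear fits, has a very large standard error here (1.171) and that the fit needed 17 Fisher scoring iterations. This may point to (quasi-)complete separation on that variable, so its coefficient and p value should be read with caution.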
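
One common way to read the two deviances quoted above is to compare them with a likelihood-ratio test. The sketch below only redoes the arithmetic on the numbers printed by summary(fit5); the variable names are illustrative.

```r
# Deviances and degrees of freedom copied from summary(fit5).
null_dev  <- 61256; null_df  <- 45897
resid_dev <- 45481; resid_df <- 45893

dev_drop <- null_dev - resid_dev   # 15775
df_drop  <- null_df - resid_df     # 4

# Likelihood-ratio test of the fitted model against the intercept-only model.
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

# McFadden's pseudo R-squared, assuming 0/1 responses so deviance = -2 * logLik.
mcfadden <- 1 - resid_dev / null_dev   # about 0.26

print(c(dev_drop = dev_drop, df_drop = df_drop,
        p_value = p_value, mcfadden = mcfadden))
```

On the fitted object itself, anova(fit5, test = "Chisq") would give the corresponding sequential tests term by term.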

Anova

fit6 <- aov(Model4, data = data.df)
summary(fit6)
##                    Df Sum Sq Mean Sq  F value   Pr(>F)    
## MRP                 1     10     9.9    55.73 8.45e-14 ***
## FinalTotalPrice     1     76    76.2   426.87  < 2e-16 ***
## CODCharge           1   2591  2590.9 14513.54  < 2e-16 ***
## VendorDiscount      1     17    16.7    93.58  < 2e-16 ***
## Residuals       45893   8193     0.2                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(fit6)

Anova Result Interpretation

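The F value of CODCharge (14513.54) is by far the largest, with a p value far below 0.05 (< 2e-16). CODCharge therefore appears to have the strongest association with whether an order is a COD order. The F test itself carries no sign; the CODCharge coefficient in the linear fits above is positive. FinalTotalPrice has the next largest F value (426.87). Because aov uses sequential sums of squares, these values depend on the order of the terms in Model4.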
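
To put the F values in perspective, each term's sum of squares can be expressed as a share of the total. This is a sketch using the rounded figures printed by summary(fit6), so the shares are approximate.

```r
# Sums of squares copied (rounded) from summary(fit6).
ss <- c(MRP = 10, FinalTotalPrice = 76, CODCharge = 2591,
        VendorDiscount = 17, Residuals = 8193)

# Share of the total sum of squares attributed to each row.
share <- ss / sum(ss)

print(round(share, 3))
```

CODCharge accounts for about 24 percent of the total, and the four model terms together for about 25 percent, in line with the Multiple R-squared of 0.2474 reported for fit4.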

fit7 <- aov(Model1, data = data.df)
summary(fit7)
##                Df Sum Sq Mean Sq F value Pr(>F)    
## CODCharge       1   2667  2666.5   14888 <2e-16 ***
## Residuals   45896   8220     0.2                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(fit7)