data.df <- read.csv("PromotionData.csv")
Model1 <- COD ~ CODCharge
fit1 <- glm(Model1, data = data.df)
summary(fit1)
##
## Call:
## glm(formula = Model1, data = data.df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.463 -0.463 0.000 0.537 0.537
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.630e-01 2.328e-03 198.9 <2e-16 ***
## CODCharge 1.096e-02 8.982e-05 122.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1791)
##
## Null deviance: 10886 on 45897 degrees of freedom
## Residual deviance: 8220 on 45896 degrees of freedom
## AIC: 51321
##
## Number of Fisher Scoring iterations: 2
Model2 <- COD ~ FinalTotalPrice
fit2 <- lm(Model2, data = data.df)
summary(fit2)
##
## Call:
## lm(formula = Model2, data = data.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1427 -0.5959 0.3610 0.3998 0.4648
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.352e-01 5.472e-03 97.80 <2e-16 ***
## FinalTotalPrice 1.033e-04 6.598e-06 15.66 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4857 on 45896 degrees of freedom
## Multiple R-squared: 0.005317, Adjusted R-squared: 0.005295
## F-statistic: 245.3 on 1 and 45896 DF, p-value: < 2.2e-16
Model3 <- COD ~ MRP + FinalTotalPrice + CODCharge
fit3 <- lm(Model3, data = data.df)
summary(fit3)
##
## Call:
## lm(formula = Model3, data = data.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4849 -0.4595 -0.0062 0.5235 0.7786
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.922e-01 4.972e-03 98.991 < 2e-16 ***
## MRP -4.220e-05 7.384e-06 -5.716 1.1e-08 ***
## FinalTotalPrice 1.701e-05 1.009e-05 1.685 0.092 .
## CODCharge 1.102e-02 9.155e-05 120.351 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4229 on 45894 degrees of freedom
## Multiple R-squared: 0.2459, Adjusted R-squared: 0.2459
## F-statistic: 4989 on 3 and 45894 DF, p-value: < 2.2e-16
Model4 <- COD ~ MRP + FinalTotalPrice + CODCharge + VendorDiscount + WebsiteDiscount
fit4 <- lm(Model4, data = data.df)
summary(fit4)
##
## Call:
## lm(formula = Model4, data = data.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.6051 -0.4547 -0.0083 0.5200 0.9059
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.943e-01 4.972e-03 99.424 < 2e-16 ***
## MRP 5.787e-05 1.271e-05 4.555 5.26e-06 ***
## FinalTotalPrice -8.768e-05 1.479e-05 -5.928 3.10e-09 ***
## CODCharge 1.124e-02 9.421e-05 119.272 < 2e-16 ***
## VendorDiscount -1.229e-04 1.271e-05 -9.674 < 2e-16 ***
## WebsiteDiscount NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4225 on 45893 degrees of freedom
## Multiple R-squared: 0.2474, Adjusted R-squared: 0.2474
## F-statistic: 3772 on 4 and 45893 DF, p-value: < 2.2e-16
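The `NA` row for WebsiteDiscount and the note "1 not defined because of singularities" mean that lm() found WebsiteDiscount to be an exact linear combination of the other predictors. The sketch below reproduces that behaviour on simulated data. The identity used (final price equals MRP minus both discounts) is only an assumption for illustration; the actual dependency in PromotionData.csv is not stated in this document.

```r
# Simulated data only: column names mirror the document, but the values and
# the exact dependency among them are assumptions made for illustration.
set.seed(1)
n <- 200
MRP             <- runif(n, 100, 1000)
VendorDiscount  <- runif(n, 0, 50)
WebsiteDiscount <- runif(n, 0, 50)
# Assumed identity: final price = MRP minus both discounts
FinalTotalPrice <- MRP - VendorDiscount - WebsiteDiscount
y <- rbinom(n, 1, 0.5)

toy <- data.frame(y, MRP, FinalTotalPrice, VendorDiscount, WebsiteDiscount)
toy_fit <- lm(y ~ MRP + FinalTotalPrice + VendorDiscount + WebsiteDiscount,
              data = toy)

coef(toy_fit)   # the last term is NA: it is implied by the other three
alias(toy_fit)  # reports the linear dependency that lm() detected
```

On the real data, `alias(fit4)` should show which combination of columns makes WebsiteDiscount redundant.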
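Because the response variable COD is binary (yes/no), ordinary linear regression is not appropriate: it can predict values outside the 0 to 1 range, and it assumes normally distributed errors with constant variance. We therefore fit a logistic regression using glm with the binomial family and a logit link.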
fit5 <- glm(Model4, family = binomial(link='logit'), data = data.df)
summary(fit5)
##
## Call:
## glm(formula = Model4, family = binomial(link = "logit"), data = data.df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.42914 -1.09575 0.00013 1.19615 2.31390
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.993e-02 2.934e-02 2.383 0.0172 *
## MRP 2.805e-04 6.837e-05 4.103 4.08e-05 ***
## FinalTotalPrice -4.846e-04 8.121e-05 -5.967 2.41e-09 ***
## CODCharge 3.839e-01 1.171e+00 0.328 0.7430
## VendorDiscount -7.159e-04 7.068e-05 -10.129 < 2e-16 ***
## WebsiteDiscount NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 61256 on 45897 degrees of freedom
## Residual deviance: 45481 on 45893 degrees of freedom
## AIC: 45491
##
## Number of Fisher Scoring iterations: 17
plot(fit5)
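To illustrate why a logit model is preferred over a linear fit for a 0/1 response, here is a minimal sketch on simulated data. The predictor `x`, the coefficients used to generate the data, and all object names are hypothetical and are not taken from PromotionData.csv.

```r
# Simulated binary response; every number here is an assumption
set.seed(42)
n <- 500
x <- runif(n, 0, 100)             # hypothetical predictor
p <- plogis(-4 + 0.08 * x)        # true success probability used for simulation
y <- rbinom(n, 1, p)              # 0/1 outcome
sim <- data.frame(x, y)

lin_fit   <- lm(y ~ x, data = sim)                               # linear fit
logit_fit <- glm(y ~ x, family = binomial(link = "logit"), data = sim)

# Predict at points inside and outside the observed range of x
new_x      <- data.frame(x = c(-50, 0, 50, 100, 150))
lin_pred   <- predict(lin_fit, newdata = new_x)
logit_pred <- predict(logit_fit, newdata = new_x, type = "response")

cbind(new_x, lin_pred, logit_pred)
```

The linear predictions fall below 0 and above 1 at the extreme values of `x`, while the logistic predictions always stay strictly between 0 and 1.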
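The null deviance is 61256 on 45897 degrees of freedom and the residual deviance is 45481 on 45893 degrees of freedom. Both values are large in absolute terms mainly because the data set has about 45,900 observations; what matters is the difference between them. The four estimable predictors reduce the deviance by 15775, so the model fits considerably better than the intercept-only model, rather than showing no relationship. Note, however, that CODCharge, the strongest predictor in the linear fits, has a very large standard error here (1.171), and the fit needed 17 Fisher scoring iterations. Together these may indicate (quasi-)complete separation on that variable.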
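One way to read the two deviance figures is to compare them directly. The sketch below uses only the numbers printed by summary(fit5); the variable names are introduced here for illustration.

```r
# Values copied from the summary(fit5) output
null_dev  <- 61256; null_df  <- 45897
resid_dev <- 45481; resid_df <- 45893

dev_drop <- null_dev - resid_dev   # deviance removed by the predictors
df_used  <- null_df - resid_df     # number of estimated slope coefficients

# Likelihood-ratio (chi-squared) test against the intercept-only model
p_value <- pchisq(dev_drop, df = df_used, lower.tail = FALSE)

# McFadden's pseudo R-squared: share of the null deviance removed
# (for ungrouped 0/1 data the deviance equals -2 * log-likelihood)
pseudo_r2 <- 1 - resid_dev / null_dev

c(dev_drop = dev_drop, df_used = df_used,
  p_value = p_value, pseudo_r2 = pseudo_r2)
```

With the fitted object available, `anova(fit5, test = "Chisq")` gives a term-by-term version of the same comparison.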
fit6 <- aov(Model4, data = data.df)
summary(fit6)
## Df Sum Sq Mean Sq F value Pr(>F)
## MRP 1 10 9.9 55.73 8.45e-14 ***
## FinalTotalPrice 1 76 76.2 426.87 < 2e-16 ***
## CODCharge 1 2591 2590.9 14513.54 < 2e-16 ***
## VendorDiscount 1 17 16.7 93.58 < 2e-16 ***
## Residuals 45893 8193 0.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(fit6)
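The F value for CODCharge (14513.54) is by far the largest in the table, with p < 2e-16. CODCharge therefore appears to be the factor most strongly associated with whether an order is a COD order. The F test itself does not indicate direction; the linear fits above estimate a positive coefficient for CODCharge. FinalTotalPrice has the next-largest F value (426.87). Because aov reports sequential sums of squares, each F value reflects a term's contribution after the terms listed before it, so the values depend on the order of the predictors in the formula.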
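When comparing F values across terms, it helps to remember that aov() uses sequential (Type I) sums of squares, so the order of terms in the formula affects each term's share. The following sketch on simulated, deliberately correlated predictors shows the effect; `x1`, `x2`, and the coefficients are hypothetical.

```r
# Simulated data with two correlated predictors (all values are assumptions)
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # correlated with x1 on purpose
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)
sim <- data.frame(y, x1, x2)

# Same model, terms entered in a different order
ss_x1_first <- summary(aov(y ~ x1 + x2, data = sim))[[1]][["Sum Sq"]]
ss_x2_first <- summary(aov(y ~ x2 + x1, data = sim))[[1]][["Sum Sq"]]

ss_x1_first  # order: x1, x2, Residuals
ss_x2_first  # order: x2, x1, Residuals
```

The residual sum of squares is identical in both fits, but the amount credited to `x1` changes depending on whether it enters first or second. For an order-independent view, `drop1(fit, test = "F")` on an lm fit tests each term given all the others.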
fit7 <- aov(Model1, data = data.df)
summary(fit7)
## Df Sum Sq Mean Sq F value Pr(>F)
## CODCharge 1 2667 2666.5 14888 <2e-16 ***
## Residuals 45896 8220 0.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(fit7)