Analysis of Mondelez Sales Volume

Summarizing the data

# Load the Mondelez data
mondelez.df <- read.csv("Mondelez.csv")
# Store City and Type as integer codes
mondelez.df$City <- as.integer(mondelez.df$City)
mondelez.df$Type <- as.integer(mondelez.df$Type)
library(psych)  # provides describe()
describe(mondelez.df)
##          vars    n  mean    sd median trimmed   mad min    max  range
## Company*    1 2388  1.00  0.00   1.00    1.00  0.00   1   1.00   0.00
## Type        2 2388  3.44  1.69   3.00    3.43  1.48   1   6.00   5.00
## City        3 2388 18.02 10.29  18.00   18.03 13.34   1  35.00  34.00
## Month       4 2388  5.50  3.45   5.50    5.50  4.45   0  11.00  11.00
## Dpm         5 2041 44.92 21.62  48.99   46.06 21.35   0  90.30  90.30
## Dwm         6 2039 80.91 21.49  89.43   85.40 10.01   0  99.43  99.43
## Sm          7 2037 14.74 16.52   7.32   11.68  5.47   0  78.60  78.60
## Om          8 2041  1.27  2.16   0.65    0.89  0.49   0  27.99  27.99
## PPUm        9 2041 31.33 45.92   9.05   21.72  5.16   2 647.11 645.11
## Volume     10 2039 11.83 18.26   5.06    7.61  5.78   0 124.35 124.35
##           skew kurtosis   se
## Company*   NaN      NaN 0.00
## Type      0.03    -1.23 0.03
## City      0.00    -1.25 0.21
## Month     0.00    -1.22 0.07
## Dpm      -0.45    -0.74 0.48
## Dwm      -1.94     3.46 0.48
## Sm        1.54     0.82 0.37
## Om        7.72    77.50 0.05
## PPUm      2.83    18.22 1.02
## Volume    3.13    11.42 0.40
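
City and Type are identifiers rather than measurements, so converting them with as.integer() gives level codes whose ordering carries no meaning. The sketch below shows the difference on a toy data frame. The city names and volumes are made up for illustration and are not taken from Mondelez.csv. Note also that from R 4.0.0 onward read.csv() returns character columns by default, so as.integer() on such a column would give NA rather than codes.

```r
# Toy data frame; the column names mirror the ones above, the values are hypothetical.
toy <- data.frame(
  City   = c("Pune", "Delhi", "Agra", "Delhi"),
  Volume = c(5.1, 12.3, 2.4, 9.8),
  stringsAsFactors = TRUE
)

# as.integer() on a factor returns the alphabetical level codes, not a measurement.
city_codes <- as.integer(toy$City)
city_codes

# Keeping City as a factor makes lm() estimate one dummy coefficient per
# non-reference level instead of a single slope over arbitrary codes.
toy_fit <- lm(Volume ~ City, data = toy)
names(coef(toy_fit))
```

With integer codes the model fits a single slope for City, which is how the City and Type rows in the regression output further down should be read.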

Does the data show an association between sales volume and the other variables?

# Volume is the response, so it must not also appear on the right-hand side
mondelez_model <- Volume ~ Type+City+Month+Dpm+Dwm+Sm+Om+PPUm
mondelez_fit <- lm(mondelez_model, data = mondelez.df)
summary(mondelez_fit)
## 
## Call:
## lm(formula = mondelez_model, data = mondelez.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.581  -7.223  -2.192   2.028  93.685 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.34629    2.25640   0.597 0.550804    
## Type        -0.62219    0.31175  -1.996 0.046091 *  
## City        -0.10391    0.03383  -3.071 0.002161 ** 
## Month       -0.04282    0.09983  -0.429 0.668039    
## Dpm          0.42645    0.03387  12.591  < 2e-16 ***
## Dwm         -0.11771    0.03235  -3.638 0.000281 ***
## Sm           0.21751    0.02718   8.003 2.03e-15 ***
## Om           1.19997    0.17591   6.822 1.18e-11 ***
## PPUm         0.01605    0.01241   1.293 0.196162    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.52 on 2028 degrees of freedom
##   (351 observations deleted due to missingness)
## Multiple R-squared:  0.2811, Adjusted R-squared:  0.2783 
## F-statistic: 99.12 on 8 and 2028 DF,  p-value: < 2.2e-16

Since the p-value of the overall F-test (< 2.2e-16) is below 0.05, we reject the null hypothesis that sales volume is not associated with any of the other variables. The data provide evidence of an association between sales volume and at least one of the predictors.
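
The overall p-value can be recomputed from the F-statistic and its degrees of freedom. The following is a minimal sketch that uses only the numbers printed in the summary output; the variable names are chosen here for illustration.

```r
# Values copied from the summary() output above.
f_stat   <- 99.12
df_model <- 8
df_resid <- 2028

# Upper-tail probability of the F distribution = p-value of the overall test.
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)
p_value < 0.05

# 5% critical value for comparison: F must exceed this to reject the null.
f_crit <- qf(0.95, df_model, df_resid)
f_crit
```

On a fitted model the same three numbers are available as summary(mondelez_fit)$fstatistic.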

How strong is the relationship between sales volume and the other variables?

The adjusted R-squared is 0.2783, so the model explains only about 28% of the variance in sales volume. The relationship is statistically significant but not very strong.
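
The adjusted R-squared can be reproduced by hand from the multiple R-squared, the number of rows used and the number of predictors. This is a small worked check using figures from the output; the variable names are illustrative.

```r
r2 <- 0.2811      # Multiple R-squared from the output
n  <- 2388 - 351  # rows actually used: total rows minus rows dropped for missingness
p  <- 8           # number of predictors (excluding the intercept)

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

Because n is large relative to p, the adjustment is small and both measures tell the same story.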

Which variables contribute to volume?

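At the 5% level, Type, City, Dpm, Dwm, Sm and Om have a significant effect on Mondelez sales volume (Type only marginally, p = 0.046). Month (p = 0.668) and PPUm (p = 0.196) are not significant, so we cannot conclude that they influence volume.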
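
Each p-value in the coefficient table follows from the estimate and its standard error. As a sketch, the borderline case of Type can be checked with the numbers printed in the output:

```r
# Estimate and standard error for Type, copied from the coefficient table.
est      <- -0.62219
se       <- 0.31175
df_resid <- 2028

t_value <- est / se
p_value <- 2 * pt(-abs(t_value), df_resid)  # two-sided p-value

round(t_value, 3)
p_value < 0.05
```

The t value sits just beyond the 5% cutoff, which is why Type earns only a single star in the output.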

What are the values of Beta for different variables?

Volume = B0 + B1Type + B2City + B3Month + B4Dpm + B5Dwm + B6Sm + B7Om + B8PPUm

Therefore, Volume = 1.346 - 0.622Type - 0.104City - 0.043Month + 0.426Dpm - 0.118Dwm + 0.218Sm + 1.200Om + 0.016PPUm
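
To see how the equation is used, the sketch below evaluates it for one illustrative observation. The input values are the medians reported by describe() and are chosen purely as an example; Type and City are integer codes here, as in the fit.

```r
# Coefficients copied from the summary() output (precision as printed).
beta <- c("(Intercept)" = 1.34629, Type = -0.62219, City = -0.10391,
          Month = -0.04282, Dpm = 0.42645, Dwm = -0.11771,
          Sm = 0.21751, Om = 1.19997, PPUm = 0.01605)

# Illustrative input: the medians from describe(); the intercept term is 1.
x <- c("(Intercept)" = 1, Type = 3, City = 18, Month = 5.5, Dpm = 48.99,
       Dwm = 89.43, Sm = 7.32, Om = 0.65, PPUm = 9.05)

# Predicted Volume is the sum of coefficient * value over all terms.
predicted_volume <- sum(beta * x[names(beta)])
predicted_volume
```

With the fitted object available, predict(mondelez_fit, newdata = ...) performs the same calculation on a data frame of new rows.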

Analysis of Nestle Sales Volume

Summarizing the data

# Load the Nestle data
nestle.df <- read.csv("Nestle.csv")
# Store City and Type as integer codes
nestle.df$City <- as.integer(nestle.df$City)
nestle.df$Type <- as.integer(nestle.df$Type)
library(psych)  # provides describe()
describe(nestle.df)
##          vars    n  mean    sd median trimmed   mad  min    max  range
## Company*    1 1332  1.00  0.00   1.00    1.00  0.00 1.00   1.00   0.00
## Type        2 1332  4.98  2.55   5.00    4.93  2.97 1.00  10.00   9.00
## City        3 1332 12.05  7.32  12.00   11.88  8.90 1.00  26.00  25.00
## Month       4 1332  5.50  3.45   5.50    5.50  4.45 0.00  11.00  11.00
## Dpn         5 1018 17.72 17.56  10.91   15.06 11.19 0.01  73.35  73.34
## Dwn         6 1018 45.18 25.13  43.52   45.10 30.45 0.01  91.85  91.84
## Sn          7 1018  4.64  5.29   2.47    3.82  2.28 0.00  42.78  42.78
## On          8 1018  0.93  3.99   0.43    0.47  0.28 0.00  42.78  42.78
## PPUn        9 1016 27.38 35.06  10.77   19.38  4.75 5.30 200.00 194.70
## Volume     10 1018  4.86 11.08   0.72    1.90  0.96 0.00  71.04  71.04
##          skew kurtosis   se
## Company*  NaN      NaN 0.00
## Type     0.11    -0.95 0.07
## City     0.15    -1.21 0.20
## Month    0.00    -1.22 0.09
## Dpn      1.16     0.20 0.55
## Dwn      0.09    -1.12 0.79
## Sn       3.12    15.42 0.17
## On       9.03    80.71 0.12
## PPUn     1.89     2.55 1.10
## Volume   3.44    11.93 0.35

Does the data show an association between sales volume and the other variables?

# Volume is the response, so it must not also appear on the right-hand side
nestle_model <- Volume ~ Type+City+Month+Dpn+Dwn+Sn+On+PPUn
nestlez_fit <- lm(nestle_model, data = nestle.df)
summary(nestlez_fit)
## 
## Call:
## lm(formula = nestle_model, data = nestle.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.364  -2.015  -0.529   1.045  52.722 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.764155   1.468633   1.201  0.22995    
## Type        -0.511106   0.197368  -2.590  0.00975 ** 
## City        -0.046987   0.036894  -1.274  0.20310    
## Month       -0.051543   0.076498  -0.674  0.50060    
## Dpn          0.481885   0.044755  10.767  < 2e-16 ***
## Dwn         -0.001788   0.023640  -0.076  0.93972    
## Sn          -0.710762   0.142640  -4.983 7.37e-07 ***
## On           1.307644   0.133940   9.763  < 2e-16 ***
## PPUn        -0.001491   0.013159  -0.113  0.90981    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.358 on 1007 degrees of freedom
##   (316 observations deleted due to missingness)
## Multiple R-squared:  0.4359, Adjusted R-squared:  0.4314 
## F-statistic: 97.28 on 8 and 1007 DF,  p-value: < 2.2e-16

Since the p-value of the overall F-test (< 2.2e-16) is below 0.05, we reject the null hypothesis that sales volume is not associated with any of the other variables. The data provide evidence of an association between sales volume and at least one of the predictors.

How strong is the relationship between sales volume and the other variables?

The adjusted R-squared is 0.4314, so the model explains about 43% of the variance in sales volume. The relationship is moderate rather than strong.

Which variables contribute to volume?

At the 5% level, Type, Dpn, Sn and On have a significant effect on Nestle sales volume. City (p = 0.203), Month (p = 0.501), Dwn (p = 0.940) and PPUn (p = 0.910) are not significant, so we cannot conclude that they influence volume.
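
Another way to read the same table is through 95% confidence intervals: a coefficient is significant at the 5% level exactly when its interval excludes zero. The sketch below builds the intervals for four of the Nestle coefficients from the printed estimates and standard errors; the variable names are illustrative.

```r
# Estimates and standard errors copied from the Nestle coefficient table.
est <- c(Dpn = 0.481885, Dwn = -0.001788, Sn = -0.710762, PPUn = -0.001491)
se  <- c(Dpn = 0.044755, Dwn = 0.023640, Sn = 0.142640, PPUn = 0.013159)
df_resid <- 1007

# 95% interval: estimate +/- t quantile * standard error.
t_crit <- qt(0.975, df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 3)

# TRUE where the interval excludes zero, i.e. significant at the 5% level.
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
excludes_zero
```

On the fitted object, confint(nestlez_fit) returns these intervals for every coefficient.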

What are the values of Beta for different variables?

Volume = B0 + B1Type + B2City + B3Month + B4Dpn + B5Dwn + B6Sn + B7On + B8PPUn

Therefore, Volume = 1.764 - 0.511Type - 0.047City - 0.052Month + 0.482Dpn - 0.002Dwn - 0.711Sn + 1.308On - 0.001PPUn
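
The coefficients above are in the original units of each variable, so their sizes are not directly comparable. If "Beta" is read as the standardized coefficient, a rough value can be obtained by rescaling each slope with the standard deviations from describe(). This is only an approximation, since describe() computes each standard deviation on all available rows while lm() dropped the rows with any missing value.

```r
# Unstandardized slopes from the Nestle fit and standard deviations from describe().
b    <- c(Dpn = 0.481885, Sn = -0.710762, On = 1.307644)
sd_x <- c(Dpn = 17.56, Sn = 5.29, On = 3.99)
sd_y <- 11.08  # sd of Volume

# Standardized beta: change in Volume (in SDs) per one-SD change in the predictor.
std_beta <- b * sd_x / sd_y
round(std_beta, 2)
```

On this scale Dpn has the largest effect of the three, even though On has the largest raw coefficient.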