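mondelez.df <- read.csv("Mondelez.csv")
# Convert the categorical columns to integer codes, assuming City and Type are text labels in the CSV
# (as.factor keeps this working when read.csv returns character columns instead of factors)
mondelez.df$City <- as.integer(as.factor(mondelez.df$City))
mondelez.df$Type <- as.integer(as.factor(mondelez.df$Type))
library(psych)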
describe(mondelez.df)
## vars n mean sd median trimmed mad min max range
## Company* 1 2388 1.00 0.00 1.00 1.00 0.00 1 1.00 0.00
## Type 2 2388 3.44 1.69 3.00 3.43 1.48 1 6.00 5.00
## City 3 2388 18.02 10.29 18.00 18.03 13.34 1 35.00 34.00
## Month 4 2388 5.50 3.45 5.50 5.50 4.45 0 11.00 11.00
## Dpm 5 2041 44.92 21.62 48.99 46.06 21.35 0 90.30 90.30
## Dwm 6 2039 80.91 21.49 89.43 85.40 10.01 0 99.43 99.43
## Sm 7 2037 14.74 16.52 7.32 11.68 5.47 0 78.60 78.60
## Om 8 2041 1.27 2.16 0.65 0.89 0.49 0 27.99 27.99
## PPUm 9 2041 31.33 45.92 9.05 21.72 5.16 2 647.11 645.11
## Volume 10 2039 11.83 18.26 5.06 7.61 5.78 0 124.35 124.35
## skew kurtosis se
## Company* NaN NaN 0.00
## Type 0.03 -1.23 0.03
## City 0.00 -1.25 0.21
## Month 0.00 -1.22 0.07
## Dpm -0.45 -0.74 0.48
## Dwm -1.94 3.46 0.48
## Sm 1.54 0.82 0.37
## Om 7.72 77.50 0.05
## PPUm 2.83 18.22 1.02
## Volume 3.13 11.42 0.40
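The `as.integer()` conversions at the top replace each City and Type label with a numeric code, which is why both columns appear as plain numbers in the `describe()` table. A minimal sketch of what that conversion does, using made-up city labels that are not taken from the data set:

```r
# Hypothetical labels, for illustration only (not from Mondelez.csv)
city <- factor(c("Pune", "Delhi", "Pune", "Agra"))

# Factor levels are sorted alphabetically: Agra = 1, Delhi = 2, Pune = 3
levels(city)

# as.integer() returns the level code of each entry, not a meaningful quantity
city_code <- as.integer(city)
city_code
```

Because the codes are arbitrary, a regression slope on such a column describes a trend across the code ordering rather than an effect of any particular city or type.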
# Volume is the response, so it must not also appear on the right-hand side
mondelez_model <- Volume ~ Type + City + Month + Dpm + Dwm + Sm + Om + PPUm
mondelez_fit <- lm(mondelez_model, data = mondelez.df)
summary(mondelez_fit)
##
## Call:
## lm(formula = mondelez_model, data = mondelez.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.581 -7.223 -2.192 2.028 93.685
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.34629 2.25640 0.597 0.550804
## Type -0.62219 0.31175 -1.996 0.046091 *
## City -0.10391 0.03383 -3.071 0.002161 **
## Month -0.04282 0.09983 -0.429 0.668039
## Dpm 0.42645 0.03387 12.591 < 2e-16 ***
## Dwm -0.11771 0.03235 -3.638 0.000281 ***
## Sm 0.21751 0.02718 8.003 2.03e-15 ***
## Om 1.19997 0.17591 6.822 1.18e-11 ***
## PPUm 0.01605 0.01241 1.293 0.196162
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.52 on 2028 degrees of freedom
## (351 observations deleted due to missingness)
## Multiple R-squared: 0.2811, Adjusted R-squared: 0.2783
## F-statistic: 99.12 on 8 and 2028 DF, p-value: < 2.2e-16
Since the p-value of the overall F-test (< 2.2e-16) is below 0.05, we reject the null hypothesis that sales volume is not associated with any of the other variables, and conclude that the data provide evidence of an association between sales volume and at least one of them.
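The p-value of the overall F-test can be recomputed from the F-statistic and degrees of freedom printed in the summary. This is a small sketch that uses only the reported numbers (99.12 on 8 and 2028 DF); the variable names are chosen here for illustration.

```r
# Values copied from the summary output of the Mondelez model
f_stat <- 99.12
df1 <- 8      # number of predictors
df2 <- 2028   # residual degrees of freedom

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# TRUE means the null hypothesis is rejected at the 5% level
p_value < 0.05
```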
Since the adjusted R-squared is only 0.2783, the model explains roughly 28% of the variation in sales volume, so the relationship is not very strong.
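The adjusted R-squared follows from the multiple R-squared, the number of observations used and the number of predictors. The sketch below reproduces it from the reported figures, taking n = 2037 as the residual degrees of freedom (2028) plus the 9 estimated coefficients.

```r
# Values copied from the summary output of the Mondelez model
r_squared <- 0.2811
p <- 8            # predictors, excluding the intercept
n <- 2028 + p + 1 # observations actually used in the fit

# Standard adjustment for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
round(adj_r_squared, 4)
```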
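At the 5% level, Type, City, Dpm, Dwm, Sm and Om have a statistically significant association with the sales volume of Mondelez. Month (p = 0.668) and PPUm (p = 0.196) are not significant, so we cannot draw a conclusion about their influence.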
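The split into significant and non-significant predictors can be read off the p-value column mechanically. A minimal sketch with the p-values typed in from the coefficient table; for Dpm the table only reports "< 2e-16", so 2e-16 is used here as an upper bound, which is an assumption of this example.

```r
# p-values copied from the Mondelez coefficient table (intercept omitted)
p_values <- c(Type = 0.046091, City = 0.002161, Month = 0.668039,
              Dpm = 2e-16,  # reported as "< 2e-16"; upper bound used
              Dwm = 0.000281, Sm = 2.03e-15, Om = 1.18e-11, PPUm = 0.196162)

alpha <- 0.05
significant <- names(p_values)[p_values < alpha]
not_significant <- names(p_values)[p_values >= alpha]

significant
not_significant
```

With a fitted model object available, the same vector could be taken from `coef(summary(fit))[, "Pr(>|t|)"]` instead of being typed in.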
Volume = B0 + B1*Type + B2*City + B3*Month + B4*Dpm + B5*Dwm + B6*Sm + B7*Om + B8*PPUm
Therefore, Volume = 1.346 - 0.622*Type - 0.104*City - 0.043*Month + 0.426*Dpm - 0.118*Dwm + 0.218*Sm + 1.200*Om + 0.016*PPUm
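To see how the fitted equation turns predictor values into a predicted volume, the sketch below applies the estimated coefficients to a single observation. The coefficients are copied from the summary output; the observation itself is hypothetical and chosen only for illustration.

```r
# Estimates copied from the Mondelez coefficient table
beta <- c(Intercept = 1.34629, Type = -0.62219, City = -0.10391,
          Month = -0.04282, Dpm = 0.42645, Dwm = -0.11771,
          Sm = 0.21751, Om = 1.19997, PPUm = 0.01605)

# Hypothetical observation (not a row of the data set); the leading 1 multiplies the intercept
x <- c(Intercept = 1, Type = 3, City = 18, Month = 5, Dpm = 50,
       Dwm = 90, Sm = 10, Om = 1, PPUm = 10)

# Predicted Volume = sum of coefficient * value over all terms
predicted_volume <- sum(beta * x[names(beta)])
predicted_volume
```

With the model object in memory, `predict(mondelez_fit, newdata = ...)` performs the same calculation without retyping the coefficients.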
nestle.df <- read.csv("Nestle.csv")
# Convert the categorical columns to integer codes, assuming City and Type are text labels in the CSV
nestle.df$City <- as.integer(as.factor(nestle.df$City))
nestle.df$Type <- as.integer(as.factor(nestle.df$Type))
library(psych)
describe(nestle.df)
## vars n mean sd median trimmed mad min max range
## Company* 1 1332 1.00 0.00 1.00 1.00 0.00 1.00 1.00 0.00
## Type 2 1332 4.98 2.55 5.00 4.93 2.97 1.00 10.00 9.00
## City 3 1332 12.05 7.32 12.00 11.88 8.90 1.00 26.00 25.00
## Month 4 1332 5.50 3.45 5.50 5.50 4.45 0.00 11.00 11.00
## Dpn 5 1018 17.72 17.56 10.91 15.06 11.19 0.01 73.35 73.34
## Dwn 6 1018 45.18 25.13 43.52 45.10 30.45 0.01 91.85 91.84
## Sn 7 1018 4.64 5.29 2.47 3.82 2.28 0.00 42.78 42.78
## On 8 1018 0.93 3.99 0.43 0.47 0.28 0.00 42.78 42.78
## PPUn 9 1016 27.38 35.06 10.77 19.38 4.75 5.30 200.00 194.70
## Volume 10 1018 4.86 11.08 0.72 1.90 0.96 0.00 71.04 71.04
## skew kurtosis se
## Company* NaN NaN 0.00
## Type 0.11 -0.95 0.07
## City 0.15 -1.21 0.20
## Month 0.00 -1.22 0.09
## Dpn 1.16 0.20 0.55
## Dwn 0.09 -1.12 0.79
## Sn 3.12 15.42 0.17
## On 9.03 80.71 0.12
## PPUn 1.89 2.55 1.10
## Volume 3.44 11.93 0.35
# Volume is the response, so it must not also appear on the right-hand side
nestle_model <- Volume ~ Type + City + Month + Dpn + Dwn + Sn + On + PPUn
nestlez_fit <- lm(nestle_model, data = nestle.df)
summary(nestlez_fit)
##
## Call:
## lm(formula = nestle_model, data = nestle.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.364 -2.015 -0.529 1.045 52.722
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.764155 1.468633 1.201 0.22995
## Type -0.511106 0.197368 -2.590 0.00975 **
## City -0.046987 0.036894 -1.274 0.20310
## Month -0.051543 0.076498 -0.674 0.50060
## Dpn 0.481885 0.044755 10.767 < 2e-16 ***
## Dwn -0.001788 0.023640 -0.076 0.93972
## Sn -0.710762 0.142640 -4.983 7.37e-07 ***
## On 1.307644 0.133940 9.763 < 2e-16 ***
## PPUn -0.001491 0.013159 -0.113 0.90981
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.358 on 1007 degrees of freedom
## (316 observations deleted due to missingness)
## Multiple R-squared: 0.4359, Adjusted R-squared: 0.4314
## F-statistic: 97.28 on 8 and 1007 DF, p-value: < 2.2e-16
Since the p-value of the overall F-test (< 2.2e-16) is below 0.05, we reject the null hypothesis that sales volume is not associated with any of the other variables, and conclude that the data provide evidence of an association between sales volume and at least one of them.
Since the adjusted R-squared is 0.4314, the model explains roughly 43% of the variation in sales volume, so the relationship is moderate rather than strong.
At the 5% level, Type, Dpn, Sn and On have a statistically significant association with the sales volume of Nestle. City, Month, Dwn and PPUn are not significant, so we cannot draw a conclusion about their influence.
Volume = B0 + B1*Type + B2*City + B3*Month + B4*Dpn + B5*Dwn + B6*Sn + B7*On + B8*PPUn
Therefore, Volume = 1.764 - 0.511*Type - 0.047*City - 0.052*Month + 0.482*Dpn - 0.002*Dwn - 0.711*Sn + 1.308*On - 0.001*PPUn
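The sign and significance of an individual term can be re-derived from its estimate and standard error. The sketch below does this for Sn in the Nestle model, using only numbers from the coefficient table and the residual degrees of freedom (1007).

```r
# Values copied from the Sn row of the Nestle coefficient table
estimate <- -0.710762
std_error <- 0.142640
df_resid <- 1007

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df_resid)

round(t_value, 3)
p_value
```

A negative t value with a small p-value indicates that Sn enters the fitted equation with a negative coefficient that is distinguishable from zero.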