CokeVsPepsi

Agradeep
16 October 2017

Summary of data

```r
# stringsAsFactors = TRUE reads country as a factor, so summary() counts rows per
# country (since R 4.0, read.csv() keeps strings as character by default)
Data <- read.csv("StoreData.csv", stringsAsFactors = TRUE)
summary(Data)
```

```
    storeNum          Year          Week          p1sales   
 Min.   :101.0   Min.   :1.0   Min.   : 1.00   Min.   : 73  
 1st Qu.:105.8   1st Qu.:1.0   1st Qu.:13.75   1st Qu.:113  
 Median :110.5   Median :1.5   Median :26.50   Median :129  
 Mean   :110.5   Mean   :1.5   Mean   :26.50   Mean   :133  
 3rd Qu.:115.2   3rd Qu.:2.0   3rd Qu.:39.25   3rd Qu.:150  
 Max.   :120.0   Max.   :2.0   Max.   :52.00   Max.   :263  

    p2sales         p1price         p2price         p1prom   
 Min.   : 51.0   Min.   :2.190   Min.   :2.29   Min.   :0.0  
 1st Qu.: 84.0   1st Qu.:2.290   1st Qu.:2.49   1st Qu.:0.0  
 Median : 96.0   Median :2.490   Median :2.59   Median :0.0  
 Mean   :100.2   Mean   :2.544   Mean   :2.70   Mean   :0.1  
 3rd Qu.:113.0   3rd Qu.:2.790   3rd Qu.:2.99   3rd Qu.:0.0  
 Max.   :225.0   Max.   :2.990   Max.   :3.19   Max.   :1.0  

     p2prom       country 
 Min.   :0.0000   AU:104  
 1st Qu.:0.0000   BR:208  
 Median :0.0000   CN:208  
 Mean   :0.1385   DE:520  
 3rd Qu.:0.0000   GB:312  
 Max.   :1.0000   JP:416  
                  US:312  
```

Correlation between Coke sales and Coke's promotions

```r
# Columns are referenced with Data$, so attach(Data) is not needed
cor.test(Data$p1sales, Data$p1prom)
```

```
    Pearson's product-moment correlation

data:  Data$p1sales and Data$p1prom
t = 21.168, df = 2078, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3851676 0.4559018
sample estimates:
     cor 
0.421175 
```
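
If p1prom is a 0/1 promotion indicator, as its summary suggests, the Pearson correlation above is a point-biserial correlation, and its t-test is equivalent to a pooled-variance two-sample t-test comparing mean sales in promotion and non-promotion weeks. The sketch below checks this on simulated data; the vectors `prom` and `sales` are hypothetical and not taken from StoreData.csv.

```r
# Simulated data for illustration only (not StoreData.csv)
set.seed(42)
n <- 200
prom  <- rep(c(0, 0, 0, 0, 1), length.out = n)   # hypothetical 0/1 promotion flag
sales <- 120 + 30 * prom + rnorm(n, sd = 20)     # hypothetical weekly sales

ct <- cor.test(sales, prom)                      # point-biserial correlation test
tt <- t.test(sales ~ prom, var.equal = TRUE)     # pooled two-sample t-test

# t.test() reports mean(group 0) - mean(group 1), so only the sign differs
c(cor_test_t = unname(ct$statistic), t_test_t = unname(tt$statistic))
```

Both tests use n - 2 degrees of freedom, so their p-values are identical as well.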

Correlation between Coke sales and Pepsi's promotions

```r
cor.test(Data$p1sales, Data$p2prom)
```

```
    Pearson's product-moment correlation

data:  Data$p1sales and Data$p2prom
t = -0.60848, df = 2078, p-value = 0.5429
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.05629431  0.02964957
sample estimates:
        cor 
-0.01334702 
```
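
The confidence interval that `cor.test()` prints for Pearson's r is built on Fisher's z transformation, and here it straddles zero, which is another way of seeing that the correlation is not significant. A sketch that recomputes the interval from the printed r and the sample size (n = 2080, since df = n - 2 = 2078):

```r
# Recompute the 95% CI for r = -0.01334702 with n = 2080 observations
r  <- -0.01334702
n  <- 2080
z  <- atanh(r)                     # Fisher z transform of r
se <- 1 / sqrt(n - 3)              # standard error of z
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)   # back-transform the bounds
round(ci, 5)                       # close to the interval printed above
```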

Correlation matrix of the sales and prices of Coke and Pepsi versus the promotions of Coke and Pepsi

```r
# Sales and price columns (rows of the result)
a <- Data[, c("p1sales", "p2sales", "p1price", "p2price")]
# Promotion columns (columns of the result)
b <- Data[, c("p1prom", "p2prom")]
z <- cor(a, b)
round(z, 2)
```

```
        p1prom p2prom
p1sales   0.42  -0.01
p2sales  -0.01   0.56
p1price  -0.01   0.02
p2price   0.00  -0.01
```
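
Called with two data frames, `cor(a, b)` returns only the cross-correlations (rows from `a`, columns from `b`), which is the corresponding block of the full correlation matrix of all the columns. A small sketch with made-up columns (`x1`, `x2`, `y1` and `y2` are hypothetical) confirms this:

```r
# Made-up data; all column names are hypothetical
set.seed(1)
n <- 50
a <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
b <- data.frame(y1 = rep(c(0, 0, 0, 1), length.out = n),      # 0/1 columns, like
                y2 = rep(c(1, 0, 0, 0, 0), length.out = n))   # the promotion flags

cross <- cor(a, b)              # 2 x 2: rows x1, x2; columns y1, y2
full  <- cor(cbind(a, b))       # 4 x 4: every pair of columns
isTRUE(all.equal(cross, full[names(a), names(b)]))
```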

Corrgram of sales, prices and promotions

```r
library(corrgram)
# Columns 4:9 are p1sales, p2sales, p1price, p2price, p1prom and p2prom
corrgram(Data[, 4:9], order = FALSE, lower.panel = panel.conf,
         upper.panel = panel.pie, text.panel = panel.txt,
         main = "Corrgram - StoreData")
```

![plot of chunk unnamed-chunk-5](CokeVsPepsi-figure/unnamed-chunk-5-1.png)

Test the null hypothesis that the sales of Pepsi are uncorrelated with Pepsi's promotions

```r
cor.test(Data$p2sales, Data$p2prom)
```

```
    Pearson's product-moment correlation

data:  Data$p2sales and Data$p2prom
t = 30.804, df = 2078, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5296696 0.5887155
sample estimates:
     cor 
0.559903 
```

Since the p-value (< 2.2e-16) is far below 0.05, the null hypothesis of no correlation between Pepsi sales and Pepsi promotions is rejected: the two are positively correlated (r = 0.56).
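
The t statistic that `cor.test()` reports is t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. Plugging in the printed r and df reproduces the value above:

```r
# t statistic for H0: rho = 0, computed from the sample correlation
r  <- 0.559903
df <- 2078                              # n - 2
t_stat  <- r * sqrt(df / (1 - r^2))     # about 30.804
p_value <- 2 * pt(-abs(t_stat), df)     # two-sided p-value, far below 2.2e-16
c(t = t_stat, p = p_value)
```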

Test the null hypothesis that the sales of Pepsi are uncorrelated with Coke's promotions

```r
cor.test(Data$p2sales, Data$p1prom)
```

```
    Pearson's product-moment correlation

data:  Data$p2sales and Data$p1prom
t = -0.6361, df = 2078, p-value = 0.5248
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.05689831  0.02904415
sample estimates:
        cor 
-0.01395285 
```

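The p-value (0.52) is well above 0.05, so the correlation is not statistically significant and we fail to reject the null hypothesis: the data give no evidence that Pepsi sales are correlated with Coke's promotions.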
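
The p-value is the two-sided tail probability of a t distribution with 2078 degrees of freedom. Recomputing it from the printed t shows how far it sits from the 0.05 threshold:

```r
t_stat <- -0.6361
df     <- 2078
p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value, roughly 0.52
p_value < 0.05                        # FALSE: H0 is not rejected at the 5% level
```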

Run a simple linear regression of the sales of Coke on the price of Coke

```r
fit <- lm(p1sales ~ p1price, data = Data)
summary(fit)
```

```

Call:
lm(formula = p1sales ~ p1price, data = Data)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.724 -17.454  -2.819  14.463 111.276 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  267.138      4.523   59.06   <2e-16 ***
p1price      -52.700      1.766  -29.84   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.74 on 2078 degrees of freedom
Multiple R-squared:    0.3, Adjusted R-squared:  0.2997 
F-statistic: 890.6 on 1 and 2078 DF,  p-value: < 2.2e-16
```
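
With a single predictor, the slope's t value is Estimate / Std. Error, the overall F statistic is that t value squared, and R-squared is the squared correlation between the two variables. A quick arithmetic check using the printed (rounded) figures:

```r
# Values copied from the Coke regression summary above
est <- -52.700
se  <- 1.766
r_squared <- 0.3

t_value <- est / se            # about -29.84, matching the t value column
f_stat  <- t_value^2           # about 890.5; printed F is 890.6 (inputs are rounded)
r <- -sqrt(r_squared)          # implied cor(p1sales, p1price); sign follows the slope
c(t = t_value, F = f_stat, r = r)
```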

Run another simple linear regression of the sales of Pepsi on the price of Pepsi

```r
fit1 <- lm(p2sales ~ p2price, data = Data)
summary(fit1)
```

```

Call:
lm(formula = p2sales ~ p2price, data = Data)

Residuals:
    Min      1Q  Median      3Q     Max 
-45.657 -15.657  -3.077  11.400 110.184 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  196.788      3.877   50.76   <2e-16 ***
p2price      -35.796      1.425  -25.11   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.4 on 2078 degrees of freedom
Multiple R-squared:  0.2328,    Adjusted R-squared:  0.2324 
F-statistic: 630.6 on 1 and 2078 DF,  p-value: < 2.2e-16
```
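
The fitted line is p2sales = 196.788 - 35.796 * p2price. As a sketch, the predicted Pepsi sales at two price points from the data summary (the median 2.59 and the third quartile 2.99 of p2price) differ by exactly the slope times the 0.40 price gap:

```r
b0 <- 196.788    # intercept from summary(fit1)
b1 <- -35.796    # slope on p2price
price <- c(2.59, 2.99)
pred  <- b0 + b1 * price     # predicted p2sales at each price
pred
diff(pred)                   # equals b1 * 0.40, a drop of about 14.3 units
```

With the fitted model object, `predict(fit1, newdata = data.frame(p2price = c(2.59, 2.99)))` should give essentially the same values, since the printed coefficients are rounded.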

Compare the two simple linear regressions. The sales of which product are more responsive to a change in its price?

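The slope for Coke sales on Coke price is -52.70: each one-unit increase in the price of Coke is associated with a decrease of about 52.7 units in Coke sales.

The slope for Pepsi sales on Pepsi price is -35.80: each one-unit increase in the price of Pepsi is associated with a decrease of about 35.8 units in Pepsi sales.

Coke sales are therefore more responsive to a change in price, and Coke's price also explains more of the variation in its sales (R-squared of 0.30 versus 0.23 for Pepsi).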
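
The slopes are in units of sales per unit of price, so comparing them directly ignores that the two brands sell at different levels. One common alternative, sketched here as an addition that is not part of the original analysis, is the price elasticity at the sample means, slope * mean(price) / mean(sales), using the rounded means from the data summary:

```r
# Price elasticity at the means: slope * mean(price) / mean(sales)
elasticity <- function(slope, mean_price, mean_sales) slope * mean_price / mean_sales

e_coke  <- elasticity(-52.700, 2.544, 133)     # means of p1price and p1sales
e_pepsi <- elasticity(-35.796, 2.70, 100.2)    # means of p2price and p2sales
round(c(coke = e_coke, pepsi = e_pepsi), 2)    # about -1.01 and -0.96
```

On this scale Coke is still slightly more price-responsive, though the gap is much smaller than the raw slopes suggest.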