Session9 StoreData

Apurva (PGP32242)
16 October 2017


setwd("~/Downloads/IIM Lucknow/TERM 5/DAM")
store.df <- read.csv("StoreData.csv")
summary(store.df)
    storeNum          Year          Week          p1sales   
 Min.   :101.0   Min.   :1.0   Min.   : 1.00   Min.   : 73  
 1st Qu.:105.8   1st Qu.:1.0   1st Qu.:13.75   1st Qu.:113  
 Median :110.5   Median :1.5   Median :26.50   Median :129  
 Mean   :110.5   Mean   :1.5   Mean   :26.50   Mean   :133  
 3rd Qu.:115.2   3rd Qu.:2.0   3rd Qu.:39.25   3rd Qu.:150  
 Max.   :120.0   Max.   :2.0   Max.   :52.00   Max.   :263  

    p2sales         p1price         p2price         p1prom   
 Min.   : 51.0   Min.   :2.190   Min.   :2.29   Min.   :0.0  
 1st Qu.: 84.0   1st Qu.:2.290   1st Qu.:2.49   1st Qu.:0.0  
 Median : 96.0   Median :2.490   Median :2.59   Median :0.0  
 Mean   :100.2   Mean   :2.544   Mean   :2.70   Mean   :0.1  
 3rd Qu.:113.0   3rd Qu.:2.790   3rd Qu.:2.99   3rd Qu.:0.0  
 Max.   :225.0   Max.   :2.990   Max.   :3.19   Max.   :1.0  

     p2prom       country 
 Min.   :0.0000   AU:104  
 1st Qu.:0.0000   BR:208  
 Median :0.0000   CN:208  
 Mean   :0.1385   DE:520  
 3rd Qu.:0.0000   GB:312  
 Max.   :1.0000   JP:416  
                  US:312  

1. What is the mean, standard deviation and variance of the sales of Coke?

library(psych)
describe(store.df$p1sales)
   vars    n   mean    sd median trimmed   mad min max range skew kurtosis
X1    1 2080 133.05 28.37    129  131.08 26.69  73 263   190 0.74     0.66
     se
X1 0.62
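
describe() reports the mean (133.05) and the standard deviation (28.37) but not the variance. The variance is the square of the standard deviation, so it is roughly 28.37^2, or about 805, here. A minimal sketch of getting all three figures with base R follows; the helper name sales_stats and the toy vector are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: mean, standard deviation and variance of a numeric vector.
sales_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), var = var(x))
}

# Toy values standing in for store.df$p1sales (not taken from the data set).
toy_sales <- c(100, 120, 140, 160)
stats <- sales_stats(toy_sales)
print(stats)

# The sample variance is the squared sample standard deviation.
stopifnot(isTRUE(all.equal(stats[["var"]], stats[["sd"]]^2)))
```

With the data loaded, sales_stats(store.df$p1sales) would return the three figures for Coke in one call.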

2. What is the correlation of the sales of Coke with the promotions of Coke?

x<- store.df$p1sales
y<- store.df$p1prom
z<- cor(x,y)
round(z,2)
[1] 0.42

3. What is the correlation of the sales of Coke with the promotions of Pepsi?

x<- store.df$p1sales
y<-store.df$p2prom
z<- cor(x,y)
round(z,2)
[1] -0.01

4. Create a correlation matrix of the sales and prices of Coke and Pepsi versus the promotions of Coke and Pepsi. Hint: This should be a 4*2 matrix.

x<- store.df[4:7]
y<- store.df[8:9]
z<-cor(x,y)
round(z,2)
        p1prom p2prom
p1sales   0.42  -0.01
p2sales  -0.01   0.56
p1price  -0.01   0.02
p2price   0.00  -0.01

5. Draw a corrgram illustrating the previous question

library(corrgram)
corrgram(store.df[,c(4:7,8:9)], order=FALSE, lower.panel=panel.conf,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram - Store")

[Figure: "Corrgram - Store", a corrgram of p1sales, p2sales, p1price, p2price, p1prom and p2prom]

6. Test the null hypothesis that the sales of Pepsi are uncorrelated with Pepsi's promotions

cor.test(store.df$p2sales,store.df$p2prom)

    Pearson's product-moment correlation

data:  store.df$p2sales and store.df$p2prom
t = 30.804, df = 2078, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5296696 0.5887155
sample estimates:
     cor 
0.559903 

We can reject the null hypothesis because the p-value (< 2.2e-16) is below 0.05: Pepsi's sales are positively correlated with Pepsi's promotions (r = 0.56).
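
The t statistic and confidence interval printed by cor.test() can be checked by hand from the sample correlation and the sample size. The sketch below assumes only the printed values r = 0.559903 and df = 2078; it uses the usual t formula for a Pearson correlation and Fisher's z transformation for the interval.

```r
r <- 0.559903   # sample correlation printed by cor.test()
n <- 2080       # number of observations (df + 2)

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via Fisher's z transformation.
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(c(z - half_width, z + half_width))

print(round(t_stat, 3))
print(round(ci, 4))
```

The results should agree with the printed t = 30.804 and the interval 0.5297 to 0.5887, up to rounding of r.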

7. Test the null hypothesis that the sales of Pepsi are uncorrelated with Coke's promotions

cor.test(store.df$p1sales,store.df$p1prom)

    Pearson's product-moment correlation

data:  store.df$p1sales and store.df$p1prom
t = 21.168, df = 2078, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3851676 0.4559018
sample estimates:
     cor 
0.421175 

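For the pair actually tested, we can reject the null hypothesis because the p-value is below 0.05. Note that the call uses p1sales and p1prom, that is, Coke's sales against Coke's promotions. The pair named in the question, Pepsi's sales (p2sales) against Coke's promotions (p1prom), has a correlation of only -0.01 in the matrix from question 4.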
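
For the pair named in the question (p2sales and p1prom), a rough check can be sketched from the rounded correlation of -0.01 in the question 4 matrix. Because that value is rounded to two decimals, the result is only an approximation; the exact test would be cor.test(store.df$p2sales, store.df$p1prom) with the data loaded.

```r
r <- -0.01   # rounded correlation of p2sales with p1prom (question 4 matrix)
df <- 2078   # n - 2, as in the cor.test() output above

# t statistic and two-sided p-value for H0: true correlation = 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df)

print(round(t_stat, 2))
print(p_value > 0.05)   # TRUE means the null is not rejected at the 5% level
```

Under this approximation the t statistic is well inside the usual +/-1.96 band, so the null hypothesis of no correlation between Pepsi's sales and Coke's promotions would not be rejected.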

8. Run a simple linear regression of the sales of Coke on the price of Coke

lrcoke <- lm(p1sales ~ p1price, data = store.df)
summary(lrcoke)

Call:
lm(formula = p1sales ~ p1price, data = store.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.724 -17.454  -2.819  14.463 111.276 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  267.138      4.523   59.06   <2e-16 ***
p1price      -52.700      1.766  -29.84   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.74 on 2078 degrees of freedom
Multiple R-squared:    0.3, Adjusted R-squared:  0.2997 
F-statistic: 890.6 on 1 and 2078 DF,  p-value: < 2.2e-16
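
To read the fitted line: the model predicts p1sales = 267.138 - 52.700 * p1price, so each one-unit increase in price is associated with about 52.7 fewer units sold. The sketch below evaluates that line at the lowest and highest observed Coke prices; the helper name predict_coke_sales is an illustrative assumption, and the coefficients are the rounded values printed above.

```r
# Coefficients as printed in the regression summary (rounded).
intercept <- 267.138
slope <- -52.700

# Hypothetical helper: predicted Coke sales at a given Coke price.
predict_coke_sales <- function(price) intercept + slope * price

prices <- c(2.19, 2.99)   # minimum and maximum p1price from summary(store.df)
pred <- predict_coke_sales(prices)
print(round(pred, 1))
```

This gives roughly 152 units at the lowest price and roughly 110 at the highest. With the fitted model in memory, predict(lrcoke, newdata = data.frame(p1price = c(2.19, 2.99))) should give the same predictions without rounding error.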

9. Run another simple linear regression of the sales of Pepsi on the price of Pepsi

lrpepsi <- lm(p2sales ~ p2price, data = store.df)
summary(lrpepsi)

Call:
lm(formula = p2sales ~ p2price, data = store.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-45.657 -15.657  -3.077  11.400 110.184 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  196.788      3.877   50.76   <2e-16 ***
p2price      -35.796      1.425  -25.11   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.4 on 2078 degrees of freedom
Multiple R-squared:  0.2328,    Adjusted R-squared:  0.2324 
F-statistic: 630.6 on 1 and 2078 DF,  p-value: < 2.2e-16

10. Compare the two simple linear regressions. The sales of which product are more responsive to a change in its price?

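## Coke's sales are more responsive: its price coefficient (-52.70) is larger in magnitude than Pepsi's (-35.80), so a one-unit price increase is associated with a larger drop in sales for Coke.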
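
Raw slopes depend on each product's price and sales levels, so a unit-free check can be useful. A minimal sketch, assuming the rounded figures printed earlier (slopes from the two regressions, means from summary() and describe()), computes the price elasticity at the means as slope * mean price / mean sales. This is one common convention and not necessarily the comparison the exercise intended.

```r
# Figures as printed earlier in the document (all rounded).
slope_coke  <- -52.700; mean_price_coke  <- 2.544; mean_sales_coke  <- 133.05
slope_pepsi <- -35.796; mean_price_pepsi <- 2.70;  mean_sales_pepsi <- 100.2

# Point elasticity at the means: slope * (mean price / mean sales).
elasticity_coke  <- slope_coke  * mean_price_coke  / mean_sales_coke
elasticity_pepsi <- slope_pepsi * mean_price_pepsi / mean_sales_pepsi

print(round(c(coke = elasticity_coke, pepsi = elasticity_pepsi), 2))
```

The elasticities come out at roughly -1.01 for Coke and -0.96 for Pepsi, so Coke remains the more price-responsive product on this measure, although the gap is much narrower than the raw slopes suggest.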