Coke Pepsi Data

Shikhar Kohli
16/10/17

Q1 What is the mean, standard deviation and variance of the sales of Coke?

setwd("~/code/DAM")
storedata.df <- read.csv("datasets/StoreData.csv")
library(psych)
describe(storedata.df$p1sales)  # reports mean and sd of Coke sales; variance is sd^2 (not printed)
   vars    n   mean    sd median trimmed   mad min max range skew kurtosis   se
X1    1 2080 133.05 28.37    129  131.08 26.69  73 263   190 0.74     0.66 0.62
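
describe() prints the mean and standard deviation but not the variance that Q1 also asks for. A minimal sketch of getting all three with base R follows; the `sales` vector is a made-up stand-in for `storedata.df$p1sales`, so its numbers are illustrative only.

```r
# Hypothetical stand-in for storedata.df$p1sales (illustrative values only)
sales <- c(129, 110, 145, 98, 160, 133, 121)

m <- mean(sales)   # arithmetic mean
s <- sd(sales)     # sample standard deviation (n - 1 denominator)
v <- var(sales)    # sample variance; equals s^2

print(c(mean = m, sd = s, var = v))
```

On the real data, replacing `sales` with `storedata.df$p1sales` should give a variance of roughly 28.37^2, i.e. about 805, given the sd reported above.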

Q2 What is the correlation of the sales of Coke with the promotions of Coke?

cor(storedata.df$p1sales,storedata.df$p1prom) 
[1] 0.421175

Q3 What is the correlation of the sales of Coke with the promotions of Pepsi?

cor(storedata.df$p1sales,storedata.df$p2prom) 
[1] -0.01334702

Q4 Create a correlation matrix of the sales and prices of Coke and Pepsi versus the promotions of Coke and Pepsi. Hint: This should be a 4 x 2 matrix.

x <- storedata.df[,c("p1sales","p2sales","p1price","p2price")]
y <- storedata.df[,c("p1prom","p2prom")]
cor(x,y) 
              p1prom      p2prom
p1sales  0.421174952 -0.01334702
p2sales -0.013952850  0.55990301
p1price -0.014731296  0.02426913
p2price -0.001363308 -0.01201736

Q5 Draw a corrgram illustrating the previous question

x <- storedata.df[,c("p1sales","p2sales","p1price","p2price")]
y <- storedata.df[,c("p1prom","p2prom")]
library(corrgram)
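# corrgram() expects raw data or a square correlation matrix, so pass the
# combined columns; the 4 x 2 block from Q4 sits in the p1prom/p2prom cells
corrgram(cbind(x, y), order=FALSE, lower.panel=panel.conf,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram - StoreData")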

[Figure: Corrgram - StoreData]

Q6 Test the null hypothesis that the sales of Pepsi are uncorrelated with Pepsi's promotions

library(Hmisc)
rcorr(storedata.df$p2sales, storedata.df$p2prom)
     x    y
x 1.00 0.56
y 0.56 1.00

n= 2080 


P
  x  y 
x     0
y  0   
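
The P entry of 0 is a p-value rounded for display, so the null hypothesis of zero correlation is rejected at any conventional level. As a cross-check, base R's cor.test() runs the same kind of test and also returns a confidence interval. The sketch below uses simulated stand-in data (the `prom` and `sales` vectors are hypothetical, not the StoreData columns).

```r
set.seed(1)  # simulated stand-in data, not the StoreData values
n <- 200
prom  <- rbinom(n, 1, 0.1)                     # hypothetical 0/1 promotion flag
sales <- 100 + 40 * prom + rnorm(n, sd = 20)   # sales rise under promotion

res <- cor.test(sales, prom)   # Pearson test of H0: correlation = 0
print(res$estimate)            # sample correlation
print(res$p.value)             # two-sided p-value
print(res$conf.int)            # 95% confidence interval for the correlation
```

With the real data the call would be `cor.test(storedata.df$p2sales, storedata.df$p2prom)`.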

Q7 Test the null hypothesis that the sales of Pepsi are uncorrelated with Coke's promotions

library(Hmisc)
rcorr(storedata.df$p2sales, storedata.df$p1prom)
      x     y
x  1.00 -0.01
y -0.01  1.00

n= 2080 


P
  x      y     
x        0.5248
y 0.5248       
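
With p = 0.5248 the null hypothesis of zero correlation is not rejected. The p-value can be approximately reproduced by hand, assuming the usual t approximation for a Pearson correlation with n - 2 degrees of freedom. The sketch below uses only r from the Q4 matrix and n from the output above.

```r
# Values copied from the output above: r from the Q4 matrix, n from rcorr()
r <- -0.013952850   # cor(p2sales, p1prom)
n <- 2080

t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)    # t statistic with n - 2 df
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

print(c(t = t_stat, p = p_val))
```

The result should agree with the rcorr() value up to rounding.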

Q8 Run a simple linear regression of the sales of Coke on the price of Coke

fit1 <- lm(p1sales ~ p1price, data = storedata.df)
summary(fit1)

Call:
lm(formula = p1sales ~ p1price, data = storedata.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.724 -17.454  -2.819  14.463 111.276 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  267.138      4.523   59.06   <2e-16 ***
p1price      -52.700      1.766  -29.84   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.74 on 2078 degrees of freedom
Multiple R-squared:    0.3, Adjusted R-squared:  0.2997 
F-statistic: 890.6 on 1 and 2078 DF,  p-value: < 2.2e-16
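
The slope of -52.700 means that each one-unit increase in the price of Coke is associated with about 52.7 fewer units sold. A small sketch of turning the fitted line into predictions follows; the coefficients are copied from the summary above, while `predict_sales` and the price points are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(fit1) output
b0 <- 267.138   # intercept
b1 <- -52.700   # slope on p1price

# Hypothetical helper: predicted Coke sales at a given Coke price
predict_sales <- function(price) b0 + b1 * price

prices <- c(2.00, 2.50, 3.00)   # illustrative prices, not taken from the data
pred <- predict_sales(prices)
print(data.frame(price = prices, predicted_sales = pred))
```

With the fitted object available, `predict(fit1, newdata = data.frame(p1price = prices))` gives the same kind of result without copying coefficients by hand.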

Q9 Run another simple linear regression of the sales of Pepsi on the price of Pepsi

fit2 <- lm(p2sales ~ p2price, data = storedata.df)
summary(fit2)

Call:
lm(formula = p2sales ~ p2price, data = storedata.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-45.657 -15.657  -3.077  11.400 110.184 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  196.788      3.877   50.76   <2e-16 ***
p2price      -35.796      1.425  -25.11   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.4 on 2078 degrees of freedom
Multiple R-squared:  0.2328,    Adjusted R-squared:  0.2324 
F-statistic: 630.6 on 1 and 2078 DF,  p-value: < 2.2e-16
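
In a regression with a single predictor, the F-statistic equals the square of the slope's t value, which gives a quick consistency check on both summaries. The sketch below uses only the rounded figures printed above, so small discrepancies from rounding are expected.

```r
# t values and F-statistics copied from summary(fit1) and summary(fit2)
t_vals <- c(coke = -29.84, pepsi = -25.11)
f_vals <- c(coke = 890.6,  pepsi = 630.6)

# Difference between t^2 and F; should be near zero apart from rounding
gap <- t_vals^2 - f_vals
print(round(gap, 2))
```

The same summaries show that price explains about 30% of the variance in Coke sales and about 23% in Pepsi sales.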