Assignment 1

Q1. Create a Rmd file called StoreDataAnalysis.Rmd

Q2. Read the data in the file StoreData.CSV into a dataframe called store.df

Q3. Output the summary statistics (min, max, median etc) of the sales of Product 2 (Pepsi).

store.df<-read.csv("StoreData.csv")
summary(store.df$p2sales)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    51.0    84.0    96.0   100.2   113.0   225.0

Q4. Create a table aggregating the sales of Product 2 (Pepsi) in the presence / absence of promotions.

attach(store.df)
psych::describe(store.df)
##          vars    n   mean    sd median trimmed   mad    min    max range
## storeNum    1 2080 110.50  5.77 110.50  110.50  7.41 101.00 120.00  19.0
## Year        2 2080   1.50  0.50   1.50    1.50  0.74   1.00   2.00   1.0
## Week        3 2080  26.50 15.01  26.50   26.50 19.27   1.00  52.00  51.0
## p1sales     4 2080 133.05 28.37 129.00  131.08 26.69  73.00 263.00 190.0
## p2sales     5 2080 100.16 24.42  96.00   98.05 22.24  51.00 225.00 174.0
## p1price     6 2080   2.54  0.29   2.49    2.53  0.44   2.19   2.99   0.8
## p2price     7 2080   2.70  0.33   2.59    2.69  0.44   2.29   3.19   0.9
## p1prom      8 2080   0.10  0.30   0.00    0.00  0.00   0.00   1.00   1.0
## p2prom      9 2080   0.14  0.35   0.00    0.05  0.00   0.00   1.00   1.0
## country*   10 2080   4.55  1.72   4.50    4.62  2.22   1.00   7.00   6.0
##           skew kurtosis   se
## storeNum  0.00    -1.21 0.13
## Year      0.00    -2.00 0.01
## Week      0.00    -1.20 0.33
## p1sales   0.74     0.66 0.62
## p2sales   0.99     1.51 0.54
## p1price   0.28    -1.44 0.01
## p2price   0.32    -1.40 0.01
## p1prom    2.66     5.10 0.01
## p2prom    2.09     2.38 0.01
## country* -0.29    -0.81 0.04
p2prom.f = factor(p2prom, labels = c("No promotion","Promotion"))
aggdata <- aggregate(p2sales,by=list(p2prom.f),sum,na.rm=TRUE)
aggdata
##        Group.1      x
## 1 No promotion 169659
## 2    Promotion  38667

Do more to analyze the sales of Coke and Pepsi in the presence / absence of promotions

Q1a. Write R code to show the total number of rows and columns present in the data frame store.df. Q1b. Write R code to list the names of the column present in the data frame store.df.

nrow(store.df)
## [1] 2080
ncol(store.df)
## [1] 10
head(store.df,0)
##  [1] storeNum Year     Week     p1sales  p2sales  p1price  p2price 
##  [8] p1prom   p2prom   country 
## <0 rows> (or 0-length row.names)

Q2a. Write R code to output the data types of the different columns. Q2b. Write R code to convert the data types of the columns {Year, p1prom, p2prom, country} into factor variables. Q2c. Write R code to verify the conversions done in Q2b.

str(store.df)
## 'data.frame':    2080 obs. of  10 variables:
##  $ storeNum: int  101 101 101 101 101 101 101 101 101 101 ...
##  $ Year    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Week    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ p1sales : int  127 137 156 117 138 115 116 106 116 145 ...
##  $ p2sales : int  106 105 97 106 100 127 90 126 94 91 ...
##  $ p1price : num  2.29 2.49 2.99 2.99 2.49 2.79 2.99 2.99 2.29 2.49 ...
##  $ p2price : num  2.29 2.49 2.99 3.19 2.59 2.49 3.19 2.29 2.29 2.99 ...
##  $ p1prom  : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ p2prom  : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ country : Factor w/ 7 levels "AU","BR","CN",..: 7 7 7 7 7 7 7 7 7 7 ...
store.df$Year<-factor(store.df$Year,labels = c("First","Second"))
store.df$p1prom<-factor(store.df$p1prom,labels = c("True","False"))
store.df$p2prom<-factor(store.df$p2prom,labels = c("True","False"))
str(store.df)
## 'data.frame':    2080 obs. of  10 variables:
##  $ storeNum: int  101 101 101 101 101 101 101 101 101 101 ...
##  $ Year    : Factor w/ 2 levels "First","Second": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Week    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ p1sales : int  127 137 156 117 138 115 116 106 116 145 ...
##  $ p2sales : int  106 105 97 106 100 127 90 126 94 91 ...
##  $ p1price : num  2.29 2.49 2.99 2.99 2.49 2.79 2.99 2.99 2.29 2.49 ...
##  $ p2price : num  2.29 2.49 2.99 3.19 2.59 2.49 3.19 2.29 2.29 2.99 ...
##  $ p1prom  : Factor w/ 2 levels "True","False": 1 1 2 1 1 1 1 1 1 1 ...
##  $ p2prom  : Factor w/ 2 levels "True","False": 1 1 1 1 2 1 1 1 1 1 ...
##  $ country : Factor w/ 7 levels "AU","BR","CN",..: 7 7 7 7 7 7 7 7 7 7 ...

Q3a. Write R code to generate the summary statistics of the different variables present in the data frame store.df. Bonus: Q3b. Write R code to generate the following table using the describe() function from the psych package in R.

summary(store.df)
##     storeNum         Year           Week          p1sales   
##  Min.   :101.0   First :1040   Min.   : 1.00   Min.   : 73  
##  1st Qu.:105.8   Second:1040   1st Qu.:13.75   1st Qu.:113  
##  Median :110.5                 Median :26.50   Median :129  
##  Mean   :110.5                 Mean   :26.50   Mean   :133  
##  3rd Qu.:115.2                 3rd Qu.:39.25   3rd Qu.:150  
##  Max.   :120.0                 Max.   :52.00   Max.   :263  
##                                                             
##     p2sales         p1price         p2price       p1prom       p2prom    
##  Min.   : 51.0   Min.   :2.190   Min.   :2.29   True :1872   True :1792  
##  1st Qu.: 84.0   1st Qu.:2.290   1st Qu.:2.49   False: 208   False: 288  
##  Median : 96.0   Median :2.490   Median :2.59                            
##  Mean   :100.2   Mean   :2.544   Mean   :2.70                            
##  3rd Qu.:113.0   3rd Qu.:2.790   3rd Qu.:2.99                            
##  Max.   :225.0   Max.   :2.990   Max.   :3.19                            
##                                                                          
##  country 
##  AU:104  
##  BR:208  
##  CN:208  
##  DE:520  
##  GB:312  
##  JP:416  
##  US:312
library("psych", lib.loc="~/R/win-library/3.5")
describe(store.df)
##          vars    n   mean    sd median trimmed   mad    min    max range
## storeNum    1 2080 110.50  5.77 110.50  110.50  7.41 101.00 120.00  19.0
## Year*       2 2080   1.50  0.50   1.50    1.50  0.74   1.00   2.00   1.0
## Week        3 2080  26.50 15.01  26.50   26.50 19.27   1.00  52.00  51.0
## p1sales     4 2080 133.05 28.37 129.00  131.08 26.69  73.00 263.00 190.0
## p2sales     5 2080 100.16 24.42  96.00   98.05 22.24  51.00 225.00 174.0
## p1price     6 2080   2.54  0.29   2.49    2.53  0.44   2.19   2.99   0.8
## p2price     7 2080   2.70  0.33   2.59    2.69  0.44   2.29   3.19   0.9
## p1prom*     8 2080   1.10  0.30   1.00    1.00  0.00   1.00   2.00   1.0
## p2prom*     9 2080   1.14  0.35   1.00    1.05  0.00   1.00   2.00   1.0
## country*   10 2080   4.55  1.72   4.50    4.62  2.22   1.00   7.00   6.0
##           skew kurtosis   se
## storeNum  0.00    -1.21 0.13
## Year*     0.00    -2.00 0.01
## Week      0.00    -1.20 0.33
## p1sales   0.74     0.66 0.62
## p2sales   0.99     1.51 0.54
## p1price   0.28    -1.44 0.01
## p2price   0.32    -1.40 0.01
## p1prom*   2.66     5.10 0.01
## p2prom*   2.09     2.38 0.01
## country* -0.29    -0.81 0.04

Q4a. Break-up store.df by country. Write R code to generate the following break-ups by country. country AU BR CN DE GB JP US 104 208 208 520 312 416 312 Q4b. Write R code to express Q4a numbers in terms of percentages.

table(store.df$country)
## 
##  AU  BR  CN  DE  GB  JP  US 
## 104 208 208 520 312 416 312
prop.table(table(store.df$country))*100
## 
## AU BR CN DE GB JP US 
##  5 10 10 25 15 20 15

Q5a.The dataset has 2080 rows of weekly data for 20 unique store branches for two different years. For product 1 (Coke) and product 2 (Pepsi), write R code to count the number of weeks,

When both products were under promotion?

When product 1 (Coke) was under promotion but product 2 (Pepsi) was not under promotion?

When product 1 (Coke) was not under promotion but product 2 (Pepsi) was under promotion?

When neither product was under promotion?

sum(store.df$p1prom == "True" & store.df$p2prom == "True", na.rm=TRUE)
## [1] 1616
sum(store.df$p1prom == "True" & store.df$p2prom == "False", na.rm=TRUE)
## [1] 256
sum(store.df$p1prom == "False" & store.df$p2prom == "True", na.rm=TRUE)
## [1] 176
sum(store.df$p1prom == "False" & store.df$p2prom == "False", na.rm=TRUE)
## [1] 32

Q5b. Express Q5a as in percentages.

sum(store.df$p1prom == "True" & store.df$p2prom == "True", na.rm=TRUE)/nrow(store.df)*100
## [1] 77.69231
sum(store.df$p1prom == "True" & store.df$p2prom == "False", na.rm=TRUE)/nrow(store.df)*100
## [1] 12.30769
sum(store.df$p1prom == "False" & store.df$p2prom == "True", na.rm=TRUE)/nrow(store.df)*100
## [1] 8.461538
sum(store.df$p1prom == "False" & store.df$p2prom == "False", na.rm=TRUE)/nrow(store.df)*100
## [1] 1.538462

Q5c. Write R code to calculate the average price of product 1 (Coke) and product 2 (Pepsi) when both were under promotion.

Q5d. Write R code to calculate the average price of product 1 (Coke) and product 2 (Pepsi) when neither of them was under promotion.

Q5e. Write R code to calculate the average price of product 1 (Coke) and product 2 (Pepsi) when only one of them was under promotion.

aggregate(store.df$p1price, by=list(store.df$p1prom, store.df$p2prom), FUN=mean)
##   Group.1 Group.2        x
## 1    True    True 2.543342
## 2   False    True 2.524659
## 3    True   False 2.561484
## 4   False   False 2.568125
aggregate(store.df$p2price, by=list(store.df$p1prom, store.df$p2prom), FUN=mean)
##   Group.1 Group.2        x
## 1    True    True 2.700891
## 2   False    True 2.703068
## 3    True   False 2.691953
## 4   False   False 2.671250

Q6. Analyze the above numbers. Think like a Senior Manager. Prepare a list of qualitative insights on the “Effect of Promotions on Sales” of Coke and Pepsi, to be shared in class.

When both pepsi and coke ran promotions simultaneously, the average sales of both drinks were Pepsi average sales: 94.87 Coke average sales: 129.26

When only pepsi ran promotions and coke didn’t, the average sales of pepsi was lower than when both ran promotions and those of coke were much higher Pepsi average sales: 92.88 Coke average sales: 169.25

When only coke ran promotions and pepsi didn’t, the average sales of pepsi was much higher than when both ran promotions, while average sales of coke are lower Pepsi average sales: 134.35 Coke average sales: 127.82