Q1. Create a Rmd file called StoreDataAnalysis.Rmd
Q2. Read the data in the file StoreData.CSV into a dataframe called store.df
Q3. Output the summary statistics (min, max, median etc) of the sales of Product 2 (Pepsi).
store.df<-read.csv("StoreData.csv")
summary(store.df$p2sales)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 51.0 84.0 96.0 100.2 113.0 225.0
Q4. Create a table aggregating the sales of Product 2 (Pepsi) in the presence / absence of promotions.
attach(store.df)
psych::describe(store.df)
## vars n mean sd median trimmed mad min max range
## storeNum 1 2080 110.50 5.77 110.50 110.50 7.41 101.00 120.00 19.0
## Year 2 2080 1.50 0.50 1.50 1.50 0.74 1.00 2.00 1.0
## Week 3 2080 26.50 15.01 26.50 26.50 19.27 1.00 52.00 51.0
## p1sales 4 2080 133.05 28.37 129.00 131.08 26.69 73.00 263.00 190.0
## p2sales 5 2080 100.16 24.42 96.00 98.05 22.24 51.00 225.00 174.0
## p1price 6 2080 2.54 0.29 2.49 2.53 0.44 2.19 2.99 0.8
## p2price 7 2080 2.70 0.33 2.59 2.69 0.44 2.29 3.19 0.9
## p1prom 8 2080 0.10 0.30 0.00 0.00 0.00 0.00 1.00 1.0
## p2prom 9 2080 0.14 0.35 0.00 0.05 0.00 0.00 1.00 1.0
## country* 10 2080 4.55 1.72 4.50 4.62 2.22 1.00 7.00 6.0
## skew kurtosis se
## storeNum 0.00 -1.21 0.13
## Year 0.00 -2.00 0.01
## Week 0.00 -1.20 0.33
## p1sales 0.74 0.66 0.62
## p2sales 0.99 1.51 0.54
## p1price 0.28 -1.44 0.01
## p2price 0.32 -1.40 0.01
## p1prom 2.66 5.10 0.01
## p2prom 2.09 2.38 0.01
## country* -0.29 -0.81 0.04
p2prom.f = factor(p2prom, labels = c("No promotion","Promotion"))
aggdata <- aggregate(p2sales,by=list(p2prom.f),sum,na.rm=TRUE)
aggdata
## Group.1 x
## 1 No promotion 169659
## 2 Promotion 38667
Do more to analyze the sales of Coke and Pepsi in the presence / absence of promotions
Q1a. Write R code to show the total number of rows and columns present in the data frame store.df. Q1b. Write R code to list the names of the column present in the data frame store.df.
nrow(store.df)
## [1] 2080
ncol(store.df)
## [1] 10
head(store.df,0)
## [1] storeNum Year Week p1sales p2sales p1price p2price
## [8] p1prom p2prom country
## <0 rows> (or 0-length row.names)
Q2a. Write R code to output the data types of the different columns. Q2b. Write R code to convert the data types of the columns {Year, p1prom, p2prom, country} into factor variables. Q2c. Write R code to verify the conversions done in Q2b.
str(store.df)
## 'data.frame': 2080 obs. of 10 variables:
## $ storeNum: int 101 101 101 101 101 101 101 101 101 101 ...
## $ Year : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Week : int 1 2 3 4 5 6 7 8 9 10 ...
## $ p1sales : int 127 137 156 117 138 115 116 106 116 145 ...
## $ p2sales : int 106 105 97 106 100 127 90 126 94 91 ...
## $ p1price : num 2.29 2.49 2.99 2.99 2.49 2.79 2.99 2.99 2.29 2.49 ...
## $ p2price : num 2.29 2.49 2.99 3.19 2.59 2.49 3.19 2.29 2.29 2.99 ...
## $ p1prom : int 0 0 1 0 0 0 0 0 0 0 ...
## $ p2prom : int 0 0 0 0 1 0 0 0 0 0 ...
## $ country : Factor w/ 7 levels "AU","BR","CN",..: 7 7 7 7 7 7 7 7 7 7 ...
store.df$Year<-factor(store.df$Year,labels = c("First","Second"))
store.df$p1prom<-factor(store.df$p1prom,labels = c("True","False"))
store.df$p2prom<-factor(store.df$p2prom,labels = c("True","False"))
str(store.df)
## 'data.frame': 2080 obs. of 10 variables:
## $ storeNum: int 101 101 101 101 101 101 101 101 101 101 ...
## $ Year : Factor w/ 2 levels "First","Second": 1 1 1 1 1 1 1 1 1 1 ...
## $ Week : int 1 2 3 4 5 6 7 8 9 10 ...
## $ p1sales : int 127 137 156 117 138 115 116 106 116 145 ...
## $ p2sales : int 106 105 97 106 100 127 90 126 94 91 ...
## $ p1price : num 2.29 2.49 2.99 2.99 2.49 2.79 2.99 2.99 2.29 2.49 ...
## $ p2price : num 2.29 2.49 2.99 3.19 2.59 2.49 3.19 2.29 2.29 2.99 ...
## $ p1prom : Factor w/ 2 levels "True","False": 1 1 2 1 1 1 1 1 1 1 ...
## $ p2prom : Factor w/ 2 levels "True","False": 1 1 1 1 2 1 1 1 1 1 ...
## $ country : Factor w/ 7 levels "AU","BR","CN",..: 7 7 7 7 7 7 7 7 7 7 ...
Q3a. Write R code to generate the summary statistics of the different variables present in the data frame store.df. Bonus: Q3b. Write R code to generate the following table using the describe() function from the psych package in R.
summary(store.df)
## storeNum Year Week p1sales
## Min. :101.0 First :1040 Min. : 1.00 Min. : 73
## 1st Qu.:105.8 Second:1040 1st Qu.:13.75 1st Qu.:113
## Median :110.5 Median :26.50 Median :129
## Mean :110.5 Mean :26.50 Mean :133
## 3rd Qu.:115.2 3rd Qu.:39.25 3rd Qu.:150
## Max. :120.0 Max. :52.00 Max. :263
##
## p2sales p1price p2price p1prom p2prom
## Min. : 51.0 Min. :2.190 Min. :2.29 True :1872 True :1792
## 1st Qu.: 84.0 1st Qu.:2.290 1st Qu.:2.49 False: 208 False: 288
## Median : 96.0 Median :2.490 Median :2.59
## Mean :100.2 Mean :2.544 Mean :2.70
## 3rd Qu.:113.0 3rd Qu.:2.790 3rd Qu.:2.99
## Max. :225.0 Max. :2.990 Max. :3.19
##
## country
## AU:104
## BR:208
## CN:208
## DE:520
## GB:312
## JP:416
## US:312
library("psych", lib.loc="~/R/win-library/3.5")
describe(store.df)
## vars n mean sd median trimmed mad min max range
## storeNum 1 2080 110.50 5.77 110.50 110.50 7.41 101.00 120.00 19.0
## Year* 2 2080 1.50 0.50 1.50 1.50 0.74 1.00 2.00 1.0
## Week 3 2080 26.50 15.01 26.50 26.50 19.27 1.00 52.00 51.0
## p1sales 4 2080 133.05 28.37 129.00 131.08 26.69 73.00 263.00 190.0
## p2sales 5 2080 100.16 24.42 96.00 98.05 22.24 51.00 225.00 174.0
## p1price 6 2080 2.54 0.29 2.49 2.53 0.44 2.19 2.99 0.8
## p2price 7 2080 2.70 0.33 2.59 2.69 0.44 2.29 3.19 0.9
## p1prom* 8 2080 1.10 0.30 1.00 1.00 0.00 1.00 2.00 1.0
## p2prom* 9 2080 1.14 0.35 1.00 1.05 0.00 1.00 2.00 1.0
## country* 10 2080 4.55 1.72 4.50 4.62 2.22 1.00 7.00 6.0
## skew kurtosis se
## storeNum 0.00 -1.21 0.13
## Year* 0.00 -2.00 0.01
## Week 0.00 -1.20 0.33
## p1sales 0.74 0.66 0.62
## p2sales 0.99 1.51 0.54
## p1price 0.28 -1.44 0.01
## p2price 0.32 -1.40 0.01
## p1prom* 2.66 5.10 0.01
## p2prom* 2.09 2.38 0.01
## country* -0.29 -0.81 0.04
Q4a. Break-up store.df by country. Write R code to generate the following break-ups by country. country AU BR CN DE GB JP US 104 208 208 520 312 416 312 Q4b. Write R code to express Q4a numbers in terms of percentages.
table(store.df$country)
##
## AU BR CN DE GB JP US
## 104 208 208 520 312 416 312
prop.table(table(store.df$country))*100
##
## AU BR CN DE GB JP US
## 5 10 10 25 15 20 15
Q5a.The dataset has 2080 rows of weekly data for 20 unique store branches for two different years. For product 1 (Coke) and product 2 (Pepsi), write R code to count the number of weeks,
When both products were under promotion?
When product 1 (Coke) was under promotion but product 2 (Pepsi) was not under promotion?
When product 1 (Coke) was not under promotion but product 2 (Pepsi) was under promotion?
When neither product was under promotion?
sum(store.df$p1prom == "True" & store.df$p2prom == "True", na.rm=TRUE)
## [1] 1616
sum(store.df$p1prom == "True" & store.df$p2prom == "False", na.rm=TRUE)
## [1] 256
sum(store.df$p1prom == "False" & store.df$p2prom == "True", na.rm=TRUE)
## [1] 176
sum(store.df$p1prom == "False" & store.df$p2prom == "False", na.rm=TRUE)
## [1] 32
Q5b. Express Q5a as in percentages.
sum(store.df$p1prom == "True" & store.df$p2prom == "True", na.rm=TRUE)/nrow(store.df)*100
## [1] 77.69231
sum(store.df$p1prom == "True" & store.df$p2prom == "False", na.rm=TRUE)/nrow(store.df)*100
## [1] 12.30769
sum(store.df$p1prom == "False" & store.df$p2prom == "True", na.rm=TRUE)/nrow(store.df)*100
## [1] 8.461538
sum(store.df$p1prom == "False" & store.df$p2prom == "False", na.rm=TRUE)/nrow(store.df)*100
## [1] 1.538462
Q5c. Write R code to calculate the average price of product 1 (Coke) and product 2 (Pepsi) when both were under promotion.
Q5d. Write R code to calculate the average price of product 1 (Coke) and product 2 (Pepsi) when neither of them was under promotion.
Q5e. Write R code to calculate the average price of product 1 (Coke) and product 2 (Pepsi) when only one of them was under promotion.
aggregate(store.df$p1price, by=list(store.df$p1prom, store.df$p2prom), FUN=mean)
## Group.1 Group.2 x
## 1 True True 2.543342
## 2 False True 2.524659
## 3 True False 2.561484
## 4 False False 2.568125
aggregate(store.df$p2price, by=list(store.df$p1prom, store.df$p2prom), FUN=mean)
## Group.1 Group.2 x
## 1 True True 2.700891
## 2 False True 2.703068
## 3 True False 2.691953
## 4 False False 2.671250
Q6. Analyze the above numbers. Think like a Senior Manager. Prepare a list of qualitative insights on the “Effect of Promotions on Sales” of Coke and Pepsi, to be shared in class.
When both pepsi and coke ran promotions simultaneously, the average sales of both drinks were Pepsi average sales: 94.87 Coke average sales: 129.26
When only pepsi ran promotions and coke didn’t, the average sales of pepsi was lower than when both ran promotions and those of coke were much higher Pepsi average sales: 92.88 Coke average sales: 169.25
When only coke ran promotions and pepsi didn’t, the average sales of pepsi was much higher than when both ran promotions, while average sales of coke are lower Pepsi average sales: 134.35 Coke average sales: 127.82