Important before geting started

1. Set working directory

2. Read the data StoreData.csv file into a data frame called store.df.

# reading data and storing into `store.df` dataframe
store.df <- read.csv(paste("StoreData.csv"))

QUESTION 1

Q1a. Write R code to show the total number of rows and columns present in the data frame store.df.

# number of rows and columns
dim(store.df)
## [1] 2080   10

Q1b. Write R code to list the names of the column present in the data frame store.df.

# column names
colnames(store.df)
##  [1] "storeNum" "Year"     "Week"     "p1sales"  "p2sales"  "p1price" 
##  [7] "p2price"  "p1prom"   "p2prom"   "country"

QUESTION 2

Q2a. Write R code to output the data types of the different columns.

# data types of the variables of 'store.df' data frame
str(store.df)
## 'data.frame':    2080 obs. of  10 variables:
##  $ storeNum: int  101 101 101 101 101 101 101 101 101 101 ...
##  $ Year    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Week    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ p1sales : int  127 137 156 117 138 115 116 106 116 145 ...
##  $ p2sales : int  106 105 97 106 100 127 90 126 94 91 ...
##  $ p1price : num  2.29 2.49 2.99 2.99 2.49 2.79 2.99 2.99 2.29 2.49 ...
##  $ p2price : num  2.29 2.49 2.99 3.19 2.59 2.49 3.19 2.29 2.29 2.99 ...
##  $ p1prom  : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ p2prom  : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ country : Factor w/ 7 levels "AU","BR","CN",..: 7 7 7 7 7 7 7 7 7 7 ...

Q2b. Write R code to convert the data types of the columns {Year, p1prom, p2prom, country} into factor variables.

# convert 'Year' into factor
store.df$Year <- as.factor(store.df$Year)
# convert 'p1prom' into factor
store.df$p1prom <- as.factor(store.df$p1prom)
# convert 'p2prom' into factor
store.df$p2prom <- as.factor(store.df$p2prom)
# convert 'country' into factor
store.df$country <- as.factor(store.df$country)

Q2c. Write R code to verify the conversions done in Q2b.

# verifying conversions
str(store.df)
## 'data.frame':    2080 obs. of  10 variables:
##  $ storeNum: int  101 101 101 101 101 101 101 101 101 101 ...
##  $ Year    : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Week    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ p1sales : int  127 137 156 117 138 115 116 106 116 145 ...
##  $ p2sales : int  106 105 97 106 100 127 90 126 94 91 ...
##  $ p1price : num  2.29 2.49 2.99 2.99 2.49 2.79 2.99 2.99 2.29 2.49 ...
##  $ p2price : num  2.29 2.49 2.99 3.19 2.59 2.49 3.19 2.29 2.29 2.99 ...
##  $ p1prom  : Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 1 ...
##  $ p2prom  : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
##  $ country : Factor w/ 7 levels "AU","BR","CN",..: 7 7 7 7 7 7 7 7 7 7 ...
# another way to check the conversions happened
is.factor(store.df$p1prom)
## [1] TRUE

QUESTION 3

Q3a. Write R code to generate the summary statistics of the different variables present in the data frame store.df.

# summary statistics of the dataframe 'store.df'
summary(store.df)
##     storeNum     Year          Week          p1sales       p2sales     
##  Min.   :101.0   1:1040   Min.   : 1.00   Min.   : 73   Min.   : 51.0  
##  1st Qu.:105.8   2:1040   1st Qu.:13.75   1st Qu.:113   1st Qu.: 84.0  
##  Median :110.5            Median :26.50   Median :129   Median : 96.0  
##  Mean   :110.5            Mean   :26.50   Mean   :133   Mean   :100.2  
##  3rd Qu.:115.2            3rd Qu.:39.25   3rd Qu.:150   3rd Qu.:113.0  
##  Max.   :120.0            Max.   :52.00   Max.   :263   Max.   :225.0  
##                                                                        
##     p1price         p2price     p1prom   p2prom   country 
##  Min.   :2.190   Min.   :2.29   0:1872   0:1792   AU:104  
##  1st Qu.:2.290   1st Qu.:2.49   1: 208   1: 288   BR:208  
##  Median :2.490   Median :2.59                     CN:208  
##  Mean   :2.544   Mean   :2.70                     DE:520  
##  3rd Qu.:2.790   3rd Qu.:2.99                     GB:312  
##  Max.   :2.990   Max.   :3.19                     JP:416  
##                                                   US:312

Bonus: Q3b. Write R code to generate the following table using the describe() function from the psych package in R.

# attaching data columns of dataframe 'store.df'
attach(store.df)
# make sure you have installed the package 'psych'
library(psych)
describe(store.df)[, c(1:5, 6:8, 13)]  # selected columns
##          vars    n   mean    sd median trimmed   mad    min   se
## storeNum    1 2080 110.50  5.77 110.50  110.50  7.41 101.00 0.13
## Year*       2 2080   1.50  0.50   1.50    1.50  0.74   1.00 0.01
## Week        3 2080  26.50 15.01  26.50   26.50 19.27   1.00 0.33
## p1sales     4 2080 133.05 28.37 129.00  131.08 26.69  73.00 0.62
## p2sales     5 2080 100.16 24.42  96.00   98.05 22.24  51.00 0.54
## p1price     6 2080   2.54  0.29   2.49    2.53  0.44   2.19 0.01
## p2price     7 2080   2.70  0.33   2.59    2.69  0.44   2.29 0.01
## p1prom*     8 2080   1.10  0.30   1.00    1.00  0.00   1.00 0.01
## p2prom*     9 2080   1.14  0.35   1.00    1.05  0.00   1.00 0.01
## country*   10 2080   4.55  1.72   4.50    4.62  2.22   1.00 0.04

QUESTION 4

Q4a. Break-up store.df by country. Write R code to generate the following break-ups by country.

# frequency counts by 'country'
table(country)
## country
##  AU  BR  CN  DE  GB  JP  US 
## 104 208 208 520 312 416 312

Q4b. Write R code to express Q4a numbers in terms of percentages.

# frequency counts by 'country'
tab1 <- table(country)
# proportions
prop.table(tab1)
## country
##   AU   BR   CN   DE   GB   JP   US 
## 0.05 0.10 0.10 0.25 0.15 0.20 0.15
# percentages
prop.table(tab1)*100
## country
## AU BR CN DE GB JP US 
##  5 10 10 25 15 20 15

QUESTION 5

Q5a.The dataset has 2080 rows of weekly data for 20 unique store branches for two different years. For product 1 (Coke) and product 2 (Pepsi), write R code to count the number of weeks,

  1. When both products were under promotion?

  2. When product 1 (Coke) was under promotion but product 2 (Pepsi) was not under promotion?

  3. When product 1 (Coke) was not under promotion but product 2 (Pepsi) was under promotion?

  4. When neither product was under promotion?

Q5b. Express Q5a as in percentages.

Q5c. Write R code to calculate the average price of product 1 (Coke) and product 2 (Pepsi) when both were under promotion.

Q5d. Write R code to calculate the average price of product 1 (Coke) and product 2 (Pepsi) when neither of them was under promotion.

Q5e. Write R code to calculate the average price of product 1 (Coke) and product 2 (Pepsi) when only one of them was under promotion.

agg1 <- aggregate(list(p1price, p2price), by = list(p1prom, p2prom), mean)
colnames(agg1) <- c("P1 Promotion", "P2 Promotion", "Average P1 Price", "Average P2 Price")
agg1
##   P1 Promotion P2 Promotion Average P1 Price Average P2 Price
## 1            0            0         2.543342         2.700891
## 2            1            0         2.524659         2.703068
## 3            0            1         2.561484         2.691953
## 4            1            1         2.568125         2.671250

QUESTION 6

Q6. Analyze the above numbers. Think like a Senior Manager. Prepare a list of qualitative insights on the “Effect of Promotions on Sales” of Coke and Pepsi, to be shared in class.