library(tidyverse)
library(rafalib)
library(knitr)
data = read.csv('uk_retail.csv')
# Parse the purchase date, then add columns for the month and day of purchase
data$Date = as.Date(data$Date, "%m/%d/%Y")
data$Month <- as.numeric(format(data$Date, format = "%m"))
data$Day <- as.numeric(format(data$Date, format = "%d"))

#Add new column for profit, which is price times quantity sold
data$Profit = data$Price*data$Quantity

#Filter data for items containing Christmas in product name
christmas = data %>% filter(grepl('Christmas', ProductName))

Emma Botton - Tesco Marketing Communications Director


Tesco is a British multinational groceries and general merchandise retailer. It is the market leader in groceries in the UK, where it has a market share of around 28.4%. Tesco owns 6,809 shops around the world, with a reported 79 million shopping trips per week.

Recommendation

Christmas advertising needs to start before November, because a large share of transactions for Christmas items took place before then. The mean month of purchase was October (10.15), and the profit made in November and December was comparable to that made from August to October. Waiting until after Halloween to begin Christmas marketing is therefore too late. The pumpkins and Christmas trees need to be in your stores at the same time, embracing the festivities.

Evidence

Items sold during the festive periods of Christmas, Easter and Halloween are not always in stock year-round. It is therefore essential to be strategic about when to order stock and when to advertise around these periods. The following graph shows the profit recorded by an e-commerce store during 2019. The data were filtered to show only items with the word Christmas in their product name.

# Y axis is rescaled to thousands of dollars so the labels fit
ggplot(christmas, aes(x = Month, y = Profit/1000)) +
  geom_bar(stat = 'identity') +
  ggtitle('Revenue per month on Christmas items') +
  scale_x_continuous(breaks = seq(1, 12, by = 1)) +
  ylab('Profit / $\'000')

The graph shows that most of the profit from Christmas items was recorded in the months leading up to December, which raises the question of when Christmas advertising should start. According to a previous research article, the majority of Christmas profits were recorded in November and December, after Halloween was over (Gierl, 2021).

However, this dataset suggests an alternative hypothesis: that the mean month for Christmas sales is before November. A one-sample t-test gave a sample mean month of 10.15 with a p-value of \[ p < 2.2 \times 10^{-16} \] This is strong evidence against the original idea that the mean month is November, so the mean month of sale was before November. This is also reflected in the following table.

Month <- c('July','August','September','October','November','December')
# Total profit per month (July to December), rounded to thousands of dollars.
# Rows are selected by a logical match on Month; which() drops any NA months.
Sales <- sapply(7:12, function(m) round(sum(christmas$Profit[which(christmas$Month == m)])/1000))
df = data.frame(Month,Sales)
kable(df,col.names = c("Month","Sales ($'000)"), caption = 'Table of profit recorded from July to December 2019 on items with Christmas in product name')
Table of profit recorded from July to December 2019 on items with Christmas in product name

| Month     | Sales ($'000) |
|:----------|--------------:|
| July      |            60 |
| August    |            96 |
| September |           331 |
| October   |           790 |
| November  |          1162 |
| December  |           545 |
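
To make the timing argument easier to read off the table, the short sketch below computes the running share of the July to December total reached by the end of each month. It uses only the six figures printed in the table; the names `sales` and `cum_share` are illustrative and not part of the original analysis.

```r
# Monthly totals in $'000, copied from the table above
sales <- c(July = 60, August = 96, September = 331,
           October = 790, November = 1162, December = 545)

# Running share of the July-December total reached by the end of each month
cum_share <- cumsum(sales) / sum(sales)

# Show the shares as percentages, rounded to one decimal place
print(round(100 * cum_share, 1))
```

With the table's figures, roughly 43% of the July to December total has been recorded by the end of October.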

Defense of Approach

The client needed to be in the field of marketing and advertising. I chose Tesco, a large retailer in the United Kingdom, partly because most of the retail dataset comes from the UK, so it was logical to address the report to a British company rather than, for example, Woolworths. Since the client is senior in the company, I tailored the report to exclude most statistical testing from the evidence and focused on using the graph and table to strengthen my argument.

I chose hypothesis testing to determine which month represented the mean for Christmas sales. The null hypothesis, based on the research article, is that the mean month of Christmas sales was November.

Hypothesis

\[H_0: \text{The mean month for Christmas sales was equal to 11 (November)}\] \[H_1: \text{The mean month for Christmas sales was less than 11 (before November)}\]

\[\text{Significance level: } \alpha = 0.05\]

Assumptions:

  • Independence: transactions in one month do not influence other transactions. This assumption is violated, however, because some customers purchased items in multiple months.
  • Normality: the sample mean is assumed to be approximately normally distributed. With 24,997 transactions, the sample is large enough for the central limit theorem to support this, even though month is a discrete value.
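
One way to probe the independence concern is to collapse the data to a single value per customer before testing. The sketch below is illustrative only: it runs on synthetic data, and `CustomerID` is a hypothetical column name that may not match the real dataset.

```r
set.seed(1)

# Synthetic stand-in for `christmas`; CustomerID is a hypothetical column
toy <- data.frame(
  CustomerID = sample(1:200, 1000, replace = TRUE),
  Month      = sample(7:12, 1000, replace = TRUE,
                      prob = c(0.02, 0.04, 0.12, 0.30, 0.35, 0.17))
)

# Collapse to one observation per customer: the mean month of their purchases
per_customer <- aggregate(Month ~ CustomerID, data = toy, FUN = mean)

# Same one-sided test as in the report, now on one value per customer
res <- t.test(per_customer$Month, mu = 11, alternative = "less")
print(res)
```

Repeat purchases by the same customer then contribute a single observation, at the cost of a smaller sample.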

T test:

t.test(x = christmas$Month, mu = 11, alternative = 'less')
## 
##  One Sample t-test
## 
## data:  christmas$Month
## t = -70.881, df = 24996, p-value < 2.2e-16
## alternative hypothesis: true mean is less than 11
## 95 percent confidence interval:
##      -Inf 10.17127
## sample estimates:
## mean of x 
##  10.15158
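
As a check on the output above, the standard error and the upper confidence bound can be recovered from the reported values using the usual one-sample t statistic. The critical value 1.645 is the large-sample 95% one-sided quantile, which is an approximation for 24996 degrees of freedom.

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \bar{x} = 10.15158,\ \mu_0 = 11,\ n = 24997 \]

\[ \frac{s}{\sqrt{n}} = \frac{\bar{x} - \mu_0}{t} = \frac{10.15158 - 11}{-70.881} \approx 0.01197 \]

\[ \bar{x} + 1.645 \times \frac{s}{\sqrt{n}} \approx 10.15158 + 1.645 \times 0.01197 \approx 10.1713 \]

This agrees with the reported upper bound of 10.17127.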

P-value:

pt(-70.881, 24996, lower.tail = TRUE)
## [1] 0

\[ p \ll \alpha \]

Conclusion:

There is sufficient evidence to conclude that the mean month of sale for Christmas items was in fact before November. The sample mean was 10.15, which gave an extremely small p-value. Thus, the null hypothesis can confidently be rejected and we can conclude that the mean month for Christmas sales was less than 11 (before November).

Limitations

  • Some Christmas items may not have the word Christmas in their name, so could not be counted in the dataset. For example, a product called ‘Bauble Decoration’ would not have been counted even though it is clearly Christmas related.
  • The month of purchase was expressed only as an integer, so items bought on October 1 and October 31 both carried the value 10, which may affect the t-test.
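
Both limitations could be reduced with small changes to the data preparation. The sketch below is a rough illustration on made-up rows, not the analysis used in this report: the keyword list is a hypothetical example, and the day of year stands in for the integer month.

```r
# Synthetic stand-in for `data`; the product names below are made up
toy <- data.frame(
  ProductName = c("Christmas Tree 6ft", "XMAS Gift Wrap", "Bauble Decoration",
                  "Garden Hose", "christmas lights"),
  Date = as.Date(c("2019-10-01", "2019-10-31", "2019-11-15",
                   "2019-07-04", "2019-12-20"))
)

# Limitation 1: case-insensitive match on a (hypothetical) keyword list
keywords <- "christmas|xmas|bauble"
festive <- toy[grepl(keywords, toy$ProductName, ignore.case = TRUE), ]

# Limitation 2: day of year keeps October 1 and October 31 distinct
festive$DayOfYear <- as.numeric(format(festive$Date, "%j"))

# 1 November 2019 as a day of year, the analogue of mu = 11
nov1 <- as.numeric(format(as.Date("2019-11-01"), "%j"))

print(festive)
print(nov1)
```

On the real data, the same one-sided test could then be run as `t.test(festive$DayOfYear, mu = nov1, alternative = 'less')`.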

Acknowledgements