Group/Individual Details

Executive Statement

Australia is dominated by two supermarket chains; Coles and Woolworths and the intention of this report is to find whether there is a price difference between them. By randomly sampling approximately 1% of matching food items from both supermarkets and comparing the non sale prices, a statistically significant price difference was found.

When collecting the data, the decision was made to use online prices rather than in-store to try to mitigate price differences by store location, furthermore, as the primary item sold is food products, we decided to focus on those items. On the Coles website, the number of items per department or category is listed. Use this to randomly generate an item to look up. For example, there are 213 herb items. We chose 2 random numbers between 1 and 213 by using r code: “sample(213,2)”. For ease of use, order the samples by “%>% sort”. We found the n-th item in Coles category, then search for the same item (brand and size) via the search function in Woolworths. If the item does not exist, proceed to the next item from Coles etc.. recording only the full price items as this will not change as frequently as weekly specials.

The categories of items that have been included are Bakery (4); Fruit and Vegetables (5); Meat, Seafood and Deli (11); Dairy, Eggs and Deli (13); Pantry (59); and Frozen (10). This compares 102 equivalent food items from both stores.

Testing our hypothesis, we found that Woolworths items were consistently cheaper than Coles. When we looked at a categorised view, pantry items which were made up a large proportion of the sample, tended to be more expensive at Coles.

Although this report did find a significant difference in price, there was a number of limitations which need to be accounted before we can definitively conclude that Woolworths is cheaper than Coles.

Load Packages and Data

# This is a chunk where you can load the necessary data and packages required to reproduce the report
# You should also include your code required to prepare your data for analysis. 
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Supermarket_wars <- read.csv("supermarket_wars.csv", header = FALSE, col.names = c("ID","Isle","Specific_Isle","Item","Price","Store"))
Woolworths <- filter(Supermarket_wars, Supermarket_wars$Store=="Woolworths")
Coles <- filter(Supermarket_wars, Supermarket_wars$Store=="Coles")

Woolworths.Pantry <- filter(Supermarket_wars, Supermarket_wars$Store=="Woolworths"&  Supermarket_wars$Isle=="Pantry")
Coles.Pantry <- filter(Supermarket_wars, Supermarket_wars$Store=="Coles"& Supermarket_wars$Isle=="Pantry")

Woolworths.Frozen <- filter(Supermarket_wars, Supermarket_wars$Store=="Woolworths"&  Supermarket_wars$Isle=="Frozen")
Coles.Frozen <- filter(Supermarket_wars, Supermarket_wars$Store=="Coles"& Supermarket_wars$Isle=="Frozen")

Woolworths.FV <- filter(Supermarket_wars, Supermarket_wars$Store=="Woolworths"&  Supermarket_wars$Isle=="Fruit_and_Veg")
Coles.FV <- filter(Supermarket_wars, Supermarket_wars$Store=="Coles"& Supermarket_wars$Isle=="Fruit_and_Veg")

Woolworths.MSD <- filter(Supermarket_wars, Supermarket_wars$Store=="Woolworths"&  Supermarket_wars$Isle=="Meat_Seafood_Deli")
Coles.MSD <- filter(Supermarket_wars, Supermarket_wars$Store=="Coles"& Supermarket_wars$Isle=="Meat_Seafood_Deli")

Woolworths.Bakery <- filter(Supermarket_wars, Supermarket_wars$Store=="Woolworths"&  Supermarket_wars$Isle=="Bakery")
Coles.Bakery <- filter(Supermarket_wars, Supermarket_wars$Store=="Coles"& Supermarket_wars$Isle=="Bakery")

Woolworths.Perishable <- filter(Supermarket_wars, Supermarket_wars$Store=="Woolworths"&  Supermarket_wars$Isle=="Bakery"|Supermarket_wars$Isle=="Fruit_and_Veg"|Supermarket_wars$Isle=="Meat_Seafood_Deli"& Supermarket_wars$Price < 10)
Coles.Perishable <- filter(Supermarket_wars, Supermarket_wars$Store=="Coles"& Supermarket_wars$Isle=="Bakery"|Supermarket_wars$Isle=="Fruit_and_Veg"|Supermarket_wars$Isle=="Meat_Seafood_Deli"&Supermarket_wars$Price < 10)

Summary Statistics

# This is a chunk for your summary statistics and visualisation code

Supermarket_wars %>% group_by(Store) %>% summarise(Min = min(Price),Q1 = quantile(Price,probs = .25), Median = median(Price), Q3 = quantile(Price,probs = .75), Max = max(Price),Mean = mean(Price),SD = sd(Price), n = n())
## # A tibble: 2 × 9
##        Store   Min    Q1 Median    Q3   Max     Mean      SD     n
##       <fctr> <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>   <dbl> <int>
## 1      Coles  0.85  3.02   4.00  5.26 13.50 4.445294 2.35029   102
## 2 Woolworths  0.85  3.00   3.97  5.00 13.89 4.217353 2.17590   102
boxplot(Price ~ Store, data= Supermarket_wars)

matplot(t(data.frame(Coles.Pantry$Price, Woolworths.Pantry$Price)), type = "b", pch = 19, col = 1, lty = 1, xlab = "",ylab = "Item Price ($)", xaxt="n")
title("Pantry Items")
axis(1, at=1:2, labels = c("Coles","Woolworths"))

matplot(t(data.frame(Coles.Frozen$Price, Woolworths.Frozen$Price)), type = "b", pch = 19, col = 1, lty = 1, xlab = "",ylab = "Item Price ($)", xaxt="n")
title("Frozen Items")
axis(1, at=1:2, labels = c("Coles","Woolworths"))

#matplot(t(data.frame(Coles.FV$Price, Woolworths.FV$Price)), type = "b", pch = 19, col = 1, lty = 1, xlab = "",ylab = "Item Price ($)",xaxt="n")
#title("Fruit and Veg")
#axis(1, at=1:2, labels = c("Coles","Woolworths"))


#matplot(t(data.frame(Coles.Bakery$Price, Woolworths.Bakery$Price)), type = "b", pch = 19, col = 1, lty = 1, xlab = "",ylab = "Item Price ($)",xaxt="n")
#title("Bakery")
#axis(1, at=1:2, labels = c("Coles","Woolworths"))

#matplot(t(data.frame(Coles.MSD$Price, Woolworths.MSD$Price)), type = "b", pch = 19, col = 1, lty = 1, xlab = "",ylab = "Item Price ($)",xaxt="n")
#title("Meat, Seafood and Deli")
#axis(1, at=1:2, labels = c("Coles","Woolworths"))

matplot(t(data.frame(Coles.Perishable$Price, Woolworths.Perishable$Price)), type = "b", pch = 19, col = 1, lty = 1, xlab = "",ylab = "Item Price ($)",xaxt="n")
title("Perishable foods")
axis(1, at=1:2, labels = c("Coles","Woolworths"))

The box-plots from both supermarket chains look very similar. If there is a difference, Woolworths is cheaper for most products.

The mat-plots show that perishable foods (Bakery, Fruit and Veg, and Meat, Seafood and Deli) are mostly the same, if not, the items are more expensive at Woolworths.

Woolworths consistently has cheaper prices of pantry items. This is reflected by all min, Q1, median, Q3 and max statistics for Woolworths being less than Coles. Frozen items are most likely to be cheaper at Woolworths.

Hypothesis Test

Use R to perform an appropriate hypothesis test to determine which supermarket is the cheapest. You need to explain your choice of hypothesis test, any assumptions and the significance level.

We use a paired t-test, as each observation directly relates to another in the opposing supermarket. We defined the null hypothesis to be that there is no price difference between the two supermarket chains across each product; the alternative hypothesis being that there is some price difference between Coles and Woolworths.

# This is a chunk for your hypothesis testing code.
t.test(Woolworths$Price, Coles$Price, paired = TRUE, mu=0)
## 
##  Paired t-test
## 
## data:  Woolworths$Price and Coles$Price
## t = -4.7298, df = 101, p-value = 7.314e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3235431 -0.1323392
## sample estimates:
## mean of the differences 
##              -0.2279412

Interpretation

Interpret the results of the hypothesis test by interpreting the \(p\)-value and confidence intervals and commenting on the statistical significance of the findings.

The paired-samples t-test found a statistically significant difference between the prices in Coles and Woolworths, since p-value is less than 0.05 we reject the null hypothesis. Also note that the null hypothesis value 0 is not captured by the 95% CI (-0.32, -0.13).

Discussion

Discuss the findings of your investigation. What did you conclude? What were the strengths and limitations of your investigation? What improvements could be made or what would you do differently next time?

The data collected provides significant evidence that Woolworth’s full price items are cheaper than Coles’. As this was on full-priced items this may suggest Coles have items more frequently on-sale to compete.

The strengths of this report include that the items included were selected via random stratified sampling, to reduce the bias in the sample and make sure items were included across food samples was accurate representation of the population. Also the sample itself was reasonably large, which should reduce the standard error.

Limitations of this report included; That the sample only included some food categories, these supermarkets have aisles which are not dedicated to food products, The sample did not account for sales - one supermarket may consistently put more items on sale, reducing the effective price of the item, Comparing home brand on unbranded items may not be exactly equivalent - the quality of the products may differ, Items of larger individual price may contribute more to a difference between stores, than overall cheaper items with a larger percentage discrepancy, Both chains can price differently between store-locations which may mean this finding may not apply it particular suburbs. Matching identical items does not allow for one supermarket stocking an exclusive and cheaper brand of a similar product available at both stores.

Opportunities for further investigation include using weekly specials to see whether this makes a significant difference to the pricing of items and sampling aisles with non-food products to see whether the finding is consistent for those items.