Individual Details

Executive Statement

Coles and Woolworths are the two most dominant supermarket chains in Australia. As customers, are Australians better off shopping at one rather than the other? The aim of this investigation is to determine whether Coles or Woolworths is significantly cheaper than the other.

Sample

To complete this research, the current prices of 45 matched supermarket products were collected from Coles and Woolworths. Data was collected via the Coles and Woolworths online shopping websites. I created an initial list of generic supermarket items, eg eggs, bread, based on the Choice Magazine Comparison. This initial list had 31 items, to which I added further items to bring the total number of sampled items to 45.

The pairs of products sampled were collected on the same day (6 May 2017) and were perfect matches with the same brand, size, weight, flavour etc. As such, no homebrand items were sampled. All product prices are full price; sale items were not included. The collection process began with a search on each supermarket website for the generic item, eg “eggs”, and I then selected the first item listed that matched on both websites. This method could be improved considerably to create a more random sample; however, for the purpose of this assignment, it has provided a satisfactory sample that covers a variety of products.

Variables

Date: Data collection date

Item: Generic description of supermarket item

Detailed Product Description: specific details eg brand and weight, of sampled items.

Coles Price: Price of item at Coles Supermarket

Woolworths Price: Price of item at Woolworths Supermarket

Procedure

Collated data was entered into an Excel spreadsheet, saved as a CSV file and imported into R Studio to be analysed. Initial data inspection included summarising data, visualising data and comparing data to a normal distribution. This led to filtering out 4 outliers. As the research requires comparing two dependent data sets, the cost of 45 matched items at Coles and Woolworths, the Paired-Sample t-test was selected to test the null hypothesis that there is no significant price difference between Coles and Woolworths.

Results

The initial review of the data shows that Coles items are generally more expensive than Woolworths items, and the paired t-test confirms that this difference is statistically significant. The data analysis rejects the null hypothesis.

Load Packages and Data

library(dplyr)
library(ggplot2)
library(magrittr)
library(lubridate)
library(granova)
library(car)
library(readr)
Supermarket <- read_csv("~/Documents/Master (GIS)/Statistics/Assignment 3/Assignment 3/Supermarket comparison 060517.csv")
Coles<-Supermarket$`Coles Price`
Woolies<-Supermarket$`Woolworths Price`

Summary Statistics

Summary statistics for Coles product prices show that the mean product price is $6.01, with products ranging between $1.40 and $20.00.

Supermarket %>% summarise(
  Min = min(Coles, na.rm = TRUE),
  Q1 = quantile(Coles, probs = .25, na.rm = TRUE),
  Median = median(Coles, na.rm = TRUE),
  Q3 = quantile(Coles, probs = .75, na.rm = TRUE),
  Max = max(Coles, na.rm = TRUE),
  Mean = mean(Coles, na.rm = TRUE),
  SD = sd(Coles, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(Coles)))

Summary statistics for Woolworths product prices show that the mean product price is $5.69, about 32 cents cheaper than Coles, with products also ranging in price between $1.40 and $20.00.

Supermarket %>% summarise(
  Min = min(Woolies, na.rm = TRUE),
  Q1 = quantile(Woolies, probs = .25, na.rm = TRUE),
  Median = median(Woolies, na.rm = TRUE),
  Q3 = quantile(Woolies, probs = .75, na.rm = TRUE),
  Max = max(Woolies, na.rm = TRUE),
  Mean = mean(Woolies, na.rm = TRUE),
  SD = sd(Woolies, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(Woolies)))
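The same summary is computed twice above, once per supermarket. As a minimal sketch, the repeated statistics could be wrapped in one function. The name `price_summary` and the example prices are hypothetical and are not part of the original analysis; the function uses base R only.

```r
# Hypothetical helper: the summary statistics used above, for any price vector
price_summary <- function(x) {
  data.frame(
    Min     = min(x, na.rm = TRUE),
    Q1      = unname(quantile(x, probs = 0.25, na.rm = TRUE)),
    Median  = median(x, na.rm = TRUE),
    Q3      = unname(quantile(x, probs = 0.75, na.rm = TRUE)),
    Max     = max(x, na.rm = TRUE),
    Mean    = mean(x, na.rm = TRUE),
    SD      = sd(x, na.rm = TRUE),
    n       = length(x),         # number of rows, including missing values
    Missing = sum(is.na(x))      # number of missing prices
  )
}

# Made-up prices, only to show the call
example_prices <- c(1.40, 3.50, 6.00, 20.00, NA)
price_summary(example_prices)
```

With the data loaded as above, `price_summary(Coles)` and `price_summary(Woolies)` would be expected to give the same statistics as the two `summarise()` calls.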

To visualise these statistics, a box plot has been selected to illustrate the similarities and differences between the two supermarkets' product prices, specifically the median, interquartile range, overall range and outliers. The plot shows that the data for each of the two supermarkets is very similar; however, Woolworths has a smaller range in price between Q1 and Q3 compared to Coles.

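# xaxt = "n" suppresses the default 1/2 tick labels so the supermarket names can be drawn instead
boxplot(Coles, Woolies, ylab = "Product Price $", xlab = "Supermarket", xaxt = "n")
axis(1, at = 1:2, labels = c("Coles", "Woolworths"))
title(main = "Boxplot Comparing Supermarket Prices")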

The Line Plot is suitable for comparing paired samples, demonstrating the price difference between each individual product sampled in one visualisation. Horizontal lines demonstrate little or no difference, angled lines demonstrate difference. This Line Plot shows that most product prices don’t vary greatly between the two supermarkets, some products are more expensive at Coles than Woolworths, and only one product appears to be more expensive at Woolworths than Coles.

matplot(t(data.frame(Coles, Woolies)),type = "b",pch = 19,col = 1,lty = 1, xlab = "Supermarket", ylab = "Product Price", xaxt = "n")
axis(1, at = 1:2, labels = c("Coles", "Woolworths"))
title(main = "Line Plot comparing Supermarket Prices")

Calculate the difference in price for each product, Coles minus Woolworths, to visualise it and compare it to a normal distribution.

Supermarket_Costs_Differences <- Supermarket %>% mutate(d = Coles - Woolies)
qqPlot(Supermarket_Costs_Differences$d, dist="norm", ylab = "Price Difference", xlab = "Normal Quantiles")

The Q-Q plot shows that most of the data is within the normal distribution range with some outliers at each end. Most of the data is quite crowded between the -1 and 1 quantiles. To improve the distribution and the further data analysis, the outliers with a price difference greater than $1.50 have been filtered out. This process removed four product items; chicken, tampons, strawberries and dish washing tablets.

Supermarket_Clean_Difference<-filter(Supermarket_Costs_Differences, d < 1.5 & d > -1.50) 
qqPlot(Supermarket_Clean_Difference$d, dist="norm", ylab = "Price Difference", xlab = "Normal Quantiles")

After filtering, as illustrated in the refreshed Q-Q plot, the price differences between Coles and Woolworths range from -$0.20 to $1.00, rather than extending to $3.00. This data still favours Woolworths as the cheaper supermarket.

Variables have been filtered to remove outliers to be able to continue to conduct the analysis.

Coles_C<-Supermarket_Clean_Difference$`Coles Price`
Woolies_C<-Supermarket_Clean_Difference$`Woolworths Price`

The filtered supermarket data with 41 items has been used to run a statistical summary of the difference between supermarket prices:

Supermarket_Clean_Difference %>% summarise(
  Min = min(d, na.rm = TRUE),
  Q1 = quantile(d, probs = .25, na.rm = TRUE),
  Median = median(d, na.rm = TRUE),
  Q3 = quantile(d, probs = .75, na.rm = TRUE),
  Max = max(d, na.rm = TRUE),
  Mean = mean(d, na.rm = TRUE),
  SD = sd(d, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(d)))

The results show that products are on average $0.27 more expensive at Coles than at Woolworths. The price difference between the two stores varies from Woolworths being $0.20 more expensive than Coles to Coles being $0.99 more expensive than Woolworths. The standard deviation is 0.30.

Hypothesis Test

The null hypothesis of this test is that there is no significant price difference between matched products purchased at Coles and Woolworths. The alternative hypothesis is that there is a significant difference.

Hypotheses for the paired (dependent) samples t-test:

Null Hypothesis, $H_0: \mu_\Delta = 0$

Alternative Hypothesis, $H_A: \mu_\Delta \neq 0$

Assumptions:

Comparing the product price difference, $\mu_\Delta$, between two matched products, $d_i = x_{i2} - x_{i1}$.

The differences, $\Delta$, are normally distributed.

A large sample has been used (filtered data set = 41 samples).

Decision Rules:

Reject $H_0$ if the $p$-value < 0.05 ($\alpha$ significance level).

Reject $H_0$ if the 95% Confidence Interval (CI) of the mean difference does not capture $H_0: \mu_\Delta = 0$.

If neither of these occur, fail to reject $H_0$.

Conclusion: Test will be statistically significant if the H0 is rejected. If the test results fail to reject the H0, the test is not statistically significant.
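The decision rules above can be sketched as a small helper. This is only an illustration: the function name `decide_h0`, its arguments and the example inputs are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: applies the two decision rules to a p-value and a
# confidence interval for the mean difference.
decide_h0 <- function(p_value, ci, alpha = 0.05, null_value = 0) {
  p_rule  <- p_value < alpha                            # rule 1: p-value below alpha
  ci_rule <- null_value < ci[1] || null_value > ci[2]   # rule 2: CI does not capture the null value
  if (p_rule || ci_rule) "reject H0" else "fail to reject H0"
}

# Made-up inputs, purely to show both outcomes
decide_h0(p_value = 0.20, ci = c(-0.10, 0.40))  # p >= alpha and the CI contains 0
decide_h0(p_value = 0.01, ci = c(0.05, 0.40))   # p < alpha and the CI excludes 0
```

For a result stored from `t.test()`, for example `res <- t.test(...)`, the inputs would be `res$p.value` and `res$conf.int`. For a two-sided test at the matching confidence level, the two rules should always agree.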

Paired t-test calculations

t.test(Coles_C, Woolies_C, paired = TRUE, alternative = "two.sided")

    Paired t-test

data:  Coles_C and Woolies_C
t = 5.731, df = 40, p-value = 1.129e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1736776 0.3629078
sample estimates:
mean of the differences 
              0.2682927 

The 95% confidence interval (CI) of the mean difference is reported as [0.174, 0.363]. As the 95% CI does not capture the null value of 0, we reject the Null Hypothesis. There is a statistically significant mean difference between supermarket prices, with Coles on average more expensive than Woolworths.

p-value

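# Two-sided p-value: twice the upper-tail area beyond |t|, with df = n - 1 = 40
2 * pt(q = abs(5.731), df = 40, lower.tail = FALSE)

Using the t statistic and degrees of freedom from the Paired t-test, the two-sided p-value is twice the upper-tail area beyond t = 5.731 with 40 degrees of freedom. This agrees with the p-value reported by the t-test output above, 1.129e-06. As this is much smaller than the significance level of 0.05 (alpha), we reject the Null Hypothesis. There is a significant mean difference in supermarket prices.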

Summary of results

t = 5.731.

Degrees of freedom for this sample is 40 (df=nΔ−1, df=41-1).

95% CI [0.174, 0.363]

Mean of the difference = 0.268

P-value = 1.129e-06
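As a rough cross-check, the t statistic, confidence interval and p-value can be recomputed from the summary statistics alone. The sketch below takes the mean difference and sample size from the t-test output and the standard deviation as quoted in the text (rounded to 0.30), so the results are approximate. The variable names are illustrative.

```r
# Values taken from the report: mean and n from the t-test output,
# SD as quoted in the text (rounded, so results are approximate)
mean_d <- 0.2682927
sd_d   <- 0.30
n      <- 41

se     <- sd_d / sqrt(n)   # standard error of the mean difference
t_stat <- mean_d / se      # paired t statistic
df     <- n - 1            # degrees of freedom

# Two-sided p-value: twice the upper-tail area beyond |t|
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95% confidence interval for the mean difference
ci <- mean_d + c(-1, 1) * qt(0.975, df = df) * se

round(c(t = t_stat, lower = ci[1], upper = ci[2]), 3)
p_value
```

Because the SD is rounded, the recomputed t statistic differs slightly from 5.731, but the interval should match [0.174, 0.363] to three decimal places and the p-value should be of the same order as 1.129e-06.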

Interpretation

A paired-samples t-test was used to test for a significant mean difference between Coles and Woolworths supermarket product prices. The investigation found the mean difference to be 0.268 (SD = 0.300).

Visual inspection of the Q-Q plot showed that the original data set was not normally distributed. Subsequently, four items were filtered from the data to approximate the desired normal distribution.

The paired-samples t-test found a statistically significant mean difference between the prices of matched products at Coles and Woolworths, t(df=40)=5.731, p < 0.001, 95% CI [0.174, 0.363]. According to this data, matched products are on average cheaper at Woolworths than at Coles.

Discussion

Consistent with the initial inspection of the supermarket comparison data, which showed Woolworths products to be cheaper, the results from both the p-value and the 95% CI support the decision to reject the Null Hypothesis. According to these results, which supermarket we choose to shop in does make a statistically significant, if modest, difference to our ‘back pocket’ when purchasing full-priced, branded items: on average about $0.27 per item in favour of Woolworths.

The strengths of this investigation include the consistency achieved in matching items between Coles and Woolworths, the number of items sampled, and the consistent methodology. However, the method of collecting the specific products could be improved.

Limitations arise in the comparison of only full-priced, branded items. Savings between supermarkets can potentially be made through selecting sale items or cheaper house brands. In addition, the prices were only collected from the respective supermarket websites, not from visiting supermarkets. Prices in supermarkets vary across the country due to transport and availability of product items, and variation by location has not been considered in this investigation.

Additional limitations include only viewing the data through the statistical lens of a single test. The paired t-test has tested each Coles item against its match at Woolworths, but the cumulative difference has not been investigated. The average price difference between supermarkets in the filtered data was found to be $0.27. This looks small per item, but when purchasing a trolley full of products, an average saving of $0.27 per product adds up. When purchasing all the sampled items, the Coles total was $270.34 and the Woolworths total was $256, a saving of $14.34. If this were an average family's weekly groceries, it would add up to a saving of about $746 across a year.
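The basket arithmetic can be checked directly from the two quoted totals. This is a minimal sketch that assumes the quoted Woolworths total of $256 is exact and that the same basket is bought once a week for 52 weeks. The variable names are illustrative.

```r
# Basket totals as quoted in the report (all sampled items)
coles_total      <- 270.34
woolworths_total <- 256.00   # assumption: the quoted "$256" is exact

# Saving on one basket when shopping at Woolworths instead of Coles
basket_saving <- coles_total - woolworths_total

# Assumption: the same basket is bought once a week, 52 weeks a year
annual_saving <- basket_saving * 52

round(basket_saving, 2)
round(annual_saving, 2)
```

Under these assumptions the saving is $14.34 per basket and $745.68 per year.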

