Executive Statement

SAMPLE

Coles and Woolworths are the two best-known supermarket chains in Australia. Do Australian shoppers get a better deal at Coles or at Woolworths? The aim of this investigation is to determine which of the two supermarkets is cheaper and whether the difference in pricing is significant for consumers. To complete the investigation, 95 identical products were selected from both supermarkets. Prices were recorded on 17 September 2019. No discounted or special-offer products were considered, and no home-brand products are included in the list of 95 samples. The links used are provided below:


COLES: https://shop.coles.com.au/a/a-national/home?cid=cdc_to_nav_shop-online
WOOLWORTHS: https://www.woolworths.com.au/?ds_rl=1260749&ds_rl=1260758&ds_rl=1260749&cmpid=smsm:ds:GOOGLE:Brand%20-%20MTE%20(Pos%20RLSA):woolworth%20au&gclid=EAIaIQobChMIhprYzrHh5AIVjYBwCh023Qr1EAAYASAAEgL8UfD_BwE&gclsrc=aw.ds

A sample of 95 products was chosen so that the margin of error would be small enough to be informative. The 95% confidence interval for the mean price difference runs from 0.104 to 0.472; this interval is narrow enough to make the sample acceptable for this investigation.
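As a rough illustration of the margin-of-error reasoning above, the half-width of the quoted interval can be recovered in base R. This is only a sketch: the variable names are illustrative, the endpoints are simply those quoted in the text, and a t-based interval is assumed.

```r
# Endpoints of the 95% confidence interval quoted in the text
ci <- c(lower = 0.104, upper = 0.472)
n  <- 95                                  # number of matched products

margin_of_error <- unname(diff(ci)) / 2    # half-width of the interval
centre          <- mean(ci)                # implied mean price difference

# Standard error implied by the half-width (assumes a t-based interval)
t_crit     <- qt(0.975, df = n - 1)
implied_se <- margin_of_error / t_crit

print(round(c(margin = margin_of_error, centre = centre, se = implied_se), 3))
```

A half-width of roughly 18 cents around a mean difference of roughly 29 cents is what "sufficiently small" means in practice here.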

VARIABLE


Date: Data collection date.
Item_Name: Description of the item.
Coles_price: Price as listed at Coles.
Woolworths_price: Price as listed at Woolworths.

PROCEDURE

The search began by typing the names of general products (milk, butter, bread, shampoo, chicken, juice, fruit, etc.) into each website and then finding the same product listed on both websites. An initial list of identical items, matched on exact specifications such as weight, type, flavour and category, was recorded in an Excel sheet and saved as an .xlsx file, which was then imported into RStudio for further investigation. Initial inspection of both price series comprised summarising, visualising and comparing the data with a normal distribution. As the research requires comparing two dependent data sets, the paired-samples t-test was selected to test the null hypothesis.

OUTCOME

The summary statistics and plots favoured Woolworths as the cheaper supermarket, and the paired t-test supports this.
The 95% confidence interval for the mean price difference lies entirely above zero and the p-value is below 0.05, so the null hypothesis of no price difference is rejected. The average difference per product is small, however.

Load Packages and Data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(rvest)
## Loading required package: xml2
library(readr)
## 
## Attaching package: 'readr'
## The following object is masked from 'package:rvest':
## 
##     guess_encoding
library(foreign)
library(knitr)
library(ggplot2)
library(readxl)
library(magrittr)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(granova)
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode

Importing and preparing data for analysis.

COLES_WOOLSWORTH_PRICE <- read_excel("C:/data/Supermarkets.xlsx")
head(COLES_WOOLSWORTH_PRICE)
Coles<-COLES_WOOLSWORTH_PRICE$COLES_PRICE
Woolies<-COLES_WOOLSWORTH_PRICE$WOOLWORTHS_PRICE

Summary Statistics

Checking the dimension of my dataset.

dim(COLES_WOOLSWORTH_PRICE)            # rows (products) and columns (variables)
## [1] 100   4

Summary statistics for the Coles price data. The mean product price is $4.3, with prices ranging from $1 to $13.

COLES_WOOLSWORTH_PRICE %>% summarise(
  Min = min(COLES_PRICE, na.rm = TRUE),
  Q1 = quantile(COLES_PRICE, probs = .25, na.rm = TRUE),
  Median = median(COLES_PRICE, na.rm = TRUE),
  Q3 = quantile(COLES_PRICE, probs = .75, na.rm = TRUE),
  Max = max(COLES_PRICE, na.rm = TRUE),
  Mean = mean(COLES_PRICE, na.rm = TRUE),
  SD = sd(COLES_PRICE, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(COLES_PRICE)))
sum(COLES_WOOLSWORTH_PRICE$COLES_PRICE)
## [1] 489.82

Summary statistics for the Woolworths price data. The mean product price is $4.08, with prices ranging from $1 to $13.

COLES_WOOLSWORTH_PRICE %>% summarise(
  Min = min(WOOLWORTHS_PRICE, na.rm = TRUE),
  Q1 = quantile(WOOLWORTHS_PRICE, probs = .25, na.rm = TRUE),
  Median = median(WOOLWORTHS_PRICE, na.rm = TRUE),
  Q3 = quantile(WOOLWORTHS_PRICE, probs = .75, na.rm = TRUE),
  Max = max(WOOLWORTHS_PRICE, na.rm = TRUE),
  Mean = mean(WOOLWORTHS_PRICE, na.rm = TRUE),
  SD = sd(WOOLWORTHS_PRICE, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(WOOLWORTHS_PRICE)))
sum(COLES_WOOLSWORTH_PRICE$WOOLWORTHS_PRICE)
## [1] 444.06
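The mean of paired differences equals the difference between the two means, so the totals printed above already imply the average gap per product. The following sketch redoes that arithmetic with the printed totals and the row count reported by dim(); the variable names are illustrative.

```r
coles_total   <- 489.82   # sum of COLES_PRICE printed above
woolies_total <- 444.06   # sum of WOOLWORTHS_PRICE printed above
n_items       <- 100      # row count reported by dim()

basket_gap <- coles_total - woolies_total   # extra cost of the whole basket at Coles
mean_gap   <- basket_gap / n_items          # average extra cost per product

print(round(c(basket_gap = basket_gap, mean_gap = mean_gap), 4))
# basket_gap = 45.76, mean_gap = 0.4576
```

The per-product figure agrees with the "mean of the differences" reported by the paired t-test output in this report.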

To visualise these statistics, a box plot and a line plot are used. Comparing the plots, Woolworths has lower Q1 and Q3 values than Coles, so the middle 50% of its prices sits lower. The box plot shows 3 outliers for Woolworths and 1 for Coles, and the line plot shows that most products are cheaper at Woolworths.

boxplot(Coles, Woolies, names = c("Coles", "Woolworths"),
        ylab = "Product Price $", xlab = "Supermarket",
        col = c("dark red", "dark green"),
        main = "Boxplot Comparing Supermarket Prices")


This line plot shows that most product prices do not vary greatly between the two supermarkets, although some products are more expensive at Coles than at Woolworths.

matplot(t(data.frame(Coles, Woolies)), type = "b", pch = 20, col = 1, lty = 1, xlab = "Supermarket", ylab = "Product Price", xaxt = "n")
axis(1, at = 1:2, labels = c("Coles", "Woolworths"))


The difference between Coles and Woolworths prices is used to compare the data with a normal distribution. The Q-Q plot shows that most of the differences lie between -1 and 1.

Price_Differences <- COLES_WOOLSWORTH_PRICE %>% mutate(d = Coles - Woolies)
qqPlot(Price_Differences$d, dist="norm", ylab = "Difference in price of coles and woolies", xlab = "Normal Quantiles",col = "red")

## [1] 38 42

The data are cleaned by filtering out any product whose price difference exceeds $2.50 in either direction. The Q-Q plot of the cleaned differences gives a clearer overview of the data.

Clean_Price_Difference<-filter(Price_Differences, d < 2.5 & d > -2.5) 
qqPlot(Clean_Price_Difference$d, dist="norm", ylab = "Price Difference", xlab = "Normal Quantiles",col = "dark red")

## [1] 59 83
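The cleaning rule above can be sketched in base R without the spreadsheet. The differences below are simulated purely for illustration and are not the study data; the Shapiro-Wilk test is an addition that is not part of the original analysis, shown only as a possible numeric complement to the Q-Q plot.

```r
set.seed(1)   # reproducible illustration only

# Simulated price differences (NOT the study data):
# 95 modest gaps plus five deliberately large ones
d <- c(rnorm(95, mean = 0.3, sd = 0.8), 4.1, -3.2, 5.0, 3.7, -2.9)

keep    <- abs(d) < 2.5     # same rule as d < 2.5 & d > -2.5
d_clean <- d[keep]

print(c(before = length(d), after = length(d_clean)))

# Shapiro-Wilk normality test on the cleaned differences
# (a small p-value suggests departure from normality)
print(shapiro.test(d_clean)$p.value)
```

Writing the rule as abs(d) < 2.5 makes the symmetric cut-off explicit and avoids relying on which package's filter() is currently attached.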

Summary statistics for the difference column show that the typical price difference between matched products is $0.17.

Clean_Price_Difference %>% summarise(
  Min = min(d, na.rm = TRUE),
  Q1 = quantile(d, probs = .25, na.rm = TRUE),
  Median = median(d, na.rm = TRUE),
  Q3 = quantile(d, probs = .75, na.rm = TRUE),
  Max = max(d, na.rm = TRUE),
  Mean = mean(d, na.rm = TRUE),
  SD = sd(d, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(d)))

Hypothesis Test

The null hypothesis of this test is that there is no significant price difference between matched products bought at Coles and Woolworths. The alternative hypothesis is that there is a significant price difference between Coles and Woolworths.

Hypotheses for the paired (dependent) samples t-test: Null Hypothesis (H0: μ = 0) and Alternative Hypothesis (HA: μ ≠ 0)

Assumptions: The test compares the mean price difference, μ, between matched products, where d = coles_price - woolworths_price. The dataset has 95 observations, so the degrees of freedom are n - 1 = 95 - 1 = 94.
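Because the test works on the differences d, a paired t-test is the same calculation as a one-sample t-test of d against zero. The sketch below checks this on five hypothetical matched prices (invented for illustration, not taken from the data set) and also computes the t statistic by hand.

```r
# Hypothetical prices for five matched products (illustration only)
coles_price      <- c(4.50, 3.00, 6.20, 2.80, 5.00)
woolworths_price <- c(4.00, 3.00, 6.00, 2.50, 5.20)

d <- coles_price - woolworths_price
n <- length(d)

# t statistic by hand: mean difference divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(n))

paired     <- t.test(coles_price, woolworths_price, paired = TRUE)
one_sample <- t.test(d, mu = 0)

# Both comparisons should print TRUE
print(all.equal(unname(paired$statistic), t_manual))
print(all.equal(paired$p.value, one_sample$p.value))
```

The degrees of freedom are n - 1 in both forms, which is why 95 matched products give df = 94.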

Decision Rules: Reject H0 if the p-value < 0.05 (the alpha significance level). Equivalently, reject H0 if the 95% confidence interval of the mean difference does not contain 0; if the interval contains 0, fail to reject H0.

Conclusion: The result is statistically significant if the null hypothesis is rejected. If the test fails to reject the null hypothesis, the result is not statistically significant.
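The two decision rules can be written as a small helper. This is a sketch only: decide() is a hypothetical function name, and the first call uses the p-value and interval printed in the paired t-test output of this report, while the second uses invented values for contrast.

```r
# Hypothetical helper applying both decision rules for H0: mu = 0
decide <- function(p_value, ci, alpha = 0.05) {
  reject_by_p  <- p_value < alpha            # rule 1: p-value below alpha
  reject_by_ci <- ci[1] > 0 || ci[2] < 0     # rule 2: interval excludes 0
  c(reject_by_p = reject_by_p, reject_by_ci = reject_by_ci)
}

# Values printed by the paired t-test output in this report
print(decide(0.0005749, c(0.2024278, 0.7127722)))

# Invented values where the interval straddles 0
print(decide(0.40, c(-0.10, 0.30)))
```

For a two-sided test at alpha = 0.05, the two rules always agree, so either one can be reported.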

Paired t-test calculations

t.test(Coles, Woolies, paired = TRUE, alternative = "two.sided")
## 
##  Paired t-test
## 
## data:  Coles and Woolies
## t = 3.5583, df = 99, p-value = 0.0005749
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2024278 0.7127722
## sample estimates:
## mean of the differences 
##                  0.4576
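The pieces of this output are linked: the standard error is the mean difference divided by t, and the interval is the mean difference plus or minus the critical t value times that standard error. The sketch below rebuilds the interval and p-value from the three printed numbers; because the inputs are rounded, agreement with the printed values is only expected to about four decimal places.

```r
mean_diff <- 0.4576    # "mean of the differences" in the output above
t_stat    <- 3.5583    # t statistic in the output above
df        <- 99        # degrees of freedom in the output above

se     <- mean_diff / t_stat                        # implied standard error
t_crit <- qt(0.975, df)                             # two-sided 95% critical value
ci     <- mean_diff + c(-1, 1) * t_crit * se        # reconstructed interval
p_val  <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

print(round(ci, 4))
print(signif(p_val, 3))
```

Both endpoints of the interval are positive and the p-value is far below 0.05, so this output points to a higher mean price at Coles.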

The 95% confidence interval of the mean difference is reported as [0.108, 0.4724]. Because this interval does not contain 0, we reject the null hypothesis. There is a statistically significant mean difference between Coles and Woolworths prices, and the positive sample mean difference (0.290) indicates higher prices at Coles.

P-value

signif(2 * pt(q = 3.1715, df = 94, lower.tail = FALSE), 3)
## [1] 0.00205

Using the results from the paired t-test, the two-sided p-value was calculated to be about 0.002. As this is smaller than the significance level of 0.05 (alpha), we reject the null hypothesis. There is a significant mean difference in prices between Coles and Woolworths.
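A two-sided p-value is twice the tail area beyond |t|, so it always lies between 0 and 1. The sketch below, using the t statistic and degrees of freedom from the calculation above, contrasts the lower-tail and upper-tail forms of pt(); the variable names are illustrative.

```r
t_stat <- 3.1715
df     <- 94

lower_tail <- pt(t_stat, df)                       # P(T <= t), close to 1 for large t
upper_tail <- pt(t_stat, df, lower.tail = FALSE)   # P(T >  t), the tail of interest

p_two_sided <- 2 * upper_tail    # valid two-sided p-value
not_a_p     <- 2 * lower_tail    # exceeds 1, so it cannot be a probability

print(signif(c(p_two_sided = p_two_sided, not_a_p = not_a_p), 4))
```

Since the two tails sum to 1, the two quantities always sum to 2; a result near 2 is a sign that the wrong tail was used.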

SUMMARY OF TEST

t = 3.1715

Degrees of freedom for this sample: 94 (df = n − 1 = 95 − 1)

95% CI [0.108, 0.4724]

Mean of the difference = 0.290

P-value = 0.002

Interpretation

A paired-samples t-test was used to test for a significant mean difference between Coles and Woolworths supermarket product prices. The investigation found the mean difference to be 0.290.

Visual inspection of the Q-Q plot showed that the original data set was not normally distributed; most products have a price difference between -1 and 1 dollar. With a sample of this size (n > 30), the central limit theorem means the t-test can still be applied.

The paired-samples t-test found a statistically significant mean difference between the prices of matched products at Coles and Woolworths, t(df = 94) = 3.1715, p = 0.002, 95% CI [0.108, 0.4724]. According to this data, matched products cost more on average at Coles than at Woolworths.

Discussion

The initial inspection of the supermarket comparison data showed Woolworths products to be cheaper, and both the p-value and the 95% confidence interval support rejecting the null hypothesis. The difference is small per item, however: with a mean difference of about $0.29, the choice of supermarket makes only a modest difference on any single full-priced, non-home-brand product.

The strengths of this investigation include the consistency achieved in matching items between Coles and Woolworths, the number of items sampled, and the method used for testing; the margin of error was small, which supports a more precise result. However, the method of collecting products could be improved. Instead of searching each website by category, a complete price list would allow products to be picked at random from both supermarkets more efficiently.

A limitation is that these prices were taken for stores in a single region (postcode 3055), so prices may vary in other regions because of transport and other costs. Home-brand products cannot be matched across supermarkets, so none were used in this investigation. Home-brand products are generally cheaper than branded products, so including home-brand products in the same categories might have changed the results of these tests.

A further limitation is that each product was compared only with its matched product at the other supermarket; the data were not considered as a whole. The average price difference between the supermarkets was found to be $0.29. A person buying all 95 products would spend a total of $415.82 at Coles and $388.22 at Woolworths, a saving of $27.60. If this were a monthly grocery shop, a consumer would spend $27.60 × 12 = $331.20 more per year at Coles than at Woolworths, which is a substantial difference. However, it is highly unlikely that one person would buy all 95 products from the same supermarket every month for a whole year, which limits how far this figure can be applied.
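The basket arithmetic in the paragraph above can be checked with a few lines of base R, computed without rounding the monthly saving. The variable names are illustrative, and the yearly figure assumes exactly one such basket per month.

```r
coles_basket      <- 415.82   # total for the 95 products at Coles (from the text)
woolworths_basket <- 388.22   # total for the same products at Woolworths (from the text)
n_products        <- 95

saving_per_shop <- coles_basket - woolworths_basket   # 27.60
saving_per_item <- saving_per_shop / n_products       # about 0.29
saving_per_year <- saving_per_shop * 12               # assumes one basket per month

print(round(c(per_shop = saving_per_shop,
              per_item = saving_per_item,
              per_year = saving_per_year), 2))
```

The per-item saving of about $0.29 matches the mean difference reported in the test summary.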