Group/Individual Details

Executive Statement

The decision as to which major supermarket to shop with can pose a dilemma for many consumers. This report evaluates the average price difference between two major Australian supermarket retailers: Coles and Woolworths. The sample was formed by selecting 100 products from Coles through stratified random sampling and matching each with the identical product at Woolworths. Descriptive statistics were then calculated for the prices at each retailer, revealing Coles prices to be approximately 10 cents higher on average (mean $7.49 versus $7.39). However, when measured by the median, prices were instead 12.5 cents higher at Woolworths. This discrepancy could be accounted for by the large number of matched products showing no price difference between the retailers, i.e. price difference = 0. The samples were compared using a two-sample t-test not assuming equal variances, at a significance level of α = 0.05. The t-test reported the following statistics: p = 0.924, 95% CI [-1.9997, 2.2037]. As the p-value was greater than the significance level and the confidence interval contained zero, there was no statistically significant difference in pricing between the two retailers. Therefore, on this evidence, prices cannot be said to differ between Coles and Woolworths.

Load Packages and Data

library(car)
## Loading required package: carData
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
setwd("C:/Users/Jaimee-Lee/Documents/RMIT 2019/Semester One/Intro to Statistics/Assignment Two")
prices <- read.csv("Assignment 2 - Statistics - Collected Supermarket Data.csv")
prices %>% head()

Summary Statistics

Below is a summary of the data. The means show that Coles prices are on average approximately 10 cents higher than those of Woolworths ($7.49 versus $7.39). In contrast, the medians show Woolworths to be approximately 12.5 cents higher than Coles. These two contradictory measures suggest that the price difference is in fact centred quite closely on zero (showing no difference in price).

The boxplot provided below further supports the idea that the difference in price between the two stores is quite close to zero. As seen in the plot, both stores exhibit very similar pricing, with the difference in medians almost undetectable by eye. The plots do, however, show a subtle difference between the IQRs of the two stores’ prices (Woolworths having the wider IQR). Both plots exhibit a right skew, which accounts for the discrepancy between the median and mean values for each retailer.

The outliers present are largely due to high priced items being randomly selected during sampling. Both plots exhibiting a similar quantity of outliers supports the notion that matched products have a similar price between stores.
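
The opposite signs of the mean and median comparisons described above are typical of right-skewed data. As a minimal sketch using invented prices (these values are purely illustrative and are not taken from the collected data), a few expensive items can pull one store's mean up while its median stays lower:

```r
# Hypothetical prices in dollars, invented for illustration only
coles      <- c(1.00, 2.00, 3.00, 4.00, 30.00)
woolworths <- c(1.00, 2.00, 3.50, 4.00, 25.00)

# Means are pulled upward by the single expensive item in each vector
mean_diff <- mean(coles) - mean(woolworths)

# Medians ignore the extreme values and depend only on the middle price
median_diff <- median(coles) - median(woolworths)

c(mean_diff = mean_diff, median_diff = median_diff)
```

Here the mean difference is positive while the median difference is negative, mirroring the pattern seen in the summary statistics.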

#Calculation of Summary Statistics
prices %>% group_by(Store) %>% summarise(Min = min(Price, na.rm = TRUE),
                     Q1 = quantile(Price, probs = .25, na.rm = TRUE),
                     Median = median(Price, na.rm = TRUE),
                     Q3 = quantile(Price, probs = .75, na.rm = TRUE),
                     IQR = IQR(Price, na.rm = TRUE),
                     Max = max(Price, na.rm = TRUE),
                     Mean = mean(Price, na.rm = TRUE),
                     SD = sd(Price, na.rm = TRUE),
                     n = n(),
                     Missing = sum(is.na(Price)))
#Visualisation of measurement variable
prices %>% boxplot(Price ~ Store, data = ., ylab = "Prices ($)")

Hypothesis Test

A two-sample t-test was used to test for a price difference between Coles and Woolworths products, as the data were grouped by store, creating two samples measuring the same variable. The standard two-sample t-test assumes that the samples are independent, follow a normal distribution and have equal variances; the Welch version of the test relaxes the equal-variance assumption. The significance level used in the two-sample t-test was α = 0.05.
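
For reference, the Welch form of the test statistic (the version R reports when var.equal = FALSE) and its approximate degrees of freedom can be written as below. The notation is introduced here for illustration and does not appear in the original analysis: for store i, the sample mean is \bar{x}_i, the sample variance is s_i^2 and the sample size is n_i.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
{\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
```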

# Null Hypothesis: H0: μ1 - μ2 = 0
# Alternate Hypothesis: HA: μ1 - μ2 ≠ 0

# Levene's Test of Equal Variance
leveneTest(prices$Price ~ prices$Store, data = prices)
#Two sample t-test equal variances not assumed
t.test(prices$Price ~ prices$Store,
  data = prices,
  var.equal = FALSE,
  alternative = "two.sided"
  )
## 
##  Welch Two Sample t-test
## 
## data:  prices$Price by prices$Store
## t = 0.095705, df = 197.88, p-value = 0.9239
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.999727  2.203727
## sample estimates:
##      mean in group Coles mean in group Woolworths 
##                   7.4905                   7.3885
#Two-sample t-test equal variances assumed
t.test(prices$Price ~ prices$Store,
  data = prices,
  var.equal = TRUE,
  alternative = "two.sided"
  )
## 
##  Two Sample t-test
## 
## data:  prices$Price by prices$Store
## t = 0.095705, df = 198, p-value = 0.9239
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.99972  2.20372
## sample estimates:
##      mean in group Coles mean in group Woolworths 
##                   7.4905                   7.3885

Interpretation

The two-sample t-test for a price difference between Coles (μ1) and Woolworths (μ2) tested the following hypotheses:
* H0: μ1 - μ2 = 0
* HA: μ1 - μ2 ≠ 0

The Central Limit Theorem allowed the normality assumption to be relaxed owing to the large sample size (n = 100 products per store, 200 prices in total). Levene’s test of homogeneity of variance indicated that equal variances could not be assumed. Therefore, the two-sample t-test was run both with and without the assumption of equal variances, which showed that the choice made almost no difference to the results. The following values are taken from the Welch two-sample t-test (equal variances not assumed).
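
One reason the two versions of the test agree so closely is that both groups contain the same number of prices: with equal group sizes the pooled and Welch standard errors are algebraically identical, so only the degrees of freedom can differ. The sketch below illustrates this on simulated data; the vectors `x` and `y` and their distribution parameters are invented for the demonstration and are not the collected prices.

```r
set.seed(1)  # fixed seed so the simulated values are reproducible

# Two hypothetical samples of equal size with clearly different spreads
x <- rnorm(100, mean = 7.5, sd = 6)
y <- rnorm(100, mean = 7.4, sd = 9)

welch  <- t.test(x, y, var.equal = FALSE)
pooled <- t.test(x, y, var.equal = TRUE)

# The t statistics match; the Welch df is never larger than n1 + n2 - 2
c(t_welch = unname(welch$statistic), t_pooled = unname(pooled$statistic))
c(df_welch = unname(welch$parameter), df_pooled = unname(pooled$parameter))
```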

The Welch test reports t = 0.096 and p = 0.924, so p > 0.05. We therefore fail to reject H0, and the result is not statistically significant.

The Welch test also reports the 95% confidence interval for the difference in means as [-1.9997, 2.2037]. This interval contains the value specified by H0 (a difference of zero), which again results in failure to reject H0; the result is therefore not statistically significant.
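
As a cross-check, the p-value and confidence interval can be approximately reconstructed from the figures printed in the Welch output (the two group means, t and df). This is only a sketch: the inputs are the rounded values shown in the output, so the results agree to roughly three decimal places rather than exactly.

```r
# Values copied from the Welch output shown in the Hypothesis Test section
mean_coles      <- 7.4905
mean_woolworths <- 7.3885
t_stat          <- 0.095705
df_welch        <- 197.88

diff_means <- mean_coles - mean_woolworths  # estimated difference in means
std_error  <- diff_means / t_stat           # because t = difference / standard error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

# 95% confidence interval: estimate +/- critical value * standard error
margin <- qt(0.975, df_welch) * std_error
ci <- c(lower = diff_means - margin, upper = diff_means + margin)

round(p_value, 3)
round(ci, 3)
```

Because the interval is centred on the estimated difference of about 10 cents and extends roughly $2.10 either side, it comfortably includes zero.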

Discussion

The two-sample t-test, not assuming equal variances, did not find a statistically significant difference between prices at Coles and Woolworths, p = 0.924, 95% CI [-1.9997, 2.2037].

The preceding results were based on a research design that was subject to various strengths and limitations. Simple random sampling could not truly be achieved, as exhaustive product listings from each store were not available. Stratified random sampling was therefore used instead.

Product categories from the Coles website were selected using a random number generator, and individual products within those categories were then selected by the same method. Each selected product was matched with the same product at Woolworths; when a match could not be found, the selection was re-generated.
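
The two-stage selection described above could be sketched as follows. The category names, product names, sample sizes and seed here are hypothetical placeholders, since the actual Coles listings and random numbers used are not recorded in this report.

```r
set.seed(2019)  # hypothetical seed, used only to make the sketch reproducible

# Hypothetical catalogue: each category holds a vector of product names
catalogue <- list(
  Bakery = c("White bread", "Wholemeal bread", "Crumpets"),
  Dairy  = c("Full cream milk", "Butter", "Cheddar cheese", "Yoghurt"),
  Pantry = c("Pasta", "Rice", "Tinned tomatoes"),
  Drinks = c("Orange juice", "Cola", "Sparkling water")
)

# Stage 1: randomly choose which categories to sample from
chosen_categories <- sample(names(catalogue), size = 2)

# Stage 2: randomly choose products within each chosen category
chosen_products <- lapply(catalogue[chosen_categories], sample, size = 2)

chosen_products
```

In this sketch, a product without a Woolworths match would be handled by drawing again from the same category.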

A sample size of 100 matched products was used so that the Central Limit Theorem could be relied on in place of the normality assumption and so that there would be a suitable variety of products. The prices used were whatever each product was selling for on 13/05/2019; some products at either store may therefore have been subject to promotional sales. Matches between products were ensured by comparing the item brand, name, and quantity.

This method limits the randomness of the sampling process; in future reproductions it would be preferable to obtain an exhaustive list of products. The study was also limited in accuracy because both retailers had active catalogue sales that may have changed prices and therefore skewed the data. In the event of reproduction, using only ‘normally’ priced items may provide more accurate results.

Despite the aforementioned limitations, both the p-value and the 95% confidence interval led to a failure to reject the null hypothesis. In this case there is therefore no statistically significant price difference between Coles and Woolworths, across the range of both stores on the day of data collection (13/05/2019).