The objective of this investigation is to determine which supermarket, Coles or Woolworths, is cheaper. The sample was gathered from https://grocerycop.com.au/products and comprises 9 products from each of 10 categories. A large sample of 90 products (n > 30) was chosen so that, by the Central Limit Theorem (CLT), the sampling distribution of the mean difference is approximately normal even if the differences themselves are not, and to keep the standard error small. Stratified sampling was used: products were selected at random within each category (i.e., stratum). The dataset consists of 6 variables: Sl_No, Product_Name, Units, Category, Coles_Price, and Woolworths_Price. Product_Name, Units and Category are matched between Coles and Woolworths, so each row holds a pair of prices for the same product. All prices are in Australian dollars (AUD). Summary statistics and a box plot are used to compare prices between the stores. A Q-Q plot of the Diff column (Coles_Price - Woolworths_Price) is used to assess normality. A paired-samples t-test is used to test for a statistically significant mean difference between Coles and Woolworths prices, and the test finds such a difference. In conclusion, Woolworths prices are found to be significantly cheaper, on average, than Coles prices.
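The stratified sampling step described above can be sketched in base R. This is only an illustration under stated assumptions: the `catalogue` data frame, its 30 products per category, and the seed are hypothetical stand-ins, not the data or the procedure actually used to build ColesVsWoolworths.csv.

```r
# Hypothetical catalogue: 10 categories with 30 products each (illustrative only)
set.seed(42)
catalogue <- data.frame(
  Product_Name = paste0("product_", 1:300),
  Category     = rep(paste0("category_", 1:10), each = 30),
  stringsAsFactors = FALSE
)

# Draw a simple random sample of 9 rows within each category (stratum)
per_stratum <- 9
strata <- split(catalogue, catalogue$Category)
sampled <- do.call(
  rbind,
  lapply(strata, function(stratum) {
    stratum[sample(nrow(stratum), per_stratum), ]
  })
)

nrow(sampled)            # total sample size
table(sampled$Category)  # rows drawn per category
```

Sampling the same number of products from every category keeps each category equally represented in the final sample, whatever its size in the full catalogue.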
#Loading the necessary packages and reading the dataset using read_csv() function.
library("readr")
library("magrittr")
library("car")
## Loading required package: carData
library("granova")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
coles_vs_woolworths <- read_csv("ColesVsWoolworths.csv")
## Parsed with column specification:
## cols(
## Sl_No = col_double(),
## Product_Name = col_character(),
## Units = col_character(),
## Category = col_character(),
## Coles_Price = col_double(),
## Woolworths_Price = col_double()
## )
#head() function is used to display the first 10 observations of the dataset.
head(coles_vs_woolworths, 10)
#checking for any missing values in the dataset
sum(is.na(coles_vs_woolworths))
## [1] 0
#To Display the dimensions of the dataset
dim(coles_vs_woolworths)
## [1] 90 6
The means of Coles_Price and Woolworths_Price, obtained from the descriptive statistics, are 8.576778 and 8.097667 respectively. This shows that the average price of the sampled products is higher at Coles than at Woolworths. Coles_Price and Woolworths_Price are plotted using the boxplot() function. A Diff column is created to store the price difference (Coles_Price - Woolworths_Price). The mean of Diff from the descriptive statistics is 0.4791111, i.e. on average Coles prices are 0.479 AUD higher than Woolworths prices. The normality of Diff is checked with a Q-Q plot; because the sample is large (n = 90), the paired-samples t-test can proceed under the CLT even where the plot departs from normality.
#Summary Statistics for Coles Prices
coles_vs_woolworths %>%
summarise(
Store = "Coles",
Min = min(Coles_Price, na.rm = TRUE),
Q1 = quantile(Coles_Price, probs = .25, na.rm = TRUE),
Median = median(Coles_Price, na.rm = TRUE),
Q3 = quantile(Coles_Price, probs = .75, na.rm = TRUE),
Max = max(Coles_Price, na.rm = TRUE),
Mean = mean(Coles_Price, na.rm = TRUE),
SD = sd(Coles_Price, na.rm = TRUE),
N = n(),
Missing = sum(is.na(Coles_Price))
)
#Summary Statistics for Woolworths Prices
coles_vs_woolworths %>%
summarise(
Store = "Woolworths",
Min = min(Woolworths_Price, na.rm = TRUE),
Q1 = quantile(Woolworths_Price, probs = .25, na.rm = TRUE),
Median = median(Woolworths_Price, na.rm = TRUE),
Q3 = quantile(Woolworths_Price, probs = .75, na.rm = TRUE),
Max = max(Woolworths_Price, na.rm = TRUE),
Mean = mean(Woolworths_Price, na.rm = TRUE),
SD = sd(Woolworths_Price, na.rm = TRUE),
N = n(),
Missing = sum(is.na(Woolworths_Price))
)
#Visualizing the Coles and Woolworths Prices using boxplot() function
#names = labels the two boxes directly, so no separate axis() call is drawn over the default tick labels
boxplot(
  coles_vs_woolworths$Coles_Price,
  coles_vs_woolworths$Woolworths_Price,
  names = c("Coles", "Woolworths"),
  ylab = "Prices",
  xlab = "Supermarkets",
  col = c("firebrick", "darkolivegreen1"),
  las = 1,
  main = "Boxplot of Prices in AUD of Coles and Woolworths"
)
#Creating a new column called "Diff" to store Price Difference
coles_vs_woolworths <- coles_vs_woolworths %>% mutate("Diff" = Coles_Price - Woolworths_Price)
#To display the Product Name, Coles price , Woolworths Price and Diff column
head(coles_vs_woolworths[, c(2,5,6,7)])
#Summary Statistics for Price Difference(Coles_Price - Woolworths_Price).
coles_vs_woolworths %>%
summarise(
Min = min(Diff, na.rm = TRUE),
Q1 = quantile(Diff, probs = .25, na.rm = TRUE),
Median = median(Diff, na.rm = TRUE),
Q3 = quantile(Diff, probs = .75, na.rm = TRUE),
Max = max(Diff, na.rm = TRUE),
Mean = mean(Diff, na.rm = TRUE),
SD = sd(Diff, na.rm = TRUE),
N = n(),
Missing = sum(is.na(Diff))
)
#Drawing QQ-Plot for the "Diff" column
qqPlot(coles_vs_woolworths$Diff, dist="norm" , ylab = "Price Difference")
## [1] 87 84
The hypotheses for the paired-samples t-test are:
H0 : there is no mean price difference between Coles and Woolworths (the mean of Diff is 0).
HA : there is a mean price difference between Coles and Woolworths (the mean of Diff is not 0).
granova.ds() is used to visualise the mean difference between Coles_Price and Woolworths_Price using a scatter plot, as shown in Figure - 3.
#Calculating the paired sample t-test using t.test() function
t.test(coles_vs_woolworths$Coles_Price, coles_vs_woolworths$Woolworths_Price,
paired = TRUE,
alternative = "two.sided", mu = 0)
##
## Paired t-test
##
## data: coles_vs_woolworths$Coles_Price and coles_vs_woolworths$Woolworths_Price
## t = 2.7221, df = 89, p-value = 0.007804
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1293857 0.8288365
## sample estimates:
## mean of the differences
## 0.4791111
#The critical value t* for the paired-sample t-test, assuming a two-tailed test with significance level = 0.05
qt(p = 0.025, df = 89)
## [1] -1.986979
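As a cross-check, the test statistic, p-value and confidence interval can be approximately reproduced by hand from the reported summary values. The sketch below assumes the standard paired-samples formulas (standard error = SD / sqrt(n), t = mean difference / standard error) and takes the SD of the differences from the granova.ds() summary, where it is rounded to 1.670, so the results agree with the t.test() output only to about three decimal places.

```r
# Values copied from the reported output
n      <- 90
mean_d <- 0.4791111   # mean of the differences (Coles_Price - Woolworths_Price)
sd_d   <- 1.670       # SD(D), rounded to three decimals in the granova.ds() summary

se_d   <- sd_d / sqrt(n)                     # standard error of the mean difference
t_stat <- mean_d / se_d                      # test statistic under H0: mean difference = 0
t_crit <- qt(p = 0.975, df = n - 1)          # upper-tail critical value for alpha = 0.05
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value
ci     <- mean_d + c(-1, 1) * t_crit * se_d  # 95% confidence interval

round(c(t = t_stat, t_crit = t_crit, p = p_val, lower = ci[1], upper = ci[2]), 4)
```

The absolute value of the test statistic exceeds the critical value, which is the same decision the p-value and the confidence interval lead to.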
#To visualise mean difference using a scatter plot.
granova.ds(
data.frame(coles_vs_woolworths$Coles_Price, coles_vs_woolworths$Woolworths_Price),
xlab = "Coles Price",
ylab = "Woolworths Price"
)
## Summary Stats
## n 90.000
## mean(x) 8.577
## mean(y) 8.098
## mean(D=x-y) 0.479
## SD(D) 1.670
## ES(D) 0.287
## r(x,y) 0.988
## r(x+y,d) 0.376
## LL 95%CI 0.129
## UL 95%CI 0.829
## t(D-bar) 2.722
## df.t 89.000
## pval.t 0.008
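The ES(D) row in the summary above is a standardised effect size. A minimal sketch, assuming it is computed as the mean difference divided by the SD of the differences (which matches the reported figures):

```r
# Values copied from the granova.ds() summary
mean_d <- 0.479   # mean(D = x - y)
sd_d   <- 1.670   # SD(D)

# Standardised mean difference for paired data
es_d <- mean_d / sd_d
round(es_d, 3)    # 0.287, matching ES(D)
```

By Cohen's conventional benchmarks (0.2 small, 0.5 medium), an effect of this size is small: the difference is statistically detectable with 90 pairs, but modest relative to the spread of the price differences.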
A paired-samples t-test was used to test for a significant mean difference between Coles and Woolworths prices. The mean difference was found to be 0.4791111 (\(SD\) = 1.670). While the price differences exhibited evidence of non-normality upon inspection of the normal Q-Q plot, the large sample size (n = 90) meant that, by the central limit theorem, the t-test could still be applied. The results of the paired-samples t-test are as follows:
\(t\)(\(df\) = 89) = 2.7221, \(p\) = 0.007804, 95% \(CI\) [0.1293857, 0.8288365].
As the \(p\)-value is less than the significance level (α = 0.05) and the 95% \(CI\) does not contain 0, the value of the mean difference under H0, we reject H0. Thus there was a statistically significant mean difference between Coles and Woolworths prices. Because the mean difference (Coles_Price - Woolworths_Price) is positive, Woolworths prices were found to be significantly cheaper, on average, than Coles prices.
In summary, the paired-samples t-test found a statistically significant mean difference between the prices of the sampled products at Coles and Woolworths. We therefore conclude that, on average, Woolworths is cheaper than Coles for these products.
Strengths of the Investigation -
Limitations of the Investigation -
Improvements -
Improvements that can be done to the investigation are :