Aniketh Reddy Nimma (s3670774)
Krishna Sai Patha (s3670773)
Vikrant Yadav (s3676697)
This report analyses the prices of goods at Coles and Woolworths. The data were collected from each retailer's monthly catalogue for August 2017 using a random sampling method. A total of 30 products were recorded, each with its price at both supermarkets. The aim of the report is to determine whether Woolworths is more expensive than Coles. The analysis calculates the mean and standard deviation of prices, then computes the paired difference in price (Woolworths minus Coles) and checks it for normality. Finally, a paired-sample t-test is performed on the prices at the two supermarkets with \(H_A: \mu_\Delta < 0\). The result is not statistically significant.
library(dplyr)
library(car)
price <- read.csv("prices.csv")
price$diff <- price$WOOLWORTHS - price$COLES
# Descriptive statistics; inside summarise() columns are referenced directly
price %>% summarise("Mean Woolworths Price" = mean(WOOLWORTHS, na.rm = TRUE),
                    "SD Woolworths Price"   = sd(WOOLWORTHS, na.rm = TRUE),
                    "Mean Coles Price"      = mean(COLES, na.rm = TRUE),
                    "SD Coles Price"        = sd(COLES, na.rm = TRUE),
                    "Mean difference"       = mean(diff, na.rm = TRUE),
                    "SD difference"         = sd(diff, na.rm = TRUE),
                    "n"                     = n())
## Mean Woolworths Price SD Woolworths Price Mean Coles Price
## 1 6.884667 5.230532 7.136667
## SD Coles Price Mean difference SD difference n
## 1 6.098126 -0.252 1.81713 30
matplot(t(data.frame(price$WOOLWORTHS, price$COLES)), type = "b", pch = 19, col = 1, lty = 1,
xlab = "", ylab = "Price ($)", xaxt = "n")
axis(1, at=1:2, labels = c("Woolworths", "Coles"))
boxplot(price$diff, ylab = "Paired difference in Prices", col = "chartreuse3")
The mean price of the sampled goods at Woolworths is $6.88, with a standard deviation of $5.23. The mean price of the same goods at Coles is $7.14, with a standard deviation of $6.10. The mean paired difference (Woolworths minus Coles) is therefore -$0.252, with a standard deviation of $1.817. On average, the sampled goods were slightly cheaper at Woolworths.
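The SD of the paired differences ($1.817) is much smaller than the SD of either price column. The sketch below shows why pairing matters here. It uses only the summary statistics printed above and compares two standard errors for the mean difference. The first treats the two price columns as independent samples (a Welch-style standard error). The second uses the paired differences. The variable names are illustrative and do not come from the original analysis.

```r
# Summary statistics copied from the summarise() output above
n        <- 30
sd_wool  <- 5.230532  # SD of Woolworths prices
sd_coles <- 6.098126  # SD of Coles prices
sd_diff  <- 1.81713   # SD of paired differences (Woolworths - Coles)

# Standard error if the two samples were (wrongly) treated as independent
se_unpaired <- sqrt(sd_wool^2 / n + sd_coles^2 / n)

# Standard error based on the within-product paired differences
se_paired <- sd_diff / sqrt(n)

round(c(unpaired = se_unpaired, paired = se_paired), 3)
```

Pairing removes the large between-product variation in price. As a result, the paired standard error is several times smaller than the unpaired one.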
price$diff %>% qqPlot(dist = "norm")
pttest <- t.test(price$WOOLWORTHS, price$COLES,
paired = TRUE,
alternative = "less",
conf.level = 0.95)
pttest
##
## Paired t-test
##
## data: price$WOOLWORTHS and price$COLES
## t = -0.75958, df = 29, p-value = 0.2268
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf 0.3117041
## sample estimates:
## mean of the differences
## -0.252
The paired t-test assumes that the price differences are normally distributed. The Q-Q plot shows several observations falling outside the 95% confidence envelope, so the differences themselves do not look normal. However, with n = 30 pairs, the central limit theorem suggests that the sampling distribution of the mean difference is approximately normal, so the t-test is still reasonable. The paired-sample t-test uses the null hypothesis \(H_0: \mu_\Delta = 0\) and the alternative hypothesis \(H_A: \mu_\Delta < 0\) at a significance level of \(\alpha = 0.05\). The test gives \(t = -0.760\) with 29 degrees of freedom, a one-sided 95% CI of \((-\infty, 0.312]\) and a p-value of 0.227. For comparison, the corresponding two-sided 95% CI is [-0.930, 0.426]. Since the p-value is greater than \(\alpha = 0.05\), we fail to reject the null hypothesis, and the result is not statistically significant.
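As a cross-check, the `t.test()` output can be reproduced by hand from the summary statistics of the differences. The sketch below assumes the rounded values printed earlier (mean -0.252, SD 1.81713, n = 30). It uses only base R's `pt()` and `qt()` and is intended as a worked check rather than a replacement for `t.test()`.

```r
# Summary statistics of the paired differences (Woolworths - Coles)
d_bar <- -0.252
s_d   <- 1.81713
n     <- 30
df    <- n - 1

se     <- s_d / sqrt(n)   # standard error of the mean difference
t_stat <- d_bar / se      # test statistic under H0: mu_delta = 0
p_less <- pt(t_stat, df)  # one-sided p-value for HA: mu_delta < 0

# One-sided 95% upper bound, matching alternative = "less"
upper_one_sided <- d_bar + qt(0.95, df) * se

# Two-sided 95% CI, shown for comparison only
ci_two_sided <- d_bar + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, p = p_less, upper = upper_one_sided), 4)
round(ci_two_sided, 3)
```

The one-sided bound is the one consistent with `alternative = "less"`. The two-sided interval does not correspond to the test that was actually run.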
The hypothesis test is not statistically significant, so we fail to reject \(H_0: \mu_\Delta = 0\). The sample provides insufficient evidence that the mean price difference (Woolworths minus Coles) is below zero. The analysis could be improved by using a larger sample and stratified sampling, with strata based on product category. This would give a more refined analysis of the price difference between the two supermarkets.
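To give a rough sense of what "a larger sample size" could mean, base R's `power.t.test()` can estimate how many product pairs a one-sided paired test would need. The sketch below rests on a loud assumption: the true mean difference and the SD of differences equal the values observed in this sample (0.252 in magnitude and 1.81713). Both are only estimates. The 80% power target is also an illustrative choice, not something taken from the report.

```r
# Assumed effect size and variability, taken from this sample (estimates only)
assumed_delta <- 0.252    # magnitude of the mean difference (Woolworths - Coles)
assumed_sd    <- 1.81713  # SD of the paired differences

# Solve for n (number of pairs) with power = 0.80, alpha = 0.05, one-sided test
plan <- power.t.test(delta = assumed_delta, sd = assumed_sd,
                     sig.level = 0.05, power = 0.80,
                     type = "paired", alternative = "one.sided")

ceiling(plan$n)  # approximate number of product pairs required
```

Under these assumptions, the required number of pairs is roughly ten times the 30 collected here. If the true difference is smaller than the one observed, the required sample grows quickly.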