Sameer Mathur
store.df <- read.csv("StoreData.csv")
attach(store.df)
head(store.df)
  storeNum Year Week p1sales p2sales p1price p2price p1prom p2prom country
1      101    1    1     127     106    2.29    2.29      0      0      US
2      101    1    2     137     105    2.49    2.49      0      0      US
3      101    1    3     156      97    2.99    2.99      1      0      US
4      101    1    4     117     106    2.99    3.19      0      0      US
5      101    1    5     138     100    2.49    2.59      0      1      US
6      101    1    6     115     127    2.79    2.49      0      0      US
# summary of sales of product 1 (Coke)
library(psych)
describe(p1sales)
   vars    n   mean    sd median trimmed   mad min max range skew kurtosis   se
X1    1 2080 133.05 28.37    129  131.08 26.69  73 263   190 0.74     0.66 0.62
# mean and standard deviation of product 1 sales without (0) and with (1) promotion
tapply(p1sales, p1prom, function(x)(c(mean=mean(x),sd=sd(x))))
$`0`
mean sd
129.06624 24.89769
$`1`
mean sd
168.88942 32.37033
# checking the normality of p1sales
qqnorm(p1sales)
qqline(p1sales)
# histogram of p1sales with a kernel density overlay
hist(p1sales,freq=FALSE)
lines(density(p1sales), lwd=2)
shapiro.test(p1sales)
Shapiro-Wilk normality test
data: p1sales
W = 0.96828, p-value < 2.2e-16
The p-value is far below 0.05, so we reject the null hypothesis of normality: there is evidence that p1sales is not normally distributed.
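The same decision rule can be applied programmatically, because shapiro.test() returns a list whose p.value element holds the p-value. The sketch below is illustrative only: it uses a simulated right-skewed sample (sim_sales, a hypothetical stand-in introduced here) rather than the real p1sales, so its numbers will not match the output above.

```r
# Simulated right-skewed stand-in for p1sales (assumption: NOT the real data)
set.seed(42)
sim_sales <- rexp(2080, rate = 1 / 130)

# shapiro.test() returns an "htest" object; p.value is one of its elements
sw <- shapiro.test(sim_sales)

# Reject normality when the p-value falls below the chosen significance level
alpha <- 0.05
reject_normality <- sw$p.value < alpha
print(reject_normality)
```

With the real data, replacing sim_sales by p1sales should lead to the same rejection reported above.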
# converting p1prom to factor variable
store.df$p1prom <- as.factor(store.df$p1prom)
# F-test comparing the variance of p1sales between the p1prom groups
salesftest <- var.test(p1sales ~ p1prom, data=store.df, alternative = "two.sided")
salesftest
F test to compare two variances
data: p1sales by p1prom
F = 0.59159, num df = 1871, denom df = 207, p-value = 4.749e-08
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.4786392 0.7190429
sample estimates:
ratio of variances
0.5915945
The p-value of the F-test is below the 0.05 significance level, so we conclude that the two variances differ significantly. The estimated ratio of about 0.59 indicates that sales vary less without promotion than with it.
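As a cross-check, the F statistic can be reproduced by hand from the group standard deviations reported by tapply earlier. This is a minimal sketch, not part of the original analysis: the names sd0, sd1, n0 and n1 are introduced here for illustration, and the group sizes are inferred from the degrees of freedom in the F-test output (1871 + 1 and 207 + 1).

```r
# Group standard deviations copied from the tapply output (p1prom = 0 and 1)
sd0 <- 24.89769
sd1 <- 32.37033

# Group sizes inferred from the F-test degrees of freedom (assumption: n = df + 1)
n0 <- 1871 + 1
n1 <- 207 + 1

# The F statistic is the ratio of the two sample variances
f_stat <- sd0^2 / sd1^2

# Two-sided p-value: twice the smaller tail probability of the F distribution
lower_tail <- pf(f_stat, df1 = n0 - 1, df2 = n1 - 1)
p_value <- 2 * min(lower_tail, 1 - lower_tail)

print(round(f_stat, 4))
print(p_value < 0.05)
```

The ratio should agree with the reported F = 0.59159 up to rounding of the inputs.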
boxplot(p1sales ~ p1prom, data=store.df, horizontal=TRUE,
ylab="Promotion of Product 1", xlab="Sales of Product 1",
main="Comparison of Sales of Product 1 under Promotion")
library(gplots)
# plot the average sales of product 1 under promotion
plotmeans(p1sales ~ p1prom, data = store.df, frame = TRUE)
Next, we test whether average sales of product 1 differ significantly between the promotion and no-promotion groups.
# independent 2-group t-test
t.test(p1sales ~ p1prom, data=store.df)
Welch Two Sample t-test
data: p1sales by p1prom
t = -17.187, df = 235, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-44.38807 -35.25830
sample estimates:
mean in group 0 mean in group 1
129.0662 168.8894
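The p-value is below 0.05, so we conclude that average sales of product 1 differ significantly between the two groups. The 95 percent confidence interval suggests that mean sales are roughly 35 to 44 units higher under promotion.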
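To make the Welch test less of a black box, the statistic, degrees of freedom and confidence interval can be recomputed from the group summaries shown earlier. This is a sketch under stated assumptions: the variable names are introduced here for illustration, the means and standard deviations are copied from the tapply output, and the group sizes (1872 and 208) are inferred from the F-test degrees of freedom.

```r
# Group means and standard deviations copied from the tapply output
m0 <- 129.06624; s0 <- 24.89769   # no promotion (p1prom = 0)
m1 <- 168.88942; s1 <- 32.37033   # promotion (p1prom = 1)

# Group sizes inferred from the F-test degrees of freedom (assumption: n = df + 1)
n0 <- 1872
n1 <- 208

# Squared standard error of each group mean, and the SE of their difference
v0 <- s0^2 / n0
v1 <- s1^2 / n1
se_diff <- sqrt(v0 + v1)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (m0 - m1) / se_diff
df_welch <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))

# 95% confidence interval for the difference in means (group 0 minus group 1)
ci <- (m0 - m1) + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(round(c(t = t_stat, df = df_welch), 3))
print(round(ci, 2))
```

The results should agree with the t.test output above (t = -17.187, df = 235) up to rounding of the inputs. Because the Welch form does not pool the variances, it remains appropriate even though the F-test found them unequal.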