Store Data -- t-Test

Sameer Mathur

Read the data

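# read the data (assumes StoreData.csv is in the working directory)
store.df <- read.csv("StoreData.csv")
# attach a copy of the columns so they can be referenced by name
attach(store.df)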
head(store.df)
  storeNum Year Week p1sales p2sales p1price p2price p1prom p2prom country
1      101    1    1     127     106    2.29    2.29      0      0      US
2      101    1    2     137     105    2.49    2.49      0      0      US
3      101    1    3     156      97    2.99    2.99      1      0      US
4      101    1    4     117     106    2.99    3.19      0      0      US
5      101    1    5     138     100    2.49    2.59      0      1      US
6      101    1    6     115     127    2.79    2.49      0      0      US
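attach() places a copy of the data frame's columns on the search path. If a column is later changed in store.df (as p1prom is converted to a factor in the F-test section), the attached copy keeps the old version. The minimal sketch below illustrates this with a small hypothetical data frame, toy.df, built from the first four rows of the head() output; with() is shown as one alternative that always reads the current data frame.

```r
# Hypothetical stand-in for store.df: first four rows of p1sales/p1prom shown above
toy.df <- data.frame(p1sales = c(127, 137, 156, 117),
                     p1prom  = c(0, 0, 1, 0))

attach(toy.df)                                  # copies the columns onto the search path
toy.df$p1prom <- as.factor(toy.df$p1prom)       # changes the data frame only
attached_is_factor <- is.factor(p1prom)         # FALSE: the attached copy is still numeric
frame_is_factor    <- is.factor(toy.df$p1prom)  # TRUE
detach(toy.df)

# with() looks up columns in the current data frame instead of an attached copy
group_means <- with(toy.df, tapply(p1sales, p1prom, mean))
print(group_means)
```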

Summary of sales of Product 1 (Coke)

# summary of sales of product 1 (Coke)
library(psych)
describe(p1sales)
   vars    n   mean    sd median trimmed   mad min max range skew kurtosis   se
X1    1 2080 133.05 28.37    129  131.08 26.69  73 263   190 0.74     0.66 0.62
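Several describe() columns can be reproduced in base R, which makes their definitions explicit. The sketch below assumes psych's default 10% trim for the trimmed mean and the scaled MAD from stats::mad(); the helper name summarise_sales is hypothetical. The standard error is sd / sqrt(n); with the values above, 28.37 / sqrt(2080) is about 0.62.

```r
# Hypothetical helper: base-R versions of several psych::describe() columns
summarise_sales <- function(x) {
  n <- length(x)
  c(n       = n,
    mean    = mean(x),
    sd      = sd(x),
    median  = median(x),
    trimmed = mean(x, trim = 0.1),  # assumes describe()'s default trim = 0.1
    mad     = mad(x),               # median absolute deviation, scaled by 1.4826
    min     = min(x),
    max     = max(x),
    range   = max(x) - min(x),
    se      = sd(x) / sqrt(n))      # standard error of the mean
}

# Usage with the attached store data: summarise_sales(p1sales)
summarise_sales(c(1, 2, 3, 4, 5))
```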

Average Sales and Standard Deviation of Product 1 with and without Promotion

# mean and standard deviation of p1sales for each promotion status (0 = no, 1 = yes)
tapply(p1sales, p1prom, function(x) c(mean = mean(x), sd = sd(x)))
$`0`
     mean        sd 
129.06624  24.89769 

$`1`
     mean        sd 
168.88942  32.37033 
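The same per-group summary can also be returned as a single matrix, one column per promotion status, by combining split() and sapply(). This is a minimal sketch on small hypothetical vectors, not the store data; with the attached data the equivalent call would be sapply(split(p1sales, p1prom), ...).

```r
# Hypothetical sales and promotion flags (not from StoreData.csv)
sales <- c(10, 20, 30, 40)
prom  <- c(0, 0, 1, 1)

# split() groups sales by promotion flag; sapply() applies the summary to each group
by_prom <- sapply(split(sales, prom),
                  function(x) c(mean = mean(x), sd = sd(x)))
print(by_prom)   # rows: mean, sd; columns: "0", "1"
```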

Normality Check -- QQ Plot

# checking the normality of p1sales
qqnorm(p1sales)
qqline(p1sales)

[Figure: normal Q-Q plot of p1sales with qqline reference line]
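To make the Q-Q plot concrete: qqnorm() plots the sorted sample against standard normal quantiles at the plotting positions given by ppoints(), and qqline() by default draws the line through the 25% and 75% quantile pairs. The sketch below rebuilds both from a simulated right-skewed sample (rexp(), not p1sales); in such a sample the upper-tail points rise above the line.

```r
set.seed(1)
y <- rexp(200)   # simulated right-skewed sample (hypothetical, not p1sales)

# Points of the normal Q-Q plot: theoretical quantiles vs. sorted sample
theoretical <- qnorm(ppoints(length(y)))
sample_q    <- sort(y)

# Reference line through the (25%, 75%) quantile pairs, as qqline() does by default
probs     <- c(0.25, 0.75)
yq        <- quantile(y, probs, names = FALSE)
xq        <- qnorm(probs)
slope     <- diff(yq) / diff(xq)
intercept <- yq[1] - slope * xq[1]

# plot(theoretical, sample_q); abline(intercept, slope)  # draws the same picture
```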

Normality Check -- Histogram and Density Curve

# checking the normality of p1sales
hist(p1sales,freq=FALSE)
lines(density(p1sales), lwd=2)

[Figure: histogram of p1sales with kernel density curve]
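The shape in the histogram can be summarized numerically by the skewness that describe() reported (0.74). Assuming psych's default type 3 estimator, b1 = m3 / s^3, where m3 is the third central moment and s the sample standard deviation, a minimal sketch follows; skew_b1 is a hypothetical helper name. Positive values indicate a longer right tail.

```r
# Hypothetical helper: sample skewness b1 = m3 / s^3
# (assumed to match psych::describe()'s default type = 3)
skew_b1 <- function(x) {
  m3 <- mean((x - mean(x))^3)   # third central moment (divides by n)
  m3 / sd(x)^3                  # sd() divides by n - 1
}

skew_b1(c(1, 2, 3, 4, 5))   # symmetric data: 0
skew_b1(c(1, 1, 1, 10))     # long right tail: positive
```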

Normality Check -- Shapiro-Wilk Test

shapiro.test(p1sales)

    Shapiro-Wilk normality test

data:  p1sales
W = 0.96828, p-value < 2.2e-16

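Here the p-value is far below 0.05, so the null hypothesis of normality is rejected: there is evidence that p1sales is not normally distributed. The positive skew of 0.74 reported by describe() points to a longer right tail. With n = 2080 the test flags even mild departures from normality, and at this sample size the t-test used below is generally robust to such departures.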
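The decision rule above can be applied programmatically, since shapiro.test() returns an htest object whose p.value can be compared with the significance level. The sketch uses a simulated right-skewed sample rather than p1sales. shapiro.test() accepts between 3 and 5000 non-missing values, so p1sales (n = 2080) is within range.

```r
set.seed(42)
skewed <- rexp(500)            # simulated right-skewed data (not p1sales)

alpha <- 0.05                  # significance level used in the text
sw <- shapiro.test(skewed)     # H0: the data come from a normal distribution
reject_normality <- sw$p.value < alpha

cat(sprintf("W = %.4f, p = %.3g, reject normality: %s\n",
            sw$statistic, sw$p.value, reject_normality))
```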

Compare Variances Using an F-test (p1sales by p1prom)

# converting p1prom to factor variable
store.df$p1prom <- as.factor(store.df$p1prom)
# p1sales vs p1prom
salesftest <- var.test(p1sales ~ p1prom, data=store.df, alternative = "two.sided")
salesftest

    F test to compare two variances

data:  p1sales by p1prom
F = 0.59159, num df = 1871, denom df = 207, p-value = 4.749e-08
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.4786392 0.7190429
sample estimates:
ratio of variances 
         0.5915945 

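The p-value of the F-test (4.749e-08) is below the 0.05 significance level, so the variance of p1sales differs significantly between the non-promotion and promotion groups; the estimated variance ratio (group 0 over group 1) is 0.59. Because the F-test itself assumes normality, which the Shapiro-Wilk test rejected, this result should be read with some caution. The unequal variances are also why the Welch t-test, which does not assume equal variances, is used below.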
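The F statistic is the ratio of the two sample variances, with n - 1 degrees of freedom per group. Using the group standard deviations from the tapply() output, 24.89769^2 / 32.37033^2 is about 0.5916, matching F = 0.59159; the degrees of freedom 1871 and 207 correspond to 1872 non-promotion and 208 promotion observations. The sketch below checks the ratio against var.test() on simulated groups (hypothetical, not the store data) and adds fligner.test() from base R as one rank-based alternative that is less sensitive to non-normality.

```r
set.seed(7)
# Hypothetical groups with different spreads (not the store data)
sim <- data.frame(
  sales = c(rnorm(150, mean = 130, sd = 25), rnorm(40, mean = 170, sd = 32)),
  prom  = factor(rep(c(0, 1), times = c(150, 40)))
)

# var.test's F is var(first level) / var(second level)
vt <- var.test(sales ~ prom, data = sim)
f_by_hand <- with(sim, var(sales[prom == "0"]) / var(sales[prom == "1"]))

# Fligner-Killeen test: a rank-based check of equal variances
fk <- fligner.test(sales ~ prom, data = sim)

# Same arithmetic with the summary statistics reported above
f_reported <- 24.89769^2 / 32.37033^2   # approx. 0.5916
```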

Boxplot of Sales of Product 1 under promotion

boxplot(p1sales ~ p1prom, data=store.df, horizontal=TRUE, 
        ylab="Promotion of Product 1", xlab="Sales of Product 1",
        main="Comparison of Sales of Product 1 under Promotion")

[Figure: Comparison of Sales of Product 1 under Promotion (horizontal boxplot of p1sales by p1prom)]
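Each box in the plot is drawn from boxplot.stats(): the hinges and median come from fivenum(), the whiskers extend to the most extreme points within 1.5 times the hinge spread, and anything beyond is returned as an outlier. A small hypothetical vector shows the computation.

```r
x <- c(1:9, 50)           # hypothetical values; 50 is far from the rest

bs <- boxplot.stats(x)    # coef = 1.5 by default
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker: 1 3 5.5 8 9
bs$out     # points outside the whiskers: 50

# With the store data, per-group stats could be obtained as, e.g.:
# lapply(split(p1sales, p1prom), function(g) boxplot.stats(g)$stats)
```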

Plot the Average Sales of Product 1 under Promotion

library(gplots)
# plot the average sales of product 1 under promotion
plotmeans(p1sales ~ p1prom, data = store.df, frame = TRUE)

[Figure: mean sales of Product 1 by promotion status (plotmeans)]
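plotmeans() draws each group mean with an error bar. Assuming its defaults (p = 0.95 with a t-based interval), each bar spans mean +/- qt(0.975, n - 1) * sd / sqrt(n). The sketch below recomputes these intervals from the means and standard deviations reported earlier, with group sizes 1872 and 208 inferred from the F-test degrees of freedom; group_ci is a hypothetical helper.

```r
# Hypothetical helper: t-based confidence interval for a group mean
group_ci <- function(m, s, n, p = 0.95) {
  half <- qt((1 + p) / 2, df = n - 1) * s / sqrt(n)
  c(lower = m - half, mean = m, upper = m + half)
}

# Means/SDs from the tapply() output; n inferred from the F-test df (1871, 207)
ci_no_prom <- group_ci(129.06624, 24.89769, n = 1872)
ci_prom    <- group_ci(168.88942, 32.37033, n = 208)
rbind(no_promotion = ci_no_prom, promotion = ci_prom)
```

The two intervals do not overlap, which is consistent with the t-test result that follows.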

Independent two-group t-test (Sales under promotion)

We test whether the average sales of Product 1 differ significantly between observations with and without promotion.

# independent two-group t-test (Welch version, R's default: var.equal = FALSE)
t.test(p1sales ~ p1prom, data=store.df)

    Welch Two Sample t-test

data:  p1sales by p1prom
t = -17.187, df = 235, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -44.38807 -35.25830
sample estimates:
mean in group 0 mean in group 1 
       129.0662        168.8894 

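The p-value is below 0.05 (p < 2.2e-16), so we reject the null hypothesis of equal means: average sales of Product 1 are significantly higher with promotion (168.9) than without (129.1). The 95% confidence interval for the difference in means (group 0 minus group 1) runs from -44.4 to -35.3.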
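The Welch statistic and its degrees of freedom can be recomputed from the group summaries alone: t = (m0 - m1) / sqrt(s0^2/n0 + s1^2/n1), with Welch-Satterthwaite degrees of freedom. The sketch below plugs in the means and standard deviations from the tapply() output and group sizes 1872 and 208 (inferred from the F-test degrees of freedom). Up to rounding, it should reproduce t = -17.187, df = 235 and the reported confidence interval; welch_from_summary is a hypothetical helper name.

```r
# Hypothetical helper: Welch two-sample t-test from summary statistics
welch_from_summary <- function(m0, s0, n0, m1, s1, n1, conf = 0.95) {
  v0 <- s0^2 / n0                     # squared standard error, group 0
  v1 <- s1^2 / n1                     # squared standard error, group 1
  se <- sqrt(v0 + v1)
  t_stat <- (m0 - m1) / se
  df <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))  # Welch-Satterthwaite
  half <- qt((1 + conf) / 2, df) * se
  list(t = t_stat,
       df = df,
       p.value = 2 * pt(-abs(t_stat), df),   # two-sided p-value
       conf.int = c(m0 - m1 - half, m0 - m1 + half))
}

res <- welch_from_summary(129.06624, 24.89769, 1872,
                          168.88942, 32.37033, 208)
str(res)
```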