A client has come to you. Their in-house data scientist has gone crazy and fled to study the social habits of monkeys in the Amazon. Unfortunately, the client had just run an important study to choose their new ad campaign, and the data scientist left before analyzing the results. All they have is a piece of paper with a table on it (see below) and a glimmer of hope. Properly analyze the data, showing your code. Then summarize the results.
library(readxl)
data <- read_excel("/Users/anacapa/sunny/mera/harrisburg/Term 3/ANLY 510 Analytics II/Labs/Lab 1/WeeklyLab1Data.xlsx")
data
## # A tibble: 18 x 4
## Audience Day Ad Rating
## <dbl> <dbl> <dbl> <dbl>
## 1 3 1 1 9
## 2 1 1 2 3
## 3 3 1 2 2
## 4 2 1 1 9
## 5 2 1 2 4
## 6 2 1 3 4
## 7 1 1 3 6
## 8 1 1 1 10
## 9 3 1 3 8
## 10 5 2 2 5
## 11 4 2 1 10
## 12 6 2 3 5
## 13 5 2 3 7
## 14 4 2 2 4
## 15 4 2 3 6
## 16 6 2 2 2
## 17 6 2 1 10
## 18 5 2 1 10
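Since the client only has the table on paper, the data can also be typed in directly rather than read from a spreadsheet. A minimal sketch, with values copied from the printout above; the name `ratings` is an assumption introduced here, not part of the original analysis:

```r
# Values transcribed from the printed 18-row table above.
# `ratings` is a hypothetical name; assign it to `data` to reuse the later code.
ratings <- data.frame(
  Audience = c(3, 1, 3, 2, 2, 2, 1, 1, 3, 5, 4, 6, 5, 4, 4, 6, 6, 5),
  Day      = rep(c(1, 2), each = 9),
  Ad       = c(1, 2, 2, 1, 2, 3, 3, 1, 3, 2, 1, 3, 3, 2, 3, 2, 1, 1),
  Rating   = c(9, 3, 2, 9, 4, 4, 6, 10, 8, 5, 10, 5, 7, 4, 6, 2, 10, 10)
)

# Cross-tabulate Audience by Ad to see how often each audience rated each ad
design_counts <- table(ratings$Audience, ratings$Ad)
print(design_counts)
```

The cross-tabulation is a quick way to confirm the layout of the study before choosing a model.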
# Check whether 'Rating' is approximately normally distributed
d <- density(data$Rating)
plot(d)
shapiro.test(data$Rating)
##
## Shapiro-Wilk normality test
##
## data: data$Rating
## W = 0.90503, p-value = 0.07036
library(moments)
# Test 'Rating' for skewness: the skew is essentially zero (-0.007, p = 0.99), so there is no evidence of skew
agostino.test(data$Rating)
##
## D'Agostino skewness test
##
## data: data$Rating
## skew = -0.0068299, z = -0.0147850, p-value = 0.9882
## alternative hypothesis: data have a skewness
# Test 'Rating' for kurtosis: 1.61 is significantly below 3 (p = 0.03), so the distribution is flatter than normal
anscombe.test(data$Rating)
##
## Anscombe-Glynn kurtosis test
##
## data: data$Rating
## kurt = 1.6127, z = -2.1684, p-value = 0.03013
## alternative hypothesis: kurtosis is not equal to 3
# Log-transform 'Rating' and inspect the density of the transformed values
data$Rating2 <- log(data$Rating)
plot(density(data$Rating2))
# Convert the design variables to factors before the variance and ANOVA steps
data$Audience <- factor(data$Audience)
data$Day <- factor(data$Day)
data$Ad <- factor(data$Ad)
bartlett.test(data$Rating2,data$Ad)
##
## Bartlett test of homogeneity of variances
##
## data: data$Rating2 and data$Ad
## Bartlett's K-squared = 11.874, df = 2, p-value = 0.002639
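# Bartlett's test is significant (p = 0.0026), so the variances of log(Rating) differ across ads
# and the homogeneity-of-variance assumption is not met; read the ANOVA below with that in mind.
# Fit the ANOVA on the log scale, with Audience nested within Day and Ad as the treatment.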
model <- aov(Rating2~Day+Day/Audience+Ad, data=data)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Day 1 0.037 0.0366 0.497 0.496793
## Ad 2 3.798 1.8992 25.836 0.000112 ***
## Day:Audience 4 0.286 0.0716 0.974 0.463494
## Residuals 10 0.735 0.0735
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
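Bartlett's test above rejected equal variances across ads, which the standard ANOVA assumes. As a robustness check, a Welch-type one-way test, which does not assume equal variances, can be run on Ad alone. This is a sketch and not part of the original analysis: it ignores the Day and Audience terms, and `ratings`, `LogRating`, and `welch` are names assumed here.

```r
# Ad and Rating columns transcribed from the printed table (hypothetical name `ratings`)
ratings <- data.frame(
  Ad     = factor(c(1, 2, 2, 1, 2, 3, 3, 1, 3, 2, 1, 3, 3, 2, 3, 2, 1, 1)),
  Rating = c(9, 3, 2, 9, 4, 4, 6, 10, 8, 5, 10, 5, 7, 4, 6, 2, 10, 10)
)
ratings$LogRating <- log(ratings$Rating)

# Welch's one-way test: compares group means without assuming equal variances.
# Note: this drops the Day and Audience terms used in the aov() model.
welch <- oneway.test(LogRating ~ Ad, data = ratings, var.equal = FALSE)
print(welch)
```

If the Welch test and the ANOVA agree on Ad, the unequal variances are less of a concern for the overall conclusion.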
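library(xtable)
# Store the ANOVA table for reporting (named anova_table to avoid masking base::table)
anova_table <- xtable(model)
# Normal Q-Q plot of the model residuals to check the normality assumption
qqnorm(model$residuals)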
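A Q-Q plot is easier to read with a reference line, and a Shapiro-Wilk test on the residuals gives a numeric companion to it. A hedged sketch that refits the same model on a hand-entered copy of the table; `ratings`, `fit`, and `res` are names assumed here:

```r
# Data transcribed from the printed table (hypothetical name `ratings`)
ratings <- data.frame(
  Audience = factor(c(3, 1, 3, 2, 2, 2, 1, 1, 3, 5, 4, 6, 5, 4, 4, 6, 6, 5)),
  Day      = factor(rep(c(1, 2), each = 9)),
  Ad       = factor(c(1, 2, 2, 1, 2, 3, 3, 1, 3, 2, 1, 3, 3, 2, 3, 2, 1, 1)),
  Rating   = c(9, 3, 2, 9, 4, 4, 6, 10, 8, 5, 10, 5, 7, 4, 6, 2, 10, 10)
)
ratings$Rating2 <- log(ratings$Rating)

# Same model formula as in the analysis above
fit <- aov(Rating2 ~ Day + Day/Audience + Ad, data = ratings)
res <- residuals(fit)

# Q-Q plot with a reference line through the quartiles
qqnorm(res)
qqline(res)

# Shapiro-Wilk test applied to the residuals rather than the raw ratings
print(shapiro.test(res))
```

Testing the residuals instead of the raw response checks the assumption the ANOVA actually makes, since the raw ratings mix three ads with different means.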
# Tukey's HSD test for pairwise comparisons between the ads
TukeyHSD(model, "Ad")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Rating2 ~ Day + Day/Audience + Ad, data = data)
##
## $Ad
## diff lwr upr p adj
## 2-1 -1.1229760 -1.5520804 -0.69387167 0.0000812
## 3-1 -0.5000311 -0.9291355 -0.07092673 0.0237664
## 3-2 0.6229449 0.1938406 1.05204931 0.0066777
# The table gives each pairwise difference in mean log(Rating) with its lower (lwr) and upper (upr) 95% confidence bounds.
# All three pairwise differences are significant: Ad 1 rates higher than Ad 2 (p < 0.001) and Ad 3 (p = 0.024), and Ad 3 rates higher than Ad 2 (p = 0.007).
tapply(data$Rating2, data$Ad, mean)
## 1 2 3
## 2.267465 1.144489 1.767434
tapply(data$Rating2, data$Ad, sd)
## 1 2 3
## 0.05440794 0.38539349 0.24520036
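The means above are on the log scale, which is hard to report to a client. Exponentiating a mean of logs gives the geometric mean on the original rating scale. A small sketch, assuming the ratings are re-entered by hand; `ratings`, `log_means`, and `geo_mean` are names introduced here:

```r
# Ad and Rating columns transcribed from the printed table (hypothetical name `ratings`)
ratings <- data.frame(
  Ad     = factor(c(1, 2, 2, 1, 2, 3, 3, 1, 3, 2, 1, 3, 3, 2, 3, 2, 1, 1)),
  Rating = c(9, 3, 2, 9, 4, 4, 6, 10, 8, 5, 10, 5, 7, 4, 6, 2, 10, 10)
)

# Mean of log(Rating) per ad, as in the tapply() call above
log_means <- tapply(log(ratings$Rating), ratings$Ad, mean)

# Back-transform: exp(mean of logs) is the geometric mean of the ratings
geo_mean <- exp(log_means)
print(round(geo_mean, 2))
```

The same idea applies to the Tukey differences: exponentiating a difference of log means gives a ratio of geometric means rather than a difference in rating points.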
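The original ratings were not significantly skewed (D'Agostino p = 0.99), but their kurtosis was significantly below 3 (Anscombe-Glynn p = 0.03). The ratings were log-transformed before analysis, so the means below are on the log scale. Bartlett's test indicated unequal variances across ads (p = 0.003), so the ANOVA should be read with some caution. Of the terms in the model, only Ad was significant, F(2, 10) = 25.84, p < .001; Day and Audience within Day were not. Tukey's HSD showed that Ad 1 (M = 2.27, SD = 0.05) was rated higher than both Ad 2 (M = 1.14, SD = 0.39; p < .001) and Ad 3 (M = 1.77, SD = 0.25; p = .024), and that Ad 3 was rated higher than Ad 2 (p = .007). Ad 1 is therefore the best, followed by Ad 3 and then Ad 2.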