510 - Weekly Lab 1

Question

A client has come to you. Their in-house data scientist has gone crazy and fled to study the social habits of monkeys in the Amazon. Unfortunately, the client had just run an important study to choose their new ad campaign, and the data scientist left before analyzing the results. All they have is a piece of paper with a table on it (see below) and a glimmer of hope. Properly analyze the data, showing your code. Then summarize the results.

library(readxl)
data <- read_excel("/Users/anacapa/sunny/mera/harrisburg/Term 3/ANLY 510 Analytics II/Labs/Lab 1/WeeklyLab1Data.xlsx")
data

## # A tibble: 18 x 4
##    Audience   Day    Ad Rating
##       <dbl> <dbl> <dbl>  <dbl>
##  1        3     1     1      9
##  2        1     1     2      3
##  3        3     1     2      2
##  4        2     1     1      9
##  5        2     1     2      4
##  6        2     1     3      4
##  7        1     1     3      6
##  8        1     1     1     10
##  9        3     1     3      8
## 10        5     2     2      5
## 11        4     2     1     10
## 12        6     2     3      5
## 13        5     2     3      7
## 14        4     2     2      4
## 15        4     2     3      6
## 16        6     2     2      2
## 17        6     2     1     10
## 18        5     2     1     10

Check if data is normal

# Plot the density of 'Rating' to check visually whether it looks normal
d <- density(data$Rating)
plot(d)

Shapiro test to check if data is normal

shapiro.test(data$Rating)

## 
##  Shapiro-Wilk normality test
## 
## data:  data$Rating
## W = 0.90503, p-value = 0.07036

D'Agostino test to check skewness

library(moments)
# Test whether 'Rating' is significantly skewed
agostino.test(data$Rating)

## 
##  D'Agostino skewness test
## 
## data:  data$Rating
## skew = -0.0068299, z = -0.0147850, p-value = 0.9882
## alternative hypothesis: data have a skewness

Anscombe-Glynn test to check kurtosis

# Test whether the kurtosis of 'Rating' differs from 3 (the value for a normal distribution)
anscombe.test(data$Rating)

## 
##  Anscombe-Glynn kurtosis test
## 
## data:  data$Rating
## kurt = 1.6127, z = -2.1684, p-value = 0.03013
## alternative hypothesis: kurtosis is not equal to 3

Log-transforming Rating to address the non-normal shape

# Log-transform 'Rating' and plot the density of the transformed values
data$Rating2 <- log(data$Rating)
plot(density(data$Rating2))
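
A density plot alone makes it hard to judge whether the transform helped. The following is a minimal sketch, using base R only, that compares skewness and kurtosis before and after the log transform. The `rating` vector is retyped from the table above, and `sample_skew` and `sample_kurt` are hypothetical helpers written for this illustration (moment-based definitions, the same kind of statistic the tests above report). Keep in mind that a log transform compresses large values more than small ones, so it tends to push skewness in the negative direction.

```r
# Ratings retyped from the 18-row table above
rating <- c(9, 3, 2, 9, 4, 4, 6, 10, 8, 5, 10, 5, 7, 4, 6, 2, 10, 10)

# Hypothetical helper: moment-based sample skewness (0 for a symmetric sample)
sample_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical helper: moment-based sample kurtosis (3 for a normal distribution)
sample_kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Side-by-side shape statistics on the raw and log scales
shape <- data.frame(
  scale    = c("raw", "log"),
  skewness = c(sample_skew(rating), sample_skew(log(rating))),
  kurtosis = c(sample_kurt(rating), sample_kurt(log(rating)))
)
print(shape)
```

If the log row is no closer to skewness 0 and kurtosis 3 than the raw row, the transformation is not buying much for normality.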

Factoring data and testing homogeneity of variances

# Convert the design variables to factors,
# then test whether the variance of Rating2 is equal across ads
data$Audience <- factor(data$Audience)
data$Day <- factor(data$Day)
data$Ad <- factor(data$Ad)
bartlett.test(data$Rating2,data$Ad)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  data$Rating2 and data$Ad
## Bartlett's K-squared = 11.874, df = 2, p-value = 0.002639
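
To see where the Bartlett result comes from, it can help to look at the per-ad variances directly and to run the same test on both scales. This is a sketch under the assumption that the `rating` and `ad` vectors below, retyped from the table above, match the spreadsheet; it is not part of the original analysis.

```r
# Ratings and ad labels retyped from the 18-row table above
rating <- c(9, 3, 2, 9, 4, 4, 6, 10, 8, 5, 10, 5, 7, 4, 6, 2, 10, 10)
ad <- factor(c(1, 2, 2, 1, 2, 3, 3, 1, 3, 2, 1, 3, 3, 2, 3, 2, 1, 1))

# Per-ad variances on the raw and log scales
var_raw <- tapply(rating, ad, var)
var_log <- tapply(log(rating), ad, var)
print(rbind(raw = var_raw, log = var_log))

# Bartlett's test on each scale; a small p-value indicates unequal variances
bt_raw <- bartlett.test(rating, ad)
bt_log <- bartlett.test(log(rating), ad)
print(c(raw = bt_raw$p.value, log = bt_log$p.value))
```

Comparing the two p-values shows whether the log transform moved the groups toward or away from equal variances.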

# Bartlett's test above (p = 0.0026) rejects homogeneity of variances across ads,
# so the ANOVA below should be read with some caution
# Audience is nested within Day; Ad is the only significant term
model <- aov(Rating2~Day+Day/Audience+Ad, data=data)
summary(model)

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Day           1  0.037  0.0366   0.497 0.496793    
## Ad            2  3.798  1.8992  25.836 0.000112 ***
## Day:Audience  4  0.286  0.0716   0.974 0.463494    
## Residuals    10  0.735  0.0735                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
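
The `Day:Audience` row appears because, in R formula syntax, `Day/Audience` is shorthand for `Day + Day:Audience` (Audience nested within Day). A small sketch that expands the formula without fitting anything; the object names `f` and `term_labels` are chosen here for illustration only.

```r
# Expand the model formula into its individual terms (no data needed)
f <- Rating2 ~ Day + Day/Audience + Ad
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
```

The expanded terms correspond to the rows of the ANOVA table above.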

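Fit the ANOVA and check residuals

# Same model as above, refitted before inspecting its residuals
model <- aov(Rating2~Day+Day/Audience+Ad, data = data)
library(xtable)
# Named anova_table to avoid masking base::table
anova_table <- xtable(model)
# Normal Q-Q plot of the residuals, with a reference line
qqnorm(model$residuals)
qqline(model$residuals)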

Check the impact of the ads

# Tukey's HSD compares every pair of ads while controlling the family-wise error rate
TukeyHSD(model, "Ad")

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Rating2 ~ Day + Day/Audience + Ad, data = data)
## 
## $Ad
##           diff        lwr         upr     p adj
## 2-1 -1.1229760 -1.5520804 -0.69387167 0.0000812
## 3-1 -0.5000311 -0.9291355 -0.07092673 0.0237664
## 3-2  0.6229449  0.1938406  1.05204931 0.0066777

#The table gives each pairwise difference in mean Rating2 with its lower (lwr) and upper (upr) 95% bounds.
#All three pairwise differences are significant at the 0.05 level: Ad 1 is the best, and Ad 3 is better than Ad 2.
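
Because the model was fitted to log(Rating), the Tukey differences are on the log scale. A minimal sketch of one way to make them easier to read: exponentiating a difference of log means gives a ratio of geometric means. The numbers are copied from the output above; `tukey_log` and `tukey_ratio` are names introduced here for illustration.

```r
# Differences and 95% bounds copied from the TukeyHSD output above (log scale)
tukey_log <- data.frame(
  comparison = c("2-1", "3-1", "3-2"),
  diff = c(-1.1229760, -0.5000311, 0.6229449),
  lwr  = c(-1.5520804, -0.9291355, 0.1938406),
  upr  = c(-0.69387167, -0.07092673, 1.05204931)
)

# Back-transform: exp(difference of log means) = ratio of geometric means
tukey_ratio <- data.frame(
  comparison = tukey_log$comparison,
  ratio = exp(tukey_log$diff),
  lwr   = exp(tukey_log$lwr),
  upr   = exp(tukey_log$upr)
)
print(tukey_ratio)
```

A ratio interval that excludes 1 corresponds to a log-scale interval that excludes 0, so the significance conclusions are unchanged; only the units become ratios of typical ratings.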

tapply(data$Rating2, data$Ad, mean)

##        1        2        3 
## 2.267465 1.144489 1.767434

tapply(data$Rating2, data$Ad, sd)

##          1          2          3 
## 0.05440794 0.38539349 0.24520036

Conclusion

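The original ratings were not significantly skewed (D'Agostino p = 0.99) but were flatter than a normal distribution (Anscombe-Glynn p = 0.03), so the analysis used log-transformed ratings. Of the model terms, only Ad was significant (F(2, 10) = 25.84, p < .001); Day and Audience within Day were not. Tukey's HSD showed that all three ads differ from one another. On the log scale, Ad 1 (M = 2.27, SD = 0.05) rated higher than Ad 3 (M = 1.77, SD = 0.25), which in turn rated higher than Ad 2 (M = 1.14, SD = 0.39). Ad 1 is therefore the best, followed by Ad 3 and then Ad 2. Because Bartlett's test rejected equal variances across ads (p = 0.003), these results should be read with some caution.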