Importing Data
setwd("C:/Users/nazan/Desktop/ANLY510/Weekly Lab datafiles-20180511")
library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
df <- read.xlsx2 ("weeklylab3.xlsx",1)
df
## Time Audience Day Ad Rating
## 1 Afternoon 3 1 1 9
## 2 Morning 5 2 2 9
## 3 Morning 4 2 1 10
## 4 Afternoon 1 1 2 5
## 5 Evening 3 1 2 2
## 6 Evening 2 1 1 8
## 7 Morning 2 1 2 9
## 8 Morning 6 2 3 9
## 9 Afternoon 5 2 3 8
## 10 Afternoon 2 1 3 8
## 11 Afternoon 4 2 2 4
## 12 Evening 4 2 3 8
## 13 Evening 6 2 2 2
## 14 Evening 1 1 3 7
## 15 Morning 1 1 1 10
## 16 Afternoon 6 2 1 10
## 17 Morning 3 1 3 8
## 18 Evening 5 2 1 6
class(df)
## [1] "data.frame"
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
glimpse(df)
## Observations: 18
## Variables: 5
## $ Time <fct> Afternoon, Morning, Morning, Afternoon, Evening, Even...
## $ Audience <fct> 3, 5, 4, 1, 3, 2, 2, 6, 5, 2, 4, 4, 6, 1, 1, 6, 3, 5
## $ Day <fct> 1, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2
## $ Ad <fct> 1, 2, 1, 2, 2, 1, 2, 3, 3, 3, 2, 3, 2, 3, 1, 1, 3, 1
## $ Rating <fct> 9, 9, 10, 5, 2, 8, 9, 9, 8, 8, 4, 8, 2, 7, 10, 10, 8, 6
df2=df
df2[ , 2:ncol(df2) ] = sapply(df2[ , 2:ncol(df2) ], as.character)
df2[ , 2:ncol(df2) ] = sapply(df2[ , 2:ncol(df2) ], as.numeric) ###Redefine factor variables as a numeric variables
To get definitions of the columns type
glimpse(df2)
## Observations: 18
## Variables: 5
## $ Time <fct> Afternoon, Morning, Morning, Afternoon, Evening, Even...
## $ Audience <dbl> 3, 5, 4, 1, 3, 2, 2, 6, 5, 2, 4, 4, 6, 1, 1, 6, 3, 5
## $ Day <dbl> 1, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2
## $ Ad <dbl> 1, 2, 1, 2, 2, 1, 2, 3, 3, 3, 2, 3, 2, 3, 1, 1, 3, 1
## $ Rating <dbl> 9, 9, 10, 5, 2, 8, 9, 9, 8, 8, 4, 8, 2, 7, 10, 10, 8, 6
plot(density(df2$Rating)) ###To see how normal data has spread

shapiro.test(df2$Rating) ###Confirmation test for skewness
##
## Shapiro-Wilk normality test
##
## data: df2$Rating
## W = 0.84697, p-value = 0.007589
library(moments)
agostino.test(df2$Rating) ###having significant skew.p value less than alpha==>reject null hypothesis
##
## D'Agostino skewness test
##
## data: df2$Rating
## skew = -1.0248, z = -2.0462, p-value = 0.04074
## alternative hypothesis: data have a skewness
skewness(df2$Rating) ###Rating is negatively skewed (-1.024784)
## [1] -1.024784
For negative skew: Square the observations==>Creating a new variable==>Rating2.
df2$Rating2 <- df2$Rating^2
df2$Rating2
## [1] 81 81 100 25 4 64 81 81 64 64 16 64 4 49 100 100 64
## [18] 36
agostino.test(df2$Rating2) ###Skewness is reduced.p value >alpha==> fail to reject null hypothesis
##
## D'Agostino skewness test
##
## data: df2$Rating2
## skew = -0.50677, z = -1.07270, p-value = 0.2834
## alternative hypothesis: data have a skewness
plot(density(df2$Rating2))

Test for our factor of interest
bartlett.test(df2$Rating2~df2$Ad) ###This one is significant but from the output we can see that the p-value of 0.04827 is less than the significance level of 0.05. This means we can reject the null hypothesis that the variance is the same for all treatment groups.
##
## Bartlett test of homogeneity of variances
##
## data: df2$Rating2 by df2$Ad
## Bartlett's K-squared = 6.0618, df = 2, p-value = 0.04827
bartlett.test(df2$Rating2~df2$Time)
##
## Bartlett test of homogeneity of variances
##
## data: df2$Rating2 by df2$Time
## Bartlett's K-squared = 3.1002, df = 2, p-value = 0.2122
bartlett.test(df2$Rating2~df2$Day)
##
## Bartlett test of homogeneity of variances
##
## data: df2$Rating2 by df2$Day
## Bartlett's K-squared = 0.20589, df = 1, p-value = 0.65
bartlett.test(df2$Rating2~df2$Audience)
##
## Bartlett test of homogeneity of variances
##
## data: df2$Rating2 by df2$Audience
## Bartlett's K-squared = 3.9927, df = 5, p-value = 0.5505
We have a significant effect but it is right on the verge==>Check how big it is
tapply(df2$Rating2, df2$Ad, var)
## 1 2 3
## 678.5667 1322.9667 102.6667
Running a model to see if Ad is significant,Day 1 and Day 2 are replicates
model <- aov(Rating2~Day+Day/Audience+Time+Ad, data = df2)
model
## Call:
## aov(formula = Rating2 ~ Day + Day/Audience + Time + Ad, data = df2)
##
## Terms:
## Day Time Ad Day:Audience Residuals
## Sum of Squares 10.889 6838.111 6252.778 610.222 3061.778
## Deg. of Freedom 1 2 2 4 8
##
## Residual standard error: 19.56329
## 6 out of 16 effects not estimable
## Estimated effects may be unbalanced
summary(model)###Time and Ad are significant
## Df Sum Sq Mean Sq F value Pr(>F)
## Day 1 11 11 0.028 0.87024
## Time 2 6838 3419 8.934 0.00915 **
## Ad 2 6253 3126 8.169 0.01167 *
## Day:Audience 4 610 153 0.399 0.80469
## Residuals 8 3062 383
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Making a table of findings
library(xtable)
table <- xtable(model)
table
## % latex table generated in R 3.5.0 by xtable 1.8-2 package
## % Tue Jun 05 09:36:35 2018
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
## \hline
## & Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\
## \hline
## Day & 1 & 10.89 & 10.89 & 0.03 & 0.8702 \\
## Time & 2 & 6838.11 & 3419.06 & 8.93 & 0.0091 \\
## Ad & 2 & 6252.78 & 3126.39 & 8.17 & 0.0117 \\
## Day:Audience & 4 & 610.22 & 152.56 & 0.40 & 0.8047 \\
## Residuals & 8 & 3061.78 & 382.72 & & \\
## \hline
## \end{tabular}
## \end{table}
Cheking normality of residuals
qqnorm(model$residuals)

check for where the difference is for the main effect of Ad(Test all possible pairwise comparisons)
TukeyHSD(model, "Ad")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Rating2 ~ Day + Day/Audience + Time + Ad, data = df2)
##
## $Ad
## diff lwr upr p adj
## 2-1 -45.00000 -77.274460 -12.72554 0.0100098
## 3-1 -15.83333 -48.107793 16.44113 0.3846714
## 3-2 29.16667 -3.107793 61.44113 0.0749544
From the result, we can get a comparison table with lwr and upr bounds of the differences.
Ad 2 and Ad 1 show significant differences, with Ad 2 better than Ad 1.
Get the means and sd for summary
tapply(df2$Rating2, df2$Ad, mean)
## 1 2 3
## 80.16667 35.16667 64.33333
tapply(df2$Rating2, df2$Ad, sd)
## 1 2 3
## 26.04931 36.37261 10.13246
The original data was negatively skewed and we distributed it normally. Out of all the variables, only Ad (with 3 different levels) was significant variable (The main effect of Ad was significant (pvalue=0.011)). Data analyzed as a repeated Latin square with focus groups differing across replications (Days).Based on the results from Tukey method, Ad 1 (M=80.16667 ,SD=26.04931)is significantly more positive than Ad 2(M=35.16667,SD=36.37261). And Ad 3 also better than Ad 2. So Ad 1 being the best followed by Ad 3 and then Ad 2 .