ANLY 510-90- O-2018/Summer - Analytics II: Principles & Appl -Weekly lab #2 Assignment

Importing Data

setwd("C:/Users/nazan/Desktop/ANLY510/Weekly Lab datafiles-20180511")
library(xlsx)

## Loading required package: rJava

## Loading required package: xlsxjars

df <- read.xlsx2 ("weeklylab3.xlsx",1)
df

##         Time Audience Day Ad Rating
## 1  Afternoon        3   1  1      9
## 2    Morning        5   2  2      9
## 3    Morning        4   2  1     10
## 4  Afternoon        1   1  2      5
## 5    Evening        3   1  2      2
## 6    Evening        2   1  1      8
## 7    Morning        2   1  2      9
## 8    Morning        6   2  3      9
## 9  Afternoon        5   2  3      8
## 10 Afternoon        2   1  3      8
## 11 Afternoon        4   2  2      4
## 12   Evening        4   2  3      8
## 13   Evening        6   2  2      2
## 14   Evening        1   1  3      7
## 15   Morning        1   1  1     10
## 16 Afternoon        6   2  1     10
## 17   Morning        3   1  3      8
## 18   Evening        5   2  1      6

class(df)

## [1] "data.frame"

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

glimpse(df)

## Observations: 18
## Variables: 5
## $ Time     <fct> Afternoon, Morning, Morning, Afternoon, Evening, Even...
## $ Audience <fct> 3, 5, 4, 1, 3, 2, 2, 6, 5, 2, 4, 4, 6, 1, 1, 6, 3, 5
## $ Day      <fct> 1, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2
## $ Ad       <fct> 1, 2, 1, 2, 2, 1, 2, 3, 3, 3, 2, 3, 2, 3, 1, 1, 3, 1
## $ Rating   <fct> 9, 9, 10, 5, 2, 8, 9, 9, 8, 8, 4, 8, 2, 7, 10, 10, 8, 6

df2=df
df2[ , 2:ncol(df2) ]  =  sapply(df2[ , 2:ncol(df2) ], as.character)

df2[ , 2:ncol(df2) ]  =  sapply(df2[ , 2:ncol(df2) ], as.numeric) ###Redefine factor variables as a numeric variables

To get definitions of the columns type

glimpse(df2)

## Observations: 18
## Variables: 5
## $ Time     <fct> Afternoon, Morning, Morning, Afternoon, Evening, Even...
## $ Audience <dbl> 3, 5, 4, 1, 3, 2, 2, 6, 5, 2, 4, 4, 6, 1, 1, 6, 3, 5
## $ Day      <dbl> 1, 2, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2
## $ Ad       <dbl> 1, 2, 1, 2, 2, 1, 2, 3, 3, 3, 2, 3, 2, 3, 1, 1, 3, 1
## $ Rating   <dbl> 9, 9, 10, 5, 2, 8, 9, 9, 8, 8, 4, 8, 2, 7, 10, 10, 8, 6

plot(density(df2$Rating)) ###To see how normal data has spread

shapiro.test(df2$Rating) ###Confirmation test for skewness

## 
##  Shapiro-Wilk normality test
## 
## data:  df2$Rating
## W = 0.84697, p-value = 0.007589

library(moments)
agostino.test(df2$Rating) ###having significant skew.p value less than alpha==>reject null hypothesis

## 
##  D'Agostino skewness test
## 
## data:  df2$Rating
## skew = -1.0248, z = -2.0462, p-value = 0.04074
## alternative hypothesis: data have a skewness

skewness(df2$Rating) ###Rating is negatively skewed (-1.024784)

## [1] -1.024784

For negative skew: Square the observations==>Creating a new variable==>Rating2.

df2$Rating2 <- df2$Rating^2
df2$Rating2

##  [1]  81  81 100  25   4  64  81  81  64  64  16  64   4  49 100 100  64
## [18]  36

agostino.test(df2$Rating2) ###Skewness is reduced.p value >alpha==> fail to reject null hypothesis

## 
##  D'Agostino skewness test
## 
## data:  df2$Rating2
## skew = -0.50677, z = -1.07270, p-value = 0.2834
## alternative hypothesis: data have a skewness

plot(density(df2$Rating2))

Transforming variables from numeric to factor to test for our factor of interest

df2$Ad <- factor(df2$Ad)
df2$Day <- factor(df2$Day)
df2$Audience <- factor(df2$Audience)
df2$Time <- factor(df2$Time)

Test for our factor of interest

bartlett.test(df2$Rating2~df2$Ad) ###This one is significant but from the output we can see that the p-value of 0.04827 is less than the significance level of 0.05. This means we can reject the null hypothesis that the variance is the same for all treatment groups.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  df2$Rating2 by df2$Ad
## Bartlett's K-squared = 6.0618, df = 2, p-value = 0.04827

bartlett.test(df2$Rating2~df2$Time)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  df2$Rating2 by df2$Time
## Bartlett's K-squared = 3.1002, df = 2, p-value = 0.2122

bartlett.test(df2$Rating2~df2$Day)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  df2$Rating2 by df2$Day
## Bartlett's K-squared = 0.20589, df = 1, p-value = 0.65

bartlett.test(df2$Rating2~df2$Audience)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  df2$Rating2 by df2$Audience
## Bartlett's K-squared = 3.9927, df = 5, p-value = 0.5505

We have a significant effect but it is right on the verge==>Check how big it is

tapply(df2$Rating2, df2$Ad, var)

##         1         2         3 
##  678.5667 1322.9667  102.6667

Running a model to see if Ad is significant,Day 1 and Day 2 are replicates

model <- aov(Rating2~Day+Day/Audience+Time+Ad, data = df2)
model

## Call:
##    aov(formula = Rating2 ~ Day + Day/Audience + Time + Ad, data = df2)
## 
## Terms:
##                      Day     Time       Ad Day:Audience Residuals
## Sum of Squares    10.889 6838.111 6252.778      610.222  3061.778
## Deg. of Freedom        1        2        2            4         8
## 
## Residual standard error: 19.56329
## 6 out of 16 effects not estimable
## Estimated effects may be unbalanced

summary(model)###Time and Ad are significant

##              Df Sum Sq Mean Sq F value  Pr(>F)   
## Day           1     11      11   0.028 0.87024   
## Time          2   6838    3419   8.934 0.00915 **
## Ad            2   6253    3126   8.169 0.01167 * 
## Day:Audience  4    610     153   0.399 0.80469   
## Residuals     8   3062     383                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Making a table of findings

library(xtable)
table <- xtable(model)
table

## % latex table generated in R 3.5.0 by xtable 1.8-2 package
## % Tue Jun 05 09:36:35 2018
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
##  & Df & Sum Sq & Mean Sq & F value & Pr($>$F) \\ 
##   \hline
## Day & 1 & 10.89 & 10.89 & 0.03 & 0.8702 \\ 
##   Time & 2 & 6838.11 & 3419.06 & 8.93 & 0.0091 \\ 
##   Ad & 2 & 6252.78 & 3126.39 & 8.17 & 0.0117 \\ 
##   Day:Audience & 4 & 610.22 & 152.56 & 0.40 & 0.8047 \\ 
##   Residuals & 8 & 3061.78 & 382.72 &  &  \\ 
##    \hline
## \end{tabular}
## \end{table}

Cheking normality of residuals

qqnorm(model$residuals)

check for where the difference is for the main effect of Ad(Test all possible pairwise comparisons)

TukeyHSD(model, "Ad")

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Rating2 ~ Day + Day/Audience + Time + Ad, data = df2)
## 
## $Ad
##          diff        lwr       upr     p adj
## 2-1 -45.00000 -77.274460 -12.72554 0.0100098
## 3-1 -15.83333 -48.107793  16.44113 0.3846714
## 3-2  29.16667  -3.107793  61.44113 0.0749544

From the result, we can get a comparison table with lwr and upr bounds of the differences.

Ad 2 and Ad 1 show significant differences, with Ad 2 better than Ad 1.

Get the means and sd for summary

tapply(df2$Rating2, df2$Ad, mean)

##        1        2        3 
## 80.16667 35.16667 64.33333

tapply(df2$Rating2, df2$Ad, sd)

##        1        2        3 
## 26.04931 36.37261 10.13246

The original data was negatively skewed and we distributed it normally. Out of all the variables, only Ad (with 3 different levels) was significant variable (The main effect of Ad was significant (pvalue=0.011)). Data analyzed as a repeated Latin square with focus groups differing across replications (Days).Based on the results from Tukey method, Ad 1 (M=80.16667 ,SD=26.04931)is significantly more positive than Ad 2(M=35.16667,SD=36.37261). And Ad 3 also better than Ad 2. So Ad 1 being the best followed by Ad 3 and then Ad 2 .

ANLY 510-90- O-2018/Summer - Analytics II: Principles & Appl -Weekly lab #2 Assignment

Nazanin Yousefzadeh

June 5, 2018

Importing Data

To get definitions of the columns type

For negative skew: Square the observations==>Creating a new variable==>Rating2.

Transforming variables from numeric to factor to test for our factor of interest

Test for our factor of interest

We have a significant effect but it is right on the verge==>Check how big it is

Running a model to see if Ad is significant,Day 1 and Day 2 are replicates

Making a table of findings

Cheking normality of residuals

check for where the difference is for the main effect of Ad(Test all possible pairwise comparisons)

From the result, we can get a comparison table with lwr and upr bounds of the differences.

Ad 2 and Ad 1 show significant differences, with Ad 2 better than Ad 1.

Get the means and sd for summary