ANLY510

Understanding the Dataset

summary(Lab1)

##     Audience        Day            Ad        Rating      
##  Min.   :1.0   Min.   :1.0   Min.   :1   Min.   : 2.000  
##  1st Qu.:2.0   1st Qu.:1.0   1st Qu.:1   1st Qu.: 4.000  
##  Median :3.5   Median :1.5   Median :2   Median : 6.000  
##  Mean   :3.5   Mean   :1.5   Mean   :2   Mean   : 6.333  
##  3rd Qu.:5.0   3rd Qu.:2.0   3rd Qu.:3   3rd Qu.: 9.000  
##  Max.   :6.0   Max.   :2.0   Max.   :3   Max.   :10.000

str(Lab1)

## 'data.frame':    18 obs. of  4 variables:
##  $ Audience: int  3 1 3 2 2 2 1 1 3 5 ...
##  $ Day     : int  1 1 1 1 1 1 1 1 1 2 ...
##  $ Ad      : int  1 2 2 1 2 3 3 1 3 2 ...
##  $ Rating  : int  9 3 2 9 4 4 6 10 8 5 ...

Lab1$Day <- as.factor(Lab1$Day)
Lab1$Ad <- as.factor(Lab1$Ad)

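library(ggplot2)

# stat = "identity" stacks the individual ratings, so each bar is the total rating per Ad
ggplot(Lab1, aes(x = Ad, y = Rating)) +
  geom_bar(stat = "identity") +
  theme_classic() +
  ggtitle("Rating vs. Ad") +
  xlab("Ad") +
  ylab("Rating")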

ggplot(Lab1, aes(x = Rating)) +
  geom_histogram(binwidth = 0.5) +
  theme_classic() +
  xlab("Rating")

ggplot(Lab1, aes(x = Rating, fill = Day)) +
  geom_density(alpha = 0.3) +
  xlab("Rating") +
  ggtitle("Rating by Day")

plot(density(Lab1$Rating))

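# Grouping by Rating puts each distinct rating in its own group;
# groups with a single observation are dropped (see the warnings below)
ggplot(Lab1, aes(x = Rating)) +
  stat_density(aes(group = Rating, color = Rating), position = "identity", geom = "line")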

## Warning: Groups with fewer than two data points have been dropped.

## Warning: Groups with fewer than two data points have been dropped.

## Warning: Groups with fewer than two data points have been dropped.

## Warning: Removed 3 rows containing missing values (geom_path).

A quick summary of the dataset and several visualizations (a bar chart, a histogram, and density plots) give a first look at the data. Day and Ad were converted to factors with two and three levels, respectively.
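The factor conversion matters for the later ANOVA: an integer-coded Ad would be fitted as a single numeric slope, while a factor gets one parameter per level beyond the first. The sketch below illustrates this with only the first ten rows that str() prints above (not the full 18-row dataset), so the fitted numbers are illustrative only. The name `first10` is made up for this example.

```r
# First ten rows of Ad and Rating, copied from the str(Lab1) output above.
# This is NOT the full dataset; it only illustrates the degrees of freedom.
first10 <- data.frame(
  Ad     = c(1L, 2L, 2L, 1L, 2L, 3L, 3L, 1L, 3L, 2L),
  Rating = c(9L, 3L, 2L, 9L, 4L, 4L, 6L, 10L, 8L, 5L)
)

# Ad left as an integer: treated as a numeric covariate (1 degree of freedom)
fit_int <- aov(Rating ~ Ad, data = first10)
df_int <- summary(fit_int)[[1]]$Df[1]

# Ad converted to a factor: treated as 3 groups (3 - 1 = 2 degrees of freedom)
first10$Ad <- as.factor(first10$Ad)
fit_fac <- aov(Rating ~ Ad, data = first10)
df_fac <- summary(fit_fac)[[1]]$Df[1]

print(c(integer = df_int, factor = df_fac))
```

With the factor coding, the Ad term uses 2 degrees of freedom, which matches the Ad row of the ANOVA table later in this report.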

Skewness Test

library(moments)
agostino.test(Lab1$Rating)

## 
##  D'Agostino skewness test
## 
## data:  Lab1$Rating
## skew = -0.0068299, z = -0.0147850, p-value = 0.9882
## alternative hypothesis: data have a skewness

anscombe.test(Lab1$Rating)

## 
##  Anscombe-Glynn kurtosis test
## 
## data:  Lab1$Rating
## kurt = 1.6127, z = -2.1684, p-value = 0.03013
## alternative hypothesis: kurtosis is not equal to 3

skewness(Lab1$Rating)

## [1] -0.006829878

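The sample skewness is very slightly negative (-0.0068), but the D'Agostino test does not detect significant skew (p = 0.9882). The Anscombe-Glynn test does reject a kurtosis of 3 (kurt = 1.61, p = 0.030), so the distribution is flatter than a normal one.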
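To make the two statistics concrete, the sketch below recomputes them in base R from central moments, which is my understanding of how the moments package defines them. The Rating vector is reconstructed by taking square roots of the 18 Rating2 values printed later in this report. The helper names `central_moment`, `skew`, and `kurt` are made up here.

```r
# Rating values, recovered as sqrt() of the Rating2 values printed in this report
rating <- c(9, 3, 2, 9, 4, 4, 6, 10, 8, 5, 10, 5, 7, 4, 6, 2, 10, 10)

# k-th central moment (divides by n, not n - 1)
central_moment <- function(x, k) mean((x - mean(x))^k)

# Moment-based skewness and kurtosis (a normal distribution has kurtosis 3)
skew <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
kurt <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

rating_skew <- skew(rating)
rating_kurt <- kurt(rating)
print(c(skewness = rating_skew, kurtosis = rating_kurt))
```

Both values agree with the test output above: a skewness close to zero and a kurtosis well below 3.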

Addressing Negative Skew

Lab1$Rating2 <- as.integer(Lab1$Rating^2)
Lab1$Rating2

##  [1]  81   9   4  81  16  16  36 100  64  25 100  25  49  16  36   4 100
## [18] 100

Because the sample skewness is negative, the ratings are squared, a standard transformation for negative skew.

Check again for skewness

agostino.test(Lab1$Rating2)

## 
##  D'Agostino skewness test
## 
## data:  Lab1$Rating2
## skew = 0.35169, z = 0.75291, p-value = 0.4515
## alternative hypothesis: data have a skewness

skewness(Lab1$Rating2)

## [1] 0.3516884

plot(density(Lab1$Rating2))

The squared data now shows a positive sample skewness (0.35), which is again not statistically significant (p = 0.4515).
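Squaring pushes the skewness upward because it stretches large values more than small ones, which lengthens the right tail. A minimal base R check, using a Rating vector reconstructed from the printed Rating2 values and a hypothetical helper `skew`:

```r
# Rating values, recovered as sqrt() of the printed Rating2 values
rating <- c(9, 3, 2, 9, 4, 4, 6, 10, 8, 5, 10, 5, 7, 4, 6, 2, 10, 10)
rating2 <- rating^2

# Moment-based sample skewness (third central moment over variance^(3/2))
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

skew_before <- skew(rating)
skew_after <- skew(rating2)

# After squaring, the gap between 9 and 10 grows from 1 to 19,
# while the gap between 2 and 3 only grows from 1 to 5
print(c(before = skew_before, after = skew_after))
```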

Factor Test (Bartlett)

Lab1$Audience <- as.factor(Lab1$Audience)
str(Lab1)

## 'data.frame':    18 obs. of  5 variables:
##  $ Audience: Factor w/ 6 levels "1","2","3","4",..: 3 1 3 2 2 2 1 1 3 5 ...
##  $ Day     : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 2 ...
##  $ Ad      : Factor w/ 3 levels "1","2","3": 1 2 2 1 2 3 3 1 3 2 ...
##  $ Rating  : int  9 3 2 9 4 4 6 10 8 5 ...
##  $ Rating2 : int  81 9 4 81 16 16 36 100 64 25 ...

bartlett.test(Lab1$Rating2~Lab1$Day)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Lab1$Rating2 by Lab1$Day
## Bartlett's K-squared = 0.032222, df = 1, p-value = 0.8575

bartlett.test(Lab1$Rating2~Lab1$Audience)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Lab1$Rating2 by Lab1$Audience
## Bartlett's K-squared = 0.23195, df = 5, p-value = 0.9987

bartlett.test(Lab1$Rating2~Lab1$Ad)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  Lab1$Rating2 by Lab1$Ad
## Bartlett's K-squared = 2.8134, df = 2, p-value = 0.245

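All three p-values are greater than 0.05, so the Bartlett test does not reject homogeneity of variances for Day, Audience, or Ad. It is reasonable to proceed with ANOVA.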
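For readers who want to see what the statistic measures, here is a sketch that computes Bartlett's K-squared by hand and compares it with bartlett.test() from base R. The data frame `toy` and its values are invented for illustration and are not the lab data.

```r
# Invented example data: three groups of four observations each
toy <- data.frame(
  y = c(12, 15, 11, 14,  22, 30, 18, 27,  9, 10, 13, 8),
  g = factor(rep(c("a", "b", "c"), each = 4))
)

n_i  <- tapply(toy$y, toy$g, length)   # group sizes
s2_i <- tapply(toy$y, toy$g, var)      # group sample variances
N <- sum(n_i)
k <- length(n_i)

# Pooled variance, weighted by each group's degrees of freedom
s2_pooled <- sum((n_i - 1) * s2_i) / (N - k)

# Bartlett's statistic: near 0 when the group variances are similar
numerator  <- (N - k) * log(s2_pooled) - sum((n_i - 1) * log(s2_i))
correction <- 1 + (sum(1 / (n_i - 1)) - 1 / (N - k)) / (3 * (k - 1))
k_squared  <- numerator / correction
p_value    <- pchisq(k_squared, df = k - 1, lower.tail = FALSE)

# The formula interface with data = avoids repeating the data frame name
bt <- bartlett.test(y ~ g, data = toy)
print(c(by_hand = k_squared, builtin = unname(bt$statistic)))
```

A large p-value, as in the three tests above, means the observed differences in group variance are within what chance would produce.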

Model

model <- aov(Rating2 ~ Day + Day/Audience + Ad, data = Lab1)
model

## Call:
##    aov(formula = Rating2 ~ Day + Day/Audience + Ad, data = Lab1)
## 
## Terms:
##                       Day        Ad Day:Audience Residuals
## Sum of Squares    128.000 20785.778      597.111  1550.889
## Deg. of Freedom         1         2            4        10
## 
## Residual standard error: 12.45347
## 6 out of 14 effects not estimable
## Estimated effects may be unbalanced

summary(model)

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Day           1    128     128   0.825    0.385    
## Ad            2  20786   10393  67.012 1.61e-06 ***
## Day:Audience  4    597     149   0.963    0.469    
## Residuals    10   1551     155                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model fits Rating2 on Day, Audience nested within Day, and Ad. Only Ad is significant (F = 67.01, p = 1.61e-06); Day and Day:Audience are not.
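The Day/Audience term in the formula is R's nesting shorthand: a/b expands to a + a:b, so each audience is modeled within its day rather than as a separate main effect. A small base R check of that expansion (no data needed, only the formula):

```r
# The model formula used above
f <- Rating2 ~ Day + Day/Audience + Ad

# terms() expands the formula; main effects are listed before interactions
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Degrees of freedom for Audience nested in Day, ASSUMING 3 audiences
# on each of the 2 days (an assumption consistent with the 4 df reported)
n_days <- 2
audiences_per_day <- 3
nested_df <- n_days * (audiences_per_day - 1)
print(nested_df)
```

The expanded terms appear in the same order as the columns of the aov output above.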

TukeyHSD(model,"Ad")

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Rating2 ~ Day + Day/Audience + Ad, data = Lab1)
## 
## $Ad
##          diff         lwr       upr     p adj
## 2-1 -81.33333 -101.043283 -61.62338 0.0000014
## 3-1 -56.00000  -75.709949 -36.29005 0.0000402
## 3-2  25.33333    5.623384  45.04328 0.0138864

TukeyHSD compares the three ads pairwise and reports each difference in mean Rating2 with its lower and upper 95% bounds. All three pairwise differences are significant, and Ad-1 has the highest mean.
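The reading of the Tukey table can be made explicit: a pairwise difference is significant at the family-wise 95% level when its interval excludes zero, and the signs of the differences give the ordering. The sketch below re-enters the printed table by hand. The data frame name `tukey` and the derived columns are made up here.

```r
# Values copied from the TukeyHSD output above (Rating2 scale)
tukey <- data.frame(
  comparison = c("2-1", "3-1", "3-2"),
  diff  = c(-81.33333, -56.00000, 25.33333),
  lwr   = c(-101.043283, -75.709949, 5.623384),
  upr   = c(-61.62338, -36.29005, 45.04328),
  p_adj = c(0.0000014, 0.0000402, 0.0138864)
)

# An interval that excludes zero indicates a significant difference
tukey$excludes_zero <- tukey$lwr > 0 | tukey$upr < 0

# Mean squared rating of each ad relative to Ad 1 (Ad 1 = 0 by definition)
relative_mean <- c(Ad1 = 0, Ad2 = tukey$diff[1], Ad3 = tukey$diff[2])
ranking <- names(sort(relative_mean, decreasing = TRUE))

print(tukey)
print(ranking)
```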

Analysis Summary

In this analysis, “Ad” is the only significant variable. According to the TukeyHSD results, Ad-1 is rated best, followed by Ad-3, with Ad-2 last.

ANLY510_W1

Michael Jiayang Chen