Nested effects (Hierarchical models): Commercial Ratings

The data file Weeklylab8data.xlsx contains participant attractiveness ratings of three versions of a commercial product. Each participant was assigned to a focus group (4-6 participants were in each focus group) and everyone in the focus group rated the same version of the product on a scale from 0 to 100 (larger numbers more support for the product). Focus groups were constructed in several different cities across the NE. Standard demographic questions were included so that they could be used to vary out their potential effects. Analyze the data, being sure to take into account the nesting structure, and summarize the results.

library(readxl)
## Warning: package 'readxl' was built under R version 3.4.4
Ratings <- read_excel("C:/Users/Enrique/OneDrive/Documents/HU/ANLY510_Principles7Applicaitons02/Data/CommercialRatings.xlsx")
str(Ratings)
## Classes 'tbl_df', 'tbl' and 'data.frame':    154 obs. of  8 variables:
##  $ Participant: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ FocusGroup : num  1 1 1 1 1 2 2 2 2 3 ...
##  $ City       : chr  "PITT" "PITT" "PITT" "PITT" ...
##  $ Age        : num  20 57 63 42 51 48 58 63 19 29 ...
##  $ Gender     : num  0 0 0 0 1 1 0 0 1 0 ...
##  $ Income     : num  56594 114612 63011 23596 85726 ...
##  $ Version    : num  1 1 1 1 1 2 2 2 2 3 ...
##  $ Rating     : num  20.7 31.8 34.1 31.6 14.7 ...

Convert FocusGroup, Gender, and version to factors

Ratings$FocusGroup= factor(Ratings$FocusGroup)
Ratings$Gender= factor(Ratings$Gender)
Ratings$Version= factor(Ratings$Version)

By taking a first glance at the raw data we find the following to be of interest:

#The average rating is 49

#There are 7 focus groups

#The average participant is 41 years old

#There are 80 males and 74 females

summary(Ratings)
##   Participant       FocusGroup      City                Age        Gender
##  Min.   :  1.00   5      :  7   Length:154         Min.   :18.00   0:80  
##  1st Qu.: 39.25   10     :  7   Class :character   1st Qu.:29.25   1:74  
##  Median : 77.50   15     :  7   Mode  :character   Median :42.00         
##  Mean   : 77.50   20     :  7                      Mean   :41.47         
##  3rd Qu.:115.75   25     :  7                      3rd Qu.:52.00         
##  Max.   :154.00   30     :  7                      Max.   :65.00         
##                   (Other):112                                            
##      Income       Version     Rating       
##  Min.   : 12405   1:54    Min.   :  2.442  
##  1st Qu.: 43314   2:54    1st Qu.: 31.831  
##  Median : 67007   3:46    Median : 46.690  
##  Mean   : 67935           Mean   : 49.064  
##  3rd Qu.: 94708           3rd Qu.: 61.626  
##  Max.   :118533           Max.   :100.000  
## 

1. Test for normallity

Plot displays a normal distribution. Let’s test for skewness and normality to make sure.

plot(density(Ratings$Rating), col="red", lwd=2, lty="dashed")

skewness and normality tests failed. Transform data to eliminate skewness.

Add 75 points to each rating and then transform by taking the log.

library(moments)
Ratings$Rating= log(Ratings$Rating + 75)
agostino.test(Ratings$Rating)
## 
##  D'Agostino skewness test
## 
## data:  Ratings$Rating
## skew = 0.021208, z = 0.112150, p-value = 0.9107
## alternative hypothesis: data have a skewness
shapiro.test(Ratings$Rating)
## 
##  Shapiro-Wilk normality test
## 
## data:  Ratings$Rating
## W = 0.98408, p-value = 0.07344

Skewness is now non significant and data is closer to normality.

2. Visualize association

Lets plot some visuals to display the association between predictors and Satisfaction.

City: most cities have similar range of ratings, NYC seems to have the lowest.

boxplot(Ratings$Rating~Ratings$City, main="Ratings per City", xlab="City",
        ylab="Ratings", col=c(3:8), lwd=2)

Age - lack of linear relationship, abline is flat

plot(Ratings$Age, Ratings$Rating, main= "Ratings & Age")
abline(lm(Ratings$Rating~Ratings$Age), lwd=1, col=2, lty="dashed")

Gender - ratings do not differ significantly between genders

boxplot(Ratings$Rating~Ratings$Gender, main="Ratings per Gender", xlab="Gender",
        ylab="Ratings", col=c(4:5))

Income - lack of linear relationship, abline is flat

plot(Ratings$Income, Ratings$Rating, main= "Ratings & Income")
abline(lm(Ratings$Rating~Ratings$Income), lwd=1, col=2, lty="dashed")

version - version 3 yielded the highest ratings.

boxplot(Ratings$Rating~Ratings$Version, main="Ratings per Version", xlab="Version",
        ylab="Ratings", col=c(4:6))

Product versions seems to be the only predictor with a significant difference in ratings.

All othe predictor do not show a significant difference on regards to ratings within their own categories.

3. Fit the model

Age and income need to be standarized as they are continuous and might contain mullticollinearity that can obscure the results.

Ratings$Age=scale(Ratings$Age)
Ratings$Income=scale(Ratings$Income)

The model whos that only version 2 and 3 are statistically significant predictors of Ratings

library(lme4)
## Warning: package 'lme4' was built under R version 3.4.4
## Loading required package: Matrix
Model1=lmer(Rating~Age+Gender+Income+Version+(1|City:FocusGroup), data=Ratings)
summary(Model1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Rating ~ Age + Gender + Income + Version + (1 | City:FocusGroup)
##    Data: Ratings
## 
## REML criterion at convergence: -195
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4593 -0.6423  0.1107  0.6544  2.0342 
## 
## Random effects:
##  Groups          Name        Variance Std.Dev.
##  City:FocusGroup (Intercept) 0.006456 0.08035 
##  Residual                    0.010081 0.10040 
## Number of obs: 154, groups:  City:FocusGroup, 32
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  4.642745   0.030080 154.346
## Age         -0.007104   0.008715  -0.815
## Gender1     -0.026270   0.018061  -1.454
## Income      -0.003788   0.008559  -0.443
## Version2     0.194355   0.040281   4.825
## Version3     0.348927   0.040647   8.584
## 
## Correlation of Fixed Effects:
##          (Intr) Age    Gendr1 Income Versn2
## Age      -0.026                            
## Gender1  -0.264  0.042                     
## Income    0.026 -0.069 -0.105              
## Version2 -0.689 -0.008 -0.022  0.018       
## Version3 -0.684  0.040 -0.018 -0.004  0.514

Let’s verify the results by ploting residuals.

qqnorm(resid(Model1))
qqline(resid(Model1))

shapiro.test(resid(Model1))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(Model1)
## W = 0.98508, p-value = 0.09615

Residuals follow a normal distribution.

4. Conclusion

Product versions have the strongest relationship with Ratings among independent variables, thus product versions are the strongest predictors of ratings. Versions 2 and 3 have the highest signficance of association with ratings.