In this experiment we study the impact of four factors on two response variables. Accordingly, we state our null hypotheses as:
H01: The variance in the number of items offered to the customer cannot be explained by the type of MNL-CE model used, the variability in parameters z, alpha and j, or the interactions between these factors.
H02: The variance in the minimum no-choice probability cannot be explained by the type of MNL-CE model used, the variability in parameters z, alpha and j, or the interactions between these factors.
The following data were generated by solving the assortment optimization problem under the four MNL-CE models developed to capture the context effects associated with information overload. These models also incorporate the notions of novelty and diversity and the moderating influence of the viewing device. Each run records two outputs: the mean optimal number of items to offer to 100 customers and the mean no-choice probability. A summary of the dataset is presented below:
##       Models         z           alpha              j
##  MNL-CE1l :15   Min.   : 0   Min.   :0.001   Min.   : 0.00
##  MNL-CE1nl:15   1st Qu.:40   1st Qu.:0.001   1st Qu.: 0.75
##  MNL-CE2l :45   Median :50   Median :0.010   Median : 5.50
##  MNL-CE2nl:45   Mean   :46   Mean   :0.037   Mean   : 7.75
##                 3rd Qu.:60   3rd Qu.:0.100   3rd Qu.:12.50
##                 Max.   :80   Max.   :0.100   Max.   :20.00
##  Number.of.Items     No.Choice           St.dev.for.items
##  Min.   :  1.985   Min.   :2.227e-05   Min.   :  0.000
##  1st Qu.: 62.203   1st Qu.:1.263e-04   1st Qu.:  0.000
##  Median : 98.700   Median :2.035e-04   Median :  3.699
##  Mean   :184.865   Mean   :3.936e-04   Mean   : 12.949
##  3rd Qu.:204.000   3rd Qu.:2.721e-04   3rd Qu.: 13.995
##  Max.   :907.200   Max.   :9.419e-03   Max.   :100.356
##  St.dev.for.no.choice
##  Min.   :      0
##  1st Qu.:      0
##  Median :      0
##  Mean   :  10913
##  3rd Qu.:      0
##  Max.   :1309609
Here we define the four factors and the two response variables.
## Save variables as factors.
Models<-as.factor(FinalDOE$Models)
z<-as.factor(FinalDOE$z)
alpha<-as.factor(FinalDOE$alpha)
j<-as.factor(FinalDOE$j)
Number<-(FinalDOE$Number.of.Items)
Probability<-(FinalDOE$No.Choice)
##Create dataframe
df = data.frame(Models,z,alpha,j,Number,Probability)
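Converting z, alpha and j to factors matters because `aov()` treats a numeric predictor as a single linear trend (1 degree of freedom) rather than as separate level effects. A minimal sketch on synthetic data (not `FinalDOE`), assuming z takes the five levels 0, 40, 50, 60 and 80, which is consistent with the summary above and with the 4 degrees of freedom for z in the ANOVA tables:

```r
# Hypothetical illustration with synthetic data (NOT FinalDOE): how coding z
# as numeric vs. factor changes the ANOVA degrees of freedom.
set.seed(1)
z_levels <- c(0, 40, 50, 60, 80)          # assumed z levels
z_num    <- rep(z_levels, each = 6)
y        <- rnorm(length(z_num), mean = z_num / 10)

fit_numeric <- aov(y ~ z_num)             # z as a continuous covariate: one slope
fit_factor  <- aov(y ~ factor(z_num))     # z as a factor: one effect per level

df_numeric <- summary(fit_numeric)[[1]][["Df"]][1]
df_factor  <- summary(fit_factor)[[1]][["Df"]][1]
cat("Df for z as numeric:", df_numeric, "\n")
cat("Df for z as factor: ", df_factor, "\n")
```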
## Generate four plots.
par(mfrow=c(2,2))
qqnorm(Number)
qqline(Number, col = 1)
boxplot(Number, horizontal=TRUE, main="Box Plot", xlab="Number of items to recommend")
hist(Number, main="Histogram", xlab="Number of items to recommend")
plot(Probability, Number, xlab="No Choice Probability", ylab="Number of items to recommend",
main="Relationship between two response variables")
par(mfrow=c(1,1))
The response variable (number of items to recommend) is strongly right-skewed: its frequency decays roughly exponentially, with a small number of observations near 900 items, and the mean (184.9) is nearly twice the median (98.7). For the most part, the number of items to recommend is concentrated in the 0-200 range.
par(mfrow=c(2,2))
boxplot(Number~Models, data=df, main="Number of items vs. Models",
xlab="Models",ylab="Number of items to recommend")
boxplot(Number~z, data=df, main="Number of items vs. z value",
xlab="z-value",ylab="Number of items to recommend")
boxplot(Number~alpha, data=df, main="Number of items vs. alpha",
xlab="Alpha values",ylab="Number of items to recommend")
boxplot(Number~j, data=df, main="Number of items vs. j",
xlab="Items per page",ylab="Number of items to recommend")
par(mfrow=c(1,1))
## Observations for no-choice probability
par(mfrow=c(2,2))
boxplot(Probability~Models, data=df, main="No-choice vs. Models",
xlab="Models",ylab="No-choice probability")
boxplot(Probability~z, data=df, main="No-choice vs. z value",
xlab="z-value",ylab="No-choice probability")
boxplot(Probability~alpha, data=df, main="No-choice vs. alpha",
xlab="Alpha values",ylab="No-choice probability")
boxplot(Probability~j, data=df, main="No-choice vs. j",
xlab="Items per page",ylab="No-choice probability")
par(mfrow=c(1,1))
For a full factorial design we could fit a model with the four main effects and all interaction terms (two-way through four-way). However, we assume that the highest-order interaction has negligible impact, so we include interactions up to three-way; the omitted higher-order variation is pooled into the residual (16 degrees of freedom in the tables below). We perform ANOVA to assess the impact of each factor on both response variables.
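As a sketch (the notation is ours, not the original authors'), the model fitted by `(Models+z+alpha+j)^3` can be written with $\tau_i$, $\beta_r$, $\gamma_s$ and $\delta_t$ denoting the effects of level $i$ of Models, level $r$ of z, level $s$ of alpha and level $t$ of j, with the omitted four-way variation absorbed into the error term:

```latex
\begin{aligned}
y_{irst} = \mu &+ \tau_i + \beta_r + \gamma_s + \delta_t \\
&+ (\tau\beta)_{ir} + (\tau\gamma)_{is} + (\tau\delta)_{it}
   + (\beta\gamma)_{rs} + (\beta\delta)_{rt} + (\gamma\delta)_{st} \\
&+ (\tau\beta\gamma)_{irs} + (\tau\beta\delta)_{irt}
   + (\tau\gamma\delta)_{ist} + (\beta\gamma\delta)_{rst} + \varepsilon_{irst}
\end{aligned}
```

Under H01 (with $y$ the number of items) and H02 (with $y$ the no-choice probability), every main-effect and interaction term is zero, so the model reduces to $y_{irst} = \mu + \varepsilon_{irst}$.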
q = aov(Number~(Models+z+alpha+j)^3,data=df)
summary.aov(q)
## Df Sum Sq Mean Sq F value Pr(>F)
## Models 3 960784 320261 1948.876 < 2e-16 ***
## z 4 48289 12072 73.463 4.34e-10 ***
## alpha 2 1692051 846025 5148.290 < 2e-16 ***
## j 2 144088 72044 438.408 1.06e-14 ***
## Models:z 12 6189 516 3.138 0.01754 *
## Models:alpha 6 1571697 261949 1594.032 < 2e-16 ***
## Models:j 2 238368 119184 725.267 < 2e-16 ***
## z:alpha 8 5734 717 4.362 0.00592 **
## z:j 8 1418 177 1.079 0.42454
## alpha:j 4 43399 10850 66.023 9.69e-10 ***
## Models:z:alpha 24 8594 358 2.179 0.05546 .
## Models:z:j 8 2020 253 1.537 0.22077
## Models:alpha:j 4 115727 28932 176.057 5.23e-13 ***
## z:alpha:j 16 2094 131 0.797 0.67268
## Residuals 16 2629 164
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the results above, the z:j interaction (p = 0.42) and the z:alpha:j interaction (p = 0.67) have no significant effect on the number of items; Models:z:j (p = 0.22) is also not significant, and Models:z:alpha is only marginal (p = 0.055). Summary of the model: residual standard error 12.82 on 16 degrees of freedom; multiple R-squared 0.9995, adjusted R-squared 0.996; F-statistic 286 on 103 and 16 DF, p-value < 2.2e-16. The high adjusted R-squared suggests that the model fits well.
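The summary statistics quoted above follow directly from the ANOVA table, because the sequential sums of squares add up to the model sum of squares. A worked check, using the (integer-rounded) values printed by `summary.aov(q)`:

```r
# Sums of squares and degrees of freedom copied from summary.aov(q) above.
ss_terms <- c(960784, 48289, 1692051, 144088, 6189, 1571697, 238368,
              5734, 1418, 43399, 8594, 2020, 115727, 2094)
df_terms <- c(3, 4, 2, 2, 12, 6, 2, 8, 8, 4, 24, 8, 4, 16)
rss      <- 2629   # Residuals Sum Sq
df_resid <- 16     # Residuals Df

ss_model <- sum(ss_terms)
df_model <- sum(df_terms)                  # 103 model degrees of freedom
n_runs   <- df_model + df_resid + 1        # 120 observations

r2     <- ss_model / (ss_model + rss)      # multiple R-squared
adj_r2 <- 1 - (1 - r2) * (n_runs - 1) / df_resid
rse    <- sqrt(rss / df_resid)             # residual standard error
f_stat <- (ss_model / df_model) / (rss / df_resid)

round(c(R2 = r2, adjR2 = adj_r2, RSE = rse, F = f_stat), 4)
```

Because only 16 residual degrees of freedom remain, the adjustment factor $(n-1)/(n-k-1) = 119/16$ is large; the adjusted R-squared stays high here only because the unexplained fraction is tiny.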
p=aov(Probability~(Models+z+alpha+j)^3,data=df)
summary.aov(p)
## Df Sum Sq Mean Sq F value Pr(>F)
## Models 3 6.462e-06 2.154e-06 12.493 0.000184 ***
## z 4 2.048e-05 5.120e-06 29.696 3.17e-07 ***
## alpha 2 8.217e-06 4.108e-06 23.828 1.59e-05 ***
## j 2 1.125e-06 5.620e-07 3.262 0.064822 .
## Models:z 12 1.937e-05 1.614e-06 9.364 4.18e-05 ***
## Models:alpha 6 6.573e-06 1.096e-06 6.354 0.001429 **
## Models:j 2 9.480e-07 4.740e-07 2.749 0.094152 .
## z:alpha 8 2.382e-05 2.978e-06 17.271 1.66e-06 ***
## z:j 8 3.165e-06 3.960e-07 2.295 0.074888 .
## alpha:j 4 1.112e-06 2.780e-07 1.613 0.219450
## Models:z:alpha 24 2.798e-05 1.166e-06 6.762 0.000128 ***
## Models:z:j 8 2.725e-06 3.410e-07 1.975 0.117326
## Models:alpha:j 4 6.680e-07 1.670e-07 0.969 0.451489
## z:alpha:j 16 3.656e-06 2.290e-07 1.325 0.289899
## Residuals 16 2.759e-06 1.720e-07
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#summary.lm(p)
All main effects have a significant impact on the no-choice probability except j, which is only marginal (p = 0.065). The alpha:j interaction is not significant (p = 0.22). Most of the higher-order terms are not significant; the exception is the three-way interaction between model type, z and alpha (Models:z:alpha, p = 0.000128). Summary of the model: residual standard error 0.0004152 on 16 degrees of freedom; multiple R-squared 0.9786, adjusted R-squared 0.841; F-statistic 7.112 on 103 and 16 DF, p-value 3.431e-05.
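Each p-value in the ANOVA table is the upper tail of an F distribution with the term's and the residual degrees of freedom, which is why j (F = 3.262 on 2 and 16 df) falls just above the 0.05 cut-off. A small check with `pf()`, using values from `summary.aov(p)`:

```r
# Upper-tail F probabilities for two rows of summary.aov(p).
p_j   <- pf(3.262, df1 = 2,  df2 = 16, lower.tail = FALSE)  # j main effect
p_mza <- pf(6.762, df1 = 24, df2 = 16, lower.tail = FALSE)  # Models:z:alpha

# For df1 = 2 the tail has a closed form: P(F > f) = (1 + 2 f / df2)^(-df2 / 2).
p_j_closed <- (1 + 2 * 3.262 / 16)^(-16 / 2)

signif(c(j = p_j, j_closed_form = p_j_closed, Models_z_alpha = p_mza), 3)
```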
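The dependence of both response variables on the factor 'Models' is of particular interest, because each model represents a different choice environment. We therefore examine this factor on its own with one-way, no-intercept regressions. For the optimal assortment size, the mean for every model type is significantly different from zero (R-squared = 0.5659). For the objective function value (no-choice probability), only the non-linear models (MNL-CE1nl and MNL-CE2nl) have means significantly different from zero. Note that in a no-intercept fit the t-tests compare each model's mean with zero rather than the models with one another, and R-squared is computed about zero rather than about the overall mean. An important observation is that all models, irrespective of the input parameters and environment, achieve comparable objective function values: the estimated mean no-choice probabilities range only from 0.00018 to 0.00092. As a result, although model type is statistically significant in the ANOVA, its practical impact on the objective function value (no-choice probability) is small.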
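Because the one-way fits drop the intercept (`Models-1`), each coefficient is simply the mean response for one model type. A minimal sketch on synthetic data (group sizes mirror the design counts; the means and standard deviation are invented for illustration) contrasting the no-intercept fit with the intercept fit, whose F-test is the one that compares the models with each other:

```r
# Synthetic, hypothetical data (NOT FinalDOE): four groups sized 15, 15, 45, 45.
set.seed(42)
model_type <- factor(rep(c("MNL-CE1l", "MNL-CE1nl", "MNL-CE2l", "MNL-CE2nl"),
                         times = c(15, 15, 45, 45)))
y_sim <- rnorm(length(model_type),
               mean = c(340, 100, 245, 100)[as.integer(model_type)], sd = 50)

fit_means <- lm(y_sim ~ model_type - 1)  # cell-means coding, as in lm(Number~Models-1)
fit_diff  <- lm(y_sim ~ model_type)      # intercept + differences from the first level

# No-intercept coefficients equal the group means, so their t-tests ask
# whether each mean is zero, not whether the models differ.
group_means <- tapply(y_sim, model_type, mean)
print(all.equal(unname(coef(fit_means)), unname(c(group_means))))

# Equality of the four means is tested by the F-test of the intercept model.
print(anova(fit_diff))

# Same fitted values, but the no-intercept R-squared is measured about zero,
# so it is never smaller than the usual centered R-squared.
r2_means <- summary(fit_means)$r.squared
r2_diff  <- summary(fit_diff)$r.squared
cat("R-squared, no intercept:", round(r2_means, 3),
    " with intercept:", round(r2_diff, 3), "\n")
```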
##Number of items to recommend
summary(lm(Number~Models-1))
##
## Call:
## lm(formula = Number ~ Models - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -333.54 -92.79 -30.64 87.45 564.91
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## ModelsMNL-CE1l 342.29 47.24 7.246 5.11e-11 ***
## ModelsMNL-CE1nl 99.36 47.24 2.103 0.037587 *
## ModelsMNL-CE2l 244.87 27.27 8.979 5.82e-15 ***
## ModelsMNL-CE2nl 100.89 27.27 3.699 0.000332 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 182.9 on 116 degrees of freedom
## Multiple R-squared: 0.5659, Adjusted R-squared: 0.551
## F-statistic: 37.81 on 4 and 116 DF, p-value: < 2.2e-16
##No-Choice Probability
summary(lm(Probability~Models-1))
##
## Call:
## lm(formula = Probability ~ Models - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0007553 -0.0002671 -0.0001176 -0.0000263 0.0085017
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## ModelsMNL-CE1l 0.0003425 0.0002654 1.290 0.199584
## ModelsMNL-CE1nl 0.0009173 0.0002654 3.456 0.000768 ***
## ModelsMNL-CE2l 0.0001754 0.0001533 1.144 0.254869
## ModelsMNL-CE2nl 0.0004544 0.0001533 2.965 0.003678 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001028 on 116 degrees of freedom
## Multiple R-squared: 0.1697, Adjusted R-squared: 0.141
## F-statistic: 5.926 on 4 and 116 DF, p-value: 0.0002246
t1<-TukeyHSD(q)
t2<-TukeyHSD(p)
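`t1` and `t2` hold Tukey honest-significant-difference comparisons for every factor term in `q` and `p`, but they are not printed. To focus on pairwise model-type comparisons, the `which` argument of `TukeyHSD()` restricts the output to one term (for this document's objects, e.g. `TukeyHSD(q, which = "Models")`). A self-contained sketch on synthetic data:

```r
# Synthetic, hypothetical data: one response and a four-level factor.
set.seed(7)
model_type <- factor(rep(c("MNL-CE1l", "MNL-CE1nl", "MNL-CE2l", "MNL-CE2nl"),
                         each = 10))
y_sim <- rnorm(40, mean = rep(c(340, 100, 245, 100), each = 10), sd = 50)

fit <- aov(y_sim ~ model_type)
# Restrict Tukey's HSD to one factor; the result holds one matrix
# (diff, lwr, upr, p adj) per requested term.
tk <- TukeyHSD(fit, which = "model_type", conf.level = 0.95)
print(tk$model_type)
```

With four levels there are six pairwise comparisons.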
The ANOVA tables for both models show that not all terms are significant. We therefore perform backward stepwise elimination based on AIC to try to remove the non-significant terms.
## Stepwise regression based on AIC.
sreg1 = step(q,direction="backward")
## Start: AIC=578.44
## Number ~ (Models + z + alpha + j)^3
##
## Df Sum of Sq RSS AIC
## <none> 2629 578.44
## - z:alpha:j 16 2094 4724 616.74
## - Models:z:j 8 2020 4650 630.85
## - Models:z:alpha 16 6666 9296 697.98
## - Models:alpha:j 4 115727 118356 1027.28
summary(sreg1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Models 3 960784 320261 1948.876 < 2e-16 ***
## z 4 48289 12072 73.463 4.34e-10 ***
## alpha 2 1692051 846025 5148.290 < 2e-16 ***
## j 2 144088 72044 438.408 1.06e-14 ***
## Models:z 12 6189 516 3.138 0.01754 *
## Models:alpha 6 1571697 261949 1594.032 < 2e-16 ***
## Models:j 2 238368 119184 725.267 < 2e-16 ***
## z:alpha 8 5734 717 4.362 0.00592 **
## z:j 8 1418 177 1.079 0.42454
## alpha:j 4 43399 10850 66.023 9.69e-10 ***
## Models:z:alpha 24 8594 358 2.179 0.05546 .
## Models:z:j 8 2020 253 1.537 0.22077
## Models:alpha:j 4 115727 28932 176.057 5.23e-13 ***
## z:alpha:j 16 2094 131 0.797 0.67268
## Residuals 16 2629 164
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Stepwise regression based on AIC.
sreg2 = step(p,direction="backward")
## Start: AIC=-1902.59
## Probability ~ (Models + z + alpha + j)^3
##
## Df Sum of Sq RSS AIC
## <none> 2.7587e-06 -1902.6
## - Models:alpha:j 4 6.6820e-07 3.4269e-06 -1884.6
## - Models:z:j 8 2.7247e-06 5.4834e-06 -1836.2
## - z:alpha:j 16 3.6561e-06 6.4148e-06 -1833.3
## - Models:z:alpha 16 1.5853e-05 1.8612e-05 -1705.5
summary(sreg2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Models 3 6.462e-06 2.154e-06 12.493 0.000184 ***
## z 4 2.048e-05 5.120e-06 29.696 3.17e-07 ***
## alpha 2 8.217e-06 4.108e-06 23.828 1.59e-05 ***
## j 2 1.125e-06 5.620e-07 3.262 0.064822 .
## Models:z 12 1.937e-05 1.614e-06 9.364 4.18e-05 ***
## Models:alpha 6 6.573e-06 1.096e-06 6.354 0.001429 **
## Models:j 2 9.480e-07 4.740e-07 2.749 0.094152 .
## z:alpha 8 2.382e-05 2.978e-06 17.271 1.66e-06 ***
## z:j 8 3.165e-06 3.960e-07 2.295 0.074888 .
## alpha:j 4 1.112e-06 2.780e-07 1.613 0.219450
## Models:z:alpha 24 2.798e-05 1.166e-06 6.762 0.000128 ***
## Models:z:j 8 2.725e-06 3.410e-07 1.975 0.117326
## Models:alpha:j 4 6.680e-07 1.670e-07 0.969 0.451489
## z:alpha:j 16 3.656e-06 2.290e-07 1.325 0.289899
## Residuals 16 2.759e-06 1.720e-07
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
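Both step traces start from the full three-way model, and every candidate deletion raises AIC, so the `<none>` row is selected and `summary(sreg1)` and `summary(sreg2)` reproduce the original tables. As a worked check, the AIC values can be recomputed from the printed residual sums of squares with the formula `step()` uses for linear models via `extractAIC()`, where edf counts the 103 model degrees of freedom plus the intercept (small differences come from the rounded RSS):

```r
# AIC used by step() for linear models (extractAIC):
#   AIC = n * log(RSS / n) + 2 * edf, with edf = number of coefficients.
aic_lm <- function(rss, n, edf) n * log(rss / n) + 2 * edf

n_runs <- 120
# Full three-way models: 103 model df + 1 intercept = 104 coefficients.
aic_number_full <- aic_lm(2629, n_runs, 104)        # trace reports 578.44
aic_prob_full   <- aic_lm(2.7587e-06, n_runs, 104)  # trace reports -1902.59
# Dropping Models:alpha:j (4 df) from the probability model: 100 coefficients.
aic_prob_drop   <- aic_lm(3.4269e-06, n_runs, 100)  # trace reports -1884.6

round(c(number_full = aic_number_full, prob_full = aic_prob_full,
        prob_without_Models_alpha_j = aic_prob_drop), 2)
```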
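In both step traces the lowest AIC is attained by the full three-way model (`<none>`), so backward elimination by AIC removes no terms. We therefore drop the terms that are not significant at the 5% level in the ANOVA tables. For the model with number of items as the response, the reduced model keeps 10 terms, while the no-choice probability model keeps only 7. The adjusted R-squared is high for the first model (0.994) and moderate for the second (0.752).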
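The reduced models `remod1` and `remod2` appear to keep exactly the terms with p < 0.05 in the three-way ANOVA tables. A minimal sketch of that selection rule, applied to p-values copied from the Number table (values printed as `< 2e-16` are entered as 2e-16, an upper bound). For a fitted object the same named vector could be built with `tab <- summary(q)[[1]]; setNames(tab[["Pr(>F)"]], trimws(rownames(tab)))`:

```r
# p-values copied from summary.aov(q) for the Number model.
p_number <- c(
  "Models" = 2e-16, "z" = 4.34e-10, "alpha" = 2e-16, "j" = 1.06e-14,
  "Models:z" = 0.01754, "Models:alpha" = 2e-16, "Models:j" = 2e-16,
  "z:alpha" = 0.00592, "z:j" = 0.42454, "alpha:j" = 9.69e-10,
  "Models:z:alpha" = 0.05546, "Models:z:j" = 0.22077,
  "Models:alpha:j" = 5.23e-13, "z:alpha:j" = 0.67268,
  "Residuals" = NA
)

# Keep terms whose p-value is below the level; which() silently drops NA.
keep_terms <- function(pvals, level = 0.05) names(pvals)[which(pvals < level)]

kept <- keep_terms(p_number)
reduced_formula <- reformulate(kept, response = "Number")
print(reduced_formula)
```

The resulting formula has the same 10 terms as `remod1`; Models:z:alpha (p = 0.055) is the borderline term that the 5% rule excludes.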
## Remove non-significant terms from the stepwise model.
remod1 = aov(formula = Number ~ Models + z + alpha + j +
Models:z + Models:alpha + Models:j +
z:alpha + alpha:j + Models:alpha:j, data = df)
summary(remod1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Models 3 960784 320261 1376.127 < 2e-16 ***
## z 4 48289 12072 51.873 < 2e-16 ***
## alpha 2 1692051 846025 3635.275 < 2e-16 ***
## j 2 144088 72044 309.566 < 2e-16 ***
## Models:z 12 6189 516 2.216 0.01948 *
## Models:alpha 6 1571697 261949 1125.567 < 2e-16 ***
## Models:j 2 238368 119184 512.121 < 2e-16 ***
## z:alpha 8 5734 717 3.080 0.00484 **
## alpha:j 4 43399 10850 46.620 < 2e-16 ***
## Models:alpha:j 4 115727 28932 124.316 < 2e-16 ***
## Residuals 72 16756 233
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##Remove non-significant terms from the second model
remod2 = aov(formula = Probability ~ Models + z + alpha +
Models:z + Models:alpha +
z:alpha + Models:z:alpha, data = df)
summary(remod2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Models 3 6.462e-06 2.154e-06 7.999 0.000144 ***
## z 4 2.048e-05 5.120e-06 19.012 3.83e-10 ***
## alpha 2 8.217e-06 4.108e-06 15.256 4.40e-06 ***
## Models:z 12 1.937e-05 1.614e-06 5.995 9.73e-07 ***
## Models:alpha 6 6.573e-06 1.096e-06 4.068 0.001735 **
## z:alpha 8 2.382e-05 2.978e-06 11.057 1.94e-09 ***
## Models:z:alpha 24 2.798e-05 1.166e-06 4.329 2.12e-06 ***
## Residuals 60 1.616e-05 2.690e-07
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Print adjusted R squared for both models
summary.lm(remod1)$adj.r.squared
## [1] 0.9942816
summary.lm(remod2)$adj.r.squared
## [1] 0.7517056
## Plot residuals versus predicted response.
#par(bg=rgb(1,1,0.8))
plot(predict(remod1),remod1$residuals,ylab="Residual",
xlab="Predicted Number of items")
abline(h=0)
par(mfrow=c(1,1))
## Generate four plots.
par(mfrow=c(2,2))
qqnorm(remod1$residuals)
qqline(remod1$residuals, col = 2)
abline(h=0)
boxplot(remod1$residuals, horizontal=TRUE, main="Box Plot", xlab="Residual")
hist(remod1$residuals, main="Histogram", xlab="Residual")
plot(Number, remod1$residuals, xlab="Actual Number of items", ylab="Residual",
main="Number of items vs. residual")
par(mfrow=c(1,1))
The box plot suggests that there may be outliers, while the normal Q-Q plot shows only a slight deviation from normality. The histogram is also slightly skewed, so the residuals are not strictly normally distributed. Apart from a few exceptions, the variability of the residuals is low.
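As an optional formal complement to these visual checks (not part of the original analysis), a Shapiro-Wilk test can be applied to the residuals, e.g. `shapiro.test(residuals(remod1))`. A self-contained sketch on synthetic residual vectors, one normal and one right-skewed, each of length 120 like the design:

```r
# Synthetic, hypothetical residuals (NOT from remod1 or remod2).
set.seed(123)
res_normal <- rnorm(120)
res_skewed <- rexp(120) - 1   # right-skewed, centered near zero

sw_normal <- shapiro.test(res_normal)
sw_skewed <- shapiro.test(res_skewed)
cat("Shapiro-Wilk p, normal residuals:", signif(sw_normal$p.value, 3), "\n")
cat("Shapiro-Wilk p, skewed residuals:", signif(sw_skewed$p.value, 3), "\n")
```

A small p-value indicates a departure from normality; it does not by itself say how much that departure affects the F-tests.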
## Plot residuals versus predicted response.
#par(bg=rgb(1,1,0.8))
plot(predict(remod2),remod2$residuals,ylab="Residual",
xlab="Predicted No Choice Probability")
abline(h=0)
par(mfrow=c(1,1))
## Generate four plots.
par(mfrow=c(2,2))
qqnorm(remod2$residuals)
qqline(remod2$residuals, col = 2)
abline(h=0)
boxplot(remod2$residuals, horizontal=TRUE, main="Box Plot", xlab="Residual")
hist(remod2$residuals, main="Histogram", xlab="Residual")
plot(Probability, remod2$residuals, xlab="Actual No Choice Probability", ylab="Residual",
main="No Choice Probability vs. residual")
par(mfrow=c(1,1))
As the plots show, the residuals are very small in absolute terms, and the histogram shows a marked departure from normality. These observations largely reflect the extremely low no-choice probability values at the optimal assortment (median about 2.0e-04, with a few much larger values up to 9.4e-03), which arise because the objective is to minimize the no-choice probability.
For the number of items to recommend, all four factors have significant main effects; for the no-choice probability, Models, z and alpha are significant, while j is only marginal. For the model with number of items to recommend as the response, the two-way interaction of Models and alpha is highly significant. This indicates that the variability in the number of items to recommend can be attributed to the interaction between the magnitude of the cardinality effects (alpha) and the model being used. The number of items presented per page (j) also interacts with these two factors and significantly affects the response (three-way interaction Models:alpha:j). In contrast, the variability in the minimum no-choice probability (second model) is driven mainly by the interaction among Models, the threshold (z) and the magnitude of the cardinality effects (alpha), in addition to the main effects of Models, z and alpha.
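An interaction plot of cell means is a common way to see an interaction such as Models:alpha: non-parallel lines mean that the effect of alpha depends on the model. A minimal sketch on synthetic data (the per-model sensitivities are invented; on this document's data the corresponding call would be `with(df, interaction.plot(alpha, Models, Number))`):

```r
# Synthetic, hypothetical data with the same factor names as df.
set.seed(99)
sim <- expand.grid(Models = factor(c("MNL-CE1l", "MNL-CE1nl", "MNL-CE2l", "MNL-CE2nl")),
                   alpha  = factor(c("0.001", "0.01", "0.1")),
                   run    = 1:5)
slope <- c(300, 20, 200, 30)[as.integer(sim$Models)]  # invented per-model sensitivity
sim$Number <- 50 + slope * (as.integer(sim$alpha) - 1) + rnorm(nrow(sim), sd = 10)

# Cell means of Number for each alpha x Models combination.
cell_means <- with(sim, tapply(Number, list(alpha, Models), mean))
print(round(cell_means, 1))

# Non-parallel lines indicate a Models:alpha interaction.
with(sim, interaction.plot(alpha, Models, Number,
                           xlab = "alpha", ylab = "Mean number of items",
                           trace.label = "Models"))
```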