main conclusion :
- Country and the person who’s rating matters most when rating attraction – makes sense.
df <- read.csv("lighting.csv")
str(df); head(df)
## 'data.frame': 1200 obs. of 5 variables:
## $ observation : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Country : chr "United States of America" "United States of America" "United States of America" "United States of America" ...
## $ light : chr "light1" "light2" "light3" "light4" ...
## $ Rater : int 1 1 1 1 1 1 2 2 2 2 ...
## $ average_rating: num 7 7.7 7.8 7.8 7.7 7.5 4.7 4.4 5 4.9 ...
## observation Country light Rater average_rating
## 1 1 United States of America light1 1 7.0
## 2 2 United States of America light2 1 7.7
## 3 3 United States of America light3 1 7.8
## 4 4 United States of America light4 1 7.8
## 5 5 United States of America light5 1 7.7
## 6 6 United States of America light6 1 7.5
“Can I run logistic regression with random factors. Please give me the R codes.”
Take a brief look at the key variables
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
country_avr <- df |> group_by(Country) |> summarize(avr_avr_country = mean(average_rating))
country_avr
## # A tibble: 2 × 2
## Country avr_avr_country
## <chr> <dbl>
## 1 India 6.05
## 2 United States of America 5.08
rater_avr <- df |> group_by(Rater) |> summarize(avr_avr_rater = mean(average_rating))
rater_avr
## # A tibble: 200 × 2
## Rater avr_avr_rater
## <int> <dbl>
## 1 1 7.58
## 2 2 4.82
## 3 3 2.6
## 4 4 5.58
## 5 5 5.28
## 6 6 5.92
## 7 7 6.17
## 8 8 4.98
## 9 9 7.1
## 10 10 5.83
## # ℹ 190 more rows
unique(df$light)
## [1] "light1" "light2" "light3" "light4" "light5" "light6"
df <-
df |>
mutate(light_labels = case_when(
light == "light1" ~ "90° overhead box light",
light == "light2" ~ "ring light",
light == "light3" ~ "45° superior box light",
light == "light4" ~ "built-in camera flash",
light == "light5" ~ "2 straight-on 45° box lights",
light == "light6" ~ "natural light"
)) |>
select(light_labels, everything())
- Volunteer survey respondents were instructed to rate the subject’s attractiveness on a scale of 0 to 10
Treating country and light as fixed factors, conduct multiple linear regression “lm”; using average rating as outcome with Country and light as fixed predictors.
m1<-lm(average_rating~Country+light, data=df)
summary(m1)
##
## Call:
## lm(formula = average_rating ~ Country + light, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5161 -0.6747 0.0086 0.6939 3.0333
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.86668 0.09482 61.874 < 2e-16 ***
## CountryUnited States of America -0.96508 0.07226 -13.356 < 2e-16 ***
## lightlight2 0.21450 0.11177 1.919 0.05520 .
## lightlight3 0.31050 0.11177 2.778 0.00555 **
## lightlight4 0.20800 0.11177 1.861 0.06299 .
## lightlight5 0.20450 0.11177 1.830 0.06754 .
## lightlight6 0.15150 0.11177 1.356 0.17551
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.118 on 1193 degrees of freedom
## Multiple R-squared: 0.1354, Adjusted R-squared: 0.1311
## F-statistic: 31.14 on 6 and 1193 DF, p-value: < 2.2e-16
Details :
Significant lights + Intercept :
(Intercept) 5.86668 0.09482 61.874 < 2e-16 ***
CountryUnited States of America -0.96508 0.07226 -13.356 < 2e-16 ***
lightlight3 0.31050 0.11177 2.778 0.00555 **
Maybe Significant :
lightlight2 0.21450 0.11177 1.919 0.05520 .
lightlight4 0.20800 0.11177 1.861 0.06299 .
lightlight5 0.20450 0.11177 1.830 0.06754 .
Overall Model :
F-statistic: 31.14 on 6 and 1193 DF, p-value: < 2.2e-16Interpretation :
\[\textbf{Hypotheses (for coefficient $\beta_j$):}\] \[ H_0: \beta_j = 0 \] \[ H_a: \beta_j \neq 0 \]
\[\textbf{Test statistic:}\] \[ t = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)} \]
- Country is statistically significant
- Light 3 is Significant
- $$
F
$$
The F-test in regression compares explained and unexplained variation: \[ F = \frac{\text{SSR}/p}{\text{SSE}/(n - p - 1)} \]
\[
\textbf{Hypotheses:}
\] \[
H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0
\] \[
H_a: \text{at least one } \beta_j \neq 0
\]
A large \(F\) indicates that the
explained variation is large relative to the residual variation,
suggesting the model has explanatory power.
\[ R^2 \]
our
Multiple R-squared: 0.1354, Adjusted R-squared: 0.1311Interpretation :
png("boxplot_light.png", width = 800, height = 600)
boxplot(
average_rating ~ light_labels,
data = df,
xaxt = "n"
)
labels <- levels(factor(df$light_labels))
text(
seq_along(labels),
par("usr")[3] - 0.2,
labels = labels,
srt = 45,
adj = 1,
xpd = TRUE,
cex = 0.7
)
Earlier we said these were sign.
lightlight2 0.21450 0.11177 1.919 0.05520 .
lightlight4 0.20800 0.11177 1.861 0.06299 .
lightlight5 0.20450 0.11177 1.830 0.06754 .
and,
df |> distinct(light_labels, light)
## light_labels light
## 1 90° overhead box light light1
## 2 ring light light2
## 3 45° superior box light light3
## 4 built-in camera flash light4
## 5 2 straight-on 45° box lights light5
## 6 natural light light6
so :
| ring light | light2 |
| built-in camera flash | light4 |
| 2 straight-on 45° box lights | light5 |
png("boxplot_country.png", width = 800, height = 600)
boxplot(df$average_rating~df$Country, horizontal = T)
\[ \text{MSE} = \frac{1}{n - p - 1} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \\ = \frac{\text{SSE}}{n - p - 1} \]
Model error for every degree of freedom from our error
RMSE : square root of it : Average error
Calculation :
mse <- mean(residuals(m1)^2)
mse
## [1] 1.241887
sqrt(mse)
## [1] 1.1144
plot(m1, which = 1)
plot(m1, which = 2)
error looks almost normally distributed – more like t but close
average error of 0 – really good
library(car)
library(effects)
## Warning: package 'effects' was built under R version 4.4.3
library(lattice)
plot(allEffects(m1),ask=FALSE)
\[ \textbf{Model:} \quad y_i = \beta_0 + \beta_{\text{light}_j} + \beta_{\text{Country}_k} + \varepsilon_i \]
\[ \textbf{Predicted mean (effect):} \quad \hat{\mu}_j = \mathbf{x}_j^\top \hat{\boldsymbol{\beta}} \]
\[ \textbf{Variance:} \quad \widehat{\mathrm{Var}}(\hat{\mu}_j) = \mathbf{x}_j^\top \widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}})\,\mathbf{x}_j \]
\[ \textbf{Standard Error:} \quad \mathrm{SE}(\hat{\mu}_j) = \sqrt{\mathbf{x}_j^\top \widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}})\,\mathbf{x}_j} \]
\[ \textbf{Interpretation:} \quad \text{CI}_j \approx \text{plausible values for } \mathbb{E}[Y \mid \text{light}=j, \text{others fixed}] \]
Where j is light level – we are seeing for each light level – using light1 as the baseline – we see what out model would predict given we are
| ring light | light2 |
| built-in camera flash | light4 |
| 2 straight-on 45° box lights | light5 |
Treating country and light as fixed factors and rater as random effect
Predicting average_rating with Country and light as
fixed predictors and “Rater” as random factor
library("lme4")
df$Country <- factor(df$Country)
df$light <- factor(df$light)
df$Rater <- factor(df$Rater)
m2 <- lmer(
average_rating ~ Country + light + (1 | Rater),
data = df
)
summary(m2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: average_rating ~ Country + light + (1 | Rater)
## Data: df
##
## REML criterion at convergence: 1305.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.2105 -0.4611 0.0155 0.4517 3.8312
##
## Random effects:
## Groups Name Variance Std.Dev.
## Rater (Intercept) 1.1735 1.0833
## Residual 0.0806 0.2839
## Number of obs: 1200, groups: Rater, 200
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 5.86668 0.14804 39.629
## CountryUnited States of America -0.96508 0.17253 -5.594
## lightlight2 0.21450 0.02839 7.555
## lightlight3 0.31050 0.02839 10.937
## lightlight4 0.20800 0.02839 7.326
## lightlight5 0.20450 0.02839 7.203
## lightlight6 0.15150 0.02839 5.336
##
## Correlation of Fixed Effects:
## (Intr) CnUSoA lghtl2 lghtl3 lghtl4 lghtl5
## CntryUntSoA -0.845
## lightlight2 -0.096 0.000
## lightlight3 -0.096 0.000 0.500
## lightlight4 -0.096 0.000 0.500 0.500
## lightlight5 -0.096 0.000 0.500 0.500 0.500
## lightlight6 -0.096 0.000 0.500 0.500 0.500 0.500
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.86668 0.14804 39.629
CountryUnited States of America -0.96508 0.17253 -5.594
lightlight2 0.21450 0.02839 7.555
lightlight3 0.31050 0.02839 10.937
lightlight4 0.20800 0.02839 7.326
lightlight5 0.20450 0.02839 7.203
lightlight6 0.15150 0.02839 5.336
Our intercept is extremely significant – ie our Baseline rating (India + light1) is significant
Country is significant
US raters usually give about 1 pt lower than india – holding others constant
All lights increase rating
Random effects:
Groups Name Variance Std.Dev.
Rater (Intercept) 1.1735 1.0833
Residual 0.0806 0.2839
Number of obs: 1200, groups: Rater, 200
Which makes sense based on our previous graphic of the effects :
library(car)
library(effects)
library(lattice)
plot(allEffects(m1),ask=FALSE)
vc <- as.data.frame(VarCorr(m2))
rater_var <- vc$vcov[vc$grp == "Rater"]
resid_var <- vc$vcov[vc$grp == "Residual"]
ICC <- rater_var / (rater_var + resid_var)
ICC
## [1] 0.9357286
ranef(m2)$Rater
## (Intercept)
## 1 2.4719320423
## 2 -0.2634212244
## 3 -2.4549994442
## 4 0.4945682350
## 5 -0.7561918560
## 6 0.8241288696
## 7 1.0712993455
## 8 -0.0986409072
## 9 1.9940691222
## 10 0.7417387109
## 11 -2.9493403960
## 12 0.5110462667
## 13 0.9065190282
## 14 1.4832501387
## 15 -0.4776356369
## 16 1.2196016310
## 17 0.0002272832
## 18 1.3514258848
## 19 -1.2685811598
## 20 -0.6588939859
## 21 -0.5765038273
## 22 -0.0492068120
## 23 0.5604803619
## 24 0.0496613784
## 25 -0.7726698877
## 26 0.2473977591
## 27 1.5491622656
## 28 -0.4925433484
## 29 0.5110462667
## 30 -0.5765038273
## 31 -0.5435477638
## 32 0.2968318543
## 33 0.9065190282
## 34 0.1979636639
## 35 1.2211719513
## 36 0.3972703649
## 37 1.1372114724
## 38 0.2819241428
## 39 -0.5105917004
## 40 0.7433090312
## 41 0.2968318543
## 42 0.1155735053
## 43 0.4796605236
## 44 -0.0146804283
## 45 -0.5419774435
## 46 0.8900409965
## 47 1.8787229001
## 48 0.1995339842
## 49 0.5110462667
## 50 2.2592879501
## 51 -0.9884546204
## 52 -1.4498395088
## 53 -0.4266312214
## 54 1.5821183290
## 55 -1.2670108395
## 56 -1.0378887156
## 57 -0.6588939859
## 58 0.4616121716
## 59 0.3643143015
## 60 -1.6146198261
## 61 -1.9112243972
## 62 -0.3293333514
## 63 -2.4204730605
## 64 1.7468986463
## 65 1.2525576945
## 66 -1.6310978578
## 67 -1.0049326522
## 68 -0.2453728724
## 69 -1.5487076992
## 70 -0.2798992562
## 71 -0.6738016974
## 72 -1.0708447791
## 73 0.3133098860
## 74 1.7798547098
## 75 -0.0492068120
## 76 -1.4168834454
## 77 -2.2243070000
## 78 0.4961385553
## 79 0.3478362697
## 80 1.5821183290
## 81 0.2654461111
## 82 1.6496007762
## 83 0.4121780764
## 84 1.0218652503
## 85 0.4121780764
## 86 -2.1254388096
## 87 1.9462053473
## 88 -1.1038008425
## 89 -0.1315969706
## 90 0.8241288696
## 91 0.6758265840
## 92 -0.2798992562
## 93 1.5507325858
## 94 2.4554540106
## 95 0.0841877621
## 96 -1.2834888713
## 97 0.8406069013
## 98 -0.5254994118
## 99 -0.4611576052
## 100 1.7798547098
## 101 0.5275242985
## 102 -0.4446795734
## 103 -1.5487076992
## 104 1.3349478531
## 105 0.8406069013
## 106 -1.4977032837
## 107 -0.3458113831
## 108 0.3462659495
## 109 0.0167053149
## 110 -0.5105917004
## 111 -0.1810310658
## 112 -1.2356250964
## 113 2.6218046481
## 114 0.4451341398
## 115 0.7417387109
## 116 1.0383432820
## 117 -1.3674493502
## 118 -0.0821628754
## 119 -0.6902797291
## 120 0.4451341398
## 121 -0.7577621763
## 122 -0.6408456339
## 123 0.9740014754
## 124 -0.3787674465
## 125 1.1058257292
## 126 0.6758265840
## 127 1.6331227445
## 128 -1.0543667474
## 129 0.2803538226
## 130 -0.0162507485
## 131 -0.7742402080
## 132 1.3349478531
## 133 0.9080893485
## 134 -0.6588939859
## 135 -0.1810310658
## 136 -1.1846206809
## 137 -0.6588939859
## 138 -0.1975090975
## 139 -0.1810310658
## 140 -1.8453122703
## 141 0.2803538226
## 142 -0.2963772879
## 143 -0.1151189389
## 144 0.1155735053
## 145 0.7746947744
## 146 0.4451341398
## 147 0.3297879177
## 148 -0.6918500494
## 149 -0.3293333514
## 150 0.4121780764
## 151 2.3087220453
## 152 0.3462659495
## 153 0.6758265840
## 154 -0.4117235100
## 155 0.1485295687
## 156 0.4121780764
## 157 -0.3787674465
## 158 -0.3293333514
## 159 -0.6243676022
## 160 0.5126165870
## 161 -0.4117235100
## 162 -3.2789010306
## 163 -2.4863851874
## 164 0.1336218573
## 165 1.8128107732
## 166 -0.6408456339
## 167 -2.0430486510
## 168 -1.8288342385
## 169 0.0496613784
## 170 0.3148802063
## 171 -1.4812252520
## 172 -0.3936751580
## 173 0.1485295687
## 174 -0.9209721733
## 175 -0.5105917004
## 176 0.6428705206
## 177 -0.0162507485
## 178 -0.4117235100
## 179 0.2309197274
## 180 -0.7397138243
## 181 0.8570849330
## 182 0.0677097304
## 183 -0.2963772879
## 184 -0.5105917004
## 185 0.3148802063
## 186 -0.0986409072
## 187 -0.1315969706
## 188 -0.6078895705
## 189 -0.0970705869
## 190 0.0331833467
## 191 -0.1315969706
## 192 0.1336218573
## 193 -1.0378887156
## 194 2.0435032174
## 195 0.2984021746
## 196 0.1830559525
## 197 0.9229970599
## 198 -1.5651857309
## 199 0.9229970599
## 200 -2.3710389653
plot(m2)
m3 <- lmer(
average_rating ~ Country + (1 | Rater) + (1 | light),
data = df
)
summary(m3)
## Linear mixed model fit by REML ['lmerMod']
## Formula: average_rating ~ Country + (1 | Rater) + (1 | light)
## Data: df
##
## REML criterion at convergence: 1299.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.2349 -0.4558 0.0162 0.4546 3.8356
##
## Random effects:
## Groups Name Variance Std.Dev.
## Rater (Intercept) 1.17349 1.0833
## light (Intercept) 0.01016 0.1008
## Residual 0.08060 0.2839
## Number of obs: 1200, groups: Rater, 200; light, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.0482 0.1526 39.646
## CountryUnited States of America -0.9651 0.1725 -5.594
##
## Correlation of Fixed Effects:
## (Intr)
## CntryUntSoA -0.820
Random effects:
Groups Name Variance Std.Dev.
Rater (Intercept) 1.17349 1.0833
light (Intercept) 0.01016 0.1008
Residual 0.08060 0.2839
Number of obs: 1200, groups: Rater, 200; light, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.0482 0.1526 39.646
CountryUnited States of America -0.9651 0.1725 -5.594
ranef(m3)$Rater
## (Intercept)
## 1 2.4719320005
## 2 -0.2634212200
## 3 -2.4549994027
## 4 0.4945682267
## 5 -0.7561918432
## 6 0.8241288556
## 7 1.0712993274
## 8 -0.0986409055
## 9 1.9940690885
## 10 0.7417386984
## 11 -2.9493403462
## 12 0.5110462581
## 13 0.9065190129
## 14 1.4832501136
## 15 -0.4776356288
## 16 1.2196016104
## 17 0.0002272832
## 18 1.3514258620
## 19 -1.2685811384
## 20 -0.6588939748
## 21 -0.5765038175
## 22 -0.0492068112
## 23 0.5604803525
## 24 0.0496613775
## 25 -0.7726698747
## 26 0.2473977549
## 27 1.5491622394
## 28 -0.4925433400
## 29 0.5110462581
## 30 -0.5765038175
## 31 -0.5435477546
## 32 0.2968318493
## 33 0.9065190129
## 34 0.1979636606
## 35 1.2211719307
## 36 0.3972703582
## 37 1.1372114532
## 38 0.2819241381
## 39 -0.5105916917
## 40 0.7433090186
## 41 0.2968318493
## 42 0.1155735033
## 43 0.4796605155
## 44 -0.0146804280
## 45 -0.5419774344
## 46 0.8900409814
## 47 1.8787228684
## 48 0.1995339808
## 49 0.5110462581
## 50 2.2592879119
## 51 -0.9884546037
## 52 -1.4498394843
## 53 -0.4266312142
## 54 1.5821183023
## 55 -1.2670108181
## 56 -1.0378886981
## 57 -0.6588939748
## 58 0.4616121638
## 59 0.3643142953
## 60 -1.6146197988
## 61 -1.9112243649
## 62 -0.3293333458
## 63 -2.4204730195
## 64 1.7468986168
## 65 1.2525576733
## 66 -1.6310978302
## 67 -1.0049326352
## 68 -0.2453728683
## 69 -1.5487076730
## 70 -0.2798992514
## 71 -0.6738016860
## 72 -1.0708447610
## 73 0.3133098807
## 74 1.7798546797
## 75 -0.0492068112
## 76 -1.4168834214
## 77 -2.2243069624
## 78 0.4961385469
## 79 0.3478362639
## 80 1.5821183023
## 81 0.2654461066
## 82 1.6496007483
## 83 0.4121780694
## 84 1.0218652330
## 85 0.4121780694
## 86 -2.1254387737
## 87 1.9462053144
## 88 -1.1038008239
## 89 -0.1315969684
## 90 0.8241288556
## 91 0.6758265726
## 92 -0.2798992514
## 93 1.5507325596
## 94 2.4554539691
## 95 0.0841877607
## 96 -1.2834888496
## 97 0.8406068871
## 98 -0.5254994029
## 99 -0.4611575974
## 100 1.7798546797
## 101 0.5275242896
## 102 -0.4446795659
## 103 -1.5487076730
## 104 1.3349478305
## 105 0.8406068871
## 106 -1.4977032584
## 107 -0.3458113772
## 108 0.3462659436
## 109 0.0167053146
## 110 -0.5105916917
## 111 -0.1810310627
## 112 -1.2356250755
## 113 2.6218046038
## 114 0.4451341323
## 115 0.7417386984
## 116 1.0383432645
## 117 -1.3674493271
## 118 -0.0821628741
## 119 -0.6902797174
## 120 0.4451341323
## 121 -0.7577621635
## 122 -0.6408456231
## 123 0.9740014589
## 124 -0.3787674401
## 125 1.1058257105
## 126 0.6758265726
## 127 1.6331227169
## 128 -1.0543667295
## 129 0.2803538178
## 130 -0.0162507483
## 131 -0.7742401949
## 132 1.3349478305
## 133 0.9080893331
## 134 -0.6588939748
## 135 -0.1810310627
## 136 -1.1846206609
## 137 -0.6588939748
## 138 -0.1975090942
## 139 -0.1810310627
## 140 -1.8453122391
## 141 0.2803538178
## 142 -0.2963772829
## 143 -0.1151189370
## 144 0.1155735033
## 145 0.7746947613
## 146 0.4451341323
## 147 0.3297879122
## 148 -0.6918500377
## 149 -0.3293333458
## 150 0.4121780694
## 151 2.3087220063
## 152 0.3462659436
## 153 0.6758265726
## 154 -0.4117235030
## 155 0.1485295662
## 156 0.4121780694
## 157 -0.3787674401
## 158 -0.3293333458
## 159 -0.6243675916
## 160 0.5126165783
## 161 -0.4117235030
## 162 -3.2789009751
## 163 -2.4863851453
## 164 0.1336218550
## 165 1.8128107426
## 166 -0.6408456231
## 167 -2.0430486165
## 168 -1.8288342076
## 169 0.0496613775
## 170 0.3148802010
## 171 -1.4812252270
## 172 -0.3936751513
## 173 0.1485295662
## 174 -0.9209721577
## 175 -0.5105916917
## 176 0.6428705097
## 177 -0.0162507483
## 178 -0.4117235030
## 179 0.2309197235
## 180 -0.7397138118
## 181 0.8570849185
## 182 0.0677097292
## 183 -0.2963772829
## 184 -0.5105916917
## 185 0.3148802010
## 186 -0.0986409055
## 187 -0.1315969684
## 188 -0.6078895602
## 189 -0.0970705853
## 190 0.0331833461
## 191 -0.1315969684
## 192 0.1336218550
## 193 -1.0378886981
## 194 2.0435031829
## 195 0.2984021695
## 196 0.1830559494
## 197 0.9229970443
## 198 -1.5651857045
## 199 0.9229970443
## 200 -2.3710389252
ranef(m3)$light
## (Intercept)
## light1 -0.17457368
## light2 0.03174067
## light3 0.12407716
## light4 0.02548872
## light5 0.02212229
## light6 -0.02885515