E-Reader Data has been extracted from a research paper that studies the effect of device type and its lighting condition on the speed of reading (reading time).
Device type: three different types.
Sony PRS-700 with a 6-in. display, 800 600 resolution;
Amazon Kindle DX with a 9.7-in. display, 1200 824 resolution; and
iRex 1000S with a 10.2-in. display, 1024 1280 resolution.
Lighting Condition: four different conditions (200Lx,500Lx, 1000Lx, 1500Lx), Lx = lux, one lumen per square meter
Reading Time: measured in seconds.
In this module we will answer these questions using ANOVA:
Does reading speed differ significantly between devices type 1,2,3 ?
Does reading speed differ significantly between lighting conditions (200Lx,500Lx, 1000Lx, 1500Lx) ?
Do device type and lighting conditions interact?
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.4.0 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2) #a package for nice plots!
library(dplyr)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(viridis)
## Loading required package: viridisLite
reader = read.delim("ereader.txt" , sep = "\t")
colnames(reader) <- c("device", "light", "time")
reader$device = as.factor(reader$device)
reader$light = as.factor(reader$light)
reader$light = recode(reader$light, "1" = "200Lx", "2" = "500Lx", "3" = "1000Lx", "4" = "1500Lx")
head(reader)
## device light time
## 1 1 200Lx 1405.92
## 2 1 200Lx 1797.21
## 3 1 200Lx 1155.96
## 4 1 200Lx 1295.44
## 5 1 500Lx 1022.32
## 6 1 500Lx 1538.07
reader %>%
group_by(device) %>%
summarize(mean_reading = mean(time))
## # A tibble: 3 x 2
## device mean_reading
## <fct> <dbl>
## 1 1 1232.
## 2 2 1032.
## 3 3 1014.
reader %>%
group_by(light) %>%
summarize(mean_reading = mean(time))
## # A tibble: 4 x 2
## light mean_reading
## <fct> <dbl>
## 1 200Lx 1282.
## 2 500Lx 1195.
## 3 1000Lx 971.
## 4 1500Lx 926.
reader %>% ggplot( aes(x=time)) +
geom_histogram(binwidth=70, fill="#69b3a2", color="#e9ecef")+
ggtitle(" Reading Time") +
theme_ipsum() +
theme(
plot.title = element_text(size=15))
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
reader %>%
ggplot( aes(x=device, y=time, fill=device)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
theme_bw() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Time of Reading among Devices Type ") +
xlab("")
reader %>%
ggplot( aes(x=light, y=time, fill=light)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
theme_bw() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("Time of Reading among the devices lights ") +
xlab("")
In this section we will answer the 3rd question Do device type and lighting conditions interact? by fitting a model with an interaction, and one without, and conduct an F-test:
reader %>%
ggplot() +
aes(x = light, color = device, group = device, y = time) +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(fun.y = mean, geom = "line") +
scale_color_manual(values=c("black", "#6C00FF","#1C315E")) +
theme_bw()
## Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## i Please use the `fun` argument instead.
In this plot we can interpolate that there are interactions between
device types and its lighting condition. To illustrate this, at lighting
200Lx, reading time is longer for device two than device three, but at
lighting 500Lx, reading time is longer for device three than two.
reader %>%
ggplot() +
aes(x = device, color = light, group = light, y = time) +
stat_summary(fun.y = mean, geom = "point") +
stat_summary(fun.y = mean, geom = "line") +
scale_color_manual(values=c("black", "#227C70", "#6C00FF","#1C315E")) +
theme_bw()
No interaction visible in this plot. So we will do test analysis for an
interaction to decide to include them in our ANOVA model or not.
To answer this question we will fit a model with an interaction, and one without, and conduct an F-test:
twoway_interact = lm(time ~ device + light + device:light, data = reader)
twoway = lm(time ~ device + light, data = reader)
anova(twoway, twoway_interact)
## Analysis of Variance Table
##
## Model 1: time ~ device + light
## Model 2: time ~ device + light + device:light
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 53 3628970
## 2 47 3603108 6 25862 0.0562 0.9992
The p-value is 0.9 , is larger than alpha , and thus, we reject the null hypothesis that there is an interaction between device type and lighting variables. Thus, we can proceed with excluding interaction term.”
twoway = lm(time ~ device + light, data = reader)
summary(twoway)
##
## Call:
## lm(formula = time ~ device + light, data = reader)
##
## Residuals:
## Min 1Q Median 3Q Max
## -500.0 -194.6 -24.8 204.9 460.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1438.25 87.22 16.489 < 2e-16 ***
## device2 -209.73 83.89 -2.500 0.015547 *
## device3 -227.93 83.89 -2.717 0.008879 **
## light500Lx -97.46 97.30 -1.002 0.321052
## light1000Lx -321.66 97.30 -3.306 0.001704 **
## light1500Lx -366.16 97.30 -3.763 0.000421 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 261.7 on 53 degrees of freedom
## Multiple R-squared: 0.3455, Adjusted R-squared: 0.2838
## F-statistic: 5.596 on 5 and 53 DF, p-value: 0.0003268
We will answer this question Does reading speed differ significantly between devices type 1,2,3 ? using Bonferroni correction in pairwise post hoc comparisons test which is used to reduce type 1 error.
library(lsmeans)
## Loading required package: emmeans
## The 'lsmeans' package is now basically a front end for 'emmeans'.
## Users are encouraged to switch the rest of the way.
## See help('transition') for more information, including how to
## convert old 'lsmeans' objects and scripts to work with 'emmeans'.
pairs(lsmeans(twoway, "device"), adjust = "bonferroni")
## contrast estimate SE df t.ratio p.value
## device1 - device2 209.7 83.9 53 2.500 0.0466
## device1 - device3 227.9 83.9 53 2.717 0.0266
## device2 - device3 18.2 82.7 53 0.220 1.0000
##
## Results are averaged over the levels of: light
## P value adjustment: bonferroni method for 3 tests
Recall that the device type variable included three different types:
The difference with respect to mean reading time between device 2 (Amazon Kindle) and device 3 (iRex) is statistically significant ( p≈0.03 ). So, we can conclude that, on average, holding lighting conditions constant, one can read for approximately 228 seconds longer with the Sony device than iRex.
The difference with respect to mean reading time between device 1(Sony) and device 2 (Amazon Kindle) is statistically significant ( p≈0.047 ). So, we can conclude that, on average, holding lighting conditions constant, one can read for approximately 210 seconds longer with the Sony device than Amazon Kindle.
The difference with respect to mean reading time between device 1(Sony) and device 2 (Amazon Kindle) is not statistically significant.
library(lsmeans)
pairs(lsmeans(twoway, "light"), adjust = "bonferroni")
## contrast estimate SE df t.ratio p.value
## 200Lx - 500Lx 97.5 97.3 53 1.002 1.0000
## 200Lx - 1000Lx 321.7 97.3 53 3.306 0.0102
## 200Lx - 1500Lx 366.2 97.3 53 3.763 0.0025
## 500Lx - 1000Lx 224.2 95.5 53 2.346 0.1363
## 500Lx - 1500Lx 268.7 95.5 53 2.812 0.0413
## 1000Lx - 1500Lx 44.5 95.5 53 0.466 1.0000
##
## Results are averaged over the levels of: device
## P value adjustment: bonferroni method for 6 tests
The difference with respect to mean reading time between lighting condition(200LX) and lighting condition (500LX) is not statistically significant.
The difference with respect to mean reading time between lighting condition(200LX) and lighting condition(1000LX) is statistically significant ( p≈0.01 ). So, we can conclude that, on average, holding device types constant, one can read for approximately 322 seconds longer with 200LX.
The difference with respect to mean reading time between lighting condition(200LX) and lighting condition(1500LX) is statistically significant ( p≈0.002 ). So, we can conclude that, on average, holding device types constant, one can read for approximately 366 seconds longer with 200LX.
The difference with respect to mean reading time between lighting condition(500LX) and lighting condition (1000LX) is not statistically significant.
The difference with respect to mean reading time between lighting condition(500LX) and lighting condition(1500LX) is statistically significant ( p≈0.04 ). So, we can conclude that, on average, holding device types constant, one can read for approximately 269 seconds longer with 500LX.
The difference with respect to mean reading time between lighting condition(1000LX) and lighting condition (1500LX) is not statistically significant.