The Data

E-Reader Data has been extracted from a research paper that studies the effect of device type and its lighting condition on the speed of reading (reading time).

Codebook:-

Device type: three different types.

Sony PRS-700 with a 6-in. display, 800 􏰡 600 resolution;
Amazon Kindle DX with a 9.7-in. display, 1200 􏰡 824 resolution; and
iRex 1000S with a 10.2-in. display, 1024 􏰡 1280 resolution.
Lighting Condition: four different conditions (200Lx,500Lx, 1000Lx, 1500Lx), Lx = lux, one lumen per square meter
Reading Time: measured in seconds.

In this module we will answer these questions using ANOVA:

Does reading speed differ significantly between devices type 1,2,3 ?
Does reading speed differ significantly between lighting conditions (200Lx,500Lx, 1000Lx, 1500Lx) ?
Do device type and lighting conditions interact?

Install the packages

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.4.0     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggplot2) #a package for nice plots!
library(dplyr)

library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(magrittr)

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:purrr':
## 
##     set_names

## The following object is masked from 'package:tidyr':
## 
##     extract

library(hrbrthemes)

## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.

##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and

##       if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow

library(viridis)

## Loading required package: viridisLite

Reading the Data

reader = read.delim("ereader.txt" ,  sep = "\t")
colnames(reader) <- c("device", "light", "time")
reader$device = as.factor(reader$device)
reader$light = as.factor(reader$light)
reader$light = recode(reader$light, "1" = "200Lx", "2" = "500Lx", "3" = "1000Lx", "4" = "1500Lx") 

head(reader)

##   device light    time
## 1      1 200Lx 1405.92
## 2      1 200Lx 1797.21
## 3      1 200Lx 1155.96
## 4      1 200Lx 1295.44
## 5      1 500Lx 1022.32
## 6      1 500Lx 1538.07

summarizing the average reading time for each device type and its lighting conditions

reader %>%
  group_by(device) %>%
  summarize(mean_reading = mean(time))

## # A tibble: 3 x 2
##   device mean_reading
##   <fct>         <dbl>
## 1 1             1232.
## 2 2             1032.
## 3 3             1014.

reader %>%
  group_by(light) %>%
  summarize(mean_reading = mean(time))

## # A tibble: 4 x 2
##   light  mean_reading
##   <fct>         <dbl>
## 1 200Lx         1282.
## 2 500Lx         1195.
## 3 1000Lx         971.
## 4 1500Lx         926.

ploting the continuous variable reading time to visualize its distribution

reader %>% ggplot( aes(x=time)) +
    geom_histogram(binwidth=70, fill="#69b3a2", color="#e9ecef")+
     ggtitle(" Reading Time") +
    theme_ipsum() +
    theme(
      plot.title = element_text(size=15))

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

Visualizing the difference of reading time between devices types 1 2 3

reader %>%
  ggplot( aes(x=device, y=time, fill=device)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    theme_bw() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Time of Reading among Devices Type ") +
    xlab("")

Visualizing the difference of reading time between lighting conditions

reader %>%
  ggplot( aes(x=light, y=time, fill=light)) +
    geom_boxplot() +
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    theme_bw() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Time of Reading among the devices lights ") +
    xlab("")

Interaction plots

In this section we will answer the 3rd question Do device type and lighting conditions interact? by fitting a model with an interaction, and one without, and conduct an F-test:

reader %>% 
  ggplot() +
  aes(x = light, color = device, group = device, y = time) +
  stat_summary(fun.y = mean, geom = "point") +
  stat_summary(fun.y = mean, geom = "line") + 
  scale_color_manual(values=c("black", "#6C00FF","#1C315E")) +
  theme_bw()

## Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## i Please use the `fun` argument instead.

In this plot we can interpolate that there are interactions between device types and its lighting condition. To illustrate this, at lighting 200Lx, reading time is longer for device two than device three, but at lighting 500Lx, reading time is longer for device three than two.

reader %>% 
  ggplot() +
  aes(x = device, color = light, group = light, y = time) +
  stat_summary(fun.y = mean, geom = "point") +
  stat_summary(fun.y = mean, geom = "line") + 
  scale_color_manual(values=c("black", "#227C70", "#6C00FF","#1C315E")) +
  theme_bw()

No interaction visible in this plot. So we will do test analysis for an interaction to decide to include them in our ANOVA model or not.

Analysis

Do device type and lighting conditions interact?

To answer this question we will fit a model with an interaction, and one without, and conduct an F-test:

twoway_interact = lm(time ~ device + light + device:light, data = reader)
twoway = lm(time ~ device + light, data = reader)
anova(twoway, twoway_interact)

## Analysis of Variance Table
## 
## Model 1: time ~ device + light
## Model 2: time ~ device + light + device:light
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1     53 3628970                           
## 2     47 3603108  6     25862 0.0562 0.9992

The p-value is 0.9 , is larger than alpha , and thus, we reject the null hypothesis that there is an interaction between device type and lighting variables. Thus, we can proceed with excluding interaction term.”

twoway = lm(time ~ device + light, data = reader)
summary(twoway)

## 
## Call:
## lm(formula = time ~ device + light, data = reader)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -500.0 -194.6  -24.8  204.9  460.5 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1438.25      87.22  16.489  < 2e-16 ***
## device2      -209.73      83.89  -2.500 0.015547 *  
## device3      -227.93      83.89  -2.717 0.008879 ** 
## light500Lx    -97.46      97.30  -1.002 0.321052    
## light1000Lx  -321.66      97.30  -3.306 0.001704 ** 
## light1500Lx  -366.16      97.30  -3.763 0.000421 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 261.7 on 53 degrees of freedom
## Multiple R-squared:  0.3455, Adjusted R-squared:  0.2838 
## F-statistic: 5.596 on 5 and 53 DF,  p-value: 0.0003268

Post hoc comparisons for device type using a Bonferroni correction

We will answer this question Does reading speed differ significantly between devices type 1,2,3 ? using Bonferroni correction in pairwise post hoc comparisons test which is used to reduce type 1 error.

library(lsmeans)

## Loading required package: emmeans

## The 'lsmeans' package is now basically a front end for 'emmeans'.
## Users are encouraged to switch the rest of the way.
## See help('transition') for more information, including how to
## convert old 'lsmeans' objects and scripts to work with 'emmeans'.

pairs(lsmeans(twoway, "device"), adjust = "bonferroni")

##  contrast          estimate   SE df t.ratio p.value
##  device1 - device2    209.7 83.9 53   2.500  0.0466
##  device1 - device3    227.9 83.9 53   2.717  0.0266
##  device2 - device3     18.2 82.7 53   0.220  1.0000
## 
## Results are averaged over the levels of: light 
## P value adjustment: bonferroni method for 3 tests

Recall that the device type variable included three different types:

Sony PRS-700 with a 6-in. display, 800 􏰡 600 resolution;
Amazon Kindle DX with a 9.7-in. display, 1200 􏰡 824 resolution; and
iRex 1000S with a 10.2-in. display, 1024 􏰡 1280 resolution.

Model 1 Interpretation / Does reading time differ significantly between devices type 1,2,3 ?

The difference with respect to mean reading time between device 2 (Amazon Kindle) and device 3 (iRex) is statistically significant ( p≈0.03 ). So, we can conclude that, on average, holding lighting conditions constant, one can read for approximately 228 seconds longer with the Sony device than iRex.
The difference with respect to mean reading time between device 1(Sony) and device 2 (Amazon Kindle) is statistically significant ( p≈0.047 ). So, we can conclude that, on average, holding lighting conditions constant, one can read for approximately 210 seconds longer with the Sony device than Amazon Kindle.
The difference with respect to mean reading time between device 1(Sony) and device 2 (Amazon Kindle) is not statistically significant.

Does reading time differ significantly between lighting conditions (200Lx,500Lx, 1000Lx, 1500Lx) ?

library(lsmeans)
pairs(lsmeans(twoway, "light"), adjust = "bonferroni")

##  contrast        estimate   SE df t.ratio p.value
##  200Lx - 500Lx       97.5 97.3 53   1.002  1.0000
##  200Lx - 1000Lx     321.7 97.3 53   3.306  0.0102
##  200Lx - 1500Lx     366.2 97.3 53   3.763  0.0025
##  500Lx - 1000Lx     224.2 95.5 53   2.346  0.1363
##  500Lx - 1500Lx     268.7 95.5 53   2.812  0.0413
##  1000Lx - 1500Lx     44.5 95.5 53   0.466  1.0000
## 
## Results are averaged over the levels of: device 
## P value adjustment: bonferroni method for 6 tests

Model 2 Interpretation

The difference with respect to mean reading time between lighting condition(200LX) and lighting condition (500LX) is not statistically significant.
The difference with respect to mean reading time between lighting condition(200LX) and lighting condition(1000LX) is statistically significant ( p≈0.01 ). So, we can conclude that, on average, holding device types constant, one can read for approximately 322 seconds longer with 200LX.
The difference with respect to mean reading time between lighting condition(200LX) and lighting condition(1500LX) is statistically significant ( p≈0.002 ). So, we can conclude that, on average, holding device types constant, one can read for approximately 366 seconds longer with 200LX.
The difference with respect to mean reading time between lighting condition(500LX) and lighting condition (1000LX) is not statistically significant.
The difference with respect to mean reading time between lighting condition(500LX) and lighting condition(1500LX) is statistically significant ( p≈0.04 ). So, we can conclude that, on average, holding device types constant, one can read for approximately 269 seconds longer with 500LX.
The difference with respect to mean reading time between lighting condition(1000LX) and lighting condition (1500LX) is not statistically significant.

Analysis Of Variance ‘ANOVA’

2023-01-03