
Abstract

The abstract should address the following five questions: (1) why is the problem interesting, (2) how was the data set obtained, (3) what have you done to it, (4) what have you found, and (5) what do the results mean?
All this in fewer than 200 words!

Introduction

Papers should follow an hour-glass design: begin broadly, narrow through the methods and results sections, and broaden again at the end. The beginning should capture the reader's attention, and the conclusion should leave readers seeing the broad implications of your study.

The study introduced below investigates human memory by examining how quickly people respond to old or new written symbols (items) depending on their features. Specifically, the authors investigated the effects of word characteristics on episodic recognition memory using analyses that avoid the language-as-fixed-effect fallacy.

Data

Freeman, Heathcote, Chalmers, and Hockley (2010) collected lexical decision and word naming latencies for 300 words and 300 nonwords. The study design had one between-subjects factor, task, with two levels (naming or lexdec), and four within-subjects factors: stimulus type with two levels (word or nonword), word density and word frequency, each with two levels (low and high), and stimulus length with three levels (4, 5, and 6 letters).

Methods

We demonstrate how to analyze this response-time data set in R with the contributed package afex (analysis of factorial experiments). To simplify the presentation, we ignore two within-subjects factors in the study design: density and frequency.

Set the print options.

# print 4 significant digits and suppress significance stars
options(digits = 4, show.signif.stars = FALSE)

Load the afex package, which provides the data set fhch2010. Examine the first 6 lines of the data object.

library(afex)
data("fhch2010")
knitr::kable(head(fhch2010), format = 'latex')

Load the tidyverse package to examine the structure of the data object.

library(tidyverse)
glimpse(fhch2010)
Rows: 13,222
Columns: 10
$ id        <fct> N1, N1, N1, N1, N1, N1, N1, N1, N1, N1, N1, N1, N1, N1, N1, ~
$ task      <fct> naming, naming, naming, naming, naming, naming, naming, nami~
$ stimulus  <fct> word, word, word, nonword, nonword, word, nonword, nonword, ~
$ density   <fct> high, low, low, high, low, high, low, low, low, low, high, l~
$ frequency <fct> low, high, high, high, high, high, low, high, low, high, low~
$ length    <fct> 6, 6, 5, 5, 4, 4, 6, 5, 4, 6, 5, 5, 5, 4, 4, 5, 6, 4, 6, 6, ~
$ item      <fct> potted, engine, ideal, uares, xazz, fill, bounge, psems, jol~
$ rt        <dbl> 1.091, 0.876, 0.710, 1.210, 0.843, 0.785, 0.662, 0.713, 0.75~
$ log_rt    <dbl> 0.08709, -0.13239, -0.34249, 0.19062, -0.17079, -0.24207, -0~
$ correct   <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, ~
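
As a quick sanity check (an extra step, not part of the original handout), we can cross-tabulate the design factors to see how many trials fall in each cell of the task × stimulus × length design:

# trial counts per design cell
xtabs(~ task + stimulus + length, data = fhch2010)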

Compute the error rate.

# error rate
mean(!fhch2010$correct) 
[1] 0.01982
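
The overall error rate is about 2%. As an optional addition, the same calculation split by task shows whether errors are evenly distributed across groups:

# error rate by task
tapply(!fhch2010$correct, fhch2010$task, mean)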

Clean up the data by removing trials with incorrect responses.

# remove errors
dta <- droplevels(fhch2010[ fhch2010$correct,]) 
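
To keep track of what was discarded (a small addition to the original), count the removed trials:

# number of incorrect trials removed
nrow(fhch2010) - nrow(dta)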

Any data analysis should start with visual exploration. Knowing that the data set arises from a 2 (task) × 2 (stimulus) × 3 (length) mixed factorial design gives us an idea of what to plot first.

library(ez)
library(jtools)
ezPlot(data = dta, 
       dv = .(log_rt),              # dependent variable
       wid = .(id),                 # subject identifier
       between = .(task),           # between-subjects factor
       within = .(length, stimulus),
       x = .(length),               # variable on the x-axis
       split = .(stimulus),         # separate lines per stimulus type
       col = .(task),               # facet columns per task
       x_lab = 'Length', 
       y_lab = 'Mean log RT', 
       split_lab = 'Stimulus'
) + theme_apa() + theme(legend.position = c(.7, .3))
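
Because tidyverse is already attached, the same figure can be approximated without ez. The following is a sketch (cell_means is a hypothetical intermediate name) that first averages within each subject and then across subjects, roughly what ezPlot computes:

# subject-level cell means, then grand cell means
cell_means <- dta %>%
  group_by(id, task, stimulus, length) %>%
  summarise(log_rt = mean(log_rt), .groups = "drop") %>%
  group_by(task, stimulus, length) %>%
  summarise(log_rt = mean(log_rt), .groups = "drop")
ggplot(cell_means,
       aes(x = length, y = log_rt, colour = stimulus, group = stimulus)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ task) +
  theme_bw()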

Perform a three-way (one between-subjects, two within-subjects) mixed analysis of variance and display the resulting ANOVA table.

m1 <- aov_ez(id = "id",            # column identifying subjects
             dv = "log_rt",        # dependent variable
             data = dta, 
             between = "task",     # between-subjects factor
             within = c("length", "stimulus"))  # within-subjects factors
knitr::kable(nice(m1), format = 'latex')
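
The default table reports generalized eta squared; if partial eta squared is preferred, nice() accepts an es argument (a sketch, assuming the current afex interface):

# request partial eta squared instead of the default "ges"
knitr::kable(nice(m1, es = "pes"), format = 'latex')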

Follow-up tests: we might be interested in the task × stimulus interaction. First, we compute the marginal means.

# p_load() installs lsmeans if it is missing, then attaches it
pacman::p_load(lsmeans)
m1m <- lsmeans(m1, c("stimulus", "task"))
knitr::kable(m1m, format = 'latex')

Next, compute the conditional marginal means.

m1c <- lsmeans(m1, c("stimulus"), by = "task")
knitr::kable(m1c, format = 'latex')

Are the stimulus effects in both tasks independently significant? Updating the comparisons with by = NULL pools them into a single family, so the Holm adjustment applies across both tests.

update(pairs(m1c), by = NULL, adjust = "holm")
 contrast       task   estimate     SE df t.ratio p.value
 word - nonword naming  -0.3142 0.0208 43 -15.107  <.0001
 word - nonword lexdec  -0.0531 0.0186 43  -2.854  0.0066

Results are averaged over the levels of: length 
P value adjustment: holm method for 2 tests 

The stimulus effects in both tasks are independently significant. Is the difference between them, i.e., the interaction contrast (a difference of differences), also significant?

pairs(update(pairs(m1c), by = NULL))
 contrast                                          estimate     SE df t.ratio p.value
 (word - nonword naming) - (word - nonword lexdec)   -0.261 0.0279 43  -9.358  <.0001

Results are averaged over the levels of: length 

We can plot the estimated effects.

lsmip(m1, stimulus ~ length | task) + theme_bw()
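
The lsmeans package has since been superseded by emmeans (already attached above as a dependency); as a sketch under that assumption, the equivalent plot can be drawn with emmip:

library(emmeans)
# emmip() is the emmeans successor of lsmip()
emmip(m1, stimulus ~ length | task) + theme_bw()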

Results

Participants were slower in the naming task than their counterparts in the lexical decision task. In addition, the difference between word and nonword trials was larger in the naming task than in the lexdec task.

Conclusions

  1. afex provides a set of functions to perform standard factorial analysis of variance.

  2. In its default settings, the afex ANOVA functions replicate the results of commercial statistical packages such as SPSS or SAS (using orthogonal contrasts and Type III sums of squares); see the sketch after this list.

  3. Fitted results can be passed to lsmeans for post-hoc comparisons and plotting.
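
To make the defaults in point 2 explicit, the following sketch (m1_explicit is a hypothetical name; its table should match m1) sets sum-to-zero contrasts globally and requests Type III sums of squares by hand:

# afex defaults written out explicitly
set_sum_contrasts()   # afex helper: options(contrasts = c("contr.sum", "contr.poly"))
m1_explicit <- aov_ez(id = "id",
                      dv = "log_rt",
                      data = dta,
                      between = "task",
                      within = c("length", "stimulus"),
                      type = 3)   # Type III sums of squares

The aov_ez function applies this coding internally, so the explicit call is redundant here and shown only for illustration.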

Discussion

You must realize that R is written by experts in statistics and statistical computing who, despite popular opinion, do not believe that everything in SAS and SPSS is worth copying. Some things done in such packages, which trace their roots back to the days of punched cards and magnetic tape when fitting a single linear model may take several days because your first 5 attempts failed due to syntax errors in the JCL or the SAS code, still reflect the approach of “give me every possible statistic that could be calculated from this model, whether or not it makes sense”. The approach taken in R is different. The underlying assumption is that the useR is thinking about the analysis while doing it.
-Douglas Bates (in reply to the suggestion to include type III sums of squares and lsmeans in base R to make it more similar to SAS or SPSS)
R-help (March 2007)

References

Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in recognition memory for words. Journal of Memory and Language, 62(1), 1-18. https://doi.org/10.1016/j.jml.2009.09.004

R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Singmann, H., Bolker, B., Westfall, J., & Aust, F. (2016). afex: Analysis of Factorial Experiments. R package version 0.16-1. https://CRAN.R-project.org/package=afex

Session Information

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950 
[2] LC_CTYPE=Chinese (Traditional)_Taiwan.950   
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950
[4] LC_NUMERIC=C                                
[5] LC_TIME=Chinese (Traditional)_Taiwan.950    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lsmeans_2.30-0    emmeans_1.7.0     jtools_2.1.4      ez_4.4-0         
 [5] forcats_0.5.1     stringr_1.4.0     purrr_0.3.4       readr_2.0.2      
 [9] tidyr_1.1.4       tibble_3.1.5      tidyverse_1.3.1   afex_1.0-1       
[13] lme4_1.1-27.1     mosaic_1.8.3      ggridges_0.5.3    mosaicData_0.20.2
[17] ggformula_0.10.1  ggstance_0.3.5    dplyr_1.0.7       Matrix_1.3-4     
[21] ggplot2_3.3.5     lattice_0.20-45  

loaded via a namespace (and not attached):
 [1] minqa_1.2.4         colorspace_2.0-2    ellipsis_0.3.2     
 [4] rio_0.5.27          leaflet_2.0.4.1     estimability_1.3   
 [7] ggdendro_0.1.22     fs_1.5.0            rstudioapi_0.13    
[10] farver_2.1.0        ggrepel_0.9.1       fansi_0.5.0        
[13] mvtnorm_1.1-2       lubridate_1.8.0     xml2_1.3.2         
[16] codetools_0.2-18    splines_4.1.1       knitr_1.36         
[19] polyclip_1.10-0     jsonlite_1.7.2      nloptr_1.2.2.2     
[22] broom_0.7.9         dbplyr_2.1.1        ggforce_0.3.3      
[25] compiler_4.1.1      httr_1.4.2          backports_1.2.1    
[28] assertthat_0.2.1    fastmap_1.1.0       cli_3.0.1          
[31] tweenr_1.0.2        htmltools_0.5.2     tools_4.1.1        
[34] lmerTest_3.1-3      coda_0.19-4         gtable_0.3.0       
[37] glue_1.4.2          reshape2_1.4.4      Rcpp_1.0.7         
[40] carData_3.0-4       cellranger_1.1.0    jquerylib_0.1.4    
[43] vctrs_0.3.8         nlme_3.1-153        crosstalk_1.1.1    
[46] xfun_0.26           openxlsx_4.2.4      rvest_1.0.1        
[49] mime_0.12           lifecycle_1.0.1     mosaicCore_0.9.0   
[52] pacman_0.5.1        MASS_7.3-54         scales_1.1.1       
[55] hms_1.1.1           parallel_4.1.1      yaml_2.2.1         
[58] curl_4.3.2          gridExtra_2.3       pander_0.6.4       
[61] labelled_2.8.0      stringi_1.7.5       highr_0.9          
[64] boot_1.3-28         zip_2.2.0           rlang_0.4.11       
[67] pkgconfig_2.0.3     evaluate_0.14       labeling_0.4.2     
[70] htmlwidgets_1.5.4   tidyselect_1.1.1    plyr_1.8.6         
[73] magrittr_2.0.1      R6_2.5.1            generics_0.1.0     
[76] DBI_1.1.1           pillar_1.6.3        haven_2.4.3        
[79] foreign_0.8-81      withr_2.4.2         mgcv_1.8-38        
[82] abind_1.4-5         modelr_0.1.8        crayon_1.4.1       
[85] car_3.0-11          utf8_1.2.2          tzdb_0.1.2         
[88] rmarkdown_2.11      grid_4.1.1          readxl_1.3.1       
[91] data.table_1.14.2   reprex_2.0.1        digest_0.6.28      
[94] xtable_1.8-4        numDeriv_2016.8-1.1 munsell_0.5.0