About me: Jens Roeser

  • senior lecturer in psycholinguistics @ psychology department (Nottingham Trent University)
  • theory: language production / comprehension / acquisition (e.g. Roeser, Torrance, and Baguley 2019)
  • focus: psycholinguistics of written language production
  • methods: Bayesian modelling (talk to me about mixture models, Roeser et al. 2021) in Stan; keystroke logging; eyetracking
  • teaching: statistics – R (psyntur, Andrews and Roeser 2021); cognitive psychology; language acquisition

Outline

  • Today: General introduction to R (~1 hour)
  • Today: Data wrangling with tidyverse (~2-3 hours)
  • Tomorrow: Data viz with ggplot2 (~3 hours)
  • Lots of hands-on exercises

Why should I care?

Why use R to handle data?

  • 70% to 80% (or more) of data analysis is data wrangling
  • Open source: R is and always will be free
  • Large community of friendly peer support
  • Reproducibility: publish your code and look at code of other researchers
  • Flexibility: different ways of looking at data
  • Quickly growing number of available add-ons
  • Faster than manual data processing
  • Processing of large data sets
  • Reduce manual work
  • Reduce human error

Rules!

  • Stop using spreadsheets (“but this would only take me a second in Excel”)
  • Never change your data manually; document everything in code.
    • Retrospective amendments made easy
    • Documentation / reproducibility
  • Organized working environment
    • .Rproj with one directory per project, with sub-directories for scripts, data, plots, etc.
    • Short scripts: less code with one clear purpose is always better (the test: does the name of your script suggest a specific or a general purpose?)
  • Comment your code for others (# Ceci n'est pas un comment!)
  • Try to use tidyverse instead of base R.
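
The project layout described in the rules above can be sketched in a few lines of base R. This is only an illustration: the project name my-project and the sub-directory names are assumptions, and a temporary directory stands in for wherever you keep your projects.

```r
# Sketch: one directory per project, with sub-directories for
# scripts, data and plots. All names below are hypothetical.
project_root <- file.path(tempdir(), "my-project")
sub_dirs <- c("scripts", "data", "plots")

# Create the project directory and each sub-directory.
for (d in file.path(project_root, sub_dirs)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# List the sub-directories that now exist in the project.
created <- sort(list.dirs(project_root, full.names = FALSE, recursive = FALSE))
print(created)
```

The .Rproj file itself is usually created from RStudio (File > New Project) rather than by hand.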

Recommended reading

  • Andrews (2021)
  • Wickham and Grolemund (2016)
  • Winter (2019)

General introduction

  • R projects and the R environment
  • Using R as a basic calculator
  • Assignments and objects (variables and vectors)
  • Indexing, slicing and Booleans
  • Base R functions
  • Packages, data frames and “tibbles”
  • Debugging common errors
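
As a small taste of the first few topics, here is a minimal base R sketch. The reaction times are toy values chosen for illustration, and the object names (rt, mean_rt, is_slow, and so on) are arbitrary.

```r
# R as a basic calculator: the result is printed to the console
2 + 3 * 4

# Assignment: store values in an object (here a numeric vector)
rt <- c(702, 471, 639)  # toy reaction times in milliseconds

# Base R functions operate on the whole vector
mean_rt <- mean(rt)

# Indexing and slicing (R counts from 1, not 0)
second <- rt[2]
first_two <- rt[1:2]

# Booleans: a comparison returns TRUE or FALSE for each element ...
is_slow <- rt > 600
# ... and the logical vector can be used to select elements
slow_rts <- rt[is_slow]
```

Typing the name of any object (for example slow_rts) in the console prints its content.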

Download repository

  • Download: https://github.com/jensroes/uia-r-workshop
  • Click on Code > Download ZIP, then unzip the directory on your machine.
  • Open the project by double-clicking on uia-r-workshop.Rproj
  • exercises/: exercises associated with each topic
  • slides.Rmd: these slides in R Markdown format (also available as .html)
  • data/: scripts will read data from here
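
One way to confirm that the project opened correctly is to check that the working directory contains the files and folders listed above. The helper check_project() below is hypothetical (it is not part of the workshop repository) and only a sketch of such a check.

```r
# Hypothetical helper: reports, for each expected file or folder,
# whether it exists inside the directory given by `root`.
check_project <- function(root = ".") {
  expected <- c("uia-r-workshop.Rproj", "slides.Rmd", "exercises", "data")
  found <- file.exists(file.path(root, expected))
  setNames(found, expected)
}

# After opening the .Rproj file, the working directory should be the
# project root, so every entry should be TRUE.
check_project(getwd())
```

If any entry is FALSE, the session was probably not started from uia-r-workshop.Rproj, and relative paths such as data/ will not resolve.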

Example data set: Blomkvist et al. (2017)

  • Age-related changes in cognitive performance through adolescence and adulthood in a real-world task.

Real-world task: StarCraft 2

  • Real-time strategy video game
  • Nintendo Wii Balance Board

Example data set: Blomkvist et al. (2017)

library(tidyverse)  # provides read_csv() and glimpse()
blomkvist <- read_csv("data/blomkvist.csv")
glimpse(blomkvist)
Rows: 267
Columns: 10
$ id         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, …
$ sex        <chr> "male", "female", "female", "female", "…
$ age        <dbl> 84, 37, 62, 85, 73, 65, 30, 49, 83, 58,…
$ medicine   <dbl> 8, 1, 0, 4, 5, 0, 0, 0, 11, 0, 0, 4, 3,…
$ meds_cat   <chr> "a lot", "little", "none", "few", "a lo…
$ smoker     <chr> "former", "no", "yes", "former", "forme…
$ rt_hand_d  <dbl> 702, 471, 639, 708, 607, 542, 571, 509,…
$ rt_hand_nd <dbl> 780, 497, 638, 639, 652, 499, 527, 547,…
$ rt_foot_d  <dbl> 1009, 738, 878, 902, 923, 687, 778, 743…
$ rt_foot_nd <dbl> 963, 692, 786, 1374, 805, 600, 750, 797…
  • rt_*: average reaction time (rt) in milliseconds for the dominant (_d) or non-dominant (_nd) hand or foot
  • medicine: number of drugs used daily
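
To give a flavour of what can be done with these variables, here is a minimal sketch that summarises reaction times by sex. It does not read the real file: it uses only the first four participants as they appear in the glimpse() output above, so the numbers are illustrative and not results. The object names blomkvist_head and rt_by_sex are assumptions made for this example.

```r
library(dplyr)
library(tibble)

# First four participants, copied from the glimpse() output above.
blomkvist_head <- tibble(
  id         = c(1, 2, 3, 4),
  sex        = c("male", "female", "female", "female"),
  age        = c(84, 37, 62, 85),
  rt_hand_d  = c(702, 471, 639, 708),
  rt_hand_nd = c(780, 497, 638, 639)
)

# Number of participants and mean dominant-hand reaction time per sex.
rt_by_sex <- blomkvist_head %>%
  group_by(sex) %>%
  summarise(n = n(), mean_rt_hand_d = mean(rt_hand_d))

print(rt_by_sex)
```

With the full data set, the same pipeline would start from the blomkvist object created by read_csv() instead of the hand-typed tibble.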

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. London, UK: SAGE Publications Ltd.

Andrews, Mark, and Jens Roeser. 2021. Psyntur: Helper Tools for Teaching Statistical Data Analysis. https://CRAN.R-project.org/package=psyntur.

Blomkvist, Andreas W., Fredrik Eika, Martin T. Rahbek, Karin D. Eikhof, Mette D. Hansen, Malene Søndergaard, Jesper Ryg, Stig Andersen, and Martin G. Jørgensen. 2017. “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board: A Cross-Sectional Study of 354 Subjects from 20 to 99 Years of Age.” PLoS One 12 (12): e0189598.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2021. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing, 1–26.

Roeser, Jens, Mark Torrance, and Thom Baguley. 2019. “Advance Planning in Written and Spoken Sentence Production.” Journal of Experimental Psychology: Learning, Memory, and Cognition 45 (11): 1983–2009.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Winter, Bodo. 2019. Statistics for Linguists: An Introduction Using R. Routledge.