About us: Jens Roeser

  • Senior Lecturer in Psycholinguistics @ Psychology Department
  • Language production / comprehension / acquisition (e.g. Roeser, Torrance, and Baguley 2019; Garcia, Roeser, and Kidd 2023)
  • Bayesian modelling (Roeser et al. 2021); keystroke logging; eyetracking
  • Teaching: advanced statistical modelling, data wrangling, data visualisation, R package (psyntur, Andrews and Roeser 2021)

About us: Thom Baguley

  • Professor of Experimental Psychology
  • Lectures on modeling discrete outcomes (e.g., binary and count data)
  • Modeling data from a wide range of areas including cognitive, forensic and occupational data
  • Serious Stats: A guide to advanced statistics for the behavioral sciences (Baguley 2012)

About us: Mark Andrews

  • Associate Professor in Statistical Methods
  • ML for Statistics III
  • Cognitive psychology
  • Bayesian statistical and computational models
  • UG Statistics II / III, MSc Advanced Stats
  • Doing Data Science in R: An Introduction for Social Scientists (Andrews 2021); psyntur (Andrews and Roeser 2021)

Who are you?

Say hi to everyone on your table and brainstorm …

  • Which statistical tests do you remember and when do you use them?
  • Wouldn’t it be nice to have one approach that can capture most (all?) of our data problems?

Module Aims

  • Provide a unified set of advanced statistical techniques that covers a wider range of data-analysis problems: multilevel generalised linear models
  • General framework encapsulating a range of widely used techniques including linear regression, ANOVA, ANCOVA, MANCOVA, logistic regression, Poisson regression, log-linear models, and random-effects / mixed-effects models.
  • These models arise as special cases of a single underlying general principle.
  • Apply this general principle to a wider range of data analysis problems.
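
As a small illustration of the "special cases" idea, the sketch below fits a one-way ANOVA and a linear regression to the same simulated data and checks that both give the same F statistic. The data, the variable names (`group`, `y`) and the group means are invented for illustration only; only base R functions are used.

```r
set.seed(1)

# Simulated data (hypothetical): one outcome, one factor with three levels
d <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 20)),
  y = rnorm(60, mean = rep(c(10, 12, 15), each = 20), sd = 2)
)

# One-way ANOVA
fit_aov <- aov(y ~ group, data = d)

# The same model written as a linear regression with a categorical predictor
fit_lm <- lm(y ~ group, data = d)

# Extract the F statistic from both fits
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
f_lm <- unname(summary(fit_lm)$fstatistic["value"])

# TRUE if the two approaches agree (up to numerical tolerance)
same_f <- isTRUE(all.equal(f_aov, f_lm))
print(same_f)
```

Both calls estimate the same underlying model; `aov()` and `lm()` differ mainly in how the results are summarised.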

Module Aims

  • Multilevel generalised linear models will be introduced as a sequence of successively more general models, all of which build on linear regression.
  • Linear regression: review of multiple linear regression and factorial ANOVA.
  • General linear models: certain problems, which are poorly handled by ANOVA, can be easily accommodated (e.g. varying slope models).
  • Generalised linear models: transformations applied to linear models allow modelling of data types including categorical, ordinal, and discrete frequency data.
  • Multi-level models: data with inherently hierarchical structure, which have often been inadequately dealt with.
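
The step from linear to generalised linear models can be sketched in base R with `glm()`, where the `family` argument selects the outcome distribution and link function. The data below are simulated from invented parameter values (intercept 0.5, slope 0.8), so this is an illustration under those assumptions, not an analysis of real data. Multilevel models need an add-on package and are not shown here.

```r
set.seed(2)
n <- 200
x <- rnorm(n)

# One linear predictor, three outcome types (all values hypothetical)
eta <- 0.5 + 0.8 * x
d <- data.frame(
  x = x,
  y_normal = rnorm(n, mean = eta, sd = 1),            # continuous outcome
  y_binary = rbinom(n, size = 1, prob = plogis(eta)), # binary outcome
  y_count = rpois(n, lambda = exp(eta))               # count outcome
)

# Normal linear model, fitted with lm() and as a Gaussian GLM
fit_lm <- lm(y_normal ~ x, data = d)
fit_gaussian <- glm(y_normal ~ x, family = gaussian(link = "identity"), data = d)

# Logistic regression: binomial family with logit link
fit_logistic <- glm(y_binary ~ x, family = binomial(link = "logit"), data = d)

# Poisson regression: Poisson family with log link
fit_poisson <- glm(y_count ~ x, family = poisson(link = "log"), data = d)

# The Gaussian GLM reproduces the lm() coefficients
same_coef <- isTRUE(all.equal(coef(fit_lm), coef(fit_gaussian)))
print(same_coef)
```

The model formula stays the same across all four fits; only the family and link change.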

Learning outcomes

By the end of this module, you should be able to:

  1. Apply advanced statistics to real-world data-analysis problems.
  2. Use rational and principled arguments to justify the choice of statistical methods.
  3. Engage in peer-to-peer debate about methods and practices of data-analysis.
  4. Demonstrate a range of transferable skills, including data-analysis, data-preparation, computer programming, graphical presentation of quantitative evidence, reporting of data-analysis results.

Why was it a good idea to choose this module?

  • Why did you choose Statistics III?
  • What do you expect to get out of this module?
  • How confident do you feel about RStudio; how about stats?
  • Is there anything you’re worried about with this module?

Why was it a good idea to choose this module?

  • Career opportunities! Stats and statistical software skills transfer to many different areas in academia and industry.
  • You will be able to show off your data analysis skills in your final year project, unless you are using qualitative methods.
  • Solid understanding of statistical methods is relevant for planning data collections and making sense of quantitative data.
  • Data analysis can be efficient, flexible, reliable, and fast once you have had enough R practice.
  • You have the opportunity to improve your data analysis and R skills with three experts.

Outline

One 4-hour workshop each week of Term 1, except PACE week.
Week | Topic | Lecturer
Week 1 | Introduction to module, and introduction to RMarkdown | Jens Roeser
Week 2 | Normal linear models | Jens Roeser
Week 3 | Model comparison | Jens Roeser
Week 4 | Logistic regression | Thom Baguley
Week 5 | Poisson regression | Thom Baguley
Week 6 | PACE week |
Week 7 | Negative binomial and zero-inflated count models | Thom Baguley
Week 8 | Multilevel linear models (I) | Mark Andrews
Week 9 | Multilevel linear models (II) | Mark Andrews
Week 10 | Revision (and loose ends) | Mark Andrews

Engaging with this module

  • 1 \(\times\) 4-hour workshop each week.
  • Mixture of theory and practical exercises.
  • Together, we will work through statistics problems using R.
  • For each topic, there will be detailed accompanying lecture notes.
  • To engage successfully with this module, attend all workshops, and read the lecture notes.
  • For excellent results, you will need to demonstrate independent learning that goes beyond what we cover in class.

Assessment

Using data sets of your own choice, perform and report analyses using the tools in 1 – 4 to address a research question of relevance to psychology.

  1. Logistic regression for binomial data
  2. Regression models for count data
  3. Multilevel regression models
  4. A new method that was not explicitly covered in class

Assessment

Psychological relevance: address some non-trivial question of some psychological relevance using your chosen data.

  • Very wide range of topics and sub-fields, many of which overlap with many other disciplines.
  • Consider, for example, the data set published in

Bertrand, M. and Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review, 94, 991–1013.

  • Published in an economics journal but it addresses implicit racial bias, and is obviously related to psychology.

Assessment

  • Choose a different data set for each question.
  • 4 separate data analyses reported on no more than 4 pages each.
  • For all questions, provide full technical details, and graphical analysis.
  • Theoretical model (assumed data-generative process) needs to be motivated.

Assessment

  • Deadline for submission via Dropbox: Jan 13th 2025
  • Individual written feedback (by Feb 3rd 2025) to help you carry out and report statistical analyses as well as possible.
  • We are looking for evidence of
    • understanding and knowledge of the theoretical basis.
    • how and why to apply certain statistical models.
    • practical skills in data manipulation and analysis.
  • For high or exceptional grades, display learning that goes beyond the content explicitly taught in class.
  • See Grading Matrix on NOW for details.

Assessment

  • Work must be submitted as a single zip archive with
    • One RMarkdown file
    • PDF document generated from RMarkdown
    • RStudio project file
    • Data and supplementary files (e.g. bibliography, custom R function)
    • For an example see demo on NOW.
  • Work must be reproducible using the RMarkdown file.
  • Your analysis, plots, and descriptions for each question will be contained in the RMarkdown file, which will reference all relevant data files.

Assessment

For all tasks, provide …

  • descriptive statistics and graphical visualizations of the data.
  • full details of the statistical model that you are using (multiple predictors incl. categorical predictors, interaction terms).
  • full statistical details of model fit and comparisons with alternative models.
  • full statistical details of the inferred model parameter value estimates.
  • analysis, including graphical analysis, of the predictions of the model (model checks).
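
The checklist above can be sketched with a plain linear regression in base R. The built-in `mtcars` data set is used purely as a stand-in (it is not a psychological data set), and the choice of predictors is an assumption made for illustration; the assessed questions use the models listed in 1 – 4 instead.

```r
# Stand-in data: built-in mtcars, with transmission recoded as a factor
d <- mtcars
d$am <- factor(d$am, labels = c("automatic", "manual"))

# Descriptive statistics: mean outcome per level of the categorical predictor
descriptives <- aggregate(mpg ~ am, data = d, FUN = mean)
print(descriptives)

# Two candidate models: one predictor vs. two predictors plus interaction
m0 <- lm(mpg ~ wt, data = d)
m1 <- lm(mpg ~ wt * am, data = d)

# Model comparison: F test for the additional terms in m1
comparison <- anova(m0, m1)
print(comparison)

# Parameter estimates with 95% confidence intervals
estimates <- cbind(estimate = coef(m1), confint(m1))
print(estimates)

# Simple graphical model check: residuals against fitted values
plot(fitted(m1), resid(m1), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

In a report, each of these outputs would be accompanied by a written interpretation and the motivation for the chosen model.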

Question 4: A new topic

  • Y3 modules require you to go beyond the class contents.
  • Look for textbooks and online tutorials, e.g. my tutorial for mixture models.
  • Examples: time series, structural equation modelling (SEM), auto-regression models, non-linear regression, signal detection theory, mixture models, categorical models, multinomial models, multivariate models, diffusion models, Bayesian models …
  • Talk to your supervisor / us re advanced analyses relevant for your project.

Formative assessment

  • Practise writing the kind of report we expect from you.
  • Data-analysis report using linear regression to address a psychologically relevant question.
  • Support in workshops.
  • Just one data analysis, hence 4 pages max.
  • Submit a single zip archive via Dropbox by the end of October (deadline tba).
  • Feedback on what you did well and what you can improve.
  • It does not (directly) affect your grade.

Task

  • Have a look at the Grading Matrix on NOW.
  • Is there anything you find unclear?
  • What concerns have you got with the task?
  • What opportunities do you see? Have you got ideas you want to share?

Support

  • MS Teams channel for R, stats, assessment-related questions, and useful resources.
  • Our answers will be of interest to everyone, and code sharing and tagging are easy.
  • You will also receive an answer faster.
  • We might refer emails to the forum so everyone can benefit from our answer.
  • Help each other!
  • Online support will stop on Dec 20 2024.
  • For confidential questions contact ML Mark Andrews: mark.andrews@ntu.ac.uk

Tech requirements

  • You need R and RStudio on your own machine: instructions
  • Don’t rely on RStudio Cloud or university machines.
  • Check if tidyverse is installed.
  • If not, run
install.packages("tidyverse")
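
One way to check whether a package is installed is sketched below. The helper name `missing_packages()` is made up for this example; it only reports what is missing and prints the install command, it does not install anything itself.

```r
# Packages needed for the module (extend this vector as required)
required <- c("tidyverse")

# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)

if (length(to_install) > 0) {
  # Show the command to run; installation is left to the user
  message(
    "Run: install.packages(c(",
    paste0('"', to_install, '"', collapse = ", "),
    "))"
  )
} else {
  message("All required packages are installed.")
}
```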

Required and recommended reading

What questions have you got?

  • How will the module be taught?
  • What’s the assessment like?
  • How will you be supported?
  • What’s expected of you?

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.

Andrews, Mark, and Jens Roeser. 2021. psyntur: Helper Tools for Teaching Statistical Data Analysis. https://CRAN.R-project.org/package=psyntur.

Baguley, Thom. 2012. Serious Stats: A Guide to Advanced Statistics for the Behavioral Sciences. Macmillan International Higher Education.

Garcia, Rowena, Jens Roeser, and Evan Kidd. 2023. “Finding Your Voice: Voice-Specific Effects in Tagalog Reveal the Limits of Word Order Priming.” Cognition 236: 105424.

Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Cambridge University Press.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2021. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing, 1–26.

Roeser, Jens, Mark Torrance, and Thom Baguley. 2019. “Advance Planning in Written and Spoken Sentence Production.” Journal of Experimental Psychology: Learning, Memory, and Cognition 45 (11): 1983–2009.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.