Statistics III – Module Overview

About us: Jens Roeser

Senior lecturer in psycholinguistics @ psychology department (Nottingham Trent University)
Theory: psycholinguistics; language production / comprehension / acquisition (e.g. Roeser, Torrance, and Baguley 2019)
Methods: Bayesian modelling (talk to me about mixture models, Roeser et al. 2021) in Stan; keystroke logging; eyetracking
Teaching and consultancy: advanced statistical modelling, data wrangling, data visualisation; R package (psyntur, Andrews and Roeser 2021)
Twitter: https://twitter.com/jens_roeser and https://twitter.com/LangLitPsychNTU

About us: Thom Baguley

Professor of Experimental Psychology
Teaching lectures on modeling discrete outcomes (e.g., for binary and count data)
My work focuses on modeling data from a wide range of areas including cognitive, forensic and occupational data
Author of Serious Stats: A guide to advanced statistics for the behavioral sciences (Baguley 2012)

About us: Lucy Justice

Principal Lecturer in the School of Social Sciences
Role involves designing and evaluating interventions to improve student outcomes and experiences at university
Interested in using statistical models to understand phenomena from across the social sciences

Who are you?

Why did you choose Statistics III?
What do you expect to learn?
How confident are you with RStudio?
How about stats?

Module Aims

Provide a unified set of advanced statistical techniques that can cover an exceptionally wider range of data-analysis problems.
- Aka. multilevel generalised linear models
- General data-analytical framework encapsulating a range of widely used techniques including linear regression, ANOVA, ANCOVA, MANCOVA, logistic regression, Poisson regression, log-linear analysis, random effects / mixed-effects models etc.
Describe how these models arise as special cases of a single underlying general principle.
Demonstrate using real-world data-sets how we can apply this general principle to a wider range of data analysis problems.

Module Aims

Multilevel generalized linear models will be introduce as sequence of consecutively more general models, all of which build on the familiar principle of linear regression.
Linear regression: review of multiple linear regression and factorial ANOVA.
General linear models: certain problems, which are poorly handled by standard ANOVA, can be easily accommodated (e.g. varying slope regression models).
Generalized linear models: transformations can be applied to linear models to allow them to be applied to a wide variety of data-types including categorical, ordinal and discrete frequency data (logistic and Poisson regression models).
Multi-level models: data with inherently hierarchical structure arise naturally and widely in psychology, e.g. repeated measures data and data collected from large numbers of sub-groups, yet have often been inadequately dealt with in data analysis until recently.

Learning outcomes

By the end of this module you should be able to:

Critically discuss the theoretical basis of a wide variety of statistical tests that are in common practice.
Critically evaluate debates on the relative merits or appropriateness of different statistical methods being used in psychology.
Critically consider the formal or mathematical basis of statistical models and the mathematical basis of statistical inference.
Critically evaluate the choices of statistical tools available for any given research problem

Skills, Qualities, and Attributes

By the end of this module, you should be able to:

Apply advanced statistics to real-world data-analysis problems, whether academic or non-academic.
Use rational and principled arguments to justify the choice of statistical methods.
Engage in peer-to-peer debate about methods and practices of data-analysis.
Demonstrate a range of transferable skills, including data-analysis, data-preparation, computer programming, graphical presentation of quantitative evidence, reporting of data-analysis results.

Why was it a good idea to choose this module?

Employability! Stats and statistical software skills transfer to many different areas in academia and industry.
Unless you use qualitative methods, you will be able to show-off your data analysis skill in your project (markers do like to see this).
Solid understanding of statistical methods is relevant for making sense of quantitative data.
Data analysis can be efficient, flexible, reliable, and fast once you have had enough R practice.
You have the opportunity to improve your data analysis and R skills with three experts.

Outline

Each week of Term 1, except PACE week. Each week two 2-hour workshops.
W/C	Topic	Lecturer
26/09/22	Week 1: Introduction to module, and introduction to RMarkdown	Jens Roeser + Thom Baguley
03/10/22	Week 2: Normal linear models	Jens Roeser
10/10/22	Week 3: Model comparison	Jens Roeser
17/10/22	Week 4: Logistic regression	Thom Baguley
24/10/22	Week 5: Poisson regression	Thom Baguley
31/10/22	Week 6: PACE week
07/11/22	Week 7: Negative binomial and zero-inflated count models	Thom Baguley
14/11/22	Week 8: Multilevel linear models (I)	Lucy Justice
21/11/22	Week 9: Multilevel linear models (II)	Lucy Justice
28/11/22	Week 10: Revision (and loose ends)	Lucy Justice + Jens Roeser + Thom Baguley

Engaging with this module

Each week in this module, we will have 2 workshops.
Mixture of theory and lots of practical exercises.
You will be watching my screen and working on your own computer as the same time.
Together, we will work through statistics problems using R.
For each topic, there will be detail accompanying lecture notes.
To engage successfully with this module, you should attend each workshop each week, and read the lecture notes for each topic.
For excellent results, you will also need to demonstrate independent learning that goes beyond what we cover in class.

Assessment

All details are here and assessments specifications are here.

Using data-sets of your own choice, perform and report analyses using the tools in 1 – 4 to address a set of theoretical questions of your choice that relate to your chosen data-sets.

Binary logistic regression
Regression models for count data (e.g. Poisson, negative binomial, zero-inflated count model)
Multilevel linear regression models
A new topic / methods that was not explicitly covered in class

Choose a different data set per each question.
4 separate data analyses reported on no more than 4 pages each (max 16 pages overall).
For all questions, provide full technical details, and graphical analysis.
By theoretical question, we mean some non-trivial question.

E.g. if your data set concerns rates of extra-marital affairs and properties of people who do and do not have affairs, a theoretical question could simply be what variables (such as age, gender, religiousness, etc) explain why people have affairs and how often they do so.

Assessment

Deadline for submission via Dropbox: 9 January 2023, 2pm
Individual written feedback by 30 January 2023
Feedback will help you how to best carry out and report the types of statistical analyses that are at the core of this course.
We wish to see evidence of
- understanding and knowledge of the theoretical basis.
- how and why to apply certain statistical models.
- practical skills in data manipulation and analysis.
For high or exceptional grades it is vital that you display learning that goes beyond the content explicitly taught in class.
See Grading Matrix for details.

Assessment

Work must be submitted as single zip archive with
- One RMarkdown file
- PDF document generated from RMarkdown
- RStudio project file
- Data and supplementary files (e.g. bibliography, custom R function)
- for an example see Demo
Work must be reproducible as per requirement of the assignment: we must be able to reproduce your report from the RMarkdown file.
Your work, analysis, plots, descriptions, for each question will be contained in the RMarkdown file, and this will reference all relevant data files.

Assessment

For all questions choose a different data set that is appropriate to the task.
Data can be found for many recent publications.
Also, on this repository.
Pick a topic that your are interested in (maybe a topic that relates to your dissertation).
Only restriction: you must share the data with us, so no confidential data.

Assessment

For all tasks, you should provide …

descriptive statistics and graphical visualizations of the data.
full details of the statistical model that you are using.
full statistical details of the inferred model parameters.
analysis, including graphical analysis, of the predictions of the model.
full statistical details of model fit and comparisons with plausible alternative models.

Assessment

Ideally, all analyses should contain multiple predictors, including categorical predictors, and possibly interaction terms.
It is not strictly necessary to have those in all analyses, but it is expected that you will use them in at least some of the analyses.
For high or exceptional grades it is vital that you display learning that goes beyond the content explicitly taught in class.

Question 4. A new topic / method that was not explicitly covered in class

Year 3 modules require you to go beyond what as covered in class for upper grade bands.
6 months free access to DataCamp
Introductory text books and online tutorials.
Examples: time series, structural equation modelling (SEM), auto-regression models, non-linear regression, signal detection theory, mixture models, categorical models, multi-nomial models, multi-variate models, diffusion models, Bayesian models …

Formative assessment

To learn how best to write a report of the kind we expect in the (summative, i.e. real) assessment, you have the opportunity to write a data analysis report on linear regression (first topic).
Support in workshop October 11th.
Submit via Dropbox until October 28th 2pm
We will provide feedback on what you did well and what you can improve.
This is the best way for you to learn how to write a good report.
This is a learning and feedback opportunity for you.
It does not affect your grade for the course (except, you will naturally do better if you had some practice).

Formative assessment

Follow the same procedure as for the final assessment (discussed before).
Submit your work as a single zip archive – see demo for an example.
There is just one data analysis to do for the formative assessment (hence, 4 pages max).
Use a data set of your own choice; report a linear regression analysis to address a set of theoretical questions of your own choice that relate to your chosen data-set.

Support

Teams for discussing R, stats, assessment-related questions, and useful resources because all of this might be of interest to others.
- We might refer certain emails to Teams so everyone can benefit from our answer.
- Make sure you have access to our Teams channel!
- Ask on Teams because you will receive an answer faster.
- Do help each other out!
- Obviously lecturers like seeing engagement with their module :)
Online support will stop at Dec 21st 2022.
Email for questions that are less general or correspondence that is of sensitive nature: primary contact jens.roeser@ntu.ac.uk

Tech requirements

You need R and RStudio on your own machine: Instructions can be found here
Ideally, don’t rely on RStudio Cloud or machines that you don’t have admin rights for.
Check if tidyverse is installed.
If not run

install.packages("tidyverse")

Required and recommended reading

Lecture notes are available for each topic we will cover in class.
Mark Andrew’s Lecture notes on NOW also published here (Andrews 2021).
Data sets and R code for this book can be found here.
“R for Data Science” by Wickham and Grolemund for data wrangling, visualisation and modelling (Wickham and Grolemund 2016).
For more topic specific literature see Assessment section on NOW.
Advanced linear models: “Regression and other Stories” by Gelman, Hill, and Vehtari (2020).

What questions have you got?

How will the module be taught?
What’s the assessment like?
How will you be supported?
What’s expected of you?

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.

Andrews, Mark, and Jens Roeser. 2021. psyntur: Helper Tools for Teaching Statistical Data Analysis. https://CRAN.R-project.org/package=psyntur.

Baguley, Thom. 2012. Serious Stat: A Guide to Advanced Statistics for the Behavioral Sciences. Bloomsbury Publishing.

Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Cambridge University Press.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2021. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing, 1–26.

Roeser, Jens, Mark Torrance, and Thom Baguley. 2019. “Advance Planning in Written and Spoken Sentence Production.” Journal of Experimental Psychology: Learning, Memory, and Cognition 45 (11): 1983–2009.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.