Consumer Analytics with R
European Summer School — Bielefeld, 16 June 2026
About the Course
This course introduces students to the practice of consumer analytics using R. Working with a real retail transaction dataset, students progress from raw data to actionable customer insights — and reflect on the legal and ethical boundaries that govern this work in Europe.
The course is designed for students with basic familiarity with data and statistics. No prior R experience is required, though it is helpful.
- Duration: 3 hours (with one short break)
- Format: Live coding — students follow along in their browsers using webR
- Dataset: UK Online Retail Dataset (UCI Machine Learning Repository, Chen 2012)
Course Objectives
By the end of this course, students will:
- Be able to load, inspect, and clean a real-world transactional dataset in R
- Understand and apply the RFM (Recency, Frequency, Monetary) framework for customer segmentation
- Build and interpret a logistic regression model to predict customer value
- Critically evaluate the legal and ethical implications of consumer profiling under GDPR
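To make the RFM idea concrete, here is a minimal sketch in base R on a toy table. The column names follow the UCI Online Retail schema (`CustomerID`, `InvoiceNo`, `InvoiceDate`, `Quantity`, `UnitPrice`), which is an assumption about the course's sample file. The three-level scoring and the segment labels are purely illustrative and may differ from what is used in the sessions.

```r
# Toy transactions (hypothetical data; schema assumed from the UCI dataset)
tx <- data.frame(
  CustomerID  = c(1, 1, 2, 3, 3, 3),
  InvoiceNo   = c("1001", "1002", "1003", "1004", "1005", "1006"),
  InvoiceDate = as.Date(c("2011-01-10", "2011-11-30", "2011-03-05",
                          "2011-10-01", "2011-11-15", "2011-12-08")),
  Quantity    = c(2, 1, 10, 3, 4, 1),
  UnitPrice   = c(5, 20, 1.5, 2, 2.5, 30)
)
tx$Revenue <- tx$Quantity * tx$UnitPrice

# Reference date: the day after the last observed transaction
snapshot <- max(tx$InvoiceDate) + 1

# One row per customer: days since last purchase, number of invoices, total spend
by_customer <- split(tx, tx$CustomerID)
rfm <- data.frame(
  CustomerID = as.numeric(names(by_customer)),
  Recency    = sapply(by_customer, function(d) as.numeric(snapshot - max(d$InvoiceDate))),
  Frequency  = sapply(by_customer, function(d) length(unique(d$InvoiceNo))),
  Monetary   = sapply(by_customer, function(d) sum(d$Revenue))
)

# Rank-based score from 1 (worst) to n (best); illustrative helper, not a library function
score <- function(x, n = 3) ceiling(rank(x, ties.method = "first") * n / length(x))

rfm$R_score <- score(-rfm$Recency)   # fewer days since last purchase is better
rfm$F_score <- score(rfm$Frequency)
rfm$M_score <- score(rfm$Monetary)
rfm$Total   <- rfm$R_score + rfm$F_score + rfm$M_score

# Hypothetical segment cut-offs on the combined score
rfm$Segment <- cut(rfm$Total, breaks = c(-Inf, 4, 7, Inf),
                   labels = c("Low", "Medium", "High"))
print(rfm)
```

With real data, quintile scores (1 to 5) are a common choice; the same `score()` helper would handle that with `n = 5`.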
Learning Outcomes
| # | Outcome | Assessed through |
|---|---|---|
| 1 | Load data from a URL and inspect its structure | Live coding |
| 2 | Apply data cleaning steps to remove noise and invalid records | Live coding |
| 3 | Produce and interpret summary statistics and visualisations | Discussion |
| 4 | Calculate RFM scores and assign customer segments | Live coding |
| 5 | Build a logistic regression model and interpret its output | Live coding |
| 6 | Explain what GDPR requires of a consumer analytics project | Discussion |
Course Outline
| Time | Session | Topics |
|---|---|---|
| 9:45 – 10:30 | Session 1 — Data | Load, inspect, clean |
| 10:30 – 11:15 | Session 2 — Explore data and RFM | Recency, Frequency, Monetary scoring and segmentation |
| 11:15 – 11:30 | Break | |
| 11:30 – 12:15 | Session 3 — Modelling | Logistic regression, prediction, evaluation |
| 12:15 – 13:00 | Session 4 — Regulation | GDPR, Article 22, enforcement, ethics |
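As a preview of the modelling session, the sketch below fits a logistic regression on simulated customer data. It uses base R's `glm()` rather than tidymodels, so it is a stand-in for the course workflow, not a copy of it. The variable `HighValue`, the simulated coefficients, and the 0.5 cut-off are all assumptions made for illustration.

```r
set.seed(42)

# Simulated customers (hypothetical data, not the course dataset)
n <- 200
customers <- data.frame(
  Recency   = round(runif(n, 1, 365)),  # days since last purchase
  Frequency = rpois(n, 3) + 1           # number of invoices
)

# Assumed ground truth: recent, frequent buyers are more likely to be high value
p_true <- plogis(0.5 - 0.01 * customers$Recency + 0.4 * customers$Frequency)
customers$HighValue <- rbinom(n, 1, p_true)

# Simple hold-out split so the evaluation is not done on the training rows
train <- customers[1:150, ]
test  <- customers[151:200, ]

# Logistic regression: log-odds of HighValue as a linear function of R and F
fit <- glm(HighValue ~ Recency + Frequency, data = train, family = binomial)
print(summary(fit)$coefficients)
print(exp(coef(fit)))  # odds ratios

# Predicted probabilities and a 0.5 classification threshold
test$p_hat <- predict(fit, newdata = test, type = "response")
test$pred  <- as.integer(test$p_hat >= 0.5)

# Confusion matrix and accuracy on the hold-out rows
conf_mat <- table(Predicted = test$pred, Actual = test$HighValue)
accuracy <- mean(test$pred == test$HighValue)
print(conf_mat)
print(accuracy)
```

An odds ratio below 1 for `Recency` would mean that each additional day since the last purchase lowers the odds of being a high-value customer, holding `Frequency` fixed.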
The Dataset
UK Online Retail Dataset
- Creator: Daqing Chen, London South Bank University
- Source: UCI Machine Learning Repository
- License: CC BY 4.0
The data come from a UK-based online retailer selling unique all-occasion gifts; many of its customers are wholesalers. The full dataset covers transactions from December 2010 to December 2011.
We work with a random sample of 20,000 rows hosted on GitHub:

    retail <- read.csv("https://raw.githubusercontent.com/gulertan/Consumer-Analytics-with-R/refs/heads/main/online_retail_sample.csv")

After cleaning, our working dataset contains:
| Metric | Value |
|---|---|
| Transactions | 14,579 |
| Unique customers | 3,017 |
| Unique products | 2,544 |
| Countries | 34 |
| Period | Dec 2010 – Dec 2011 |
| Total revenue | £316,047 |
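The cleaning rules behind these figures are not listed in this overview. The sketch below shows one plausible set on a toy table: drop rows without a customer ID, drop cancellations (invoice numbers starting with "C" in the UCI data), and keep only positive quantities and prices. Both the rules and the column names are assumptions based on the UCI schema, so the steps used in Session 1 may differ.

```r
# Toy raw data (hypothetical rows; schema assumed from the UCI dataset)
raw <- data.frame(
  InvoiceNo  = c("536365", "536366", "C536379", "536380", "536381", "536382"),
  StockCode  = c("85123A", "71053", "D", "22632", "22633", "22633"),
  Quantity   = c(6, 6, -1, 3, 4, 10),
  UnitPrice  = c(2.55, 3.39, 27.50, 0, 1.85, 1.85),
  CustomerID = c(17850, NA, 14527, 13047, 13047, 12583),
  Country    = c("United Kingdom", "United Kingdom", "United Kingdom",
                 "United Kingdom", "United Kingdom", "France"),
  stringsAsFactors = FALSE
)

# Assumed cleaning rules, combined into one logical filter
keep <- !is.na(raw$CustomerID) &          # customer must be identifiable
        !startsWith(raw$InvoiceNo, "C") & # not a cancellation
        raw$Quantity > 0 &                # no returns or adjustments
        raw$UnitPrice > 0                 # no zero-priced lines
clean <- raw[keep, ]
clean$Revenue <- clean$Quantity * clean$UnitPrice

# Summary metrics of the same kind as in the table above
metrics <- c(
  transactions = nrow(clean),
  customers    = length(unique(clean$CustomerID)),
  products     = length(unique(clean$StockCode)),
  countries    = length(unique(clean$Country)),
  revenue      = sum(clean$Revenue)
)
print(metrics)
```

Applied to the course sample, the same filter would be written against `retail` instead of `raw`, provided the column names match.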
Required Package
Most of the code in this course uses one package:

    install.packages("tidyverse") # data manipulation and visualisation

In addition, tidymodels is used in Session 3 for the logistic regression. Everything else is base R.
References
- Chen, D. (2012). Data mining for the online retail industry. Journal of Database Marketing & Customer Strategy Management, 19, 197–208. Dataset DOI (UCI): 10.24432/C5BW33
- Hughes, A.M. (1994). Strategic Database Marketing. Probus Publishing.
- Binns, R. and Veale, M. (2021). Is that your final decision? Multi-stage profiling and Article 22 GDPR. International Data Privacy Law, 11(4), 319–340. Oxford Academic
- Future of Privacy Forum (2022). Automated Decision-Making: Practical Cases from Courts. FPF Report
- European Data Protection Board (2023). €1.2 billion fine for Facebook. EDPB