Consumer Analytics with R

European Summer School — Bielefeld, 16 June 2026

Author

Gül Ertan Özgüzer

Published

June 15, 2026

About the Course

This course introduces students to the practice of consumer analytics using R. Working with a real retail transaction dataset, students progress from raw data to actionable customer insights — and reflect on the legal and ethical boundaries that govern this work in Europe.

The course is designed for students with basic familiarity with data and statistics. No prior R experience is required, though it is helpful.

  • Duration: 3 hours (with one short break)
  • Format: Live coding — students follow along in their browsers using webR
  • Dataset: UK Online Retail Dataset (UCI Machine Learning Repository, Chen 2012)

Course Objectives

By the end of this course, students will:

  1. Be able to load, inspect, and clean a real-world transactional dataset in R
  2. Understand and apply the RFM (Recency, Frequency, Monetary) framework for customer segmentation
  3. Build and interpret a logistic regression model to predict customer value
  4. Critically evaluate the legal and ethical implications of consumer profiling under GDPR

Learning Outcomes

# | Outcome | Assessed through
1 | Load data from a URL and inspect its structure | Live coding
2 | Apply data cleaning steps to remove noise and invalid records | Live coding
3 | Produce and interpret summary statistics and visualisations | Discussion
4 | Calculate RFM scores and assign customer segments | Live coding
5 | Build a logistic regression model and interpret its output | Live coding
6 | Explain what GDPR requires of a consumer analytics project | Discussion

Course Outline

Time | Session | Topics
9:45 – 10:30 | Session 1: Data | Load, inspect, clean
10:30 – 11:15 | Session 2: Explore data and RFM | Recency, Frequency, Monetary scoring and segmentation
11:15 – 11:30 | Break |
11:30 – 12:15 | Session 3: Modelling | Logistic regression, prediction, evaluation
12:15 – 13:00 | Session 4: Regulation | GDPR, Article 22, enforcement, ethics
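
As a preview of the RFM scoring in Session 2, here is a minimal base-R sketch on made-up transactions. The column names (CustomerID, InvoiceNo, InvoiceDate, Revenue), the snapshot date, and the 1 to 3 scoring scale are assumptions for illustration; the course's own scoring may differ (quintile scores from 1 to 5 are common with real data).

```r
# Illustrative transactions (made-up values), one row per invoice.
tx <- data.frame(
  CustomerID  = c(1, 1, 1, 2, 2, 3),
  InvoiceNo   = c("A1", "A2", "A3", "B1", "B2", "C1"),
  InvoiceDate = as.Date(c("2011-10-01", "2011-11-01", "2011-12-05",
                          "2011-03-10", "2011-06-15", "2010-12-20")),
  Revenue     = c(20, 15, 40, 10, 12, 5)
)

# Reference date: the day after the last transaction in the data.
snapshot <- max(tx$InvoiceDate) + 1

# One row per customer: days since last purchase, distinct invoices, total spend.
by_customer <- split(tx, tx$CustomerID)
rfm <- data.frame(
  CustomerID = as.numeric(names(by_customer)),
  Recency    = sapply(by_customer, function(d) as.numeric(snapshot - max(d$InvoiceDate))),
  Frequency  = sapply(by_customer, function(d) length(unique(d$InvoiceNo))),
  Monetary   = sapply(by_customer, function(d) sum(d$Revenue)),
  row.names  = NULL
)

# Score each dimension from 1 (worst) to 3 (best) by cutting the ranks
# into three equal-width bins; tied values share a rank and a score.
score <- function(x) cut(rank(x), breaks = 3, labels = FALSE)
rfm$R_score <- score(-rfm$Recency)    # fewer days since last purchase is better
rfm$F_score <- score(rfm$Frequency)
rfm$M_score <- score(rfm$Monetary)
rfm$RFM_total <- rfm$R_score + rfm$F_score + rfm$M_score

print(rfm)
```

Customers can then be grouped into segments by thresholding RFM_total or by combining the three scores; the exact segment rules are a design choice rather than part of the framework.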

The Dataset

UK Online Retail Dataset

  • Creator: Daqing Chen, London South Bank University
  • Source: UCI Machine Learning Repository
  • License: CC BY 4.0

The data come from a UK-based online retailer that sells unique all-occasion gifts; many of its customers are wholesalers. Transactions run from December 2010 to December 2011.

We work with a random sample of 20,000 rows hosted on GitHub:

retail <- read.csv("https://raw.githubusercontent.com/gulertan/Consumer-Analytics-with-R/refs/heads/main/online_retail_sample.csv")
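
A first inspection after loading might look like the sketch below. So that it runs without a network connection, it reads a few illustrative rows from an inline string instead of the hosted file. The column names follow the UCI Online Retail layout and are assumed, not confirmed, to match the sample file.

```r
# Stand-in for the hosted sample: illustrative rows in the assumed UCI column layout.
csv_text <- "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850,United Kingdom
536366,22633,HAND WARMER UNION JACK,6,2010-12-01 08:28:00,1.85,17850,United Kingdom
C536379,D,Discount,-1,2010-12-01 09:41:00,27.50,14527,United Kingdom
536414,22139,,56,2010-12-01 11:52:00,0.00,,United Kingdom"

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(toy)    # column types: note that InvoiceNo is read as character
dim(toy)    # number of rows and columns
head(toy)   # first few records

# Count missing values per column; an empty CustomerID is read as NA.
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)
```

The same calls apply unchanged to the retail data frame once it has been loaded from the URL.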

After cleaning, our working dataset contains:

Metric | Value
Transactions | 14,579
Unique customers | 3,017
Unique products | 2,544
Countries | 34
Period | Dec 2010 – Dec 2011
Total revenue | £316,047
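
The exact cleaning rules behind these figures are not listed on this page. A typical sketch for this dataset, shown on made-up rows, drops records without a customer identifier, cancellations, and non-positive quantities or prices. The column names and the choice of filters are assumptions based on the UCI layout, where cancelled invoices carry a "C" prefix.

```r
# Illustrative rows in the assumed UCI column layout (not taken from the course sample).
toy <- data.frame(
  InvoiceNo  = c("536365", "536366", "C536379", "536414", "536420"),
  Quantity   = c(6, 6, -1, 56, 3),
  UnitPrice  = c(2.55, 1.85, 27.50, 0.00, 4.95),
  CustomerID = c(17850, 17850, 14527, NA, 13047),
  stringsAsFactors = FALSE
)

keep <- !is.na(toy$CustomerID) &       # drop rows with no customer identifier
  !startsWith(toy$InvoiceNo, "C") &    # drop cancellations ("C"-prefixed invoices)
  toy$Quantity > 0 &                   # drop returns and adjustments
  toy$UnitPrice > 0                    # drop zero-priced lines

clean <- toy[keep, ]
clean$Revenue <- clean$Quantity * clean$UnitPrice   # line-level revenue

nrow(clean)                          # records that survive cleaning
length(unique(clean$CustomerID))     # unique customers
sum(clean$Revenue)                   # total revenue
```

Applied to the full sample, summaries of this kind are what populate a table like the one above.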

Required Packages

Most code in this course relies on one package:

install.packages("tidyverse")   # data manipulation and visualisation

Session 3 additionally uses tidymodels for the logistic regression:

install.packages("tidymodels")  # modelling (Session 3 only)

Everything else is base R.
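
To show the shape of the Session 3 model without extra dependencies, the sketch below fits a logistic regression with base R's glm() as a stand-in; the course itself uses tidymodels, whose logistic_reg() model with the "glm" engine wraps the same kind of fit. The data are simulated, and the variable names (Recency, Frequency, HighValue) and coefficients are assumptions for illustration only.

```r
set.seed(42)  # reproducible simulated data

# Simulated customer-level table (not course data): lower recency and higher
# frequency make the "high value" label more likely.
n <- 500
customers <- data.frame(
  Recency   = round(runif(n, 1, 365)),
  Frequency = rpois(n, 3) + 1
)
true_logit <- -0.5 - 0.01 * customers$Recency + 0.5 * customers$Frequency
customers$HighValue <- rbinom(n, 1, plogis(true_logit))

# Logistic regression: binomial family with the default logit link.
fit <- glm(HighValue ~ Recency + Frequency, data = customers, family = binomial)
print(summary(fit)$coefficients)

# Odds ratios: multiplicative change in the odds per one-unit increase.
print(exp(coef(fit)))

# Predicted probabilities and in-sample accuracy at a 0.5 cut-off.
customers$prob <- predict(fit, type = "response")
customers$pred <- as.integer(customers$prob > 0.5)
accuracy <- mean(customers$pred == customers$HighValue)
print(accuracy)
```

Accuracy here is measured on the same data used for fitting, so it is optimistic; a held-out test set gives a fairer estimate.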


References

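  • Chen, D., Sain, S.L. and Guo, K. (2012). Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining. Journal of Database Marketing & Customer Strategy Management, 19(3), 197–208. Dataset DOI: 10.24432/C5BW33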
  • Hughes, A.M. (1994). Strategic Database Marketing. Probus Publishing.
  • Binns, R. and Veale, M. (2021). Is that your final decision? Multi-stage profiling, selective effects, and Article 22 of the GDPR. International Data Privacy Law, 11(4), 319–340. Oxford Academic
  • Future of Privacy Forum (2022). Automated Decision-Making: Practical Cases from Courts. FPF Report
  • European Data Protection Board (2023). €1.2 billion fine for Facebook. EDPB