Consumer Analytics with R

European Summer School — Bielefeld, 16 June 2026

Author

Gül Ertan Özgüzer

Published

June 16, 2026

About the Course

This course introduces students to the practice of consumer analytics using R. Working with a real retail transaction dataset, students progress from raw data to actionable customer insights — and reflect on the legal and ethical boundaries that govern this work in Europe.

The course is designed for students with basic familiarity with data and statistics. No prior R experience is required, though it is helpful.

Duration: 3 hours (with one short break)
Format: Live coding — students follow along in their browsers using webR
Dataset: UK Online Retail Dataset (UCI Machine Learning Repository, Chen 2012)

Course Objectives

By the end of this course, students will:

Be able to load, inspect, and clean a real-world transactional dataset in R
Understand and apply the RFM (Recency, Frequency, Monetary) framework for customer segmentation
Be able to design personalised marketing actions based on customer segments
Critically evaluate the legal and ethical implications of consumer profiling under GDPR

Learning Outcomes

#	Outcome	Assessed through
1	Load data from a URL and inspect its structure	Live coding
2	Apply data cleaning steps to remove noise and invalid records	Live coding
3	Produce and interpret summary statistics and visualisations	Discussion
4	Calculate RFM scores and assign customer segments	Live coding
5	Design segment-based marketing actions and interpret findings	Discussion
6	Explain what GDPR requires of a consumer analytics project	Discussion

Course Outline

Time	Session	Topics
9:45 – 10:30	Session 1 — Data	Load, inspect, clean, explore
10:30 – 11:15	Session 2 — RFM	Scoring, segmentation, interpretation
11:15 – 11:30	Break
11:30 – 12:15	Session 3 — Marketing	Personalised actions, product gaps
12:15 – 13:00	Session 4 — Regulation	GDPR, profiling, ethics

The Dataset

UK Online Retail Dataset
Creator: Daqing Chen, London South Bank University
Source: UCI Machine Learning Repository
License: CC BY 4.0

The data come from a UK-based online retailer selling unique occasion gifts, primarily to wholesalers. Transactions run from December 2010 to December 2011.

We work with a random sample of 20,000 rows hosted on GitHub.

Variables

Column	Type	Description
`InvoiceNo`	Text	6-digit transaction ID. Starts with `C` if cancelled
`StockCode`	Text	5-digit product code; some codes carry a letter suffix (e.g. `90200D`)
`Description`	Text	Product name
`Quantity`	Integer	Units purchased per row
`InvoiceDate`	DateTime	Date and time of transaction
`UnitPrice`	Numeric	Price per unit in British pounds (£)
`CustomerID`	Integer	5-digit customer identifier
`Country`	Text	Country of the customer

After cleaning, our working dataset contains:

Metric	Value
Transaction rows	14,579
Unique customers	3,017
Unique products	2,544
Countries	34
Period	Dec 2010 – Dec 2011
Total revenue	£316,047

Why R?

R is a free, open-source programming language designed for statistical computing and data analysis. It is now one of the two dominant languages in data science alongside Python.

Open and free: no licences, no fees, no institutional access restrictions
Reproducible: R code is a complete, readable record of every step. Anyone can re-run your code and get the same result
Community-driven: over 20,000 packages on CRAN (the Comprehensive R Archive Network), contributed by researchers worldwide
Industry and academia: widely used in economics, marketing, public health, and finance

The R ecosystem:

Tool	What it is
R	The core language and engine
RStudio	The most popular editor for writing R code
Quarto	Combines R code, output, and text into polished HTML, PDF, or Word documents
tidyverse	A collection of packages that make data work readable and consistent

In this course we use webR — R running entirely in your browser, with no installation required.

Session 1 — Loading, Cleaning, and Exploring the Data

Time: 9:45 – 10:30

1. Load Packages and Data

We start by loading tidyverse, a collection of R packages with a shared design philosophy. library(tidyverse) attaches the core packages in one call.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Package	What it does
`dplyr`	Filter, group, summarise — the core data manipulation tool
`ggplot2`	Create charts and visualisations
`lubridate`	Work with dates and times

Then we load the dataset. read.csv() is base R — no extra package needed.

retail <- read.csv("https://raw.githubusercontent.com/gulertan/Consumer-Analytics-with-R/refs/heads/main/online_retail_sample.csv")
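To see how read.csv() treats empty fields without depending on the network, here is a minimal sketch that parses a few made-up rows through the text argument. The values are invented for illustration; only the column names follow the course file.

```r
# Three made-up rows in the same layout as the course file (values are invented).
csv_text <- "InvoiceNo,StockCode,Quantity,UnitPrice,CustomerID
536365,85123A,6,2.55,17850
C536379,D,-1,27.50,14527
536414,22139,56,0.00,"

# text = lets read.csv() parse a string instead of a file or URL.
toy <- read.csv(text = csv_text)

str(toy)                    # column types chosen by read.csv()
sum(is.na(toy$CustomerID))  # the empty CustomerID field is read as NA
```

The same rules apply when the file is read from a URL: an empty numeric field becomes NA, and a column that mixes digits and letters is read as text.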

2. Inspect

Always inspect the data before changing it. glimpse() shows column names, types, and a preview of values. summary() gives the range of each numeric column and its count of missing values.

glimpse(retail)

Rows: 20,000
Columns: 8
$ InvoiceNo   <chr> "546306", "557055", "572516", "556484", "579187", "559552"…
$ StockCode   <chr> "21733", "23203", "22336", "21670", "90200D", "22068", "20…
$ Description <chr> "RED HANGING HEART T-LIGHT HOLDER", "JUMBO BAG DOILEY PATT…
$ Quantity    <int> 2, 20, -71, 6, 1, 2, 48, 1, 24, 1, 1, 2, 36, 6, 36, 24, 4,…
$ InvoiceDate <chr> "2011-03-10 16:16:00", "2011-06-16 14:45:00", "2011-10-24 …
$ UnitPrice   <dbl> 2.95, 2.08, 0.00, 1.25, 4.15, 1.65, 2.46, 5.79, 0.55, 1.25…
$ CustomerID  <int> NA, 12621, NA, 16938, NA, 16014, NA, 14096, 15382, 14465, …
$ Country     <chr> "United Kingdom", "Germany", "United Kingdom", "United Kin…

summary(retail)

  InvoiceNo          StockCode         Description           Quantity       
 Length:20000       Length:20000       Length:20000       Min.   :-1200.00  
 Class :character   Class :character   Class :character   1st Qu.:    1.00  
 Mode  :character   Mode  :character   Mode  :character   Median :    3.00  
                                                          Mean   :    9.65  
                                                          3rd Qu.:   10.00  
                                                          Max.   : 4800.00  
                                                                            
 InvoiceDate          UnitPrice          CustomerID      Country         
 Length:20000       Min.   :   0.000   Min.   :12347   Length:20000      
 Class :character   1st Qu.:   1.250   1st Qu.:13905   Class :character  
 Mode  :character   Median :   2.080   Median :15159   Mode  :character  
                    Mean   :   4.073   Mean   :15288                     
                    3rd Qu.:   4.130   3rd Qu.:16791                     
                    Max.   :2653.950   Max.   :18287                     
                                       NA's   :5078

What we see:

20,000 rows, 8 columns
InvoiceDate is read as text — needs to be converted to a date
Some CustomerID values are missing
Quantity can be negative (returns and cancellations), and UnitPrice can be zero
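These checks can also be counted directly instead of read off the summary. A minimal sketch on a made-up data frame (the values are invented; the column names match the dataset):

```r
# Made-up rows that mimic the issues listed above (values are invented).
toy <- data.frame(
  InvoiceNo  = c("536365", "C536379", "536414", "536520"),
  Quantity   = c(6L, -1L, 56L, 2L),
  UnitPrice  = c(2.55, 27.50, 0.00, 4.25),
  CustomerID = c(17850L, 14527L, NA, 15100L)
)

na_counts   <- colSums(is.na(toy))              # missing values per column
n_negative  <- sum(toy$Quantity < 0)            # rows with negative quantity
n_zero      <- sum(toy$UnitPrice == 0)          # rows with a zero price
n_cancelled <- sum(grepl("^C", toy$InvoiceNo))  # invoices flagged as cancelled

na_counts
c(negative = n_negative, zero_price = n_zero, cancelled = n_cancelled)
```

Running the same four lines on retail gives the size of each problem before any rows are removed.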

The Pipe Operator `|>`

We chain steps with the pipe operator |>, which reads as “and then”. It is part of base R from version 4.1 onwards, so it works with any function, not only tidyverse ones.

# retail has no Revenue column yet, so we compute Quantity * UnitPrice inline

# Without pipe: nested, hard to read
arrange(summarise(group_by(retail, Country), Revenue = sum(Quantity * UnitPrice)), desc(Revenue))

# With pipe: reads left to right, step by step
retail |>
  group_by(Country) |>
  summarise(Revenue = sum(Quantity * UnitPrice)) |>
  arrange(desc(Revenue))

Both produce identical output. The pipe makes the logic transparent — you read each step in the order it happens.
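The equivalence can be checked on a small base R example with no packages at all. This is a sketch with an arbitrary vector chosen for illustration:

```r
x <- c(4, 9, 16)

nested <- round(mean(sqrt(x)), 1)             # read from the inside out
piped  <- x |> sqrt() |> mean() |> round(1)   # read left to right

identical(nested, piped)
```

The pipe passes the left-hand side as the first argument of the next call, so x |> round(1) is the same as round(x, 1).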

3. Clean

Real data is never clean. We remove cancelled orders, missing customers, and invalid values. We also create a Revenue column (Quantity × UnitPrice).

retail_clean <- retail |>
  filter(!grepl("^C", InvoiceNo)) |>         # Remove cancellations (InvoiceNo starts with C)
  filter(!is.na(CustomerID)) |>               # Remove missing customers
  filter(Quantity > 0, UnitPrice > 0) |>      # Remove invalid rows
  mutate(
    InvoiceDate = as.Date(InvoiceDate, format = "%Y-%m-%d"),  # Parse date
    Revenue     = Quantity * UnitPrice                         # Row-level revenue
  )

# How many rows were removed?
nrow(retail) - nrow(retail_clean)

[1] 5421

Result: 5,421 rows removed — 27% of the sample. This is typical for retail transaction data.
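To see which rule removes how many rows, the filters can be applied one at a time. A minimal sketch on made-up data, where each invented row trips a different rule:

```r
library(dplyr)

# Made-up rows (values are invented); the object names step1..step3 are illustrative.
toy <- tibble(
  InvoiceNo  = c("536365", "C536379", "536414", "536520", "536600"),
  Quantity   = c(6, -1, 56, 2, 3),
  UnitPrice  = c(2.55, 27.50, 1.25, 0.00, 4.25),
  CustomerID = c(17850, 14527, NA, 15100, 16000)
)

step1 <- toy   |> filter(!grepl("^C", InvoiceNo))      # drop cancellations
step2 <- step1 |> filter(!is.na(CustomerID))           # drop missing customers
step3 <- step2 |> filter(Quantity > 0, UnitPrice > 0)  # drop invalid values

# Rows removed at each step
removed <- c(
  cancelled   = nrow(toy)   - nrow(step1),
  no_customer = nrow(step1) - nrow(step2),
  invalid     = nrow(step2) - nrow(step3)
)
removed
```

The order matters for the breakdown but not for the final result: a row that fails several rules is counted under the first filter that catches it.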

4. Explore

range(retail_clean$InvoiceDate, na.rm = TRUE)  # Time period

[1] "2010-12-01" "2011-12-09"

n_distinct(retail_clean$CustomerID)             # Unique customers

[1] 3017

n_distinct(retail_clean$Description)            # Unique products

[1] 2544

n_distinct(retail_clean$Country)                # Countries

[1] 34

sum(retail_clean$Revenue)                       # Total revenue

[1] 316047

Revenue by Country

The UK accounts for 82% of total revenue. Despite selling to 34 countries, this is essentially a domestic business with modest international reach.

retail_clean |>
  group_by(Country) |>
  summarise(Revenue = sum(Revenue)) |>
  arrange(desc(Revenue)) |>
  slice(1:10) |>
  ggplot(aes(x = reorder(Country, Revenue), y = Revenue)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Top 10 Countries by Revenue", x = NULL, y = "Revenue (£)") +
  theme_minimal()
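A revenue share such as the UK figure can be computed by dividing each country's total by the overall total. A minimal sketch on made-up numbers (invented for illustration, so the shares here do not match the course data):

```r
library(dplyr)

# Made-up revenue rows (values are invented for illustration).
toy <- tibble(
  Country = c("United Kingdom", "United Kingdom", "Germany", "France"),
  Revenue = c(500, 300, 120, 80)
)

shares <- toy |>
  group_by(Country) |>
  summarise(Revenue = sum(Revenue)) |>
  mutate(Share_pct = round(100 * Revenue / sum(Revenue), 1)) |>  # share of total
  arrange(desc(Revenue))

shares
```

After summarise() the table has one row per country and is no longer grouped, so sum(Revenue) inside mutate() is the total across all countries.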

Volume vs. Revenue

There is an important distinction between products that sell in high volume and products that generate high revenue. They are not always the same.

knitr::kable() formats a data frame as a clean, readable table in the output document. We use it throughout this course whenever we want to display a summary table neatly.

# By quantity sold
retail_clean |>
  group_by(Description) |>
  summarise(Total_Quantity = sum(Quantity)) |>
  arrange(desc(Total_Quantity)) |>
  slice(1:10) |>
  knitr::kable()

Description	Total_Quantity
WORLD WAR 2 GLIDERS ASSTD DESIGNS	6058
JUMBO BAG RED RETROSPOT	1760
60 CAKE CASES VINTAGE CHRISTMAS	1636
RABBIT NIGHT LIGHT	1451
METAL SIGN TAKE IT OR LEAVE IT	1450
PACK OF 72 SKULL CAKE CASES	1091
MINI PAINT SET VINTAGE	1080
PACK OF 72 RETROSPOT CAKE CASES	1034
ASSORTED COLOUR BIRD ORNAMENT	1004
ASSORTED FLOWER COLOUR “LEIS”	990

# By revenue
retail_clean |>
  group_by(Description) |>
  summarise(Total_Revenue = sum(Revenue)) |>
  arrange(desc(Total_Revenue)) |>
  slice(1:10) |>
  knitr::kable()

Description	Total_Revenue
REGENCY CAKESTAND 3 TIER	5788.56
METAL SIGN TAKE IT OR LEAVE IT	3996.70
VINTAGE UNION JACK MEMOBOARD	3543.87
POSTAGE	3424.00
LANDMARK FRAME OXFORD STREET	3317.85
JUMBO BAG RED RETROSPOT	3210.64
PARTY BUNTING	2922.42
RABBIT NIGHT LIGHT	2628.32
Manual	2551.14
BLACK RECORD COVER FRAME	2400.69

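Key observation: The top-selling product by volume (World War 2 Gliders, 6,058 units) does not appear in the top revenue list. The top revenue product (Regency Cakestand 3 Tier, £5,789) does not appear in the top volume list, so it sold fewer than 1,000 units. Revenue depends on unit price as well as volume. The data contain no cost information, so this comparison says nothing about margin. POSTAGE and Manual in the revenue table appear to be charges and adjustments, not products.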
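Three products appear in both tables, which makes it possible to back out their average revenue per unit. The sketch below copies those totals from the two tables; the column name Avg_Price is introduced here for illustration.

```r
# Totals copied from the two tables above for the products that appear in both.
both <- data.frame(
  Description    = c("METAL SIGN TAKE IT OR LEAVE IT",
                     "JUMBO BAG RED RETROSPOT",
                     "RABBIT NIGHT LIGHT"),
  Total_Quantity = c(1450, 1760, 1451),
  Total_Revenue  = c(3996.70, 3210.64, 2628.32)
)

# Average revenue per unit = total revenue / total units
both$Avg_Price <- round(both$Total_Revenue / both$Total_Quantity, 2)
both
```

The metal sign earns roughly £2.76 per unit against about £1.82 for the jumbo bag, which is why it ranks higher on revenue despite selling fewer units.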

Monthly Revenue Trend

retail_clean |>
  mutate(Month = floor_date(InvoiceDate, "month")) |>
  group_by(Month) |>
  summarise(Revenue = sum(Revenue)) |>
  ggplot(aes(x = Month, y = Revenue)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_point(color = "steelblue") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Monthly Revenue Trend", x = NULL, y = "Revenue (£)") +
  theme_minimal()

What the chart shows: Revenue is flat through early 2011, rises sharply from September, and peaks in November (£40,000+). The December drop is a data artefact — the dataset ends on 9 December. The seasonal pattern is consistent with a gift retailer supplying wholesalers ahead of Christmas.
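One way to check whether a low final month is an artefact is to compare revenue per day covered instead of monthly totals. A minimal sketch on made-up data (a constant £100 per day, invented for illustration), using format() to label months so that only dplyr is needed:

```r
library(dplyr)

# Made-up daily revenue: a full November and the first nine days of December.
toy <- tibble(
  InvoiceDate = c(seq(as.Date("2011-11-01"), as.Date("2011-11-30"), by = "day"),
                  seq(as.Date("2011-12-01"), as.Date("2011-12-09"), by = "day")),
  Revenue     = 100
)

monthly <- toy |>
  mutate(Month = format(InvoiceDate, "%Y-%m")) |>
  group_by(Month) |>
  summarise(
    Total_Revenue = sum(Revenue),
    Days_Covered  = n_distinct(InvoiceDate),      # days with at least one sale
    Per_Day       = Total_Revenue / Days_Covered  # comparable across partial months
  )

monthly
```

In this toy example the December total is less than a third of November's, yet revenue per day is identical, which is the pattern to look for when the last month is incomplete.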

Discussion: Why does revenue stay low in January–February despite Valentine’s Day? What does this tell us about this retailer’s customer base?

Exercise

Exercise 1 — Country deep-dive

Pick a country other than the UK — for example Germany or France. Filter the data to that country only and find the top 5 products by revenue.

retail_clean |>
  filter(Country == "Germany") |>
  group_by(Description) |>
  summarise(Total_Revenue = sum(Revenue)) |>
  arrange(desc(Total_Revenue)) |>
  slice(1:5) |>
  knitr::kable()

Description	Total_Revenue
POSTAGE	1206.0
EDWARDIAN PARASOL BLACK	475.5
ADVENT CALENDAR GINGHAM SACK	237.6
PINK PARTY BAGS	169.0
ROTATING SILVER ANGELS T-LIGHT HLDR	153.0

Exercise 2 — Highest revenue transaction

Which single transaction row generated the highest revenue?

retail_clean |>
  arrange(desc(Revenue)) |>
  slice(1)

  InvoiceNo StockCode                     Description Quantity InvoiceDate
1    581115     22413 METAL SIGN TAKE IT OR LEAVE IT      1404  2011-12-07
  UnitPrice CustomerID        Country Revenue
1      2.75      15195 United Kingdom    3861

Session 2 — RFM Segmentation

Time: 10:30 – 11:15

What is RFM?

RFM scores each customer on three dimensions derived entirely from their transaction history:

Dimension	Question	Direction
Recency (R)	How recently did this customer buy?	Fewer days = better
Frequency (F)	How many distinct purchases?	Higher = better
Monetary (M)	How much have they spent in total?	Higher = better

RFM requires no demographic data — only a customer ID, a date, and a value. It was developed in direct mail marketing in the 1980s and remains one of the most widely used segmentation frameworks in retail and e-commerce.

Step 1 — Calculate R, F, M

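# Reference date: one day after the last transaction, so Recency is at least 1
ref_date <- max(retail_clean$InvoiceDate) + 1

rfm <- retail_clean |>
  group_by(CustomerID) |>
  summarise(
    Recency   = as.numeric(ref_date - max(InvoiceDate)),  # days since last purchase
    Frequency = n_distinct(InvoiceNo),                    # distinct invoices, not rows
    Monetary  = sum(Revenue)                              # total spend
  )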

summary(rfm)

   CustomerID       Recency      Frequency         Monetary       
 Min.   :12347   Min.   :  1   Min.   : 1.000   Min.   :    0.39  
 1st Qu.:13792   1st Qu.: 24   1st Qu.: 1.000   1st Qu.:   17.00  
 Median :15289   Median : 64   Median : 2.000   Median :   39.00  
 Mean   :15290   Mean   :104   Mean   : 2.699   Mean   :  104.76  
 3rd Qu.:16779   3rd Qu.:163   3rd Qu.: 3.000   3rd Qu.:   90.55  
 Max.   :18287   Max.   :374   Max.   :97.000   Max.   :14678.87

What the distribution tells us:

Metric	Median	Mean	Max
Recency (days)	64	104	374
Frequency (orders)	2	2.7	97
Monetary (£)	39	105	14,679

All three distributions are right-skewed: the mean is pulled well above the median by a small number of high-value, high-frequency customers, most likely wholesalers. Half of all customers made 2 or fewer purchases and spent £39 or less in the entire year.
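The three measures are easiest to verify by hand on a tiny example. The sketch below uses made-up purchases for two customers (all values invented) and the same summarise() logic as Step 1:

```r
library(dplyr)

# Made-up purchases for two customers (values are invented).
toy <- tibble(
  CustomerID  = c(1, 1, 1, 2),
  InvoiceNo   = c("A1", "A1", "A2", "B1"),   # invoice A1 has two line items
  InvoiceDate = as.Date(c("2011-11-01", "2011-11-01", "2011-12-08", "2011-06-10")),
  Revenue     = c(10, 5, 20, 50)
)

ref_date <- max(toy$InvoiceDate) + 1   # 2011-12-09

toy_rfm <- toy |>
  group_by(CustomerID) |>
  summarise(
    Recency   = as.numeric(ref_date - max(InvoiceDate)),  # days since last purchase
    Frequency = n_distinct(InvoiceNo),                    # invoices, not rows
    Monetary  = sum(Revenue)
  )

toy_rfm
```

Customer 1 has three rows but only two invoices, so Frequency is 2. Customer 2 spent more in total but last bought 182 days before the reference date.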

Step 2 — Score 1–5 and Assign Segments

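We use quintile scoring: customers are divided into five equal-sized groups for each dimension, scored 1 (worst) to 5 (best). ntile() splits ties by row order, so customers with the same value can fall into neighbouring groups.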

rfm <- rfm |>
  mutate(
    R_score = ntile(-Recency,  5),   # lower recency days = better = higher score
    F_score = ntile(Frequency, 5),
    M_score = ntile(Monetary,  5),
    RFM     = R_score * 100 + F_score * 10 + M_score
  ) |>
  mutate(Segment = case_when(
    R_score >= 4 & F_score >= 4  ~ "Champions",
    R_score >= 3 & F_score >= 3  ~ "Loyal Customers",
    R_score >= 4 & F_score <= 2  ~ "Recent Customers",
    R_score <= 2 & F_score >= 3  ~ "At Risk",
    R_score <= 2 & F_score <= 2  ~ "Lost",
    TRUE                          ~ "Potential Loyalists"
  ))
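Because many customers share the same Frequency, it is worth seeing how ntile() handles ties. A minimal sketch with ten made-up customers, six of whom placed exactly one order:

```r
library(dplyr)

# Ten made-up customers (values are invented); six have Frequency = 1.
toy <- tibble(
  CustomerID = 1:10,
  Frequency  = c(1, 1, 1, 1, 1, 1, 2, 3, 5, 9)
) |>
  mutate(F_score = ntile(Frequency, 5))   # five groups of two customers each

# How the tied customers are spread across score groups
tie_spread <- toy |> count(Frequency, F_score)
tie_spread
```

The six one-order customers end up with scores 1, 2, and 3 even though their behaviour is identical. If that matters for a given analysis, one alternative is to score on fixed cut points instead of equal-sized groups; that is a design choice, not something the course code does.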

Step 3 — Summarise and Visualise

rfm |>
  group_by(Segment) |>
  summarise(
    Customers     = n(),
    Avg_Recency   = round(mean(Recency)),
    Avg_Frequency = round(mean(Frequency), 1),
    Avg_Monetary  = round(mean(Monetary), 1),
    Total_Revenue = round(sum(Monetary))
  ) |>
  arrange(desc(Total_Revenue)) |>
  knitr::kable()

Segment	Customers	Avg_Recency	Avg_Frequency	Avg_Monetary	Total_Revenue
Champions	712	18	5.9	248.8	177153
Loyal Customers	581	48	2.6	84.2	48935
At Risk	516	182	2.3	91.1	47008
Lost	692	227	1.0	35.7	24704
Recent Customers	268	23	1.0	38.8	10403
Potential Loyalists	248	64	1.0	31.6	7844

rfm |>
  count(Segment) |>
  ggplot(aes(x = reorder(Segment, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Customer Segments", x = NULL, y = "Number of Customers") +
  theme_minimal()

Interpreting the Segments

Segment	Customers	Total Revenue	Share
Champions	712	£177,153	56%
Loyal Customers	581	£48,935	15%
At Risk	516	£47,008	15%
Lost	692	£24,704	8%
Recent Customers	268	£10,403	3%
Potential Loyalists	248	£7,844	2%
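The Share column can be reproduced from the segment totals, and the same arithmetic gives each segment's share of customers. A sketch using the figures copied from the summary table; the column names Revenue_Share and Customer_Share are introduced here for illustration.

```r
# Segment totals copied from the summary table above.
seg <- data.frame(
  Segment       = c("Champions", "Loyal Customers", "At Risk",
                    "Lost", "Recent Customers", "Potential Loyalists"),
  Customers     = c(712, 581, 516, 692, 268, 248),
  Total_Revenue = c(177153, 48935, 47008, 24704, 10403, 7844)
)

# Percentage of total revenue and of total customers, rounded to whole numbers
seg$Revenue_Share  <- round(100 * seg$Total_Revenue / sum(seg$Total_Revenue))
seg$Customer_Share <- round(100 * seg$Customers / sum(seg$Customers))
seg
```

Champions make up about 24% of customers but 56% of revenue, which is the concentration the interpretation below builds on.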

Champions (712 customers, £177,153) Bought on average 18 days ago, placed nearly 6 orders, spent £249 on average. These are most likely the wholesalers, the backbone of the business. In revenue terms, one Champion is worth about six Recent Customers (£249 versus £39 average spend), so losing one is expensive to offset through acquisition.

Loyal Customers (581 customers, £48,935) Bought 48 days ago on average, 2.6 orders, £84 average spend. Solid, recurring customers. The opportunity here is to move them up: a loyalty programme or personalised upsell could convert some into Champions.

At Risk (516 customers, £47,008) This is the most strategically important segment. Their average monetary value (£91) is actually higher than Loyal Customers (£84), yet they have not purchased in an average of 182 days. They were valuable customers who have gone quiet. Without intervention they will slide into Lost. A targeted win-back campaign — a personalised offer, a “we miss you” email — is the highest-priority action.

Lost (692 customers, £24,704) The second-largest segment by customer count, after Champions. They last bought 227 days ago on average, placed only 1 order, and spent just £36. These customers were probably never deeply engaged with the retailer. At this recency distance, the cost of re-acquisition typically exceeds the expected return. Low priority.

Recent Customers (268 customers, £10,403) Bought recently (23 days ago) but only once. These are new customers in their first purchase window — the most critical period for habit formation. A follow-up offer or welcome sequence within the next 30 days can convert them into Loyal Customers. If ignored, most will become Lost.

Potential Loyalists (248 customers, £7,844) One purchase, 64 days ago, £32 average spend. Light engagement is appropriate; heavy investment is not yet justified.

Strategic summary: Protect Champions at all costs. Rescue At Risk customers urgently. Nurture Recent Customers within 30 days. Do not spend on Lost.

Exercise

Exercise 3 — How many customers are in each segment?

Exercise 4 — What does the average Champion look like?

Filter the rfm table to Champions and report their average Recency, Frequency, and Monetary.
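One possible approach to Exercises 3 and 4, sketched on a made-up stand-in for the rfm table so that it runs on its own. The values are invented; with the real rfm object only the two pipelines are needed.

```r
library(dplyr)

# A made-up stand-in for the rfm table (values are invented).
toy_rfm <- tibble(
  CustomerID = 1:6,
  Recency    = c(5, 12, 200, 30, 250, 8),
  Frequency  = c(8, 6, 1, 3, 1, 10),
  Monetary   = c(400, 250, 20, 90, 15, 550),
  Segment    = c("Champions", "Champions", "Lost",
                 "Loyal Customers", "Lost", "Champions")
)

# Exercise 3: customers per segment, largest first
seg_counts <- toy_rfm |> count(Segment, sort = TRUE)
seg_counts

# Exercise 4: the average Champion
champ <- toy_rfm |>
  filter(Segment == "Champions") |>
  summarise(
    Avg_Recency   = mean(Recency),
    Avg_Frequency = mean(Frequency),
    Avg_Monetary  = mean(Monetary)
  )
champ
```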

Session 3 — Personalised Marketing

Time: 11:30 – 12:15

From Segments to Actions

RFM tells us who the customers are. The next question is what to offer them. A natural starting point: recommend to Loyal Customers the products that Champions already buy — to nudge them up a tier.

# Attach segment labels to transaction data
retail_segmented <- retail_clean |>
  left_join(rfm |> select(CustomerID, Segment), by = "CustomerID")

# Top 15 products bought by Champions
champion_products <- retail_segmented |>
  filter(Segment == "Champions") |>
  group_by(Description) |>
  summarise(Times_Bought = n()) |>
  arrange(desc(Times_Bought)) |>
  slice(1:15)

champion_products |> knitr::kable()

Description	Times_Bought
PARTY BUNTING	37
JUMBO BAG RED RETROSPOT	36
HEART OF WICKER SMALL	32
LUNCH BAG RED RETROSPOT	28
WHITE HANGING HEART T-LIGHT HOLDER	28
ASSORTED COLOUR BIRD ORNAMENT	27
SPACEBOY LUNCH BOX	27
LUNCH BAG BLACK SKULL.	26
LUNCH BAG CARS BLUE	26
LUNCH BAG SPACEBOY DESIGN	25
PAPER CHAIN KIT 50’S CHRISTMAS	25
REGENCY CAKESTAND 3 TIER	25
PACK OF 72 RETROSPOT CAKE CASES	24
JUMBO STORAGE BAG SUKI	23
LUNCH BAG PINK POLKADOT	23

# Products bought by Loyal Customers
loyal_products <- retail_segmented |>
  filter(Segment == "Loyal Customers") |>
  distinct(Description)

# Gap: what Champions love that Loyal Customers have not bought
champion_products |>
  filter(!Description %in% loyal_products$Description) |>
  knitr::kable()

Description	Times_Bought
JUMBO STORAGE BAG SUKI	23

The Finding

Among the 15 products Champions buy most often, only one (Jumbo Storage Bag Suki) has never been bought by a Loyal Customer. For their most popular products, the two segments buy almost the same things.

What this means: The difference between these two segments is not what they buy but how often and how much. Product recommendations are the wrong intervention.

The right strategy: Increase purchase frequency. Bring Loyal Customers back sooner with:

A time-limited offer tied to their purchase gap (“It has been 60 days since your last order”)
A loyalty reward triggered after the next purchase
A seasonal prompt ahead of the Christmas peak
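The first of these offers amounts to a filter on Recency. A minimal sketch on a made-up rfm table, where gap_days is a hypothetical threshold taken from the example message and not a value derived from the data:

```r
library(dplyr)

# A made-up stand-in for the rfm table (values are invented).
toy_rfm <- tibble(
  CustomerID = 1:5,
  Recency    = c(12, 61, 75, 200, 59),
  Segment    = c("Loyal Customers", "Loyal Customers", "Loyal Customers",
                 "Lost", "Loyal Customers")
)

gap_days <- 60   # hypothetical threshold, matching the "60 days" message

# Loyal Customers whose last purchase is at least gap_days old
to_contact <- toy_rfm |>
  filter(Segment == "Loyal Customers", Recency >= gap_days)
to_contact
```

In practice the threshold could be set per customer from their typical time between orders, which needs the individual invoice dates and not just the RFM summary.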

This is more targeted, more actionable — and, as we will see, more legally sensitive.

Exercise

Exercise 5 — How much revenue is at stake if At Risk customers are lost?

Exercise 6 — Which segment generates the most revenue?

Sort all segments by total Monetary value.
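One possible approach to Exercises 5 and 6, again sketched on a made-up stand-in for the rfm table (values invented) so that it runs on its own:

```r
library(dplyr)

# A made-up stand-in for the rfm table (values are invented).
toy_rfm <- tibble(
  CustomerID = 1:6,
  Monetary   = c(400, 250, 120, 90, 15, 80),
  Segment    = c("Champions", "Champions", "At Risk",
                 "Loyal Customers", "Lost", "At Risk")
)

# Exercise 5: revenue tied to At Risk customers
at_risk_revenue <- toy_rfm |>
  filter(Segment == "At Risk") |>
  summarise(Total = sum(Monetary)) |>
  pull(Total)
at_risk_revenue

# Exercise 6: segments sorted by total Monetary value
by_segment <- toy_rfm |>
  group_by(Segment) |>
  summarise(Total_Monetary = sum(Monetary)) |>
  arrange(desc(Total_Monetary))
by_segment
```

Note that the result for Exercise 5 is historical spend, which is at best an upper-level proxy for what would be lost in future.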

Session 4 — Data Regulation and Ethics

Time: 12:15 – 13:00

What did we just build?

Look at what we coded today through a GDPR lens:

What we did	Marketing term	GDPR term
Computed RFM scores per CustomerID	Customer scoring	Profiling (Article 4.4)
Assigned segment labels automatically	Segmentation	Systematic profiling at scale
Identified days since each customer’s last purchase	Churn detection	Individual behavioural tracking
Designed a trigger email based on purchase timing	Lifecycle marketing	Automated direct marketing based on profiling
Recommended products to specific segments	Personalised marketing	Targeted direct marketing using profiled data

Every row in this table is profiling under GDPR. Every row requires a lawful basis — and the company must be able to prove it.

Two articles that directly apply

Article 4.4 — Profiling Any automated use of personal data to evaluate or predict a person’s behaviour is profiling. An RFM score attached to a CustomerID is profiling — even if no name or email address is visible. The score is derived from personal data and linked to an identifiable individual.

Article 21 — Right to object to direct marketing Every individual has an absolute right to object to their data being used for direct marketing. No exceptions. No override. If a customer clicks unsubscribe, all profiling for marketing purposes must stop — immediately and permanently.

Our trigger email “It has been 60 days since your last order” sends because we tracked that individual’s purchase timing and used it to target them. That is direct marketing based on profiling. Article 21 applies.
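In code, respecting an objection means removing the customer before any marketing scores are computed, not only before the email is sent. The sketch below assumes a hypothetical suppression list named objected; how such a list is collected and stored is outside what this course covers, and all values are made up.

```r
library(dplyr)

# Made-up transactions (values are invented).
transactions <- tibble(
  CustomerID = c(101, 101, 102, 103, 104),
  Revenue    = c(20, 35, 50, 15, 80)
)

# Hypothetical suppression list: customers who objected to direct marketing
objected <- c(102, 104)

# Drop objectors first, so no marketing profile is built for them
eligible <- transactions |>
  filter(!CustomerID %in% objected)

marketing_input <- eligible |>
  group_by(CustomerID) |>
  summarise(Monetary = sum(Revenue))
marketing_input
```

Filtering at the start of the pipeline means every later step, from scoring to segment labels to trigger emails, inherits the exclusion automatically.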

The Target case

In 2012, Target predicted a teenage customer was pregnant from her purchase patterns — before she told her family. She received baby coupons at home. Target faced no legal consequence in the US.

Under GDPR, pregnancy is health data — special category data requiring explicit consent. Inferring it from purchase behaviour without consent is a serious violation regardless of whether the inference is correct.

Same analytics. Entirely different legal outcome depending on which side of the Atlantic you operate.

Real fines

Company	Fine	Reason	Year
Meta	€1.2 billion	Transferring EU user data to US servers	2023
Amazon	€746 million	Targeted advertising without consent	2021
LinkedIn	€310 million	Unlawful behavioural advertising	2024
Google	€50 million	Consent buried in multi-step menus	2019

Discussion (15 min)

“You are hired as a data analyst at a European e-commerce company. Your manager asks you to build exactly what we built today and deploy it next week. What do you do?”

“Analytics tells you what is possible. GDPR tells you what is permitted. Your job as a data scientist is to know both.”

References

Chen, D. (2012). Data mining for the online retail industry. Journal of Database Marketing & Customer Strategy Management, 19, 197–208. Dataset DOI (UCI Machine Learning Repository): 10.24432/C5BW33
Hughes, A.M. (1994). Strategic Database Marketing. Probus Publishing.
Binns, R. and Veale, M. (2021). Is that your final decision? Multi-stage profiling, selective effects, and Article 22 of the GDPR. International Data Privacy Law, 11(4). Oxford Academic
Future of Privacy Forum (2022). Automated Decision-Making: Practical Cases from Courts. FPF Report
European Data Protection Board (2023). €1.2 billion fine for Facebook. EDPB
