Consumer Analytics with R

European Summer School — Bielefeld, 16 June 2026

Author

Gül Ertan Özgüzer

Published

June 16, 2026

About the Course

This course introduces students to the practice of consumer analytics using R. Working with a real retail transaction dataset, students progress from raw data to actionable customer insights — and reflect on the legal and ethical boundaries that govern this work in Europe.

The course is designed for students with basic familiarity with data and statistics. No prior R experience is required, though it is helpful.

  • Duration: 3 hours (with one short break)
  • Format: Live coding — students follow along in their browsers using webR
  • Dataset: UK Online Retail Dataset (UCI Machine Learning Repository, Chen 2012)

Course Objectives

By the end of this course, students will:

  1. Be able to load, inspect, and clean a real-world transactional dataset in R
  2. Understand and apply the RFM (Recency, Frequency, Monetary) framework for customer segmentation
  3. Be able to design personalised marketing actions based on customer segments
  4. Critically evaluate the legal and ethical implications of consumer profiling under GDPR

Learning Outcomes

# | Outcome | Assessed through
1 | Load data from a URL and inspect its structure | Live coding
2 | Apply data cleaning steps to remove noise and invalid records | Live coding
3 | Produce and interpret summary statistics and visualisations | Discussion
4 | Calculate RFM scores and assign customer segments | Live coding
5 | Design segment-based marketing actions and interpret findings | Discussion
6 | Explain what GDPR requires of a consumer analytics project | Discussion

Course Outline

Time | Session | Topics
9:45 – 10:30 | Session 1: Data | Load, inspect, clean, explore
10:30 – 11:15 | Session 2: RFM | Scoring, segmentation, interpretation
11:15 – 11:30 | Break |
11:30 – 12:15 | Session 3: Marketing | Personalised actions, product gaps
12:15 – 13:00 | Session 4: Regulation | GDPR, profiling, ethics

The Dataset

UK Online Retail Dataset
Creator: Daqing Chen, London South Bank University
Source: UCI Machine Learning Repository
License: CC BY 4.0

The data come from a UK-based online retailer selling unique all-occasion gifts, many of whose customers are wholesalers. Transactions run from 1 December 2010 to 9 December 2011.

We work with a random sample of 20,000 rows hosted on GitHub.

Variables

Column | Type | Description
InvoiceNo | Text | 6-digit transaction ID; starts with C if cancelled
StockCode | Text | 5-digit product code, sometimes with a letter suffix (e.g. 90200D)
Description | Text | Product name
Quantity | Integer | Units purchased per row
InvoiceDate | DateTime | Date and time of transaction (read in as text by read.csv())
UnitPrice | Numeric | Price per unit in British pounds (£)
CustomerID | Integer | 5-digit customer identifier; missing for some rows
Country | Text | Country of the customer

After cleaning, our working dataset contains:

Metric | Value
Rows (invoice lines) | 14,579
Unique customers | 3,017
Unique products | 2,544
Countries | 34
Period | Dec 2010 – Dec 2011
Total revenue | £316,047

Why R?

R is a free, open-source programming language designed for statistical computing and data analysis. It is now one of the two dominant languages in data science alongside Python.

  • Open and free — no licences, no fees, no institutional access restrictions
  • Reproducible — R code is a complete, readable record of every step. Anyone can re-run your code and get the same result
  • Community-driven: over 20,000 packages on CRAN (the Comprehensive R Archive Network), contributed by researchers worldwide
  • Industry and academia — widely used in economics, marketing, public health, and finance

The R ecosystem:

Tool What it is
R The core language and engine
RStudio The most popular editor for writing R code
Quarto Combines R code, output, and text into polished HTML, PDF, or Word documents
tidyverse A collection of packages that make data work readable and consistent

In this course we use webR — R running entirely in your browser, with no installation required.


Session 1 — Loading, Cleaning, and Exploring the Data

Time: 9:45 – 10:30


1. Load Packages and Data

We start by loading tidyverse, a collection of R packages with a shared design philosophy. library(tidyverse) attaches the core packages in one step; the startup message below lists exactly which ones.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Package | What it does
dplyr | Filter, group, summarise: the core data manipulation tool
ggplot2 | Create charts and visualisations
lubridate | Work with dates and times

Then we load the dataset. read.csv() is base R — no extra package needed.

retail <- read.csv("https://raw.githubusercontent.com/gulertan/Consumer-Analytics-with-R/refs/heads/main/online_retail_sample.csv")

2. Inspect

Before touching the data, always look at it first. glimpse() shows column names, types, and a preview of values. summary() gives the range and missing value counts.

glimpse(retail)
Rows: 20,000
Columns: 8
$ InvoiceNo   <chr> "546306", "557055", "572516", "556484", "579187", "559552"…
$ StockCode   <chr> "21733", "23203", "22336", "21670", "90200D", "22068", "20…
$ Description <chr> "RED HANGING HEART T-LIGHT HOLDER", "JUMBO BAG DOILEY PATT…
$ Quantity    <int> 2, 20, -71, 6, 1, 2, 48, 1, 24, 1, 1, 2, 36, 6, 36, 24, 4,…
$ InvoiceDate <chr> "2011-03-10 16:16:00", "2011-06-16 14:45:00", "2011-10-24 …
$ UnitPrice   <dbl> 2.95, 2.08, 0.00, 1.25, 4.15, 1.65, 2.46, 5.79, 0.55, 1.25…
$ CustomerID  <int> NA, 12621, NA, 16938, NA, 16014, NA, 14096, 15382, 14465, …
$ Country     <chr> "United Kingdom", "Germany", "United Kingdom", "United Kin…
summary(retail)
  InvoiceNo          StockCode         Description           Quantity       
 Length:20000       Length:20000       Length:20000       Min.   :-1200.00  
 Class :character   Class :character   Class :character   1st Qu.:    1.00  
 Mode  :character   Mode  :character   Mode  :character   Median :    3.00  
                                                          Mean   :    9.65  
                                                          3rd Qu.:   10.00  
                                                          Max.   : 4800.00  
                                                                            
 InvoiceDate          UnitPrice          CustomerID      Country         
 Length:20000       Min.   :   0.000   Min.   :12347   Length:20000      
 Class :character   1st Qu.:   1.250   1st Qu.:13905   Class :character  
 Mode  :character   Median :   2.080   Median :15159   Mode  :character  
                    Mean   :   4.073   Mean   :15288                     
                    3rd Qu.:   4.130   3rd Qu.:16791                     
                    Max.   :2653.950   Max.   :18287                     
                                       NA's   :5078                      

What we see:

  • 20,000 rows, 8 columns
  • InvoiceDate is read as text — needs to be converted to a date
  • Some CustomerID values are missing
  • Quantity can be negative (returns and cancellations) and UnitPrice can be zero; neither describes a valid sale
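
The observations above can be turned into explicit counts. The following is a minimal sketch in base R on a small made-up data frame (toy and all its values are invented for illustration); replacing toy with retail would run the same checks on the course data.

```r
# Toy rows mimicking the raw file; every value here is invented
toy <- data.frame(
  InvoiceNo  = c("536365", "C536379", "536380", "536381", "536382"),
  Quantity   = c(6, -1, 2, -71, 3),
  UnitPrice  = c(2.55, 4.65, 0, 0, 1.25),
  CustomerID = c(17850, 14527, NA, NA, 16014)
)

# One count per data-quality problem
checks <- c(
  missing_customer  = sum(is.na(toy$CustomerID)),
  cancelled_invoice = sum(grepl("^C", toy$InvoiceNo)),
  nonpositive_qty   = sum(toy$Quantity <= 0),
  zero_price        = sum(toy$UnitPrice == 0)
)
checks
```

Counting each problem separately before cleaning makes it easier to explain later why rows disappeared.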

The Pipe Operator |>

Our code is chained with the pipe operator |>, which reads as “and then”. The pipe is built into R itself (version 4.1 and later), so it works with or without tidyverse.

# Without pipe: nested, hard to read
arrange(summarise(group_by(retail, Country), Revenue = sum(Quantity * UnitPrice)), desc(Revenue))

# With pipe: reads top to bottom, step by step
# (Revenue is computed inline because the raw data has no Revenue column yet)
retail |>
  group_by(Country) |>
  summarise(Revenue = sum(Quantity * UnitPrice)) |>
  arrange(desc(Revenue))

Both produce identical output. The pipe makes the logic transparent — you read each step in the order it happens.


3. Clean

Real data is never clean. We remove cancelled orders, missing customers, and invalid values. We also create a Revenue column (Quantity × UnitPrice).

retail_clean <- retail |>
  filter(!grepl("^C", InvoiceNo)) |>         # Remove cancellations (InvoiceNo starts with C)
  filter(!is.na(CustomerID)) |>               # Remove missing customers
  filter(Quantity > 0, UnitPrice > 0) |>      # Remove invalid rows
  mutate(
    InvoiceDate = as.Date(InvoiceDate, format = "%Y-%m-%d"),  # Parse date
    Revenue     = Quantity * UnitPrice                         # Row-level revenue
  )

# How many rows were removed?
nrow(retail) - nrow(retail_clean)
[1] 5421

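Result: 5,421 rows removed, 27% of the sample. Missing customer IDs are the main cause: 5,078 of the 20,000 raw rows have no CustomerID. Losses of this size are common in raw retail transaction data.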
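
To see which rule removes how many rows, the filters can be applied one at a time. This is a sketch in base R on invented data (toy is hypothetical), applying the rules in the same order as the cleaning code above; it is not part of the course pipeline.

```r
# Toy rows; every value here is invented
toy <- data.frame(
  InvoiceNo  = c("536365", "C536379", "536380", "536381", "536382", "536383"),
  Quantity   = c(6, -1, 2, -71, 3, 4),
  UnitPrice  = c(2.55, 4.65, 0, 0, 1.25, 0),
  CustomerID = c(17850, 14527, NA, NA, 16014, 13047)
)

# Apply the three cleaning rules in sequence
step1 <- toy[!grepl("^C", toy$InvoiceNo), ]                 # drop cancellations
step2 <- step1[!is.na(step1$CustomerID), ]                  # drop missing customers
step3 <- step2[step2$Quantity > 0 & step2$UnitPrice > 0, ]  # drop invalid values

attrition <- data.frame(
  step = c("raw", "no cancellations", "has CustomerID", "valid quantity and price"),
  rows = c(nrow(toy), nrow(step1), nrow(step2), nrow(step3))
)
attrition$removed <- c(0, -diff(attrition$rows))
attrition
```

Because the rules overlap (a cancelled row may also lack a CustomerID), the per-step counts depend on the order in which the filters run.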


4. Explore

range(retail_clean$InvoiceDate, na.rm = TRUE)  # Time period
[1] "2010-12-01" "2011-12-09"
n_distinct(retail_clean$CustomerID)             # Unique customers
[1] 3017
n_distinct(retail_clean$Description)            # Unique products
[1] 2544
n_distinct(retail_clean$Country)                # Countries
[1] 34
sum(retail_clean$Revenue)                       # Total revenue
[1] 316047

Revenue by Country

The UK accounts for 82% of total revenue. Despite selling to 34 countries, this is essentially a domestic business with modest international reach.

retail_clean |>
  group_by(Country) |>
  summarise(Revenue = sum(Revenue)) |>
  arrange(desc(Revenue)) |>
  slice(1:10) |>
  ggplot(aes(x = reorder(Country, Revenue), y = Revenue)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Top 10 Countries by Revenue", x = NULL, y = "Revenue (£)") +
  theme_minimal()
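
A revenue share such as the UK figure quoted above is each country's revenue divided by the total. The sketch below shows the arithmetic in base R on invented numbers (toy is hypothetical, and its shares are not the course results).

```r
# Toy revenue rows; values are invented
toy <- data.frame(
  Country = c("United Kingdom", "United Kingdom", "Germany", "France", "United Kingdom"),
  Revenue = c(50, 30, 10, 5, 5)
)

# Sum revenue per country, then express each sum as a percentage of the total
by_country <- aggregate(Revenue ~ Country, data = toy, FUN = sum)
by_country$Share_pct <- round(100 * by_country$Revenue / sum(by_country$Revenue), 1)
by_country <- by_country[order(-by_country$Revenue), ]
by_country
```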

Volume vs. Revenue

There is an important distinction between products that sell in high volume and products that generate high revenue. They are not always the same.

knitr::kable() formats a data frame as a clean, readable table in the output document. We use it throughout this course whenever we want to display a summary table neatly.

# By quantity sold
retail_clean |>
  group_by(Description) |>
  summarise(Total_Quantity = sum(Quantity)) |>
  arrange(desc(Total_Quantity)) |>
  slice(1:10) |>
  knitr::kable()
Description Total_Quantity
WORLD WAR 2 GLIDERS ASSTD DESIGNS 6058
JUMBO BAG RED RETROSPOT 1760
60 CAKE CASES VINTAGE CHRISTMAS 1636
RABBIT NIGHT LIGHT 1451
METAL SIGN TAKE IT OR LEAVE IT 1450
PACK OF 72 SKULL CAKE CASES 1091
MINI PAINT SET VINTAGE 1080
PACK OF 72 RETROSPOT CAKE CASES 1034
ASSORTED COLOUR BIRD ORNAMENT 1004
ASSORTED FLOWER COLOUR “LEIS” 990
# By revenue
retail_clean |>
  group_by(Description) |>
  summarise(Total_Revenue = sum(Revenue)) |>
  arrange(desc(Total_Revenue)) |>
  slice(1:10) |>
  knitr::kable()
Description Total_Revenue
REGENCY CAKESTAND 3 TIER 5788.56
METAL SIGN TAKE IT OR LEAVE IT 3996.70
VINTAGE UNION JACK MEMOBOARD 3543.87
POSTAGE 3424.00
LANDMARK FRAME OXFORD STREET 3317.85
JUMBO BAG RED RETROSPOT 3210.64
PARTY BUNTING 2922.42
RABBIT NIGHT LIGHT 2628.32
Manual 2551.14
BLACK RECORD COVER FRAME 2400.69

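Key observation: The top-selling product by volume (World War 2 Gliders, 6,058 units) does not appear in the top revenue list, and the top revenue product (Regency Cakestand 3 Tier, £5,789) is absent from the top 10 by volume. The two rankings diverge because unit prices differ. This is a volume vs. price distinction rather than a margin one, since the dataset contains no cost information. Note also that POSTAGE and Manual in the revenue table appear to be charges or manual adjustments rather than gift products.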
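
Three products appear in both tables above, which lets us back out their implied average unit price as revenue divided by quantity. The figures below are copied from the two tables; the data frame name both is only for illustration.

```r
# Products that appear in both the top-10 quantity and top-10 revenue tables
both <- data.frame(
  Description    = c("METAL SIGN TAKE IT OR LEAVE IT",
                     "JUMBO BAG RED RETROSPOT",
                     "RABBIT NIGHT LIGHT"),
  Total_Quantity = c(1450, 1760, 1451),
  Total_Revenue  = c(3996.70, 3210.64, 2628.32)
)

# Implied average price per unit in pounds
both$Avg_Price <- round(both$Total_Revenue / both$Total_Quantity, 2)
both
```

The metal sign earns more than the jumbo bag despite selling fewer units, because its implied average price is higher (about £2.76 against £1.82).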

Monthly Revenue Trend

retail_clean |>
  mutate(Month = floor_date(InvoiceDate, "month")) |>
  group_by(Month) |>
  summarise(Revenue = sum(Revenue)) |>
  ggplot(aes(x = Month, y = Revenue)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_point(color = "steelblue") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Monthly Revenue Trend", x = NULL, y = "Revenue (£)") +
  theme_minimal()

What the chart shows: Revenue is flat through early 2011, rises sharply from September, and peaks in November (£40,000+). The December drop is a data artefact — the dataset ends on 9 December. The seasonal pattern is consistent with a gift retailer supplying wholesalers ahead of Christmas.

Discussion: Why does revenue stay low in January–February despite Valentine’s Day? What does this tell us about this retailer’s customer base?


Exercise

Exercise 1 — Country deep-dive

Pick a country other than the UK — for example Germany or France. Filter the data to that country only and find the top 5 products by revenue.

retail_clean |>
  filter(Country == "Germany") |>
  group_by(Description) |>
  summarise(Total_Revenue = sum(Revenue)) |>
  arrange(desc(Total_Revenue)) |>
  slice(1:5) |>
  knitr::kable()
Description Total_Revenue
POSTAGE 1206.0
EDWARDIAN PARASOL BLACK 475.5
ADVENT CALENDAR GINGHAM SACK 237.6
PINK PARTY BAGS 169.0
ROTATING SILVER ANGELS T-LIGHT HLDR 153.0

Exercise 2 — Highest revenue transaction

Which single transaction row generated the highest revenue?

retail_clean |>
  arrange(desc(Revenue)) |>
  slice(1)
  InvoiceNo StockCode                     Description Quantity InvoiceDate
1    581115     22413 METAL SIGN TAKE IT OR LEAVE IT      1404  2011-12-07
  UnitPrice CustomerID        Country Revenue
1      2.75      15195 United Kingdom    3861

Session 2 — RFM Segmentation

Time: 10:30 – 11:15


What is RFM?

RFM scores each customer on three dimensions derived entirely from their transaction history:

Dimension | Question | Direction
Recency (R) | How recently did this customer buy? | Fewer days = better
Frequency (F) | How many distinct purchases? | Higher = better
Monetary (M) | How much have they spent in total? | Higher = better

RFM requires no demographic data — only a customer ID, a date, and a value. It was developed in direct mail marketing in the 1980s and remains one of the most widely used segmentation frameworks in retail and e-commerce.
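
A hand-checkable example may help before we run this on the real data. The sketch below computes R, F, and M in base R for two invented customers (tx and all its values are hypothetical), using the same definitions as the table above.

```r
# Five invoice lines for two invented customers
tx <- data.frame(
  CustomerID  = c(1, 1, 1, 2, 2),
  InvoiceNo   = c("A1", "A1", "A2", "B1", "B1"),
  InvoiceDate = as.Date(c("2011-11-01", "2011-11-01", "2011-12-05",
                          "2011-09-15", "2011-09-15")),
  Revenue     = c(10, 5, 20, 8, 2)
)

# Reference date: the day after the last purchase in the data
ref_date <- max(tx$InvoiceDate) + 1

# One row per customer: days since last purchase, distinct invoices, total spend
rfm_toy <- do.call(rbind, lapply(split(tx, tx$CustomerID), function(d) {
  data.frame(
    CustomerID = d$CustomerID[1],
    Recency    = as.numeric(ref_date - max(d$InvoiceDate)),
    Frequency  = length(unique(d$InvoiceNo)),
    Monetary   = sum(d$Revenue)
  )
}))
rfm_toy
```

Customer 1 bought the day before the reference date on two separate invoices, so Recency is 1 and Frequency is 2, even though three rows carry that CustomerID.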


Step 1 — Calculate R, F, M

ref_date <- max(retail_clean$InvoiceDate) + 1

rfm <- retail_clean |>
  group_by(CustomerID) |>
  summarise(
    Recency   = as.numeric(ref_date - max(InvoiceDate)),
    Frequency = n_distinct(InvoiceNo),
    Monetary  = sum(Revenue)
  )

summary(rfm)
   CustomerID       Recency      Frequency         Monetary       
 Min.   :12347   Min.   :  1   Min.   : 1.000   Min.   :    0.39  
 1st Qu.:13792   1st Qu.: 24   1st Qu.: 1.000   1st Qu.:   17.00  
 Median :15289   Median : 64   Median : 2.000   Median :   39.00  
 Mean   :15290   Mean   :104   Mean   : 2.699   Mean   :  104.76  
 3rd Qu.:16779   3rd Qu.:163   3rd Qu.: 3.000   3rd Qu.:   90.55  
 Max.   :18287   Max.   :374   Max.   :97.000   Max.   :14678.87  

What the distribution tells us:

Metric Median Mean Max
Recency (days) 64 104 374
Frequency (orders) 2 2.7 97
Monetary (£) 39 105 14,679

All three distributions are right-skewed: the mean is pulled well above the median by a small number of high-value, high-frequency customers, most likely wholesalers. Half of all customers placed two or fewer orders and spent £39 or less over the whole period. Because we work with a 20,000-row sample, these figures understate each customer's true activity.


Step 2 — Score 1–5 and Assign Segments

We use quintile scoring — dividing customers into five equal groups for each dimension, scored 1 (worst) to 5 (best).

rfm <- rfm |>
  mutate(
    R_score = ntile(-Recency,  5),   # lower recency days = better = higher score
    F_score = ntile(Frequency, 5),
    M_score = ntile(Monetary,  5),
    RFM     = R_score * 100 + F_score * 10 + M_score
  ) |>
  mutate(Segment = case_when(
    R_score >= 4 & F_score >= 4  ~ "Champions",
    R_score >= 3 & F_score >= 3  ~ "Loyal Customers",
    R_score >= 4 & F_score <= 2  ~ "Recent Customers",
    R_score <= 2 & F_score >= 3  ~ "At Risk",
    R_score <= 2 & F_score <= 2  ~ "Lost",
    TRUE                          ~ "Potential Loyalists"
  ))
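
One caveat on quintile scoring: ntile() forces equal-sized groups, so customers with identical values can land in different groups, split by row order. In our data at least a quarter of customers have Frequency 1 (the first quartile is 1), so this affects F_score. The sketch below uses invented order counts and base R; the first score reproduces the row-order behaviour for a vector whose length is a multiple of 5, and the second is an alternative that keeps ties together, not the method used in this course.

```r
# Invented order counts: five customers tie at 1 order
freq <- c(1, 1, 1, 1, 1, 2, 2, 3, 5, 9)
n <- length(freq)

# Ties broken by row order, as ntile(freq, 5) would do for this vector
score_row_order <- as.integer(ceiling(rank(freq, ties.method = "first") * 5 / n))

# Alternative (an assumption, not the course's method): tied values share one score
score_tied <- as.integer(ceiling(rank(freq, ties.method = "max") * 5 / n))

data.frame(freq, score_row_order, score_tied)
```

In the first column, the five customers with a single order receive scores 1, 1, 2, 2, and 3 despite identical behaviour. This is worth keeping in mind when interpreting segments that differ only in F_score.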

Step 3 — Summarise and Visualise

rfm |>
  group_by(Segment) |>
  summarise(
    Customers     = n(),
    Avg_Recency   = round(mean(Recency)),
    Avg_Frequency = round(mean(Frequency), 1),
    Avg_Monetary  = round(mean(Monetary), 1),
    Total_Revenue = round(sum(Monetary))
  ) |>
  arrange(desc(Total_Revenue)) |>
  knitr::kable()
Segment Customers Avg_Recency Avg_Frequency Avg_Monetary Total_Revenue
Champions 712 18 5.9 248.8 177153
Loyal Customers 581 48 2.6 84.2 48935
At Risk 516 182 2.3 91.1 47008
Lost 692 227 1.0 35.7 24704
Recent Customers 268 23 1.0 38.8 10403
Potential Loyalists 248 64 1.0 31.6 7844
rfm |>
  count(Segment) |>
  ggplot(aes(x = reorder(Segment, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Customer Segments", x = NULL, y = "Number of Customers") +
  theme_minimal()


Interpreting the Segments

Segment Customers Total Revenue Share
Champions 712 £177,153 56%
Loyal Customers 581 £48,935 15%
At Risk 516 £47,008 15%
Lost 692 £24,704 8%
Recent Customers 268 £10,403 3%
Potential Loyalists 248 £7,844 2%
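
The Share column can be reproduced directly from the segment summary. The sketch below re-enters the figures from the table above in base R and also adds each segment's share of customers, which the table does not show; the column names Revenue_Share and Customer_Share are illustrative.

```r
# Figures copied from the segment summary table
seg <- data.frame(
  Segment       = c("Champions", "Loyal Customers", "At Risk",
                    "Lost", "Recent Customers", "Potential Loyalists"),
  Customers     = c(712, 581, 516, 692, 268, 248),
  Total_Revenue = c(177153, 48935, 47008, 24704, 10403, 7844)
)

# Percentage of total revenue and of total customers, rounded to whole numbers
seg$Revenue_Share  <- round(100 * seg$Total_Revenue / sum(seg$Total_Revenue))
seg$Customer_Share <- round(100 * seg$Customers / sum(seg$Customers))
seg
```

Champions make up about 24% of customers but 56% of revenue, which is the concentration the interpretation below builds on.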

Champions (712 customers, £177,153) Bought on average 18 days ago, placed nearly 6 orders, and spent £249 on average. Many of these are likely wholesalers, the backbone of the business. In revenue terms, one Champion (£249) is worth roughly six Recent Customers (£39 each), so a lost Champion is expensive to replace.

Loyal Customers (581 customers, £48,935) Bought 48 days ago on average, 2.6 orders, £84 average spend. Solid, recurring customers. The opportunity here is to move them up: a loyalty programme or personalised upsell could convert some into Champions.

At Risk (516 customers, £47,008) This is the most strategically important segment. Their average monetary value (£91) is actually higher than Loyal Customers (£84), yet they have not purchased in an average of 182 days. They were valuable customers who have gone quiet. Without intervention they will slide into Lost. A targeted win-back campaign — a personalised offer, a “we miss you” email — is the highest-priority action.

Lost (692 customers, £24,704) The second-largest segment by customer count, after Champions. They last bought 227 days ago on average, placed only 1 order, and spent just £36. These customers were probably never deeply engaged with the retailer. At this recency distance, the cost of re-acquisition typically exceeds the expected return. Low priority.

Recent Customers (268 customers, £10,403) Bought recently (23 days ago) but only once. These are new customers in their first purchase window — the most critical period for habit formation. A follow-up offer or welcome sequence within the next 30 days can convert them into Loyal Customers. If ignored, most will become Lost.

Potential Loyalists (248 customers, £7,844) One purchase, 64 days ago, £32 average spend. Light engagement is appropriate; heavy investment is not yet justified.

Strategic summary: Protect Champions at all costs. Rescue At Risk customers urgently. Nurture Recent Customers within 30 days. Do not spend on Lost.


Exercise

Exercise 3 — How many customers are in each segment?

Exercise 4 — What does the average Champion look like?

Filter the rfm table to Champions and report their average Recency, Frequency, and Monetary.
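
One possible approach to Exercises 3 and 4, sketched in base R on an invented table (rfm_toy and its values are hypothetical). On the course data, the same two steps would be applied to rfm.

```r
# Invented RFM table with segment labels already assigned
rfm_toy <- data.frame(
  CustomerID = 1:6,
  Recency    = c(5, 20, 40, 200, 10, 250),
  Frequency  = c(8, 4, 3, 1, 6, 1),
  Monetary   = c(400, 150, 90, 30, 200, 20),
  Segment    = c("Champions", "Champions", "Loyal Customers",
                 "Lost", "Champions", "Lost")
)

# Exercise 3: number of customers per segment
seg_counts <- table(rfm_toy$Segment)
seg_counts

# Exercise 4: average Recency, Frequency, and Monetary of Champions
champions    <- rfm_toy[rfm_toy$Segment == "Champions", ]
champion_avg <- colMeans(champions[, c("Recency", "Frequency", "Monetary")])
champion_avg
```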


Session 3 — Personalised Marketing

Time: 11:30 – 12:15


From Segments to Actions

RFM tells us who the customers are. The next question is what to offer them. A natural starting point: recommend to Loyal Customers the products that Champions already buy — to nudge them up a tier.

# Attach segment labels to transaction data
retail_segmented <- retail_clean |>
  left_join(rfm |> select(CustomerID, Segment), by = "CustomerID")

# Top 15 products bought by Champions
champion_products <- retail_segmented |>
  filter(Segment == "Champions") |>
  group_by(Description) |>
  summarise(Times_Bought = n()) |>
  arrange(desc(Times_Bought)) |>
  slice(1:15)

champion_products |> knitr::kable()
Description Times_Bought
PARTY BUNTING 37
JUMBO BAG RED RETROSPOT 36
HEART OF WICKER SMALL 32
LUNCH BAG RED RETROSPOT 28
WHITE HANGING HEART T-LIGHT HOLDER 28
ASSORTED COLOUR BIRD ORNAMENT 27
SPACEBOY LUNCH BOX 27
LUNCH BAG BLACK SKULL. 26
LUNCH BAG CARS BLUE 26
LUNCH BAG SPACEBOY DESIGN 25
PAPER CHAIN KIT 50’S CHRISTMAS 25
REGENCY CAKESTAND 3 TIER 25
PACK OF 72 RETROSPOT CAKE CASES 24
JUMBO STORAGE BAG SUKI 23
LUNCH BAG PINK POLKADOT 23
# Products bought by Loyal Customers
loyal_products <- retail_segmented |>
  filter(Segment == "Loyal Customers") |>
  distinct(Description)

# Gap: what Champions love that Loyal Customers have not bought
champion_products |>
  filter(!Description %in% loyal_products$Description) |>
  knitr::kable()
Description Times_Bought
JUMBO STORAGE BAG SUKI 23

The Finding

Among the Champions' 15 most-bought products, only one (Jumbo Storage Bag Suki) has never been bought by a Loyal Customer in our sample. On this evidence, the two segments buy largely the same products.

What this means: The difference between these two segments is not what they buy but how often and how much. Product recommendations are therefore unlikely to be the most effective intervention.

The right strategy: Increase purchase frequency. Bring Loyal Customers back sooner with:

  • A time-limited offer tied to their purchase gap (“It has been 60 days since your last order”)
  • A loyalty reward triggered after the next purchase
  • A seasonal prompt ahead of the Christmas peak

This is more targeted, more actionable — and, as we will see, more legally sensitive.
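
The purchase-gap trigger described above amounts to a filter on segment and Recency. The sketch below uses base R and invented customers; rfm_toy, the 60-day threshold gap_days, and the result name to_contact are illustrative assumptions, with the threshold taken from the example message.

```r
# Invented RFM rows; Recency is days since last purchase
rfm_toy <- data.frame(
  CustomerID = c(101, 102, 103, 104),
  Segment    = c("Loyal Customers", "Loyal Customers", "Champions", "Loyal Customers"),
  Recency    = c(75, 30, 90, 60)
)

gap_days <- 60  # hypothetical threshold for the reminder

# Loyal Customers whose last order is at least gap_days old
to_contact <- rfm_toy[rfm_toy$Segment == "Loyal Customers" &
                        rfm_toy$Recency >= gap_days, ]
to_contact$CustomerID
```

Note that the filter reads each individual's purchase timing, which is exactly what makes this kind of campaign sensitive under the rules discussed in Session 4.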


Exercise

Exercise 5 — How much revenue is at stake if At Risk customers are lost?

Exercise 6 — Which segment generates the most revenue?

Sort all segments by total Monetary value.


Session 4 — Data Regulation and Ethics

Time: 12:15 – 13:00


What is GDPR?

The General Data Protection Regulation (GDPR) has applied since 25 May 2018. It is EU law governing how organisations collect, store, and use personal data. It covers organisations established in the EU and any organisation, anywhere in the world, that offers goods or services to people in the EU or monitors their behaviour.

The core idea: in Europe, privacy is a fundamental right, not a consumer preference. This is the key difference from the US, where there is no equivalent federal law.

Fines can reach €20 million or 4% of global annual turnover, whichever is higher. Meta, for example, was fined €1.2 billion in 2023.
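
For the most serious infringements, the upper limit is the higher of €20 million and 4% of worldwide annual turnover. The sketch below expresses that ceiling as a small base R function; the name gdpr_fine_cap and the turnover figures are invented for illustration, and actual fines are set case by case below this limit.

```r
# Upper limit of a fine for the most serious infringements, in euros:
# the higher of EUR 20 million and 4% of worldwide annual turnover
gdpr_fine_cap <- function(turnover_eur) {
  pmax(20e6, 0.04 * turnover_eur)
}

# Two invented companies: EUR 100 million and EUR 1 billion turnover
caps <- gdpr_fine_cap(c(100e6, 1e9))
caps
```

For the smaller company the fixed €20 million ceiling applies, since 4% of its turnover would be only €4 million.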


How GDPR is enforced

National Supervisory Authorities (SAs) Every EU member state has at least one independent data protection authority: the CNIL in France, the BfDI (alongside state-level authorities) in Germany, the DPC in Ireland. The UK is no longer an EU member; its ICO enforces the UK GDPR, the UK's retained version of the regulation. These authorities can investigate companies on their own initiative, without any complaint, and in most member states they impose fines directly, without first going to court.

European Data Protection Board (EDPB) When a case crosses borders, the EDPB coordinates national authorities and can issue binding decisions. This is how the €1.2 billion Meta fine came about: the EDPB's binding decision required Ireland's DPC to impose a fine that its draft decision had not included.

What did we just build?

Look at what we coded today through a GDPR lens:

What we did | Marketing term | GDPR term
Computed RFM scores per CustomerID | Customer scoring | Profiling (Article 4.4)
Assigned segment labels automatically | Segmentation | Systematic profiling at scale
Identified days since each customer’s last purchase | Churn detection | Individual behavioural tracking
Designed a trigger email based on purchase timing | Lifecycle marketing | Automated direct marketing based on profiling
Recommended products to specific segments | Personalised marketing | Targeted direct marketing using profiled data

Every row in this table is profiling under GDPR. Every row requires a lawful basis — and the company must be able to prove it.


Two articles that directly apply

Article 4.4 — Profiling Any automated use of personal data to evaluate or predict a person’s behaviour is profiling. An RFM score attached to a CustomerID is profiling — even if no name or email address is visible. The score is derived from personal data and linked to an identifiable individual.

Article 21 — Right to object to direct marketing Every individual has an absolute right to object to their data being used for direct marketing. No exceptions. No override. If a customer clicks unsubscribe, all profiling for marketing purposes must stop — immediately and permanently.

Our trigger email “It has been 60 days since your last order” sends because we tracked that individual’s purchase timing and used it to target them. That is direct marketing based on profiling. Article 21 applies.
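
In practice, respecting an objection means removing that customer before any marketing scores are computed, not only before the email is sent. The sketch below shows the mechanics in base R on invented data; tx and objected_ids are hypothetical names, and whether a given process satisfies Article 21 is a legal question that code alone cannot settle.

```r
# Invented transactions for three customers
tx <- data.frame(
  CustomerID = c(101, 101, 102, 103, 103),
  Revenue    = c(10, 20, 5, 8, 12)
)

# Hypothetical list of customers who objected to direct marketing
objected_ids <- c(103)

# Exclude them first, then compute the marketing metric on what remains
tx_marketing <- tx[!tx$CustomerID %in% objected_ids, ]
monetary     <- aggregate(Revenue ~ CustomerID, data = tx_marketing, FUN = sum)
monetary
```

Filtering at the start of the pipeline means no score is ever attached to customer 103 for marketing purposes.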


The Target case

In 2012, the New York Times reported that Target had predicted a teenage customer was pregnant from her purchase patterns, before she had told her family. She received baby coupons at home. Target faced no legal consequence in the US.

Under GDPR, pregnancy is health data — special category data requiring explicit consent. Inferring it from purchase behaviour without consent is a serious violation regardless of whether the inference is correct.

Same analytics. Entirely different legal outcome depending on which side of the Atlantic you operate.


Real fines

Company | Fine | Reason | Year
Meta | €1.2 billion | Transferring EU user data to US servers | 2023
Amazon | €746 million | Targeted advertising without consent | 2021
LinkedIn | €310 million | Unlawful behavioural advertising | 2024
Google | €50 million | Consent buried in multi-step menus | 2019

Discussion (15 min)

“You are hired as a data analyst at a European e-commerce company. Your manager asks you to build exactly what we built today and deploy it next week. What do you do?”


“Analytics tells you what is possible. GDPR tells you what is permitted. Your job as a data scientist is to know both.”


References

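  • Chen, D., Sain, S.L. & Guo, K. (2012). Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining. Journal of Database Marketing & Customer Strategy Management, 19(3), 197–208. Dataset: UCI Machine Learning Repository, DOI: 10.24432/C5BW33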
  • Hughes, A.M. (1994). Strategic Database Marketing. Probus Publishing.
  • Binns, R. & Veale, M. (2021). Is that your final decision? Multi-stage profiling, selective effects, and Article 22 of the GDPR. International Data Privacy Law, 11(4), 319–332.
  • Future of Privacy Forum (2022). Automated Decision-Making: Practical Cases from Courts. FPF Report
  • European Data Protection Board (2023). €1.2 billion fine for Facebook. EDPB