In this blog post, I will walk you through a basic implementation of a user conversion analysis. First, we need to know what user conversion analysis is about. The image below details the stages of conversion, from awareness all the way to action and retention.
The goal here is to understand and optimize the process through which users move from initial engagement to purchase or signing up for a service.
Use cases for conversion analysis include identifying the percentage of users who complete a desired action at each step of the conversion funnel. We can also pinpoint where users drop off, so that products or marketing strategies can be designed to increase conversion. We can optimize the user experience by redesigning and simplifying the website to drive sales or increase revenue. Even more interesting is understanding how different user segments behave in the conversion funnel.
We will use the Amazon E commerce Click-stream Transaction Data for this purpose.
The data contains the following fields:
UserID: Identifier for the user.
SessionID: Identifier for the user’s session.
Timestamp: The time at which the event occurred.
EventType: The type of event (e.g., page view, product view, add to cart).
ProductID: Identifier for the product involved in the event.
Amount: The monetary amount associated with the event.
Outcome: The outcome of the event (e.g., success, failure).
Let's begin by loading the packages, and then the data. We will use pacman to load all the packages.
pacman::p_load(tidyverse, ggplot2, ggraph, igraph, janitor, lubridate, scales, gt, gtsummary)
Now let's load the data and present some interesting statistics about it. The dataset contains 1000 users, 10 unique session IDs and 10682 t
# Load the dataset
data <- read.csv('ecommerce_clickstream_transactions 3.csv')
summary(data)
UserID            SessionID        Timestamp           EventType
Min.   :   1.0    Min.   : 1.00    Length:74817        Length:74817
1st Qu.: 251.0    1st Qu.: 3.00    Class :character    Class :character
Median : 501.0    Median : 6.00    Mode  :character    Mode  :character
Mean   : 500.7    Mean   : 5.51
3rd Qu.: 751.0    3rd Qu.: 8.00
Max.   :1000.0    Max.   :10.00

ProductID           Amount             Outcome
Length:74817        Min.   :  5.132    Length:74817
Class :character    1st Qu.:130.934    Class :character
Mode  :character    Median :253.113    Mode  :character
                    Mean   :253.190
                    3rd Qu.:378.832
                    Max.   :499.982
                    NA's   :64135
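The `NA's` count for `Amount` is worth a quick check, since it shows that only some events carry a monetary value. A minimal sketch of the arithmetic, using the row count and NA count printed in the summary above (the variable names are illustrative, not part of the original analysis):

```r
# Counts copied from the summary() output above
total_rows <- 74817
amount_na  <- 64135

# Rows that carry a monetary amount
rows_with_amount <- total_rows - amount_na
rows_with_amount

# Share of all events that carry an amount
share_with_amount <- rows_with_amount / total_rows
round(share_with_amount, 3)
```

The difference is 10,682 rows, which is the number of events with a non-missing amount.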
Having presented the descriptive statistics, our next step is to define the conversion funnel.
The process flow could look like this:
# Define the conversion funnel
funnel_steps <- c("page_view", "product_view", "add_to_cart", "purchase")
Having defined the process flow for the conversion funnel, we can now calculate the number of occurrences of each event type.
# Calculate the number of occurrences for each event type
event_counts <- table(data$EventType)
# Calculate conversion rates
conversion_rates <- data.frame(
Step = funnel_steps[-length(funnel_steps)],
Conversion_Rate = sapply(1:(length(funnel_steps)-1), function(i) {
event_counts[funnel_steps[i+1]] / event_counts[funnel_steps[i]]
})
)
# Print conversion rates
print(conversion_rates)
                     Step Conversion_Rate
product_view    page_view       0.9886311
add_to_cart  product_view       1.0036462
purchase      add_to_cart       0.9950629
ggplot(conversion_rates, aes(x = reorder(Step, -Conversion_Rate), y = Conversion_Rate)) +
geom_bar(stat = "identity", fill = "sky blue", width = 0.6) +
geom_text(aes(label = percent(Conversion_Rate)), vjust = -0.4, size = 5, fontface = "bold") +
labs(
title = "Conversion Funnel",
x = "Funnel Step",
y = "Conversion Rate"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 16),
axis.title = element_text(size = 11),
axis.text = element_text(size = 11),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank()
)
The conversion rates between sequential steps are as follows. Moving from page view to product view, the rate is about 0.99; from product view to add to cart, it is 1.00; and from add to cart to purchase, it is 0.9951. This suggests that the move from page view to product view is effective in capturing users' interest, and the move from product view to add to cart suggests some level of product appeal. However, that second rate is above 1, which a true funnel cannot produce: these rates are ratios of raw event counts, not of users progressing from one step to the next, so a step can occur more often than the step before it. The add to cart to purchase rate of 0.9951 suggests high user intent or satisfaction. For the business, this points to effective engagement, an optimized checkout process, and possibilities for upselling and user retention. Taken at face value, the analysis suggests the business is performing well in converting user interest into actual purchases.
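One way to avoid a rate above 1 is to count sessions rather than raw events, and to count a session at a step only if it also contains every earlier step. The sketch below does this in base R on a small made-up data frame; the toy data and the names `hit`, `reached` and `funnel` are assumptions for illustration, and the check looks only at whether each step occurred in the session, not at the order of events.

```r
# Toy clickstream: hypothetical data, with the column names used after clean_names()
events <- data.frame(
  user_id    = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  session_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
  event_type = c("page_view", "product_view", "add_to_cart", "purchase",
                 "page_view", "product_view",
                 "page_view", "add_to_cart", "add_to_cart"),
  stringsAsFactors = FALSE
)

funnel_steps <- c("page_view", "product_view", "add_to_cart", "purchase")

# Sessions x steps logical matrix: did the session contain the step at least once?
session_key <- paste(events$user_id, events$session_id)
hit <- sapply(funnel_steps, function(step) {
  tapply(events$event_type == step, session_key, any)
})

# A session reaches step k only if it also contains every earlier step
reached <- t(apply(hit, 1, cumprod)) == 1

# Sessions remaining at each step, and the rate relative to the previous step
sessions_at_step <- unname(colSums(reached))
funnel <- data.frame(
  step = funnel_steps,
  sessions = as.integer(sessions_at_step),
  rate_from_previous = c(NA, sessions_at_step[-1] / head(sessions_at_step, -1))
)
print(funnel)
```

Because each step's sessions are a subset of the previous step's, the step-to-step rate cannot exceed 1 under this definition.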
One interesting aspect of conversion analysis is using what we know about users to understand their pain points. This means developing a user journey (tracking the sequence of events, common paths to purchase, and drop-off points) as well as behavioral segments (cart abandoners, browse-only users, and direct purchasers). This will allow us to compute key metrics such as session duration, events per session, and session outcomes (purchase, abandon, or browse only). We can also conduct a temporal analysis to assess time-of-day patterns, day-of-week trends, and monthly patterns. A segment analysis would not be complete without considering the purchase value distribution and a drop-off analysis.
Before we start, let us reload the dataset, clean the column names, and parse the timestamps.
# Load the dataset
data <- read_csv("ecommerce_clickstream_transactions 3.csv") %>%
clean_names() %>%
mutate(timestamp = ymd_hms(timestamp))
Rows: 74817 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): EventType, ProductID, Outcome
dbl (3): UserID, SessionID, Amount
dttm (1): Timestamp
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summarise(data,
total_rows = n(),
unique_users = n_distinct(user_id),
unique_sessions = n_distinct(session_id),
total_purchases = sum(!is.na(amount)))
# A tibble: 1 × 4
total_rows unique_users unique_sessions total_purchases
<int> <int> <int> <int>
1 74817 1000 10 10682
# Calculate session-level metrics
session_data <- data %>%
group_by(user_id, session_id) %>%
summarise(
session_start = min(timestamp),
session_end = max(timestamp),
session_duration = as.numeric(difftime(session_end, session_start, units = "mins")),
events_per_session = n(),
made_purchase = any(event_type == "purchase"),
added_to_cart = any(event_type == "add_to_cart"),
.groups = "drop"
)
# Preview nicely using gt
top_sessions <- session_data %>% slice_head(n = 10)
top_sessions %>%
gt() %>%
tab_header(
title = "Session-Level Metrics (Sample)"
) %>%
fmt_number(columns = session_duration:events_per_session, decimals = 2) %>%
cols_label(
session_start = "Start Time",
session_end = "End Time",
session_duration = "Duration (min)",
events_per_session = "Event Count",
made_purchase = "Purchased",
added_to_cart = "Added to Cart"
)
Session-Level Metrics (Sample)
| user_id | session_id | Start Time | End Time | Duration (min) | Event Count | Purchased | Added to Cart |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 2024-01-01 23:09:51.956825 | 2024-07-07 18:00:26.959902 | 270,410.58 | 9.00 | FALSE | TRUE |
| 1 | 2 | 2024-01-30 21:47:38.829172 | 2024-06-27 16:17:34.523695 | 214,229.93 | 6.00 | FALSE | FALSE |
| 1 | 3 | 2024-01-19 15:04:33.06565 | 2024-07-17 03:46:13.897763 | 258,521.68 | 9.00 | TRUE | TRUE |
| 1 | 4 | 2024-01-02 00:15:51.420238 | 2024-07-15 16:15:52.074487 | 281,760.01 | 10.00 | TRUE | TRUE |
| 1 | 5 | 2024-01-03 23:51:05.729189 | 2024-06-27 07:40:55.37483 | 252,469.83 | 9.00 | TRUE | FALSE |
| 1 | 6 | 2024-03-18 21:56:10.116916 | 2024-06-15 04:30:08.607016 | 127,113.97 | 8.00 | FALSE | TRUE |
| 1 | 7 | 2024-01-17 08:27:34.705063 | 2024-07-18 01:56:58.108233 | 263,129.39 | 8.00 | FALSE | TRUE |
| 1 | 8 | 2024-01-04 17:09:29.67706 | 2024-07-18 14:43:52.480919 | 282,094.38 | 10.00 | FALSE | FALSE |
| 1 | 9 | 2024-01-06 04:33:39.275154 | 2024-06-27 04:30:04.344369 | 249,116.42 | 8.00 | TRUE | TRUE |
| 1 | 10 | 2024-02-27 23:10:22.034384 | 2024-07-22 20:10:14.181302 | 210,059.87 | 5.00 | FALSE | TRUE |
The table above presents a sample of ten user sessions, capturing key behavioral indicators such as session start and end times, duration, and engagement depth (event count). Notably, session durations run to hundreds of thousands of minutes (several months) because the events sharing a user_id and session_id are spread across months, which suggests the need for further filtering to isolate active periods. We also flag whether a user added items to their cart or completed a purchase. This snapshot helps differentiate user intent across sessions, ranging from casual browsing to decisive transactions, and sets the stage for deeper segmentation and funnel analysis.
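One common way to isolate active periods is to start a new session whenever a user has been inactive for longer than some threshold. The sketch below applies that idea with dplyr to a small made-up data frame; the 30-minute threshold, the toy data, and the column name `active_session` are all assumptions for illustration, not part of the original analysis.

```r
library(dplyr)

# Toy events: hypothetical data, with the column names used after clean_names()
events <- data.frame(
  user_id = c(1, 1, 1, 1, 2, 2),
  timestamp = as.POSIXct(c(
    "2024-01-01 10:00:00", "2024-01-01 10:10:00",
    "2024-01-01 12:00:00", "2024-01-01 12:05:00",
    "2024-01-02 09:00:00", "2024-01-02 09:20:00"
  ), tz = "UTC")
)

gap_minutes <- 30  # assumed inactivity threshold

sessionized <- events %>%
  arrange(user_id, timestamp) %>%
  group_by(user_id) %>%
  mutate(
    # Minutes since the same user's previous event (NA for their first event)
    gap = as.numeric(difftime(timestamp, lag(timestamp), units = "mins")),
    # A new active session starts at the first event or after a long gap
    new_session = is.na(gap) | gap > gap_minutes,
    active_session = cumsum(new_session)
  ) %>%
  ungroup()

# Duration and event count per active session
active_durations <- sessionized %>%
  group_by(user_id, active_session) %>%
  summarise(
    duration_min = as.numeric(difftime(max(timestamp), min(timestamp), units = "mins")),
    events = n(),
    .groups = "drop"
  )
print(active_durations)
```

In this toy example, user 1's two-hour gap splits their events into two short sessions instead of one long one. The right threshold depends on the product and is a judgment call.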
session_data <- session_data %>%
mutate(segment = case_when(
made_purchase ~ "Buyer",
added_to_cart & !made_purchase ~ "Cart Abandoner",
!added_to_cart & !made_purchase ~ "Browser Only"
))
# Segment counts with gt table
segment_summary <- session_data %>%
count(segment) %>%
arrange(desc(n))
segment_summary %>%
gt() %>%
tab_header(
title = "User Segment Distribution"
) %>%
cols_label(
segment = "User Segment",
n = "Session Count"
) %>%
fmt_number(columns = n, use_seps = TRUE)
User Segment Distribution
| User Segment | Session Count |
|---|---|
| Buyer | 6,721.00 |
| Cart Abandoner | 2,320.00 |
| Browser Only | 959.00 |
The segmentation analysis reveals three distinct patterns of behavior across sessions. A majority of sessions (6,721) resulted in a purchase and are classified as Buyer sessions, a strong indicator of conversion success. Cart Abandoner sessions (2,320) are those where the user showed intent to purchase but dropped off before completing a transaction, an opportunity for targeted re-engagement strategies. Lastly, Browser Only sessions (959) suggest a segment that may benefit from improved product discovery or incentivized nudges. This distribution highlights the importance of optimizing both the user journey and interventions at critical funnel stages.
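To compare the segments on a common scale, the counts can be turned into shares of all sessions. A minimal base R sketch using the three counts from the table above (the vector names are illustrative):

```r
# Session counts copied from the segment table above
segment_counts <- c("Buyer" = 6721, "Cart Abandoner" = 2320, "Browser Only" = 959)

# Total number of sessions across the three segments
total_sessions <- sum(segment_counts)
total_sessions

# Share of sessions in each segment, as a percentage
segment_share <- 100 * segment_counts / total_sessions
round(segment_share, 1)
```

The counts add up to 10,000 sessions, so the shares are roughly 67.2% Buyer, 23.2% Cart Abandoner, and 9.6% Browser Only.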
# Find common sequences (simplified version)
event_paths <- data %>%
arrange(user_id, session_id, timestamp) %>%
group_by(user_id, session_id) %>%
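summarise(path = paste(event_type, collapse = " -> "))
`summarise()` has grouped output by 'user_id'. You can override using the
`.groups` argument.
# Note: event_paths is still grouped by user_id here, so count() and
# slice_head() below operate within each user rather than across all sessions
top_paths <- event_paths %>%
count(path, sort = TRUE) %>%
slice_head(n = 10)
top_paths
# A tibble: 10,000 × 3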
# Groups: user_id [1,000]
user_id path n
<dbl> <chr> <int>
1 1 add_to_cart -> add_to_cart -> login -> page_view -> add_to_car… 1
2 1 add_to_cart -> add_to_cart -> logout -> add_to_cart -> add_to_… 1
3 1 add_to_cart -> product_view -> add_to_cart -> product_view -> … 1
4 1 click -> product_view -> login -> logout -> logout -> product_… 1
5 1 login -> add_to_cart -> login -> logout -> add_to_cart 1
6 1 logout -> purchase -> purchase -> product_view -> login -> pur… 1
7 1 page_view -> add_to_cart -> login -> add_to_cart -> page_view … 1
8 1 page_view -> login -> add_to_cart -> click -> add_to_cart -> l… 1
9 1 page_view -> page_view -> logout -> product_view -> click -> p… 1
10 1 page_view -> page_view -> purchase -> product_view -> product_… 1
# ℹ 9,990 more rows
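The output above has 10,000 rows, each with a count of 1, because the summarised data is still grouped by `user_id`, so paths are counted within each user. To rank paths across all sessions, the grouping can be dropped before counting. The sketch below shows this on a small made-up data frame (the toy data is an assumption for illustration):

```r
library(dplyr)

# Toy events: hypothetical data, with the column names used after clean_names()
events <- data.frame(
  user_id    = c(1, 1, 2, 2, 3, 3, 3),
  session_id = c(1, 1, 1, 1, 1, 1, 1),
  timestamp  = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") +
    c(0, 60, 0, 60, 0, 60, 120),
  event_type = c("page_view", "purchase",
                 "page_view", "purchase",
                 "page_view", "add_to_cart", "logout")
)

# One path string per session; .groups = "drop" removes the leftover grouping
event_paths <- events %>%
  arrange(user_id, session_id, timestamp) %>%
  group_by(user_id, session_id) %>%
  summarise(path = paste(event_type, collapse = " -> "), .groups = "drop")

# Paths are now counted across all sessions, most frequent first
top_paths <- event_paths %>%
  count(path, sort = TRUE) %>%
  slice_head(n = 10)
print(top_paths)
```

With long, noisy sessions nearly every full path may still be unique, so truncating each path to its first few events, or counting transitions between consecutive events, may be more informative.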
# Add time components
data <- data %>%
mutate(
hour = hour(timestamp),
weekday = wday(timestamp, label = TRUE),
month = month(timestamp, label = TRUE)
)
# Hourly pattern
data %>% count(hour) %>% ggplot(aes(x = hour, y = n)) +
geom_line(color = "#2C3E50", linewidth = 1) +
labs(title = "Events by Hour", x = "Hour of Day", y = "Event Count") +
theme_minimal()
# Day of week pattern
data %>% count(weekday) %>% ggplot(aes(x = weekday, y = n)) +
geom_bar(stat = "identity", fill = "#18BC9C") +
labs(title = "Events by Weekday", x = "Day of Week", y = "Event Count") +
theme_minimal()
The line chart below displays user activity by hour of day. Activity levels are relatively stable, but with noticeable spikes and dips, suggesting episodic peaks in engagement. These variations may align with user routines such as morning browsing or evening purchases, and can guide time-based marketing strategies.
The bar chart below summarizes user activity across the days of the week. Engagement peaks slightly on Tuesday, hinting at mid-week shopping tendencies. Weekend usage remains high, making it crucial to maintain consistent presence across all days.
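To judge how large a "slight" weekday peak is, each day's share of events can be compared with the one-seventh share expected if activity were spread evenly. The sketch below uses made-up counts purely for illustration; they are not taken from the dataset, and the variable names are assumptions.

```r
# Hypothetical weekday counts, for illustration only (not from the dataset)
weekday_counts <- c(Sun = 1050, Mon = 1000, Tue = 1100, Wed = 990,
                    Thu = 1010, Fri = 980, Sat = 1040)

# Each day's share of all events
share <- weekday_counts / sum(weekday_counts)

# Share expected if events were spread evenly over the seven days
uniform_share <- 1 / length(weekday_counts)

# Percentage deviation of each day from the even split
deviation_pct <- 100 * (share - uniform_share) / uniform_share
round(deviation_pct, 1)

# Day with the largest positive deviation
names(which.max(deviation_pct))
```

An even split is only a rough baseline, since a date range rarely contains exactly the same number of each weekday.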
data %>% filter(!is.na(amount)) %>%
ggplot(aes(x = amount)) +
geom_histogram(bins = 30, fill = "#9B59B6", color = "white", alpha = 0.9) +
labs(
title = "Purchase Amount Distribution",
x = "Purchase Amount (USD)",
y = "Number of Purchases"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 16),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10)
)
The histogram below shows how purchase amounts are distributed across all sessions. The distribution appears fairly uniform, indicating diverse spending behaviors. Notably, purchases span the full range up to $500, suggesting a wide product price mix. This even spread offers flexibility in designing pricing strategies, bundling offers, or personalizing recommendations.
Having examined the purchase amount distribution, we move on to a drop-off analysis. This will give us insight into where users fail to convert, so that we can devise marketing strategies to improve conversion.
# Drop-off frequency by last event in session
last_events <- data %>%
arrange(user_id, session_id, timestamp) %>%
group_by(user_id, session_id) %>%
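summarise(last_event = last(event_type))
`summarise()` has grouped output by 'user_id'. You can override using the
`.groups` argument.
# Note: last_events is still grouped by user_id, so this counts last events
# per user rather than across all sessions
last_events %>% count(last_event, sort = TRUE)
# A tibble: 5,507 × 3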
# Groups: user_id [1,000]
user_id last_event n
<dbl> <chr> <int>
1 864 login 7
2 303 add_to_cart 6
3 467 logout 6
4 534 login 6
5 543 click 6
6 630 page_view 6
7 773 page_view 6
8 3 click 5
9 47 product_view 5
10 55 page_view 5
# ℹ 5,497 more rows
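The table above counts last events within each user, because the summarised data is still grouped by `user_id`. For an overall drop-off picture, one option is to drop the grouping and look only at sessions that did not include a purchase. The sketch below shows this on a small made-up data frame; the toy data and the `made_purchase` filter are assumptions for illustration.

```r
library(dplyr)

# Toy events: hypothetical data, with the column names used after clean_names()
events <- data.frame(
  user_id    = c(1, 1, 1, 2, 2, 3, 3),
  session_id = c(1, 1, 1, 1, 1, 1, 1),
  timestamp  = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") +
    c(0, 60, 120, 0, 60, 0, 60),
  event_type = c("page_view", "add_to_cart", "purchase",
                 "page_view", "add_to_cart",
                 "product_view", "add_to_cart")
)

# Last event and purchase flag per session, with the grouping dropped
last_events <- events %>%
  arrange(user_id, session_id, timestamp) %>%
  group_by(user_id, session_id) %>%
  summarise(
    last_event    = last(event_type),
    made_purchase = any(event_type == "purchase"),
    .groups = "drop"
  )

# Where do non-purchasing sessions end? Counted across all sessions
drop_off <- last_events %>%
  filter(!made_purchase) %>%
  count(last_event, sort = TRUE)
print(drop_off)
```

In this toy example both non-purchasing sessions end at `add_to_cart`, which is the kind of pattern that would point to cart abandonment as the main drop-off point.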