User Conversion Analysis

Author

Richmond Silvanus Baye

Published

April 11, 2025

Conversion Analysis

In this blog post, I will walk you through a basic implementation of a user conversion analysis. First, we need to be clear about what user conversion analysis is. The image below details the stages of conversion, from awareness all the way to action and retention.

The goal here is to understand and optimize the process through which users move from initial engagement to purchase or signing up for a service.

Use Cases

Use cases for conversion analysis include identifying the percentage of users who complete a desired action at each step of the conversion funnel. We can also pinpoint where users drop off, so that product design or marketing strategy can be adjusted to increase conversion. We can optimize the user experience by redesigning and simplifying the website to drive sales or increase revenue. Even more interesting is understanding how different segments of users behave in the conversion funnel.

About the Data

We will use the Amazon E commerce Click-stream Transaction Data for this purpose.

The data contains the following fields:

  • UserID: Identifier for the user.

  • SessionID: Identifier for the user’s session.

  • Timestamp: The time at which the event occurred.

  • EventType: The type of event (e.g., page view, product view, add to cart).

  • ProductID: Identifier for the product involved in the event.

  • Amount: The monetary amount associated with the event.

  • Outcome: The outcome of the event (e.g., success, failure).

Let's begin by loading the packages, then the data. We will use pacman to load all the packages.

Code
pacman::p_load(tidyverse, ggplot2, ggraph, igraph, janitor, lubridate, scales, gt, gtsummary)

Descriptive Statistics

Now let's load the data and present some descriptive statistics. The dataset contains 74,817 rows covering 1,000 users, 10 distinct session IDs, and 10,682 rows with a recorded purchase amount.

Code
# Load the dataset
data <- read.csv('ecommerce_clickstream_transactions 3.csv')
summary(data)
     UserID         SessionID      Timestamp          EventType        
 Min.   :   1.0   Min.   : 1.00   Length:74817       Length:74817      
 1st Qu.: 251.0   1st Qu.: 3.00   Class :character   Class :character  
 Median : 501.0   Median : 6.00   Mode  :character   Mode  :character  
 Mean   : 500.7   Mean   : 5.51                                        
 3rd Qu.: 751.0   3rd Qu.: 8.00                                        
 Max.   :1000.0   Max.   :10.00                                        
                                                                       
  ProductID             Amount          Outcome         
 Length:74817       Min.   :  5.132   Length:74817      
 Class :character   1st Qu.:130.934   Class :character  
 Mode  :character   Median :253.113   Mode  :character  
                    Mean   :253.190                     
                    3rd Qu.:378.832                     
                    Max.   :499.982                     
                    NA's   :64135                       
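
The summary shows 64,135 missing values in Amount out of 74,817 rows. A quick follow-up check is to count how many rows of each event type carry an amount. The sketch below runs on a small made-up data frame (`toy` and its values are illustrative, not taken from the real file), and the assumption that amounts appear only on purchase events is something to verify rather than a fact the summary confirms.

```r
library(dplyr)

# Toy data standing in for the real file (values are made up for illustration)
toy <- data.frame(
  EventType = c("page_view", "purchase", "add_to_cart", "purchase", "click"),
  Amount    = c(NA, 120.5, NA, 80.0, NA)
)

# Count rows and non-missing amounts per event type
amount_by_event <- toy %>%
  group_by(EventType) %>%
  summarise(
    n_events      = n(),
    n_with_amount = sum(!is.na(Amount)),
    .groups = "drop"
  )

print(amount_by_event)
```

Running the same pipeline on the real `data` object would show whether every row with an amount is a purchase.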

Conversion Funnel

Having presented the descriptive statistics, we next define the conversion funnel.

The process flow could look like this:

Code
# Define the conversion funnel
funnel_steps <- c("page_view", "product_view", "add_to_cart", "purchase")

Conversion Rates

Having defined the process flow for the conversion funnel, we can now count the occurrences of each event type and compute the ratio between consecutive steps.

Code
# Calculate the number of occurrences for each event type
event_counts <- table(data$EventType)

# Calculate conversion rates
conversion_rates <- data.frame(
  Step = funnel_steps[-length(funnel_steps)],
  Conversion_Rate = sapply(1:(length(funnel_steps)-1), function(i) {
    event_counts[funnel_steps[i+1]] / event_counts[funnel_steps[i]]
  })
)
Code
# Print conversion rates
print(conversion_rates)
                     Step Conversion_Rate
product_view    page_view       0.9886311
add_to_cart  product_view       1.0036462
purchase      add_to_cart       0.9950629

Visualize the Funnel

Code
ggplot(conversion_rates, aes(x = reorder(Step, -Conversion_Rate), y = Conversion_Rate)) +
  geom_bar(stat = "identity", fill = "sky blue", width = 0.6) +
  geom_text(aes(label = percent(Conversion_Rate)), vjust = -0.4, size = 5, fontface = "bold") +
  labs(
    title = "Conversion Funnel",
    x = "Funnel Step",
    y = "Conversion Rate"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.title = element_text(size = 11),
    axis.text = element_text(size = 11),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank()
  )

The conversion rates between sequential steps are as follows. From page view to product view the rate is about 0.99, from product view to add to cart it is about 1.00, and from add to cart to purchase it is about 0.995. Note that these figures are ratios of raw event counts rather than shares of users who progress from one step to the next, which is why the product view to add to cart ratio can exceed 1: there are slightly more add-to-cart events than product-view events in the data. Such a value is not possible for a true conversion rate and should be treated as an anomaly. Taken at face value, the high ratio from page view to product view suggests that pages are effective in capturing users' interest, the ratio from product view to add to cart suggests some level of product appeal, and the ratio of 0.9951 from add to cart to purchase suggests high user intent or satisfaction. For the business, this points to effective engagement, an optimized checkout process, and possibilities for upselling and user retention. On this reading, the business is performing well in converting user interest into actual purchases.
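
One way to get rates that cannot exceed 1 is to count sessions instead of events: a session counts toward a step only if it also contains every earlier step. The sketch below shows the idea on made-up data. The `toy` data frame and the `session_funnel` and `funnel` names are assumptions for illustration, and the check only looks at whether a step occurs in a session, not at the order of events.

```r
library(dplyr)

funnel_steps <- c("page_view", "product_view", "add_to_cart", "purchase")

# Toy events (made up): three sessions that reach different depths
toy <- data.frame(
  user_id    = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  session_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
  event_type = c("page_view", "product_view", "add_to_cart", "purchase",
                 "page_view", "product_view",
                 "page_view", "add_to_cart", "purchase")
)

# One row per session; a step is TRUE only if all earlier steps are TRUE too
session_funnel <- toy %>%
  group_by(user_id, session_id) %>%
  summarise(
    page_view    = any(event_type == "page_view"),
    product_view = page_view && any(event_type == "product_view"),
    add_to_cart  = product_view && any(event_type == "add_to_cart"),
    purchase     = add_to_cart && any(event_type == "purchase"),
    .groups = "drop"
  )

# Sessions reaching each step, and the rate relative to the previous step
funnel <- data.frame(
  step     = funnel_steps,
  sessions = as.integer(colSums(session_funnel[, funnel_steps]))
)
funnel$rate_from_previous <- c(NA, funnel$sessions[-1] / funnel$sessions[-nrow(funnel)])

print(funnel)
```

Because each step's count is a subset of the previous step's count, every rate falls between 0 and 1.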

Segmentation Analysis

One interesting aspect of conversion analysis is using what we know about users to understand their pain points. This means mapping the user journey (tracking the sequence of events, finding common paths to purchase, and analysing drop-off) and defining behavioral segments (cart abandoners, browse-only users, and direct purchasers). It also allows us to compute key metrics such as session duration, events per session, and session outcomes (purchase, abandon, or browse only). We can also conduct a temporal analysis to assess time-of-day patterns, day-of-week trends, and monthly patterns. Finally, a segment analysis is incomplete without looking at the distribution of purchase values and at drop-off.
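
The temporal analysis mentioned above can be sketched with lubridate's `hour()` and `wday()`. This is a minimal illustration on made-up timestamps. The `toy` data frame and the `by_time`, `events_by_hour`, and `events_by_day` names are assumptions, and the column names follow the cleaned names used later in this post.

```r
library(dplyr)
library(lubridate)

# Toy events (made up) with parsed timestamps
toy <- data.frame(
  timestamp  = ymd_hms(c("2024-01-01 09:15:00", "2024-01-01 21:40:00",
                         "2024-01-02 09:05:00", "2024-01-06 14:30:00")),
  event_type = c("page_view", "purchase", "purchase", "add_to_cart")
)

# Derive hour of day and day of week from the timestamp
by_time <- toy %>%
  mutate(
    hour_of_day = hour(timestamp),
    day_of_week = wday(timestamp, label = TRUE)
  )

# Events and purchases per hour of day
events_by_hour <- by_time %>%
  group_by(hour_of_day) %>%
  summarise(
    events    = n(),
    purchases = sum(event_type == "purchase"),
    .groups = "drop"
  )

# Events per day of week
events_by_day <- count(by_time, day_of_week)

print(events_by_hour)
print(events_by_day)
```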

Before we start, let us reload the data, clean the column names, and parse the timestamps.

Code
# Load the dataset
data <- read_csv("ecommerce_clickstream_transactions 3.csv") %>%
  clean_names() %>%
  mutate(timestamp = ymd_hms(timestamp))
Rows: 74817 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): EventType, ProductID, Outcome
dbl  (3): UserID, SessionID, Amount
dttm (1): Timestamp

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
summarise(data, 
  total_rows = n(), 
  unique_users = n_distinct(user_id), 
  unique_sessions = n_distinct(session_id), 
  total_purchases = sum(!is.na(amount)))
# A tibble: 1 × 4
  total_rows unique_users unique_sessions total_purchases
       <int>        <int>           <int>           <int>
1      74817         1000              10           10682

Session Metrics

Code
# Calculate session-level metrics
session_data <- data %>%
  group_by(user_id, session_id) %>%
  summarise(
    session_start = min(timestamp),
    session_end = max(timestamp),
    session_duration = as.numeric(difftime(session_end, session_start, units = "mins")),
    events_per_session = n(),
    made_purchase = any(event_type == "purchase"),
    added_to_cart = any(event_type == "add_to_cart"),
    .groups = "drop"
  )

# Preview nicely using gt
top_sessions <- session_data %>% slice_head(n = 10)

top_sessions %>%
  gt() %>%
  tab_header(
    title = "Session-Level Metrics (Sample)"
  ) %>%
  fmt_number(columns = session_duration:events_per_session, decimals = 2) %>%
  cols_label(
    session_start = "Start Time",
    session_end = "End Time",
    session_duration = "Duration (min)",
    events_per_session = "Event Count",
    made_purchase = "Purchased",
    added_to_cart = "Added to Cart"
  )
Session-Level Metrics (Sample)

user_id  session_id  Start Time                  End Time                    Duration (min)  Event Count  Purchased  Added to Cart
1        1           2024-01-01 23:09:51.956825  2024-07-07 18:00:26.959902  270,410.58      9.00         FALSE      TRUE
1        2           2024-01-30 21:47:38.829172  2024-06-27 16:17:34.523695  214,229.93      6.00         FALSE      FALSE
1        3           2024-01-19 15:04:33.06565   2024-07-17 03:46:13.897763  258,521.68      9.00         TRUE       TRUE
1        4           2024-01-02 00:15:51.420238  2024-07-15 16:15:52.074487  281,760.01      10.00        TRUE       TRUE
1        5           2024-01-03 23:51:05.729189  2024-06-27 07:40:55.37483   252,469.83      9.00         TRUE       FALSE
1        6           2024-03-18 21:56:10.116916  2024-06-15 04:30:08.607016  127,113.97      8.00         FALSE      TRUE
1        7           2024-01-17 08:27:34.705063  2024-07-18 01:56:58.108233  263,129.39      8.00         FALSE      TRUE
1        8           2024-01-04 17:09:29.67706   2024-07-18 14:43:52.480919  282,094.38      10.00        FALSE      FALSE
1        9           2024-01-06 04:33:39.275154  2024-06-27 04:30:04.344369  249,116.42      8.00         TRUE       TRUE
1        10          2024-02-27 23:10:22.034384  2024-07-22 20:10:14.181302  210,059.87      5.00         FALSE      TRUE

The table above presents a sample of ten user sessions, capturing key behavioral indicators such as session start and end times, duration, and engagement depth (event count). Notably, session durations run to hundreds of thousands of minutes (about 88 to 196 days in this sample) because each session ID covers events that are months apart. This suggests that a user_id and session_id pair does not correspond to a single visit, and that further filtering is needed to isolate active periods. We also flag whether a user added items to their cart or completed a purchase. This snapshot helps differentiate user intent across sessions, from casual browsing to decisive transactions, and sets the stage for deeper segmentation and funnel analysis.
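
One common way to isolate active periods is to start a new visit whenever the gap between two consecutive events of the same user exceeds a threshold. The sketch below applies this to made-up timestamps. The 30-minute threshold, the `toy` data frame, and the `visit_id` column are all assumptions for illustration, not properties of the dataset.

```r
library(dplyr)

gap_minutes <- 30  # assumed inactivity threshold, not taken from the data

# Toy events (made up): user 1 has two bursts of activity, user 2 has one
toy <- data.frame(
  user_id   = c(1, 1, 1, 1, 2, 2),
  timestamp = as.POSIXct(c("2024-01-01 10:00:00", "2024-01-01 10:10:00",
                           "2024-01-01 12:00:00", "2024-01-01 12:05:00",
                           "2024-01-03 08:00:00", "2024-01-03 08:20:00"),
                         tz = "UTC")
)

visits <- toy %>%
  arrange(user_id, timestamp) %>%
  group_by(user_id) %>%
  mutate(
    # Minutes since the user's previous event (NA for the first event)
    gap       = as.numeric(difftime(timestamp, lag(timestamp), units = "mins")),
    # A new visit starts at the first event or after a long gap
    new_visit = is.na(gap) | gap > gap_minutes,
    visit_id  = cumsum(new_visit)
  ) %>%
  group_by(user_id, visit_id) %>%
  summarise(
    events       = n(),
    duration_min = as.numeric(difftime(max(timestamp), min(timestamp), units = "mins")),
    .groups = "drop"
  )

print(visits)
```

With this definition, durations are measured within a burst of activity instead of across the whole observation period.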

Behavioral Segmentation

Code
session_data <- session_data %>%
  mutate(segment = case_when(
    made_purchase ~ "Buyer",
    added_to_cart & !made_purchase ~ "Cart Abandoner",
    !added_to_cart & !made_purchase ~ "Browser Only"
  ))

# Segment counts with gt table
segment_summary <- session_data %>%
  count(segment) %>%
  arrange(desc(n))

segment_summary %>%
  gt() %>%
  tab_header(
    title = "User Segment Distribution"
  ) %>%
  cols_label(
    segment = "User Segment",
    n = "Session Count"
  ) %>%
  fmt_number(columns = n, use_seps = TRUE)
User Segment Distribution

User Segment     Session Count
Buyer            6,721.00
Cart Abandoner   2,320.00
Browser Only     959.00

The segmentation analysis reveals three distinct patterns of behavior across sessions. A majority of sessions (6,721 of 10,000, about 67%) included a purchase and are classified as Buyer, a strong indicator of conversion success. Cart Abandoner sessions (2,320) are those where an item was added to the cart but no purchase followed, an opportunity for targeted re-engagement strategies. Lastly, Browser Only sessions (959) suggest a segment that may benefit from improved product discovery or incentivized nudges. This distribution highlights the importance of optimizing both the user journey and interventions at critical funnel stages.
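
To make the comparison between segments easier, the counts can be turned into shares. The sketch below re-enters the three counts from the table above by hand. The `share` and `share_label` columns are names introduced here for illustration.

```r
# Segment counts copied from the table above
segment_summary <- data.frame(
  segment = c("Buyer", "Cart Abandoner", "Browser Only"),
  n       = c(6721, 2320, 959)
)

# Share of all sessions in each segment
segment_summary$share       <- segment_summary$n / sum(segment_summary$n)
segment_summary$share_label <- sprintf("%.1f%%", 100 * segment_summary$share)

print(segment_summary)
```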

Event Path Analysis

Code
# Find common sequences (simplified version)
event_paths <- data %>%
  arrange(user_id, session_id, timestamp) %>%
  group_by(user_id, session_id) %>%
  summarise(path = paste(event_type, collapse = " -> "))
`summarise()` has grouped output by 'user_id'. You can override using the
`.groups` argument.
Code
top_paths <- event_paths %>%
  count(path, sort = TRUE) %>%
  slice_head(n = 10)

top_paths
# A tibble: 10,000 × 3
# Groups:   user_id [1,000]
   user_id path                                                                n
     <dbl> <chr>                                                           <int>
 1       1 add_to_cart -> add_to_cart -> login -> page_view -> add_to_car…     1
 2       1 add_to_cart -> add_to_cart -> logout -> add_to_cart -> add_to_…     1
 3       1 add_to_cart -> product_view -> add_to_cart -> product_view -> …     1
 4       1 click -> product_view -> login -> logout -> logout -> product_…     1
 5       1 login -> add_to_cart -> login -> logout -> add_to_cart              1
 6       1 logout -> purchase -> purchase -> product_view -> login -> pur…     1
 7       1 page_view -> add_to_cart -> login -> add_to_cart -> page_view …     1
 8       1 page_view -> login -> add_to_cart -> click -> add_to_cart -> l…     1
 9       1 page_view -> page_view -> logout -> product_view -> click -> p…     1
10       1 page_view -> page_view -> purchase -> product_view -> product_…     1
# ℹ 9,990 more rows
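
The output above is still grouped by user_id, so `count()` and `slice_head()` run within each user and return 10,000 rows instead of the ten most common paths overall. A minimal sketch of the ungrouped version, on made-up data, passes `.groups = "drop"` to `summarise()` before counting. The `toy` data frame and its integer timestamps are stand-ins for illustration.

```r
library(dplyr)

# Toy events (made up); integer timestamps stand in for real ones
toy <- data.frame(
  user_id    = c(1, 1, 2, 2, 3, 3, 3),
  session_id = c(1, 1, 1, 1, 1, 1, 1),
  timestamp  = c(1, 2, 1, 2, 1, 2, 3),
  event_type = c("page_view", "purchase", "page_view", "purchase",
                 "page_view", "add_to_cart", "logout")
)

# Build one path string per session, then drop the grouping
event_paths <- toy %>%
  arrange(user_id, session_id, timestamp) %>%
  group_by(user_id, session_id) %>%
  summarise(path = paste(event_type, collapse = " -> "), .groups = "drop")

# Count identical paths across all sessions and keep the most frequent ones
top_paths <- event_paths %>%
  count(path, sort = TRUE) %>%
  slice_head(n = 10)

print(top_paths)
```

On the real data, long paths may each occur only once, so it can help to compare shorter prefixes of each path instead of full sequences.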

Purchase Amount Distribution

Code
data %>% filter(!is.na(amount)) %>%
  ggplot(aes(x = amount)) +
  geom_histogram(bins = 30, fill = "#9B59B6", color = "white", alpha = 0.9) +
  labs(
    title = "Purchase Amount Distribution",
    x = "Purchase Amount (USD)",
    y = "Number of Purchases"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10)
  )

The histogram above shows how purchase amounts are distributed across all sessions. The distribution appears fairly uniform, indicating diverse spending behaviors. Notably, purchases span the full range up to $500, suggesting a wide product price mix. This even spread offers flexibility in designing pricing strategies, bundling offers, or personalizing recommendations.
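
As a rough check on the "fairly uniform" reading, we can compare the quartiles reported by `summary(data)` earlier in this post with the quartiles of a uniform distribution over the same range. This is a back-of-the-envelope sketch and not a formal test. The `observed` values are copied from the summary output, and the `comparison` table is a name introduced here.

```r
# Quantiles of Amount as reported by summary(data) above
observed <- c(min = 5.132, q1 = 130.934, median = 253.113,
              q3 = 378.832, max = 499.982)

# Quantiles of a uniform distribution on the same [min, max] range
expected <- qunif(c(0, 0.25, 0.5, 0.75, 1),
                  min = observed[["min"]], max = observed[["max"]])

comparison <- data.frame(
  quantile   = names(observed),
  observed   = unname(observed),
  uniform    = expected,
  difference = unname(observed) - expected
)

print(comparison)
```

The observed quartiles sit within a few dollars of the uniform ones, which is consistent with the shape of the histogram.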

Having examined the purchase amount distribution, we move on to a drop-off analysis. This will allow us to glean insight into where users stop before converting, so that we can devise marketing strategies to improve conversion.

Drop-off Analysis

Code
# Drop-off frequency by last event in session
last_events <- data %>%
  arrange(user_id, session_id, timestamp) %>%
  group_by(user_id, session_id) %>%
  summarise(last_event = last(event_type))
`summarise()` has grouped output by 'user_id'. You can override using the
`.groups` argument.
Code
last_events %>% count(last_event, sort = TRUE)
# A tibble: 5,507 × 3
# Groups:   user_id [1,000]
   user_id last_event       n
     <dbl> <chr>        <int>
 1     864 login            7
 2     303 add_to_cart      6
 3     467 logout           6
 4     534 login            6
 5     543 click            6
 6     630 page_view        6
 7     773 page_view        6
 8       3 click            5
 9      47 product_view     5
10      55 page_view        5
# ℹ 5,497 more rows
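
As with the path analysis, the result above is still grouped by user_id, so the counts are per user and not overall. A minimal sketch of an overall drop-off table, on made-up data, drops the grouping and adds each last event's share of sessions. The `toy` data frame, the `drop_off` name, and the `share` column are assumptions for illustration.

```r
library(dplyr)

# Toy events (made up): four sessions, integer timestamps as stand-ins
toy <- data.frame(
  user_id    = c(1, 1, 2, 2, 3, 3, 4, 4),
  session_id = c(1, 1, 1, 1, 1, 1, 1, 1),
  timestamp  = c(1, 2, 1, 2, 1, 2, 1, 2),
  event_type = c("page_view", "add_to_cart",
                 "page_view", "add_to_cart",
                 "product_view", "purchase",
                 "login", "logout")
)

# Last event of each session, with the grouping dropped afterwards
last_events <- toy %>%
  arrange(user_id, session_id, timestamp) %>%
  group_by(user_id, session_id) %>%
  summarise(last_event = last(event_type), .groups = "drop")

# How often each event type ends a session, as a count and a share
drop_off <- last_events %>%
  count(last_event, sort = TRUE) %>%
  mutate(share = n / sum(n))

print(drop_off)
```

Sessions that end on a non-purchase event such as add_to_cart are the natural candidates for re-engagement.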