This workbook documents the full analytical process behind the HubSpot Integrations Model Exercise, from data ingestion and cleaning through to synthesis and recommendations. Each section follows a consistent structure: setup and rationale, code with inline comments, and post-analysis observations interpreting the results. All data is the synthetic HubSpot dataset provided for this exercise (January–June 2024, n=13,651 customers).
Data is synthetic and provided by HubSpot for recruitment purposes. This section loads all required libraries and imports the two datasets from Google Drive. Both files contain monthly snapshots of HubSpot customer data across a six month observation period (January–June 2024).
# ============================================================
# SECTION 1: SETUP & DATA INGESTION
# ============================================================
# --- STEP 1: SETUP & PACKAGE INSTALLATION ---
# Install packages (only runs if not already installed)
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("readxl")) install.packages("readxl")
if (!require("scales")) install.packages("scales")
if (!require("googledrive")) install.packages("googledrive")
if (!require("googlesheets4")) install.packages("googlesheets4")
# Load libraries
library(tidyverse)
library(readxl)
library(scales)
library(googledrive)
library(googlesheets4)
# --- STEP 2: GOOGLE DRIVE AUTHORIZATION & FILE IMPORT ---
# Authenticate with Google Drive
# A browser prompt will open asking you to authorize — sign in
# with your Google account and allow access
drive_auth()
# Download files from Google Drive to your local session
drive_download(
file = "customer_metadata.xlsx",
path = "customer_metadata.xlsx",
overwrite = TRUE
)
drive_download(
file = "customer_integrations.xlsx",
path = "customer_integrations.xlsx",
overwrite = TRUE
)
# Load into dataframes
df_metadata <- read_excel("customer_metadata.xlsx")
df_integrations <- read_excel("customer_integrations.xlsx")
# Quick sanity check
glimpse(df_metadata)## Rows: 62,339
## Columns: 8
## $ CUSTOMER_ID <chr> "C00020862", "C00020863", "C00020864", "C00…
## $ SNAPSHOT_MONTH <dttm> 2024-01-01, 2024-01-01, 2024-01-01, 2024-0…
## $ MARKETING_HUB_TIER <chr> "Ent", "STarter", "STarter", "Pro", "Pro", …
## $ MARKETING_HUB_MRR <chr> "674.14", "14.16", "9.49", "346.97", "295.9…
## $ MARKETING_HUB_MRR_CANCELED <chr> "NA", "NA", "NA", "NA", "NA", "NA", "NA", "…
## $ SALES_HUB_TIER <chr> "Enterprise", "NA", "NA", "NA", "NA", "NA",…
## $ SALES_HUB_MRR <chr> "-17.55", "NA", "NA", "NA", "NA", "NA", "NA…
## $ SALES_HUB_MRR_CANCELED <chr> "NA", "NA", "NA", "NA", "NA", "NA", "NA", "…
## Rows: 207,506
## Columns: 3
## $ CUSTOMER_ID <chr> "C00020862", "C00020862", "C00020862", "C00020862", "C0…
## $ SNAPSHOT_MONTH <dttm> 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01, 2024-0…
## $ INTEGRATION <chr> "GoogleCalendar", "FacebookAds", "GoogleAds", "Calendly…
Upon initial inspection of both datasets, the following issues were identified and will be addressed before analysis:
Customer Metadata:
MARKETING_HUB_TIER contains inconsistent naming
conventions for the same tiers: Ent /
Enterprise, STarter / Starter,
Pro / Professional - will normalize to a
single standard.
MARKETING_HUB_MRR,
MARKETING_HUB_MRR_CANCELED, SALES_HUB_MRR, and
SALES_HUB_MRR_CANCELED were imported as character strings
instead of numeric. This is due to "NA" being stored as
literal text rather than a true NA value. Will convert to numeric after
replacing.
SALES_HUB_TIER also contains string
"NA" values where a customer has no Sales Hub product -
will convert to true NA.
SALES_HUB_MRR contains at least one negative value
(-17.55), which may represent a credit or adjustment. Will flag and
monitor but not remove, as it may be intentional.
Customer Integrations:
INTEGRATION contains duplicate naming formats for three
integrations: GoogleAds / Google Ads,
FacebookAds / Facebook Ads,
GoogleCalendar / Google Calendar - will
standardize to spaced format for consistency and display.# ============================================================
# SECTION 2: DATA CLEANING
# ============================================================
## --- CLEANING: CUSTOMER_METADATA ---
df_metadata_clean <- df_metadata %>%
# 1. Normalize Marketing Hub tier naming
mutate(MARKETING_HUB_TIER = case_when(
MARKETING_HUB_TIER %in% c("Ent", "Enterprise") ~ "Enterprise",
MARKETING_HUB_TIER %in% c("STarter", "Starter") ~ "Starter",
MARKETING_HUB_TIER %in% c("Pro", "Professional") ~ "Professional",
TRUE ~ NA_character_
)) %>%
# 2. Normalize Sales Hub tier naming (already cleaner but apply same pattern)
mutate(SALES_HUB_TIER = case_when(
SALES_HUB_TIER == "Enterprise" ~ "Enterprise",
SALES_HUB_TIER == "Starter" ~ "Starter",
SALES_HUB_TIER == "Professional" ~ "Professional",
TRUE ~ NA_character_
)) %>%
# 3. Replace string "NA" with true NA, then convert MRR columns to numeric
mutate(across(
c(MARKETING_HUB_MRR, MARKETING_HUB_MRR_CANCELED,
SALES_HUB_MRR, SALES_HUB_MRR_CANCELED),
~ as.numeric(na_if(., "NA"))
)) %>%
# 4. Ensure SNAPSHOT_MONTH is stored as Date (not datetime)
mutate(SNAPSHOT_MONTH = as.Date(SNAPSHOT_MONTH))
# --- CLEANING: CUSTOMER_INTEGRATIONS ---
df_integrations_clean <- df_integrations %>%
# 5. Standardize integration names to spaced format
mutate(INTEGRATION = case_when(
INTEGRATION == "GoogleAds" ~ "Google Ads",
INTEGRATION == "FacebookAds" ~ "Facebook Ads",
INTEGRATION == "GoogleCalendar" ~ "Google Calendar",
TRUE ~ INTEGRATION
)) %>%
# 6. Ensure SNAPSHOT_MONTH is stored as Date
mutate(SNAPSHOT_MONTH = as.Date(SNAPSHOT_MONTH))
# --- JOIN VALIDATION ---
# Check unique customers in each dataset
meta_customers <- df_metadata_clean %>% distinct(CUSTOMER_ID)
int_customers <- df_integrations_clean %>% distinct(CUSTOMER_ID)
# --- POST-CLEANING VALIDATION ---
# Confirm Normalization --- Marketing Hub Tiers ---
print(table(df_metadata_clean$MARKETING_HUB_TIER, useNA = "always"))##
## Enterprise Professional Starter <NA>
## 7299 23361 29175 2504
# Confirm Normalization --- Sales Hub Tiers ---
print(table(df_metadata_clean$SALES_HUB_TIER, useNA = "always"))##
## Enterprise Professional Starter <NA>
## 4918 6049 6101 45271
## [1] "Calendly" "Canva" "Eventbrite" "Facebook Ads"
## [5] "Gmail" "Google Ads" "Google Calendar" "Instagram"
## [9] "LinkedIn" "Mailchimp" "Typeform" "Zapier"
# Confirm --- MRR Column Types ---
print(sapply(df_metadata_clean[, c(
"MARKETING_HUB_MRR",
"MARKETING_HUB_MRR_CANCELED",
"SALES_HUB_MRR",
"SALES_HUB_MRR_CANCELED")], class))## MARKETING_HUB_MRR MARKETING_HUB_MRR_CANCELED
## "numeric" "numeric"
## SALES_HUB_MRR SALES_HUB_MRR_CANCELED
## "numeric" "numeric"
# Check Count --- Negative MRR Values in Sales Hub ---
df_metadata_clean %>%
filter(!is.na(SALES_HUB_MRR) & SALES_HUB_MRR < 0) %>%
summarise(count = n(), total_negative_mrr = sum(SALES_HUB_MRR)) %>%
print()## # A tibble: 1 × 2
## count total_negative_mrr
## <int> <dbl>
## 1 4 -70.2
# Review Coverage Counts
cat("--- Customer Coverage ---\n",
"Unique customers in metadata:",
nrow(meta_customers), "\n",
"Unique customers in integrations:",
nrow(int_customers), "\n",
"Customers in BOTH datasets:",
nrow(inner_join(meta_customers, int_customers, by = "CUSTOMER_ID")), "\n",
"Customers in metadata ONLY (no integrations):",
nrow(anti_join(meta_customers, int_customers, by = "CUSTOMER_ID")), "\n",
"Customers in integrations ONLY (not in metadata):",
nrow(anti_join(int_customers, meta_customers, by = "CUSTOMER_ID")), "\n")## --- Customer Coverage ---
## Unique customers in metadata: 13651
## Unique customers in integrations: 13368
## Customers in BOTH datasets: 13368
## Customers in metadata ONLY (no integrations): 283
## Customers in integrations ONLY (not in metadata): 0
# Check for duplicate CUSTOMER_ID + SNAPSHOT_MONTH in metadata
# (should be zero — one row per customer per month)
meta_dupes <- df_metadata_clean %>%
group_by(CUSTOMER_ID, SNAPSHOT_MONTH) %>%
filter(n() > 1) %>%
ungroup()
cat("--- Metadata Duplicate Check ---\n",
"Duplicate CUSTOMER_ID + SNAPSHOT_MONTH rows:", nrow(meta_dupes), "\n")## --- Metadata Duplicate Check ---
## Duplicate CUSTOMER_ID + SNAPSHOT_MONTH rows: 0
Marketing Hub Tiers: All three tier values
normalized cleanly into Enterprise,
Professional, and Starter. The 2,504 NA values
are expected and represent customers who hold a Sales Hub subscription
only, with no Marketing Hub product.
Sales Hub Tiers: Normalization applied successfully.
The large NA count (45,271) is expected. The majority of customers in
this dataset are Marketing Hub only, which is confirmed by the ~73% NA
rate in SALES_HUB_TIER. As a result, Sales Hub analysis
will be treated as directional only throughout this workbook, with
Marketing Hub MRR serving as the primary and more statistically reliable
signal.
Integration Names: All 12 integrations are now
cleanly and distinctly named. The three duplicate naming formats
(GoogleAds, FacebookAds,
GoogleCalendar) have been standardized to their spaced
equivalents for consistency and display readability.
MRR Column Types: All four MRR columns
(MARKETING_HUB_MRR,
MARKETING_HUB_MRR_CANCELED, SALES_HUB_MRR,
SALES_HUB_MRR_CANCELED) successfully converted from
character strings to numeric. String "NA" values have been
replaced with true R NA.
Negative MRR Values: Four rows were identified with a combined Sales Hub MRR of -$70.20. Given the scale of the dataset (62,000+ rows), this is negligible and most likely represents a credit or billing adjustment. These rows have been flagged and will be monitored but not removed, as removal could introduce bias into aggregated MRR calculations.
Join Validation: The join between both datasets is clean and behaves as expected:
CUSTOMER_ID + SNAPSHOT_MONTH
combinations - the metadata grain is confirmed as one row per customer
per month, making it safe to join without aggregation concerns# ============================================================
# SECTION 3: INTEGRATION LANDSCAPE
# ============================================================
# --- 3.1 OVERALL ADOPTION RATE ---
# What percentage of total customer base used at least
# one integration in any given month?
adoption_by_month <- df_metadata_clean %>%
left_join(
df_integrations_clean %>%
distinct(CUSTOMER_ID, SNAPSHOT_MONTH) %>%
mutate(has_integration = TRUE),
by = c("CUSTOMER_ID", "SNAPSHOT_MONTH")
) %>%
mutate(has_integration = replace_na(has_integration, FALSE)) %>%
group_by(SNAPSHOT_MONTH) %>%
summarise(
total_customers = n_distinct(CUSTOMER_ID),
with_integration = sum(has_integration),
adoption_rate = with_integration / total_customers
)
# Adoption by month
knitr::kable(adoption_by_month,
digits = 3,
caption = "Monthly Integration Adoption Rate",
col.names = c("Snapshot Month", "Total Customers",
"With Integration", "Adoption Rate"))| Snapshot Month | Total Customers | With Integration | Adoption Rate |
|---|---|---|---|
| 2024-01-01 | 9996 | 9622 | 0.963 |
| 2024-02-01 | 10070 | 9726 | 0.966 |
| 2024-03-01 | 10229 | 9921 | 0.970 |
| 2024-04-01 | 10395 | 10119 | 0.973 |
| 2024-05-01 | 10675 | 10404 | 0.975 |
| 2024-06-01 | 10974 | 10747 | 0.979 |
# Overall adoption rate across full period
overall_adoption <- df_metadata_clean %>%
left_join(
df_integrations_clean %>%
distinct(CUSTOMER_ID) %>%
mutate(has_integration = TRUE),
by = "CUSTOMER_ID"
) %>%
mutate(has_integration = replace_na(has_integration, FALSE)) %>%
summarise(
total_customers = n_distinct(CUSTOMER_ID),
with_integration = sum(has_integration) / 6, # divide by 6 months
adoption_rate = (n_distinct(CUSTOMER_ID[has_integration]) /
n_distinct(CUSTOMER_ID))
)
# Overall adoption
knitr::kable(overall_adoption,
digits = 3,
caption = "Overall Adoption Rate (Full Period)",
col.names = c("Total Customers", "With Integration",
"Adoption Rate"))| Total Customers | With Integration | Adoption Rate |
|---|---|---|
| 13651 | 10221.33 | 0.979 |
# --- 3.2 INTEGRATION POPULARITY & TRENDS ---
# Which integrations are most used, and how has usage
# trended across the 6-month observation period?
# Overall popularity ranking
integration_popularity <- df_integrations_clean %>%
group_by(INTEGRATION) %>%
summarise(
total_usage = n(),
unique_customers = n_distinct(CUSTOMER_ID)
) %>%
arrange(desc(unique_customers))
# --- Integration Popularity Ranking ---
knitr::kable(integration_popularity %>%
select(INTEGRATION, total_usage, unique_customers),
digits = 0,
caption = "Integration Popularity Ranking",
col.names = c("Integration", "Total Usage",
"Unique Customers"))| Integration | Total Usage | Unique Customers |
|---|---|---|
| Mailchimp | 19296 | 5528 |
| Typeform | 23483 | 5416 |
| Canva | 19949 | 4598 |
| 19786 | 4590 | |
| Google Ads | 18855 | 4390 |
| Eventbrite | 17963 | 4185 |
| Facebook Ads | 13915 | 4142 |
| 16593 | 4047 | |
| Gmail | 16957 | 3954 |
| Google Calendar | 16135 | 3729 |
| Calendly | 14314 | 3404 |
| Zapier | 10260 | 2365 |
# Monthly trend by integration
integration_trends <- df_integrations_clean %>%
group_by(SNAPSHOT_MONTH, INTEGRATION) %>%
summarise(
unique_customers = n_distinct(CUSTOMER_ID),
.groups = "drop"
)
# --- Monthly Trends by Integration (first rows) ---
print(head(integration_trends, 20))## # A tibble: 20 × 3
## SNAPSHOT_MONTH INTEGRATION unique_customers
## <date> <chr> <int>
## 1 2024-01-01 Calendly 2180
## 2 2024-01-01 Canva 3173
## 3 2024-01-01 Eventbrite 2803
## 4 2024-01-01 Facebook Ads 1301
## 5 2024-01-01 Gmail 2632
## 6 2024-01-01 Google Ads 2952
## 7 2024-01-01 Google Calendar 2516
## 8 2024-01-01 Instagram 2721
## 9 2024-01-01 LinkedIn 3102
## 10 2024-01-01 Mailchimp 2015
## 11 2024-01-01 Typeform 3708
## 12 2024-01-01 Zapier 1455
## 13 2024-02-01 Calendly 2230
## 14 2024-02-01 Canva 3205
## 15 2024-02-01 Eventbrite 2851
## 16 2024-02-01 Facebook Ads 1638
## 17 2024-02-01 Gmail 2694
## 18 2024-02-01 Google Ads 2993
## 19 2024-02-01 Google Calendar 2560
## 20 2024-02-01 Instagram 2684
# --- 3.3 INTEGRATION COUNT DISTRIBUTION ---
# How many integrations does each customer use per month?
# Bucketed into 0, 1, 2, 3+ for cleaner storytelling.
integration_counts <- df_metadata_clean %>%
left_join(
df_integrations_clean %>%
group_by(CUSTOMER_ID, SNAPSHOT_MONTH) %>%
summarise(int_count = n_distinct(INTEGRATION), .groups = "drop"),
by = c("CUSTOMER_ID", "SNAPSHOT_MONTH")
) %>%
mutate(
int_count = replace_na(int_count, 0),
int_bucket = case_when(
int_count == 0 ~ "0 Integrations",
int_count == 1 ~ "1 Integration",
int_count == 2 ~ "2 Integrations",
int_count >= 3 ~ "3+ Integrations"
),
int_bucket = factor(int_bucket, levels = c(
"0 Integrations", "1 Integration",
"2 Integrations", "3+ Integrations"
))
)
# Distribution summary
int_distribution <- integration_counts %>%
group_by(int_bucket) %>%
summarise(
customer_months = n(),
pct = customer_months / nrow(integration_counts)
)
# Integration distribution
knitr::kable(int_distribution,
digits = 3,
caption = "Integration Count Distribution",
col.names = c("Integration Depth", "Customer-Months",
"% of Total"))| Integration Depth | Customer-Months | % of Total |
|---|---|---|
| 0 Integrations | 1800 | 0.029 |
| 1 Integration | 7165 | 0.115 |
| 2 Integrations | 12078 | 0.194 |
| 3+ Integrations | 41296 | 0.662 |
# note: This is a customer-month table, not distinct customers.
# With 13,651 customers × 6 months = ~62,339
# It tells us the share of customer-months spent at each integration depth.
# A--- Average Integrations Per Customer Per Month ---
integration_counts %>%
summarise(avg_integrations = mean(int_count)) %>%
print()## # A tibble: 1 × 1
## avg_integrations
## <dbl>
## 1 3.33
Adoption Rate: Integration adoption is near-universal, ranging from 96.3% in January to 97.9% by June 2024 and trending upward each month. This suggests integrations are not a niche feature, they are a core behavior across the customer base.
Integration Popularity: Mailchimp and Typeform lead in unique customers, while Zapier ranks last. Notably, Mailchimp’s high unique customer count relative to its total usage rows suggests broad but moderate adoption. Zapier’s low ranking is worth investigating further - as a workflow automation tool, it may represent an underutilized but high-value integration given its operational requirement.
Integration Depth: 66.2% of customers use 3 or more integrations per month, with an average of 3.33 integrations per customer. This signals genuine platform embedding rather than casual or exploratory usage. The zero-integration segment is small (2.9%, 283 customers) but analytically important as a baseline comparison group.
# ============================================================
# SECTION 3: VISUALIZATIONS
# ============================================================
hs_orange <- "#FF7A59"
hs_aqua <- "#00BDA5"
hs_navy <- "#2D3E50"
hs_peach <- "#FFF1EE"
# --- VIZ 1: Monthly Adoption Rate Trend ---
# Using orange line on peach background band for brand colors
ggplot(adoption_by_month, aes(x = SNAPSHOT_MONTH, y = adoption_rate)) +
annotate("rect",
xmin = min(adoption_by_month$SNAPSHOT_MONTH) - 15,
xmax = max(adoption_by_month$SNAPSHOT_MONTH) + 15,
ymin = 0.95, ymax = 1.00,
fill = hs_peach, alpha = 0.5
) +
geom_line(color = hs_orange, linewidth = 1.5) +
geom_point(color = hs_orange, size = 3) +
geom_text(
aes(label = percent(adoption_rate, accuracy = 0.1)),
vjust = -1.2, size = 3.5, fontface = "bold", color = hs_navy
) +
scale_y_continuous(
labels = percent_format(accuracy = 1),
limits = c(0.95, 1.00)
) +
scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
labs(
title = "Integration Adoption Rate is Growing Month Over Month",
subtitle = "% of customers with at least one active integration",
x = NULL,
y = "Adoption Rate"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = hs_navy),
plot.subtitle = element_text(color = "gray50"),
axis.text = element_text(color = hs_navy),
panel.grid.minor = element_blank()
)ggsave("viz1_adoption_trend.png", width = 10, height = 6, dpi = 300)
# --- VIZ 2: Integration Popularity Ranking ---
# Highlight top 3 in orange, remaining in aqua to create
# visual hierarchy without losing brand cohesion
integration_popularity <- integration_popularity %>%
mutate(bar_color = ifelse(
rank(desc(unique_customers)) <= 3, hs_orange, hs_aqua
))
ggplot(integration_popularity,
aes(x = reorder(INTEGRATION, unique_customers),
y = unique_customers,
fill = bar_color)) +
geom_col() +
geom_text(
aes(label = comma(unique_customers)),
hjust = -0.1, size = 3.5, color = hs_navy
) +
scale_fill_identity() +
coord_flip() +
scale_y_continuous(
labels = comma_format(),
limits = c(0, max(integration_popularity$unique_customers) * 1.15)
) +
labs(
title = "Mailchimp and Typeform Lead Integration Adoption",
subtitle = "Unique customers using each integration (Jan–Jun 2024)",
x = NULL,
y = "Unique Customers"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = hs_navy),
plot.subtitle = element_text(color = "gray50"),
axis.text = element_text(color = hs_navy),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_blank()
)ggsave("viz2_integration_popularity.png", width = 10, height = 6, dpi = 300)
# --- VIZ 3: Integration Count Distribution ---
# Gradient effect: deepen color as integration depth increases
# to visually reinforce "more is better"
dist_colors <- c(
"0 Integrations" = "gray80",
"1 Integration" = hs_aqua,
"2 Integrations" = hs_orange,
"3+ Integrations" = hs_navy
)
ggplot(int_distribution,
aes(x = int_bucket, y = pct, fill = int_bucket)) +
geom_col() +
geom_text(
aes(label = percent(pct, accuracy = 0.1)),
vjust = -0.5, size = 4, fontface = "bold", color = hs_navy
) +
scale_fill_manual(values = dist_colors) +
scale_y_continuous(
labels = percent_format(accuracy = 1),
limits = c(0, 0.75)
) +
labs(
title = "Most Customers Use 3 or More Integrations",
subtitle = "Distribution of integration depth per customer-month",
x = "Integration Depth",
y = "% of Customer-Months"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = hs_navy),
plot.subtitle = element_text(color = "gray50"),
axis.text = element_text(color = hs_navy),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
legend.position = "none"
)# ============================================================
# SECTION 4: DO INTEGRATIONS MATTER? (CORE ANALYSIS)
# ============================================================
# --- SETUP: Build the master analysis table ---
# Join metadata with integration depth buckets created in
# Section 3, so every customer-month has an integration
# bucket label alongside their MRR and cancellation data
master <- df_metadata_clean %>%
left_join(
integration_counts %>%
select(CUSTOMER_ID, SNAPSHOT_MONTH, int_count, int_bucket),
by = c("CUSTOMER_ID", "SNAPSHOT_MONTH")
)
# Confirm Join --- Master Table Dimensions --
glimpse(master)## Rows: 62,339
## Columns: 10
## $ CUSTOMER_ID <chr> "C00020862", "C00020863", "C00020864", "C00…
## $ SNAPSHOT_MONTH <date> 2024-01-01, 2024-01-01, 2024-01-01, 2024-0…
## $ MARKETING_HUB_TIER <chr> "Enterprise", "Starter", "Starter", "Profes…
## $ MARKETING_HUB_MRR <dbl> 674.14, 14.16, 9.49, 346.97, 295.94, 19.18,…
## $ MARKETING_HUB_MRR_CANCELED <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SALES_HUB_TIER <chr> "Enterprise", NA, NA, NA, NA, NA, NA, NA, N…
## $ SALES_HUB_MRR <dbl> -17.55, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SALES_HUB_MRR_CANCELED <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ int_count <int> 6, 2, 3, 2, 4, 3, 2, 3, 0, 1, 3, 1, 3, 4, 0…
## $ int_bucket <fct> 3+ Integrations, 2 Integrations, 3+ Integra…
# --- 4.1 CANCELLATION RATE BY INTEGRATION DEPTH ---
# Cancellation rate = canceled MRR / total MRR
# Using Marketing Hub as primary lens (dominant product)
# A customer-month is a "cancellation event" if
# MARKETING_HUB_MRR_CANCELED is not NA (i.e. has a value)
cancellation_by_bucket <- master %>%
group_by(int_bucket) %>%
summarise(
total_customer_months = n(),
cancellation_events = sum(!is.na(MARKETING_HUB_MRR_CANCELED)),
total_mrr = sum(MARKETING_HUB_MRR, na.rm = TRUE),
canceled_mrr = sum(MARKETING_HUB_MRR_CANCELED, na.rm = TRUE),
cancellation_rate = cancellation_events / total_customer_months,
mrr_cancellation_rate = canceled_mrr / total_mrr
) %>%
arrange(int_bucket)
# --- Cancellation Rate by Integration Depth ---
knitr::kable(cancellation_by_bucket,
digits = 4,
caption = "Cancellation Rate by Integration Depth")| int_bucket | total_customer_months | cancellation_events | total_mrr | canceled_mrr | cancellation_rate | mrr_cancellation_rate |
|---|---|---|---|---|---|---|
| 0 Integrations | 1800 | 171 | 157247.8 | 13195.61 | 0.0950 | 0.0839 |
| 1 Integration | 7165 | 519 | 814699.0 | 50777.29 | 0.0724 | 0.0623 |
| 2 Integrations | 12078 | 959 | 2023784.4 | 151109.68 | 0.0794 | 0.0747 |
| 3+ Integrations | 41296 | 2325 | 12663783.2 | 580640.17 | 0.0563 | 0.0459 |
# --- 4.2 MRR RETENTION BY INTEGRATION DEPTH ---
# Customer Dollar Retention (C$R) as defined in brief:
# (1 - (lost value / starting value)) ^ 12
# First calculate monthly retention, then annualize
mrr_retention_by_bucket <- master %>%
group_by(int_bucket, SNAPSHOT_MONTH) %>%
summarise(
starting_mrr = sum(MARKETING_HUB_MRR, na.rm = TRUE),
canceled_mrr = sum(MARKETING_HUB_MRR_CANCELED, na.rm = TRUE),
monthly_retention = 1 - (canceled_mrr / starting_mrr),
.groups = "drop"
) %>%
group_by(int_bucket) %>%
summarise(
avg_monthly_retention = mean(monthly_retention, na.rm = TRUE),
annualized_retention = avg_monthly_retention ^ 12
) %>%
arrange(int_bucket)
# MRR retention by bucket
knitr::kable(mrr_retention_by_bucket,
digits = 4,
caption = "MRR Retention by Integration Depth",
col.names = c("Integration Depth", "Avg Monthly Retention",
"Annualized Retention (C$R)"))| Integration Depth | Avg Monthly Retention | Annualized Retention (C$R) |
|---|---|---|
| 0 Integrations | 0.9170 | 0.3536 |
| 1 Integration | 0.9378 | 0.4629 |
| 2 Integrations | 0.9254 | 0.3942 |
| 3+ Integrations | 0.9541 | 0.5688 |
# --- 4.3 AVERAGE MRR BY INTEGRATION DEPTH ---
# Are higher-integration customers also higher value?
# Important context for VP of CS
mrr_by_bucket <- master %>%
group_by(int_bucket) %>%
summarise(
avg_mrr = mean(MARKETING_HUB_MRR, na.rm = TRUE),
median_mrr = median(MARKETING_HUB_MRR, na.rm = TRUE),
total_mrr = sum(MARKETING_HUB_MRR, na.rm = TRUE)
) %>%
arrange(int_bucket)
# MRR by bucket
knitr::kable(mrr_by_bucket,
digits = 2,
caption = "Average MRR by Integration Depth",
col.names = c("Integration Depth", "Avg MRR",
"Median MRR", "Total MRR"))| Integration Depth | Avg MRR | Median MRR | Total MRR |
|---|---|---|---|
| 0 Integrations | 91.37 | 15.66 | 157247.8 |
| 1 Integration | 119.04 | 17.92 | 814699.0 |
| 2 Integrations | 174.95 | 25.62 | 2023784.4 |
| 3+ Integrations | 318.97 | 295.57 | 12663783.2 |
Cancellation Rate by Integration Depth: A clear pattern emerges: customers with zero integrations cancel at 9.5% monthly, declining to 5.6% for customers with 3+ integrations. That represents a ~41% reduction in cancellation rate at peak integration depth. One nuance worth noting: the 2-integration bucket (7.9%) slightly underperforms the 1-integration bucket (7.2%), which reappears in later analysis and appears to be a consistent pattern rather than noise.
Annualized Retention: The gap widens significantly when annualized. Customers with zero integrations retain at 35.4% annually versus 56.9% for customers with 3+ integrations (a 21.5 percentage point difference) that represents a meaningful business impact at scale.
Average MRR: Deeply integrated customers are not only more likely to stay, they are worth significantly more. Average MRR climbs from $91 for zero-integration customers to $319 for those with 3+ integrations, a 3.5x difference. This combination of higher value and lower churn makes the 3+ integration segment the most strategically important cohort in the dataset.
# ============================================================
# SECTION 4: VISUALIZATIONS
# ============================================================
# --- VIZ 4: Cancellation Rate by Integration Depth ---
# Highlight the 3+ bucket in orange, that's the headline
cancellation_colors <- c(
"0 Integrations" = "gray80",
"1 Integration" = hs_aqua,
"2 Integrations" = hs_aqua,
"3+ Integrations" = hs_orange
)
ggplot(cancellation_by_bucket,
aes(x = int_bucket, y = cancellation_rate, fill = int_bucket)) +
geom_col() +
geom_text(
aes(label = percent(cancellation_rate, accuracy = 0.1)),
vjust = -0.5, size = 4, fontface = "bold", color = hs_navy
) +
scale_fill_manual(values = cancellation_colors) +
scale_y_continuous(
labels = percent_format(accuracy = 1),
limits = c(0, 0.12)
) +
labs(
title = "Customers with 3+ Integrations Cancel at Nearly Half the Rate",
subtitle = "Monthly cancellation rate by integration depth | Marketing Hub",
x = "Integration Depth",
y = "Cancellation Rate"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = hs_navy),
plot.subtitle = element_text(color = "gray50"),
axis.text = element_text(color = hs_navy),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
legend.position = "none"
)ggsave("viz4_cancellation_by_depth.png", width = 10, height = 6, dpi = 300)
# --- VIZ 5: Annualized MRR Retention by Integration Depth ---
retention_colors <- c(
"0 Integrations" = "gray80",
"1 Integration" = hs_aqua,
"2 Integrations" = hs_aqua,
"3+ Integrations" = hs_orange
)
ggplot(mrr_retention_by_bucket,
aes(x = int_bucket, y = annualized_retention, fill = int_bucket)) +
geom_col() +
geom_text(
aes(label = percent(annualized_retention, accuracy = 0.1)),
vjust = -0.5, size = 4, fontface = "bold", color = hs_navy
) +
scale_fill_manual(values = retention_colors) +
scale_y_continuous(
labels = percent_format(accuracy = 1),
limits = c(0, 0.70)
) +
labs(
title = "3+ Integrations Drives 57% Annualized Revenue Retention",
subtitle = "Annualized Customer Dollar Retention (C$R) by integration depth | Marketing Hub",
x = "Integration Depth",
y = "Annualized C$R"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = hs_navy),
plot.subtitle = element_text(color = "gray50"),
axis.text = element_text(color = hs_navy),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
legend.position = "none"
)ggsave("viz5_mrr_retention_by_depth.png", width = 10, height = 6, dpi = 300)
# --- VIZ 6: Average MRR by Integration Depth ---
# more integrations = higher value customer
mrr_colors <- c(
"0 Integrations" = "gray80",
"1 Integration" = hs_aqua,
"2 Integrations" = hs_aqua,
"3+ Integrations" = hs_orange
)
ggplot(mrr_by_bucket,
aes(x = int_bucket, y = avg_mrr, fill = int_bucket)) +
geom_col() +
geom_text(
aes(label = dollar(avg_mrr, accuracy = 1)),
vjust = -0.5, size = 4, fontface = "bold", color = hs_navy
) +
scale_fill_manual(values = mrr_colors) +
scale_y_continuous(
labels = dollar_format(accuracy = 1),
limits = c(0, 380)
) +
labs(
title = "Deeply Integrated Customers Are Worth 3.5x More",
subtitle = "Average monthly Marketing Hub MRR by integration depth",
x = "Integration Depth",
y = "Average MRR"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = hs_navy),
plot.subtitle = element_text(color = "gray50"),
axis.text = element_text(color = hs_navy),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
legend.position = "none"
)# ============================================================
# SECTION 5: DEEPER CUTS
# ============================================================
# --- 5.1 TIER x INTEGRATION DEPTH ---
# Are Enterprise customers more deeply integrated?
# This helps CS understand whether integration depth is
# driven by customer sophistication or is an opportunity
# to develop across all tiers
tier_integration <- master %>%
filter(!is.na(MARKETING_HUB_TIER)) %>%
group_by(MARKETING_HUB_TIER, int_bucket) %>%
summarise(
customer_months = n(),
.groups = "drop"
) %>%
group_by(MARKETING_HUB_TIER) %>%
mutate(
pct_within_tier = customer_months / sum(customer_months)
) %>%
ungroup() %>%
mutate(MARKETING_HUB_TIER = factor(
MARKETING_HUB_TIER,
levels = c("Starter", "Professional", "Enterprise")
))
# Tier x integration depth
knitr::kable(tier_integration,
digits = 3,
caption = "Integration Depth by Marketing Hub Tier",
col.names = c("Tier", "Integration Depth",
"Customer-Months", "% Within Tier"))| Tier | Integration Depth | Customer-Months | % Within Tier |
|---|---|---|---|
| Enterprise | 0 Integrations | 38 | 0.005 |
| Enterprise | 1 Integration | 179 | 0.025 |
| Enterprise | 2 Integrations | 625 | 0.086 |
| Enterprise | 3+ Integrations | 6457 | 0.885 |
| Professional | 0 Integrations | 297 | 0.013 |
| Professional | 1 Integration | 1635 | 0.070 |
| Professional | 2 Integrations | 3798 | 0.163 |
| Professional | 3+ Integrations | 17631 | 0.755 |
| Starter | 0 Integrations | 1386 | 0.048 |
| Starter | 1 Integration | 5030 | 0.172 |
| Starter | 2 Integrations | 7145 | 0.245 |
| Starter | 3+ Integrations | 15614 | 0.535 |
# Average integration count by tier
avg_int_by_tier <- master %>%
filter(!is.na(MARKETING_HUB_TIER)) %>%
group_by(MARKETING_HUB_TIER) %>%
summarise(
avg_integrations = mean(int_count, na.rm = TRUE),
median_integrations = median(int_count, na.rm = TRUE)
) %>%
mutate(MARKETING_HUB_TIER = factor(
MARKETING_HUB_TIER,
levels = c("Starter", "Professional", "Enterprise")
)) %>%
arrange(MARKETING_HUB_TIER)
# Average integration count by tier
knitr::kable(avg_int_by_tier,
digits = 2,
caption = "Average Integration Count by Tier",
col.names = c("Tier", "Avg Integrations",
"Median Integrations"))| Tier | Avg Integrations | Median Integrations |
|---|---|---|
| Starter | 2.75 | 3 |
| Professional | 3.68 | 4 |
| Enterprise | 4.61 | 5 |
# --- 5.2 INTEGRATION x RETENTION ---
# Which specific integrations correlate with
# the best MRR retention?
# Approach: for each integration, compare the cancellation
# rate of customers who used it vs. those who did not
# Build a customer-month level flag for each integration
integration_retention <- df_integrations_clean %>%
distinct(CUSTOMER_ID, SNAPSHOT_MONTH, INTEGRATION) %>%
left_join(
master %>%
select(CUSTOMER_ID, SNAPSHOT_MONTH,
MARKETING_HUB_MRR, MARKETING_HUB_MRR_CANCELED),
by = c("CUSTOMER_ID", "SNAPSHOT_MONTH")
) %>%
group_by(INTEGRATION) %>%
summarise(
customer_months = n(),
cancellation_events = sum(!is.na(MARKETING_HUB_MRR_CANCELED)),
total_mrr = sum(MARKETING_HUB_MRR, na.rm = TRUE),
canceled_mrr = sum(MARKETING_HUB_MRR_CANCELED, na.rm = TRUE),
cancellation_rate = cancellation_events / customer_months,
mrr_cancellation_rate = canceled_mrr / total_mrr
) %>%
arrange(cancellation_rate)
# Integration x retention
knitr::kable(integration_retention,
digits = 4,
caption = "Cancellation Rate by Integration",
col.names = c("Integration", "Customer-Months",
"Cancellation Events", "Total MRR",
"Canceled MRR", "Cancellation Rate",
"MRR Cancellation Rate"))| Integration | Customer-Months | Cancellation Events | Total MRR | Canceled MRR | Cancellation Rate | MRR Cancellation Rate |
|---|---|---|---|---|---|---|
| Zapier | 10260 | 378 | 3639402 | 95822.68 | 0.0368 | 0.0263 |
| Facebook Ads | 13915 | 724 | 4527552 | 183194.00 | 0.0520 | 0.0405 |
| Canva | 19949 | 1066 | 6044117 | 238670.10 | 0.0534 | 0.0395 |
| Google Calendar | 16135 | 864 | 5014105 | 212112.69 | 0.0535 | 0.0423 |
| Mailchimp | 19296 | 1054 | 5792311 | 249575.21 | 0.0546 | 0.0431 |
| Gmail | 16957 | 939 | 5174913 | 229987.97 | 0.0554 | 0.0444 |
| Calendly | 14314 | 797 | 4601354 | 218160.08 | 0.0557 | 0.0474 |
| Google Ads | 18855 | 1059 | 5491260 | 244888.18 | 0.0562 | 0.0446 |
| 19786 | 1131 | 6090605 | 263425.37 | 0.0572 | 0.0433 | |
| Eventbrite | 17963 | 1030 | 5529064 | 250285.27 | 0.0573 | 0.0453 |
| Typeform | 23483 | 1354 | 6973749 | 315886.64 | 0.0577 | 0.0453 |
| 16593 | 1274 | 5116288 | 312826.41 | 0.0768 | 0.0611 |
# --- 5.3 SALES HUB LENS ---
# Does the integration-retention relationship hold
# for Sales Hub customers?
# Note: smaller sample
sales_hub_retention <- master %>%
filter(!is.na(SALES_HUB_TIER)) %>%
group_by(int_bucket) %>%
summarise(
customer_months = n(),
cancellation_events = sum(!is.na(SALES_HUB_MRR_CANCELED)),
total_mrr = sum(SALES_HUB_MRR, na.rm = TRUE),
canceled_mrr = sum(SALES_HUB_MRR_CANCELED, na.rm = TRUE),
cancellation_rate = cancellation_events / customer_months,
avg_mrr = mean(SALES_HUB_MRR, na.rm = TRUE)
) %>%
arrange(int_bucket)
# Sales Hub retention
knitr::kable(sales_hub_retention,
digits = 4,
caption = "Sales Hub Retention by Integration Depth",
col.names = c("Integration Depth", "Customer-Months",
"Cancellation Events", "Total MRR",
"Canceled MRR", "Cancellation Rate",
"Avg MRR"))| Integration Depth | Customer-Months | Cancellation Events | Total MRR | Canceled MRR | Cancellation Rate | Avg MRR |
|---|---|---|---|---|---|---|
| 0 Integrations | 432 | 36 | 142427.5 | 12343.77 | 0.0833 | 329.6933 |
| 1 Integration | 2009 | 131 | 953531.4 | 42480.21 | 0.0652 | 474.6299 |
| 2 Integrations | 3164 | 235 | 1537452.8 | 95858.05 | 0.0743 | 485.9206 |
| 3+ Integrations | 11463 | 652 | 5756460.9 | 286567.28 | 0.0569 | 502.1775 |
Tier × Integration Depth: Integration depth scales predictably with tier: 88.5% of Enterprise customers use 3+ integrations compared to 53.5% of Starter customers. However, the more interesting finding is that even at the Starter tier, the majority of customers are deeply integrated. This suggests integrations are a universal retention lever accessible across all tiers, not exclusive to sophisticated high-value accounts.
Integration × Retention: Zapier stands out significantly, with a 3.7% monthly cancellation rate, roughly half the rate of most other integrations. It’s workflow automation nature likely creates operational dependency on HubSpot, resulting in deeper attachment than a data-connection integration. Instagram sits at the opposite end at 7.7%, though this likely reflects a customer profile effect. Smaller, more transactional businesses - rather than a causal relationship between Instagram usage and churn.
Sales Hub Lens: The integration-retention relationship holds directionally in Sales Hub, with cancellation declining from 8.3% at zero integrations to 5.7% at 3+. The 1→2 integration bump observed in Marketing Hub reappears here, suggesting it is a genuine pattern worth further investigation. Given the smaller Sales Hub sample (~27% of customers), these findings should be treated as directional only.
# ============================================================
# SECTION 5: VISUALIZATIONS
# ============================================================
# --- VIZ 7: Integration Depth by Tier (Stacked Bar) ---
# Shows how integration behavior scales with tier
# Stacked to show full composition within each tier
ggplot(tier_integration,
aes(x = MARKETING_HUB_TIER,
y = pct_within_tier,
fill = int_bucket)) +
geom_col(position = "stack") +
geom_text(
aes(label = ifelse(pct_within_tier > 0.05,
percent(pct_within_tier, accuracy = 0.1), ""),
color = ifelse(int_bucket %in% c("0 Integrations", "1 Integration"),
hs_navy, "white")),
position = position_stack(vjust = 0.5),
size = 3.5, fontface = "bold"
) +
scale_color_identity() +
scale_fill_manual(values = c(
"0 Integrations" = "gray80",
"1 Integration" = hs_aqua,
"2 Integrations" = hs_orange,
"3+ Integrations" = hs_navy
)) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Enterprise Leads Integration Depth, But 3+ Is Common Across All Tiers",
subtitle = "% of customer-months by integration bucket within each Marketing Hub tier",
x = "Marketing Hub Tier",
y = "% of Customer-Months",
fill = "Integration Depth"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = hs_navy),
plot.subtitle = element_text(color = "gray50"),
axis.text = element_text(color = hs_navy),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
legend.position = "bottom"
)ggsave("viz7_tier_x_depth.png", width = 10, height = 6, dpi = 300)
# --- VIZ 8: Cancellation Rate by Integration ---
# Zapier highlighted in orange as the standout
# Instagram highlighted in navy as the outlier to flag
integration_bar_colors <- integration_retention %>%
mutate(bar_color = case_when(
INTEGRATION == "Zapier" ~ hs_orange,
INTEGRATION == "Instagram" ~ hs_navy,
TRUE ~ hs_aqua
)) %>%
pull(bar_color)
ggplot(integration_retention,
aes(x = reorder(INTEGRATION, cancellation_rate),
y = cancellation_rate)) +
geom_col(fill = integration_bar_colors) +
geom_text(
aes(label = percent(cancellation_rate, accuracy = 0.1)),
hjust = -0.1, size = 3.5, fontface = "bold", color = hs_navy
) +
coord_flip() +
scale_y_continuous(
labels = percent_format(accuracy = 1),
limits = c(0, 0.10)
) +
labs(
title = "Zapier Users Cancel at Half the Rate & Instagram Warrants Attention",
subtitle = "Monthly cancellation rate by integration | Marketing Hub customers",
x = NULL,
y = "Cancellation Rate"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = hs_navy),
plot.subtitle = element_text(color = "gray50"),
axis.text = element_text(color = hs_navy),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_blank(),
legend.position = "none"
)ggsave("viz8_cancellation_by_integration.png", width = 10, height = 6, dpi = 300)
# --- VIZ 9: Sales Hub Retention by Integration Depth ---
# Directional comparison — mirrors Marketing Hub finding
sales_colors <- c(
"0 Integrations" = "gray80",
"1 Integration" = hs_aqua,
"2 Integrations" = hs_aqua,
"3+ Integrations" = hs_orange
)
ggplot(sales_hub_retention,
aes(x = int_bucket,
y = cancellation_rate,
fill = int_bucket)) +
geom_col() +
geom_text(
aes(label = percent(cancellation_rate, accuracy = 0.1)),
vjust = -0.5, size = 4, fontface = "bold", color = hs_navy
) +
scale_fill_manual(values = sales_colors) +
scale_y_continuous(
labels = percent_format(accuracy = 1),
limits = c(0, 0.11)
) +
labs(
title = "Integration Depth Drives Lower Cancellation in Sales Hub Too",
subtitle = "Monthly cancellation rate by integration depth | Sales Hub (smaller sample)",
x = "Integration Depth",
y = "Cancellation Rate"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", color = hs_navy),
plot.subtitle = element_text(color = "gray50"),
axis.text = element_text(color = hs_navy),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
legend.position = "none"
)Finding 1: Integration adoption is near-universal and growing
Finding 2: Integration depth is strongly associated with lower cancellation rates
Finding 3: Integration depth tracks directly with customer value
Finding 4: Zapier is the highest-retention integration; Instagram warrants monitoring
Finding 5: Integration depth scales with tier but is broadly adopted across all tiers
1. Correlation, Not Causation
Integration depth correlates with better retention but does not prove causation. It is possible that customers who intend to stay longer are simply more likely to invest in integrations - rather than integrations themselves driving retention. A controlled experiment or longer longitudinal study would be needed to establish causality.
2. Short Observation Window
Six months of data limits our ability to assess true retention patterns. Churn events often lag integration behavior by months or quarters. A 12–24 month window would produce more reliable C$R calculations.
3. Integration Timing Unknown
We do not know when a customer first adopted an integration relative to their subscription start date or any churn signals. Early integration adoption may be a stronger predictor of retention than late-stage adoption. Time-to-first-integration analysis would require historical onboarding data.
4. Sales Hub Sample Size
Only ~27% of customers have Sales Hub data, making Sales Hub findings directional only. Conclusions should not be generalized without a larger sample.
5. Instagram Interpretation
Instagram’s elevated cancellation rate likely reflects customer profile differences rather than a causal relationship. Smaller, more transactional businesses may gravitate toward social integrations while also having higher natural churn rates. Controlling for tier and company size would clarify this finding.
1. Identify and Prioritize Zero-Integration Customers
The 283 customers with no integrations represent an immediate CS intervention opportunity. These customers cancel at 9.5% monthly and generate the lowest average MRR. A targeted outreach campaign to drive even one integration adoption could meaningfully improve retention for this segment.
2. Build a Zapier Promotion Strategy
Given Zapier’s outsized retention correlation (3.7% cancellation rate), CS and Product should evaluate whether active promotion of Zapier during onboarding or QBRs could accelerate retention improvement - particularly for Professional and Starter tier customers who use it least.
3. Monitor the Instagram Cohort
Flag Instagram-only customers for proactive CS outreach. Investigate whether this segment skews toward Starter tier and shorter tenure - if so, targeted onboarding support could reduce early churn.
4. Time-to-First-Integration Analysis
With onboarding timestamp data, we could measure whether customers who adopt integrations earlier in their lifecycle retain better than late adopters. This would inform whether integration adoption should be a formal onboarding milestone.
5. Integration Sequence Modeling
Do customers who adopt integrations in a specific order (e.g., Gmail → Google Calendar → Zapier) show better retention than those who adopt them in other sequences? Sequence analysis could inform a prescriptive integration journey for CS playbooks.
6. Controlled Retention Experiment
To move from correlation to causation, design an A/B test where one customer cohort receives active integration adoption support during onboarding and another follows the standard path. Measure the 12-month retention delta between cohorts.
7. Expand to Full Product Suite
Current data covers Marketing Hub and Sales Hub only. Extending this analysis to Service Hub, CMS Hub, and Operations Hub would give a complete picture of integration impact across the HubSpot ecosystem.