Summary_of_churn_analysis_and_exploration_in

1 Preparing the data

for preparing and importing dataset or the csv

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.5.2

## Warning: package 'ggplot2' was built under R version 4.5.2

## Warning: package 'tibble' was built under R version 4.5.2

## Warning: package 'tidyr' was built under R version 4.5.2

## Warning: package 'readr' was built under R version 4.5.2

## Warning: package 'purrr' was built under R version 4.5.2

## Warning: package 'dplyr' was built under R version 4.5.2

## Warning: package 'stringr' was built under R version 4.5.2

## Warning: package 'forcats' was built under R version 4.5.2

## Warning: package 'lubridate' was built under R version 4.5.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readr)
customer_subscription_churn_usage_patterns <- read_csv("customer_subscription_churn_usage_patterns.csv", 
                                                       col_types = cols(signup_date = col_datetime(format = "%Y-%m-%d")))

knitr::kable(head(customer_subscription_churn_usage_patterns,10))

user_id	signup_date	plan_type	monthly_fee	avg_weekly_usage_hours	support_tickets	payment_failures	tenure_months	last_login_days_ago	churn
1	2023-04-15	Premium	699	1.1	4	1	8	14	Yes
2	2023-08-27	Premium	699	2.6	6	0	35	1	Yes
3	2023-10-12	Premium	699	14.3	8	3	2	14	Yes
4	2023-12-11	Basic	199	17.6	5	2	11	9	Yes
5	2023-02-14	Basic	199	9.8	5	2	6	38	Yes
6	2024-10-05	Premium	699	13.6	6	0	30	35	Yes
7	2023-10-24	Premium	699	14.6	1	0	24	42	Yes
8	2024-01-09	Basic	199	21.7	6	2	15	29	Yes
9	2023-03-15	Basic	199	9.2	4	5	24	59	Yes
10	2024-10-17	Premium	699	13.6	3	1	11	29	No

2 first look at data

glimpse(customer_subscription_churn_usage_patterns)

## Rows: 2,800
## Columns: 10
## $ user_id                <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
## $ signup_date            <dttm> 2023-04-15, 2023-08-27, 2023-10-12, 2023-12-11…
## $ plan_type              <chr> "Premium", "Premium", "Premium", "Basic", "Basi…
## $ monthly_fee            <dbl> 699, 699, 699, 199, 199, 699, 699, 199, 199, 69…
## $ avg_weekly_usage_hours <dbl> 1.1, 2.6, 14.3, 17.6, 9.8, 13.6, 14.6, 21.7, 9.…
## $ support_tickets        <dbl> 4, 6, 8, 5, 5, 6, 1, 6, 4, 3, 5, 5, 3, 2, 6, 8,…
## $ payment_failures       <dbl> 1, 0, 3, 2, 2, 0, 0, 2, 5, 1, 0, 1, 5, 4, 4, 0,…
## $ tenure_months          <dbl> 8, 35, 2, 11, 6, 30, 24, 15, 24, 11, 36, 18, 10…
## $ last_login_days_ago    <dbl> 14, 1, 14, 9, 38, 35, 42, 29, 59, 29, 14, 41, 3…
## $ churn                  <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…

summary(customer_subscription_churn_usage_patterns)

##     user_id        signup_date                   plan_type        
##  Min.   :   1.0   Min.   :2023-01-01 00:00:00   Length:2800       
##  1st Qu.: 700.8   1st Qu.:2023-07-09 18:00:00   Class :character  
##  Median :1400.5   Median :2024-01-14 00:00:00   Mode  :character  
##  Mean   :1400.5   Mean   :2024-01-09 16:06:20                     
##  3rd Qu.:2100.2   3rd Qu.:2024-07-12 06:00:00                     
##  Max.   :2800.0   Max.   :2024-12-31 00:00:00                     
##   monthly_fee    avg_weekly_usage_hours support_tickets payment_failures
##  Min.   :199.0   Min.   : 0.50          Min.   :0.000   Min.   :0.000   
##  1st Qu.:199.0   1st Qu.: 6.70          1st Qu.:2.000   1st Qu.:1.000   
##  Median :399.0   Median :12.80          Median :4.000   Median :2.000   
##  Mean   :434.2   Mean   :12.89          Mean   :3.888   Mean   :2.492   
##  3rd Qu.:699.0   3rd Qu.:19.20          3rd Qu.:6.000   3rd Qu.:4.000   
##  Max.   :699.0   Max.   :25.00          Max.   :8.000   Max.   :5.000   
##  tenure_months   last_login_days_ago    churn          
##  Min.   : 1.00   Min.   : 0          Length:2800       
##  1st Qu.:10.00   1st Qu.:14          Class :character  
##  Median :18.00   Median :30          Mode  :character  
##  Mean   :18.61   Mean   :30                            
##  3rd Qu.:27.00   3rd Qu.:46                            
##  Max.   :36.00   Max.   :60

the data haves 2800 rows and summary function summarize each columns to see medians max and statistical summary for example the max support tickets is 8.

After looking at the table and understand columns and what they mean its time to check for the credibility of data.

3 cleaning

3.1 checking for nulls

colSums(is.na(customer_subscription_churn_usage_patterns))

##                user_id            signup_date              plan_type 
##                      0                      0                      0 
##            monthly_fee avg_weekly_usage_hours        support_tickets 
##                      0                      0                      0 
##       payment_failures          tenure_months    last_login_days_ago 
##                      0                      0                      0 
##                  churn 
##                      0

customer_subscription_churn_usage_patterns %>% summarise(across(everything(), ~sum(is.na(.))))

there is no nulls in the dataset

3.2 checking for dupes

library(janitor)

## Warning: package 'janitor' was built under R version 4.5.2

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

get_dupes(customer_subscription_churn_usage_patterns)

## No variable names specified - using all columns.

## No duplicate combinations found of: user_id, signup_date, plan_type, monthly_fee, avg_weekly_usage_hours, support_tickets, payment_failures, tenure_months, last_login_days_ago, churn

means there are no dupes in the dataset

dataset is clean.

4 exploring data

Before diving into detailed analysis, we performed a preliminary exploration to understand the dataset and the general behavior of customers. This step helps frame the business context and highlights initial patterns worth investigating further.

churn_prc <- customer_subscription_churn_usage_patterns %>% group_by(churn) %>% summarise(total=n(),prc=100*(total/2800))
knitr::kable(head(churn_prc))

churn	total	prc
No	1195	42.67857
Yes	1605	57.32143

Examined the proportion of churned vs retained customers, identifying that 57% of users had churned.

plan_prc <- customer_subscription_churn_usage_patterns %>% group_by(plan_type) %>% summarise(total=n(),prc=100*(total/2800))
knitr::kable(head(plan_prc))

plan_type	total	prc
Basic	923	32.96429
Premium	944	33.71429
Standard	933	33.32143

Reviewed subscription plans (Basic, Standard, Premium) to ensure balanced representation and identify potential plan-specific trends. And as we see the distr of clients in plans are same.

plan_churn_prc <- customer_subscription_churn_usage_patterns %>% group_by(churn, plan_type) %>% summarise(total = n(), .groups = "drop") %>%  group_by(churn) %>%  mutate(prc = 100 * total / sum(total)) %>%  ungroup()
knitr::kable(head(plan_churn_prc))

churn	plan_type	total	prc
No	Basic	389	32.55230
No	Premium	396	33.13808
No	Standard	410	34.30962
Yes	Basic	534	33.27103
Yes	Premium	548	34.14330
Yes	Standard	523	32.58567

Checked how churn was distributed within each plan to get an early sense of whether subscription type influenced attrition.

5 aggregating data

5.1 support tickets and payment failures

5.1.1 general look

to look more into other variables, i did need to see average support tickets and payment failures by churn state and see total numbers and comapare if it respects the ration of 57% and 43%.

tickets_churn_pattern <- customer_subscription_churn_usage_patterns %>% group_by(churn) %>% summarise(tickets_num=sum(support_tickets),avg_tickets=mean(support_tickets),payments_fail_num=sum(payment_failures),avg_pay_fail=mean(payment_failures))
knitr::kable(head(tickets_churn_pattern))

churn	tickets_num	avg_tickets	payments_fail_num	avg_pay_fail
No	4106	3.435983	2477	2.072803
Yes	6780	4.224299	4500	2.803738

difference between avg support tickets is 4.22 for yes and 3.44 for no, meaning that clients who leaves haves more support tickets. for payment failures also 2.8 for leaving and 2 for staying meaning they tend to have more payment failures. and for total of these variables they show higher sum more than the ratio meaning it not because of churn yes clients are more than the others.

5.1.2 tickets and last login

tickets_range <- customer_subscription_churn_usage_patterns %>% group_by(churn,support_tickets) %>% summarise(total=n(),avg_last_login=mean(last_login_days_ago))

## `summarise()` has grouped output by 'churn'. You can override using the
## `.groups` argument.

knitr::kable(head(tickets_range))

churn	support_tickets	total	avg_last_login
No	0	187	29.35294
No	1	165	25.69697
No	2	152	27.27632
No	3	151	28.39073
No	4	125	24.36000
No	5	94	26.01064

This aggregation summarizes customer behavior by churn status and number of support tickets. This helps identify patterns between support interactions and churn, highlighting whether higher support ticket volumes correlate with increased attrition.

5.1.3 payment failure

PF_range <- customer_subscription_churn_usage_patterns %>% group_by(churn,payment_failures) %>% summarise(total=n(),avg_tickets=mean(support_tickets))

## `summarise()` has grouped output by 'churn'. You can override using the
## `.groups` argument.

knitr::kable(head(PF_range))

churn	payment_failures	total	avg_tickets
No	0	267	3.674157
No	1	297	3.673401
No	2	171	3.397661
No	3	155	3.296774
No	4	152	2.842105
No	5	153	3.333333

Purpose: This aggregation analyzes the relationship between payment failures (PF) and churn:

Total customers are counted for each combination of churn status (Yes/No) and number of payment failures.

Average support tickets per group are calculated to see how support interactions vary with payment failures.

This helps identify whether customers with more payment failures are more likely to churn and whether higher PFs coincide with increased support needs.

5.1.4 churn by month

churn_rate_for_MONTHS <- customer_subscription_churn_usage_patterns %>% mutate(month=make_date(year(signup_date),month(signup_date),1)) %>% group_by(month) %>% summarise(total_cust=n(),churn_count=sum(churn=="Yes"),churn_rate=mean(churn=="Yes"),.groups = "drop")
knitr::kable(head(churn_rate_for_MONTHS))

month	total_cust	churn_count	churn_rate
2023-01-01	88	54	0.6136364
2023-02-01	112	70	0.6250000
2023-03-01	99	51	0.5151515
2023-04-01	118	62	0.5254237
2023-05-01	125	66	0.5280000
2023-06-01	121	64	0.5289256

Purpose: This aggregation calculates monthly churn trends to understand how churn evolves over time:

The signup_date is converted to a month-level date to group customers by the month they signed up.

For each month, it computes:

total_cust: total number of customers who signed up that month

churn_count: number of customers who churned

churn_rate: proportion of churned customers for that month

This allows us to observe patterns, such as whether certain months have higher churn rates or if churn correlates with customer acquisition volume.

5.1.5 churn by payment failures

churn_rate_for_PF <- customer_subscription_churn_usage_patterns  %>% group_by(payment_failures) %>% summarise(total_cust=n(),churn_count=sum(churn=="Yes"),churn_rate=mean(churn=="Yes"),.groups = "drop")

knitr::kable(head(churn_rate_for_PF))

payment_failures	total_cust	churn_count	churn_rate
0	442	175	0.3959276
1	496	199	0.4012097
2	478	307	0.6422594
3	471	316	0.6709130
4	453	301	0.6644592
5	460	307	0.6673913

bin_pf <- churn_rate_for_PF %>% group_by(pf_bin=case_when(payment_failures == 0 ~ "0", payment_failures == 1 ~ "1",payment_failures == 2 ~ "2", payment_failures >= 3 ~ "3+")) %>% summarise(total_cus=sum(total_cust),churn_count=sum(churn_count),rate=sum(churn_rate*total_cust)/sum(total_cust))

knitr::kable(head(bin_pf))

pf_bin	total_cus	churn_count	rate
0	442	175	0.3959276
1	496	199	0.4012097
2	478	307	0.6422594
3+	1384	924	0.6676301

Purpose: This code investigates the relationship between payment failures (PF) and churn risk, identifying critical thresholds.

churn_rate_for_PF

Groups customers by payment_failures.

Calculates for each PF count:

total_cust: total customers with that number of failed payments

churn_count: number of churned customers

churn_rate: proportion of churned customers

This provides a raw view of how churn changes as payment failures increase.

bin_pf

Groups the PF counts into binned categories: 0, 1, 2, 3+

Aggregates totals and computes a weighted churn rate for each bin.

This makes it easier to visualize churn patterns and highlights the “two-payment-fail tipping point”, where churn jumps significantly.

5.1.6 plans and pf

plan_total_pf <- customer_subscription_churn_usage_patterns %>% group_by(plan_type,churn) %>% summarise(total=n(),PF=sum(payment_failures),avg_pf=mean(payment_failures))

## `summarise()` has grouped output by 'plan_type'. You can override using the
## `.groups` argument.

knitr::kable(head(plan_total_pf))

plan_type	churn	total	PF	avg_pf
Basic	No	389	839	2.156812
Basic	Yes	534	1482	2.775281
Premium	No	396	796	2.010101
Premium	Yes	548	1512	2.759124
Standard	No	410	842	2.053659
Standard	Yes	523	1506	2.879541

Purpose: This code explores payment failures across different subscription plans and how they relate to churn.

Groups data by plan_type (Basic, Standard, Premium) and churn status (Yes/No).

For each combination, calculates:

total: number of customers

PF: total number of payment failures

avg_pf: average number of payment failures

Insight: This helps identify whether certain plans are more prone to payment issues and churn, providing a plan-specific view of risk.

5.2 tenure

5.2.1 tenure by churn

tenure_patterns <- customer_subscription_churn_usage_patterns %>% group_by(churn) %>% summarise(total=sum(tenure_months),avr_tenure=mean(tenure_months))
knitr::kable(head(tenure_patterns))

churn	total	avr_tenure
No	22213	18.58828
Yes	29903	18.63115

Purpose: This code examines customer tenure in relation to churn.

Groups the dataset by churn status (Yes/No).

Calculates:

total: total months of tenure for each churn group

avr_tenure: average tenure (in months) per group

Insight: This gives a general understanding of how long churned vs. retained customers have been with the service, helping to see whether longer-tenured customers are more or less likely to leave.

5.3 hours usage and last login

avg_hours_pattern <- customer_subscription_churn_usage_patterns %>% group_by(churn) %>% summarise(total=sum(avg_weekly_usage_hours),avr_hours=mean(avg_weekly_usage_hours))
knitr::kable(head(avg_hours_pattern))

churn	total	avr_hours
No	16427	13.74644
Yes	19669	12.25483

avg_hours_pattern Purpose: Analyzes average weekly app usage for churned vs. retained customers.

Groups by churn status.

Calculates:

total: total weekly usage hours for each group

avr_hours: average weekly usage per customer

Insight: Helps understand whether more engaged users (higher weekly usage) are less likely to churn. the avg hours are same which means that for churn variable it shows that they both use it for same period 13 yes 12 no.

mean_last_login <- customer_subscription_churn_usage_patterns %>% group_by(churn) %>% summarise(avg=mean(last_login_days_ago))
knitr::kable(head(mean_last_login))

churn	avg
No	26.14979
Yes	32.87539

mean_last_login Purpose: Measures recency of activity among customers.

Groups by churn.

Calculates the average number of days since the last login for each group.

Insight: Provides a sense of whether recently active customers are more likely to stay, which can help in identifying dormant users at risk of churning. it shows that churned tend to have higher avg last login days.

6 First spotted patterns,trends or difference

the patterns are organized by the importance.

6.1 payment failures

the bin_pf dataframe shows that 2 payment failures is where churn no clients number start to decrease and churn yes increase meaning that 2 payment failures is dead point. also the number of payment failures for churned people is higher meaning they get more payment failures. and the churn rate from 1 to 2 jumps from 0.4 to 0.64.

6.2 support tickets

support tickets shows that churned people have done more support tickets than staying people with an average of 4.22 compared to 3.44 making a difference of approximatly 1 support tickets.

6.3 plans

plans doesnt show difference in term of churn state they are same arround 33.3% for each and also same in payment failures.

6.4 others

for avg weekly hour usage it shows a difference of 1h from 13 for no and 12 for yes meaning the difference is little.

for tenure months they are the same arround 18.5.

last login in days shows that churned people have from 30-36 days without using the subscription while people who stayed 22-29 days.

7 plotting

note: in plotting phase i used AI by giving him first manual code for plot using ggplot and asking him to improve it and make it more beautiful. which cuts time.

Summary_of_churn_analysis_and_exploration_in_R

Oussama Ait-Idmou

2026-02-03