Introduction

This analysis of the Telco Customer Churn dataset focuses specifically on customers who subscribe to internet services. The objective is to examine how tenure relates to contract type and total charges, while also exploring differences in payment methods across internet service categories. In addition, the analysis evaluates the demographic composition of the customer base (particularly senior citizen status ) and investigates the adoption of value-added services such as Online Security.

Together, these factors provide insight into customer behavior patterns and potential drivers of churn within the internet service segment.

Dataset

The Telco Customer Churn dataset from Kaggle can be found by clicking the link below.

Kaggle Telco Dataset

Findings

Tab 1

Customers by Internet Service stacked by Senior Citizen Status

Fiber optic is the company’s largest internet service segment and contains the highest concentration of senior citizens. By comparison, DSL customers and those without internet service are predominantly non-senior, indicating that demographic differences exist but do not fundamentally alter the overall service distribution.

Although senior citizens represent a notable portion of the Fiber Optic segment, the overall customer distribution across internet services remains heavily driven by service type rather than demographic segmentation. Therefore, senior status will not be treated as a key differentiating factor in the subsequent churn-focused analysis. The analysis will instead prioritize tenure, contract type, total charges, and payment behavior as primary explanatory variables.

# visual - 2 bar graph

#data preparation
internet_senior_df <- df %>%
  mutate(
    SeniorCitizenLabel = ifelse(SeniorCitizen == 1,
                                "Senior Citizen",
                                "Not Senior Citizen")
  ) %>%
  count(InternetService, SeniorCitizenLabel) %>%
  data.frame()

internet_senior_df
##   InternetService SeniorCitizenLabel    n
## 1             DSL Not Senior Citizen 2162
## 2             DSL     Senior Citizen  259
## 3     Fiber optic Not Senior Citizen 2265
## 4     Fiber optic     Senior Citizen  831
## 5              No Not Senior Citizen 1474
## 6              No     Senior Citizen   52
# plotting the bar chart
internet_senior_bar <- ggplot(
  internet_senior_df,
  aes(x = InternetService, y = n, fill = SeniorCitizenLabel)
) +
  geom_bar(stat = "identity", colour = "black") +
  geom_text(
    aes(label = comma(n)), #add comma to show the numbers better
    position = position_stack(vjust = 0.5),
    size = 4
  ) +
  scale_fill_manual(
    values = c(
      "Senior Citizen" = "coral",
      "Not Senior Citizen" = "aquamarine"
    )
  ) +
  labs(
    title = "Customers by Internet Service\n(Stacked by Senior Citizen Status)",
    x = "Internet Service",
    y = "Number of Customers",
    fill = "Senior Status"
  ) +
  scale_y_continuous(labels = comma) + #add comma to numbers
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))

internet_senior_bar

Tab 2

Customers with Internet Service with added Online Security

The majority of customers with internet service do not subscribe to Online Security as an add-on feature. However, without detailed pricing information or insight into how these service options are positioned and communicated to customers, it is difficult to determine the underlying drivers of this adoption gap. While Online Security adoption remains below 40%, the root cause is unclear. Is the additional cost perceived as too high relative to its value, or is customer awareness of the service limited? Further business investigation into pricing strategy and service presentation would provide greater clarity into whether adoption is constrained by cost, communication, or perceived value.

#prepping data for pie chart
pie_df <- df %>%
  dplyr::filter(InternetService != "No") %>%
  dplyr::mutate(
    online_security = dplyr::case_when(
      OnlineSecurity == "Yes" ~ "Has Online Security",
      OnlineSecurity == "No"  ~ "No Online Security",
      TRUE ~ NA_character_
    )
  ) %>%
  dplyr::filter(!is.na(online_security)) %>%
  dplyr::count(online_security) %>%
  dplyr::mutate(
    percent = n / sum(n),
    label = scales::percent(percent, accuracy = 0.1)
  )

pie_df
##        online_security     n   percent  label
##                 <char> <int>     <num> <char>
## 1: Has Online Security  2019 0.3659598  36.6%
## 2:  No Online Security  3498 0.6340402  63.4%
#plotting pie chart
ggplot(pie_df, aes(x = "", y = n, fill = online_security)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 4) +
  labs(
    title = "Customers With Internet Service and Online Security",
    fill = ""
  ) +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5))

Tab 3

Average Monthly Charges by Tenure and the Type of Payment Method

The purpose of this analysis is to determine whether payment behavior is correlated with internet service type. Specifically, I wanted to assess whether customers within different internet service categories (DSL, Fiber optic, or No Internet) demonstrate differing patterns in their use of electronic payment methods (electronic check, credit card, or automatic bank transfer) versus traditional mailed check payments.

From the distribution, there appears to be a measurable association between internet service tier and payment method. Fiber optic customers show a significantly higher concentration of electronic check usage compared to DSL and non-internet customers. In contrast, customers without internet service disproportionately rely on mailed checks. DSL customers fall somewhere in between, with a more balanced distribution across payment types. While this does not establish causation, it does indicate a clear correlation between service type and payment behavior.

In addition to payment behavior, I evaluated tenure and average monthly charges to understand how revenue evolves across the customer lifecycle. The data shows a positive correlation between tenure and average monthly charges across all internet service categories, though the magnitude of this relationship differs substantially by service tier.

For Fiber optic customers, average monthly charges increase from approximately $77–$80 during the first 10 months of tenure to roughly $105–$107 by 60–70 months. This represents an increase of nearly $30 per month over time. DSL customers follow a similar trajectory, increasing from approximately $44–$48 in early tenure to about $75–$77 at higher tenure levels — also an increase of roughly $30 per month. In contrast, customers without internet service show only a modest increase from about $20 to $23 as tenure increases.

These results suggest two important patterns:

There is a clear positive relationship between tenure and monthly revenue, particularly among internet-enabled customers.

Revenue growth is significantly more pronounced in Fiber optic and DSL segments compared to customers without internet service.

Together, the findings indicate that longer-tenured internet customers generate substantially higher recurring revenue, and payment behavior varies meaningfully by service tier. This reinforces the importance of understanding service-level dynamics when evaluating revenue performance and potential churn exposure.

line_df <- df %>%
  filter(!is.na(tenure), !is.na(MonthlyCharges), !is.na(InternetService)) %>%
  group_by(InternetService, tenure) %>%
  dplyr::summarise(
    avg_monthly = mean(MonthlyCharges, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  data.frame()

names(line_df)
## [1] "InternetService" "tenure"          "avg_monthly"
head(line_df)
##   InternetService tenure avg_monthly
## 1             DSL      0    64.94000
## 2             DSL      1    44.21061
## 3             DSL      2    47.75061
## 4             DSL      3    47.18451
## 5             DSL      4    48.24590
## 6             DSL      5    50.53023
hi_lo <- line_df %>%
  group_by(InternetService) %>%
  filter(
    avg_monthly == min(avg_monthly) |
      avg_monthly == max(avg_monthly)
  ) %>%
  ungroup() %>%
  data.frame()

hi_lo
##   InternetService tenure avg_monthly
## 1             DSL      1    44.21061
## 2             DSL     72    76.87594
## 3     Fiber optic      1    77.58766
## 4     Fiber optic     72   107.15552
## 5              No     16    19.80789
## 6              No     65    23.10417
inter_service_line <- ggplot(line_df, aes(x = tenure, y = avg_monthly, group = InternetService, color = InternetService)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  geom_point(data = hi_lo, aes(x = tenure, y = avg_monthly), size = 3, shape = 21, fill = "white", color = "black") +
  geom_label_repel(data = hi_lo, aes(label = paste0("$", round(avg_monthly, 2))),
    box.padding = 1,
    point.padding = 1,
    size = 3,
    color = "grey40",
    segment.color = "darkblue") +
  labs(
    title = "Average Monthly Charges by Tenure\n(Min & Max per Internet Service)",
    x = "Tenure (Months)",
    y = "Average Monthly Charges",
    color = "Internet Service") +
  scale_y_continuous(
    labels = dollar,
    breaks = seq(
      0,
      ceiling(max(line_df$avg_monthly) / 20) * 20,by = 20)) + #do this to show the $ amounts in the y axis is a cleaner fashion
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))

inter_service_line

if ("package:plotly" %in% search()) detach("package:plotly", unload = TRUE)

#preparing data for donuts
donut_df <- df %>%
  count(InternetService, PaymentMethod) %>%
  group_by(InternetService) %>%
  mutate(
    pct = n / sum(n),
    label = percent(pct, accuracy = 0.1)
  ) %>%
  ungroup() %>%
  data.frame()

donut_df
##    InternetService             PaymentMethod    n        pct label
## 1              DSL Bank transfer (automatic)  566 0.23378769 23.4%
## 2              DSL   Credit card (automatic)  594 0.24535316 24.5%
## 3              DSL          Electronic check  648 0.26765799 26.8%
## 4              DSL              Mailed check  613 0.25320116 25.3%
## 5      Fiber optic Bank transfer (automatic)  646 0.20865633 20.9%
## 6      Fiber optic   Credit card (automatic)  597 0.19282946 19.3%
## 7      Fiber optic          Electronic check 1595 0.51518088 51.5%
## 8      Fiber optic              Mailed check  258 0.08333333  8.3%
## 9               No Bank transfer (automatic)  332 0.21756225 21.8%
## 10              No   Credit card (automatic)  331 0.21690695 21.7%
## 11              No          Electronic check  122 0.07994758  8.0%
## 12              No              Mailed check  741 0.48558322 48.6%
#plotting donut charts
#remove thick white borders so it does not look like there are pieces missing from the donut
payment_donut <- ggplot(donut_df, aes(x = 2, y = pct, fill = PaymentMethod)) +
  geom_bar(stat = "identity", width = 1, color = NA) +
  coord_polar(theta = "y") +
  xlim(0.5, 2.5) +                                      
  geom_text(
    aes(label = label),
    position = position_stack(vjust = 0.5),
    size = 4
  ) +
  facet_wrap(~InternetService) +
  labs(
    title = "Payment Method Distribution by Internet Service",
    fill = "Payment Method"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5)
  )

payment_donut

Tab 4

Churn Risk Heat Maps for Internet Only Customers by Tenure compared by Contract Type and Lifetime Value

Churn Risk by Tenure and Customer Lifetime Value

The first heat map highlights a clear inverse relationship between tenure and churn risk, particularly as customer lifetime value increases.

Customers in the 0–12 month range with low lifetime value ($0–$1,500) exhibit the highest churn risk, reaching 51.5%. Even customers with slightly higher lifetime value in early tenure still show elevated churn rates (40–46%). This suggests that the first year of the customer lifecycle represents the most volatile period, regardless of spend level.

As tenure increases, churn risk declines materially across nearly all value tiers. For customers in the 49–72 month range, churn risk falls into the low single digits for most lifetime value segments. Notably, high-value customers ($3,000+) with tenure beyond 49 months demonstrate churn rates below 10%.

This pattern indicates:

  • A strong negative correlation between tenure and churn risk.

  • Higher lifetime value customers become significantly more stable as tenure increases.

  • Early-tenure customers represent the greatest retention vulnerability, especially within lower value tiers.

From a revenue perspective, the riskiest segment is low-tenure, moderate-value customers — they generate revenue but have a high probability of churn.

Churn Risk by Tenure and Contract Type (Internet Customers Only)

The second heat map isolates internet customers and introduces contract type as a structural driver of churn.

The most significant finding is the consistently elevated churn rate among month-to-month contracts, particularly in early tenure. Customers on month-to-month contracts in the 0–12 month range exhibit a churn rate of 58.3%, the highest observed in the analysis.

Even as tenure increases, month-to-month contracts maintain materially higher churn rates compared to fixed-term contracts:

At 61–72 months, month-to-month churn remains at 21.5%, still significantly higher than one-year or two-year contracts.

In contrast:

  • One-year contracts show moderate churn (10–18%), decreasing gradually over tenure.

  • Two-year contracts exhibit the lowest churn across all tenure ranges, remaining near 0–5%.

This suggests:

  • Contract structure is a primary churn driver.

  • Month-to-month agreements significantly increase churn risk independent of tenure.

  • Long-term contracts materially stabilize customer retention

#visual #5 - heatmap
#group tenure in monthly groups by Lifetime Value
heat_df_lifecycle <- df %>%
  filter(!is.na(TotalCharges)) %>%
  mutate(
    ChurnFlag = ifelse(Churn == "Yes", 1, 0),
    
    # Tenure grouping (same structure as before)
    TenureGroup = cut(
      tenure,
      breaks = c(0, 12, 24, 36, 48, 60, 72), #group the tenure months
      labels = c("0–12", "13–24", "25–36", "37–48", "49–60", "61–72"),
      include.lowest = TRUE
    ),
    
    # Business lifecycle value grouping
    ChargeGroup = cut(
      TotalCharges,
      breaks = c(0, 500, 1500, 3000, 5000, Inf), #group the $ amounts to avoid scientific notation
      labels = c("$0–$500",
                 "$500–$1,500",
                 "$1,500–$3,000",
                 "$3,000–$5,000",
                 "$5,000+"),
      include.lowest = TRUE
    )
  ) %>%
  group_by(TenureGroup, ChargeGroup) %>%
  summarise(
    churn_rate = mean(ChurnFlag, na.rm = TRUE),
    n = n(),
    .groups = "drop"
  ) %>%
  data.frame()

heat_df_lifecycle
##    TenureGroup   ChargeGroup  churn_rate    n
## 1         0–12       $0–$500 0.467316514 1744
## 2         0–12   $500–$1,500 0.515081206  431
## 3        13–24       $0–$500 0.058823529  238
## 4        13–24   $500–$1,500 0.327380952  504
## 5        13–24 $1,500–$3,000 0.407801418  282
## 6        25–36       $0–$500 0.000000000   18
## 7        25–36   $500–$1,500 0.049586777  242
## 8        25–36 $1,500–$3,000 0.274944568  451
## 9        25–36 $3,000–$5,000 0.363636364  121
## 10       37–48   $500–$1,500 0.036842105  190
## 11       37–48 $1,500–$3,000 0.164835165  182
## 12       37–48 $3,000–$5,000 0.268817204  372
## 13       37–48       $5,000+ 0.444444444   18
## 14       49–60   $500–$1,500 0.032051282  156
## 15       49–60 $1,500–$3,000 0.103448276   87
## 16       49–60 $3,000–$5,000 0.123867069  331
## 17       49–60       $5,000+ 0.251937984  258
## 18       61–72   $500–$1,500 0.007246377  138
## 19       61–72 $1,500–$3,000 0.000000000  165
## 20       61–72 $3,000–$5,000 0.032653061  245
## 21       61–72       $5,000+ 0.097788126  859
# Plot
#show the higher churn rate in red with the lower churn rate in green
ggplot(
  heat_df_lifecycle,
  aes(x = TenureGroup,
      y = ChargeGroup,
      fill = churn_rate)
) +
  geom_tile(color = "white") +
  geom_text(
    aes(label = percent(churn_rate, accuracy = 0.1)),
    size = 3
  ) +
  labs(
    title = "Churn Risk Heat Map\nby Tenure and Customer Lifetime Value",
    x = "Tenure (Months)",
    y = "Total Charges (Customer Value Stage)"
  ) +
  scale_fill_gradient2(
    low = "darkgreen",          # Low churn = strong retention
    mid = "lightyellow",
    high = "red",               # High churn = risk
    midpoint = mean(heat_df_lifecycle$churn_rate, na.rm = TRUE),
    labels = percent,
    name = "Churn Rate"
  ) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))

#visual - 6 Heat Map Tenure, charges and churn
#only focus on customers with internet service by Contract type
#add in the same tenure grouping



  heat_contract_internet_df <- df %>%
  filter(InternetService != "No") %>% 
  mutate(
    ChurnFlag = ifelse(Churn == "Yes", 1, 0),
    
    TenureGroup = cut(
      tenure,
      breaks = c(0, 12, 24, 36, 48, 60, 72), #group tenure months
      labels = c("0–12", "13–24", "25–36", "37–48", "49–60", "61–72"),
      include.lowest = TRUE
    )
  ) %>%
  group_by(TenureGroup, Contract) %>%
  summarise(
    churn_rate = mean(ChurnFlag, na.rm = TRUE),
    n = n(),
    .groups = "drop"
  ) %>%
  data.frame()

heat_contract_internet_df
##    TenureGroup       Contract churn_rate    n
## 1         0–12 Month-to-month 0.58281445 1606
## 2         0–12       One year 0.18367347   49
## 3         0–12       Two year 0.00000000   18
## 4        13–24 Month-to-month 0.41095890  657
## 5        13–24       One year 0.14705882  102
## 6        13–24       Two year 0.00000000   26
## 7        25–36 Month-to-month 0.34977578  446
## 8        25–36       One year 0.10497238  181
## 9        25–36       Two year 0.00000000   35
## 10       37–48 Month-to-month 0.35099338  302
## 11       37–48       One year 0.15887850  214
## 12       37–48       Two year 0.05063291   79
## 13       49–60 Month-to-month 0.27896996  233
## 14       49–60       One year 0.15162455  277
## 15       49–60       Two year 0.04968944  161
## 16       61–72 Month-to-month 0.21495327  107
## 17       61–72       One year 0.13286713  286
## 18       61–72       Two year 0.04200542  738
# Plot heat map
#use the same heatmap colors to indicate churn rate
#only focus on customers with internet
ggplot(
  heat_contract_internet_df,
  aes(x = TenureGroup,
      y = Contract,
      fill = churn_rate)
) +
  geom_tile(color = "white") +
  geom_text(
    aes(label = percent(churn_rate, accuracy = 0.1)),
    size = 4
  ) +
  labs(
    title = "Churn Risk Heat Map (Internet Customers Only)\nby Tenure and Contract Type",
    x = "Tenure (Months)",
    y = "Contract Type"
  ) +
  scale_fill_gradient2(
    low = "darkgreen",     # Low churn
    mid = "lightyellow",
    high = "red",          # High churn
    midpoint = mean(heat_contract_internet_df$churn_rate, na.rm = TRUE),
    labels = percent,
    name = "Churn Rate"
  ) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))

Conclusion

The analysis reveals several consistent patterns across internet service type, payment behavior, tenure, revenue, and churn risk.

Tenure demonstrates a strong negative correlation with churn. Customers within their first 12 months exhibit the highest churn rates, particularly those on month-to-month contracts. As tenure increases, churn risk declines materially across all segments, with long-tenured customers under two-year contracts showing the greatest stability.

Contract structure is a significant driver of churn. Month-to-month customers consistently exhibit elevated churn rates compared to one-year and two-year contracts, indicating that commitment length materially impacts retention outcomes.

Revenue trends show a positive correlation between tenure and average monthly charges, especially among Fiber optic and DSL customers. Longer-tenured internet customers generate substantially higher recurring revenue compared to early-tenure customers and those without internet service.

Payment behavior varies meaningfully by internet service type, with Fiber optic customers more heavily utilizing electronic payment methods, while non-internet customers rely more on mailed checks. Although causality cannot be inferred, the association suggests behavioral differences across service tiers.

Collectively, the findings indicate that early-tenure, month-to-month internet customers represent the highest churn exposure, while long-tenured customers under fixed-term contracts drive the most stable and valuable revenue. Retention strategies should therefore prioritize stabilizing customers within their first year and incentivizing longer-term contract adoption.