South East Technological University

MSc in Digital Marketing and Analytics

Author

Chenjerai Muchenje

The Relationship between Contact Frequency and Contract Renewal

The boxplots compare the number of contacts recorded for customers who renewed their contracts and for those who did not, and they show a substantial difference between the two groups. For customers who did not renew, contact counts are heavily concentrated near zero, with a very small interquartile range and many individual points plotted as outliers. These outliers are non-renewing customers who were contacted unusually often, so they are exceptions rather than the typical case. For customers who renewed, the box is noticeably larger, which reflects a wider spread of contact counts rather than a larger number of customers (the two groups are almost equal in size, as the table below shows). The median for the renewing group is higher than the median for the non-renewing group, so the central tendency of contact frequency is greater among customers who renewed. This implies a potential association in which higher contact counts are more characteristic of the renewing population than of the non-renewing population. There are also several outliers in the 30-40 range in the renewing group, consistent with its larger interquartile range and greater variability.

The organisation needs to prioritise customers with low contact counts. The non-renewing group is concentrated at low contact counts, which suggests that customers who are contacted rarely are less likely to renew. The company should therefore implement targeted retention strategies such as personalised outreach, incentives, or enhanced support for these customers, as they are at greater risk of churn. The renewing group has a higher median and more variability, with several high-contact outliers, and customers with higher contact counts appear more engaged and more likely to renew. This calls for strengthening engagement with these highly engaged customers, for example by developing loyalty programs that reinforce the likelihood of renewal. Because low contact counts are associated with non-renewal, the organisation also needs to investigate why some customers remain at low levels of contact. Understanding these reasons will help in formulating strategies to increase customer activity. The difference between the groups indicates that contact count can be used as an early warning sign for churn, enabling the identification of customers who are unlikely to renew and may require intervention. The company also needs to tailor communication and offers by segment, personalising messages, prices and incentives accordingly.

Renewal Number of customers
No 426
Yes 424

Relationship between Contact Recency and Contract Renewal

The boxplot compares contact recency (days since last contact) for customers who renewed versus those who did not. Customers who renewed had been contacted more recently: their median recency is lower (around 10–12 days), indicating that customers who were contacted more recently are more likely to renew their contracts. By comparison, the median is higher for customers who did not renew, which suggests that the longer a customer goes without contact, the less likely the contract is to be renewed. The interquartile range is similar for both groups, so the main difference between them lies in their typical recency, not in how widely their recency values vary. The outliers in both groups represent extreme cases. Some renewed customers show long periods since last contact, meaning that some people renew even after a long gap, but this is less common. Conversely, some non-renewing customers had very recent contact, suggesting that contact alone does not guarantee renewal, although it improves the chances. Overall, the plot indicates a negative relationship between time since last contact and renewal likelihood.

Customers who renewed had been contacted more recently, so the music streaming company should contact customers earlier and more regularly to improve renewal likelihood. This can be achieved by implementing automated reminders 30 to 60 days before renewal. Customer touchpoints should be increased as the customer approaches the end of the contract. The company should introduce a structured customer engagement strategy, with monthly check-ins, updates or loyalty messages managed through a customer relationship management (CRM) system. It will also be prudent to prioritise customers who have not been contacted recently through re-engagement offers and personalised incentives. Because the box plots suggest that contact recency is associated with renewal, triggered emails or messages, discount offers and product education reminders could be implemented. This can be enhanced by improving customer support through frequent communication, including support team follow-ups, satisfaction surveys and keep-warm communication channels.

Question 3

Relationship between Renewal and Number of Complaints

Number of Customers by Number of Complaints
No of Complaints No of customers
0 660
1 100
2 44
3 12
4 12
5 8
6 8
7 3
8 1
10 1
17 1

The histogram shows a highly right-skewed distribution of the number of complaints. The majority of customers made few or no complaints, and most have not reported any issue at all. A small segment generates repeated complaints: there is a noticeable drop after 1 to 2 complaints, but a long tail extends to 5, 10, and even more than 15 complaints. This indicates a small number of high-friction customers who experience persistent issues. Complaints are therefore not common across the whole customer base, which suggests general satisfaction but significant pain points for a subset. The extreme cases with ten or more complaints are likely outliers worth investigating individually, as they may reflect service errors, billing issues, or unresolved disputes.
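The skew described above can be checked directly from the frequency table. The following is a minimal Python sketch; Python is used purely for illustration, since the outputs in this report appear to come from R, and the variable names and the tail cut-off of five complaints are assumptions rather than part of the original analysis.

```python
# Frequency table transcribed from the complaints table above:
# number of complaints -> number of customers.
complaint_counts = {0: 660, 1: 100, 2: 44, 3: 12, 4: 12,
                    5: 8, 6: 8, 7: 3, 8: 1, 10: 1, 17: 1}

total_customers = sum(complaint_counts.values())
total_complaints = sum(k * n for k, n in complaint_counts.items())

# Share of customers with no complaints, and with at most one complaint.
share_zero = complaint_counts[0] / total_customers
share_at_most_one = (complaint_counts[0] + complaint_counts[1]) / total_customers

# Mean complaints per customer. The median is 0 because more than half
# of the customers made no complaint at all.
mean_complaints = total_complaints / total_customers

# Customers in the long tail (the cut-off of 5 is an illustrative assumption).
TAIL_THRESHOLD = 5
tail_customers = sum(n for k, n in complaint_counts.items() if k >= TAIL_THRESHOLD)

print(f"customers: {total_customers}")
print(f"no complaints: {share_zero:.1%}")
print(f"at most one complaint: {share_at_most_one:.1%}")
print(f"mean complaints per customer: {mean_complaints:.2f}")
print(f"customers with {TAIL_THRESHOLD}+ complaints: {tail_customers}")
```

A mean that sits well above a median of zero is the usual signature of a right-skewed count distribution, which is consistent with the shape described for the histogram.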

Most customers complain at most once, so it is important for the company to ensure quick and effective first-contact resolution, which can prevent escalation. High-complaint customers are very few, so it is strategic to investigate their issues individually and flag them for priority intervention. This would also warrant involvement by a retention specialist or the customer care team to assess unresolved systemic issues. The common root causes behind multiple complaints should be investigated, and the findings can feed into predictive models that forecast churn. Customer education and self-service can also be introduced through FAQs, self-service portals and refined onboarding communication.

The Relationship between Expenditure and Contract Renewal

The box plot compares the amount spent by customers who renewed their contracts with the amount spent by those who did not. Customers who renewed tend to spend slightly more than those who did not, suggesting that higher-spending customers are somewhat more likely to renew their contracts. The two groups nevertheless have similar spending patterns, with overlapping interquartile ranges. This shows that expenditure alone is not a strong predictor of contract renewal and that other factors, such as satisfaction and service quality, may influence it. Notably, a few customers with very high spending did not renew, possibly reflecting dissatisfaction among high-value customers. In short, both groups have comparable spending, with a modest difference in central tendency shown by the median, so spending level is only weakly associated with renewal.

High-spending customers who do not renew may be dissatisfied, hence the need to develop strategies to retain them. These strategies might include personalised outreach, priority support and loyalty discounts. There is also a need to incentivise moderate spenders to increase their spending, for example through usage-based recommendations and upselling of related services. It will also be prudent for the organisation to investigate the reasons for churn, which may include service issues, price sensitivity and the availability of alternatives. Because the spending ranges overlap, spending cannot be the only factor behind churn. As such, the impact of factors such as number of complaints, recency of contact, customer tenure and contract type should be investigated.

The Relationship between Contract Renewal and Tenure

The histogram indicates that the length of time a customer has been with the company varies a lot, from relatively new customers (0–30 days) to long-term customers (300+ days). But the distribution is mostly in the first 150 days, which means that most of the consumers are rather new. There is also a clear second group about 300 days, which shows that there are a lot of long-term clients. The graphic doesn’t show renewal segmentation directly, but the trend suggests that consumers who have been with the service for a long time may be more likely to renew, since they keep using it for longer lengths of time. Customers with shorter tenures may be more inclined to leave, therefore it’s important to get them involved early.

The music streaming company should strengthen new customer onboarding, as the largest group of customers is new. This can be achieved through early check-ins accompanied by welcome offers or personalised onboarding. Continuous engagement will also reduce churn. The company may also aim to convert short-term customers into long-term customers through upselling or cross-selling opportunities. Account reviews will be critical in determining whether it is worthwhile to offer loyalty rewards. Mid-tenure engagement campaigns should also be implemented. The main goal of these campaigns is to ensure that customers stay with the company for longer, so tenure should be the backbone of the company’s churn prediction model. Tenure clusters can be used to segment communication and promotions, both to retain long-term customers and to move short-term customers towards long-term status. Because the histogram suggests a period in which customers may be dropping off, it is recommended that the company investigates the causes of these drop-offs. This will assist in developing strategies to address service abandonment.

The bar chart shows the number of customers who renewed or did not renew their contracts, separated by gender. It indicates that males renew at a higher rate than females, which suggests that male customers are more loyal. Females, on the other hand, are more likely not to renew, as the number of non-renewals is higher among females. This might signal issues affecting female customers that need to be addressed. Overall, there are more male than female customers, and males account for high numbers of both renewals and non-renewals, indicating that the company is reaching male customers effectively.

To increase female retention, the company should develop female-focused retention strategies, collecting data through surveys and interviews to understand why female customers are leaving. Tailored communication that resonates with female customers is another strategy that can be implemented. Targeted reminders before contract expiry and loyalty incentives can also be used to drive retention. While trying to retain and recruit more female customers, the company should maintain and enhance engagement with male customers by continuing the strategies that are working well, as male renewal rates are strong. Refer-a-friend programs can also be used to leverage loyal male customers. Further investigation is needed to examine the underlying drivers of the gender differences. The company should ensure gender-inclusive marketing and communication by reviewing its brand tone, imagery and product positioning so that campaigns appeal equally to male and female customers.

Relationship between Renewal and Age

The median age is slightly higher among customers who renewed their contracts, suggesting that older customers are somewhat more likely to stay with the service. The renewed and non-renewed groups have similar age distributions, and their interquartile ranges overlap considerably, showing minimal age differences. Both groups also have a wide spread of ages, showing that renewal behaviour is not strongly concentrated in one age band. Young outliers, including several customers under 20 years of age, appear more often in the non-renewed group, which may indicate that younger customers are less likely to renew their contracts. The upper quartile for renewals extends higher, suggesting that customers above 60 are more likely to renew than not. Older customers may have higher loyalty and more stable routines.

The company will need to target younger customers, as they are more likely not to renew. This can be done by offering young-adult discounts and flexible plans and by using communication channels that appeal to this age group, such as TikTok and Instagram. Loyalty among older customers should also be sustained through loyalty rewards and personalised support, backed by service reliability and convenience. Marketers should also target customers aged 40 years and above with retention strategies such as email campaigns and renewal bonuses. Research is also key to establishing the causes of churn among young users.

Predictive Analytics - Classification Trees

Music Subscription Renewal Prediction

The classification tree model

Interpretation of the tree model

The decision tree identifies the four most influential factors affecting contract renewal: tenure (length of relationship, lor), spending, number of contacts, and age. Each split in the tree represents a subgroup of customers whose behaviour differs meaningfully, and each percentage represents the predicted probability of renewal or non-renewal for customers falling into that node. The next section interprets the decision tree.

Decision Tree Interpretation

1. Root Node (Overall Population)

Root node

o No: 50% | Yes: 50% (100% of users)

At the overall population level, outcomes are evenly split between non-renewal and renewal of the subscription. This indicates that there is no dominant behaviour across the entire customer base and highlights the need for segmentation to uncover meaningful patterns. Renewal is therefore influenced by demographic or behavioural factors, which the tree’s first split begins to identify.

2. Primary Driver: Length of Relationship (LOR)

LOR ≥ 140

o No: 39% | Yes: 61% (45% of users)

Users with a longer relationship history are noticeably more likely to renew, with a probability of 61%. This suggests that trust, familiarity, and prior experience with the service strongly influence renewal decisions. These users represent loyal or experienced customers with a high intent to stay.

LOR < 140

o No: 59% | Yes: 41% (55% of users)

Users with shorter relationships are less likely to renew and require further segmentation to understand what conditions may encourage renewal.

3. Spend as a Secondary Factor (Low LOR Users)

Spend < 182

o No: 59% | Yes: 41% (55% of users)

Lower spend among short-relationship users corresponds with reduced renewal likelihood, reinforcing that these users are still in an evaluation phase. In this group, non-renewal is more likely than renewal. Users at this stage are typically still evaluating options, comparing prices, or not yet committed enough to renew. Users above the spend threshold follow the other branch of the tree.

Spend ≥ 182

o No: 76% | Yes: 24% (10% of users)

High spend alone does not guarantee renewal when relationship depth is low. This suggests price sensitivity or insufficient trust, where users may be comparing the service against competitors before committing.

4. Role of Engagement Frequency (Number of Contacts)

Num_contacts < 7.5

o No: 55% | Yes: 45% (45% of users)

Limited contact frequency reduces renewal likelihood, indicating that single or infrequent interactions are insufficient to establish confidence. While still slightly weighted toward non-renewal, this group shows greater readiness to renew than lower-spend users. The narrowing gap between Yes and No suggests that these users are closer to renewing but may still require reassurance or incentives.

Num_contacts ≥ 7.5

o No: 32% | Yes: 68% (3% of users)

Repeated interactions significantly increase renewal probability, even among users with shorter relationships. This highlights the importance of retargeting, email marketing, and consistent exposure.

5. Age as a Refinement Variable

Age < 61

o No: 57% | Yes: 43% (42% of users)

Younger users demonstrate moderate renewal likelihood, often requiring additional touchpoints and reassurance before renewing.

Age ≥ 61

o No: 33% | Yes: 67% (3% of users)

Older users, while fewer in number, show a high probability of renewal once engaged. Although this group represents a small proportion of customers, its renewal probability (67%) is markedly higher than that of younger users (43%). These users tend to be cautious but decisive, and reassurance appears to strengthen their confidence in renewing.

In summary, the behavioural insights show that customers with long relationships and highly engaged customers have the highest renewal probabilities, and that repeated contact can partly compensate for a shorter relationship. The insights also show that trust is key to contract renewal, and that older customers, although few in number, renew at a high rate. The decision tree indicates that renewal behaviour is driven primarily by trust-related factors such as relationship length and engagement frequency rather than by spend alone. These insights reinforce the importance of relationship-based marketing, repeated exposure, and reassurance-focused messaging in sustaining renewals. The music subscription service should prioritise relationship management and loyalty initiatives for long-relationship customers. Retargeting and regular email contact should be used to build trust among new users. The organisation also needs to emphasise other aspects of the service, such as value and quality, supported by reassurance messaging. For inclusivity, it should design campaigns that support multi-touch renewal journeys, especially in Europe.

Variable Importance

            lor           spend             age    num_contacts contact_recency 
     21.5220728      10.3707565       9.9887537       3.7116877       0.5951338 

The variable importance values indicate how much each feature contributed to splitting the data and improving the predictive accuracy of the decision tree. Higher values mean the variable played a more significant role in determining whether customers renewed their contracts. Tenure (lor) is the most important variable, with a value of 21.52, meaning it contributed the most to reducing uncertainty. Length of tenure is therefore the strongest indicator of renewal: customers with tenure below 140 days behave differently from long-term customers, making tenure the key driver of churn. Amount spent is the second most influential variable, with a value of 10.37, reflecting that spending levels help identify high-risk customers among those with short tenure. As the tree reflects, low spenders are likely to churn, especially early in the relationship. Age is the third most important variable, with a value of 9.99. Age helps to distinguish behavioural patterns, with customers aged 61 and above renewing at a higher rate than younger customers. The number of contacts is a weaker variable, with a value of 3.71, and contact recency is the weakest, with a value of 0.60.

Variable importance as a percentage of all improvements to the model

Variable Importance Table
x
lor 21.5220728
spend 10.3707565
age 9.9887537
num_contacts 3.7116877
contact_recency 0.5951338

The variable importance percentages indicate how strongly each predictor contributed to the decision tree’s ability to classify whether a customer renewed their contract. Length of relationship is the most important predictor at 47%, accounting for nearly half of the total importance. This means that the length of the customer relationship is the strongest indicator of renewal likelihood. The tree relies heavily on this factor to create the first and most impactful split, distinguishing high-churn early-tenure customers from loyal long-tenure customers. Spending is the second most important contributor, at 22%. Age contributes almost as much, also at about 22%. Age becomes important when analysing high-contact customers, differentiating younger customers (lower renewal) from older customers (higher renewal). Contact frequency has a smaller but still meaningful contribution of 8%, as it helps refine predictions for customers who have already been split by tenure and spending. More frequent contact improves renewal likelihood only for specific groups. Contact recency contributes a meagre 1% to the overall model. This suggests that how recently a customer was contacted is far less important than how often they were contacted, how long they have been a customer, and their spending patterns.
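The percentages quoted above follow from dividing each raw importance value by the sum of all five. A minimal Python sketch of that arithmetic is given here for illustration only; the raw values are copied from the variable importance output, while the dictionary and variable names are assumptions, and the original figures appear to have been produced in R.

```python
# Raw variable importance values copied from the output above.
importance = {
    "lor": 21.5220728,
    "spend": 10.3707565,
    "age": 9.9887537,
    "num_contacts": 3.7116877,
    "contact_recency": 0.5951338,
}

total_importance = sum(importance.values())

# Each variable's share of the total, expressed as a percentage.
importance_pct = {
    name: 100 * value / total_importance
    for name, value in importance.items()
}

# Print from most to least important.
for name, pct in sorted(importance_pct.items(), key=lambda item: -item[1]):
    print(f"{name:<16} {pct:5.1f}%")
```

Rounded to whole numbers, the shares reproduce the 47%, 22%, 22%, 8% and 1% figures discussed in the paragraph above.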

Model Accuracy

Accuracy on Training data

   id renewed num_contacts contact_recency num_complaints spend lor gender age
1 187      No            0              28              0   213 248   Male  45
2 269      No            1              12              2   425  82   Male  60
3 376      No            0              28              2     0  15 Female  53
4 400      No            1              11              1     0  12   Male  44
5 679     Yes            0              28              0   216 300   Male  68
6 565     Yes            0              28              0   425 349 Female  68
         No       Yes train_preds
1 0.3932292 0.6067708         Yes
2 0.5927052 0.4072948          No
3 0.7560976 0.2439024          No
4 0.7560976 0.2439024          No
5 0.3932292 0.6067708         Yes
6 0.3932292 0.6067708         Yes
      Predicted
Actual  No Yes
   No  257 169
   Yes 154 270

Confusion Matrix Interpretation

Performance Metrics

1. Overall model accuracy

Accuracy = (257 + 270) / 850 = 527 / 850 = 0.62

Overall accuracy = 62%

2. Of all customers the model predicted to churn (Predicted No)

Total predicted No = 257 + 154 = 411

Correctly predicted churners (True Negatives) = 257

Precision for No = 257 / 411 = 0.63

63% of predicted churners actually churned

3. Of all customers the model predicted not to churn (Predicted Yes)

Total predicted Yes = 169 + 270 = 439

Correctly predicted renewals (True Positives) = 270

Precision for Yes = 270 / 439 = 0.62

62% of predicted renewals actually renewed

4. Of all customers who DID churn, the model correctly identified

Actual churners = 257 + 169 = 426

Correctly identified churners = 257

Sensitivity (Recall for No) = 257 / 426 = 0.60

60% of actual churners were correctly identified

5. Of all customers who did NOT churn, the model correctly identified

Actual renewals = 154 + 270 = 424

Correctly identified renewals = 270

Sensitivity (Recall for Yes) = 270 / 424 = 0.64

64% of actual renewals were correctly identified

The model performs moderately well with an overall accuracy of 62%, showing balanced performance across both churn and renewal classes. It correctly identifies 60% of customers who churn and 64% of customers who renew, meaning it is slightly better at detecting loyal customers than at predicting churn. Precision scores of 63% for churn prediction and 62% for renewal prediction indicate that the model is equally reliable in both types of predictions, but improvements could be made especially in catching more at-risk churners.
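The training metrics above can be reproduced from the confusion matrix with a few lines of code. The sketch below is in Python for illustration only (the report’s outputs appear to come from R), and the function name `class_metrics` and its return keys are hypothetical. Rows are actual classes and columns are predicted classes, in the order No, Yes.

```python
def class_metrics(matrix, labels=("No", "Yes")):
    """Accuracy plus per-class precision and recall.

    matrix[i][j] is the number of customers whose actual class is
    labels[i] and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    results = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        predicted_as_label = sum(row[i] for row in matrix)  # column total
        actually_label = sum(matrix[i])                      # row total
        results[f"precision_{label}"] = matrix[i][i] / predicted_as_label
        results[f"recall_{label}"] = matrix[i][i] / actually_label
    return results


# Training confusion matrix copied from the output above.
train_matrix = [[257, 169],
                [154, 270]]

train_metrics = class_metrics(train_matrix)
for name, value in train_metrics.items():
    print(f"{name}: {value:.2f}")
```

Computing precision and recall per class, rather than for a single positive class, mirrors the way the report treats churn (No) and renewal (Yes) as equally important outcomes.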

      Predicted
Actual No Yes
   No  36  38
   Yes 35  41
  1. Overall accuracy = (36 + 41) / 150 = 77 / 150 = 0.513

    Overall accuracy = 51.3%

  2. Of all customers the model predicted to renew (Predicted Yes) = 41 / (41 + 38) = 41 / 79 = 0.519

    51.9% of predicted renewals actually renewed

  3. Of all customers the model predicted to churn (Predicted No)

    = 36 / (36 + 35) = 36 / 71 = 0.507

    50.7% of predicted churners actually churned

  4. Of all customers who DID renew (Actual Yes), the model correctly identified = 41 / (41 + 35) = 41 / 76 = 0.539

    53.9% of actual renewals were correctly identified

  5. Of all customers who DID churn (Actual No), the model correctly identified = 36 / (36 + 38) = 36 / 74 = 0.486

    48.6% of actual churners were correctly identified

The confusion matrix for the test data shows that the model struggles to distinguish between customers who will churn and those who will renew, with an overall accuracy of 51.3%, only marginally better than chance for two classes of nearly equal size. Precision is limited for both classes: 51.9% of the customers predicted to renew actually renewed, and 50.7% of those predicted to churn actually churned. The model correctly identifies 53.9% of actual renewals but only 48.6% of actual churners, so it is somewhat better at recognising renewing customers than customers at risk. Overall, the model provides directional insights but would benefit from further refinement to improve its predictive reliability.

Comparison of Model 1 (Training Data) and Model 2 (Test Data)

The same classification tree was evaluated on the training data (Model 1) and on unseen test data (Model 2). The training results are better on every evaluation measure, indicating that the model fits the training data more closely than it fits unseen data. Overall accuracy is 62% on the training data compared with 51% on the test data. For customers predicted to renew, precision is 62% on the training data against 52% on the test data, and for predicted non-renewals it is 63% against 51%. Recall shows the same pattern: 64% against 54% for actual renewals, and 60% against 49% for actual non-renewals. Overall, this indicates that the model loses effectiveness when applied to new customers, suggesting overfitting, and that it would benefit from additional refinement or more diverse training data to improve generalisation.
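The gap between training and test performance can be quantified directly from the two confusion matrices. The Python sketch below is illustrative only; the function names are hypothetical, and the majority-class baseline is an additional reference point introduced here, not a figure from the original analysis.

```python
# Confusion matrices copied from the outputs above
# (rows = actual No/Yes, columns = predicted No/Yes).
train_matrix = [[257, 169], [154, 270]]
test_matrix = [[36, 38], [35, 41]]


def accuracy(matrix):
    """Share of customers on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def majority_baseline(matrix):
    """Accuracy of always predicting the most common actual class."""
    total = sum(sum(row) for row in matrix)
    return max(sum(row) for row in matrix) / total


train_acc = accuracy(train_matrix)
test_acc = accuracy(test_matrix)
generalisation_gap = train_acc - test_acc

# How far the test accuracy sits above a naive "always predict the
# larger class" rule (an assumption added for context).
test_baseline = majority_baseline(test_matrix)
lift_over_baseline = test_acc - test_baseline

print(f"training accuracy:  {train_acc:.3f}")
print(f"test accuracy:      {test_acc:.3f}")
print(f"gap:                {generalisation_gap:.3f}")
print(f"test baseline:      {test_baseline:.3f}")
print(f"lift over baseline: {lift_over_baseline:.3f}")
```

A test accuracy that barely exceeds the majority-class baseline is a useful warning that the tree’s splits may not carry over to new customers, whatever the training figures suggest.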

Marketing recommendations

Based on the tree model, the company needs to prioritise new customers with tenure below 140 days, as they have the highest level of churn. This can be achieved by managing the onboarding process through personalised reinforcement emails. Engagement plans for the first 90 days are also key to reinforcing onboarding. Marketing teams can also be proactive by reaching out to low-engagement customers early. Discounts and incentives can be used to cement early commitment and usage. Low-spending customers should also be targeted, as they have a very high churn rate of 76%. Customers spending below 182 require value-boosting measures that encourage them to spend more. The company can send tailored offers to this segment to encourage higher spending. Product education will also be key, as it encourages product usage by giving customers more information about benefits, features and savings, while upselling promotions can help customers increase their spending. The company should also focus on the quality of contact with customers who are unlikely to renew, as more contacts may not guarantee renewal. Relevant, personalised communication, rather than simply more messages, can yield results. Follow-ups when customer activity falls and optimised support are key to reducing churn. Optimisation can be achieved through shorter response times to queries and more proactive assistance.

Age is an important determinant of churn, hence the need for the organisation to prioritise contacts and promotions according to age. Older customers aged 61 years and above represent one of the strongest segments in the tree, with a 67% renewal probability. Senior loyalty programs, such as exclusive offers, personalised reminders and simple sign-up processes, can maintain or improve renewal. This group requires careful handling through warm, human-style communication and reassurance. Senior citizens tend to be more concerned about stability and trust, hence the need for reassurance in all processes.

High-tenure customers with over 140 days should also be nurtured, as they show a 61% probability of renewal. This can be achieved by creating special offers for this segment. High-tenure customers can be good brand advocates, hence the introduction of refer-a-friend incentives and upselling. The organisation also needs to introduce segment-specific strategies, as the tree shows that contacts are key to contract renewal. New customers would require a few highly relevant messages, onboarding and education, and simple sign-up processes. High-tenure customers could benefit from frequent touchpoints, with seniors requiring high-touch communication to reinforce loyalty.

Cluster Analysis

Computing Euclidean distances on the scaled data

The hierarchical clustering dendrogram indicates the presence of clearly differentiated user segments within the music subscription customer base. Large vertical gaps at higher linkage heights suggest strong dissimilarity between major groups, confirming that customer behaviour is not homogeneous. The structure supports the existence of approximately three to five meaningful clusters, reflecting distinct consumption patterns. Lower-level clusters show tightly grouped users with similar behaviours, while higher-level merges only occur at substantial dissimilarity, highlighting stable and well-defined segments. This implies that customers differ significantly from one segment to another.
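The distances underlying a dendrogram of this kind are typically computed after standardising each variable, so that spend (measured in hundreds) does not dominate age or contact counts. The following Python sketch illustrates that step under stated assumptions: the six rows are copied from the training-data preview shown earlier in this report, the choice of the five numeric columns is a guess at what was clustered, and the function names are hypothetical. The original analysis appears to have been run in R on the full data set.

```python
import math
import statistics

# Six customers copied from the training-data preview earlier in the report.
# Which columns were actually used for clustering is an assumption.
columns = ["num_contacts", "contact_recency", "spend", "lor", "age"]
customers = {
    187: [0, 28, 213, 248, 45],
    269: [1, 12, 425, 82, 60],
    376: [0, 28, 0, 15, 53],
    400: [1, 11, 0, 12, 44],
    679: [0, 28, 216, 300, 68],
    565: [0, 28, 425, 349, 68],
}


def zscore_columns(rows):
    """Standardise each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(col) for col in cols]
    sds = [statistics.stdev(col) for col in cols]
    return [
        [(value - m) / s if s > 0 else 0.0
         for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


ids = list(customers)
scaled = zscore_columns([customers[i] for i in ids])

# Pairwise Euclidean distances between the standardised rows.
distances = {
    (a, b): math.dist(scaled[i], scaled[j])
    for i, a in enumerate(ids)
    for j, b in enumerate(ids)
}

for a in ids:
    print(a, " ".join(f"{distances[(a, b)]:5.2f}" for b in ids))
```

A hierarchical clustering routine would then merge the closest customers or groups step by step, and the heights at which merges occur are what the dendrogram displays.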

The heatmap reveals three major customer clusters, visible as distinct darker blocks along the diagonal, indicating groups of subscribers with high internal similarity and clear differences between groups. The largest cluster represents a mainstream segment with broadly similar but moderately varied consumption patterns, suggesting regular but average use of the music subscription. A second, smaller and more compact cluster is noticeably darker, indicating a highly cohesive group of customers whose behaviours are very similar, likely reflecting highly engaged or loyal customers. The third and smallest cluster appears lighter and more dispersed, suggesting more varied and less consistent behaviour, characteristic of casual or exploratory customers. The clear separation between these blocks, supported by the hierarchical structure surrounding the heatmap, confirms that these three segments are meaningfully different and validates the use of a three-cluster solution to inform targeted engagement, retention, and personalisation strategies.

The number of clusters used in this scenario will be 3.

Quality of the segmentation

Silhouette of 840 units in 3 clusters from silhouette.default(x = chencluster1, dist = chen_8) :
 Cluster sizes and average silhouette widths:
      403       275       162 
0.2258527 0.1582097 0.4083880 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.3307  0.1453  0.2765  0.2389  0.3645  0.5474 

The silhouette analysis evaluates how well subscribers have been grouped into three clusters based on similarity of behaviour. Overall, the average silhouette width of 0.24 indicates a modest level of separation between the clusters. This suggests that, while customer behaviours overlap to a considerable extent, the clustering still captures some real and actionable differences across user groups, as also reflected by the median silhouette width of 0.28.

Looking at the clusters individually, Cluster 3, with 162 customers, performs the strongest, with the highest average silhouette width of 0.41, indicating that subscribers in this group are relatively similar to one another and more clearly different from users in other clusters. This cluster likely represents a well-defined subscriber segment, such as highly engaged or loyal customers, making it a strong candidate for targeted retention. Cluster 1, the largest group with 403 customers, shows a lower silhouette width of 0.23, suggesting a broad and diverse segment with shared characteristics but some internal variation, typical of a mainstream or mixed-usage subscriber group. With 275 customers, Cluster 2 has the weakest separation at 0.16, implying that users in this group overlap more with other clusters and may represent transitional or less clearly defined listening behaviours.

Negative silhouette values indicate that a small number of subscribers do not fit neatly into any single cluster, which is expected in behavioural data and does not undermine the overall segmentation. The results confirm that the three-cluster solution is usable for strategic decision-making, particularly for identifying one strongly defined high-value segment alongside two broader behavioural groups. However, it also suggests that further refinement—such as adding behavioural variables or testing an additional cluster—could improve differentiation if more granular targeting is required.
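The overall mean silhouette width reported in the output is the size-weighted average of the three cluster averages, which makes it easy to see how much each cluster contributes. The Python sketch below checks that relationship; it is illustrative only, the variable names are assumptions, and the figures are copied from the silhouette summary above.

```python
# Cluster sizes and average silhouette widths copied from the summary above.
cluster_sizes = [403, 275, 162]
avg_widths = [0.2258527, 0.1582097, 0.4083880]

n_customers = sum(cluster_sizes)

# Size-weighted mean of the per-cluster averages.
overall_width = sum(
    size * width for size, width in zip(cluster_sizes, avg_widths)
) / n_customers

# Share of customers in each cluster.
cluster_shares = [size / n_customers for size in cluster_sizes]

print(f"customers clustered: {n_customers}")
print(f"overall mean silhouette width: {overall_width:.4f}")
for index, (share, width) in enumerate(zip(cluster_shares, avg_widths), start=1):
    print(f"cluster {index}: {share:.1%} of customers, average width {width:.2f}")
```

Because the two larger clusters hold about four fifths of the customers and have the lower widths, they pull the overall figure down even though the smallest cluster is well separated.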

Clusters and Number of customers
cluster num_customers
C1 403
C2 275
C3 162

The clustering results identify three distinct customer segments with meaningful differences in size. Cluster 1 (C1) is the largest group, comprising 403 customers, indicating it represents the core or mainstream segment of the customer base. Cluster 2 (C2) includes 275 customers, forming a substantial but smaller segment that likely reflects a more defined behavioural pattern, such as higher or more consistent engagement. Cluster 3 (C3) is the smallest group with 162 customers, suggesting a niche segment with more specialised or distinct behaviour. The distribution shows that while most customers fall into the mainstream cluster, a significant proportion belongs to smaller, more differentiated segments, reinforcing the value of segmented strategies rather than a one-size-fits-all approach.
