Assignment 1 Adv Analytics 2
The following is an analysis of Spotify's customer subscription renewals.
Question 1.
Bar chart showing the relationship between Renewals and Gender
Here we can see that men are 10% more likely to renew their subscription than women. This means that Spotify should:
Continue their relationship with men
Nourish their relationship with women through targeted marketing
Boxplot showing the relationship between Renewals and Spend
Those who choose to renew their subscription have a higher median spend, of over €400, while those who do not renew have a median spend closer to €300.
Boxplot showing the relationship between Renewals and LOR (Length of Relationship)
Here we can see that there is a correlation between the length of a relationship with a customer and their odds of renewing their membership. Subscribers with a longer relationship are more likely to remain with Spotify.
Boxplot showing the relationship between Renewals and Number of Contacts
Here we can see that users who have been contacted by Spotify renew their subscription more often than those who have not. Spotify may see an increase in renewals if they:
Contact users to remind them to renew their contract
Offer users special deals and incentives to continue with their subscription
Boxplot showing the relationship between Renewals and Age
Here we can see that the older a customer is, the more likely they are to renew their contract. This could be due to the following:
Users not using internet banking and not keeping track of their subscriptions
Users having a higher disposable income and being able to afford a premium version
Renewals among younger customers could be increased by offering incentives to students and young adults
Clustered bar chart showing the relationship between Renewals and Contact Recency
Those who were contacted in the last 30 days were much more likely to churn, in comparison to those who were contacted 2 months prior. This could be due to the following:
Users are being reminded that they are subscribed and therefore, unsubscribe from the service
Users may be frustrated by the number of times they are being contacted
Question 2.
Classification Tree
Here we can see that if a user has a Length of Relationship of more than 140 days, they have a 61% chance of resubscribing. This leaf covers 45% of the database and splits directly from the root node. It is not a pure node, since 39% of its customers do not renew.
If a user has a Length of Relationship of less than 140 days and a spend of less than €182, they are more likely to churn, with a 76% chance. This leaf covers 10% of the database. It is also not pure, since 24% of its customers renew.
The following is a table of the share of the database each node has:
Rule | % of database |
---|---|
LOR < 140, Spend < 182 | 10% |
LOR < 140, Spend > 182, Num Contacts < 7.5, Age < 61 | 39% |
LOR < 140, Spend > 182, Num Contacts < 7.5, Age > 61 | 3% |
LOR < 140, Spend > 182, Num Contacts > 7.5 | 3% |
LOR > 140 | 45% |
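The rules in the table can also be read as a chain of if-statements. The following is a minimal Python sketch of that reading, not the fitted tree itself. The function name `leaf_for` is made up for illustration, and sending a value that sits exactly on a threshold down the "greater than" branch is an assumption, as the report does not say how ties are handled.

```python
def leaf_for(lor, spend, num_contacts, age):
    """Return (rule, share of database) for one customer.

    Thresholds and shares are copied from the table above.
    Tie handling (>= goes to the "greater than" branch) is an assumption.
    """
    if lor >= 140:
        return ("LOR > 140", 0.45)
    if spend < 182:
        return ("LOR < 140, Spend < 182", 0.10)
    if num_contacts >= 7.5:
        return ("LOR < 140, Spend > 182, Num Contacts > 7.5", 0.03)
    if age < 61:
        return ("LOR < 140, Spend > 182, Num Contacts < 7.5, Age < 61", 0.39)
    return ("LOR < 140, Spend > 182, Num Contacts < 7.5, Age > 61", 0.03)


# Hypothetical customer, for illustration only.
rule, share = leaf_for(lor=90, spend=250, num_contacts=3, age=35)
print(rule, share)
```

The five shares add up to 100%, so every customer falls into exactly one rule.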
Training Dataset
Actual \ Predicted | No | Yes |
---|---|---|
No | 257 | 169 |
Yes | 154 | 270 |
From the above confusion matrix, where Yes means the customer renewed, we know the following:
The overall model accuracy is (257 + 270)/850 = 0.62, or 62%.
Of all customers the model predicted would renew, 270/439 = 0.62, or 62%, did renew.
Of all customers the model predicted would not renew, 257/411 = 0.63, or 63%, did not renew.
Of all customers who did renew, the model correctly identified 270/424 = 0.64, or 64%.
Of all customers who did not renew, the model correctly identified 257/426 = 0.60, or 60%.
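The five figures above all come from the same four cells, so they can be checked with a small helper. This is a minimal Python sketch. The function name and the dictionary keys are made up for illustration, and "Yes" is treated as the positive class.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Summary metrics for a 2x2 confusion matrix with "Yes" as the positive class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "correct_among_predicted_yes": tp / (fp + tp),
        "correct_among_predicted_no": tn / (tn + fn),
        "correct_among_actual_yes": tp / (fn + tp),
        "correct_among_actual_no": tn / (tn + fp),
    }


# Training matrix above: rows are actual, columns are predicted.
metrics = confusion_metrics(tn=257, fp=169, fn=154, tp=270)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The same function can be called with the cells of any of the other confusion matrices in this report.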
Testing Dataset
Actual \ Predicted | No | Yes |
---|---|---|
No | 36 | 38 |
Yes | 35 | 41 |
From the above confusion matrix, where Yes means the customer renewed, we know the following:
The overall model accuracy is (36 + 41)/150 = 0.51, or 51%.
Of all customers the model predicted would renew, 41/79 = 0.52, or 52%, did renew.
Of all customers the model predicted would not renew, 36/71 = 0.51, or 51%, did not renew.
Of all customers who did renew, the model correctly identified 41/76 = 0.54, or 54%.
Of all customers who did not renew, the model correctly identified 36/74 = 0.49, or 49%.
The tree is overfitting: accuracy falls from 62% on the training dataset to 51% on the testing dataset, so the tree would need to be pruned.
Question 3.
Part i
[1] "No" "Yes"
Call:
glm(formula = renewed ~ num_contacts + contact_recency + num_complaints +
spend + lor + gender + age, family = binomial(link = "logit"),
data = sub_training)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3077726 0.3889894 -3.362 0.000774 ***
num_contacts 0.0400276 0.0250176 1.600 0.109604
contact_recency -0.0061870 0.0085374 -0.725 0.468641
num_complaints 0.0463282 0.0575359 0.805 0.420701
spend 0.0004400 0.0005652 0.779 0.436256
lor 0.0026467 0.0007908 3.347 0.000818 ***
genderMale 0.4179377 0.1498906 2.788 0.005299 **
age 0.0098297 0.0064033 1.535 0.124762
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1178.3 on 849 degrees of freedom
Residual deviance: 1125.4 on 842 degrees of freedom
AIC: 1141.4
Number of Fisher Scoring iterations: 4
Part ii
(A)
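Here, we can see that the genderFemale variable has been omitted. Gender is categorical, so it is dummy-coded: Female is the reference level and is absorbed into the intercept, and genderMale measures the difference for men relative to women.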
(B)
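log(p / (1 - p)) = -1.3078 + (0.0400 x num_contacts) + (-0.0062 x contact_recency) + (0.0463 x num_complaints) + (0.0004 x spend) + (0.0026 x lor) + (0.4179 x genderMale) + (0.0098 x age), where p is the probability that a customer renews. The coefficients are given to four decimal places because spend and lor round to 0.00 at two.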
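The equation above gives a value on the log-odds scale. Applying the inverse logit turns it into a predicted probability of renewal. The following is a minimal Python sketch using the estimates from the glm output. The customer is hypothetical, and treating "Yes" (renewed) as the modelled outcome is an assumption based on the factor levels printed in Part i.

```python
import math

# Estimates copied from the glm output above.
COEFS = {
    "intercept": -1.3077726,
    "num_contacts": 0.0400276,
    "contact_recency": -0.0061870,
    "num_complaints": 0.0463282,
    "spend": 0.0004400,
    "lor": 0.0026467,
    "genderMale": 0.4179377,
    "age": 0.0098297,
}


def renewal_probability(customer):
    """Inverse logit of the linear predictor for one customer (dict of predictor values)."""
    log_odds = COEFS["intercept"] + sum(
        COEFS[name] * value for name, value in customer.items()
    )
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical customer, not taken from the dataset.
customer = {
    "num_contacts": 5, "contact_recency": 10, "num_complaints": 0,
    "spend": 400, "lor": 150, "genderMale": 1, "age": 40,
}
print(round(renewal_probability(customer), 2))
```

A probability above 0.5 would be classified as a renewal under the usual 0.5 cut-off, though the report does not state which cut-off was used.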
(C)
The variables genderMale (p = 0.005) and lor (p = 0.0008) are significant in predicting whether or not a user will renew, as both have a p-value below 0.05. The remaining predictors are not significant at the 5% level.
(D)
Number of Contacts, Number of Complaints, Spend, Length of Relationship, Male Gender and Age all have positive coefficients, meaning that they increase the chance of a person re-subscribing. Contact Recency is the only variable with a negative coefficient, meaning that it decreases the chance of a person re-subscribing.
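The direction of each effect can be checked by exponentiating the coefficients, which gives odds ratios: a value above 1 raises the odds of renewal and a value below 1 lowers them. The following is a minimal Python sketch using the estimates from the glm output in Part i. The variable names are made up for illustration.

```python
import math

# Estimates copied from the glm output in Part i (intercept left out).
coefs = {
    "num_contacts": 0.0400276,
    "contact_recency": -0.0061870,
    "num_complaints": 0.0463282,
    "spend": 0.0004400,
    "lor": 0.0026467,
    "genderMale": 0.4179377,
    "age": 0.0098297,
}

# exp(coefficient) is the multiplicative change in the odds of renewal
# for a one-unit increase in the predictor, holding the others fixed.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio {ratio:.3f} ({direction} the odds of renewal)")
```

For genderMale the odds ratio is roughly 1.52, so the model estimates that men have about 52% higher odds of renewing than women with otherwise identical values.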
Part iii
Training Dataset
Actual \ Predicted | No | Yes |
---|---|---|
No | 279 | 147 |
Yes | 176 | 248 |
From the above confusion matrix, where Yes means the customer renewed, we know the following:
The overall model accuracy is (279 + 248)/850 = 0.62, or 62%.
Of all customers the model predicted would renew, 248/395 = 0.63, or 63%, did renew.
Of all customers the model predicted would not renew, 279/455 = 0.61, or 61%, did not renew.
Of all customers who did renew, the model correctly identified 248/424 = 0.58, or 58%.
Of all customers who did not renew, the model correctly identified 279/426 = 0.65, or 65%.
Testing Dataset
Actual \ Predicted | No | Yes |
---|---|---|
No | 38 | 36 |
Yes | 41 | 35 |
From the above confusion matrix, where Yes means the customer renewed, we know the following:
The overall model accuracy is (38 + 35)/150 = 0.49, or 49%.
Of all customers the model predicted would renew, 35/71 = 0.49, or 49%, did renew.
Of all customers the model predicted would not renew, 38/79 = 0.48, or 48%, did not renew.
Of all customers who did renew, the model correctly identified 35/76 = 0.46, or 46%.
Of all customers who did not renew, the model correctly identified 38/74 = 0.51, or 51%.
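These results are not the same as for the Training Dataset: training accuracy is (279 + 248)/850 = 62%, while testing accuracy is 49%, so the logistic regression also performs much worse on unseen data. Pruning applies to classification trees only and is not available for this model.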
Question 4.
Part i
The testing dataset accuracy of the classification tree model is 51%, while the accuracy of the binary logistic regression model testing dataset is 49%.
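With only 150 customers in the testing dataset, a two-point gap in accuracy is small. One rough way to gauge this is a normal-approximation 95% interval around each accuracy figure. This check is not part of the original analysis. The following is a minimal Python sketch, with the function name and the choice of interval both being assumptions made for illustration.

```python
import math


def accuracy_interval(correct, total, z=1.96):
    """Normal-approximation interval (95% by default) for an accuracy estimate."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


# Correct predictions are the diagonal cells of each testing confusion matrix.
tree_low, tree_high = accuracy_interval(36 + 41, 150)
logit_low, logit_high = accuracy_interval(38 + 35, 150)

print(f"Classification tree: {tree_low:.2f} to {tree_high:.2f}")
print(f"Logistic regression: {logit_low:.2f} to {logit_high:.2f}")
```

Both intervals are about eight percentage points wide on each side and both include 50%, so on this testing dataset the two models cannot be clearly separated from each other or from guessing.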
Part ii
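We can gather that the classification tree model should be used in this instance, as its testing accuracy is slightly higher (51% against 49%). The margin is small and both figures are close to 50%, so neither model is yet reliable. The accuracy of the tree may later be improved by the process of “pruning”. Pruning does not apply to the logistic regression, which would instead need to be refined in another way, for example by removing non-significant predictors. This recommendation may change once both models have been refined.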
Part iii
(A)
Among customers with a Length of Relationship of less than 140 days, those who spent less than €182 have a higher chance of churning.
Those who have been contacted fewer than 7.5 times have a higher chance of churning.
Of those who have been contacted fewer than 7.5 times, those who are younger than 61 years old have a 59% chance of churning.
(B)
By lengthening the relationship with customers, Spotify can enjoy a higher rate of renewals. This can be done by promoting longer-term contracts, for example by discounting a 6-month subscription.
Lowering prices may increase both the Length of Relationship and Total Spend. The tree suggests that once a user's Total Spend passes €182, they are more likely to renew their subscription.
Keeping in contact with customers will nourish relationships. Customers contacted 8 or more times have a higher chance of renewal. This could be done by sending personalised push notifications, such as music recommendations and new releases, or by sending email newsletters with similar content.
Spotify could cater to a younger audience through targeted marketing, such as interactive games (e.g. surveys that build a user a personalised playlist), or by partnering with acts that are popular with young people to offer early ticket releases and exclusive content.