ANALISIS KETERLIBATAN PELANGGAN KONTEKS Menggunakan Watson Analytics, dapat memprediksi perilakau untuk mempertahankan pelanggan,dapat menganalisisis semua data pelanggan yang relevan dan mengembangkan program retensi pelanggan yang terfokus.
SUBJEK Data memiliki 9.134 catatan pelanggan degan 24 variabel, semua pelanggan dlam data memiliki kebijakan yang kedaluwarsa anatara 1 januari hingga 28 febuari 2011
Tugas dalam bisnis Temukan apa yang memengaruhi keterlibatan pelanggan dan berikan rekomendasi yang dapat ditidaklanjuti untuk bisnis.
Pendakatan analisis data 1. AKan melihat data(EDA) secara mendatail untuk memahami titik data dan menemukan beberapa pola 2. Kemudian saya akan melakukan analisis regresi(regresi logistik) dan menginterpretasikan hasil nya 3. akan memberikan beberapa rekomendasi bisnis
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
data <- read.csv("WA_Fn-UseC_-Marketing-Customer-Value-Analysis.csv", header = T, sep = ",")
data <- data %>%
mutate(Response=ifelse(Response=="No",
0,1))
data$Response <- as.integer(data$Response)
data <- data %>%
rename(Engaged=Response)
EngagementRate <- data %>%
group_by(Engaged) %>%
summarize(count=n()) %>%
mutate(Percentage=count/nrow(data)*100)
transposed <- t(EngagementRate)
colnames(transposed) <- EngagementRate$Engaged
transposed <- transposed[-1,]
transposed
## 0 1
## count 7826.00000 1308.00000
## Percentage 85.67988 14.32012
SalesChannel <- data %>%
group_by(Engaged, Sales.Channel) %>%
summarize(Count=n()) %>%
arrange(Sales.Channel)
## `summarise()` has grouped output by 'Engaged'. You can override using the `.groups` argument.
options(repr.plot.width = 16, repr.plot.height = 8)
ggplot(SalesChannel, aes(x="", y=Count, fill=Sales.Channel)) +
geom_bar(stat = "identity", position = position_fill()) +
geom_text(aes(x=1.25, label=Count), position = position_fill(vjust=0.5)) +
coord_polar("y") + facet_wrap(~Engaged) +
ggtitle("Sales Channel (0: Not Engaged, 1: Engaged)") +
theme(legend.position = "bottom", legend.text=element_text(size=15), plot.title = element_text(size=15))
Sepertinya yang didapat bisa dilihat diagram lingkaran diatas, lebih dari setengah pelanggan yang terlibat berasal dari dari agent, sementara pelanggan yang tidak terlibat didistribusikan secara lebih merata di 4 chanel (saluaran).
conversionsState <- data %>%
group_by(State) %>%
summarize(TotalCount=n(), NumConversions=sum(Engaged)) %>%
mutate(ConversionRate=NumConversions/TotalCount*100)
conversionsState
## # A tibble: 5 x 4
## State TotalCount NumConversions ConversionRate
## <chr> <int> <int> <dbl>
## 1 Arizona 1703 243 14.3
## 2 California 3150 456 14.5
## 3 Nevada 882 124 14.1
## 4 Oregon 2601 376 14.5
## 5 Washington 798 109 13.7
Pelanggan dari Washington terlihat sedikit lebih sedikit dari pada yang lain.
conversionsEdu <- data %>%
group_by(Education) %>%
summarize(TotalCount=n(), NumConversions=sum(Engaged)) %>%
mutate(ConversionRate=NumConversions/TotalCount*100)
ggplot(conversionsEdu, aes(x=Education, y=ConversionRate)) +
geom_bar(width=0.5, stat = "identity", fill="darkgreen") +
labs(title="Conversion Rates by Education") +
theme(axis.text.x = element_text(size=10),axis.text.y = element_text(size=10),
axis.title = element_text(size = 12), plot.title = element_text(size=12))
Customers dengan gelar Doctoral dan Master cenderung libih terlibat.
conversionsEmp <- data %>%
group_by(EmploymentStatus) %>%
summarize(TotalCount=n(), NumConversions=sum(Engaged)) %>%
mutate(ConversionRate=NumConversions/TotalCount*100)
ggplot(conversionsEmp, aes(x=EmploymentStatus, y=ConversionRate)) +
geom_bar(width=0.5, stat = "identity", fill="darkgreen") +
labs(title="Conversion Rates by Employment Status") +
theme(axis.text.x = element_text(size=10),axis.text.y = element_text(size=10),
axis.title = element_text(size = 12), plot.title = element_text(size=12))
Tingkat keterlibatan di antara pelanggan pensiunan (Retried) jauh lebih tinggi dibandingkan dengan kelompok lain
#Jenis Kelamin(gender)
conversionsGender <- data %>%
group_by(Gender) %>%
summarize(TotalCount=n(), NumConversions=sum(Engaged)) %>%
mutate(ConversionRate=NumConversions/TotalCount*100)
ggplot(conversionsGender, aes(x=Gender, y=ConversionRate)) +
geom_bar(width=0.2, stat = "identity", fill="darkgreen") +
labs(title="Conversion Rates by Gender") +
theme(axis.text.x = element_text(size=18),axis.text.y = element_text(size=18),
axis.title = element_text(size = 20), plot.title = element_text(size=22))
saya tidak melihat perbedaan yang signifikan dalam tingkta keterlibatan anatara pelanggan Pria(F) dan Wanita(M).
#location code (kode lokasi)
conversionsLocation <- data %>%
group_by(Location.Code) %>%
summarize(TotalCount=n(), NumConversions=sum(Engaged)) %>%
mutate(ConversionRate=NumConversions/TotalCount*100)
ggplot(conversionsLocation, aes(x=Location.Code, y=ConversionRate)) +
geom_bar(width=0.5, stat = "identity", fill="darkgreen") +
labs(title="Conversion Rates by Location Code") +
theme(axis.text.x = element_text(size=12),axis.text.y = element_text(size=12),
axis.title = element_text(size = 10), plot.title = element_text(size=10))
Pelanggan dengan kode lokasi “(Suburban(pinggiran Kota)” memiliki tingkat keterlibatan 2 kali lebih tinggi.
#marital status (status pernikahan)
conversionsMarital <- data %>%
group_by(Marital.Status) %>%
summarize(TotalCount=n(), NumConversions=sum(Engaged)) %>%
mutate(ConversionRate=NumConversions/TotalCount*100)
ggplot(conversionsMarital, aes(x=Marital.Status, y=ConversionRate)) +
geom_bar(width=0.5, stat = "identity", fill="darkgreen") +
labs(title="Conversion Rates by Marital Status") +
theme(axis.text.x = element_text(size=10),axis.text.y = element_text(size=10),
axis.title = element_text(size = 12), plot.title = element_text(size=12))
Customers yang bercerai (Divorced) lebih mungkin untuk terlibat
Pertama saya akan membagun model regresi logistik dengan variabel kontinu(numeric)
#Membuat Kolom Numeric
continuous <-select_if(data, is.numeric)
#Menggunakan regresi dengan variabel kontinyu
logit.fit <- glm(Engaged ~ ., data=continuous, family=binomial)
summary(logit.fit)
##
## Call:
## glm(formula = Engaged ~ ., family = binomial, data = continuous)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7629 -0.5704 -0.5477 -0.5216 2.1018
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.787e+00 1.234e-01 -14.476 <2e-16 ***
## Customer.Lifetime.Value -6.327e-06 4.863e-06 -1.301 0.1933
## Income 2.042e-06 1.092e-06 1.869 0.0616 .
## Monthly.Premium.Auto -1.194e-04 1.226e-03 -0.097 0.9224
## Months.Since.Last.Claim -4.489e-03 2.987e-03 -1.503 0.1329
## Months.Since.Policy.Inception 2.125e-04 1.073e-03 0.198 0.8429
## Number.of.Open.Complaints -3.257e-02 3.379e-02 -0.964 0.3351
## Number.of.Policies -2.443e-02 1.283e-02 -1.904 0.0569 .
## Total.Claim.Amount 2.772e-04 1.463e-04 1.895 0.0581 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 7503.3 on 9133 degrees of freedom
## Residual deviance: 7488.1 on 9125 degrees of freedom
## AIC: 7506.1
##
## Number of Fisher Scoring iterations: 4
Keterangan Pr(>|Z|) adalah nilai-p- seberapa besar kemungkinan hubungan antara fitur yang dibeikan dan variabel keluaran secara kebetulan. P-value kuarn dari 0,05 menandakan hubungan yang kuat.
Kolom Estimasi memberikan nilai yang dihitung untuk masing-masing koefisin fitur
Singnif, codes tambahan menunjukkan seberapa kuat hubungan itu. ***- hubungan terkuat dengan nilai-p di 0. # Analisis Regression (kategory variabel)
logit.fit <-glm(Engaged ~ factor(Education) + factor(Gender)+ factor(State) +
factor(Coverage) + factor(EmploymentStatus) +
factor(Location.Code) + factor(Marital.Status)+
factor(Policy.Type) + factor(Policy) + factor(Renew.Offer.Type) +
factor(Sales.Channel) + factor(Vehicle.Class) +
factor(Vehicle.Size), data = data, family=binomial)
summary(logit.fit)
##
## Call:
## glm(formula = Engaged ~ factor(Education) + factor(Gender) +
## factor(State) + factor(Coverage) + factor(EmploymentStatus) +
## factor(Location.Code) + factor(Marital.Status) + factor(Policy.Type) +
## factor(Policy) + factor(Renew.Offer.Type) + factor(Sales.Channel) +
## factor(Vehicle.Class) + factor(Vehicle.Size), family = binomial,
## data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3536 -0.5663 -0.3824 -0.0002 2.9734
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.28578 0.28029 -4.587 4.49e-06 ***
## factor(Education)College 0.18150 0.08717 2.082 0.03732 *
## factor(Education)Doctor 0.52311 0.17000 3.077 0.00209 **
## factor(Education)High School or Below -0.05552 0.08994 -0.617 0.53702
## factor(Education)Master 0.42176 0.12899 3.270 0.00108 **
## factor(Gender)M -0.02268 0.06747 -0.336 0.73672
## factor(State)California 0.02374 0.09662 0.246 0.80590
## factor(State)Nevada 0.02611 0.13344 0.196 0.84484
## factor(State)Oregon 0.04786 0.09971 0.480 0.63127
## factor(State)Washington -0.05978 0.13829 -0.432 0.66557
## factor(Coverage)Extended -0.09753 0.07602 -1.283 0.19952
## factor(Coverage)Premium 0.03894 0.11576 0.336 0.73661
## factor(EmploymentStatus)Employed -0.05497 0.14892 -0.369 0.71206
## factor(EmploymentStatus)Medical Leave 0.08322 0.19264 0.432 0.66573
## factor(EmploymentStatus)Retired 2.51577 0.21179 11.878 < 2e-16 ***
## factor(EmploymentStatus)Unemployed -0.83143 0.16391 -5.073 3.93e-07 ***
## factor(Location.Code)Suburban 0.88022 0.10148 8.674 < 2e-16 ***
## factor(Location.Code)Urban -0.10899 0.12657 -0.861 0.38920
## factor(Marital.Status)Married -0.57490 0.08869 -6.482 9.04e-11 ***
## factor(Marital.Status)Single -0.62305 0.10556 -5.902 3.58e-09 ***
## factor(Policy.Type)Personal Auto -0.13820 0.18110 -0.763 0.44538
## factor(Policy.Type)Special Auto 0.37689 0.29143 1.293 0.19593
## factor(Policy)Corporate L2 0.16800 0.21442 0.784 0.43332
## factor(Policy)Corporate L3 0.04132 0.19840 0.208 0.83501
## factor(Policy)Personal L1 0.17985 0.10497 1.713 0.08665 .
## factor(Policy)Personal L2 0.11534 0.08918 1.293 0.19589
## factor(Policy)Personal L3 NA NA NA NA
## factor(Policy)Special L1 -0.14664 0.42131 -0.348 0.72781
## factor(Policy)Special L2 -0.46370 0.35087 -1.322 0.18631
## factor(Policy)Special L3 NA NA NA NA
## factor(Renew.Offer.Type)Offer2 0.73436 0.07259 10.117 < 2e-16 ***
## factor(Renew.Offer.Type)Offer3 -2.09144 0.19709 -10.611 < 2e-16 ***
## factor(Renew.Offer.Type)Offer4 -16.73604 195.05340 -0.086 0.93162
## factor(Sales.Channel)Branch -0.65541 0.08449 -7.757 8.71e-15 ***
## factor(Sales.Channel)Call Center -0.62194 0.09679 -6.426 1.31e-10 ***
## factor(Sales.Channel)Web -0.58930 0.10780 -5.467 4.59e-08 ***
## factor(Vehicle.Class)Luxury Car -0.48034 0.32118 -1.496 0.13476
## factor(Vehicle.Class)Luxury SUV -0.02636 0.23617 -0.112 0.91112
## factor(Vehicle.Class)Sports Car 0.30064 0.14191 2.119 0.03413 *
## factor(Vehicle.Class)SUV 0.22591 0.08732 2.587 0.00968 **
## factor(Vehicle.Class)Two-Door Car 0.04737 0.08878 0.534 0.59360
## factor(Vehicle.Size)Medsize -0.26434 0.10443 -2.531 0.01137 *
## factor(Vehicle.Size)Small -0.65268 0.12722 -5.130 2.89e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 7503.3 on 9133 degrees of freedom
## Residual deviance: 5865.7 on 9093 degrees of freedom
## AIC: 5947.7
##
## Number of Fisher Scoring iterations: 17
Disini dapat melihat fitur-fitur yang secar signifkan mempengaruhi variabel keluaran Engaged(TOP5):
#combine continuous and categorical variables
continuous$Education <- factor(data$Education)
continuous$Gender <- factor(data$Gender)
continuous$State <- factor(data$State)
continuous$Coverage <- factor(data$Coverage)
continuous$EmploymentStatus <- factor(data$EmploymentStatus)
continuous$Location.Code <- factor(data$Location.Code)
continuous$Marital.Status <- factor(data$Marital.Status)
continuous$Policy.Type <- factor(data$Policy.Type)
continuous$Policy <- factor(data$Policy)
continuous$Renew.Offer.Type <- factor(data$Renew.Offer.Type)
continuous$Sales.Channel <- factor(data$Sales.Channel)
continuous$Vehicle.Class <- factor(data$Vehicle.Class)
continuous$Vehicle.Size <- factor(data$Vehicle.Size)
#fit regression model with continuous and categorical variables
logit.fit <- glm(Engaged ~ ., data=continuous, family=binomial)
summary(logit.fit)
##
## Call:
## glm(formula = Engaged ~ ., family = binomial, data = continuous)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3333 -0.5672 -0.3692 -0.0002 3.0309
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.610e+00 4.532e-01 -3.552 0.000382 ***
## Customer.Lifetime.Value -4.533e-06 5.415e-06 -0.837 0.402483
## Income 4.956e-06 1.933e-06 2.564 0.010349 *
## Monthly.Premium.Auto 5.235e-03 5.095e-03 1.027 0.304230
## Months.Since.Last.Claim -4.586e-03 3.402e-03 -1.348 0.177708
## Months.Since.Policy.Inception 2.733e-05 1.207e-03 0.023 0.981944
## Number.of.Open.Complaints -4.023e-02 3.814e-02 -1.055 0.291429
## Number.of.Policies -1.977e-02 1.424e-02 -1.389 0.164980
## Total.Claim.Amount -1.416e-03 2.837e-04 -4.993 5.95e-07 ***
## EducationCollege 1.682e-01 8.773e-02 1.918 0.055172 .
## EducationDoctor 5.233e-01 1.712e-01 3.057 0.002234 **
## EducationHigh School or Below -4.400e-02 9.038e-02 -0.487 0.626341
## EducationMaster 4.257e-01 1.294e-01 3.291 0.000998 ***
## GenderM -4.989e-03 6.789e-02 -0.073 0.941418
## StateCalifornia 2.114e-02 9.695e-02 0.218 0.827403
## StateNevada 1.830e-02 1.341e-01 0.136 0.891490
## StateOregon 3.701e-02 1.000e-01 0.370 0.711425
## StateWashington -6.045e-02 1.386e-01 -0.436 0.662773
## CoverageExtended -5.469e-02 1.266e-01 -0.432 0.665633
## CoveragePremium 1.206e-01 2.640e-01 0.457 0.647777
## EmploymentStatusEmployed -2.395e-01 1.628e-01 -1.471 0.141423
## EmploymentStatusMedical Leave 1.081e-01 1.933e-01 0.559 0.576052
## EmploymentStatusRetired 2.491e+00 2.119e-01 11.756 < 2e-16 ***
## EmploymentStatusUnemployed -6.388e-01 1.689e-01 -3.783 0.000155 ***
## Location.CodeSuburban 1.466e+00 1.498e-01 9.782 < 2e-16 ***
## Location.CodeUrban 2.145e-01 1.433e-01 1.497 0.134302
## Marital.StatusMarried -5.794e-01 8.893e-02 -6.515 7.27e-11 ***
## Marital.StatusSingle -5.594e-01 1.060e-01 -5.278 1.31e-07 ***
## Policy.TypePersonal Auto -1.432e-01 1.817e-01 -0.788 0.430552
## Policy.TypeSpecial Auto 3.522e-01 2.926e-01 1.203 0.228803
## PolicyCorporate L2 1.557e-01 2.153e-01 0.723 0.469563
## PolicyCorporate L3 3.594e-02 1.992e-01 0.180 0.856827
## PolicyPersonal L1 1.818e-01 1.054e-01 1.724 0.084779 .
## PolicyPersonal L2 1.164e-01 8.949e-02 1.300 0.193545
## PolicyPersonal L3 NA NA NA NA
## PolicySpecial L1 -8.557e-02 4.238e-01 -0.202 0.839994
## PolicySpecial L2 -4.412e-01 3.533e-01 -1.249 0.211738
## PolicySpecial L3 NA NA NA NA
## Renew.Offer.TypeOffer2 7.253e-01 7.408e-02 9.791 < 2e-16 ***
## Renew.Offer.TypeOffer3 -2.104e+00 1.982e-01 -10.615 < 2e-16 ***
## Renew.Offer.TypeOffer4 -1.676e+01 1.944e+02 -0.086 0.931283
## Sales.ChannelBranch -6.477e-01 8.478e-02 -7.640 2.17e-14 ***
## Sales.ChannelCall Center -6.123e-01 9.722e-02 -6.299 3.00e-10 ***
## Sales.ChannelWeb -5.796e-01 1.081e-01 -5.361 8.28e-08 ***
## Vehicle.ClassLuxury Car -1.334e-01 7.444e-01 -0.179 0.857740
## Vehicle.ClassLuxury SUV 2.433e-01 6.976e-01 0.349 0.727224
## Vehicle.ClassSports Car 3.764e-01 2.596e-01 1.450 0.146976
## Vehicle.ClassSUV 2.992e-01 2.303e-01 1.300 0.193772
## Vehicle.ClassTwo-Door Car 3.864e-02 8.917e-02 0.433 0.664785
## Vehicle.SizeMedsize -2.639e-01 1.050e-01 -2.512 0.011999 *
## Vehicle.SizeSmall -6.474e-01 1.278e-01 -5.064 4.10e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 7503.3 on 9133 degrees of freedom
## Residual deviance: 5823.1 on 9085 degrees of freedom
## AIC: 5921.1
##
## Number of Fisher Scoring iterations: 17
Analisis ini memberi saya fitur yang secara signifikan mempengaruhi variabel hasil terlibat.
1.Positive correlation:
2.Education Master
3.Employment Status Retired
4.Location.Code Suburban
5.Renew.Offer.TypeOffer2
6.Education Doctor
7.Income
8.Policy Personal L1
9.Education College
Negative correlation:
1.Total.Claim.Amount
2.Vehicle.Size Small
3.Marital.Status Single
4.Sales.Channel Web
5.Sales.Channel Call Center
6.Marital.Status Married
7.Sales.Channel Branch
8.Renew.Offer.TypeOffer3
9.Vehicle.Size Medsize
#Meringkas rekomendasi untuk bisnis Analisis regresi logistik memberikan daftar fitur yang secara positivly atau negativly berkorelasi dengan variabel Enagement. Apa yang bisa kami rekomendasikan berdasarkan data ini? Employment Status Unemployed
1.Pertama, saya sarankan menggunakan fitur-fitur ini untuk membagi basis data pelanggan yang ada dan fokus pada kelompok yang lebih mungkin untuk terlibat dengan panggilan pemasaran (variabel berkorelasi positif). Itu akan membantu perusahaan meningkatkan penjualan tanpa biaya tambahan.
Terima kasih atas minat Anda dalam analisis saya!
Tolong beri tahu saya jika Anda memiliki pertanyaan atau rekomendasi untuk perbaikan.