Retention Data and Customer Intelligence - Round 1

MINGLIANG WEI
23/10/2019

Our strategy is to identify the customers who are most likely to leave and invite them to an event in order to prevent them from churning. First, I examined the data and found that it contains many missing values.

streamraw=read.csv("Retention_train.csv")   # load the training data
summary(streamraw)                          # inspect distributions and count NAs
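
For a direct per-column count of the missing values, one can also run:

colSums(is.na(streamraw))   # number of NAs in each column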

Missing values would mislead the regression, so we impute each NA with a value consistent with the business meaning of the variable (the average, maximum, or median, depending on the context), and we convert the categorical variables to factor format. For example:

streamraw$timeSinceLastTechProb[is.na(streamraw$timeSinceLastTechProb)]=100   # NA likely means no recorded problem, so impute a large value
streamraw$minutesVoice[is.na(streamraw$minutesVoice)]=200                     # impute a typical usage level
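
The factor conversion itself is abridged in the original chunk; a minimal sketch (the column name planType is only an illustrative assumption):

streamraw$planType = as.factor(streamraw$planType)   # convert a categorical column to factor format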

We add the artificial variable 'Freq', the number of people using the plan within each family, to capture conformity (herd) behaviour.
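
The construction code is not shown here; a sketch of one way to build it, assuming the raw data carries a household identifier (hypothetically named familyID):

library(dplyr)
streamraw = streamraw %>%
  group_by(familyID) %>%   # group customers by family
  mutate(Freq = n()) %>%   # Freq = number of plan users in that family
  ungroup()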

summary(streamraw$Freq)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.163   1.000   5.000

Separate the data into a training set and a validation set.
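
The split itself is not shown; a sketch assuming a random 70/30 split (the proportion and the seed are assumptions):

set.seed(2019)                                               # for reproducibility
idx = sample(nrow(streamraw), floor(0.7 * nrow(streamraw)))
train = streamraw[idx, ]       # 70% used to fit the models
validate = streamraw[-idx, ]   # 30% held out for validation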

Fit the binary logistic regression to obtain mod1:

mod1=glm(churnIn3Month~.,family="binomial", data=train)   # logistic regression of churn on all predictors

Predict on the validation set using the model fitted on the training data:

p1=predict(mod1,newdata=validate,type="response")         # predicted churn probabilities
cbind(p1,validate)[sort.list(p1,decreasing=TRUE)[1:2],]   # the two highest-risk customers

We define predict_correction_p1 as the quality index of our model: among the 10,000 validation customers with the highest predicted churn probabilities, the proportion who actually churn.

cbind1=cbind(p1,validate)[sort.list(p1,decreasing=TRUE)[1:10000],]   # top 10,000 by predicted risk
predict_correction_p1=sum(cbind1$churnIn3Month)/10000                # share of true churners among them
predict_correction_p1
## [1] 0.0712

We obtain mod2 by applying stepwise selection to mod1:

mod2=step(mod1,trace=FALSE)   # stepwise (AIC) selection starting from mod1
p2=predict(mod2,newdata=validate,type="response")
predict_correction_p2=sum(validate$churnIn3Month[sort.list(p2,decreasing=TRUE)[1:10000]])/10000   # same index as for mod1
predict_correction_p2
## [1] 0.071

Using a backward-selection algorithm, we obtain another model, named "bestp":
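
The code for this step is hidden; a minimal sketch, assuming step() restricted to backward elimination and the same precision index as above:

bestp = step(mod1, direction = "backward", trace = FALSE)
p_best = predict(bestp, newdata = validate, type = "response")
predict_correction_bestp = sum(validate$churnIn3Month[sort.list(p_best, decreasing = TRUE)[1:10000]]) / 10000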

predict_correction_bestp
## [1] 0.0755

Taking the correlations between the variables into account, we obtain mod3:
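
This chunk is also hidden; a plausible sketch (which variable gets dropped is purely illustrative) inspects pairwise correlations and refits without a redundant predictor:

num_cols = sapply(train, is.numeric)
round(cor(train[, num_cols], use = "pairwise.complete.obs"), 2)   # look for highly correlated pairs

mod3 = glm(churnIn3Month ~ . - phonePrice, family = "binomial", data = train)   # phonePrice dropped only as an example
p3 = predict(mod3, newdata = validate, type = "response")
predict_correction_p3 = sum(validate$churnIn3Month[sort.list(p3, decreasing = TRUE)[1:10000]]) / 10000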

predict_correction_p3
## [1] 0.0728

We can also compare the ROC curves (via their AUC values) of all the models above.
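
The ROC computation is hidden; a minimal sketch assuming the pROC package, with the model-to-value mapping inferred from the conclusion below:

library(pROC)
auc(roc(validate$churnIn3Month, p1))       # mod1
auc(roc(validate$churnIn3Month, p2))       # mod2
auc(roc(validate$churnIn3Month, p_best))   # bestp
auc(roc(validate$churnIn3Month, p3))       # mod3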

## [1] 0.7214807
## [1] 0.7213935
## [1] 0.721159

## [1] 0.6568115

The best AUC we obtain is 0.7214807, from mod1, so we choose mod1 as our final model.

leaving_rate=sum(streamraw$churnIn3Month)/nrow(streamraw)   # overall churn rate in the data
leaving_rate
## [1] 0.02714581

However, the precision index of this model is too small for a good binary classifier. Checking the whole dataset, as above, gives the churn rate across all clients.

#### Modifying Strategy

a. Only about 2.7% of customers leave within three months, the highest churn probability mod1 assigns is 12.4%, and the precision index of our chosen model is only 7.12%. We therefore have little confidence in pinpointing who will leave within three months, and since few customers churn whether or not we invite them, we want to invite as few clients as possible.
b. Moreover, we cannot be sure that an invitation to our dinner event will change a customer's mind about leaving. We therefore modify the strategy: instead of inviting the customers with the highest churn probabilities, we rank customers by the expected monetary loss per person, which takes each customer's potential value into account.
c. We use the following equations to calculate the potential value of each customer:
\(\text{PotentialValue}=\text{baseMonthlyRateForPlan}+\sqrt{\text{baseMonthlyRateForPhone}+\text{cashDown}+\text{phonePrice}+\text{phoneBalance}}\)
\(\text{ExpectedLoss}=\text{PotentialValue}\times\text{ChurnProbability}\)

Process the score data, use the equations above to estimate the expected loss for each customer, and select the 8,000 customers with the largest values. We restrict the selection to customers whose Freq is 1 because, by the conformity argument, those under the least conformity pressure are the most likely to leave.
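
The loading and scoring of the score set are hidden; a sketch of what presumably precedes the next chunk (the file name and the preprocessing step are assumptions inferred from the code below):

score = read.csv("Retention_score.csv")   # assumed file name for the score set
# apply the same NA imputation and factor conversions as for the training data, then:
p1_score = predict(mod1, newdata = score, type = "response")   # churn probability per customer
p1_score_cbind = cbind(p1_score, score)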

p1_score_cbind=p1_score_cbind%>%
  mutate(expectation_value_p1=(baseMonthlyRateForPlan+(baseMonthlyRateForPhone+cashDown+phonePrice+phoneBalance)^0.5)*p1_score)   # expected loss per customer
p1_score_cbind=p1_score_cbind[sort.list(p1_score_cbind$expectation_value_p1,decreasing=TRUE)[1:nrow(p1_score_cbind)],]%>%
  filter(Freq<=1)   # sort by expected loss, keep only single-user families


head(p1_score_cbind)[1:2,]   # the two customers with the largest expected loss