Project Overview

This project uses a churn dataset to build and deploy a machine learning model that predicts churn. The application offers two versions of the model: the user can keep the original model or update it so that it incorporates the information in a new dataset. The project is implemented in R Shiny and deployed online.

Data Pre-processing

From the given dataset, we created a CSV generator that randomly splits the data 50-50 and writes CSV files for use in the application. The first 50% of the data is used to train “Version 1” of the prediction model, while the second 50% is used to update the model to “Version 2”.

library(RWeka)
library(caTools)

# churndata is assumed to be loaded already, with the renamed columns
# (including Churn) used in load_and_train_model() below.

# Version 1 data: 50-50 split that preserves the Churn label ratio
sample <- sample.split(churndata$Churn, SplitRatio = 0.5)
ChurnTrain <- subset(churndata, sample == TRUE)
# The test file drops the Churn column, so it holds predictors only
ChurnTest <- subset(churndata, sample == FALSE, select = -Churn)

write.csv(ChurnTest, "ChurnTest.csv", row.names = FALSE)
write.csv(ChurnTrain, "ChurnTrain.csv", row.names = FALSE)

# Version 2 data: a second, independent 50-50 split of the same churndata
sample_v2 <- sample.split(churndata$Churn, SplitRatio = 0.5)
ChurnTrain_v2 <- subset(churndata, sample_v2 == TRUE)
ChurnTest_v2 <- subset(churndata, sample_v2 == FALSE, select = -Churn)

write.csv(ChurnTest_v2, "ChurnTest_v2.csv", row.names = FALSE)
write.csv(ChurnTrain_v2, "ChurnTrain_v2.csv", row.names = FALSE)
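
Note that the listing above draws two independent random splits of the full dataset, so the Version 1 and Version 2 training sets can share rows. If non-overlapping halves are wanted, as the description suggests, one option is to split once and give the complement to Version 2. The sketch below is only an illustration under stated assumptions: it uses base R, a small made-up data frame (toy_churn is hypothetical and stands in for churndata), and a plain random split that, unlike sample.split, does not preserve the Churn label ratio.

```r
# Hypothetical toy data standing in for churndata (not the real dataset).
set.seed(123)  # fixed seed so the split is reproducible
toy_churn <- data.frame(
  CustServCalls = sample(0:9, 20, replace = TRUE),
  Churn = factor(sample(c("False", "True"), 20, replace = TRUE))
)

# Pick half of the row indices at random for Version 1 ...
n <- nrow(toy_churn)
v1_rows <- sample(seq_len(n), size = floor(n / 2))
# ... and give every remaining row to Version 2, so the halves never overlap.
v2_rows <- setdiff(seq_len(n), v1_rows)

train_v1 <- toy_churn[v1_rows, ]
train_v2 <- toy_churn[v2_rows, ]
```

With this layout every row lands in exactly one of the two training sets, so the Version 2 update would only see data the Version 1 model was not trained on.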

Creation of Functions

We then created two model functions, load_and_train_model and train_and_create_model, for the Version 1 and Version 2 models respectively. Both functions build a Random Forest model with m (the number of attributes randomly considered at each split, passed to Weka as K) set to floor(2*sqrt(p)), with p = 18 in the code, as this gave the most optimized result. We also created an evaluate_model function that uses k = 5 folds for cross-validation and a model_summary function that displays the summary of the model.

library(RWeka)
library(caTools)

load_and_train_model <- function() {
    churndata <- read.csv("churndata.csv", stringsAsFactors = TRUE)
    new_column_names <- c("AccountLength", "IntlPlan", "VMailPlan",
                          "VMailMessage", "DayMins", "DayCalls", "DayCharge",
                          "EveMins", "EveCalls", "EveCharge", "NightMins",
                          "NightCalls", "NightCharge", "IntlMins", "IntlCalls",
                          "IntlCharge", "CustServCalls", "Churn")
    colnames(churndata) <- new_column_names
    
    # Fix the seed so the 50-50 split below is reproducible
    set.seed(123)
    
    sample <- sample.split(churndata$Churn, SplitRatio = 0.5)
    ChurnTrain <- subset(churndata, sample == TRUE)
    
    # K is the number of attributes Weka considers at each split
    RF <- make_Weka_classifier("weka/classifiers/trees/RandomForest")
    rfmodel <- RF(Churn ~ ., data = ChurnTrain,
                  control = Weka_control(K=floor(2*sqrt(18))))
    
    return(rfmodel)
}

train_and_create_model <- function(training_data) {
    new_column_names <- c("AccountLength", "IntlPlan", "VMailPlan",
                          "VMailMessage", "DayMins", "DayCalls", "DayCharge",
                          "EveMins", "EveCalls", "EveCharge", "NightMins",
                          "NightCalls", "NightCharge", "IntlMins", "IntlCalls",
                          "IntlCharge", "CustServCalls", "Churn")
    colnames(training_data) <- new_column_names
    
    RF <- make_Weka_classifier("weka/classifiers/trees/RandomForest")
    rfmodel <- RF(Churn ~ ., data = training_data,
                  control = Weka_control(K=floor(2*sqrt(18))))
    return(rfmodel)
}

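evaluate_model <- function(model, test_data) {
    # numFolds = 5 requests 5 cross-validation folds;
    # class = TRUE adds per-class statistics to the output
    evaluation_results <- evaluate_Weka_classifier(
        model, newdata = test_data,
        numFolds = 5, class = TRUE, seed = 1
    )
    return(evaluation_results)
}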

model_summary <- function(model){
    return(summary(model))
}
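
As a quick check of the 2*sqrt(p) setting, the snippet below evaluates the expression for p = 18 (the value hard-coded in the functions above, which counts all 18 columns including Churn) and for p = 17 (the predictors alone). Both floor to 8, so the K passed to Weka_control is 8 either way. The variable names are illustrative only and do not appear in the project code.

```r
p_all_columns <- 18   # columns in the renamed dataset, including Churn
p_predictors  <- 17   # the same columns without the Churn target

# Expression used in the functions above
m_from_code <- floor(2 * sqrt(p_all_columns))
# Same rule applied to the predictors only
m_from_predictors <- floor(2 * sqrt(p_predictors))

print(c(m_from_code, m_from_predictors))
```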

Churn Prediction Application Features

The application lets the user choose between two versions of the model. It accepts new data and can use either version to predict the new rows. Each model has its own specifications:

The application has the following general functions:

Application Demo

This section demonstrates how the application works and the information it provides to the user. The demo is provided for each version of the model.

Version 1

The prediction table shows the head of the dataset with the corresponding churn values.

The application also provides a model summary and an evaluation of the model to the user.

Version 2

The prediction table shows the head of the dataset with the corresponding churn values.

The application also provides a model summary and an evaluation of the model to the user.

To test our Churn Prediction Application, you may also visit the link below:

https://qyu1db-keana0francheska-bautista.shinyapps.io/developingDataProducts_Sy_Bautista/