Developing Data Products

Project Overview

This project uses the Churn dataset to build and deploy a machine learning model that predicts customer churn. The application offers two versions of the model: the user can keep the pre-trained model or update it so that the updated model incorporates the information in a new dataset. The project is implemented in R Shiny and will subsequently be made available online.

Data Pre-processing

From the given dataset, we created a CSV generator that randomly splits the data 50-50 and writes the CSV files used by the application. The first 50% of the data is used to train “Version 1” of the prediction model, while the second 50% is used to train the updated “Version 2” model.

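library(RWeka)
library(caTools)

# churndata is assumed to be loaded already, with the target column named Churn
set.seed(123)  # same seed as load_and_train_model(), so the split is reproducible

#Version 1 Data
# sample.split() keeps the Churn class proportions in both halves
sample <- sample.split(churndata$Churn, SplitRatio = 0.5)
ChurnTrain <- subset(churndata, sample == TRUE)
# The test file drops the Churn column, which the model has to predict
ChurnTest <- subset(churndata, sample == FALSE, select = -Churn)

write.csv(ChurnTest, "ChurnTest.csv", row.names = FALSE)
write.csv(ChurnTrain, "ChurnTrain.csv", row.names = FALSE)

#Version 2 Data
# A second, independent random split of the same data
sample_v2 <- sample.split(churndata$Churn, SplitRatio = 0.5)
ChurnTrain_v2 <- subset(churndata, sample_v2 == TRUE)
ChurnTest_v2 <- subset(churndata, sample_v2 == FALSE, select = -Churn)

write.csv(ChurnTest_v2, "ChurnTest_v2.csv", row.names = FALSE)
write.csv(ChurnTrain_v2, "ChurnTrain_v2.csv", row.names = FALSE)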
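The description above calls for two halves of the data, one per model version. Two independent random splits of the same data generally share rows, so the following is a minimal base R sketch of one way to obtain two disjoint, class-balanced halves. It runs on a toy data frame; the names toy, idx_v1, train_v1 and train_v2 are hypothetical and do not come from the project code.

```r
set.seed(123)  # fixed seed so the split is reproducible

# Toy stand-in for the churn data (hypothetical values, two columns only)
toy <- data.frame(
  DayMins = runif(20, 0, 300),
  Churn   = factor(rep(c("False", "True"), times = c(14, 6)))
)

# Draw half of the row indices within each Churn class (a stratified split)
idx_v1 <- unlist(
  lapply(split(seq_len(nrow(toy)), toy$Churn),
         function(rows) sample(rows, length(rows) %/% 2)),
  use.names = FALSE
)

train_v1 <- toy[idx_v1, ]    # half used for "Version 1"
train_v2 <- toy[-idx_v1, ]   # remaining rows, used for "Version 2"

nrow(train_v1)  # 10
nrow(train_v2)  # 10
```

Because train_v2 is defined as the complement of train_v1, no row can appear in both halves, and each half keeps the 7-to-3 class ratio of the toy data.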

Creation of Functions

We then created two model functions, load_and_train_model and train_and_create_model, for the Version 1 and Version 2 models respectively. Both functions build the model with Random Forest, with the number of features tried at each split, m, set to 2*sqrt(p) in both cases, as this setting gave the best results. In addition, we created an evaluate_model function that uses k = 5 folds for cross-validation and a model_summary function that displays the summary of the model.

library(RWeka)
library(caTools)

load_and_train_model <- function() {
    churndata <- read.csv("churndata.csv", stringsAsFactors = TRUE)
    new_column_names <- c("AccountLength", "IntlPlan", "VMailPlan",
                          "VMailMessage", "DayMins", "DayCalls", "DayCharge",
                          "EveMins", "EveCalls", "EveCharge", "NightMins",
                          "NightCalls", "NightCharge", "IntlMins", "IntlCalls",
                          "IntlCharge", "CustServCalls", "Churn")
    colnames(churndata) <- new_column_names
    
    set.seed(123)
    
    sample <- sample.split(churndata$Churn, SplitRatio = 0.5)
    ChurnTrain <- subset(churndata, sample == TRUE)
    
    # Weka's RandomForest; K is the number of features tried at each split
    RF <- make_Weka_classifier("weka/classifiers/trees/RandomForest")
    rfmodel <- RF(Churn ~ ., data = ChurnTrain,
                  control = Weka_control(K=floor(2*sqrt(18))))
    
    return(rfmodel)
}

train_and_create_model <- function(training_data) {
    new_column_names <- c("AccountLength", "IntlPlan", "VMailPlan",
                          "VMailMessage", "DayMins", "DayCalls", "DayCharge",
                          "EveMins", "EveCalls", "EveCharge", "NightMins",
                          "NightCalls", "NightCharge", "IntlMins", "IntlCalls",
                          "IntlCharge", "CustServCalls", "Churn")
    colnames(training_data) <- new_column_names
    
    # Same Random Forest settings as Version 1, trained on the uploaded data
    RF <- make_Weka_classifier("weka/classifiers/trees/RandomForest")
    rfmodel <- RF(Churn ~ ., data = training_data,
                  control = Weka_control(K=floor(2*sqrt(18))))
    return(rfmodel)
}

# numFolds = 5 requests 5-fold cross-validation; class = TRUE adds per-class statistics
evaluate_model <- function(model, test_data) {
    evaluation_results <- evaluate_Weka_classifier(
        model, newdata = test_data,
        numFolds = 5, class = TRUE, seed = 1
    )
    return(evaluation_results)
}

model_summary <- function(model){
    return(summary(model))
}
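As a worked check of the m = 2*sqrt(p) setting, the sketch below computes the value both from the 18 columns used in the functions above and from the 17 predictors that remain once Churn is treated as the response. The variable names are illustrative only.

```r
n_columns    <- 18             # columns in the renamed dataset, including Churn
p_predictors <- n_columns - 1  # Churn is the response, leaving 17 predictors

m_from_columns    <- floor(2 * sqrt(n_columns))     # expression passed as K above
m_from_predictors <- floor(2 * sqrt(p_predictors))  # same rule applied to predictors only

m_from_columns     # 8
m_from_predictors  # 8
```

Both expressions round down to 8, so the choice between 18 and 17 does not change the model for this dataset.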

Churn Prediction Application Features

The application lets the user choose between two versions of the model. Either version can accept new data and predict churn for the new rows. The versions differ as follows:

Version 1 - a trained model is already available, and the user can upload new data to test it.
Version 2 - the user can upload new data to train the model and upload test data to predict new rows.

The application has the following general functions:

Upload Test and Train Data Button - Lets the user input data to the models via CSV upload
Train and Predict Button - Lets the user train the model and predict churn
Clear Button - Lets the user clear the reactive values
Export Predictions as CSV Button - Lets the user export the predicted data as CSV
Prediction Table Tab - Displays the head of the dataset with predicted churn values
Model Summary - Displays the summary of the model
Evaluation Results - Displays the performance of the model on the new data, computed with the Weka classifier evaluation
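The controls listed above could be wired together roughly as follows. This is a minimal Shiny sketch, not the deployed application: the input and output IDs (test_file, predict_btn, clear_btn, export_btn, prediction_table) are hypothetical, and the prediction step is a placeholder where the real app would call the trained model.

```r
library(shiny)

# Hypothetical UI: one upload control, the three buttons, and the prediction table
ui <- fluidPage(
  titlePanel("Churn Prediction"),
  sidebarLayout(
    sidebarPanel(
      fileInput("test_file", "Upload Test Data (CSV)", accept = ".csv"),
      actionButton("predict_btn", "Train and Predict"),
      actionButton("clear_btn", "Clear"),
      downloadButton("export_btn", "Export Predictions as CSV")
    ),
    mainPanel(tableOutput("prediction_table"))
  )
)

server <- function(input, output, session) {
  # Reactive store that the Clear button resets
  values <- reactiveValues(predictions = NULL)

  observeEvent(input$predict_btn, {
    req(input$test_file)  # do nothing until a file has been uploaded
    new_data <- read.csv(input$test_file$datapath, stringsAsFactors = TRUE)
    # Placeholder: a real app would fill this column with predict() on the model
    new_data$PredictedChurn <- NA
    values$predictions <- new_data
  })

  observeEvent(input$clear_btn, {
    values$predictions <- NULL
  })

  # Show only the head of the predictions, as in the Prediction Table tab
  output$prediction_table <- renderTable({
    req(values$predictions)
    head(values$predictions)
  })

  output$export_btn <- downloadHandler(
    filename = function() "churn_predictions.csv",
    content = function(file) {
      write.csv(values$predictions, file, row.names = FALSE)
    }
  )
}

# shinyApp(ui, server)  # uncomment to launch the app locally
```

Keeping the predictions in reactiveValues means the table and the export both read from a single source, and clearing that one value resets the display.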

Application Demo

This section demonstrates how the application works and the information it provides to the user. The demo is provided for each version of the model.

Version 1

The prediction table shows the head of the dataset with the corresponding predicted churn values.

The application also provides a model summary and an evaluation of the model.

Version 2