Introduction

Data analysis should be reproducible, meaning that every step taken to manipulate, clean, transform, summarize, visualize, or model data is documented exactly, so that the results can be replicated. RMarkdown is a tool (more specifically, a document type) for doing reproducible data science: it keeps the code for a project together with the written analysis and interpretation.

This is an RMarkdown template that you can use for calculating answers to the project quiz questions for this module. You will also knit this document to HTML (or Word) and submit it for the File Upload assignment.

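RMarkdown uses a very simple markup language. Rather than using a menu to format text, as in MS Word, you type simple markup directly into the text outside the code chunks. For example, wrapping a word in single asterisks italicizes it, wrapping it in double asterisks makes it bold, and starting a line with # turns it into a heading.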

Notes on compiling this document

In the code chunk above (named “setup”), echo is set to TRUE. This means that the code in your chunks will be displayed, along with its results, in your compiled document.
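For reference, a setup chunk that turns echo on for the whole document usually contains a single knitr call like the one below. This is a minimal sketch of the standard knitr idiom; the options in your own template may differ.

```r
# Set default options for every chunk in the document.
# echo = TRUE shows each chunk's code alongside its results when knitting.
knitr::opts_chunk$set(echo = TRUE)

# A single chunk can still override the default in its header, e.g. {r, echo = FALSE}.
# Check the current default:
knitr::opts_chunk$get("echo")  # TRUE
```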

Load and Transform Data

Below is code to clean and prepare the dataset for modeling. Before running that code, follow these preparatory steps:

  1. After downloading the RMarkdown template and the dataset for the assignment from Canvas, copy or move both files from your Downloads folder to a folder dedicated to this class (say, MKTG-6487).
  2. Define that folder as your “working directory.” To do so, navigate to the folder using the Files tab in the lower-right pane of RStudio. (You should see the files you moved there in the previous step.) Click the “More” button in the menu under the Files tab and select “Set As Working Directory.”
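If you want to confirm from code that the working directory is set correctly, a small check like the following can help. The function check_project_files() is hypothetical (it is not part of any package); it only uses base R's getwd(), file.path(), and file.exists().

```r
# Hypothetical helper: report whether the expected files are in a folder
# (by default, the current working directory).
check_project_files <- function(files = "adviseinvest.csv", dir = getwd()) {
  missing <- files[!file.exists(file.path(dir, files))]
  if (length(missing) > 0) {
    message("Not found in ", dir, ": ", paste(missing, collapse = ", "))
  }
  length(missing) == 0
}

getwd()                # the folder R currently reads files from
check_project_files()  # TRUE once adviseinvest.csv is in that folder
```

RStudio's “Set As Working Directory” runs setwd() on the selected folder, so calling setwd() in the console is an equivalent alternative.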

Once the files are in the right location on your computer, run this code to clean and format the data:

# You must run this code to format the dataset properly!
library(tidyverse)  # provides read_csv(), %>%, select(), filter(), and mutate()

# The working directory should already be your class folder (step 2 above).
# If it is not, uncomment the next line and change the path to match your computer:
# setwd("/Users/txharris/Desktop/MKTG 6487")

advise_invest <- read_csv("adviseinvest.csv") %>%            # Read the data and save it (via assignment operator)
  select(-product) %>%                                        # Remove the product column
  filter(income > 0,                                          # Filter out mistaken data
         num_accts < 5) %>%                                   #   (keep positive income, fewer than 5 accounts)
  mutate(answered = ifelse(answered == 0, "no", "yes"),       # Turn answered into yes/no
         answered = factor(answered,                          # Turn answered into a factor
                           levels = c("no", "yes")),          # Specify factor levels
         female = factor(female),                             # Make the other binary and categorical
         job = factor(job),                                   #   variables into factors
         rent = factor(rent),
         own_res = factor(own_res),
         new_car = factor(new_car),
         mobile = factor(mobile),
         chk_acct = factor(chk_acct),
         sav_acct = factor(sav_acct))
## Rows: 29502 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (14): answered, income, female, age, job, num_dependents, rent, own_res,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
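To see what the filter and factor steps do without the real file, here is a sketch on a made-up four-row data frame (the values are hypothetical, not from adviseinvest.csv), written in base R so it runs without the tidyverse. It also shows why levels = c("no", "yes") is specified: the factor keeps both levels, in that order, even when one of them is absent from the data.

```r
# Made-up rows for illustration only
toy <- data.frame(
  answered  = c(1, 0, 1, 1),
  income    = c(52000, 0, 38000, 71000),
  num_accts = c(2, 1, 6, 3)
)

# Same filter as the pipeline: keep positive income and fewer than 5 accounts
toy <- toy[toy$income > 0 & toy$num_accts < 5, ]

# Same recoding: 0/1 becomes "no"/"yes", with the level order fixed explicitly
toy$answered <- factor(ifelse(toy$answered == 0, "no", "yes"),
                       levels = c("no", "yes"))

nrow(toy)             # 2 rows survive the filter
levels(toy$answered)  # "no" "yes"
table(toy$answered)   # "no" is still a level, with a count of 0
```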

Questions

Use the code chunks below to write code that will enable you to answer the questions in the project quiz.

Some of the questions do not require writing code and have been omitted from this template.

Q2.

# Count the calls in each class
answered_calls <- sum(advise_invest$answered == "yes")
unanswered_calls <- sum(advise_invest$answered == "no")

cat("Number of Answered Calls:", answered_calls, "\n")
## Number of Answered Calls: 16124
cat("Number of Unanswered Calls:", unanswered_calls, "\n")
## Number of Unanswered Calls: 13375
# Majority class classifier: always predict the more common class
accuracy_majority_class <- max(answered_calls, unanswered_calls) /
  (answered_calls + unanswered_calls)
accuracy_majority_class <- round(accuracy_majority_class, 3)

cat("The accuracy of the majority class classifier is:", accuracy_majority_class, "\n")
## The accuracy of the majority class classifier is: 0.547
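The Q2 calculation can be wrapped in a small reusable function. This is a sketch; majority_class_accuracy() is a hypothetical name. Taking max() of the class counts, rather than the count of "yes", keeps it correct whichever class happens to be the majority.

```r
# Hypothetical helper: accuracy of always predicting the most common class
majority_class_accuracy <- function(y) {
  counts <- table(y)          # number of observations in each class
  max(counts) / sum(counts)   # share of the largest class
}

# Check against the counts printed above (16124 yes, 13375 no)
calls <- rep(c("yes", "no"), times = c(16124, 13375))
round(majority_class_accuracy(calls), 3)  # 0.547
```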

Q3.

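library(rpart)  # classification trees

# Fit a classification tree with income as the only predictor
income_model <- rpart(answered ~ income, data = advise_invest)
# Predict class labels ("no"/"yes") for the same rows used to fit the model
predictions <- predict(income_model, advise_invest, type = "class")

# Accuracy = proportion of predictions that match the actual outcome (in-sample)
accuracy_income_model <- mean(predictions == advise_invest$answered)
accuracy_income_model <- round(accuracy_income_model, 3)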

cat("Accuracy of the income model:", accuracy_income_model, "\n")
## Accuracy of the income model: 0.642
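Accuracy is the share of observations whose predicted class matches the actual class. The sketch below fits a one-predictor tree on made-up data (hypothetical values, not the real dataset) and computes accuracy two equivalent ways: the mean of matches, as in the Q3 chunk, and the diagonal of a confusion matrix divided by its total. As in Q3, predictions are made on the rows used to fit the model, so this is in-sample accuracy.

```r
library(rpart)

# Made-up data: 40 lower-income rows that did not answer, 40 higher-income rows that did
toy <- data.frame(income = c(seq(10000, 49000, by = 1000),
                             seq(60000, 99000, by = 1000)))
toy$answered <- factor(ifelse(toy$income > 55000, "yes", "no"),
                       levels = c("no", "yes"))

fit  <- rpart(answered ~ income, data = toy)
pred <- predict(fit, toy, type = "class")

# Way 1: proportion of matching labels
acc_mean <- mean(pred == toy$answered)

# Way 2: correct predictions sit on the diagonal of the confusion matrix
conf_mat  <- table(predicted = pred, actual = toy$answered)
acc_table <- sum(diag(conf_mat)) / sum(conf_mat)

conf_mat
c(acc_mean, acc_table)  # both 1 here, because income separates the classes perfectly
```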

Q4.

Q5.

library(rpart)       # classification trees
# Fit a tree on every predictor except mobile
tree_model <- rpart(answered ~ . - mobile, data = advise_invest)
library(rpart.plot)  # prp() for plotting trees
library(caret)       # varImp() for variable importance
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
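# Save the tree to a large PDF so the node labels stay legible
tweak_value <- 1.5                                    # enlarge the text in prp()
pdf("tree_model_plot.pdf", width = 15, height = 10)   # open a PDF device (size in inches)
prp(tree_model, tweak = tweak_value)
dev.off()                                             # close the device and write the file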
## quartz_off_screen 
##                 2
# Importance score for each predictor (higher = more important)
variable_importance <- varImp(tree_model)
print(variable_importance)
##                   Overall
## age            2164.84704
## chk_acct       2800.30216
## female          509.65339
## income         3895.99382
## job            1151.78956
## new_car        1123.21411
## num_accts       893.11606
## num_dependents   25.00611
## own_res         357.53904
## rent            403.31879
## sav_acct       1290.75719
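Two details of the Q5 chunk can be checked on small examples. First, the formula answered ~ . - mobile means "model answered on every other column except mobile", and base R's terms() shows how it expands. Second, the importance table above is listed alphabetically, so sorting by the Overall column makes the ranking easier to read. The toy data frame and the helper rank_importance() are hypothetical; the importance values are copied from the output above.

```r
# Toy data frame with hypothetical values, used only to expand the formula
toy <- data.frame(answered = factor(c("no", "yes")), income = c(40000, 60000),
                  mobile = factor(c(0, 1)), age = c(35, 52))
attr(terms(answered ~ . - mobile, data = toy), "term.labels")  # "income" "age"

# Hypothetical helper: order a varImp-style data frame from most to least important
rank_importance <- function(imp) {
  imp[order(imp$Overall, decreasing = TRUE), , drop = FALSE]
}

# Values copied from the varImp() output above
imp <- data.frame(
  Overall = c(2164.84704, 2800.30216, 509.65339, 3895.99382, 1151.78956,
              1123.21411, 893.11606, 25.00611, 357.53904, 403.31879, 1290.75719),
  row.names = c("age", "chk_acct", "female", "income", "job", "new_car",
                "num_accts", "num_dependents", "own_res", "rent", "sav_acct")
)
head(rank_importance(imp), 3)  # income, chk_acct, age
```

In the Q5 chunk itself, the same helper could be applied directly as rank_importance(variable_importance).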

Q6.

library(rpart)

# Income-only model and its in-sample accuracy
income_model <- rpart(answered ~ income, data = advise_invest)
income_predictions <- predict(income_model, advise_invest, type = "class")
accuracy_income_model <- mean(income_predictions == advise_invest$answered)

# Tree on every predictor except mobile and its in-sample accuracy
tree_model <- rpart(answered ~ . - mobile, data = advise_invest)
tree_predictions <- predict(tree_model, advise_invest, type = "class")
accuracy_tree_model <- mean(tree_predictions == advise_invest$answered)

cat("Accuracy of the income model:", accuracy_income_model, "\n")
## Accuracy of the income model: 0.6420218
cat("Accuracy of the tree model:", accuracy_tree_model, "\n")
## Accuracy of the tree model: 0.8047391
if (accuracy_tree_model > accuracy_income_model) {
  cat("The tree model is better\n")
} else if (accuracy_tree_model < accuracy_income_model) {
  cat("The income model is better\n")
} else {
  cat("The two models are equally accurate\n")
}
## The tree model is better
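The if/else comparison in Q6 works for two models. A sketch of a more general version stores the accuracies in a named vector and returns every model tied for the highest value. The helper best_models() is hypothetical; the accuracies are the ones printed above.

```r
# Hypothetical helper: name(s) of the model(s) with the highest accuracy (ties return all)
best_models <- function(accuracies) {
  names(accuracies)[accuracies == max(accuracies)]
}

# Accuracies printed above
acc <- c(income = 0.6420218, tree = 0.8047391)
best_models(acc)                  # "tree"
best_models(c(a = 0.7, b = 0.7))  # "a" "b" (a tie)
```

Inside the Q6 chunk, this could be called as best_models(c(income = accuracy_income_model, tree = accuracy_tree_model)).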