Introduction

Recently, I’ve learned a few more machine learning techniques from Kaggle, so I might as well apply them here. The previous version used caret. This time I will use the rpart package.

Importing the libraries and the dataset

library(readxl)
library(rpart)
library(modelr)
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
candy_crush <- read_excel("D:/Working Directory/candy_crush.xlsx")

head(candy_crush)
## # A tibble: 6 x 5
##   player_id             dt                  level num_attempts num_success
##   <chr>                 <dttm>              <dbl>        <dbl>       <dbl>
## 1 6dd5af4c7228fa353d50~ 2014-01-04 00:00:00     4            3           1
## 2 c7ec97c39349ab7e4d39~ 2014-01-01 00:00:00     8            4           1
## 3 c7ec97c39349ab7e4d39~ 2014-01-05 00:00:00    12            6           0
## 4 a32c5e9700ed356dc8dd~ 2014-01-03 00:00:00    11            1           1
## 5 a32c5e9700ed356dc8dd~ 2014-01-07 00:00:00    15            6           0
## 6 b94d403ac4edf639442f~ 2014-01-01 00:00:00     8            8           1
summary(candy_crush)
##   player_id               dt                          level       
##  Length:16865       Min.   :2014-01-01 00:00:00   Min.   : 1.000  
##  Class :character   1st Qu.:2014-01-02 00:00:00   1st Qu.: 6.000  
##  Mode  :character   Median :2014-01-04 00:00:00   Median : 9.000  
##                     Mean   :2014-01-04 01:06:15   Mean   : 9.287  
##                     3rd Qu.:2014-01-06 00:00:00   3rd Qu.:14.000  
##                     Max.   :2014-01-07 00:00:00   Max.   :15.000  
##   num_attempts      num_success     
##  Min.   :  0.000   Min.   : 0.0000  
##  1st Qu.:  1.000   1st Qu.: 0.0000  
##  Median :  3.000   Median : 1.0000  
##  Mean   :  5.535   Mean   : 0.6272  
##  3rd Qu.:  7.000   3rd Qu.: 1.0000  
##  Max.   :258.000   Max.   :55.0000

Creating the model

Three columns matter here: level, num_attempts and num_success. They form the basis of the model, with num_success as the target and num_attempts and level as the predictors. Because num_success is numeric, rpart fits a regression tree.

fit_a <- rpart(num_success ~ num_attempts + level, data = candy_crush)

plot(fit_a, uniform = TRUE)
text(fit_a, cex = 0.6)
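
The plotted tree can be hard to read at small text sizes, so a text listing of the splits may help. The following is a minimal sketch, not the model fitted above: toy and fit_toy are hypothetical names, and the data is simulated with the same column names as candy_crush so the snippet runs without the Excel file.

```r
library(rpart)

# Simulated stand-in for candy_crush (assumption: values are made up,
# only the column names match the real dataset).
set.seed(42)
n <- 500
toy <- data.frame(
  level        = sample(1:15, n, replace = TRUE),
  num_attempts = rpois(n, lambda = 5)
)
# Success becomes less likely as the number of attempts grows.
toy$num_success <- rbinom(n, size = 1, prob = plogis(1 - 0.3 * toy$num_attempts))

# Same formula as fit_a; a numeric target gives a regression ("anova") tree.
fit_toy <- rpart(num_success ~ num_attempts + level, data = toy)

print(fit_toy)    # one line per node: split rule, n, deviance, mean of the target
printcp(fit_toy)  # complexity table: how the error changes with each split
```

Running print(fit_a) and printcp(fit_a) on the real model should give the same kind of listing for the tree plotted above.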

Making the prediction

print(predict(fit_a, head(candy_crush, 30)))
##         1         2         3         4         5         6         7 
## 0.9956124 0.5546383 0.5546383 0.5546383 0.3430181 0.5546383 0.5546383 
##         8         9        10        11        12        13        14 
## 0.5546383 0.9956124 0.3430181 0.3430181 0.5546383 0.3430181 0.3430181 
##        15        16        17        18        19        20        21 
## 0.3430181 0.3430181 0.3430181 0.5546383 0.5546383 0.5546383 0.5546383 
##        22        23        24        25        26        27        28 
## 0.5546383 0.9956124 0.5546383 0.3430181 0.3430181 0.5546383 0.5546383 
##        29        30 
## 0.8982387 0.5546383
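
Only a handful of distinct values appear in these predictions. That is expected from a regression tree: every row that lands in the same leaf gets the same prediction, which is the mean of the target over the training rows in that leaf. The sketch below checks this on simulated data; toy and fit_toy are hypothetical names and the values are not from candy_crush.

```r
library(rpart)

# Simulated stand-in for candy_crush (assumption: made-up values).
set.seed(42)
n <- 500
toy <- data.frame(
  level        = sample(1:15, n, replace = TRUE),
  num_attempts = rpois(n, lambda = 5)
)
toy$num_success <- rbinom(n, size = 1, prob = plogis(1 - 0.3 * toy$num_attempts))

fit_toy <- rpart(num_success ~ num_attempts + level, data = toy)
pred <- predict(fit_toy, toy)

# One distinct prediction per leaf.
print(sort(unique(pred)))

# fit_toy$where records the leaf each training row fell into;
# ave() returns the mean of num_success within each leaf, row by row.
leaf_mean <- ave(toy$num_success, fit_toy$where)

# The predictions should match the per-leaf means.
print(all.equal(unname(pred), leaf_mean))
```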

Getting the MAE

mae(model = fit_a, data = candy_crush)
## [1] 0.4431938
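
The MAE is the average absolute difference between the actual and predicted values, mean(|actual - predicted|). The figure above is computed on the same rows the tree was trained on, so it may be optimistic. A minimal sketch of a hold-out check follows, using simulated data and a hand-written helper; toy, mae_manual and the 70/30 split are assumptions for illustration, not part of the original analysis.

```r
library(rpart)

# Simulated stand-in for candy_crush (assumption: made-up values).
set.seed(42)
n <- 500
toy <- data.frame(
  level        = sample(1:15, n, replace = TRUE),
  num_attempts = rpois(n, lambda = 5)
)
toy$num_success <- rbinom(n, size = 1, prob = plogis(1 - 0.3 * toy$num_attempts))

# Hold out 30% of the rows for validation.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- toy[train_idx, ]
valid <- toy[-train_idx, ]

fit_train <- rpart(num_success ~ num_attempts + level, data = train)

# Hypothetical helper: mean absolute error of a model on a data frame.
mae_manual <- function(model, data) {
  mean(abs(data$num_success - predict(model, data)))
}

mae_train <- mae_manual(fit_train, train)
mae_valid <- mae_manual(fit_train, valid)
print(c(train = mae_train, validation = mae_valid))
```

If the validation MAE is clearly worse than the training MAE, the tree is probably overfitting and a larger cp or a shallower tree may be worth trying.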