Introduction

Recently, I’ve learned a few more machine learning techniques from Kaggle, and I might as well apply them here. The previous version made use of caret. This time I will use the rpart package.

Importing the dataset and its libraries

library(readxl)
library(rpart)
library(modelr)
library(tidyverse)

## -- Attaching packages --------------------------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0

## -- Conflicts ------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

candy_crush <- read_excel("D:/Working Directory/candy_crush.xlsx")

head(candy_crush)

## # A tibble: 6 x 5
##   player_id             dt                  level num_attempts num_success
##   <chr>                 <dttm>              <dbl>        <dbl>       <dbl>
## 1 6dd5af4c7228fa353d50~ 2014-01-04 00:00:00     4            3           1
## 2 c7ec97c39349ab7e4d39~ 2014-01-01 00:00:00     8            4           1
## 3 c7ec97c39349ab7e4d39~ 2014-01-05 00:00:00    12            6           0
## 4 a32c5e9700ed356dc8dd~ 2014-01-03 00:00:00    11            1           1
## 5 a32c5e9700ed356dc8dd~ 2014-01-07 00:00:00    15            6           0
## 6 b94d403ac4edf639442f~ 2014-01-01 00:00:00     8            8           1

summary(candy_crush)

##   player_id               dt                          level       
##  Length:16865       Min.   :2014-01-01 00:00:00   Min.   : 1.000  
##  Class :character   1st Qu.:2014-01-02 00:00:00   1st Qu.: 6.000  
##  Mode  :character   Median :2014-01-04 00:00:00   Median : 9.000  
##                     Mean   :2014-01-04 01:06:15   Mean   : 9.287  
##                     3rd Qu.:2014-01-06 00:00:00   3rd Qu.:14.000  
##                     Max.   :2014-01-07 00:00:00   Max.   :15.000  
##   num_attempts      num_success     
##  Min.   :  0.000   Min.   : 0.0000  
##  1st Qu.:  1.000   1st Qu.: 0.0000  
##  Median :  3.000   Median : 1.0000  
##  Mean   :  5.535   Mean   : 0.6272  
##  3rd Qu.:  7.000   3rd Qu.: 1.0000  
##  Max.   :258.000   Max.   :55.0000

Creating the model.

There are three consequential columns to keep an eye on: level, num_attempts and num_success. They form the basis for our model, with num_success as the target and level and num_attempts as the predictors.

fit_a <- rpart(num_success ~ num_attempts + level, data = candy_crush)

plot(fit_a, uniform = TRUE)
text(fit_a, cex = 0.6)
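Because num_success is numeric, rpart should fit a regression tree here (method = "anova") without being told to. The sketch below makes that choice explicit and prints the complexity table. It runs on a made-up data frame, toy, that only borrows the column names; the simulated relationship between attempts and successes is an assumption for illustration, not a property of the Candy Crush data.

```r
library(rpart)

set.seed(42)

# Hypothetical stand-in for the real data: 500 rows with the same column names.
toy <- data.frame(
  level        = sample(1:15, 500, replace = TRUE),
  num_attempts = rpois(500, lambda = 5)
)

# Made-up relationship: more attempts give more chances to succeed.
toy$num_success <- rbinom(500, size = pmax(toy$num_attempts, 1), prob = 0.2)

# method = "anova" requests a regression tree, which is what rpart
# picks by default when the response is numeric.
fit_toy <- rpart(num_success ~ num_attempts + level,
                 data = toy, method = "anova")

# Complexity table: one row per candidate tree size.
printcp(fit_toy)
```

Swapping toy for candy_crush in the rpart() call should give the same kind of summary for fit_a.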

Making the prediction

print(predict(fit_a, head(candy_crush, 30)))

##         1         2         3         4         5         6         7 
## 0.9956124 0.5546383 0.5546383 0.5546383 0.3430181 0.5546383 0.5546383 
##         8         9        10        11        12        13        14 
## 0.5546383 0.9956124 0.3430181 0.3430181 0.5546383 0.3430181 0.3430181 
##        15        16        17        18        19        20        21 
## 0.3430181 0.3430181 0.3430181 0.5546383 0.5546383 0.5546383 0.5546383 
##        22        23        24        25        26        27        28 
## 0.5546383 0.9956124 0.5546383 0.3430181 0.3430181 0.5546383 0.5546383 
##        29        30 
## 0.8982387 0.5546383
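Only a handful of distinct values appear in the output above. That is expected from a regression tree: every row that lands in the same leaf gets the same prediction, namely the mean of the target among the training rows in that leaf. A minimal sketch of this on made-up data (toy, x and y are hypothetical names, not part of the original analysis):

```r
library(rpart)

set.seed(1)

# Made-up data with a step in the response at x = 5.
toy <- data.frame(x = runif(300, 0, 10))
toy$y <- ifelse(toy$x > 5, 2, 0) + rnorm(300, sd = 0.3)

fit_toy <- rpart(y ~ x, data = toy)
pred <- as.numeric(predict(fit_toy, toy))

# fit_toy$where records which leaf each training row falls into.
leaf_id <- as.character(fit_toy$where)

# Mean of the response within each leaf.
leaf_means <- tapply(toy$y, leaf_id, mean)

# Compare each prediction with the mean response of its leaf.
same <- isTRUE(all.equal(pred, as.numeric(leaf_means[leaf_id])))
same

# Number of distinct predictions versus number of leaves.
length(unique(pred))
sum(fit_toy$frame$var == "<leaf>")
```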

Getting the MAE

mae(model = fit_a, data = candy_crush)

## [1] 0.4431938
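The MAE (mean absolute error) is the average of the absolute differences between the actual and the predicted values. Since the call above scores fit_a on the same data it was trained on, the figure is an in-sample error. The sketch below computes the same quantity by hand with base R on made-up data; toy, x and y are hypothetical, and the hand-rolled formula is my reading of what mae() reports rather than a copy of its source.

```r
library(rpart)

set.seed(7)

# Made-up data: a noisy linear trend.
toy <- data.frame(x = runif(200, 0, 10))
toy$y <- 0.5 * toy$x + rnorm(200)

fit_toy <- rpart(y ~ x, data = toy)

# Absolute error for every row, then the average.
abs_err <- abs(toy$y - predict(fit_toy, toy))
mae_manual <- mean(abs_err)
mae_manual
```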

Applying Machine Learning on the Candy Crush Dataset with RPart

Joel Jr Rudinas

December 2, 2018
