Recently, I've learned a few more machine learning techniques from Kaggle, so I might as well apply them here. The previous version used caret. This time I will use the rpart package.
library(readxl)
library(rpart)
library(modelr)
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
candy_crush <- read_excel("D:/Working Directory/candy_crush.xlsx")
head(candy_crush)
## # A tibble: 6 x 5
## player_id dt level num_attempts num_success
## <chr> <dttm> <dbl> <dbl> <dbl>
## 1 6dd5af4c7228fa353d50~ 2014-01-04 00:00:00 4 3 1
## 2 c7ec97c39349ab7e4d39~ 2014-01-01 00:00:00 8 4 1
## 3 c7ec97c39349ab7e4d39~ 2014-01-05 00:00:00 12 6 0
## 4 a32c5e9700ed356dc8dd~ 2014-01-03 00:00:00 11 1 1
## 5 a32c5e9700ed356dc8dd~ 2014-01-07 00:00:00 15 6 0
## 6 b94d403ac4edf639442f~ 2014-01-01 00:00:00 8 8 1
summary(candy_crush)
## player_id dt level
## Length:16865 Min. :2014-01-01 00:00:00 Min. : 1.000
## Class :character 1st Qu.:2014-01-02 00:00:00 1st Qu.: 6.000
## Mode :character Median :2014-01-04 00:00:00 Median : 9.000
## Mean :2014-01-04 01:06:15 Mean : 9.287
## 3rd Qu.:2014-01-06 00:00:00 3rd Qu.:14.000
## Max. :2014-01-07 00:00:00 Max. :15.000
## num_attempts num_success
## Min. : 0.000 Min. : 0.0000
## 1st Qu.: 1.000 1st Qu.: 0.0000
## Median : 3.000 Median : 1.0000
## Mean : 5.535 Mean : 0.6272
## 3rd Qu.: 7.000 3rd Qu.: 1.0000
## Max. :258.000 Max. :55.0000
There are three columns we need to keep an eye on: level, num_attempts and num_success. They will form the basis for our model. Our target will be num_success, with num_attempts and level as the predictors.
fit_a <- rpart(num_success ~ num_attempts + level, data = candy_crush)
plot(fit_a, uniform = TRUE)
text(fit_a, cex = 0.6)
print(predict(fit_a, head(candy_crush, 30)))
## 1 2 3 4 5 6 7
## 0.9956124 0.5546383 0.5546383 0.5546383 0.3430181 0.5546383 0.5546383
## 8 9 10 11 12 13 14
## 0.5546383 0.9956124 0.3430181 0.3430181 0.5546383 0.3430181 0.3430181
## 15 16 17 18 19 20 21
## 0.3430181 0.3430181 0.3430181 0.5546383 0.5546383 0.5546383 0.5546383
## 22 23 24 25 26 27 28
## 0.5546383 0.9956124 0.5546383 0.3430181 0.3430181 0.5546383 0.5546383
## 29 30
## 0.8982387 0.5546383
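The predictions above take only a handful of distinct values. That is expected: with a numeric target, rpart fits a regression tree, and each leaf predicts the mean of the target over the training rows that fall into it. Here is a minimal sketch of that idea. The `toy` data frame and `fit_toy` are made up for illustration and stand in for candy_crush, so the numbers will not match the output above.

```r
library(rpart)

# Hypothetical toy data standing in for candy_crush; the values are made up
set.seed(1)
toy <- data.frame(
  num_attempts = rpois(300, lambda = 5),
  level = sample(1:15, 300, replace = TRUE)
)
# Made-up relationship: more attempts, higher chance of a success
toy$num_success <- rbinom(300, size = 1,
                          prob = pmin(0.2 + 0.1 * toy$num_attempts, 0.95))

fit_toy <- rpart(num_success ~ num_attempts + level, data = toy)
pred <- predict(fit_toy, toy)

# fit_toy$where records which leaf each training row ended up in
leaf_means <- tapply(toy$num_success, fit_toy$where, mean)

# Each prediction should match the mean of num_success in that row's leaf
all.equal(as.numeric(pred),
          as.numeric(leaf_means[as.character(fit_toy$where)]))

# So there are at most as many distinct predictions as there are leaves
length(unique(pred))
```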
mae(model = fit_a, data = candy_crush)
## [1] 0.4431938
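For reference, the mean absolute error is just the average of the absolute differences between the actual and predicted values. As far as I understand it, this is what mae() from modelr reports. The sketch below computes it by hand on made-up data; `toy`, `fit_toy` and `mae_manual` are hypothetical names, not part of the original analysis.

```r
library(rpart)

# Hypothetical toy data standing in for candy_crush; the values are made up
set.seed(42)
toy <- data.frame(
  level = sample(1:15, 200, replace = TRUE),
  num_attempts = rpois(200, lambda = 5)
)
toy$num_success <- rbinom(200, size = 1, prob = 0.6)

fit_toy <- rpart(num_success ~ num_attempts + level, data = toy)

# MAE by hand: mean absolute difference between actual and predicted values
pred <- predict(fit_toy, toy)
mae_manual <- mean(abs(toy$num_success - pred))
mae_manual
```

Note that the error here, like the 0.4431938 above, is measured on the same rows the tree was fitted on, so it probably understates the error on data the model has not seen.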