PlayerUnknown’s Battlegrounds (PUBG)

Overview of PUBG

PlayerUnknown’s Battlegrounds is a multi-platform game in which players find themselves stranded on an island with up to 100 others. It’s survival of the fittest: only one player (or one team, in duo or squad mode with a maximum of 4 players) can be left standing. Players begin with nothing and fight for limited resources, gear, and vehicles to survive.

Objective: Create a model which predicts players’ finishing placement based on their final stats, on a scale from 1 [first place] to 0 [last place].

library(readr)
## Warning: package 'readr' was built under R version 3.5.3
library(ggplot2)
library(scales)
## Warning: package 'scales' was built under R version 3.5.3
## 
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
## 
##     col_factor
library(corrplot)
## Warning: package 'corrplot' was built under R version 3.5.3
## corrplot 0.84 loaded
library(ranger)
## Warning: package 'ranger' was built under R version 3.5.3
library(pdp)
## Warning: package 'pdp' was built under R version 3.5.3
library(lime)
## Warning: package 'lime' was built under R version 3.5.3
library(h2o)
## Warning: package 'h2o' was built under R version 3.5.3
## 
## ----------------------------------------------------------------------
## 
## Your next step is to start H2O:
##     > h2o.init()
## 
## For H2O package documentation, ask for help:
##     > ??h2o
## 
## After starting H2O, you can use the Web UI at http://localhost:54321
## For more information visit http://docs.h2o.ai
## 
## ----------------------------------------------------------------------
## 
## Attaching package: 'h2o'
## The following objects are masked from 'package:stats':
## 
##     cor, sd, var
## The following objects are masked from 'package:base':
## 
##     %*%, %in%, &&, ||, apply, as.factor, as.numeric, colnames,
##     colnames<-, ifelse, is.character, is.factor, is.numeric, log,
##     log10, log1p, log2, round, signif, trunc
library(MASS)
## Warning: package 'MASS' was built under R version 3.5.3
library(data.table)
## Warning: package 'data.table' was built under R version 3.5.3
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:h2o':
## 
##     hour, month, week, year
library(GGally)
## Warning: package 'GGally' was built under R version 3.5.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.5.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following object is masked from 'package:GGally':
## 
##     nasa
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following object is masked from 'package:MASS':
## 
##     select
## The following object is masked from 'package:lime':
## 
##     explain
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(caret)
## Loading required package: lattice
library(reprex)
## Warning: package 'reprex' was built under R version 3.5.3

Read the Training Data Set

fread is a faster and more convenient command than read.table.

cat("Reading...")
## Reading...
train<- fread('C:/Users/Niall Graham/Desktop/Data Science/Neural Network/Train/train_V2.csv')
test <- fread("C:/Users/Niall Graham/Desktop/Data Science/Neural Network/Test/test_V2.csv")

View the Data

str(train)
## Classes 'data.table' and 'data.frame':   4446966 obs. of  29 variables:
##  $ Id             : chr  "7f96b2f878858a" "eef90569b9d03c" "1eaf90ac73de72" "4616d365dd2853" ...
##  $ groupId        : chr  "4d4b580de459be" "684d5656442f9e" "6a4a42c3245a74" "a930a9c79cd721" ...
##  $ matchId        : chr  "a10357fd1a4a91" "aeb375fc57110c" "110163d8bb94ae" "f1f1f4ef412d7e" ...
##  $ assists        : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ boosts         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ damageDealt    : num  0 91.5 68 32.9 100 ...
##  $ DBNOs          : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ headshotKills  : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ heals          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ killPlace      : int  60 57 47 75 45 44 96 48 64 74 ...
##  $ killPoints     : int  1241 0 0 0 0 0 1262 1000 0 0 ...
##  $ kills          : int  0 0 0 0 1 1 0 0 0 0 ...
##  $ killStreaks    : int  0 0 0 0 1 1 0 0 0 0 ...
##  $ longestKill    : num  0 0 0 0 58.5 ...
##  $ matchDuration  : int  1306 1777 1318 1436 1424 1395 1316 1967 1375 1930 ...
##  $ matchType      : chr  "squad-fpp" "squad-fpp" "duo" "squad-fpp" ...
##  $ maxPlace       : int  28 26 50 31 97 28 28 96 28 29 ...
##  $ numGroups      : int  26 25 47 30 95 28 28 92 27 27 ...
##  $ rankPoints     : int  -1 1484 1491 1408 1560 1418 -1 -1 1493 1349 ...
##  $ revives        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ rideDistance   : num  0 0.0045 0 0 0 ...
##  $ roadKills      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ swimDistance   : num  0 11 0 0 0 ...
##  $ teamKills      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ vehicleDestroys: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ walkDistance   : num  244.8 1434 161.8 202.7 49.8 ...
##  $ weaponsAcquired: int  1 5 2 3 2 1 1 6 4 1 ...
##  $ winPoints      : int  1466 0 0 0 0 0 1497 1500 0 0 ...
##  $ winPlacePerc   : num  0.444 0.64 0.775 0.167 0.188 ...
##  - attr(*, ".internal.selfref")=<externalptr>
str(test)
## Classes 'data.table' and 'data.frame':   1934174 obs. of  28 variables:
##  $ Id             : chr  "9329eb41e215eb" "639bd0dcd7bda8" "63d5c8ef8dfe91" "cf5b81422591d1" ...
##  $ groupId        : chr  "676b23c24e70d6" "430933124148dd" "0b45f5db20ba99" "b7497dbdc77f4a" ...
##  $ matchId        : chr  "45b576ab7daa7f" "42a9a0b906c928" "87e7e4477a048e" "1b9a94f1af67f1" ...
##  $ assists        : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ boosts         : int  0 4 0 0 4 0 3 0 0 0 ...
##  $ damageDealt    : num  51.5 179.1 23.4 65.5 330.2 ...
##  $ DBNOs          : int  0 0 0 0 1 0 3 0 0 0 ...
##  $ headshotKills  : int  0 0 0 0 2 0 2 0 0 0 ...
##  $ heals          : int  0 2 4 0 1 0 17 0 0 0 ...
##  $ killPlace      : int  73 11 49 54 7 89 3 73 56 54 ...
##  $ killPoints     : int  0 0 0 0 0 0 0 0 0 1023 ...
##  $ kills          : int  0 2 0 0 3 0 5 0 0 0 ...
##  $ killStreaks    : int  0 1 0 0 1 0 1 0 0 0 ...
##  $ longestKill    : num  0 361.9 0 0 60.1 ...
##  $ matchDuration  : int  1884 1811 1793 1834 1326 1775 1328 1870 1815 1336 ...
##  $ matchType      : chr  "squad-fpp" "duo-fpp" "squad-fpp" "duo-fpp" ...
##  $ maxPlace       : int  28 48 28 45 28 29 49 29 28 27 ...
##  $ numGroups      : int  28 47 27 44 27 29 48 27 27 27 ...
##  $ rankPoints     : int  1500 1503 1565 1465 1480 1490 1538 1487 1640 -1 ...
##  $ revives        : int  0 2 0 0 1 0 0 0 0 0 ...
##  $ rideDistance   : num  0 4669 0 0 0 ...
##  $ roadKills      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ swimDistance   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ teamKills      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ vehicleDestroys: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ walkDistance   : num  588 2017 788 1812 2963 ...
##  $ weaponsAcquired: int  1 6 4 3 4 0 4 5 7 5 ...
##  $ winPoints      : int  0 0 0 0 0 0 0 0 0 1495 ...
##  - attr(*, ".internal.selfref")=<externalptr>

The training data contains the column “winPlacePerc”, which is missing from the test data. We need to predict this column for the test data in order to meet our objective.

Identify Target Variable

summary(train$winPlacePerc)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.2000  0.4583  0.4728  0.7407  1.0000       1

The median is the value at the midpoint of the observed distribution: half of the observed winPlacePerc values lie below it and half above it.

Median : 0.4583

Therefore half of the players in the training set finished at or below 0.4583 and half at or above it.

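We can assume that the test data follows a similar distribution, so the median serves as a first baseline prediction for every test player.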

cat("Results")
## Results
submission <- rep(0.4583, 1934174)

submission <- fread("C:/Users/Niall Graham/Desktop/Data Science/Neural Network/Test/test_V2.csv") %>% mutate(winPlacePerc = submission)

write.csv(submission, "first_submission.csv")
head(submission)
##               Id        groupId        matchId assists boosts damageDealt
## 1 9329eb41e215eb 676b23c24e70d6 45b576ab7daa7f       0      0       51.46
## 2 639bd0dcd7bda8 430933124148dd 42a9a0b906c928       0      4      179.10
## 3 63d5c8ef8dfe91 0b45f5db20ba99 87e7e4477a048e       1      0       23.40
## 4 cf5b81422591d1 b7497dbdc77f4a 1b9a94f1af67f1       0      0       65.52
## 5 ee6a295187ba21 6604ce20a1d230 40754a93016066       0      4      330.20
## 6 3e2539b5d78183 029b5a79e08cd6 10186f5c852f62       0      0        0.00
##   DBNOs headshotKills heals killPlace killPoints kills killStreaks
## 1     0             0     0        73          0     0           0
## 2     0             0     2        11          0     2           1
## 3     0             0     4        49          0     0           0
## 4     0             0     0        54          0     0           0
## 5     1             2     1         7          0     3           1
## 6     0             0     0        89          0     0           0
##   longestKill matchDuration matchType maxPlace numGroups rankPoints
## 1        0.00          1884 squad-fpp       28        28       1500
## 2      361.90          1811   duo-fpp       48        47       1503
## 3        0.00          1793 squad-fpp       28        27       1565
## 4        0.00          1834   duo-fpp       45        44       1465
## 5       60.06          1326 squad-fpp       28        27       1480
## 6        0.00          1775 squad-fpp       29        29       1490
##   revives rideDistance roadKills swimDistance teamKills vehicleDestroys
## 1       0            0         0            0         0               0
## 2       2         4669         0            0         0               0
## 3       0            0         0            0         0               0
## 4       0            0         0            0         0               0
## 5       1            0         0            0         0               0
## 6       0            0         0            0         0               0
##   walkDistance weaponsAcquired winPoints winPlacePerc
## 1        588.0               1         0       0.4583
## 2       2017.0               6         0       0.4583
## 3        787.8               4         0       0.4583
## 4       1812.0               3         0       0.4583
## 5       2963.0               4         0       0.4583
## 6          0.0               0         0       0.4583

Possible overfitting on walkDistance and weaponsAcquired when compared against winPlacePerc.

The data provided is very clean: there is only one NA value, in winPlacePerc.

Replace it with 0.

train$winPlacePerc[is.na(train$winPlacePerc)] <- 0
train$matchType<- as.factor(train$matchType)
test$matchType <- as.factor(test$matchType)

The median value, 0.4583, is being applied to every player.

================================================= Experiencing overfitting =================================================
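
As a rough illustration of what a constant prediction costs, the sketch below scores a single guess for every player using mean absolute error. The error measure, the helper name `mae_const` and the toy values are assumptions made for illustration; they are not taken from the data set.

```r
# Toy target values standing in for train$winPlacePerc (made-up numbers)
y <- c(0.00, 0.20, 0.44, 0.46, 0.64, 0.78, 1.00)

# Mean absolute error when the same guess is used for every player
mae_const <- function(actual, guess) mean(abs(actual - guess))

# Compute the baseline from the data rather than hardcoding it
baseline <- median(y)

mae_median <- mae_const(y, baseline)   # error of the median baseline
mae_mean   <- mae_const(y, mean(y))    # error of the mean, for comparison

print(c(median = mae_median, mean = mae_mean))
```

Among constant guesses the median gives the smallest mean absolute error, which is why it is a sensible starting point before any features are used.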

“The goal of a good machine learning model is to generalize well from the training data to any data from the problem domain. This allows us to make predictions in the future on data the model has never seen.”

================================================= Correlation between variables =================================================

The last 10 players should be in the final circle of the game, and hence should have travelled the greatest walk distance, acquired the most weapons, and used the most health boosts.

train.num <- train[,c(4:15,17:29)]
train.cor<-as.data.frame(lapply(train.num, as.numeric))
corrplot(cor(train.cor),method = "square")

Positively correlated -> walkDistance, weaponsAcquired, boosts, headshotKills

Negatively correlated -> killPlace
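
Reading signs off a correlation plot can be checked numerically by pulling out the target's column of the correlation matrix and sorting it. The sketch below does this on simulated data; the data frame `toy` and the way its columns are generated are assumptions for illustration only. `stats::cor` is called explicitly because h2o masks `cor` once it is attached.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the numeric training columns (not real PUBG data)
toy <- data.frame(walkDistance = runif(n, 0, 4000))
toy$killPlace    <- round(100 - toy$walkDistance / 50 + rnorm(n, sd = 5))
toy$winPlacePerc <- pmin(pmax(toy$walkDistance / 4000 + rnorm(n, sd = 0.1), 0), 1)

# Correlation of every feature with the target, strongest positive first
cors <- stats::cor(toy)[, "winPlacePerc"]
cors <- cors[names(cors) != "winPlacePerc"]
print(sort(cors, decreasing = TRUE))
```

On the real data the same two lines applied to `train.cor` would give a ranked list to compare against the plot.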

============================================================= NumGroups (Number of groups we have data for in the match.) ==============================================================

table(train$winPlacePerc[train$numGroups == 1])
## 
##    0 
## 1147

Walk Distance (Total Distance traveled on Foot measured in Meters).

train.new <- train%>% mutate(kmperhour = (walkDistance/1000)/ (matchDuration/3600))

ggplot(train.new,aes(kmperhour))+geom_histogram(bins =500,fill = "DarkGreen")+labs(title = "Walk distance per hour (km/h)")

Walking speed in the game cannot exceed 20 km/h, so the outliers (players who exceed this speed) are presumably using a cheat code and must be excluded from the data. Flag them first.

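# kmperhour only exists in train.new, so compute the speed directly here
train.kmh <- (train$walkDistance / 1000) / (train$matchDuration / 3600)
train$cheaters <- 0
train$cheaters[which(train.kmh > 20)] <- 1

test.kmh <- (test$walkDistance / 1000) / (test$matchDuration / 3600)
test$cheaters <- 0
test$cheaters[which(test.kmh > 20)] <- 1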
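
The speed conversion can be checked by hand on one row. The sketch below wraps it in a small helper (the name `walk_speed_kmh` is hypothetical) and applies it to the second row shown in `str(train)`, which walked 1434 m in a 1777 s match, plus one made-up row that walks 12000 m in 1800 s.

```r
# Convert metres walked and match length in seconds to km/h
walk_speed_kmh <- function(walk_m, duration_s) {
  (walk_m / 1000) / (duration_s / 3600)
}

# Second row of str(train): 1434 m in 1777 s
speed <- walk_speed_kmh(1434, 1777)
print(round(speed, 3))

# Vectorised flag: the second value is invented to exceed the 20 km/h limit
is_suspect <- walk_speed_kmh(c(1434, 12000), c(1777, 1800)) > 20
print(is_suspect)
```

The real row comes out at roughly 2.9 km/h, well under the threshold, while the invented row works out to 24 km/h and is flagged.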

===================================== Player Kills Per Match =====================================

Create a flag for players who have more than 40 kills.

train$killer_flag <- 0
train$killer_flag[train$kills > 40] <- 1

test$killer_flag <- 0
test$killer_flag[test$kills > 40] <- 1

==================================== HeadShot Rates ====================================

Create a variable headshot_rate to capture the share of a player's total kills that were headshot kills.
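
One detail worth keeping in mind with a ratio like this: a player with zero kills gives 0/0, which R evaluates to NaN. The sketch below shows one possible way to handle it, treating zero-kill players as having a rate of 0. That choice, and the helper name `headshot_rate_safe`, are assumptions for illustration rather than part of the original analysis.

```r
# Headshot share of kills, with zero-kill players mapped to 0 instead of NaN
headshot_rate_safe <- function(headshot_kills, kills) {
  ifelse(kills > 0, headshot_kills / kills, 0)
}

# Three made-up players: no kills, 1 of 1, and 2 of 5
rates <- headshot_rate_safe(c(0, 1, 2), c(0, 1, 5))
print(rates)
```

Leaving the NaN values in place is also defensible, but many modelling functions drop or reject rows containing them, so the decision should be made explicitly.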

train$headshot_rate <- 0
test$headshot_rate <- 0
# Players with 0 kills give 0/0 = NaN
train$headshot_rate <- train$headshotKills / train$kills
# Use the test set's own headshotKills (not train's) so the lengths match
test$headshot_rate <- test$headshotKills / test$kills
table(train$headshot_rate)
## 
##                  0  0.037037037037037 0.0384615384615385 
##            1166394                  1                  2 
## 0.0416666666666667 0.0434782608695652 0.0454545454545455 
##                  3                  2                  1 
## 0.0476190476190476               0.05 0.0526315789473684 
##                  2                  7                  6 
## 0.0555555555555556 0.0588235294117647             0.0625 
##                  9                 16                 21 
## 0.0666666666666667 0.0714285714285714 0.0731707317073171 
##                 41                 65                  1 
## 0.0740740740740741              0.075 0.0769230769230769 
##                  2                  1                118 
##               0.08 0.0833333333333333 0.0869565217391304 
##                  2                204                  3 
##  0.087719298245614 0.0909090909090909 0.0952380952380952 
##                  2                420                  4 
##                0.1  0.103448275862069  0.105263157894737 
##                801                  1                 11 
##  0.107142857142857  0.108695652173913  0.111111111111111 
##                  4                  1               1605 
##  0.115384615384615  0.117647058823529               0.12 
##                  4                 26                  2 
##  0.121212121212121  0.121951219512195              0.125 
##                  2                  1               3335 
##  0.127272727272727  0.130434782608696  0.131578947368421 
##                  1                  8                  1 
##  0.133333333333333  0.135135135135135  0.136363636363636 
##                 73                  1                 14 
##  0.137931034482759  0.142857142857143  0.148148148148148 
##                  1               6519                  3 
##               0.15  0.150943396226415  0.153846153846154 
##                 19                  2                208 
##            0.15625  0.157894736842105               0.16 
##                  1                 19                  1 
##  0.161290322580645  0.162162162162162  0.166666666666667 
##                  2                  1              13197 
##  0.170731707317073  0.171428571428571  0.172413793103448 
##                  1                  1                  3 
##  0.173913043478261  0.176470588235294  0.178571428571429 
##                  4                 37                  2 
##  0.181818181818182  0.184210526315789  0.185185185185185 
##                617                  2                  2 
##             0.1875  0.189189189189189   0.19047619047619 
##                 48                  1                 15 
##  0.192307692307692  0.193548387096774  0.194444444444444 
##                  2                  2                  1 
##  0.195121951219512                0.2  0.205882352941176 
##                  1              25999                  1 
##  0.206896551724138  0.208333333333333  0.209302325581395 
##                  1                  2                  1 
##  0.210526315789474  0.212121212121212  0.214285714285714 
##                 13                  1                131 
##  0.217391304347826            0.21875  0.222222222222222 
##                 10                  1               2022 
##  0.225806451612903  0.227272727272727  0.230769230769231 
##                  2                  7                227 
##  0.232142857142857  0.233333333333333  0.234042553191489 
##                  1                  2                  1 
##  0.235294117647059  0.238095238095238               0.24 
##                 44                  8                  6 
##  0.241379310344828  0.242424242424242  0.244444444444444 
##                  2                  1                  1 
##               0.25  0.255813953488372  0.256410256410256 
##              53362                  1                  1 
##  0.258064516129032  0.259259259259259  0.260869565217391 
##                  3                  2                  6 
##  0.261904761904762  0.263157894736842  0.264705882352941 
##                  1                 13                  1 
##  0.266666666666667  0.269230769230769  0.272727272727273 
##                 81                  4                595 
##  0.277777777777778               0.28  0.282051282051282 
##                 26                  1                  1 
##  0.285714285714286  0.290322580645161  0.291666666666667 
##               6201                  1                  5 
##  0.294117647058824  0.296296296296296  0.297297297297297 
##                 32                  3                  1 
##                0.3  0.304347826086957  0.307692307692308 
##                974                  3                166 
##  0.310344827586207             0.3125  0.315789473684211 
##                  1                 41                 15 
##  0.318181818181818               0.32  0.321428571428571 
##                 12                  2                  2 
##   0.32258064516129  0.333333333333333  0.340909090909091 
##                  1             104061                  1 
##  0.342857142857143  0.346153846153846  0.347826086956522 
##                  1                  3                  2 
##               0.35  0.352941176470588  0.357142857142857 
##                  6                 10                101 
##               0.36  0.363636363636364  0.366666666666667 
##                  1                458                  1 
##  0.368421052631579   0.37037037037037              0.375 
##                  7                  1               2432 
##  0.379310344827586  0.380952380952381  0.384615384615385 
##                  1                  7                124 
##  0.388888888888889  0.391304347826087  0.392857142857143 
##                 10                  2                  1 
##                0.4  0.409090909090909  0.411764705882353 
##              17034                  3                 20 
##  0.416666666666667  0.421052631578947  0.428571428571429 
##                198                  4               3759 
##  0.433333333333333  0.434782608695652  0.435897435897436 
##                  1                  1                  1 
##             0.4375               0.44  0.444444444444444 
##                 28                  1                966 
##               0.45  0.454545454545455  0.461538461538462 
##                  1                271                106 
##  0.464285714285714  0.466666666666667  0.470588235294118 
##                  1                 25                 11 
##  0.473684210526316  0.476190476190476               0.48 
##                  4                  3                  1 
##  0.483870967741935  0.486486486486487                0.5 
##                  1                  1             196009 
##               0.52  0.523809523809524  0.526315789473684 
##                  1                  2                  4 
##  0.529411764705882  0.533333333333333  0.538461538461538 
##                  7                 25                 70 
##  0.541666666666667  0.545454545454545               0.55 
##                  1                168                  1 
##  0.551020408163265  0.555555555555556               0.56 
##                  1                527                  1 
##             0.5625  0.565217391304348  0.571428571428571 
##                  7                  2               1666 
##  0.578947368421053  0.580645161290323  0.583333333333333 
##                  6                  1                 72 
##  0.584905660377358  0.586206896551724  0.588235294117647 
##                  1                  2                  8 
##  0.592592592592593                0.6  0.606060606060606 
##                  1               6550                  1 
##  0.607142857142857  0.608695652173913  0.611111111111111 
##                  1                  1                  6 
##  0.615384615384615  0.619047619047619              0.625 
##                 38                  4                571 
##  0.631578947368421  0.636363636363636               0.64 
##                  3                 81                  2 
##  0.642857142857143  0.647058823529412               0.65 
##                 24                  7                  2 
##  0.653846153846154  0.666666666666667  0.678571428571429 
##                  1              33412                  1 
##  0.681818181818182  0.684210526315789             0.6875 
##                  2                  3                  7 
##  0.692307692307692                0.7  0.703703703703704 
##                 22                103                  1 
##  0.705882352941177  0.714285714285714            0.71875 
##                  5                584                  1 
##  0.722222222222222  0.727272727272727  0.733333333333333 
##                  6                 40                  8 
##  0.736842105263158  0.742857142857143  0.745454545454545 
##                  1                  1                  1 
##               0.75  0.761904761904762  0.764705882352941 
##               6886                  1                  4 
##  0.769230769230769  0.772727272727273              0.775 
##                 14                  1                  1 
##  0.777777777777778  0.782608695652174  0.785714285714286 
##                110                  1                  9 
##  0.787878787878788  0.790697674418605                0.8 
##                  1                  1               1533 
##             0.8125  0.818181818181818  0.823529411764706 
##                  4                 36                  1 
##  0.833333333333333               0.84  0.842105263157895 
##                456                  2                  2 
##  0.846153846153846               0.85  0.857142857142857 
##                  5                  2                172 
##  0.866666666666667  0.867924528301887              0.875 
##                  6                  1                 93 
##  0.882352941176471  0.888888888888889  0.894736842105263 
##                  2                 41                  1 
##                0.9  0.904761904761905  0.909090909090909 
##                 18                  1                 21 
##  0.916666666666667               0.92  0.923076923076923 
##                 11                  1                  3 
##  0.928571428571429  0.933333333333333             0.9375 
##                  6                  4                  3 
##  0.944444444444444  0.951219512195122  0.952380952380952 
##                  3                  1                  1 
##  0.958333333333333                  1 
##                  1             254068
cat("splitting data")
## splitting data
train.rows <- createDataPartition(train$winPlacePerc, p = 0.8, list = FALSE)
train1 <- train[train.rows,]
# Hold out the rows that were NOT selected for training
test1 <- train[-train.rows,]
str(train)
## Classes 'data.table' and 'data.frame':   4446966 obs. of  32 variables:
##  $ Id             : chr  "7f96b2f878858a" "eef90569b9d03c" "1eaf90ac73de72" "4616d365dd2853" ...
##  $ groupId        : chr  "4d4b580de459be" "684d5656442f9e" "6a4a42c3245a74" "a930a9c79cd721" ...
##  $ matchId        : chr  "a10357fd1a4a91" "aeb375fc57110c" "110163d8bb94ae" "f1f1f4ef412d7e" ...
##  $ assists        : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ boosts         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ damageDealt    : num  0 91.5 68 32.9 100 ...
##  $ DBNOs          : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ headshotKills  : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ heals          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ killPlace      : int  60 57 47 75 45 44 96 48 64 74 ...
##  $ killPoints     : int  1241 0 0 0 0 0 1262 1000 0 0 ...
##  $ kills          : int  0 0 0 0 1 1 0 0 0 0 ...
##  $ killStreaks    : int  0 0 0 0 1 1 0 0 0 0 ...
##  $ longestKill    : num  0 0 0 0 58.5 ...
##  $ matchDuration  : int  1306 1777 1318 1436 1424 1395 1316 1967 1375 1930 ...
##  $ matchType      : Factor w/ 16 levels "crashfpp","crashtpp",..: 16 16 3 16 14 16 16 14 15 15 ...
##  $ maxPlace       : int  28 26 50 31 97 28 28 96 28 29 ...
##  $ numGroups      : int  26 25 47 30 95 28 28 92 27 27 ...
##  $ rankPoints     : int  -1 1484 1491 1408 1560 1418 -1 -1 1493 1349 ...
##  $ revives        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ rideDistance   : num  0 0.0045 0 0 0 ...
##  $ roadKills      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ swimDistance   : num  0 11 0 0 0 ...
##  $ teamKills      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ vehicleDestroys: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ walkDistance   : num  244.8 1434 161.8 202.7 49.8 ...
##  $ weaponsAcquired: int  1 5 2 3 2 1 1 6 4 1 ...
##  $ winPoints      : int  1466 0 0 0 0 0 1497 1500 0 0 ...
##  $ winPlacePerc   : num  0.444 0.64 0.775 0.167 0.188 ...
##  $ cheaters       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ killer_flag    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ headshot_rate  : num  NaN NaN NaN NaN 0 1 NaN NaN NaN NaN ...
##  - attr(*, ".internal.selfref")=<externalptr>
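
For a hold-out set to say anything about generalization, it must not share rows with the training part. The sketch below shows the idea in base R on a toy data frame; it uses plain `sample()` as a simpler stand-in for `createDataPartition`, and all names in it are illustrative.

```r
set.seed(42)

# Toy data standing in for the training table
toy <- data.frame(id = 1:10, y = seq(0, 1, length.out = 10))

# Pick 80% of the row indices for fitting
train_idx <- sample(nrow(toy), size = floor(0.8 * nrow(toy)))

train_part <- toy[train_idx, ]    # rows used to fit the model
test_part  <- toy[-train_idx, ]   # the complement, kept aside for evaluation

# The two parts should be disjoint and together cover every row
print(intersect(train_part$id, test_part$id))
```

Negative indexing is what produces the complement; reusing the positive index for both parts would evaluate the model on the rows it was fitted to.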

=============================== Regression ===============================

Linear regression is the usual first method to try for a regression problem.

model_linear <- lm(winPlacePerc~ walkDistance + rideDistance + kills + 
                     matchDuration+ assists+ boosts + killPoints + 
                     winPoints+ vehicleDestroys, train1)

summary(model_linear)
## 
## Call:
## lm(formula = winPlacePerc ~ walkDistance + rideDistance + kills + 
##     matchDuration + assists + boosts + killPoints + winPoints + 
##     vehicleDestroys, data = train1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7009 -0.1168 -0.0059  0.1023  0.9065 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      5.156e-01  5.871e-04  878.22   <2e-16 ***
## walkDistance     1.761e-04  9.814e-08 1793.98   <2e-16 ***
## rideDistance     2.869e-05  6.665e-08  430.48   <2e-16 ***
## kills            1.609e-02  6.680e-05  240.85   <2e-16 ***
## matchDuration   -1.928e-04  3.664e-07 -526.31   <2e-16 ***
## assists          1.221e-02  1.600e-04   76.30   <2e-16 ***
## boosts           2.160e-02  7.271e-05  297.04   <2e-16 ***
## killPoints      -7.638e-05  7.779e-07  -98.20   <2e-16 ***
## winPoints        6.293e-05  6.587e-07   95.54   <2e-16 ***
## vehicleDestroys -1.055e-02  9.502e-04  -11.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1641 on 3557564 degrees of freedom
## Multiple R-squared:  0.715,  Adjusted R-squared:  0.715 
## F-statistic: 9.916e+05 on 9 and 3557564 DF,  p-value: < 2.2e-16
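
The minimum residual of about -3.70 means at least one fitted value lies far above 1, outside the valid range of winPlacePerc. A possible next step, sketched below on simulated data, is to score predictions on held-out rows and clip them to [0, 1]. The toy data, the single-feature model and the helper `mae` are assumptions for illustration, not the author's pipeline.

```r
set.seed(7)
n <- 500

# Simulated data: placement rises with walk distance, capped to [0, 1]
toy <- data.frame(walkDistance = runif(n, 0, 5000))
toy$winPlacePerc <- pmin(pmax(toy$walkDistance / 4000 + rnorm(n, sd = 0.1), 0), 1)

# Fit on 400 rows, keep the remaining 100 for evaluation
fit_idx <- sample(n, 400)
fit     <- lm(winPlacePerc ~ walkDistance, data = toy[fit_idx, ])
holdout <- toy[-fit_idx, ]

# Raw linear predictions can leave [0, 1]; clip them back into range
raw_pred <- predict(fit, newdata = holdout)
clipped  <- pmin(pmax(raw_pred, 0), 1)

mae <- function(actual, predicted) mean(abs(actual - predicted))
print(c(raw = mae(holdout$winPlacePerc, raw_pred),
        clipped = mae(holdout$winPlacePerc, clipped)))
```

Because the true values always lie in [0, 1], clipping can only move a prediction closer to its target, so the clipped error is never larger than the raw one.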

To be continued