Sampurna Tuladhar

Data Mining - Project 1, Project 2

df <- read.csv("dota2.csv")

Introduction

The data set I worked on is about an online MOBA game called Dota 2 (Defense of the Ancients 2). It started out as a mod for Blizzard Entertainment's popular game Warcraft III: The Frozen Throne. The mod then became more popular than the game itself, and around 2010 Valve acquired the rights to it and made it into the full-fledged game now known as Dota 2.

For this project I pulled the data set from Kaggle (link: https://www.kaggle.com/devinanzelmo/dota-2-matches ). The data set is massive, so I took only 5 variables, covering both categorical and quantitative data. My categorical variables are radiant_win (from match.csv), ability_name (from ability_ids.csv) and item_id (from purchase_id). My quantitative variables are duration and first_blood_time (both from match.csv). These are the variables that I will be using for my analysis.

Steps to clean and organize the dataset

For this project, I first picked out the data that I wanted. I was not able to extract only the required data, so I had to download the whole dataset from https://www.kaggle.com/devinanzelmo/dota-2-matches. After downloading it, I made a new .csv file containing all of my categorical and quantitative variables, namely radiant_win, ability_name, item_id, duration and first_blood_time. After sorting everything in my .csv file, I uploaded the dataset into RStudio Cloud.

To set the labels, the position of the text and the color in the graphs, I used ‘main’ for the top text, ‘xlab’ for the bottom text, ‘ylab’ for the left text, ‘col’ for the colour of the graph and ‘horizontal’ to set the graph’s orientation.

The following sections are numbered from 1 to 6. Each one states what kind of data I am dealing with and contains graphs, plots, summaries, text and tables for different variables.

1—————————————————————————-

Summary of Duration (Quantitative Variable)

For my project I had to select one quantitative variable and find its mean, standard deviation and five-number summary. Then I had to show three graphical displays: a histogram, a box plot and a QQ plot. I also had to find out whether there are any outliers using the IQR, and what kind of distribution the variable has.

Mean, Standard Deviation and Five Number Summary

This chunk of code returns the mean, standard deviation and five-number summary of the variable duration, in that order.

mean(df$duration,trim = 0, na.rm = FALSE)
## [1] 2569.424
sd(df$duration)
## [1] 668.4912
summary(df$duration)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1443    2116    2461    2569    2946    5344

Graphs, Plots and Outliers

There are three graphical displays: histogram, box plot and QQ plot, along with a check for outliers and for what kind of distribution this variable has. For me this section was the hardest because I had to find out how to do each of these. It took me a while to figure things out, but it was fun.

Histogram of Duration

Showing the histogram was pretty easy, but I had to figure out how to draw a density line on the graph to see the distribution of the variable. First I used the hist() function with probability = TRUE, then I passed density() of the duration to the lines() function to draw the curve on top of the histogram.

hist(df$duration,
     main = "Dota2 game Length (Positive Skewed Distribution)",
     xlab = "Game duration (sec)",
     col = "pink",
     probability = TRUE)

lines(density(df$duration),col = "blue")

Quantile-Quantile Plot (QQ Plot)

This was probably the easiest plot for me to graph, as I only had to call the qqnorm() function and then the qqline() function to draw a reference line in the plot.

qqnorm(df$duration,
       main = "QQPlot (Positive Skewed Distribution)",
       ylab = "Duration (sec)")
qqline(df$duration)

Box Plot

This is also an easy plot, as I only had to call the boxplot() function and lay it out the way I wanted using horizontal.

boxplot(df$duration,
        main ="BoxPlot (Positive Skewed Distribution) ",
        horizontal = TRUE,
        xlab = "Duration (sec)")

Outliers

For the outliers I used the IQR() function to get the interquartile range of the variable, which defines the range outside of which values count as outliers. Six observations fall outside that range. Some points can be seen leaving the range in the graphs, but the plots cannot pinpoint which observations they are, so I used boxplot.stats() to extract the outlying values and which() to find their row indices and display those rows as a clean table.

IQR(df$duration, na.rm = FALSE)
## [1] 829.5
out <- boxplot.stats(df$duration)$out
out_ind <- which(df$duration %in% c(out))
out_ind
## [1]  45  81 139 145 184 242
df[out_ind,]
##               ability_name radiant_win item_id duration first_blood_time
## 45            mirana_arrow        TRUE      29     4348              210
## 81  razor_unstable_current        TRUE      59     4208              135
## 139      riki_smoke_screen       FALSE      46     4295              177
## 145  enigma_midnight_pulse       FALSE      60     5344               14
## 184        pugna_decrepify        TRUE      46     5267               84
## 242      life_stealer_rage        TRUE      46     4494               89

These are the outliers of the variable duration.
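The rule behind this outlier check can be sketched by hand as below. The durations are made-up values for illustration, not rows from dota2.csv. The fences are computed with quantile(), whereas boxplot.stats() uses hinges, so on real data the two approaches can differ slightly.

```r
# Synthetic durations in seconds (illustrative values, not from dota2.csv)
duration <- c(1500, 1800, 2100, 2300, 2450, 2600, 2900, 3100, 5300)

q1  <- quantile(duration, 0.25, names = FALSE)
q3  <- quantile(duration, 0.75, names = FALSE)
iqr <- q3 - q1                      # same value that IQR(duration) returns

# Anything more than 1.5 * IQR beyond the quartiles is flagged
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

outliers <- duration[duration < lower_fence | duration > upper_fence]
outliers
```

Here only the 5300 second match lies beyond the upper fence, so it is the single value flagged.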

2—————————————————————————-

Graphical Display of multiple variables and their correlation

I used two variables, duration and first_blood_time, to plot their relationship and look at their correlation. I used the plot() function, which takes two numeric variables x and y.

plot(df$duration,df$first_blood_time,
     main = "Multiple variable Graphical Display",
     xlab = "Duration (sec)",
     ylab ="First blood Time (sec)",
     col = "blue",
     pch = 19)

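Correlation: In the scatter plot the points are densest where duration is around 2000 s to 3000 s. Since duration is on the x-axis, this mainly shows that most matches last that long, while the time of first blood is read from the y-axis. The plot alone does not show that first blood is more likely in the mid game than in the early or late game. A correlation coefficient, for example from cor(), would be needed to say how strongly the two variables are related.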
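A correlation coefficient puts a number on what the scatter plot shows. The sketch below uses six hypothetical matches, not the real data, and computes both the Pearson and the Spearman coefficient with base R's cor().

```r
# Hypothetical values for six matches (not taken from dota2.csv)
duration         <- c(1500, 2000, 2400, 2800, 3300, 4000)  # seconds
first_blood_time <- c(60, 150, 30, 200, 90, 120)           # seconds

# Pearson correlation: strength of the linear relationship, between -1 and 1
r_pearson <- cor(duration, first_blood_time)

# Spearman correlation: rank-based, less sensitive to outliers and skew
r_spearman <- cor(duration, first_blood_time, method = "spearman")

c(pearson = r_pearson, spearman = r_spearman)
```

Values near 0 suggest little relationship, and values near 1 or -1 suggest a strong one. With a skewed variable such as duration, the rank-based Spearman version may be the safer choice.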

3—————————————————————————-

Tables

I have two tables: frequency and relative frequency. The frequency table shows the integer counts of a categorical variable, in this case radiant_win, which only contains TRUE and FALSE. The relative frequency table shows the same counts as proportions (decimal values) of the total, which makes the groups easier to compare.

Frequency Table

table(df$radiant_win)
## 
## FALSE  TRUE 
##   118   132

Relative Frequency Table

table(df$radiant_win)/length(df$radiant_win)
## 
## FALSE  TRUE 
## 0.472 0.528
cat("\n")

Data Analysis from Relative Frequency Table

47.2% of the radiant_win values in the data set are FALSE and 52.8% are TRUE.
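The same relative frequencies can also be obtained with prop.table(). The sketch below rebuilds the outcome column from the counts in the frequency table (118 FALSE, 132 TRUE) so that it runs without the CSV file.

```r
# Rebuild the outcome column from the counts in the frequency table
radiant_win <- rep(c(FALSE, TRUE), times = c(118, 132))

freq     <- table(radiant_win)
rel_freq <- prop.table(freq)     # same as freq / sum(freq)

round(100 * rel_freq, 1)         # relative frequencies as percentages
```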

4—————————————————————————-

Two way table

For this part I passed two of my categorical variables to the table() function, assigned the result to two_way_table and then displayed the table.

two_way_table <- table(df$ability_name,df$radiant_win)

two_way_table
##                                     
##                                      FALSE TRUE
##   ability_base                           0    1
##   antimage_blink                         0    1
##   antimage_mana_break                    1    0
##   antimage_mana_void                     0    1
##   antimage_spell_shield                  0    1
##   attribute_bonus                        1    0
##   axe_battle_hunger                      1    0
##   axe_berserkers_call                    0    1
##   axe_counter_helix                      1    0
##   axe_culling_blade                      0    1
##   bane_brain_sap                         1    0
##   bane_enfeeble                          1    0
##   bane_fiends_grip                       0    1
##   bane_nightmare                         0    1
##   beastmaster_boar_poison                1    0
##   beastmaster_call_of_the_wild           1    0
##   beastmaster_hawk_invisibility          1    0
##   beastmaster_inner_beast                1    0
##   beastmaster_primal_roar                0    1
##   beastmaster_wild_axes                  1    0
##   bloodseeker_blood_bath                 1    0
##   bloodseeker_bloodrage                  1    0
##   bloodseeker_rupture                    1    0
##   bloodseeker_thirst                     0    1
##   courier_burst                          0    1
##   courier_return_stash_items             0    1
##   courier_return_to_base                 1    0
##   courier_shield                         0    1
##   courier_take_stash_items               0    1
##   courier_transfer_items                 0    1
##   crystal_maiden_brilliance_aura         0    1
##   crystal_maiden_crystal_nova            1    0
##   crystal_maiden_freezing_field          1    0
##   crystal_maiden_frostbite               1    0
##   dark_seer_ion_shell                    0    1
##   dark_seer_surge                        0    1
##   dark_seer_vacuum                       0    1
##   dazzle_poison_touch                    1    0
##   dazzle_shadow_wave                     0    1
##   dazzle_shallow_grave                   0    1
##   dazzle_weave                           0    1
##   death_prophet_carrion_swarm            0    1
##   death_prophet_exorcism                 1    0
##   death_prophet_silence                  0    1
##   death_prophet_witchcraft               0    1
##   default_attack                         1    0
##   dragon_knight_breathe_fire             1    0
##   dragon_knight_dragon_blood             0    1
##   dragon_knight_dragon_tail              1    0
##   dragon_knight_elder_dragon_form        1    0
##   dragon_knight_frost_breath             0    1
##   drow_ranger_frost_arrows               0    1
##   drow_ranger_marksmanship               0    1
##   drow_ranger_silence                    0    1
##   drow_ranger_trueshot                   0    1
##   earthshaker_aftershock                 0    1
##   earthshaker_echo_slam                  0    1
##   earthshaker_enchant_totem              1    0
##   earthshaker_fissure                    0    1
##   enigma_black_hole                      1    0
##   enigma_demonic_conversion              1    0
##   enigma_malefice                        1    0
##   enigma_midnight_pulse                  1    0
##   faceless_void_backtrack                0    1
##   faceless_void_chronosphere             0    1
##   faceless_void_time_lock                1    0
##   faceless_void_time_walk                1    0
##   furion_force_of_nature                 1    0
##   furion_sprout                          1    0
##   furion_teleportation                   1    0
##   furion_wrath_of_nature                 0    1
##   Invoker_sun_strike                     1    0
##   juggernaut_blade_dance                 1    0
##   juggernaut_blade_fury                  0    1
##   juggernaut_healing_ward                1    0
##   juggernaut_omni_slash                  0    1
##   kunkka_ghostship                       1    0
##   kunkka_return                          0    1
##   kunkka_tidebringer                     0    1
##   kunkka_torrent                         1    0
##   kunkka_x_marks_the_spot                1    0
##   leshrac_diabolic_edict                 0    1
##   leshrac_lightning_storm                1    0
##   leshrac_pulse_nova                     0    1
##   leshrac_split_earth                    0    1
##   lich_chain_frost                       1    0
##   lich_dark_ritual                       1    0
##   lich_frost_armor                       0    1
##   lich_frost_nova                        0    1
##   life_stealer_consume                   1    0
##   life_stealer_feast                     0    1
##   life_stealer_infest                    0    1
##   life_stealer_open_wounds               0    1
##   life_stealer_rage                      0    1
##   lina_dragon_slave                      0    1
##   lina_fiery_soul                        1    0
##   lina_laguna_blade                      1    0
##   lina_light_strike_array                0    1
##   lion_finger_of_death                   1    0
##   lion_impale                            1    0
##   lion_mana_drain                        0    1
##   lion_voodoo                            0    1
##   luna_eclipse                           0    1
##   luna_lucent_beam                       1    0
##   luna_lunar_blessing                    0    1
##   luna_moon_glaive                       0    1
##   mirana_arrow                           0    1
##   mirana_invis                           0    1
##   mirana_leap                            1    0
##   mirana_starfall                        0    1
##   morphling_adaptive_strike              0    1
##   morphling_morph                        1    0
##   morphling_morph_agi                    1    0
##   morphling_morph_replicate              0    1
##   morphling_morph_str                    1    0
##   morphling_replicate                    1    0
##   morphling_waveform                     0    1
##   necrolyte_death_pulse                  0    1
##   necrolyte_heartstopper_aura            0    1
##   necrolyte_reapers_scythe               1    0
##   necrolyte_sadist                       1    0
##   necronomicon_archer_aoe                1    0
##   necronomicon_archer_mana_burn          0    1
##   necronomicon_warrior_last_will         0    1
##   necronomicon_warrior_mana_burn         0    1
##   necronomicon_warrior_sight             1    0
##   nevermore_dark_lord                    1    0
##   nevermore_necromastery                 1    0
##   nevermore_requiem                      1    0
##   nevermore_shadowraze1                  0    1
##   nevermore_shadowraze2                  0    1
##   nevermore_shadowraze3                  1    0
##   phantom_assassin_blur                  0    1
##   phantom_assassin_coup_de_grace         1    0
##   phantom_assassin_phantom_strike        1    0
##   phantom_assassin_stifling_dagger       1    0
##   phantom_lancer_doppelwalk              0    1
##   phantom_lancer_juxtapose               1    0
##   phantom_lancer_phantom_edge            0    1
##   phantom_lancer_spirit_lance            0    1
##   puck_dream_coil                        0    1
##   puck_ethereal_jaunt                    0    1
##   puck_illusory_orb                      0    1
##   puck_phase_shift                       0    1
##   puck_waning_rift                       0    1
##   pudge_dismember                        1    0
##   pudge_flesh_heap                       0    1
##   pudge_meat_hook                        1    0
##   pudge_rot                              0    1
##   pugna_decrepify                        0    1
##   pugna_life_drain                       1    0
##   pugna_nether_blast                     0    1
##   pugna_nether_ward                      1    0
##   queenofpain_blink                      0    1
##   queenofpain_scream_of_pain             0    1
##   queenofpain_shadow_strike              0    1
##   queenofpain_sonic_wave                 0    1
##   rattletrap_battery_assault             0    1
##   rattletrap_hookshot                    1    0
##   rattletrap_power_cogs                  0    1
##   rattletrap_rocket_flare                0    1
##   razor_eye_of_the_storm                 0    1
##   razor_plasma_field                     1    0
##   razor_static_link                      0    1
##   razor_unstable_current                 0    1
##   riki_blink_strike                      0    1
##   riki_permanent_invisibility            0    1
##   riki_smoke_screen                      1    0
##   riki_tricks_of_the_trade               1    0
##   roshan_bash                            0    1
##   roshan_devotion                        1    0
##   roshan_inherent_buffs                  1    0
##   roshan_slam                            1    0
##   roshan_spell_block                     0    1
##   sandking_burrowstrike                  1    0
##   sandking_caustic_finale                1    0
##   sandking_epicenter                     1    0
##   sandking_sand_storm                    0    1
##   shadow_shaman_ether_shock              0    1
##   shadow_shaman_mass_serpent_ward        0    1
##   shadow_shaman_shackles                 0    1
##   shadow_shaman_voodoo                   1    0
##   skeleton_king_hellfire_blast           1    0
##   skeleton_king_mortal_strike            0    1
##   skeleton_king_reincarnation            0    1
##   skeleton_king_vampiric_aura            1    0
##   slardar_amplify_damage                 1    0
##   slardar_bash                           0    1
##   slardar_slithereen_crush               1    0
##   slardar_sprint                         1    0
##   sniper_assassinate                     0    1
##   sniper_headshot                        1    0
##   sniper_shrapnel                        0    1
##   sniper_take_aim                        1    0
##   storm_spirit_ball_lightning            0    1
##   storm_spirit_electric_vortex           1    0
##   storm_spirit_overload                  0    1
##   storm_spirit_static_remnant            1    0
##   sven_gods_strength                     1    0
##   sven_great_cleave                      1    0
##   sven_storm_bolt                        1    0
##   sven_warcry                            1    0
##   templar_assassin_meld                  0    1
##   templar_assassin_psi_blades            1    0
##   templar_assassin_psionic_trap          1    0
##   templar_assassin_refraction            1    0
##   templar_assassin_self_trap             1    0
##   templar_assassin_trap                  1    0
##   tidehunter_anchor_smash                1    0
##   tidehunter_gush                        0    1
##   tidehunter_kraken_shell                0    1
##   tidehunter_ravage                      0    1
##   tinker_heat_seeking_missile            1    0
##   tinker_laser                           0    1
##   tinker_march_of_the_machines           0    1
##   tinker_rearm                           0    1
##   tiny_avalanche                         0    1
##   tiny_craggy_exterior                   1    0
##   tiny_grow                              0    1
##   tiny_toss                              0    1
##   vengefulspirit_command_aura            0    1
##   vengefulspirit_magic_missile           1    0
##   vengefulspirit_nether_swap             1    0
##   vengefulspirit_wave_of_terror          1    0
##   venomancer_plague_ward                 1    0
##   venomancer_poison_nova                 0    1
##   venomancer_poison_sting                1    0
##   venomancer_venomous_gale               0    1
##   viper_corrosive_skin                   0    1
##   viper_nethertoxin                      0    1
##   viper_poison_attack                    1    0
##   viper_viper_strike                     0    1
##   warlock_fatal_bonds                    1    0
##   warlock_golem_flaming_fists            0    1
##   warlock_golem_permanent_immolation     1    0
##   warlock_rain_of_chaos                  1    0
##   warlock_shadow_word                    0    1
##   warlock_upheaval                       0    1
##   windrunner_focusfire                   1    0
##   windrunner_powershot                   1    0
##   windrunner_shackleshot                 1    0
##   windrunner_windrun                     1    0
##   witch_doctor_death_ward                0    1
##   witch_doctor_maledict                  0    1
##   witch_doctor_paralyzing_cask           0    1
##   witch_doctor_voodoo_restoration        1    0
##   zuus_arc_lightning                     0    1
##   zuus_lightning_bolt                    1    0
##   zuus_static_field                      1    0
##   zuus_thundergods_wrath                 0    1

Every ability name appears exactly once in the data set, so each row of the table has a single 1 in one column and a 0 in the other.

Relationship between two Variables

The game revolves around the heroes, their abilities, their items, the play style and skill of the players, and many other independent variables. In this data set each ability_name is tied to exactly one outcome, so ability_name appears strongly related to radiant_win, although with only one match per ability this cannot be generalized. In many games a single ability can wipe out the opposing team, which can make radiant_win TRUE.
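Before reading a relationship out of a two-way table, it helps to check how many observations stand behind each row. The sketch below uses four made-up ability names and outcomes. When every row total is 1, as in the table above, each category has only a single match behind it.

```r
# Hypothetical abilities and outcomes (illustrative, not from dota2.csv)
ability     <- c("hero_a_blink", "hero_b_hook", "hero_c_nova", "hero_d_arrow")
radiant_win <- c(TRUE, FALSE, FALSE, TRUE)

tw <- table(ability, radiant_win)

# Number of observations behind each row of the two-way table
row_totals <- rowSums(tw)
row_totals

# TRUE when every category occurs exactly once
all(row_totals == 1)
```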

5—————————————————————————-

Side by side plot for one categorical and one quantitative variable.

For this part I had to plot one categorical and one quantitative variable side by side and use their summary statistics to compare them. To put two graphs side by side I used par(mfrow=c(1,2)), which lays the plots out in 1 row and 2 columns. Then I used the plot() function for each of my variables and the summary() function to find their five-number summaries.

par(mfrow=c(1,2))
plot(df$duration,
     main="Scatterplot of Duration (sec)",
     xlab = "",
     ylab = "Duration")

plot(df$item_id,
     main="Scatterplot of Item Id",
     xlab = "",
     ylab = "Item ID")

# cat() interprets "\n" as a line break, unlike print()
cat("Duration Summary:\n")
## Duration Summary:
summary(df$duration)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1443    2116    2461    2569    2946    5344
cat("\nItem Id Summary:\n")
## 
## Item Id Summary:
summary(df$item_id)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   25.00   46.00   56.22   55.50  218.00

We can see that the two summaries are very different. Duration is the time it took for the game to finish and has larger values because it is measured in seconds, while item id is just an assigned identifier. The summaries put the statistics side by side (minimum 1443 sec and item id 2, 1st quartile 2116 sec and item id 25, median 2461 sec and item id 46, and so on), but each variable is sorted independently, so these pairs do not mean that a given item was purchased at a given time. Both sets of values increase from minimum to maximum, which makes the two summaries easy to compare.
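Another common way to compare groups is to summarize the quantitative variable separately within each level of a categorical one. The sketch below does this for duration by radiant_win using six hypothetical matches. The data frame and its values are assumptions made for illustration.

```r
# Hypothetical matches (values are illustrative, not from dota2.csv)
matches <- data.frame(
  radiant_win = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  duration    = c(2100, 2500, 2900, 1800, 2400, 3600)
)

# Five-number summary and mean of duration within each outcome group
group_summary <- tapply(matches$duration, matches$radiant_win, summary)
group_summary

# Mean duration per group as a named vector
group_means <- tapply(matches$duration, matches$radiant_win, mean)
group_means

# Side-by-side box plots, one per group (drawn only in an interactive session)
if (interactive()) {
  boxplot(duration ~ radiant_win, data = matches,
          main = "Duration by match outcome",
          xlab = "radiant_win", ylab = "Duration (sec)")
}
```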

6—————————————————————————-

Visualization of Data

I had to find a variable which had not been used for a graphical display yet, and item_id was a good fit for showing different graphical representations. I have three visualizations of the data: a barplot, a scatter plot and a heatmap. For the barplot and the scatter plot I just had to call the functions barplot() and scatter.smooth(), which was pretty easy.

For the heatmap I loaded the ggplot2 library with require(ggplot2), although the heatmap() function itself comes with base R. I showed all of my variables except ability_name in the heatmap. First, I read my ‘dota2.csv’ file into data, then converted it (without its first column) into a matrix with data.matrix() and assigned the result back to data. Then I called the heatmap() function and adjusted it using different arguments.

barplot(df$item_id,
        main = "Bar Chart",
        xlab = "Item Id")

scatter.smooth(df$item_id,
               main = "Scatter Plot")

require(ggplot2)
data <- read.csv("dota2.csv", header = TRUE)
data <- data.matrix(data[,-1])
heatmap(t(data),
        main = "Heat Map",
        Rowv = NA,
        Colv = NA,
        col = heat.colors(200,alpha = 1,rev = FALSE),
        scale = "row")

summary(df$item_id)           
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   25.00   46.00   56.22   55.50  218.00
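The matrix conversion used for the heatmap can be looked at in isolation. The sketch below takes three rows from the outlier table shown earlier (first_blood_time is left out for brevity) and shows what data.matrix() produces once the text column is dropped.

```r
# Three rows copied from the outlier table (first_blood_time omitted)
d <- data.frame(
  ability_name = c("mirana_arrow", "riki_smoke_screen", "pugna_decrepify"),
  radiant_win  = c(TRUE, FALSE, TRUE),
  item_id      = c(29L, 46L, 46L),
  duration     = c(4348, 4295, 5267),
  stringsAsFactors = FALSE
)

# Drop the first (text) column, then convert the rest to a numeric matrix;
# logical values become 1 (TRUE) and 0 (FALSE)
m <- data.matrix(d[, -1])
m

# heatmap() expects a numeric matrix; t() puts variables in rows so that
# scale = "row" standardizes each variable separately
if (interactive()) {
  heatmap(t(m), Rowv = NA, Colv = NA, scale = "row")
}
```

Scaling by row matters here because duration is in the thousands while radiant_win is only 0 or 1. Without it, the colours would mostly reflect duration.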

Conclusion

In conclusion, the most interesting features of my project were the graphs, tables and calculations. I can say that graphs are the main reason to use RStudio, because they provide so much information and show how we can manipulate the data according to our needs. It is like making a whole new thing from just one function, which can be called anywhere in the application. Making tables and doing calculations is also very interesting and easy to perform, as it follows the same pattern of calling a function and passing in the data. For me this project was pretty challenging because I did not know how to use RStudio Cloud and had to learn each and every step from course materials, Stack Overflow, GitHub issue forums and other sources. In the end it was all worth it, because I learned something new which is very interesting and good for my personal growth.

Project Part 2

Decision Tree

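# tidyverse provides %>%, mutate(), slice(), sample_n(), as_tibble() and ggplot()
library(tidyverse)

df <- df %>% mutate(
    radiant_win = factor(radiant_win == TRUE,
                         levels = c(TRUE, FALSE),
                         labels = c('Win', 'Lose'))
)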

I am converting my variable radiant_win into a factor with the levels Win and Lose.
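The same conversion can be written in base R without the pipe. The sketch below uses a short made-up logical vector in place of df$radiant_win.

```r
# Made-up outcomes standing in for df$radiant_win
radiant_win <- c(TRUE, FALSE, TRUE)

# levels fixes the order (TRUE first), labels renames the levels
outcome <- factor(radiant_win,
                  levels = c(TRUE, FALSE),
                  labels = c("Win", "Lose"))
outcome
levels(outcome)
```

Listing TRUE first makes Win the first factor level, which is why Win appears first in the tree output and in the confusion matrices.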

library(rpart)
library(rpart.plot)


tree <- rpart(radiant_win~., data = df)
tree
## n= 250 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 250 118 Win (0.5280000 0.4720000)  
##   2) ability_name=ability_base,antimage_blink,antimage_mana_void,antimage_spell_shield,axe_berserkers_call,axe_culling_blade,bane_fiends_grip,bane_nightmare,beastmaster_primal_roar,bloodseeker_thirst,courier_burst,courier_return_stash_items,courier_shield,courier_take_stash_items,courier_transfer_items,crystal_maiden_brilliance_aura,dark_seer_ion_shell,dark_seer_surge,dark_seer_vacuum,dazzle_shadow_wave,dazzle_shallow_grave,dazzle_weave,death_prophet_carrion_swarm,death_prophet_silence,death_prophet_witchcraft,dragon_knight_dragon_blood,dragon_knight_frost_breath,drow_ranger_frost_arrows,drow_ranger_marksmanship,drow_ranger_silence,drow_ranger_trueshot,earthshaker_aftershock,earthshaker_echo_slam,earthshaker_fissure,faceless_void_backtrack,faceless_void_chronosphere,furion_wrath_of_nature,juggernaut_blade_fury,juggernaut_omni_slash,kunkka_return,kunkka_tidebringer,leshrac_diabolic_edict,leshrac_pulse_nova,leshrac_split_earth,lich_frost_armor,lich_frost_nova,life_stealer_feast,life_stealer_infest,life_stealer_open_wounds,life_stealer_rage,lina_dragon_slave,lina_light_strike_array,lion_mana_drain,lion_voodoo,luna_eclipse,luna_lunar_blessing,luna_moon_glaive,mirana_arrow,mirana_invis,mirana_starfall,morphling_adaptive_strike,morphling_morph_replicate,morphling_waveform,necrolyte_death_pulse,necrolyte_heartstopper_aura,necronomicon_archer_mana_burn,necronomicon_warrior_last_will,necronomicon_warrior_mana_burn,nevermore_shadowraze1,nevermore_shadowraze2,phantom_assassin_blur,phantom_lancer_doppelwalk,phantom_lancer_phantom_edge,phantom_lancer_spirit_lance,puck_dream_coil,puck_ethereal_jaunt,puck_illusory_orb,puck_phase_shift,puck_waning_rift,pudge_flesh_heap,pudge_rot,pugna_decrepify,pugna_nether_blast,queenofpain_blink,queenofpain_scream_of_pain,queenofpain_shadow_strike,queenofpain_sonic_wave,rattletrap_battery_assault,rattletrap_power_cogs,rattletrap_rocket_flare,razor_eye_of_the_storm,razor_static_link,razor_unstable_current,riki_blink_strike,riki_permanent_invisi
bility,roshan_bash,roshan_spell_block,sandking_sand_storm,shadow_shaman_ether_shock,shadow_shaman_mass_serpent_ward,shadow_shaman_shackles,skeleton_king_mortal_strike,skeleton_king_reincarnation,slardar_bash,sniper_assassinate,sniper_shrapnel,storm_spirit_ball_lightning,storm_spirit_overload,templar_assassin_meld,tidehunter_gush,tidehunter_kraken_shell,tidehunter_ravage,tinker_laser,tinker_march_of_the_machines,tinker_rearm,tiny_avalanche,tiny_grow,tiny_toss,vengefulspirit_command_aura,venomancer_poison_nova,venomancer_venomous_gale,viper_corrosive_skin,viper_nethertoxin,viper_viper_strike,warlock_golem_flaming_fists,warlock_shadow_word,warlock_upheaval,witch_doctor_death_ward,witch_doctor_maledict,witch_doctor_paralyzing_cask,zuus_arc_lightning,zuus_thundergods_wrath 132   0 Win (1.0000000 0.0000000) *
##   3) ability_name=antimage_mana_break,attribute_bonus,axe_battle_hunger,axe_counter_helix,bane_brain_sap,bane_enfeeble,beastmaster_boar_poison,beastmaster_call_of_the_wild,beastmaster_hawk_invisibility,beastmaster_inner_beast,beastmaster_wild_axes,bloodseeker_blood_bath,bloodseeker_bloodrage,bloodseeker_rupture,courier_return_to_base,crystal_maiden_crystal_nova,crystal_maiden_freezing_field,crystal_maiden_frostbite,dazzle_poison_touch,death_prophet_exorcism,default_attack,dragon_knight_breathe_fire,dragon_knight_dragon_tail,dragon_knight_elder_dragon_form,earthshaker_enchant_totem,enigma_black_hole,enigma_demonic_conversion,enigma_malefice,enigma_midnight_pulse,faceless_void_time_lock,faceless_void_time_walk,furion_force_of_nature,furion_sprout,furion_teleportation,Invoker_sun_strike,juggernaut_blade_dance,juggernaut_healing_ward,kunkka_ghostship,kunkka_torrent,kunkka_x_marks_the_spot,leshrac_lightning_storm,lich_chain_frost,lich_dark_ritual,life_stealer_consume,lina_fiery_soul,lina_laguna_blade,lion_finger_of_death,lion_impale,luna_lucent_beam,mirana_leap,morphling_morph,morphling_morph_agi,morphling_morph_str,morphling_replicate,necrolyte_reapers_scythe,necrolyte_sadist,necronomicon_archer_aoe,necronomicon_warrior_sight,nevermore_dark_lord,nevermore_necromastery,nevermore_requiem,nevermore_shadowraze3,phantom_assassin_coup_de_grace,phantom_assassin_phantom_strike,phantom_assassin_stifling_dagger,phantom_lancer_juxtapose,pudge_dismember,pudge_meat_hook,pugna_life_drain,pugna_nether_ward,rattletrap_hookshot,razor_plasma_field,riki_smoke_screen,riki_tricks_of_the_trade,roshan_devotion,roshan_inherent_buffs,roshan_slam,sandking_burrowstrike,sandking_caustic_finale,sandking_epicenter,shadow_shaman_voodoo,skeleton_king_hellfire_blast,skeleton_king_vampiric_aura,slardar_amplify_damage,slardar_slithereen_crush,slardar_sprint,sniper_headshot,sniper_take_aim,storm_spirit_electric_vortex,storm_spirit_static_remnant,sven_gods_strength,sven_great_cleave,sven_storm_bolt,sven
_warcry,templar_assassin_psi_blades,templar_assassin_psionic_trap,templar_assassin_refraction,templar_assassin_self_trap,templar_assassin_trap,tidehunter_anchor_smash,tinker_heat_seeking_missile,tiny_craggy_exterior,vengefulspirit_magic_missile,vengefulspirit_nether_swap,vengefulspirit_wave_of_terror,venomancer_plague_ward,venomancer_poison_sting,viper_poison_attack,warlock_fatal_bonds,warlock_golem_permanent_immolation,warlock_rain_of_chaos,windrunner_focusfire,windrunner_powershot,windrunner_shackleshot,windrunner_windrun,witch_doctor_voodoo_restoration,zuus_lightning_bolt,zuus_static_field 118   0 Lose (0.0000000 1.0000000) *
rpart.plot(tree,
           main = "Match outcome\n",
           shadow.col = "gray",
           under = TRUE,
           extra = 2,
           tweak = 1.2)

I imported two libraries, ‘rpart’ and ‘rpart.plot’, to build the decision tree and visualize it.

Predict And Confusion Matrix

Then I used the decision tree to make predictions on the dataset that we are going to be using for the rest of the project.

Predict

pred <- predict(tree,df,type = "class")
head(pred)
##    1    2    3    4    5    6 
##  Win Lose Lose Lose  Win  Win 
## Levels: Win Lose

Each has been classified into its category.

predict(tree, df) %>%
  head()  
##   Win Lose
## 1   1    0
## 2   0    1
## 3   0    1
## 4   0    1
## 5   1    0
## 6   1    0

Confusion Matrix

We use the actual and predicted values to create the confusion matrix.

confusion_table <- with(df,table(radiant_win, pred))
confusion_table
##            pred
## radiant_win Win Lose
##        Win  132    0
##        Lose   0  118

A confusion matrix is used to evaluate the accuracy and precision of a model's predictions. In my model the confusion matrix shows that the win and lose values were predicted accurately: 132 True Positives (Win predicted as Win), 0 False Negatives, 0 False Positives and 118 True Negatives (Lose predicted as Lose). Note that these predictions were made on the same data the tree was built from. Every row is correctly classified, which lets me continue with my project.
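Accuracy, precision and recall follow directly from the four cells of the confusion matrix. The sketch below re-enters the counts from the output above by hand, treating Win as the positive class.

```r
# Confusion matrix from the output above: rows = actual, columns = predicted
cm <- matrix(c(132,   0,
                 0, 118),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("Win", "Lose"),
                             predicted = c("Win", "Lose")))

tp <- cm["Win",  "Win"]    # actual Win,  predicted Win
fn <- cm["Win",  "Lose"]   # actual Win,  predicted Lose
fp <- cm["Lose", "Win"]    # actual Lose, predicted Win
tn <- cm["Lose", "Lose"]   # actual Lose, predicted Lose

accuracy  <- (tp + tn) / sum(cm)   # share of all rows classified correctly
precision <- tp / (tp + fp)        # share of predicted Wins that are real Wins
recall    <- tp / (tp + fn)        # share of real Wins that were found

c(accuracy = accuracy, precision = precision, recall = recall)
```

All three come out as 1 here because no row is off the diagonal.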

Cross Validation

I will now check whether the result holds on unseen data by splitting the data into two thirds for training and one third for testing.

library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
training <- createDataPartition(y = df$first_blood_time, p = .66, list = FALSE)
df_train <- df %>% slice(training)
df_test <- df %>% slice(-training)
dim(df_train)
## [1] 167   5
dim(df_test)
## [1] 83  5

For this validation step I used the createDataPartition() function to split the full dataset into about two thirds for training (167 rows) and one third for testing (83 rows). The size of the split by itself does not tell us whether the model is overfit. That has to be judged by comparing the predictions on the training data with the predictions on the test data.
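For comparison, a plain random hold-out split can be sketched in base R as below. This is not what createDataPartition() does internally, since that function samples within groups of y, which is why its split sizes (167 and 83) differ slightly from the ones here. The seed and the row count of 250 are the only inputs.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
n <- 250                          # number of rows in the data set

# Draw about 66% of the row indices for training, keep the rest for testing
train_idx <- sample(seq_len(n), size = round(0.66 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

length(train_idx)
length(test_idx)

# With a data frame df this would give df[train_idx, ] and df[test_idx, ]
```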

Training Dataset

I use the training set to build my model and then test it on the test set. In my original tree, ability_name is a very important variable. I am going to remove it so that I can check whether the new prediction holds the same result as my original prediction.

tree_from_train <- rpart(radiant_win~., 
                         data = subset(df_train, select = c(-ability_name)))

pred_test <- predict(tree_from_train,subset(df_test, select = c(-ability_name)),type = "class")

with(df_test, table(radiant_win, pred_test))
##            pred_test
## radiant_win Win Lose
##        Win   20   18
##        Lose  31   14

As we can see, the predictions are much worse once the important variable is taken out: only 34 of the 83 test rows (20 Win and 14 Lose), about 41%, are classified correctly.
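One way to judge a held-out result is to compare it with a simple baseline that always predicts the more common class. The sketch below re-enters the test confusion matrix from the output above by hand.

```r
# Held-out confusion matrix from the output above: rows = actual
cm_test <- matrix(c(20, 18,
                    31, 14),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(actual    = c("Win", "Lose"),
                                  predicted = c("Win", "Lose")))

# Correct predictions sit on the diagonal: (20 + 14) / 83
accuracy <- sum(diag(cm_test)) / sum(cm_test)

# Baseline: always predict the more common actual class: 45 / 83
baseline <- max(rowSums(cm_test)) / sum(cm_test)

round(c(accuracy = accuracy, baseline = baseline), 3)
```

The tree without ability_name scores below this baseline, which suggests the remaining variables carry little usable signal in this sample.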

Creating Full Tree

Now I will create a full tree without any pruning.

df_no_ability <- subset(df, select = c(-ability_name))

tree_full <- sample_n(df_no_ability,180) %>%
  rpart(radiant_win~., data = ., control = rpart.control(minsplit = 2, cp =0))

rpart.plot(tree_full, extra = 2, roundint = FALSE,
           box.palette = list("Gn", "Bu"))
## Warning: labs do not fit even at cex 0.15, there may be some overplotting

Now, that’s a lot of data and looks difficult to interpret!

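The tree was grown on a random sample of 180 of the 250 rows with minsplit = 2 and cp = 0, so it keeps splitting until those 180 rows are classified as well as possible. The remaining 70 rows were not used to build the tree.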

Predict Full Tree

pred_full <- predict(tree_full, df_no_ability, type = "class")
with(df, table(radiant_win, pred_full))
##            pred_full
## radiant_win Win Lose
##        Win  115   17
##        Lose  14  104

Looking good: 219 of the 250 rows (about 88%) are correctly classified, but 31 rows (17 Win and 14 Lose) are still mis-classified.

Chi-squared statistic

I will now use chi-squared test for significance.

library(FSelector)

weights <- df %>% chi.squared(radiant_win~., data =.) %>%
  as_tibble(rownames = "feature") %>%
  arrange(desc(attr_importance))
weights
## # A tibble: 4 × 2
##   feature          attr_importance
##   <chr>                      <dbl>
## 1 ability_name                   1
## 2 item_id                        0
## 3 duration                       0
## 4 first_blood_time               0
ggplot(weights,
       aes(x = attr_importance, y = reorder(feature,attr_importance))) +
       geom_bar(stat = "identity") +
        xlab("Importance score") + ylab("Feature")

I used the chi-squared statistic to find the importance of each feature, but it gave ability_name an importance of 1 and every other feature an importance of 0.
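A score of exactly 1 is what a chi-squared based measure gives when a feature has a different value on every row. The sketch below computes Cramér's V by hand with base R's chisq.test() on made-up data. As far as I understand, FSelector's chi.squared() reports a score of this kind, but the helper cramers_v() here is my own illustration and not part of that package.

```r
# Cramér's V from a two-way table (illustrative helper, not from FSelector)
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  # Small counts trigger an approximation warning, which is silenced here
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Made-up outcomes and two made-up features
outcome   <- c("Win", "Lose", "Lose", "Win", "Win", "Lose")
unique_id <- paste0("ability_", 1:6)           # a different value on every row
group     <- c("a", "a", "a", "b", "b", "b")   # values that repeat

v_unique <- cramers_v(unique_id, outcome)   # perfect "association"
v_group  <- cramers_v(group, outcome)       # weaker association

c(unique_id = v_unique, group = v_group)
```

A feature that is unique per row always separates the classes perfectly in the sample, so its score says little about how well it would predict new matches.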

varImp() function

I did not get a satisfying result using the chi-squared statistic, so I am using another method to test significance on the model.

imp <- varImp(tree)
head(imp)
##                     Overall
## ability_name     124.608000
## duration           4.394060
## first_blood_time   3.422815
## item_id            2.361191
imp %>% ggplot(aes(x = row.names(imp),weight = Overall))+
  geom_bar() + xlab("Feature") +ylab("Importance Score")

As we can see, the varImp() function gave a different result from the chi-squared statistic: the other variables now receive small non-zero scores. In this method ability_name still has by far the highest importance among the variables, so both methods agree that ability_name carries more weight than the other variables.