Background

This analysis was built for fun as a personal project and to demonstrate some examples of R applications for data analysis. The Overwatch League data could easily have been replaced with any number of other data sets, but I felt it would make a unique and interesting example.

I downloaded the 2023 data from the website https://overwatchleague.com/en-us/statslab

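library(tidyverse)  # dplyr, tidyr, ggplot2
library(ggrepel)    # geom_text_repel(), used in the Hero Roles section
dt <- read.csv("C:/Users/cates/R Projects/phs-2023-000000000000 (2).csv")
maps <- dt %>% group_by(map_name,map_type) %>% summarise(games = n_distinct(esports_match_id)) %>% arrange(desc(games))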
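To make the grouping step concrete, here is a minimal sketch on a made-up table. Only the column names come from the real file; the rows and match IDs are invented for illustration. Because the data is in long format (many stat rows per match), counting distinct match IDs is what keeps a single match from being counted several times.

```r
library(dplyr)

# Toy long-format data: several stat rows per match, as in the real file.
# All rows and match IDs here are invented for illustration.
toy <- data.frame(
  esports_match_id = c(1, 1, 2, 2, 3, 3, 3),
  map_name = c("Ilios", "Ilios", "Ilios", "Ilios",
               "Colosseo", "Colosseo", "Colosseo"),
  map_type = c("control", "control", "control", "control",
               "push", "push", "push")
)

# Count distinct matches per map, then sort from most to least played
toy_maps <- toy %>%
  group_by(map_name, map_type) %>%
  summarise(games = n_distinct(esports_match_id), .groups = "drop") %>%
  arrange(desc(games))

print(toy_maps)
```

In this toy table Ilios appears in two matches and Colosseo in one, even though Colosseo has more rows.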

Summary

I first checked which game modes were played most frequently. Surprisingly, Push was played less than half as often as Control.

ggplot(maps, aes(x=map_type, y=games, fill=map_type)) + geom_bar(stat = "identity", colour="black") +  coord_flip() + scale_fill_brewer(palette="Set3") + theme(legend.position = "none")

Looking further into the map distribution, it seems that Push is played less often because of team or league choices rather than map availability. There are three Hybrid maps and three Push maps, so the lower pick rate for Push is not explained by having fewer options.

ggplot(maps, aes(x=map_name, y=games, fill=map_type)) + geom_bar(stat = "identity", colour="black") +  coord_flip() + scale_fill_brewer(palette="Set3")

Hero Performance

Looking at individual hero performance, there is a lot more information to process. Generally, the goal of the game is to eliminate characters on the other team by doing damage with a variety of weapons and abilities, so damage done is a good proxy for performance.

Ramattra is the obvious choice for further analysis, with the highest single-game damage of any hero all season plus many of the top 10-15 games overall.

dt %>% filter(stat_name == 'Hero Damage Done' & hero_name != 'All Heroes' & hero_name != 'Lifeweaver' & hero_name != 'Junkrat' & hero_name != 'Mercy') %>% group_by(hero_name,esports_match_id,team_name, player_name) %>% summarise(dmg_per_game = mean(amount)) %>% ggplot(aes(x=hero_name, y=dmg_per_game, fill=hero_name)) + geom_boxplot() + theme(legend.position = "none") + coord_flip()

Reframing the damage data differently, a few new heroes stand out that are less obvious from the boxplots. Tracer and Winston have significantly more total damage than all other heroes.

Similarly, Ana and Lucio snuck into the top 10 for total damage despite relatively low per-game damage.

This suggested two more lines of inquiry: 1) what would the damage totals look like when adjusted for time played, and 2) how is damage distributed by hero role?

dps <- dt %>% filter(hero_name != 'All Heroes' & stat_name == 'Hero Damage Done') %>% group_by(hero_name) %>% summarize(dmg = sum(amount)) %>% arrange(desc(dmg)) %>% slice(1:10) %>% select(hero_name)

dt %>% filter(hero_name != 'All Heroes' & stat_name == 'Hero Damage Done') %>% group_by(hero_name) %>% summarize(dmg = sum(amount)) %>% arrange(desc(dmg)) %>% slice(1:10) 
## # A tibble: 10 × 2
##    hero_name      dmg
##    <chr>        <dbl>
##  1 Tracer    6455254.
##  2 Winston   4043133.
##  3 Sombra    2893331.
##  4 Ramattra  2582117.
##  5 Ana       2151837.
##  6 Lucio     2122520.
##  7 Hanzo     2097912.
##  8 Mei       1717573.
##  9 Brigitte  1546828.
## 10 Baptiste  1214297.
dt %>% filter(stat_name == 'Hero Damage Done' & hero_name %in% dps$hero_name) %>% group_by(hero_name,esports_match_id,team_name, player_name) %>% summarise(dmg_per_game = mean(amount)) %>% ggplot(aes(x=hero_name, y=dmg_per_game, fill=hero_name)) + geom_boxplot() + theme(legend.position = "none") +geom_jitter(alpha = 0.5) + coord_flip()

Ramattra

Looking first at the overall ratio of eliminations to damage, Ramattra is lower than expected. In terms of total damage, we already knew he was behind Tracer, Winston, and Sombra, but he is also behind each of those three in converting damage to eliminations.

dt_wider <- dt %>% filter(stat_name %in% c('Hero Damage Done','Eliminations') & hero_name != 'All Heroes') %>% pivot_wider(names_from="stat_name",values_from="amount")

dt_wider[is.na(dt_wider)] <- 0

hero_sum <- dt_wider %>% group_by(hero_name) %>% summarise(across(c(Eliminations, 'Hero Damage Done'), sum))

hero_sum$`Hero Damage Done` = hero_sum$`Hero Damage Done` / 1000

ggplot(hero_sum, aes(x=`Hero Damage Done`, y=Eliminations)) + geom_point() + geom_text(aes(label=hero_name), nudge_x = 0.25, nudge_y = 0.25, check_overlap = TRUE) + geom_smooth(method=lm, color="red", se=FALSE)
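The scatterplot shows the conversion from damage to eliminations visually, but the same idea can be put into a single number per hero. The sketch below is only an illustration: the hero names and totals are invented, and `damage_k` and `elims_per_1k` are hypothetical column names standing in for damage in thousands and eliminations per 1,000 damage.

```r
library(dplyr)

# Hypothetical per-hero totals (invented numbers).
# damage_k is total damage already divided by 1000.
toy_sum <- data.frame(
  hero_name = c("HeroA", "HeroB", "HeroC"),
  Eliminations = c(300, 200, 90),
  damage_k = c(150, 125, 60)
)

# Eliminations per 1,000 damage: higher means damage converts
# into eliminations more efficiently
toy_sum <- toy_sum %>%
  mutate(elims_per_1k = Eliminations / damage_k) %>%
  arrange(desc(elims_per_1k))

print(toy_sum)
```

A hero that sits below the red trend line in the plot would show up here with a lower ratio than heroes with similar total damage.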

Checking highest damage games for Ramattra:

dt %>% filter(hero_name == 'Ramattra' & stat_name == 'Hero Damage Done') %>% select(tournament_title,map_type,player_name,hero_name,amount) %>% arrange(desc(amount)) %>% slice(1:15)
##     tournament_title map_type player_name hero_name   amount
## 1   Spring Knockouts  payload      Coluge  Ramattra 27515.69
## 2   Spring Knockouts  payload      Kellan  Ramattra 26639.39
## 3   Spring Knockouts  payload       ToYou  Ramattra 26569.39
## 4   Spring Knockouts  payload      Hanbin  Ramattra 23757.66
## 5  Spring Qualifiers   hybrid      Vulcan  Ramattra 21285.21
## 6  Spring Qualifiers  payload     Someone  Ramattra 19772.09
## 7  Spring Qualifiers   hybrid        Hadi  Ramattra 19737.03
## 8  Spring Qualifiers  payload        Void  Ramattra 19722.49
## 9  Spring Qualifiers  payload      MirroR  Ramattra 18349.85
## 10 Spring Qualifiers  payload      Coluge  Ramattra 18257.32
## 11  Spring Knockouts  control      MirroR  Ramattra 18042.90
## 12 Spring Qualifiers  payload      Kellan  Ramattra 17757.72
## 13 Spring Qualifiers  payload      MirroR  Ramattra 17709.57
## 14 Spring Qualifiers  payload      Vulcan  Ramattra 17460.14
## 15 Midseason Madness  payload     Someone  Ramattra 17457.92

Compared with Tracer, Winston, and Sombra combined:

dt %>% filter(hero_name %in% c('Winston','Tracer','Sombra') & stat_name == 'Hero Damage Done') %>% select(tournament_title,map_type,player_name,hero_name,amount) %>% arrange(desc(amount)) %>% slice(1:15)
##     tournament_title map_type player_name hero_name   amount
## 1             Pro-Am   hybrid        Hadi   Winston 20749.97
## 2             Pro-Am   hybrid        TR33    Tracer 18519.44
## 3   Spring Knockouts   hybrid  ChoiSehwan    Tracer 18444.57
## 4  Spring Qualifiers   hybrid         LIP    Sombra 18227.19
## 5  Spring Qualifiers   hybrid      Profit    Tracer 17685.04
## 6   Spring Knockouts  payload     Spectra    Tracer 17586.16
## 7  Spring Qualifiers   hybrid       Decay    Tracer 17419.73
## 8             Pro-Am   hybrid       Decay    Tracer 16937.32
## 9  Spring Qualifiers   hybrid    Birdring    Sombra 16903.48
## 10  Spring Knockouts   hybrid       Piggy   Winston 16729.92
## 11 Spring Qualifiers   hybrid        punk   Winston 16692.00
## 12  Spring Knockouts   hybrid      Marve1   Winston 16571.92
## 13            Pro-Am   hybrid    Backbone    Tracer 16246.71
## 14 Midseason Madness  payload         shy    Sombra 16189.07
## 15  Spring Knockouts   hybrid        MuZe   Winston 16123.43

The 15th “best” Ramattra game is roughly equal to the 15th “best” for the other three heroes combined, but the top end of the distribution shows a huge disparity. As such, I wanted to compare a relative frequency distribution for these four heroes to see whether it confirms that Ramattra is more skewed towards these extreme games.

dt %>% filter(hero_name %in% c('Ramattra','Tracer','Winston','Sombra') & stat_name == 'Hero Damage Done') %>% group_by(player_name, esports_match_id, map_name, hero_name) %>% summarise(game_damage = sum(amount)) %>% ggplot(aes(x = game_damage,fill = hero_name)) + geom_histogram(color="black") + scale_fill_manual(values = c('purple','orchid','orange','wheat')) + facet_wrap(~hero_name)
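The histogram above plots raw counts, so heroes with more games get taller bars. One way to put the panels on a relative scale is to map the y axis to density instead, so that each panel's bars have a total area of 1. This is a sketch on invented data, not the plot from the analysis; `toy_games` and its values are made up.

```r
library(ggplot2)

# Invented per-game damage values for two hypothetical heroes.
# HeroA has more games and one extreme game; HeroB has fewer games.
toy_games <- data.frame(
  hero_name = rep(c("HeroA", "HeroB"), times = c(8, 4)),
  game_damage = c(5, 6, 6, 7, 7, 8, 9, 25, 6, 7, 8, 9) * 1000
)

# after_stat(density) rescales each panel so its bars have total area 1,
# which makes shapes comparable even when game counts differ
p <- ggplot(toy_games, aes(x = game_damage, fill = hero_name)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, color = "black") +
  facet_wrap(~hero_name) +
  theme(legend.position = "none")

print(p)
```

Setting `bins` explicitly also avoids the default-bin-count message that `geom_histogram()` prints otherwise.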

In conclusion, Ramattra is far from the most impactful hero in the game. The hero damage summary might suggest otherwise at first glance, but the boxplot does not account for the overall distribution of each hero’s performance. Ramattra has the greatest frequency of outlier damage games but is not consistently more effective than other heroes, especially the three above.
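The outlier claim can also be checked numerically rather than by eye. The sketch below counts, per hero, how many games fall above the usual upper boxplot fence (third quartile plus 1.5 times the interquartile range). The data is invented, and using the boxplot fence as the definition of an "outlier game" is my assumption here, not something the original analysis computed.

```r
library(dplyr)

# Invented per-game damage (in thousands) for two hypothetical heroes
toy_games <- data.frame(
  hero_name = rep(c("HeroA", "HeroB"), each = 8),
  game_damage = c(5, 6, 6, 7, 7, 8, 9, 25,   # HeroA: one extreme game
                  5, 6, 6, 7, 7, 8, 9, 10)   # HeroB: no extreme games
)

# Share of each hero's games above Q3 + 1.5 * IQR
outlier_share <- toy_games %>%
  group_by(hero_name) %>%
  summarise(
    upper_fence = unname(quantile(game_damage, 0.75)) + 1.5 * IQR(game_damage),
    outlier_games = sum(game_damage > upper_fence),
    share = outlier_games / n(),
    .groups = "drop"
  )

print(outlier_share)
```

Note that the fence is computed within each hero, so this measures how skewed a hero is relative to their own typical game, not relative to other heroes.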

Damage Adjustments

With basically the same approach, I calculated damage per minute per hero on a per player, per game basis.

hero_time_played <- dt %>% filter(stat_name == 'Time Played' & hero_name != 'All Heroes') %>% pivot_wider(names_from="stat_name",values_from="amount") %>% rename('time' = `Time Played`)

hero_time_played[is.na(hero_time_played)] <- 0

time_sum <- hero_time_played %>% group_by(hero_name) %>% summarise(seconds_played = sum(time))

Since this grain of grouping has been consistent across most of the analysis, I joined this dataframe back to the hero data with total damage that I calculated earlier. I also normalized both damage and eliminations by minutes played to account for disparities in hero pick rate.

hero_pivot <- merge(x= hero_sum, y=time_sum, by='hero_name') %>% mutate(minutes_played = seconds_played / 60, damage_per_minute = `Hero Damage Done` / minutes_played, eliminations_per_minute = Eliminations / minutes_played )
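As a worked example of the normalization, here is the same arithmetic on two invented heroes. The names and numbers are made up, `damage_k` is a hypothetical stand-in for the damage column (already in thousands), and I use `inner_join()` here, which should behave like the default `merge()` for this case.

```r
library(dplyr)

# Invented totals: damage in thousands, time in seconds
toy_hero <- data.frame(
  hero_name = c("HeroA", "HeroB"),
  Eliminations = c(120, 90),
  damage_k = c(60, 30)
)
toy_time <- data.frame(
  hero_name = c("HeroA", "HeroB"),
  seconds_played = c(3600, 1800)
)

# Join on hero, convert seconds to minutes, then divide totals by minutes
toy_pivot <- inner_join(toy_hero, toy_time, by = "hero_name") %>%
  mutate(
    minutes_played = seconds_played / 60,
    damage_per_minute = damage_k / minutes_played,
    eliminations_per_minute = Eliminations / minutes_played
  )

print(toy_pivot)
```

HeroA has twice the total damage of HeroB but also twice the playtime, so both end up at 1 (that is, 1,000 damage) per minute, while HeroB comes out ahead on eliminations per minute.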

Visualizing those normalized values reveals a few final conclusions. I only evaluated the top 15 heroes, in terms of total eliminations, to reduce the clutter.

hero_pivot %>% arrange(desc(Eliminations)) %>% slice(1:15) %>% ggplot(aes(x=damage_per_minute, y=eliminations_per_minute)) + geom_point() + geom_label(aes(label = hero_name))

Once again, Ramattra is back as the apparent #1, but we know that is a function of outlier performances skewing the averages upward. Interestingly, Sojourn leads all heroes in eliminations per minute played, despite the outlier games from Ramattra.

Clustered together around 700 damage/minute (0.7 on the plot, since damage was divided by 1,000 earlier) and 1.6 elims/minute are what appear to be the top tier of heroes: Sombra, Winston, Tracer, Mei, and Genji. There is another cluster of heroes at the bottom, composed of three Support heroes: Ana, Kiriko, and Brigitte.

While this does not declare any hero victorious as the #1 producer of damage or eliminations, it does suggest that the best way to interpret hero data is probably in tiers, like the clusters I mentioned.

Hero Roles

Since nobody would read about my R experience if I made this analysis into an exhaustive Overwatch guide, I will explain what this last section means in 1.5 sentences. Each hero belongs to one of three roles on a team, and those roles correspond to different priorities.

To better understand the damage comparisons I’ve made thus far, I want to summarize the data at a slightly higher level by role, rather than hero.

First, I will define the roles as lists of characters.

DPS <- c('Symmetra','Tracer','Cassidy','Sombra','Mei','Bastion','Sojourn','Pharah','Hanzo','Soldier: 76','Genji','Reaper','Widowmaker','Echo','Ashe','Torbjorn','Junkrat')

Support <- c('Ana','Lucio','Kiriko','Baptiste','Zenyatta','Mercy','Brigitte','Moira','Lifeweaver')

Tank <- c('Sigma','Winston','Ramattra','Wrecking Ball','D.Va','Doomfist','Reinhardt','Junker Queen','Roadhog','Zarya','Orisa')

Adding this new distinction to my existing hero dataframe:

hero_pivot <- hero_pivot %>% mutate(Role = ifelse(hero_name %in% DPS, 'DPS',ifelse(hero_name %in% Tank, 'Tank', 'Support')))
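One thing to watch with the nested `ifelse()` is that any hero missing from both the DPS and Tank lists is silently labeled Support, including misspelled or newly added heroes. A possible alternative is `case_when()` with an explicit fallback. This is a sketch with shortened role lists and an invented hero name ("NewHero"); the "Unknown" label is my own placeholder.

```r
library(dplyr)

# Shortened role lists for illustration only
DPS <- c("Tracer", "Sombra")
Tank <- c("Winston", "Ramattra")
Support <- c("Ana", "Lucio")

# "NewHero" is a made-up name that is in none of the lists
toy <- data.frame(hero_name = c("Tracer", "Winston", "Ana", "NewHero"))

# Each role is matched explicitly; anything unmatched is flagged
toy <- toy %>%
  mutate(Role = case_when(
    hero_name %in% DPS ~ "DPS",
    hero_name %in% Tank ~ "Tank",
    hero_name %in% Support ~ "Support",
    TRUE ~ "Unknown"
  ))

print(toy)
```

Filtering for `Role == "Unknown"` afterwards is a quick way to spot heroes that still need to be added to a list.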

Checking the results:

hero_pivot %>% ggplot(aes(x=damage_per_minute, y=eliminations_per_minute, color=Role, shape=Role)) + geom_point(size=6) + geom_text_repel(label=hero_pivot$hero_name, color="black")

Finally, connecting the dots (not the scatterplot points) between the individual hero analysis and the broader role trends, it seems that DPS heroes do deal the most damage, as the role name suggests. However, tanks tend to be more efficient in converting that damage to eliminations. There is a wider disparity among Support heroes than in either of the other two roles, with some heroes like Moira rivaling almost all other heroes in eliminations per minute, and others like Mercy dealing essentially no damage.
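The role comparison could also be summarized as a small table instead of being read off the plot. The sketch below aggregates a hero-level table by role; all numbers are invented, and `damage_k` and `elims_per_1k_damage` are hypothetical column names. I sum totals before dividing so that heroes with more playtime carry more weight than rarely picked ones.

```r
library(dplyr)

# Invented hero-level totals, two heroes per role
toy_pivot <- data.frame(
  Role = c("DPS", "DPS", "Tank", "Tank", "Support", "Support"),
  minutes_played = c(100, 50, 80, 20, 60, 40),
  damage_k = c(90, 60, 50, 10, 20, 5),       # damage in thousands
  Eliminations = c(150, 90, 120, 30, 30, 10)
)

# Role-level rates from summed totals (a playtime-weighted average)
role_sum <- toy_pivot %>%
  group_by(Role) %>%
  summarise(
    damage_per_minute = sum(damage_k) / sum(minutes_played),
    eliminations_per_minute = sum(Eliminations) / sum(minutes_played),
    elims_per_1k_damage = sum(Eliminations) / sum(damage_k),
    .groups = "drop"
  )

print(role_sum)
```

Taking a plain mean of the per-hero rates would be a different design choice, one that treats every hero equally regardless of how often they were played.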

Thank you!

If you read this far into my analysis, I hope it was worth a few minutes!