🎮 Simulated Match Dataset
Data was simulated for every player across 500 matches of a 5v5 online
player-versus-player game similar to League of Legends. Each observation
(row) records the session ID, game length, winning team, map, game rule
set, player_id, team, chosen character, skin, kills, deaths, assists,
disconnects, premium currency bought, premium currency spent, minion
kills, gold earned, characters sent to chat, emotes used, whether the
player reported another player, whether they were reported by another
player, and whether they stopped playing before the end of the match.
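To make the row structure concrete, here is a minimal sketch of a toy stand-in for the dataset, assuming one row per player per match. The column names `character_played`, `premium_bought`, `premium_spent`, `team`, and `winner` follow the code later in this report; `session_id`, the team labels, the three-character roster, and the distributions are illustrative assumptions, not the actual simulation.

```r
# Toy stand-in for the simulated data: one row per player per match.
# The team labels, roster size, and Poisson rates are made up for illustration.
set.seed(42)

n_matches <- 500
team_size <- 5
n_rows    <- n_matches * team_size * 2   # 5v5, so 10 players per match

toy_matches <- data.frame(
  session_id       = rep(seq_len(n_matches), each = team_size * 2),
  team             = rep(rep(c("blue", "red"), each = team_size), times = n_matches),
  character_played = sample(paste0("Char_", 1:3), n_rows, replace = TRUE),
  premium_bought   = rpois(n_rows, lambda = 2),
  premium_spent    = rpois(n_rows, lambda = 1)
)

# One winning team per match, repeated on all 10 of that match's rows
winners <- sample(c("blue", "red"), n_matches, replace = TRUE)
toy_matches$winner <- winners[toy_matches$session_id]

nrow(toy_matches)   # 5000 rows: 500 matches x 10 players
```

Under this assumption the full dataset would have 5,000 rows, which is the scale to keep in mind when reading the per-character averages below.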
📊 Top 20 Characters by Average Premium Currency Bought
```r
library(dplyr)    # group_by(), summarise(), arrange(), slice_head()
library(ggplot2)

# Average premium currency bought and spent per character, with play counts
all_matches %>%
  group_by(character_played) %>%
  summarise(
    avg_bought = mean(premium_bought),
    avg_spent = mean(premium_spent),
    n = n()
  ) %>%
  # Keep the 20 characters with the highest average premium bought
  arrange(desc(avg_bought)) %>%
  slice_head(n = 20) %>%
  ggplot(aes(x = reorder(character_played, avg_bought), y = avg_bought)) +
  geom_col(fill = "skyblue") +
  labs(
    title = "Top 20 Characters by Average Premium Currency Bought",
    x = "Character Played",
    y = "Average Premium Bought"
  ) +
  coord_flip() +
  theme_minimal()
```

✍️ Observations
Across all matches, Char_4 has the highest average premium currency
bought. Several factors could explain this. The character may have more
cosmetics available, or its visuals may be especially appealing. A high
win rate might lead players to invest more. If the character is played
substantially more or less often than other characters, that could also
skew the average.
📊 Times Played for Top-Spending Characters
```r
top20_times <- all_matches %>%
  group_by(character_played) %>%
  summarise(
    avg_bought = mean(premium_bought),
    n = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_bought)) %>%
  slice_head(n = 20) %>%
  mutate(character_played = factor(character_played, levels = rev(character_played)))

ggplot(top20_times, aes(x = character_played, y = n)) +
  geom_col(fill = "orchid") +
  labs(
    title = "Number of Times Top 20 Premium-Spending Characters Were Played",
    x = "Character Played",
    y = "Times Played"
  ) +
  coord_flip() +
  theme_minimal()
```

✍️ Observations
This chart shows how often each of the top-spending characters was
played. There are no large differences in play counts, and every
character has more than 80 plays, which is enough data to compare
averages meaningfully. Despite this, Char_4 had 50% more premium
spending than Char_40, suggesting that other factors are driving
spending.
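The chart supports this reading visually. One way to check it more formally is a chi-squared goodness-of-fit test on the play counts. The sketch below uses base R's `chisq.test()` on hypothetical counts (the numbers and the `Char_12` label are invented for illustration and are not taken from the report's data).

```r
# Hypothetical play counts for five characters (not the report's data)
plays <- c(Char_4 = 104, Char_9 = 118, Char_12 = 97, Char_40 = 92, Char_43 = 101)

# Null hypothesis: every character is equally likely to be picked
fit <- chisq.test(plays)

fit$p.value        # a large p-value is consistent with even play counts
min(plays) > 80    # the sample-size floor mentioned in the text
```

With the real data, the same call could be run on `top20_times$n`. A small p-value would mean play counts differ more than chance alone would suggest.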
📊 Top 20 Characters: Premium Bought vs Times Played
```r
top20_data <- all_matches %>%
  group_by(character_played) %>%
  summarise(
    avg_bought = mean(premium_bought),
    n = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_bought)) %>%
  slice_head(n = 20) %>%
  mutate(character_played = factor(character_played, levels = rev(character_played)))

# Rescale play counts so they can share the primary axis with avg_bought
scale_factor <- max(top20_data$avg_bought) / max(top20_data$n)

ggplot(top20_data, aes(x = character_played)) +
  # Bars: average premium bought
  geom_col(aes(y = avg_bought), fill = "skyblue", width = 0.6) +
  # Lollipops: times played, read against the secondary axis
  geom_segment(aes(xend = character_played, y = 0, yend = n * scale_factor),
               colour = "darkorchid", linewidth = 1) +
  geom_point(aes(y = n * scale_factor), colour = "darkorchid", size = 3) +
  scale_y_continuous(
    name = "Average Premium Bought",
    sec.axis = sec_axis(~ . / scale_factor, name = "Times Played")
  ) +
  labs(
    title = "Top 20 Characters: Premium Bought vs Times Played",
    x = "Character Played"
  ) +
  coord_flip() +
  theme_minimal()
```

✍️ Observations
This chart compares the average premium currency bought for the top 20
characters with how often each character was played. Char_9 had the most
plays but noticeably lower spending than Char_4. Play count does not
appear to be a major factor in average spending; the chart shows no sign
of a correlation.
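The "no sign of a correlation" reading comes from eyeballing the chart. A rank correlation would put a number on it. The sketch below runs base R's `cor.test()` with `method = "spearman"` on a hypothetical per-character summary. The `toy_summary` values are invented for illustration; with the real data the inputs would be `top20_data$avg_bought` and `top20_data$n`.

```r
# Hypothetical per-character summary (not the report's data)
toy_summary <- data.frame(
  character_played = paste0("Char_", 1:8),
  avg_bought       = c(3.1, 2.4, 2.9, 2.2, 2.6, 2.0, 2.8, 2.3),
  n                = c(98, 112, 91, 105, 120, 95, 101, 88)
)

# Rank-based correlation, so one unusually high spender has less influence
ct <- cor.test(toy_summary$avg_bought, toy_summary$n, method = "spearman")

ct$estimate   # rho near 0 would support "no sign of a correlation"
ct$p.value
```

Spearman is used here rather than Pearson because only 20 characters are compared and a single outlier such as Char_4 could dominate a linear correlation.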
📊 Premium Bought, Play Count and Win Rate
```r
top20_combo <- all_matches %>%
  # A player wins when their team matches the match's winning team
  mutate(did_win = team == winner) %>%
  group_by(character_played) %>%
  summarise(
    avg_bought = mean(premium_bought),
    n = n(),
    win_rate = mean(did_win),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_bought)) %>%
  slice_head(n = 20) %>%
  mutate(character_played = factor(character_played, levels = rev(character_played)))

# Rescale play counts so they can share the primary axis with avg_bought
scale_factor <- max(top20_combo$avg_bought) / max(top20_combo$n)

ggplot(top20_combo, aes(x = character_played)) +
  geom_col(aes(y = avg_bought), fill = "skyblue", width = 0.6) +
  geom_segment(aes(xend = character_played, y = 0, yend = n * scale_factor),
               colour = "gray50", linewidth = 1) +
  # Point position shows times played; point colour shows win rate
  geom_point(aes(y = n * scale_factor, colour = win_rate), size = 4) +
  scale_y_continuous(
    name = "Average Premium Bought",
    sec.axis = sec_axis(~ . / scale_factor, name = "Times Played")
  ) +
  scale_colour_gradient(
    low = "red",
    high = "green",
    # percent_format() comes from the scales package
    labels = scales::percent_format(accuracy = 1)
  ) +
  labs(
    title = "Top 20 Characters: Premium Bought, Play Count, and Win Rate",
    x = "Character Played",
    colour = "Win Rate"
  ) +
  coord_flip() +
  theme_minimal()
```

✍️ Observations
This final chart compares average spending, play count, and win rate.
I theorized that win rate might be a major factor in spending. The chart
indicates that Char_4 had a very high win rate, but some of the lower
spenders, such as Char_43, also had high win rates, so win rate alone
does not explain the difference. My next step would be to examine which
cosmetic offerings were available for each character; this data is not
currently included. If that does not give a clear answer, such as a
wider range of skins being available, my next suggestion would be to
interview players and gather feedback on what is special about this
character. It may be purely a visual preference, in which case the art
team may want to review this character's styles and designs to boost
sales of future characters.
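Before comparing win rates across characters, it may help to see how much uncertainty sits around each one at roughly 100 plays. The sketch below uses base R's `binom.test()` to compare each win rate with a 50% baseline and report a 95% confidence interval. The win and play counts are hypothetical, and the 50% baseline is an assumption based on the game having two teams.

```r
# Hypothetical win and play counts for two characters (not the report's data)
wins     <- c(Char_4 = 61, Char_43 = 58)
n_played <- c(Char_4 = 104, Char_43 = 101)

# Exact binomial test of each win rate against an even 50% baseline
results <- lapply(names(wins), function(ch) {
  binom.test(wins[[ch]], n_played[[ch]], p = 0.5)
})
names(results) <- names(wins)

# Observed win rate with its 95% confidence interval, one row per character
win_ci <- t(sapply(results, function(r) {
  c(win_rate = unname(r$estimate), lower = r$conf.int[1], upper = r$conf.int[2])
}))
win_ci
```

If the intervals for a high spender and a low spender overlap heavily, the data cannot separate their win rates, which would be consistent with win rate not explaining the spending gap.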