🎮 Simulated Match Dataset

Data was simulated for every player across 500 matches of a 5v5 online player-versus-player game similar to League of Legends.
Each observation (row) records one player in one match and includes: session ID, game length, winning team, map, game rule set, player_id, team, chosen character, skin, kills, deaths, assists, disconnects, premium currency bought, premium currency spent, minion kills, gold earned, characters sent to chat, emotes used, whether the player reported another player, whether the player was reported by another player, and whether the player left before the end of the match.

📊 Top 20 Characters by Average Premium Currency Bought

```r
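library(dplyr)
library(ggplot2)

# Average premium currency bought (and spent) per character, keeping the 20 highest buyers
all_matches %>%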
  group_by(character_played) %>%
  summarise(
    avg_bought = mean(premium_bought),
    avg_spent  = mean(premium_spent),
    n          = n()
  ) %>%
  arrange(desc(avg_bought)) %>%
  slice_head(n = 20) %>%
  ggplot(aes(x = reorder(character_played, avg_bought), y = avg_bought)) +
  geom_col(fill = "skyblue") +
  labs(
    title = "Top 20 Characters by Average Premium Currency Bought",
    x     = "Character Played",
    y     = "Average Premium Bought"
  ) +
  coord_flip() +
  theme_minimal()

```

✍️ Observations

Across all matches, Char_4 has the highest average premium currency bought. This could be due to several factors: the character may have more cosmetics available, its visuals may be especially appealing, or a high win rate may encourage players to invest more. If the character is played substantially more or less often than other characters, that could also skew its average.
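The concern about play counts skewing the data can be made concrete by reporting each character's play count and the standard error of its average alongside the mean. The sketch below uses base R on hypothetical toy data (character names, counts, and values are invented); in it, a character with only 5 plays ends up with the highest average, which is exactly the kind of result a small sample can produce. The 30-play cutoff is an arbitrary assumption.

```r
# Hypothetical toy data standing in for all_matches; values are illustrative only.
set.seed(42)
toy <- data.frame(
  character_played = rep(c("Char_A", "Char_B", "Char_C"), times = c(100, 100, 5)),
  premium_bought   = c(rpois(100, 200), rpois(100, 180), rpois(5, 400))
)

# Split premium bought by character, then summarise each group.
groups <- split(toy$premium_bought, toy$character_played)
per_char <- data.frame(
  character_played = names(groups),
  avg_bought       = vapply(groups, mean, numeric(1)),
  n                = vapply(groups, length, integer(1)),
  # Standard error of the mean: smaller samples give noisier averages
  se               = vapply(groups, function(x) sd(x) / sqrt(length(x)), numeric(1))
)

# Flag averages built on few plays (the 30-play cutoff is an arbitrary assumption).
per_char$low_n <- per_char$n < 30
per_char[order(-per_char$avg_bought), ]
```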


📊 Times Played for Top-Spending Characters

```r
top20_times <- all_matches %>%
  group_by(character_played) %>%
  summarise(
    avg_bought = mean(premium_bought),
    n          = n(),
    .groups    = "drop"
  ) %>%
  arrange(desc(avg_bought)) %>%
  slice_head(n = 20) %>%
  mutate(character_played = factor(character_played, levels = rev(character_played)))

ggplot(top20_times, aes(x = character_played, y = n)) +
  geom_col(fill = "orchid") +
  labs(
    title = "Number of Times Top 20 Premium-Spending Characters Were Played",
    x     = "Character Played",
    y     = "Times Played"
  ) +
  coord_flip() +
  theme_minimal()

```

✍️ Observations

This chart shows how often each of the top 20 spending characters was played. Play counts are similar across these characters, and each has more than 80 plays, which is enough data to compare their averages with reasonable confidence. Even so, Char_4 had 50% more premium currency bought than Char_40, which suggests that factors other than play count are driving spending.
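To put a number behind "no significant differences in how often these characters were played", one option is a chi-squared goodness-of-fit test of the play counts against equal play rates, and the "50% more" comparison is a simple relative difference. The sketch below uses invented counts and averages for four hypothetical characters; on the real data, the `n` column of `top20_times` would take the place of `plays`.

```r
# Hypothetical play counts for four characters; illustrative values only.
plays <- c(Char_A = 92, Char_B = 85, Char_C = 101, Char_D = 88)

# Chi-squared goodness-of-fit test: are these counts consistent with equal play rates?
# With a single vector, chisq.test() defaults to equal expected probabilities.
fit <- chisq.test(plays)
fit$p.value  # a large p-value means no evidence of unequal play rates

# Relative difference in average premium bought between two characters.
relative_diff <- function(a, b) (a - b) / b
relative_diff(150, 100)  # 0.5, i.e. 50% more
```

One caveat: the top 20 are selected by average premium bought, so a test on just those counts says nothing about the characters outside the chart.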


📊 Top 20 Characters — Premium Bought vs Times Played

```r
top20_data <- all_matches %>%
  group_by(character_played) %>%
  summarise(
    avg_bought = mean(premium_bought),
    n          = n(),
    .groups    = "drop"
  ) %>%
  arrange(desc(avg_bought)) %>%
  slice_head(n = 20) %>%
  mutate(character_played = factor(character_played, levels = rev(character_played)))

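# Rescale play counts onto the premium-bought axis so both share one panel;
# sec_axis() divides by the same factor to label the secondary axis in plays
scale_factor <- max(top20_data$avg_bought) / max(top20_data$n)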

ggplot(top20_data, aes(x = character_played)) +
  geom_col(aes(y = avg_bought), fill = "skyblue", width = 0.6) +
  geom_segment(aes(xend = character_played, y = 0, yend = n * scale_factor),
               colour = "darkorchid", linewidth = 1) +
  geom_point(aes(y = n * scale_factor), colour = "darkorchid", size = 3) +
  scale_y_continuous(
    name     = "Average Premium Bought",
    sec.axis = sec_axis(~ . / scale_factor, name = "Times Played")
  ) +
  labs(
    title = "Top 20 Characters: Premium Bought vs Times Played",
    x     = "Character Played"
  ) +
  coord_flip() +
  theme_minimal()

```

✍️ Observations

This chart compares the average premium currency bought on each of the top 20 characters with how often each character was played. Char_9 had the most plays of these 20 but a substantially lower average than Char_4. Play count does not appear to be a major factor in average spending; the chart shows no visible correlation between the two.
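The "no signs of correlation" reading can be checked numerically with a rank correlation between play count and average premium bought. This is a sketch on an invented five-character summary, not the real data; Spearman's rho is used because it depends only on ranks, so one extreme character cannot dominate it. Running it on all characters rather than only the top 20 would avoid the restricted range that comes from selecting characters by spending.

```r
# Hypothetical per-character summary (illustrative values, not the real data).
char_summary <- data.frame(
  character_played = c("Char_A", "Char_B", "Char_C", "Char_D", "Char_E"),
  n                = c(90, 95, 100, 105, 110),
  avg_bought       = c(300, 120, 250, 180, 200)
)

# Spearman's rank correlation between play count and average premium bought.
# Rank-based, so a single extreme character cannot dominate the result.
rho <- cor(char_summary$n, char_summary$avg_bought, method = "spearman")
rho  # -0.3 for these toy values

# cor.test() adds a p-value; with no ties and small n it uses an exact test.
ct <- cor.test(char_summary$n, char_summary$avg_bought, method = "spearman")
ct$p.value
```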


📊 Premium Bought, Play Count and Win Rate

```r
top20_combo <- all_matches %>%
  # A player wins when their team matches the match's winning team
  mutate(did_win = team == winner) %>%
  group_by(character_played) %>%
  summarise(
    avg_bought = mean(premium_bought),
    n          = n(),
    win_rate   = mean(did_win),
    .groups    = "drop"
  ) %>%
  arrange(desc(avg_bought)) %>%
  slice_head(n = 20) %>%
  mutate(character_played = factor(character_played, levels = rev(character_played)))

scale_factor <- max(top20_combo$avg_bought) / max(top20_combo$n)

ggplot(top20_combo, aes(x = character_played)) +
  geom_col(aes(y = avg_bought), fill = "skyblue", width = 0.6) +
  geom_segment(aes(xend = character_played, y = 0, yend = n * scale_factor),
               colour = "gray50", linewidth = 1) +
  geom_point(aes(y = n * scale_factor, colour = win_rate), size = 4) +
  scale_y_continuous(
    name     = "Average Premium Bought",
    sec.axis = sec_axis(~ . / scale_factor, name = "Times Played")
  ) +
  scale_colour_gradient(
    low   = "red",
    high  = "green",
    labels = scales::percent_format(accuracy = 1)
  ) +
  labs(
    title = "Top 20 Characters: Premium Bought, Play Count, and Win Rate",
    x     = "Character Played",
    colour = "Win Rate"
  ) +
  coord_flip() +
  theme_minimal()

```

✍️ Observations

This final chart compares average premium currency bought, play count, and win rate. I theorized that win rate might be a major factor in spending. The chart shows that Char_4 had a very high win rate, but some lower spenders, such as Char_43, also had high win rates, so win rate alone does not explain the gap. My next step would be to examine what cosmetic offerings were available for each character; that data is not currently included. If that does not give a clear answer (for example, Char_4 having a wider range of skins available), my next suggestion would be to interview players and gather feedback on what makes this character special. It may be purely a visual preference, in which case the art team may want to review this character's style and design to boost sales of future characters.
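The win-rate theory can also be tested directly by computing each character's win rate with the same `team == winner` rule used in the chart code and correlating it with average premium bought across all characters, not just the top 20. The toy table below is hypothetical (each row is from a different match, and all values are invented), and its perfect rank correlation is an artifact of the made-up numbers. Even a strong correlation on the real data would not show that winning causes spending.

```r
# Hypothetical toy matches (illustrative only): each row is one player in a different match.
toy <- data.frame(
  character_played = c("Char_A", "Char_A", "Char_B", "Char_B", "Char_C", "Char_C"),
  team             = c("blue",   "red",    "blue",   "red",    "blue",   "red"),
  winner           = c("blue",   "red",    "red",    "blue",   "blue",   "blue"),
  premium_bought   = c(500,      300,      100,      50,       200,      250)
)

# A player wins when their team matches the winning team (same rule as the chart code).
toy$did_win <- toy$team == toy$winner

# Per-character win rate and average premium bought, over all characters.
win_rate   <- tapply(toy$did_win, toy$character_played, mean)
avg_bought <- tapply(toy$premium_bought, toy$character_played, mean)

# Rank correlation between win rate and spending across characters.
rho <- cor(as.vector(win_rate), as.vector(avg_bought), method = "spearman")
```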