Introduction

This R analysis creates visualizations from the last 30 replays of the top 1200 players (17/06/2025, 7PM), retrieved by scraping the Capcom website (https://www.streetfighter.com/6/buckler/ranking/master). The dataset contains approximately 29,000 replays (retrieved on 23/06/2025, 00PM). The goal is to observe which characters the best players are using and to compare the win rates of the most played characters with those of less represented ones, to find out whether character usage is reflected in higher or lower win rates.

Installing Packages

For this analysis, we are going to use packages from the tidyverse: dplyr and tidyr for data wrangling and ggplot2 for the charts.


Cleaning the Dataset From Duplicate Replays

First of all, let's remove the duplicate replays in this dataset. Duplicates appear because the matchmaking system prioritizes pairing players with similar MR, so two players from the top 1200 often face each other and the same match is retrieved from both players' replay lists.

In the chunk below, we will use the read.csv() function to import data from a .csv file in the project folder called “Top_1200_Replays_23_06_25_00PM.csv” and save it as a data frame called sf6_replays. We then keep one row per Replay_ID in sf6_replays_clean:

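library(dplyr)    # distinct(), mutate(), count(), group_by(), summarise()
library(tidyr)    # pivot_longer()
library(ggplot2)  # ggplot() and the geoms used in the charts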

# Loading our Dataset

sf6_replays <- read.csv("Top_1200_Replays_23_06_25_00PM.csv")

# Remove duplicate rows based on Replay_ID, keeping the first occurrence.
sf6_replays_clean <- sf6_replays %>% 
  distinct(Replay_ID, .keep_all = TRUE)

# Check how many duplicates were removed.
message(paste("Removed", nrow(sf6_replays) - nrow(sf6_replays_clean), "duplicate replays"))
## Removed 2206 duplicate replays
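To see what the deduplication does, here is a minimal base R sketch on hypothetical toy data (not the real CSV): replay "R2" was retrieved once from each player's list. Keeping the first occurrence of each Replay_ID, which is what distinct(Replay_ID, .keep_all = TRUE) does, is equivalent to filtering with duplicated():

```r
# Hypothetical toy data: replay "R2" was retrieved from both players' replay lists.
replays <- data.frame(
  Replay_ID    = c("R1", "R2", "R2", "R3"),
  Player1_Name = c("A", "B", "B", "D"),
  Player2_Name = c("C", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each Replay_ID (same rule as dplyr::distinct()).
replays_clean <- replays[!duplicated(replays$Replay_ID), ]

message("Removed ", nrow(replays) - nrow(replays_clean), " duplicate replays")
```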

Creating Visualizations

Now it's time to create some visualizations from our dataset. First, let's take a look at the character distribution among the players.

Basic Character Visualization

# Stack both character columns into one and count how often each character appears.
character_counts <- sf6_replays_clean %>%
  select(Player1_Character, Player2_Character) %>%
  pivot_longer(cols = everything(), names_to = "Player", values_to = "Character") %>%
  count(Character, sort = TRUE)

# Create the viz.
ggplot(character_counts, aes(x = reorder(Character, n), y = n)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(
    title = "Recent Character Usage in Ranked Mode (Top 1200 Players)",
    x = "Character Used",
    y = "Number of Games"
  )

We can see that JP, Mai and Ryu are the three most played characters in this sample, so let's keep that in mind for the analyses that follow. Elena, the newest character, is in the middle of the pack, which is unusual compared to previous character releases. Next, let's check whether the players of these characters have a high average MR.
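Because each match has two character slots, every replay contributes two appearances to these counts (a mirror match counts twice for the same character), so the bars add up to twice the number of unique matches. A minimal base R sketch of the same stack-and-count idea on hypothetical data:

```r
# Hypothetical toy matches (not taken from the real dataset).
matches <- data.frame(
  Player1_Character = c("JP", "Mai", "Ryu", "JP"),
  Player2_Character = c("Mai", "JP", "JP", "Elena"),
  stringsAsFactors = FALSE
)

# Stack both slots into one vector (like pivot_longer()), then count each character.
appearances <- c(matches$Player1_Character, matches$Player2_Character)
character_counts <- sort(table(appearances), decreasing = TRUE)
print(character_counts)

# Sanity check: every match contributes exactly two appearances.
stopifnot(sum(character_counts) == 2 * nrow(matches))
```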

Average Master Rating (MR) by Character.

# First we need to store the MR from all the players.
mr_data <- sf6_replays_clean %>%
  pivot_longer(
    cols = c(Player1_Character, Player2_Character),
    names_to = "Player_type",
    values_to = "Character"
  ) %>%
  mutate(
    MR = if_else(Player_type == "Player1_Character", Player1_MR, Player2_MR)
  ) %>%
  # Group by character and calculate the average MR.
  group_by(Character) %>%
  summarise(
    Avg_MR = mean(MR, na.rm = TRUE),
    Play_Count = n() # Counting the appearances
  ) %>%
  arrange(desc(Avg_MR))

# After that, we can create a lollipop chart of the players' average Master Rating according to their characters.
ggplot(mr_data, aes(x = reorder(Character, -Avg_MR), y = Avg_MR)) +
  geom_segment(aes(xend = reorder(Character, -Avg_MR), y = 0, yend = Avg_MR)) +
  geom_point(size = 2, pch = 21, bg = 4, col = 1) +
  coord_flip() +
  labs(
    title = "Average Master Rating (MR) by Character",
    x = "Character",
    y = "Average MR"
  ) +
  theme_minimal() +
  theme(
    # Rotates the horizontal-axis labels (the MR values, since coord_flip() swaps the axes).
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

As we can see, JP and Mai are in a decent spot, while Ryu has the second-lowest average MR and Elena players are REALLY struggling. But let's also look at how many games each character has, to see whether the number of games played influences this result.
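In the MR chunk, pivot_longer() plus if_else() pairs each character with the MR of the player in the same slot, and the mean is taken over appearances, so players with many replays in the sample weigh more than players with few. A base R sketch of that logic on hypothetical values (column names follow the dataset):

```r
# Hypothetical toy matches: each slot has a character and that player's MR.
matches <- data.frame(
  Player1_Character = c("JP", "Mai", "Ryu"),
  Player2_Character = c("Mai", "JP", "JP"),
  Player1_MR = c(1800, 1700, 1500),
  Player2_MR = c(1750, 1900, 1650),
  stringsAsFactors = FALSE
)

# Long format: one row per appearance, pairing each character with its own slot's MR.
long <- data.frame(
  Character = c(matches$Player1_Character, matches$Player2_Character),
  MR = c(matches$Player1_MR, matches$Player2_MR),
  stringsAsFactors = FALSE
)

# Mean MR and number of appearances per character (same idea as group_by() + summarise()).
avg_mr <- tapply(long$MR, long$Character, mean, na.rm = TRUE)
play_count <- table(long$Character)
mr_summary <- data.frame(
  Character = names(avg_mr),
  Avg_MR = as.numeric(avg_mr),
  Play_Count = as.integer(play_count[names(avg_mr)])
)
print(mr_summary[order(-mr_summary$Avg_MR), ], row.names = FALSE)
```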

Average Master Rating (MR) by Character with Play Count.

# mr_data from the previous chunk already holds Avg_MR and Play_Count for every character, so we reuse it here.
# Map Play_Count to the bar fill so usage and average MR can be read together.
ggplot(mr_data, aes(x = reorder(Character, Avg_MR), y = Avg_MR, fill = Play_Count)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Average MR by Character with Play Count",
    x = "Character",
    y = "Average MR",
    fill = "Play Count"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Now that we know some characters are used more than others, let's find out whether they win most of their games, so we can speculate about whether each character is strong or not (without knowing who is using them).

Character Win Ratio (JP).

# Label every match that involves JP as a win or a loss for JP, then compute the percentages.
jp_winrate <- sf6_replays_clean %>%
  mutate(
    JP_Wins = case_when(
      Player1_Character == "JP" & Match_Winner == Player1_Name ~ "JP Wins",
      Player2_Character == "JP" & Match_Winner == Player2_Name ~ "JP Wins",
      Player1_Character == "JP" & Match_Winner != Player1_Name ~ "JP Loses",
      Player2_Character == "JP" & Match_Winner != Player2_Name ~ "JP Loses",
      TRUE ~ NA_character_  # Ignore other characters
    )
  ) %>%
  filter(!is.na(JP_Wins)) %>%  # Keep only JP matches
  count(JP_Wins) %>%
  mutate(
    Win_Rate = n / sum(n) * 100  # Convert to percentage
  )

# Print the win rate table
print(jp_winrate)
##    JP_Wins    n Win_Rate
## 1 JP Loses 1591 49.25697
## 2  JP Wins 1639 50.74303
ggplot(jp_winrate, aes(x = "", y = Win_Rate, fill = JP_Wins)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +  # Convert to pie chart
  geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
            position = position_stack(vjust = 0.5)) +
  labs(
    title = "JP Recent Win Ratio",
    fill = "Outcome"
  ) +
  theme_void() +  # Clean background
  scale_fill_manual(values = c("#F44336", "#4CAF50"))  # Green = wins, red = losses

Despite being the most played character in this analysis, JP has a 50.7% win rate, which is okay but not fantastic. This win rate could be a consequence of the number of games: more people using the character may mean more people still learning the character. Let's move on to Mai's win rate.
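One caveat about the labelling logic in the JP chunk: case_when() assigns a single outcome per match and stops at the first matching condition. In a mirror match (JP vs JP), one of the two "Wins" conditions is always true, so every mirror is counted as a JP win and never as a loss. For popular characters with many mirrors, this may push the win rate up. A base R sketch on hypothetical data compares that rule with a per-appearance count, where a mirror adds one win and one loss (excluding mirrors entirely would be another reasonable option):

```r
# Hypothetical toy matches; rows 2 and 3 are JP mirror matches.
matches <- data.frame(
  Player1_Character = c("JP", "JP", "JP", "Mai"),
  Player2_Character = c("Mai", "JP", "JP", "JP"),
  Player1_Name = c("a", "b", "d", "f"),
  Player2_Name = c("x", "c", "e", "g"),
  Match_Winner = c("x", "b", "e", "f"),
  stringsAsFactors = FALSE
)

# Row-level rule mirroring the case_when() above (first matching condition wins).
row_outcome <- ifelse(
  matches$Player1_Character == "JP" & matches$Match_Winner == matches$Player1_Name, "Wins",
  ifelse(matches$Player2_Character == "JP" & matches$Match_Winner == matches$Player2_Name, "Wins",
    ifelse(matches$Player1_Character == "JP" | matches$Player2_Character == "JP", "Loses", NA)))
row_rate <- mean(row_outcome[!is.na(row_outcome)] == "Wins") * 100

# Per-appearance rule: check each slot separately, so a mirror gives 1 win + 1 loss.
p1 <- matches$Player1_Character == "JP"
p2 <- matches$Player2_Character == "JP"
wins <- c(matches$Match_Winner[p1] == matches$Player1_Name[p1],
          matches$Match_Winner[p2] == matches$Player2_Name[p2])
appearance_rate <- mean(wins) * 100

cat("Row rule:", row_rate, "% | Per-appearance:", round(appearance_rate, 1), "%\n")
```

On this toy data the row rule reports 50% while the per-appearance count gives about 33.3%, because both mirrors were counted as wins.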

Character Win Ratio (Mai).

mai_winrate <- sf6_replays_clean %>%
  mutate(
    Mai_Wins = case_when(
      Player1_Character == "Mai" & Match_Winner == Player1_Name ~ "Mai Wins",
      Player2_Character == "Mai" & Match_Winner == Player2_Name ~ "Mai Wins",
      Player1_Character == "Mai" & Match_Winner != Player1_Name ~ "Mai Loses",
      Player2_Character == "Mai" & Match_Winner != Player2_Name ~ "Mai Loses",
      TRUE ~ NA_character_  # Ignore other characters.
    )
  ) %>%
  filter(!is.na(Mai_Wins)) %>%  # Keep only Mai matches.
  count(Mai_Wins) %>%
  mutate(
    Win_Rate = n / sum(n) * 100  # Convert to percentage.
  )

print(mai_winrate)
##    Mai_Wins    n Win_Rate
## 1 Mai Loses 1470 47.66537
## 2  Mai Wins 1614 52.33463
ggplot(mai_winrate, aes(x = "", y = Win_Rate, fill = Mai_Wins)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +  # Convert to pie chart
  geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
            position = position_stack(vjust = 0.5)) +
  labs(
    title = "Mai Recent Win Ratio",
    fill = "Outcome"
  ) +
  theme_void() +  # Clean background
  scale_fill_manual(values = c("#F44336", "#4CAF50"))  # Green = wins, red = losses

Mai's win rate is about 1.6 percentage points higher than JP's (52.3% vs 50.7%), and she is the second most played character in this analysis. While we can speculate about the strength of this character, we need to keep in mind that JP is a much more technical character than Mai: if people are also learning Mai, they can probably get wins more easily than with JP. So maybe the number of people playing a character does not have a big influence on the character's WR, but confirming that would require a much bigger sample and historical data from the players, which is out of the scope of this project.
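The gap between Mai and JP can be checked, at least roughly, with a two-sample test of proportions on the win/loss counts printed above. A minimal sketch with base R's prop.test(), assuming each match is an independent observation; that is not strictly true here, since the same players appear in many replays, so treat the p-value as a rough guide:

```r
# Wins and total games for Mai and JP, taken from the tables printed above.
wins  <- c(Mai = 1614, JP = 1639)
games <- c(Mai = 1470 + 1614, JP = 1591 + 1639)

# Two-sample test for equality of proportions (with continuity correction).
res <- prop.test(x = wins, n = games)

# Observed difference in percentage points, plus the p-value of the test.
diff_pp <- unname(100 * (wins["Mai"] / games["Mai"] - wins["JP"] / games["JP"]))
cat("Difference:", round(diff_pp, 1), "pp | p-value:", signif(res$p.value, 3), "\n")
```

With these counts the p-value lands above 0.05, which fits the cautious reading above: the gap alone is not strong evidence that Mai wins more often than JP.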

Character Win Ratio (Dhalsim).

dhalsim_winrate <- sf6_replays_clean %>%
  mutate(
    Dhalsim_Wins = case_when(
      Player1_Character == "Dhalsim" & Match_Winner == Player1_Name ~ "Dhalsim Wins",
      Player2_Character == "Dhalsim" & Match_Winner == Player2_Name ~ "Dhalsim Wins",
      Player1_Character == "Dhalsim" & Match_Winner != Player1_Name ~ "Dhalsim Loses",
      Player2_Character == "Dhalsim" & Match_Winner != Player2_Name ~ "Dhalsim Loses",
      TRUE ~ NA_character_  # Ignore other characters.
    )
  ) %>%
  filter(!is.na(Dhalsim_Wins)) %>%  # Keep only Dhalsim matches.
  count(Dhalsim_Wins) %>%
  mutate(
    Win_Rate = n / sum(n) * 100  # Convert to percentage.
  )

print(dhalsim_winrate)
##    Dhalsim_Wins   n Win_Rate
## 1 Dhalsim Loses 736 46.08641
## 2  Dhalsim Wins 861 53.91359
ggplot(dhalsim_winrate, aes(x = "", y = Win_Rate, fill = Dhalsim_Wins)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +  # Convert to pie chart
  geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
            position = position_stack(vjust = 0.5)) +
  labs(
    title = "Dhalsim Recent Win Ratio",
    fill = "Outcome"
  ) +
  theme_void() +  # Clean background
  scale_fill_manual(values = c("#F44336", "#4CAF50"))  # Green = wins, red = losses

I was only going to analyse data from Ryu, Mai and JP, but I decided to include Dhalsim (and Elena, further below) because of their lower play rates, so we can see whether the WR changes based on that. As you can see, Dhalsim has a 53.9% WR, about 1.6 percentage points more than Mai. This could be because the people who play Dhalsim are not new to the character. With that, we can start thinking about the PLAYERS having more impact on win rates than the characters themselves.
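Dhalsim's 53.9% comes from about 1,600 games, roughly half of JP's or Mai's, so the estimate is less precise. A confidence interval makes that visible. A minimal sketch with base R's binom.test() (exact Clopper-Pearson interval), again assuming independent matches:

```r
# Dhalsim wins and total games, taken from the table printed above.
dhalsim_wins  <- 861
dhalsim_games <- 736 + 861

# Exact (Clopper-Pearson) 95% confidence interval for the win probability.
ci <- binom.test(dhalsim_wins, dhalsim_games)$conf.int
cat(sprintf("Win rate %.1f%%, 95%% CI %.1f%% to %.1f%%\n",
            100 * dhalsim_wins / dhalsim_games, 100 * ci[1], 100 * ci[2]))
```

The interval should stay above 50% but spans several percentage points and includes Mai's 52.3%, so the ranking between the two is less certain than the point estimates suggest.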

Character Win Ratio (Ryu).

ryu_winrate <- sf6_replays_clean %>%
  mutate(
    Ryu_Wins = case_when(
      Player1_Character == "Ryu" & Match_Winner == Player1_Name ~ "Ryu Wins",
      Player2_Character == "Ryu" & Match_Winner == Player2_Name ~ "Ryu Wins",
      Player1_Character == "Ryu" & Match_Winner != Player1_Name ~ "Ryu Loses",
      Player2_Character == "Ryu" & Match_Winner != Player2_Name ~ "Ryu Loses",
      TRUE ~ NA_character_  # Ignore other characters.
    )
  ) %>%
  filter(!is.na(Ryu_Wins)) %>%  # Keep only Ryu matches.
  count(Ryu_Wins) %>%
  mutate(
    Win_Rate = n / sum(n) * 100  # Convert to percentage.
  )

print(ryu_winrate)
##    Ryu_Wins    n Win_Rate
## 1 Ryu Loses 1527 48.07935
## 2  Ryu Wins 1649 51.92065
ggplot(ryu_winrate, aes(x = "", y = Win_Rate, fill = Ryu_Wins)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +  # Convert to pie chart
  geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
            position = position_stack(vjust = 0.5)) +
  labs(
    title = "Ryu Recent Win Ratio",
    fill = "Outcome"
  ) +
  theme_void() +  # Clean background
  scale_fill_manual(values = c("#F44336", "#4CAF50"))  # Green = wins, red = losses

Ryu's 51.9% win rate is interesting because he has the second-lowest average MR. This could mean that many people have only recently started playing Ryu (which means they start at 1500 MR) and are getting results fast. Usually, when people start playing a new character, their WR drops a lot; with Ryu being the third most played character, this does not seem to be the case.

Character Win Ratio (Elena).

elena_winrate <- sf6_replays_clean %>%
  mutate(
    Elena_Wins = case_when(
      Player1_Character == "Elena" & Match_Winner == Player1_Name ~ "Elena Wins",
      Player2_Character == "Elena" & Match_Winner == Player2_Name ~ "Elena Wins",
      Player1_Character == "Elena" & Match_Winner != Player1_Name ~ "Elena Loses",
      Player2_Character == "Elena" & Match_Winner != Player2_Name ~ "Elena Loses",
      TRUE ~ NA_character_  # Ignore other characters.
    )
  ) %>%
  filter(!is.na(Elena_Wins)) %>%  # Keep only Elena matches.
  count(Elena_Wins) %>%
  mutate(
    Win_Rate = n / sum(n) * 100  # Convert to percentage.
  )

print(elena_winrate)
##    Elena_Wins    n Win_Rate
## 1 Elena Loses  947 46.60433
## 2  Elena Wins 1085 53.39567
ggplot(elena_winrate, aes(x = "", y = Win_Rate, fill = Elena_Wins)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +  # Convert to pie chart
  geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
            position = position_stack(vjust = 0.5)) +
  labs(
    title = "Elena Recent Win Ratio",
    fill = "Outcome"
  ) +
  theme_void() +  # Clean background
  scale_fill_manual(values = c("#F44336", "#4CAF50"))  # Green = wins, red = losses

Elena has a win rate similar to Dhalsim's (53.4% vs 53.9%), and both are played less than the other characters analysed here. This suggests that even if a character is considered weak, its players can still maintain a good win rate, possibly because fewer people are trying the character.

Win Rate Comparison with Number of Games (JP, Mai, Ryu, Dhalsim and Elena).

# Function to calculate win rate for a given character.
get_win_rate <- function(character_name) {
  sf6_replays_clean %>%
    mutate(
      outcome = case_when(
        Player1_Character == character_name & Match_Winner == Player1_Name ~ "Wins",
        Player2_Character == character_name & Match_Winner == Player2_Name ~ "Wins",
        Player1_Character == character_name | Player2_Character == character_name ~ "Losses",
        TRUE ~ NA_character_
      )
    ) %>%
    filter(!is.na(outcome)) %>%
    count(outcome) %>%
    mutate(
      character = character_name,
      win_rate = n / sum(n) * 100,
      total_matches = sum(n)  # Wins + losses, computed before the losses row is dropped.
    ) %>%
    filter(outcome == "Wins")  # Keep only the wins row (win_rate and total_matches are already computed).
}

# Get win rates for all five characters.
win_rates <- bind_rows(
  get_win_rate("Mai"),
  get_win_rate("Dhalsim"),
  get_win_rate("JP"),
  get_win_rate("Ryu"),
  get_win_rate("Elena")
) %>%
  select(character, win_rate, total_matches)

ggplot(win_rates, aes(x = win_rate, y = reorder(total_matches, win_rate), fill = character)) +
  geom_bar(stat = "identity", width = 0.7) +
  geom_text(aes(label = paste0(round(win_rate, 0), "%")),
            vjust = -0.5,
            color = "black",
            size = 3.5) +
  coord_flip() +
  scale_x_continuous(limits = c(0,60)) +
  labs(
    title = "Win Rate Comparison",
    x = "Win Ratio (%)",
    y = "Games Played",
    fill = "Characters"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold")
  )
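The five per-character chunks repeat the same logic. As a sketch of an alternative (not the method used above), every character can be handled in one pass: stack both slots, mark whether each appearance won, then count wins and games per character. The data below is hypothetical and only borrows the dataset's column names; note that here a mirror match counts once for each side, whereas the case_when() rule assigns a single outcome per match:

```r
# Hypothetical toy matches using the dataset's column names (row 4 is a JP mirror).
matches <- data.frame(
  Player1_Character = c("JP", "Mai", "Ryu", "JP", "Elena"),
  Player2_Character = c("Mai", "JP", "JP", "JP", "Dhalsim"),
  Player1_Name = c("a", "b", "c", "d", "e"),
  Player2_Name = c("v", "w", "x", "y", "z"),
  Match_Winner = c("a", "w", "c", "y", "z"),
  stringsAsFactors = FALSE
)

# One row per appearance: the character and whether that slot won the match.
appearances <- data.frame(
  Character = c(matches$Player1_Character, matches$Player2_Character),
  Won = c(matches$Match_Winner == matches$Player1_Name,
          matches$Match_Winner == matches$Player2_Name),
  stringsAsFactors = FALSE
)

# Wins and games per character come from the same grouping, so they always agree.
wins  <- tapply(appearances$Won, appearances$Character, sum)
games <- tapply(appearances$Won, appearances$Character, length)
win_table <- data.frame(
  character = names(games),
  wins = as.integer(wins),
  total_matches = as.integer(games),
  win_rate = as.numeric(100 * wins / games)
)
print(win_table[order(-win_table$win_rate), ], row.names = FALSE)
```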

Conclusion

As we could see, this sample does not show a clear relationship between character usage, win rates and average MR. Answering that question properly would require a much larger sample and other factors, such as playtime with the character in previous seasons and who is using the character. For example, a pro player's results will differ from those of a player who is just trying a new character. With all of that being out of scope, we can speculate that the major influence on average MR and win rates is the PLAYER. We also saw which characters are being used the most after the balance patch of 06/05/2025, namely Ryu, Mai, JP, Akuma and Luke.