This R environment creates visualizations for the last 30 replays from the top 1200 players (06/17/2025, 7PM), retrieved by scraping the Capcom website (https://www.streetfighter.com/6/buckler/ranking/master). The dataset contains approximately 29,000 replays (retrieved on 23/06/2025, 00PM). The goal is to observe which characters the best players are using, and to compare the win rates of the most played characters with those of less represented characters, to find out whether character usage is reflected in higher or lower win rates.
For this analysis, we are going to use the tidyverse and ggplot2 packages.
First of all, let's load the data and remove the duplicate replays that this dataset contains. Duplicates happen because the matchmaking system prioritizes pairing people with similar MR, so the same replay can show up in the history of more than one top player.
In the chunk below, we use the read.csv() function
to import data from a .csv file in the project folder called
“Top_1200_Replays_23_06_25_00PM.csv” and save it as a data frame called
sf6_replays:
library(tidyverse) # Loads dplyr, tidyr (pivot_longer) and ggplot2, all used below.
# Loading our Dataset
sf6_replays <- read.csv("Top_1200_Replays_23_06_25_00PM.csv")
# Remove duplicate rows based on Replay_ID, keeping the first occurrence.
sf6_replays_clean <- sf6_replays %>%
distinct(Replay_ID, .keep_all = TRUE)
# Check how many duplicates were removed.
message(paste("Removed", nrow(sf6_replays) - nrow(sf6_replays_clean), "duplicate replays"))
## Removed 2206 duplicate replays
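Before trusting the number of removed rows, it can help to look at which replay IDs are repeated. The following is a minimal sketch on made-up rows, not on the real file: `toy_replays`, its values and the `copies` column are assumptions for illustration, and only `Replay_ID`, `Player1_Name` and `Player2_Name` are column names taken from the dataset.

```r
library(dplyr)

# Toy data (hypothetical values): replay "A1" appears twice, as it would
# if both of its players were in the scraped player list.
toy_replays <- tibble(
  Replay_ID    = c("A1", "B2", "A1", "C3"),
  Player1_Name = c("p1", "p3", "p1", "p2"),
  Player2_Name = c("p2", "p4", "p2", "p5")
)

# Count how often each Replay_ID occurs and keep only the repeated ones.
repeated_ids <- toy_replays %>%
  count(Replay_ID, name = "copies") %>%
  filter(copies > 1)

# distinct() keeps the first row found for each Replay_ID.
toy_clean <- distinct(toy_replays, Replay_ID, .keep_all = TRUE)

print(repeated_ids)
print(toy_clean)
```

Running the same two steps on sf6_replays would show whether each duplicated replay appears exactly twice, which is what the matchmaking explanation predicts.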
Now it’s time to get some visualizations from our dataset. First, let’s take a look at the character distribution among the players.
# Get the characters from the replays.
character_counts <- sf6_replays_clean %>%
select(Player1_Character, Player2_Character) %>%
pivot_longer(cols = everything(), names_to = "Player", values_to = "Character") %>%
count(Character, sort = TRUE)
# Create the viz.
ggplot(character_counts, aes(x = reorder(Character, n), y = n)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(
title = "Recent Character Usage in Ranked Mode (Top 1200 Players)",
x = "Character Used",
y = "Number of Games"
)
We can see that JP, Mai and Ryu are the top 3 most played characters in this sample, so let’s keep that in mind for the rest of the analysis. On the other hand, Elena, the newest character, is in the middle of the pack, which is very uncommon compared with previous character releases. Next, let’s find out whether these characters also have a high MR among the players.
# First we need to store the MR from all the players.
mr_data <- sf6_replays_clean %>%
pivot_longer(
cols = c(Player1_Character, Player2_Character),
names_to = "Player_type",
values_to = "Character"
) %>%
mutate(
MR = if_else(Player_type == "Player1_Character", Player1_MR, Player2_MR)
) %>%
# Grouping by character and calculating the average MR.
group_by(Character) %>%
summarise(
Avg_MR = mean(MR, na.rm = TRUE),
Play_Count = n() # Counting the appearances
) %>%
arrange(desc(Avg_MR))
# After that, we can create a chart based on the Average Master Rating of the players accordingly to their characters.
ggplot(mr_data, aes(x = reorder(Character, -Avg_MR), y = Avg_MR)) +
geom_segment(aes(x = Character, xend = Character, y = 0, yend = Avg_MR)) +
geom_point(size = 2, pch = 21, bg = 4, col = 1) +
coord_flip() +
labs(
title = "Average Master Rating (MR) by Character",
x = "Character",
y = "Average MR"
)+
theme_minimal()+
theme(
# Rotate the tick labels on the horizontal axis.
axis.text.x = element_text(angle = 45, hjust = 1)
)
As we can see, JP and Mai are in a decent spot, while Ryu has the 2nd lowest average MR and Elena players are really struggling. Now let’s bring in the number of games played to see whether it has an influence on this result.
# Combine the Play Count to precisely search for trends.
mr_data <- sf6_replays_clean %>%
pivot_longer(
cols = c(Player1_Character, Player2_Character),
names_to = "Player_type",
values_to = "Character"
) %>%
mutate(
MR = if_else(Player_type == "Player1_Character", Player1_MR, Player2_MR)
) %>%
# Grouping by character and calculating the average MR.
group_by(Character) %>%
summarise(
Avg_MR = mean(MR, na.rm = TRUE),
Play_Count = n() # Counting the appearances.
) %>%
arrange(desc(Avg_MR))
# Here we actually use the Play Count.
ggplot(mr_data, aes(x = reorder(Character, Avg_MR), y = Avg_MR, fill = Play_Count)) +
geom_bar(stat = "identity") +
labs(
title = "Average MR by Character with Play Count",
x = "Character",
y = "Average MR",
fill = "Play Count"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Now that we know some characters are being used more than others, let’s find out if they are winning most of their games, so we can speculate about the character being good or not (without having the knowledge of who is using them).
# Filter matches where JP is either Player1 or Player2
jp_winrate <- sf6_replays_clean %>%
mutate(
JP_Wins = case_when(
Player1_Character == "JP" & Match_Winner == Player1_Name ~ "JP Wins",
Player2_Character == "JP" & Match_Winner == Player2_Name ~ "JP Wins",
Player1_Character == "JP" & Match_Winner != Player1_Name ~ "JP Loses",
Player2_Character == "JP" & Match_Winner != Player2_Name ~ "JP Loses",
TRUE ~ NA_character_ # Ignore other characters
)
) %>%
filter(!is.na(JP_Wins)) %>% # Keep only JP matches
count(JP_Wins) %>%
mutate(
Win_Rate = n / sum(n) * 100 # Convert to percentage
)
# Print the win rate table
print(jp_winrate)
## JP_Wins n Win_Rate
## 1 JP Loses 1591 49.25697
## 2 JP Wins 1639 50.74303
ggplot(jp_winrate, aes(x = "", y = Win_Rate, fill = JP_Wins)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y", start = 0) + # Convert to pie chart
geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
position = position_stack(vjust = 0.5)) +
labs(
title = "JP Recent Win Ratio",
fill = "Outcome"
) +
theme_void() + # Clean background
scale_fill_manual(values = c("#F44336", "#4CAF50")) # Green = wins, red = losses
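One detail of the case_when() above is worth checking. In a JP mirror match (JP on both sides), one of the first two conditions matches whenever Match_Winner names either player, so the match is always labelled "JP Wins" and never "JP Loses". Whether sf6_replays_clean contains mirror matches is not shown here, so the sketch below uses made-up rows; `toy`, `is_mirror` and the "Mirror" label are assumptions for illustration, not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy replays; the second row is a JP mirror match.
toy <- tibble(
  Player1_Character = c("JP", "JP", "Ryu"),
  Player2_Character = c("Ryu", "JP", "JP"),
  Player1_Name      = c("a", "b", "d"),
  Player2_Name      = c("x", "c", "e"),
  Match_Winner      = c("x", "c", "e")
)

jp_games <- toy %>%
  mutate(
    # Flag mirror matches first so they are not counted as a plain win.
    is_mirror = Player1_Character == "JP" & Player2_Character == "JP",
    outcome = case_when(
      is_mirror ~ "Mirror",
      Player1_Character == "JP" & Match_Winner == Player1_Name ~ "JP Wins",
      Player2_Character == "JP" & Match_Winner == Player2_Name ~ "JP Wins",
      Player1_Character == "JP" | Player2_Character == "JP" ~ "JP Loses",
      TRUE ~ NA_character_
    )
  ) %>%
  filter(!is.na(outcome))

count(jp_games, outcome)
```

Mirror matches can then be dropped, or counted as one win and one loss, before computing the percentage. Either choice is a judgment call; the point is only that they should not inflate the win count.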
Despite being the most played character in this analysis, JP has a 50.7% win rate, which is ok but not fantastic. This win rate may be affected by the number of games: more people using the character may mean more people still learning the character. So let’s move on to Mai’s win rate.
mai_winrate <- sf6_replays_clean %>%
mutate(
Mai_Wins = case_when(
Player1_Character == "Mai" & Match_Winner == Player1_Name ~ "Mai Wins",
Player2_Character == "Mai" & Match_Winner == Player2_Name ~ "Mai Wins",
Player1_Character == "Mai" & Match_Winner != Player1_Name ~ "Mai Loses",
Player2_Character == "Mai" & Match_Winner != Player2_Name ~ "Mai Loses",
TRUE ~ NA_character_ # Ignore other characters.
)
) %>%
filter(!is.na(Mai_Wins)) %>% # Keep only Mai matches.
count(Mai_Wins) %>%
mutate(
Win_Rate = n / sum(n) * 100 # Convert to percentage.
)
print(mai_winrate)
## Mai_Wins n Win_Rate
## 1 Mai Loses 1470 47.66537
## 2 Mai Wins 1614 52.33463
ggplot(mai_winrate, aes(x = "", y = Win_Rate, fill = Mai_Wins)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y", start = 0) + # Convert to pie chart
geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
position = position_stack(vjust = 0.5)) +
labs(
title = "Mai Recent Win Ratio",
fill = "Outcome"
) +
theme_void() + # Clean background
scale_fill_manual(values = c("#F44336", "#4CAF50")) # Green = wins, red = losses
Mai’s win rate is 1.6 percentage points higher than JP’s, and she is the 2nd most played character in this analysis. While we can speculate about the power of this character, we need to keep in mind that JP is a much more technical character than Mai. If people are learning Mai too, they can get wins more easily than with JP. So maybe the number of people playing a character does not have a big influence on the character’s WR, but to confirm that we would need a much bigger sample and past data from the players, which is out of scope for this project.
dhalsim_winrate <- sf6_replays_clean %>%
mutate(
Dhalsim_Wins = case_when(
Player1_Character == "Dhalsim" & Match_Winner == Player1_Name ~ "Dhalsim Wins",
Player2_Character == "Dhalsim" & Match_Winner == Player2_Name ~ "Dhalsim Wins",
Player1_Character == "Dhalsim" & Match_Winner != Player1_Name ~ "Dhalsim Loses",
Player2_Character == "Dhalsim" & Match_Winner != Player2_Name ~ "Dhalsim Loses",
TRUE ~ NA_character_ # Ignore other characters.
)
) %>%
filter(!is.na(Dhalsim_Wins)) %>% # Keep only Dhalsim matches.
count(Dhalsim_Wins) %>%
mutate(
Win_Rate = n / sum(n) * 100 # Convert to percentage.
)
print(dhalsim_winrate)
## Dhalsim_Wins n Win_Rate
## 1 Dhalsim Loses 736 46.08641
## 2 Dhalsim Wins 861 53.91359
ggplot(dhalsim_winrate, aes(x = "", y = Win_Rate, fill = Dhalsim_Wins)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y", start = 0) + # Convert to pie chart
geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
position = position_stack(vjust = 0.5)) +
labs(
title = "Dhalsim Recent Win Ratio",
fill = "Outcome"
) +
theme_void() + # Clean background
scale_fill_manual(values = c("#F44336", "#4CAF50")) # Green = wins, red = losses
I was only going to analyse data from Ryu, Mai and JP, but I decided to include Dhalsim (and Elena, covered further below) because of their low play rate, so we can see if the WR changes based on that. As you can see, Dhalsim has a 53.9% WR, which is 1.6 percentage points more than Mai. This could be because the people who play Dhalsim aren’t new to the character. With that, we can start thinking about the PLAYERS being more impactful to win rates than the actual characters.
ryu_winrate <- sf6_replays_clean %>%
mutate(
Ryu_Wins = case_when(
Player1_Character == "Ryu" & Match_Winner == Player1_Name ~ "Ryu Wins",
Player2_Character == "Ryu" & Match_Winner == Player2_Name ~ "Ryu Wins",
Player1_Character == "Ryu" & Match_Winner != Player1_Name ~ "Ryu Loses",
Player2_Character == "Ryu" & Match_Winner != Player2_Name ~ "Ryu Loses",
TRUE ~ NA_character_ # Ignore other characters.
)
) %>%
filter(!is.na(Ryu_Wins)) %>% # Keep only Ryu matches.
count(Ryu_Wins) %>%
mutate(
Win_Rate = n / sum(n) * 100 # Convert to percentage.
)
print(ryu_winrate)
## Ryu_Wins n Win_Rate
## 1 Ryu Loses 1527 48.07935
## 2 Ryu Wins 1649 51.92065
ggplot(ryu_winrate, aes(x = "", y = Win_Rate, fill = Ryu_Wins)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y", start = 0) + # Convert to pie chart
geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
position = position_stack(vjust = 0.5)) +
labs(
title = "Ryu Recent Win Ratio",
fill = "Outcome"
) +
theme_void() + # Clean background
scale_fill_manual(values = c("#F44336", "#4CAF50")) # Green = wins, red = losses
Ryu’s win rate (51.9%) is interesting because he has the 2nd lowest average MR. This could mean that people are only now starting to play Ryu (which means they start at 1500 MR) and are getting results fast. Usually, when people start playing a character, their WR drops a lot. With Ryu being the 3rd most played, this is not the case.
elena_winrate <- sf6_replays_clean %>%
mutate(
Elena_Wins = case_when(
Player1_Character == "Elena" & Match_Winner == Player1_Name ~ "Elena Wins",
Player2_Character == "Elena" & Match_Winner == Player2_Name ~ "Elena Wins",
Player1_Character == "Elena" & Match_Winner != Player1_Name ~ "Elena Loses",
Player2_Character == "Elena" & Match_Winner != Player2_Name ~ "Elena Loses",
TRUE ~ NA_character_ # Ignore other characters.
)
) %>%
filter(!is.na(Elena_Wins)) %>% # Keep only Elena matches.
count(Elena_Wins) %>%
mutate(
Win_Rate = n / sum(n) * 100 # Convert to percentage.
)
print(elena_winrate)
## Elena_Wins n Win_Rate
## 1 Elena Loses 947 46.60433
## 2 Elena Wins 1085 53.39567
ggplot(elena_winrate, aes(x = "", y = Win_Rate, fill = Elena_Wins)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y", start = 0) + # Convert to pie chart
geom_text(aes(label = paste0(round(Win_Rate, 1), "%")),
position = position_stack(vjust = 0.5)) +
labs(
title = "Elena Recent Win Ratio",
fill = "Outcome"
) +
theme_void() + # Clean background
scale_fill_manual(values = c("#F44336", "#4CAF50")) # Green = wins, red = losses
Elena has a win rate similar to Dhalsim’s (53.4% vs 53.9%), and both are played less than the other characters. Here we can see that even if a character is considered bad, players can still maintain a good win rate, because fewer people are trying the character.
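The win rates above differ by only a few percentage points, so it is useful to see how much of that could be sampling noise. The following is a rough sketch that copies the win and loss counts from the printed tables and uses base R's binom.test() to attach a 95% confidence interval to each rate. The `results` data frame and its column names are assumptions for illustration, and the intervals treat every game as independent, which ranked matches between the same players may not be.

```r
# Win/loss counts copied from the tables printed above.
results <- data.frame(
  character = c("JP", "Mai", "Dhalsim", "Ryu", "Elena"),
  wins      = c(1639, 1614, 861, 1649, 1085),
  losses    = c(1591, 1470, 736, 1527, 947)
)
results$games <- results$wins + results$losses

# binom.test() returns an exact 95% confidence interval by default.
# mapply() yields a 2 x 5 matrix, so transpose it to one row per character.
ci <- t(mapply(
  function(w, g) binom.test(w, g)$conf.int,
  results$wins, results$games
))

results$win_rate <- round(results$wins / results$games * 100, 1)
results$ci_low   <- round(ci[, 1] * 100, 1)
results$ci_high  <- round(ci[, 2] * 100, 1)

print(results)
```

If an interval includes 50%, this sample alone cannot tell that character's win rate apart from an even record. Overlapping intervals between two characters mean their difference should be read with caution.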
# Function to calculate win rate for a given character.
get_win_rate <- function(character_name) {
sf6_replays_clean %>%
mutate(
outcome = case_when(
Player1_Character == character_name & Match_Winner == Player1_Name ~ "Wins",
Player2_Character == character_name & Match_Winner == Player2_Name ~ "Wins",
Player1_Character == character_name | Player2_Character == character_name ~ "Losses",
TRUE ~ NA_character_
)
) %>%
filter(!is.na(outcome)) %>%
count(outcome) %>%
mutate(
character = character_name,
total_matches = sum(n), # Wins + losses, computed before the "Losses" row is dropped.
win_rate = n / total_matches * 100
) %>%
filter(outcome == "Wins") # Keep only the "Wins" row.
}
# Get win rates for all five characters.
win_rates <- bind_rows(
get_win_rate("Mai"),
get_win_rate("Dhalsim"),
get_win_rate("JP"),
get_win_rate("Ryu"),
get_win_rate("Elena")
) %>%
select(character, win_rate, total_matches)
ggplot(win_rates, aes(x = win_rate, y = reorder(total_matches, win_rate), fill = character)) +
geom_bar(stat = "identity", width = 0.7) +
geom_text(aes(label = paste0(round(win_rate, 0), "%")),
vjust = -0.5,
color = "black",
size = 3.5) +
coord_flip() +
scale_x_continuous(limits = c(0,60)) +
labs(
title = "Win Rate Comparison",
x = "Win Ratio (%)",
y = "Games Played",
fill = "Characters"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold")
)
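The per-character function above can also be written as a single pass over every character, which avoids repeating one case_when() per name. This is a sketch under assumptions rather than the method used in this report: it runs on made-up rows (`toy`), and `Slot`, `Player_Name`, `Won`, `Games` and `Wins` are hypothetical names. Only the Player1/Player2 and Match_Winner columns come from the dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy replays with the same column names as sf6_replays_clean.
toy <- tibble(
  Replay_ID         = 1:4,
  Player1_Character = c("JP", "Mai", "Ryu", "JP"),
  Player2_Character = c("Ryu", "JP", "Mai", "Mai"),
  Player1_Name      = c("a", "b", "c", "a"),
  Player2_Name      = c("c", "a", "b", "b"),
  Match_Winner      = c("a", "b", "c", "b")
)

all_win_rates <- toy %>%
  # One row per (replay, player slot) instead of one row per replay.
  pivot_longer(
    cols = c(Player1_Character, Player2_Character),
    names_to = "Slot",
    values_to = "Character"
  ) %>%
  mutate(
    # Pick the name that belongs to this slot, then check if that player won.
    Player_Name = if_else(Slot == "Player1_Character", Player1_Name, Player2_Name),
    Won = Match_Winner == Player_Name
  ) %>%
  group_by(Character) %>%
  summarise(
    Games = n(),
    Wins = sum(Won),
    Win_Rate = Wins / Games * 100
  ) %>%
  arrange(desc(Win_Rate))

print(all_win_rates)
```

In this long format a mirror match contributes one win and one loss to the same character, so it pulls the rate toward 50% instead of counting as a win.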
As we could see, this sample shows no clear relationship between character usage, win rates and average MR. To settle the question, a much larger sample and other factors need to be analysed, such as playtime with the character in other seasons and who is using the character. If the player is a pro player, for example, their results will differ from those of other players trying a new character. With all of that being out of scope, we can speculate that the major influence on average MR and win rates is the PLAYER. We also saw which characters are being used the most after the balance patch of 06/05/2025: Ryu, Mai, JP, Akuma and Luke.
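One simple way to put a number on the usage versus win rate question is a rank correlation between games played and win rate. The sketch below uses only the five characters whose tables were printed in this report, with counts copied from those tables; `summary_tbl` and `rho` are hypothetical names, and Spearman's method is an assumption chosen because it only depends on rank order.

```r
# Counts copied from the win rate tables printed earlier in this report.
summary_tbl <- data.frame(
  character = c("JP", "Mai", "Dhalsim", "Ryu", "Elena"),
  wins      = c(1639, 1614, 861, 1649, 1085),
  losses    = c(1591, 1470, 736, 1527, 947)
)
summary_tbl$games    <- summary_tbl$wins + summary_tbl$losses
summary_tbl$win_rate <- summary_tbl$wins / summary_tbl$games * 100

# Spearman's rho compares rank orders, so it is not driven by a single outlier.
rho <- cor(summary_tbl$games, summary_tbl$win_rate, method = "spearman")

print(summary_tbl[order(-summary_tbl$games), ])
print(rho)
```

For these five characters the order by games played is the exact reverse of the order by win rate. Five hand-picked characters are far too few to support a conclusion either way, which is why the same calculation over every character in a larger sample would be the more meaningful check.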