library(tidyverse)
library(fmsb)
library(RColorBrewer)
characters <- read_csv("hsr_character-data.csv")
light_cones <- read_csv("hsr_lightcone-data.csv")
DATA110 Project 1: To Gleam the Aeons
Introduction
This project attempts to visualize a few statistical relationships in the game “Honkai Star Rail” by HoYoverse in one concise image. Honkai Star Rail is a turn-based role-playing gacha game, similar in many ways to titles like Pokemon. Characters have “stats” such as HP (hit points), def (defense), atk (attack), spd (speed), taunt, and maximum energy. Characters in this game follow “Paths”, which amount to the character’s role in a team, whether that is to deal damage, to buff allies, or to protect the team from incoming damage. A typical battle in Honkai Star Rail (HSR) has a simple end condition: you win if you reduce all enemies’ HP to 0, and you lose if all allies’ HP is reduced to 0. These “stats”, along with the path variable, are explained as follows:
- atk (quantitative): a stat typically used to determine how much damage is dealt by a character’s skills. There are many caveats: atk is used in certain characters’ kits to determine how much healing they provide to allies, and it may not be representative of how much raw damage a character outputs, due to the existence of atk scaling and multipliers.
- HP (quantitative): a stat which determines how much damage a character can take before they are downed.
- def (quantitative): a stat which determines how much damage a character takes from enemy attacks.
- spd (quantitative): a stat which determines how often a character moves in the turn order.
- taunt (quantitative, discrete): a stat which determines how likely a character is to be targeted by enemies. This is determined uniquely by the character’s path (see below).
- maximum energy (quantitative, discrete): a stat which determines how much energy is required for a character to cast their ultimate ability. This is often very important for characters which get a large amount of utility from their ultimate, and a higher maximum energy requires more resources to ensure consistent uptime on ultimate abilities.
- path (categorical): governed by the principles of the “Aeons” in the game’s lore, the essential role which a character plays in combat:
  - abundance: provides allies with healing and often cleanses or buffs
  - destruction: deals massive damage to enemies, often multi-target
  - erudition: deals consistent damage to multiple targets
  - harmony: provides allies with utility buffs and cleanses debuffs
  - hunt: deals large damage to single targets
  - nihility: applies debuffs to enemies and removes enemy buffs; the debuffs are sometimes damage-over-time abilities
  - preservation: mitigates allies’ incoming damage through shields, damage reduction, or other buffs
In this project I worked with datasets from Ridho Pandu on Kaggle titled “Honkai Star Rail Character Dataset” and “Honkai Star Rail Light Cone Dataset”. At the time of use, the characters dataset was updated for HSR version 1.3, and the light cones dataset was updated for HSR version 1.2.
While exploring the data, I remembered a form of data visualization that I had seen in the past in video games like Pokemon, which I found was called a “Radar Chart”. After researching I found that these are highly criticized in the field of data visualization, but at the end I hope to make an argument as to why these are particularly useful in visualizations for video games.
I hope to visualize how these character stats vary based on the character’s path using a radar chart. Specifically, since radar charts can get very cluttered and hard to interpret, I chose the three support-focused paths: abundance, preservation, and harmony.
Load Libraries and Data
Preliminary Graphs
When making these graphs, I didn’t realize that geom_bar with stat="identity" does not average the y values within each category of a categorical x aesthetic. Instead, it stacks them, so each bar shows a sum. That makes for a useless visualization, because each path has a different number of characters. Thus, the first two bar charts are pretty trash.
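If per-path averages are what is wanted, one option is to summarize first and plot the result with geom_col. The following is a minimal sketch on a made-up toy tibble: `toy`, `toy_means`, and all values are hypothetical, and only the column names `path` and `atk_80` are borrowed from the code in this project.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for `characters` (values are made up)
toy <- tibble(
  path   = c("hunt", "hunt", "harmony", "harmony", "harmony"),
  atk_80 = c(600, 700, 400, 500, 600)
)

# Average atk per path, computed explicitly before plotting
toy_means <- toy |>
  group_by(path) |>
  summarize(mean_atk = mean(atk_80), .groups = "drop")

# geom_col() draws one bar per row, so each bar height is the path mean
p <- ggplot(toy_means, aes(x = path, y = mean_atk, fill = path)) +
  geom_col(alpha = 0.8) +
  labs(y = "mean atk (lvl. 80)", x = "path", title = "mean atk vs. path")
```

Printing `p` would draw the chart. Using `stat_summary(fun = mean, geom = "bar")` on the raw data should give a similar picture without the intermediate table.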
# fully leveled atk vs. path
ggplot(characters, aes(y = atk_80, x = path)) +
  geom_bar(stat = "identity", alpha = 0.8, aes(fill = path)) +
  labs(y = "atk (lvl. 80)", x = "path", title = "atk vs. path") +
  scale_fill_brewer(palette = "Set1")

# fully leveled atk vs. combat type
ggplot(characters, aes(y = atk_80, x = combat_type)) +
  geom_bar(stat = "identity", alpha = 0.8, aes(fill = combat_type)) +
  labs(y = "atk (lvl. 80)", x = "combat type", title = "atk vs. combat type") +
  scale_fill_brewer(palette = "Set2")

# max energy vs. path
ggplot(characters, aes(y = max_energy, x = as.factor(path))) +
  geom_boxplot(aes(fill = path), alpha = 0.4) +
  labs(y = "max energy", x = "path", title = "max energy distribution vs. path") +
  scale_fill_brewer(palette = "Set3")

# light cone fully leveled atk vs. path
ggplot(light_cones, aes(y = atk_80, x = as.factor(path))) +
  geom_boxplot(aes(fill = path), alpha = 0.4) +
  labs(y = "atk (lvl. 80)", x = "path", title = "light cone atk distribution vs. path") +
  scale_fill_brewer(palette = "Set3")

# standalone box plots of atk, hp, def, max energy, and spd
par(mfrow = c(2, 3))
boxplot(characters$atk_80, xlab = "atk", col = "#96CDCD")
boxplot(characters$hp_80, xlab = "hp", col = "#96CDCD")
boxplot(characters$def_80, xlab = "def", col = "#96CDCD")
boxplot(characters$max_energy, xlab = "energy", col = "#96CDCD")
boxplot(characters$spd, xlab = "spd", col = "#96CDCD")
Data Cleaning
# Drop the columns we do not need (selected by position)
rad_data <- select(characters,-c(19:51))
rad_data <- select(rad_data,-c(5:10))
rad_data <- select(rad_data,-c(2,6,7))
# Summarize data by path
sum_data <- rad_data |>
  group_by(path) |>
  summarize(atk = mean(atk_80), def = mean(def_80), hp = mean(hp_80),
            spd = mean(spd), energy = mean(max_energy), taunt = mean(taunt))
# Make an overall average profile
avg_data <- rad_data |>
  summarize(atk = mean(atk_80), def = mean(def_80), hp = mean(hp_80),
            spd = mean(spd), energy = mean(max_energy), taunt = mean(taunt)) |>
  mutate(path = "average")
# Define the variable ranges: maximum and minimum
max_min <- data.frame(path = c("max", "min"),
atk = c(756.76, 465.7), hp = c(1475, 847),
def = c(654.89, 330.75), energy = c(140, 90),
spd = c(115, 90), taunt = c(150,75))
# Bind the variable ranges to the data
vis_data <- rbind(max_min, sum_data)
rownames(vis_data) <- vis_data$path
vis_data <- select(vis_data, -c(1))
avg_data <- rbind(max_min, avg_data)
rownames(avg_data) <- avg_data$path
avg_data <- select(avg_data, -c(1))
Radar Chart Function
In this section I slightly adapted code provided here:
https://www.datanovia.com/en/blog/beautiful-radar-chart-in-r-using-fmsb-and-ggplot-packages/
in order to plot multiple radar charts together easily.
# Code provided by Data Novia
pretty_rc <- function(data, atype = 1, color = "#00AFBB",
                      vlabels = colnames(data), vlcex = 1,
                      caxislabels = NULL, title = NULL, ...){
  radarchart(
    data, axistype = atype,
    # Customize the polygon
    pcol = color, pfcol = scales::alpha(color, 0.4), plwd = 2,
    # Customize the grid
    cglcol = "grey", cglty = 1, cglwd = 0.8,
    # Customize the line type
    plty = 1,
    # Customize the axis and axis label text magnification
    axislabcol = "black", calcex = 0.75,
    # Variable labels; vlcex sets their text magnification
    vlcex = vlcex, vlabels = vlabels,
    caxislabels = caxislabels, title = title, ...
  )
}
Plotting
# preservation_data <- vis_data[c("max", "min", "preservation"), ]
# pretty_rc(preservation_data)
op <- par(mar = c(1, 2, 2, 1))
pretty_rc(data = vis_data[-c(4,5,7,8),],
          color = c("#e8c48e", "#B7E8EB", "#E1CFE8"))
legend(x = "right", legend = rownames(vis_data[-c(1,2,4,5,7,8),]),
       horiz = FALSE, bty = "n", pch = 20,
       col = c("#e8c48e", "#B7E8EB", "#E1CFE8"),
       text.col = "black", cex = 0.9, pt.cex = 2)
title(main = "Character Stat Distribution by Path", font.main = 2,
      col.main = "#215772")
title(sub = "Data from Ridho Pandu's HSR Character Dataset",
      cex.sub = 0.8, col.sub = "#7d7d7d", line = 0.1, font.sub = 3)
par(op)
# plot the average for reference:
pretty_rc(data = avg_data,
          color = "#4682B4")
title(main = "Average Character Stat Distribution", font.main = 2,
      col.main = "#215772")
title(sub = "Data from Ridho Pandu's HSR Character Dataset",
      cex.sub = 0.8, col.sub = "#7d7d7d", line = 0.1, font.sub = 3)
par(op)
Discussion
How I Cleaned the Dataset
To begin cleaning the dataset, I first set aside the light cones dataset, as it is irrelevant to the character visualizations. While it could be tied in to paint a more accurate picture of the characters’ stats in battle, that would require a much more sophisticated summary in the data cleaning, and it would no longer reflect the idea of considering only the characters’ base stats. I removed all columns other than the 6 stats we are concerned with, and the character with their path and combat type. Then, using piping, I grouped the data by path and summarized the 6 statistics by taking their means across all characters of the same path. I did the same without grouping by path to create an average of all the characters’ stats. The radar chart function provided in fmsb requires some very specific formatting of the input datasets as outlined in this tutorial which I followed heavily:
So I then created maximum and minimum stat vectors manually, as there were not too many to deal with and I felt it would take more time to do it with dplyr commands. I used rbind to merge the maximum and minimum rows into the data summarized by path and into the overall mean, and then I removed the path column, instead identifying the paths by the row names. This cleaning was pretty intensive to carry out.
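For comparison, the maximum and minimum rows could also be derived with dplyr rather than typed by hand. The sketch below is only an illustration under stated assumptions: `toy_raw`, `toy_sum`, `limits`, and `toy_vis` are hypothetical names with made-up values, and it assumes the limits are meant to be the column-wise extremes of the raw character stats. It ends with the layout that fmsb's radarchart expects, with the maximum in row 1 and the minimum in row 2.

```r
library(dplyr)

# Hypothetical summary table shaped like `sum_data` (values are made up)
toy_sum <- tibble(
  path = c("abundance", "harmony", "preservation"),
  atk  = c(500, 550, 600),
  hp   = c(1100, 1000, 1300)
)

# Hypothetical raw stats from which the axis limits are taken (made up)
toy_raw <- tibble(
  atk = c(450, 520, 610, 700),
  hp  = c(900, 1000, 1250, 1400)
)

# One row of column-wise maxima and one of minima, labelled like `max_min`
limits <- bind_rows(
  toy_raw |> summarize(across(everything(), max)) |> mutate(path = "max"),
  toy_raw |> summarize(across(everything(), min)) |> mutate(path = "min")
)

# Row 1 = max, row 2 = min, then the data rows; paths become row names
toy_vis <- bind_rows(limits, toy_sum) |>
  as.data.frame()
rownames(toy_vis) <- toy_vis$path
toy_vis$path <- NULL
```

`bind_rows` matches columns by name, so the column order of the two tables does not need to agree.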
The Plot
I then slightly adapted the function code provided by Data Novia, which aids in customizing the aesthetics of the radar chart. While plotting, I had to do a lot of research into the graphical features of base R plots. It was a lot harder to customize things like a caption, title, etc., and the margins were very hard to deal with. At first I couldn’t see the subtitle at all. Since I suspected a margin problem, I increased the text magnification of the subtitle, and just like that it revealed that the subtitle was indeed stuck below the margins, so I adjusted it to appear directly under the graph. I also needed to figure out how to move the legend from the bottom to the side, because as provided in the original code it was obstructing the labels. The following helped me greatly in customizing the plot:
- https://r-graph-gallery.com/spider-or-radar-chart.html
- http://www.sthda.com/english/wiki/ggplot2-legend-easy-steps-to-change-the-position-and-the-appearance-of-a-graph-legend-in-r-software
- https://rpkgs.datanovia.com/ggpubr/reference/font.html
- https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/title.html
- https://r-charts.com/base-r/title/
- http://www.di.fc.ul.pt/~jpn/r/GraphicalTools/colorPalette.html
- https://r-charts.com/colors/
In plotting, I did not include all of the paths. Instead, I chose to depict only abundance, preservation, and harmony. I did this intentionally, as radar charts can very easily get cluttered, so I decided to depict only the stats of the paths which the community considers “support” roles.
This visualization represents the point estimates (means) of atk, def, hp, spd, max energy, and taunt for each path in patch 1.3. It compares them using a hexagon as opposed to a bar chart or a lollipop plot. The graph emphasizes that Preservation units have a large overall base stat lead over the other two support paths, with Abundance units having the lowest raw stats. Harmony units excel in speed and require a lot of energy to unleash their ultimates. These relationships, of course, come with no p-values or indicators of significance. We can’t be sure without further analysis that these differences are in any way meaningful. Nonetheless, it is interesting to see the suggestion that abundance units receive less from base stats and may often carry more utility embedded in their kits to make up for it.
The average profile is nearly a regular hexagon with vertices at 50%. The main reason that it is not closer to regular is the presence of outliers in the maxes and mins of each stat. To alleviate this, one could instead set the max to something like the 99th percentile of a normal distribution fitted to each quantitative variable, and the min to something like the 1st percentile. This would reduce the impact of outliers on the shape and biases of the profiles in general.
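A minimal sketch of that idea, using a made-up stat vector (`stat` and its values are hypothetical, not taken from the dataset). It computes limits from a normal approximation with qnorm, shows the empirical percentiles as an alternative that does not assume normality, and clamps values to the limits, since a value beyond them would otherwise fall outside the chart's range.

```r
# Hypothetical stat values with one outlier (made up, not from the dataset)
stat <- c(95, 98, 100, 101, 103, 104, 106, 140)

# Axis limits from a normal approximation: mean and sd feed qnorm()
lims_normal <- qnorm(c(0.01, 0.99), mean = mean(stat), sd = sd(stat))

# Alternative without the normality assumption: empirical percentiles
lims_empirical <- quantile(stat, probs = c(0.01, 0.99), names = FALSE)

# Clamp values to the chosen limits before plotting
stat_clamped <- pmin(pmax(stat, lims_normal[1]), lims_normal[2])
```

Note that a strong outlier also inflates the standard deviation, so the normal-based limits still move with it, just less than the raw max and min do.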
An Argument for Radar Charts
The following article highlights the flaws in radar charts:
- https://www.data-to-viz.com/caveat/spider.html
While I agree with many of the criticisms, I argue that some of these “flaws” function more as features with respect to video games such as Pokemon, Honkai Star Rail, Onmyoji, and even classic table-top RPGs such as Dungeons and Dragons.
1st issue: Ranking/Order and Shape
I certainly agree that the shape of the radar chart changes drastically based on the ordering of the axes, and this can definitely change the reader’s interpretation of the data. I believe that there are two solutions to this. The first choice is to standardize the axes. Just as in plotting in multivariable calculus, where we all agree to place the z-axis vertically, the y-axis horizontally, and the x-axis coming out of the page, games such as Pokemon agree on where to place the axes. In Pokemon games, the order is almost always as follows, clockwise from the top: HP, atk, def, spd, sp. def, sp. atk. This way there is no issue comparing the relative strengths and deficiencies of various Pokemon, because we are familiar with what each axis represents! I employ a similar convention in this project.
The second choice, in cases where there is no established convention, or where the data appears extremely different under different orders, is to include ranking as an innate feature of the graph. This can be done as follows: always place the individual’s highest relative score on the top axis, then place the remaining relative scores in decreasing order clockwise. This way, the shape is standardized to a sort of seashell-like shape (wow you could rename this plot the seashell plot, can I get published too like those people who wrote the paper about the violin plot which they invented and now ask everyone to cite them for using??). This provides the unique advantage of comparing, for example, students’ strengths and weaknesses. While this could be done with side-by-side lollipop plots or bar charts, I feel that there is unique value in the polar-coordinate sensation that this graph gives. It allows the viewer to see the variables compactly, and it can also be rotated or shuffled to reveal relative strengths and weaknesses. Shuffling can also be done with bar charts, but there is no innate sense of rotation or symmetry to bar charts.
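The ranked ordering described above can be sketched in a few lines. The scores are made up, and `seashell` and `cw_order` are hypothetical names. The last step rests on an assumption worth checking against the fmsb documentation: that radarchart lays its axes out counter-clockwise starting from the top, so that a clockwise decreasing order means reversing every column after the first.

```r
# Hypothetical relative scores (percent of max) for one unit; values are made up
scores <- c(atk = 40, def = 85, hp = 70, spd = 55, energy = 95, taunt = 60)

# Sort in decreasing order so the strongest axis comes first (top of the chart)
seashell <- sort(scores, decreasing = TRUE)

# ASSUMPTION: axes are drawn counter-clockwise from the top, so to read
# clockwise in decreasing order, keep the first axis and reverse the rest
cw_order <- c(names(seashell)[1], rev(names(seashell)[-1]))
```

A data frame prepared for plotting could then be reordered with something like `vis_data[, cw_order]`.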
2nd issue: Scales
Solution: normalize all variables to percentiles of the distributions or percentages of an absolute max or min. Humans are great at interpreting percents.
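Both normalizations can be sketched in base R. All numbers and names here (`raw`, `lo`, `hi`, `spd_all`) are hypothetical: the first part rescales each stat to a percentage of the range between a chosen min and max, and the second finds the percentile of one value within a made-up distribution using ecdf.

```r
# Hypothetical raw stats for one unit plus chosen axis limits (made up)
raw <- c(atk = 600, hp = 1200, spd = 100)
lo  <- c(atk = 450, hp = 850,  spd = 90)
hi  <- c(atk = 750, hp = 1450, spd = 115)

# Min-max scaling to a 0-100 percent scale
pct <- 100 * (raw - lo) / (hi - lo)

# Percentile of a value within a (made-up) distribution via ecdf()
spd_all    <- c(90, 95, 100, 105, 115)
spd_pctile <- 100 * ecdf(spd_all)(100)
```

With every axis on the same 0 to 100 scale, the max and min rows passed to the chart can simply be 100 and 0 for all stats.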
3rd issue: Over-evaluation of differences
The article highlights that the area of the shape scales quadratically as the statistics increase. Yes, this can be misrepresentative in visualizations for something like test scores, but RPGs often have a justification for this quadratic scaling: parameters interact! A character often has requirements for ALL of these stats. For example, increasing a character’s HP often makes them more reliable in dealing damage, as they won’t be downed as easily. Characters may also use certain stats as multipliers of others. For example, a character may receive a damage bonus proportional to their speed, making the quadratic relationship between stats and area appropriate. Pokemon players can attest that the difference between finding a Pokemon with four perfect IVs (individual values) and finding a Pokemon with six perfect IVs is vast, and the consequences far-reaching.
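The scaling can be checked numerically. For a radar polygon with n equally spaced axes and radii r_1, ..., r_n, the area is the sum of the triangles between neighbouring axes, (1/2) sin(2π/n) Σ r_i r_(i+1). The sketch below (`radar_area` is a hypothetical helper, and the stat values are made up) compares doubling every stat with doubling a single one.

```r
# Area of a radar polygon with equally spaced axes and radii r:
# the sum of the triangles spanned by neighbouring axes
radar_area <- function(r) {
  n <- length(r)
  r_next <- c(r[-1], r[1])            # cyclic neighbour of each axis
  0.5 * sin(2 * pi / n) * sum(r * r_next)
}

base    <- rep(50, 6)                 # hypothetical unit at 50% on all six stats
doubled <- 2 * base                   # every stat doubled
one_up  <- replace(base, 1, 100)      # only the first stat doubled

ratio_all <- radar_area(doubled) / radar_area(base)
ratio_one <- radar_area(one_up)  / radar_area(base)
```

Doubling all six stats quadruples the area, while doubling a single stat changes the area in proportion to its two neighbours. The area is built from products of adjacent stats, which is one concrete sense in which the chart treats parameters as interacting.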
My gripe with lollipop plots:
I feel that lollipop plots are simply not very concise in presenting this information for RPGs. They are also dull, and don’t catch my eye, making it harder to interpret the relative strengths and weaknesses of units.
What I wish I could have included
I don’t like how the axis labels are buried under the layers of plots as it is right now. I tried to fix this but ultimately failed. I wish there could have been more interactivity, perhaps even a tool to compare each individual character’s stats to the average. I feel that would be more true to the visualizations as they are used in Pokemon, and it would also be more valued by the community, as it would be less skewed by the number of characters released in each path than the current plots are.
Just for Fun
What do the other paths look like? Can we compare them all side by side?
# Define colors and titles
colors <- c("#9E0142", "#F46D43", "#FECD1A", "#C9E12B", "#66C2A5", "#3288BD",
"#5E4FA2")
titles <- c("abundance", "destruction", "erudition", "harmony", "hunt",
"nihility", "preservation")
# Reduce plot margins and split the screen into a 2 x 4 grid;
# keeping both settings in `op` lets par(op) restore them afterwards
op <- par(mar = c(1, 1, 1, 1), mfrow = c(2, 4))
# Create the radar chart
for(i in 1:7){
  pretty_rc(
    data = vis_data[c(1, 2, i+2), ],
    atype = 0, # make the axis labels go away
    color = colors[i], title = titles[i]
  )
}
pretty_rc(data = avg_data,
          color = "#7D7D7D", title = "average")
par(op)
THIS IS SO COOL!!! why is coding fun sometimes…
a neat discussion:
- https://stackoverflow.com/questions/1787124/programmatically-darken-a-hex-colour
- https://hexcolorcodes.org/darken-color
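One simple way to darken a hex colour in base R is to scale its RGB channels toward black. This is a minimal sketch of that approach and not necessarily the method discussed in the links above; `darken_hex` is a hypothetical helper name, not a function from any package.

```r
# Darken a hex colour by scaling its RGB channels toward black.
# `darken_hex` is a hypothetical helper, not part of any package.
darken_hex <- function(hex, factor = 0.8) {
  stopifnot(factor >= 0, factor <= 1)
  rgb_vals <- grDevices::col2rgb(hex)          # 3 x n matrix, values 0-255
  scaled   <- round(rgb_vals * factor)         # factor = 1 keeps the colour
  grDevices::rgb(scaled[1, ], scaled[2, ], scaled[3, ], maxColorValue = 255)
}

# Darker variants of the three support-path colours used in the plot
border_cols <- darken_hex(c("#e8c48e", "#B7E8EB", "#E1CFE8"), factor = 0.7)
```

The darkened colours could, for instance, serve as polygon borders while the lighter originals stay as the fill.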
I find it so interesting how the shapes of the hunt and the erudition are so similar. This makes complete sense because they are both focused on dealing damage, to a single target or to multiple targets respectively. Their differences are also accounted for in this shape: the hunt has more spd to move faster and get more turns, and the erudition has more atk so as to facilitate more consistent damage.
I feel like I have actually learned so much from this project, about data visualization, about coding in R, even about the character archetypes in Honkai Star Rail itself.
Data Sources and References
Wherever external images are used or articles are referenced or used for aid, there are links. The following are the most essential of the references: