For this assignment I wanted to look at the hit points of enemies in Silksong. I find this interesting because Silksong is a hard game, so I want to compare its enemy hit points to those of Hollow Knight. More specifically, I want to look at the enemies featured in a section called the High Halls Gauntlet, as well as those in the hardest arena in Hollow Knight, the Trial of the Fool.
I scraped this data from the Hollow Knight wiki, and all information found here is from that wiki.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
Rows: 344 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): MAIN GAME...1, MAIN GAME...2, MAIN GAME...3, ...4, ...5, ...6, ...7...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
As you can see, this data is not the most tidy. Everything is stored as a character, and the column names are stored as data instead of in their proper place. The hit points column will sometimes have a “/” in it to show the enemy's health in Act 3, instead of that being stored in its own column. There are also rows that are just titles for the rows below them. So I will do the work to get the column names to be column names, split BT (HP of Voided enemies) into its own column, and lastly change the data types to their appropriate types. I will also make variables for both games' hardest arenas, separate out the two games, and get rid of columns that are not needed.
Getting Rid of Unneeded Rows
# Using base R because the syntax is easier than the equivalent filter() call
Bugs <- Bugs[-c(204, 323, 327, 331, 334), ]
Fixing Column Names.
## Use the janitor library to promote the second row to column names
Bugs <- Bugs %>%
  row_to_names(row_number = 2)
## Clean the names for my own sanity
Bugs <- clean_names(Bugs)
Separating the Games
## Silksong enemies have a Hunter's Journal ID (hj_id) and Hollow Knight enemies do not,
## so use that to make a game variable
Bugs <- Bugs %>%
  mutate(from_SilkSong = !is.na(hj_id))
## We don't need the Hunter's Journal ID anymore, so it's going to the bye-bye zone
Bugs <- Bugs %>%
  select(!hj_id)
Separating HP and BT
## First, get rid of the weird numbers in the Hollow Knight enemies
Bugs <- Bugs %>%
  mutate(hp_bt = if_else(from_SilkSong, hp_bt, str_extract(hp_bt, "^[0-9]+")))
## Now on to what we set out to do
Bugs <- Bugs %>%
  separate(hp_bt, into = c("HP", "BT"), sep = "/", fill = "right", convert = TRUE)
This was easier than I expected.
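To make the split concrete, here is a small sketch of the same two steps on a toy column. The three `hp_bt` strings are made up for illustration (including the trailing annotation on the Hollow Knight row); they are not values from the wiki.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Invented example rows, not real wiki data
toy <- tibble(
  from_SilkSong = c(TRUE, TRUE, FALSE),
  hp_bt         = c("45/60", "30", "120 [a]")
)

toy_split <- toy %>%
  # For Hollow Knight rows, keep only the leading run of digits
  mutate(hp_bt = if_else(from_SilkSong, hp_bt, str_extract(hp_bt, "^[0-9]+"))) %>%
  # fill = "right" puts NA in BT when there is no "/" instead of warning
  separate(hp_bt, into = c("HP", "BT"), sep = "/", fill = "right", convert = TRUE)

toy_split
```

Rows without a "/" simply get a missing BT, which is the behavior wanted for enemies that have no Voided form listed.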
Misc Things in HP
It turns out one enemy, known as the Garpid, has 999999 (1) as its HP. When I looked into it, I found out that each individual Garpid has 1 HP, but killing one does not kill the swarm. So I am going to change that value to 1.
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `HP = as.numeric(HP)`.
Caused by warning:
! NAs introduced by coercion
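The chunk that made this change is not shown here, so the following is only a minimal sketch of one way it could be done, on a toy table. Only the string "999999 (1)" comes from the text above; the other rows (including the "?" entry, which stands in for any non-numeric leftover) are invented.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the HP column; names and most values are hypothetical
toy <- tibble(
  enemy = c("Garpid", "Example Bug A", "Example Bug B"),
  HP    = c("999999 (1)", "45", "?")
)

toy_fixed <- toy %>%
  mutate(
    # Recode the swarm placeholder to the per-Garpid value of 1
    HP = if_else(HP == "999999 (1)", "1", HP),
    # Entries that are still not numbers become NA; as.numeric() would
    # normally warn about this, so the warning is silenced in this sketch
    HP = suppressWarnings(as.numeric(HP))
  )

toy_fixed
```

Recoding before the conversion matters: if `as.numeric()` ran first, the Garpid row would become `NA` along with every other non-numeric entry.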
Make the Arena Variables
We are making a TRUE/FALSE variable for whether an enemy is in the High Halls Gauntlet. We are also making one for the Trial of the Fool, the third tier of the Colosseum of Fools and the hardest arena in Hollow Knight.
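The code for this step is not shown, so here is a hedged sketch of how such flags could be built. The column name `name` and every enemy name below are placeholders I made up; the real lists would have to be typed in from the wiki pages for each arena. Only `HighHallsG`, `Trial_of_fool`, and `from_SilkSong` are names used elsewhere in this report.

```r
library(dplyr)
library(tibble)

# Hypothetical arena rosters (placeholders, not real enemy names)
high_halls_enemies    <- c("Example Enemy A", "Example Enemy B")
trial_of_fool_enemies <- c("Example Enemy C")

# Toy table; `name` is an assumed column name
toy <- tibble(
  name          = c("Example Enemy A", "Example Enemy B",
                    "Example Enemy C", "Example Enemy D"),
  from_SilkSong = c(TRUE, TRUE, FALSE, FALSE)
)

toy <- toy %>%
  mutate(
    # Also check the game flag, in case a name appears in both games
    HighHallsG    = from_SilkSong & name %in% high_halls_enemies,
    Trial_of_fool = !from_SilkSong & name %in% trial_of_fool_enemies
  )

toy
```

Using `%in%` keeps both columns strictly TRUE/FALSE with no missing values, which is what the `fill` aesthetic in the plots expects.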
Bugs %>%
  ggplot(aes(x = HP)) +
  geom_histogram() +
  ylim(0, 50) +
  xlim(0, 200) +
  labs(title = "Distribution of Enemy Hit Points in Hollow Knight and Silksong")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 12 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
Bugs %>%
  filter(from_SilkSong == TRUE) %>%
  ggplot(aes(x = HP)) +
  geom_histogram(fill = "darkred") +
  xlim(0, 200) +
  ylim(0, 50) +
  labs(title = "Distribution of Enemy Hit Points in Silksong")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 5 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
Bugs %>%
  filter(from_SilkSong == FALSE) %>%
  ggplot(aes(x = HP)) +
  geom_histogram(fill = "darkblue") +
  xlim(0, 200) +
  ylim(0, 50) +
  labs(title = "Distribution of Enemy Hit Points in Hollow Knight")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 7 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
Bugs %>%
  ggplot(aes(x = HP, fill = from_SilkSong)) +
  geom_histogram(alpha = 0.6, position = "identity", binwidth = 10) +
  scale_fill_manual(
    values = c("TRUE" = "darkred", "FALSE" = "darkblue"),
    labels = c("Hollow Knight", "Silksong")
  ) +
  labs(
    title = "Enemy Health Distributions Hollow Knight Vs Silksong",
    x = "Hit Points (HP)",
    y = "Count",
    fill = "Game"
  ) +
  xlim(0, 200) +
  ylim(0, 50) +
  theme_minimal()
Warning: Removed 12 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_bar()`).
From these graphs, Hollow Knight's distribution looks fairly similar to Silksong's, but it has more enemies in the high-health range while still being skewed overall toward low-health enemies.
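Reading this off overlapping histograms is a bit of a judgment call, so a per-game summary table could back it up. This is a sketch on invented numbers, not results from the real data; the summary column names and the 100 HP cutoff for "high health" are my own assumptions.

```r
library(dplyr)
library(tibble)

# Invented HP values, for illustration only
toy <- tibble(
  from_SilkSong = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  HP            = c(10, 40, NA, 5, 20, 300)
)

hp_summary <- toy %>%
  group_by(from_SilkSong) %>%
  summarise(
    n_enemies      = sum(!is.na(HP)),            # enemies with a known HP
    median_hp      = median(HP, na.rm = TRUE),   # typical enemy
    mean_hp        = mean(HP, na.rm = TRUE),     # pulled up by tanky outliers
    share_over_100 = mean(HP > 100, na.rm = TRUE) # fraction in the high range
  )

hp_summary
```

A mean well above the median within one game would be consistent with a distribution that is mostly low-health enemies plus a long tail of tanky ones.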
Comparing Gauntlets
Bugs %>%
  filter(from_SilkSong == TRUE) %>%
  ggplot(aes(x = HP, fill = HighHallsG)) +
  geom_histogram(alpha = 0.6, position = "identity", binwidth = 5) +
  scale_fill_manual(
    values = c("TRUE" = "gold", "FALSE" = "darkred"),
    labels = c("Not in High Halls Gauntlet", "High Halls Gauntlet")
  ) +
  labs(
    title = "Silksong Enemy Health Distributions High Halls Gauntlet Vs Other Enemies",
    x = "Hit Points (HP)",
    y = "Count",
    fill = "Enemy Group"
  ) +
  xlim(0, 200) +
  ylim(0, 25) +
  theme_minimal()
Warning: Removed 5 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_bar()`).
We can see that the High Halls enemies are mostly around the spike near 50 HP, meaning that they are tankier than most but not the game's tankiest enemies.
Bugs %>%
  filter(from_SilkSong == FALSE) %>%
  ggplot(aes(x = HP, fill = Trial_of_fool)) +
  geom_histogram(alpha = 0.6, position = "identity", binwidth = 5) +
  scale_fill_manual(
    values = c("TRUE" = "#C1D366", "FALSE" = "darkblue"),
    labels = c("Not in Trial of the Fool", "Trial of the Fool")
  ) +
  labs(
    title = "Hollow Knight Enemy Health Distributions Trial of the Fool Vs Other Enemies",
    x = "Hit Points (HP)",
    y = "Count",
    fill = "Enemy Group"
  ) +
  xlim(0, 200) +
  ylim(0, 25) +
  theme_minimal()
Warning: Removed 7 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_bar()`).
We can see here that the Trial of the Fool is more spread out, with enemies around the 50 HP range as well as enemies beyond it.
Bugs %>%
  filter(Trial_of_fool == TRUE | HighHallsG == TRUE) %>%
  ggplot(aes(x = HP, fill = Trial_of_fool)) +
  geom_histogram(alpha = 0.6, position = "identity", binwidth = 5) +
  scale_fill_manual(
    values = c("TRUE" = "forestgreen", "FALSE" = "gold"),
    labels = c("High Halls Gauntlet", "Trial of the Fool")
  ) +
  labs(
    title = "Enemy Health Distributions High Halls Gauntlet Vs Trial of the Fool",
    x = "Hit Points (HP)",
    y = "Count",
    fill = "Arena"
  ) +
  xlim(0, 100) +
  ylim(0, 10) +
  theme_minimal()
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_bar()`).
It seems that there is more spread in the Trial of the Fool, as well as groups of stronger enemies, when compared to the High Halls Gauntlet, suggesting that the Trial of the Fool may be more difficult.
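"More spread" can also be put into numbers. The sketch below compares the two arenas by median, interquartile range, and maximum HP. The HP values are invented purely to show the mechanics, and the `arena` column is a helper name I introduced; this is not a result from the scraped data.

```r
library(dplyr)
library(tibble)

# Invented values, three enemies per arena
toy <- tibble(
  HighHallsG    = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Trial_of_fool = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  HP            = c(40, 50, 60, 20, 50, 110)
)

arena_spread <- toy %>%
  filter(Trial_of_fool | HighHallsG) %>%
  # Collapse the two flags into one label (hypothetical helper column)
  mutate(arena = if_else(Trial_of_fool, "Trial of the Fool", "High Halls Gauntlet")) %>%
  group_by(arena) %>%
  summarise(
    median_hp = median(HP, na.rm = TRUE),
    iqr_hp    = IQR(HP, na.rm = TRUE),  # spread of the middle half
    max_hp    = max(HP, na.rm = TRUE)   # tankiest enemy in the arena
  )

arena_spread
```

HP is only one ingredient of difficulty, so a table like this supports the "may be more difficult" wording rather than settling the question.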