Dataset Overview and Source

Genshin Impact Character Popularity Dataset

This analysis examines 96 characters from Genshin Impact to identify what made some characters more popular than others in the 2024 poll.

Data Source: Genshin Impact Character Popularity Poll (Data Booklet from Kaggle)

Key Variables:

  • Name: The character’s name
  • Votes: Total popularity votes received
  • Element: Combat element (Anemo, Hydro, Cryo, Pyro, Dendro, Electro, Geo)
  • Region: Nation of origin (Mondstadt, Liyue, Inazuma, Fontaine, etc.)
  • Update.Playable: Game version in which the character was released
  • Weapon: Type of weapon used (Claymore, Sword, Polearm, Catalyst, Bow)

R Code for Data Preparation

This is how I load and prepare the data:

# this loads the necessary libraries
# (for data manipulation and visualization)
library(ggplot2)
library(plotly)
library(dplyr)

# this loads the data from the CSV file into a data frame
df <- read.csv("Data Booklet.csv")

# this converts the votes column to numeric values (stripping
# thousands separators first), then removes rows with missing votes
df$Votes <- as.numeric(gsub(",", "", df$Votes))
df <- df %>% filter(!is.na(Votes))

# this converts Element and Region to factors so they
# can be used as grouping variables in plots and models
df$Element <- factor(df$Element)
df$Region <- factor(df$Region)
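The 3D and scatter plots position characters by release version, which is easier to plot and compare as a number. A minimal sketch, assuming `Update.Playable` holds values such as "4.0" or "Version 4.0" (the exact format in the Kaggle file is an assumption); `parse_version` is a hypothetical helper, not part of the original analysis:

```r
# Hypothetical helper: extract the first version number ("4", "4.0", "3.6")
# from each string and return it as numeric; strings without one become NA.
parse_version <- function(x) {
  pattern <- "[0-9]+(\\.[0-9]+)?"
  matched <- regmatches(x, regexpr(pattern, x))  # only the strings that match
  out <- rep(NA_real_, length(x))
  out[grepl(pattern, x)] <- as.numeric(matched)  # fill matches back in order
  out
}

# Illustrative input only; not taken from the dataset
parse_version(c("1.0", "Version 4.0", "3.6", "unknown"))
```

This returns 1.0, 4.0, 3.6 and NA. Applied to the data it might look like `df$Version <- parse_version(as.character(df$Update.Playable))`, where `Version` is a hypothetical column name. Note that numeric conversion treats "4.1" and "4.10" as equal, so it is only safe while minor version numbers stay below 10.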

3D Plotly: Rank, Release, & Votes

3D Plot Analysis

Key Observations:

  • Elemental Spread: Characters with the Anemo and Hydro elements tend to receive more votes.

  • Release Version Trend: Characters released in version 4.0 or later tend to receive more votes than characters released in earlier patches.

  • Release Pattern: Newly released characters draw more engagement and hold more popularity.

  • Combination of Factors: A character's recent release, high rank, and element often align with greater popularity.
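The 3D plot code is not shown, and these observations are read off the chart. A minimal sketch of how the release-version observation could be checked numerically, using illustrative toy data rather than the poll results. It assumes a numeric release-version column (hypothetical name `Version`) and derives a rank from votes (1 = most votes), which may not match how the plot's Rank axis was defined. The Wilcoxon rank-sum test is a swapped-in choice, used because vote counts are heavily skewed.

```r
# Illustrative toy data (hypothetical values, not the poll results)
toy <- data.frame(
  Name    = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Votes   = c(5200, 8100, 3900, 12000, 30500, 18200, 45100, 22800),
  Version = c(1.0, 1.5, 2.3, 3.4, 4.0, 4.2, 4.5, 4.8)
)

# Rank 1 = most votes; tied characters share the best rank
toy$Rank <- rank(-toy$Votes, ties.method = "min")

# Split characters at the 4.0 patch and compare median votes per group
toy$Era <- ifelse(toy$Version >= 4.0, "4.0+", "pre-4.0")
aggregate(Votes ~ Era, data = toy, FUN = median)

# Rank-based two-sample test (no normality assumption on vote counts)
res <- wilcox.test(Votes ~ Era, data = toy)
res
```

On the real data the same steps would run on `df` once a numeric version column exists; with only a handful of characters per group, the p-value should be read cautiously.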

Plotly Scatter: Release date vs. Popularity Votes

ggplot Boxplot: Votes by Element

ggplot Bar Chart: Character Count by Gender and Region
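This bar chart counts characters by gender and region. The key-variables list does not include a gender column, so the sketch below assumes one named `Gender`; the toy rows are illustrative only. It shows the cross-tabulation that such a chart is built from:

```r
# Illustrative toy data; the Gender column name and values are assumptions
toy <- data.frame(
  Gender = c("Female", "Male", "Female", "Female", "Male", "Female"),
  Region = c("Fontaine", "Fontaine", "Liyue", "Natlan", "Liyue", "Fontaine")
)

# Cross-tabulate: rows are genders, columns are regions
counts <- table(Gender = toy$Gender, Region = toy$Region)
counts

# Long format (one row per Gender/Region pair), the shape a bar chart uses
long <- as.data.frame(counts, responseName = "Count")
long
```

With ggplot2 (already loaded in the preparation step), the long table could then be drawn with `ggplot(long, aes(Region, Count, fill = Gender)) + geom_col()`.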

Statistical Analysis: Summary Statistics

# calculates the count, mean, median, max, and standard deviation 
# of votes for each region 
# (arranges the results in descending order of mean votes)
df %>%
  group_by(Region) %>%
  summarise(
    Count = n(),
    Mean_Votes = round(mean(Votes), 0),
    Median_Votes = median(Votes),
    Max_Votes = max(Votes),
    SD_Votes = round(sd(Votes), 0)
  ) %>%
  arrange(desc(Mean_Votes))
## # A tibble: 8 × 6
##   Region      Count Mean_Votes Median_Votes Max_Votes SD_Votes
##   <fct>       <int>      <dbl>        <dbl>     <dbl>    <dbl>
## 1 Fontaine       13      28795       16615      72221    25288
## 2 Natlan          6      28686       27916.     45358    11420
## 3 Chenyu Vale     1      16748       16748      16748       NA
## 4 Inazuma        17      16466       11876      47237    14772
## 5 Liyue          20      13718        8366.     54274    13169
## 6 Sumeru         13      13546        9417      41548    11756
## 7 <NA>            4      12647       14790.     20037     8408
## 8 Mondstat       19       7270        5408      18051     4823

Summary Statistics: Interpretation

Detailed Findings:

  • Vote Skewness: In most regions the mean is higher than the median, which indicates a right-skewed distribution: a few exceptionally popular characters pull the average up. The exception is the four characters with no recorded region (<NA>), whose mean is below their median; Chenyu Vale's single character makes its mean and median identical.

  • Regional Popularity: Fontaine characters received the highest average votes (28,795), followed closely by Natlan (28,686). Natlan also has the highest median (about 27,916), so its high average is not driven by a single outlier. However, the newest regions have few characters in this poll (6 for Natlan, 1 for Chenyu Vale), so their averages rest on small samples. Mondstadt has the lowest average (7,270).

  • Max Votes: Fontaine (72,221) and Liyue (54,274) have the highest maximum votes, which indicates that the single most popular characters (fan favorites) come from these regions.

  • Vote Variability: Fontaine has by far the largest standard deviation (25,288), followed by Inazuma (14,772) and Liyue (13,169), so popularity varies widely within these nations, with some characters receiving many more votes than others from the same nation. Chenyu Vale's SD is NA because it has only one character.
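Raw means and standard deviations are hard to compare across regions with very different vote levels. Two derived ratios, added here and not part of the original analysis, make the comparison more direct: mean/median (above 1 suggests right skew) and the coefficient of variation (SD divided by mean). The sketch copies the numbers from the summary table above, with medians rounded to whole votes and the <NA> region relabeled "(missing)":

```r
# Values copied from the summary table (medians rounded to whole votes)
regions <- data.frame(
  Region = c("Fontaine", "Natlan", "Chenyu Vale", "Inazuma",
             "Liyue", "Sumeru", "(missing)", "Mondstat"),
  Count  = c(13, 6, 1, 17, 20, 13, 4, 19),
  Mean   = c(28795, 28686, 16748, 16466, 13718, 13546, 12647, 7270),
  Median = c(16615, 27916, 16748, 11876, 8366, 9417, 14790, 5408),
  SD     = c(25288, 11420, NA, 14772, 13169, 11756, 8408, 4823)
)

# Mean/median above 1: a few very popular characters pull the mean up
regions$Mean_to_Median <- round(regions$Mean / regions$Median, 2)

# Coefficient of variation: spread relative to the region's own average
regions$CV <- round(regions$SD / regions$Mean, 2)

# Sort by relative spread (NA for the single-character region goes last)
regions[order(-regions$CV), c("Region", "Count", "Mean_to_Median", "CV")]
```

On these numbers Liyue has the highest CV (about 0.96), followed by Inazuma (0.90) and Fontaine (0.88), while Natlan's mean and median nearly coincide (ratio about 1.03).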

Statistical Analysis: ANOVA

# ANOVA: Does the elemental power of 
# a character significantly impact their popularity?
anova_result <- aov(Votes ~ Element, data = df)
summary(anova_result)
##             Df    Sum Sq   Mean Sq F value Pr(>F)
## Element      8 1.442e+09 180189156    0.73  0.665
## Residuals   84 2.073e+10 246827007
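The F value and p-value in this table follow directly from the mean squares and degrees of freedom. The Df of 8 means the Element factor has 9 levels (two more than the seven elements listed earlier, so `levels(df$Element)` may be worth checking), and 8 + 84 + 1 = 93 characters entered the model. A short check using only the numbers printed above:

```r
# Numbers copied from the aov() summary above
df_element <- 8
df_resid   <- 84
ms_element <- 180189156
ms_resid   <- 246827007

# F is the ratio of between-group to within-group mean squares
f_value <- ms_element / ms_resid

# p-value: upper-tail probability of the F(8, 84) distribution
p_value <- pf(f_value, df_element, df_resid, lower.tail = FALSE)

round(c(F = f_value, p = p_value), 3)
```

This reproduces F of about 0.73 and p of about 0.665, matching the `aov()` output.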

ANOVA: Interpretation

Comprehensive Analysis:

  • Research Question: The one-way ANOVA tests whether mean votes differ significantly across element types.

  • P-value: The probability of observing differences in mean votes between elements at least as large as those in the data, assuming no real difference exists (the null hypothesis).

  • Statistical Significance: If the p-value is below 0.05 (the conventional significance level), we conclude that at least one elemental group differs in mean popularity from the others.

  • Findings: The test gives F(8, 84) = 0.73 with p = 0.665, well above 0.05. There is no statistically significant difference in mean votes across elements, so a character's element alone does not explain their popularity. Other factors, such as character design, may matter more, although this test does not measure them.
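ANOVA assumes roughly normal residuals and similar variances across groups, but the summary statistics show right-skewed votes and very different spreads between groups. A common robustness check, swapped in here and not part of the original analysis, is the rank-based Kruskal-Wallis test, or ANOVA on log-transformed votes. A sketch on illustrative toy data (not the poll results):

```r
# Illustrative toy data (hypothetical values, not the poll results):
# three elements, four characters each, with skewed vote counts
toy <- data.frame(
  Element = rep(c("Anemo", "Hydro", "Pyro"), each = 4),
  Votes   = c(3000, 5200, 9100, 41000,
              4100, 7800, 12500, 60000,
              2500, 6100, 8800, 15000)
)

# Rank-based alternative to one-way ANOVA: no normality assumption
kt <- kruskal.test(Votes ~ Element, data = toy)
kt

# Alternative: ANOVA on log-transformed votes to reduce the skew
summary(aov(log(Votes) ~ Element, data = toy))
```

On the real data the calls would be `kruskal.test(Votes ~ Element, data = df)` and `aov(log(Votes) ~ Element, data = df)`; if they also give p above 0.05, the conclusion above does not depend on the normality assumption.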

Key Insights and Conclusions

Major Findings:

First: New characters spark more engagement. The 3D graph and scatter plots show that characters released in more recent game updates tend to get more votes, suggesting a recency effect in community popularity.
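The recency effect can also be summarized with a single number instead of read off a plot. A minimal sketch using Spearman's rank correlation (a swapped-in choice, not part of the original analysis) on illustrative toy data; it assumes a numeric release-version column with the hypothetical name `Version`:

```r
# Illustrative toy data (hypothetical values, not the poll results)
toy <- data.frame(
  Version = c(1.0, 1.3, 2.1, 2.5, 3.2, 3.7, 4.0, 4.4, 4.7, 5.0),
  Votes   = c(6000, 4200, 9800, 7100, 12400, 8800, 21000, 15600, 33000, 27500)
)

# Spearman's rho works on ranks, so a few very popular characters
# cannot dominate it the way they can dominate Pearson's r
res <- cor.test(toy$Version, toy$Votes, method = "spearman")
res$estimate  # positive rho: later versions tend to have more votes
res$p.value
```

A positive, significant rho would support the recency pattern, but not its cause: newer regions were also released later, so release version and region overlap.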

Second: Nation Affiliation Matters. Fontaine characters, which were released in relatively recent patches, show the highest average popularity, with Natlan close behind. Liyue also contains some of the top-voted characters. Together these patterns suggest that a character's nation of origin is associated with popularity, although region and release version overlap, so their effects are hard to separate.

Third: Elemental Preference is Subtle. Although Hydro and Anemo appear frequently among the top-voted characters, the ANOVA suggests that a character's element is not a significant determinant of popularity across the entire roster. Story role, individual character design, and gameplay mechanics likely play a larger role in a character's popularity, although these factors were not measured in this analysis.

Practical Implications and Future Directions

Practical Applications

Character Development Strategy

Game developers should prioritize engaging storylines, unique gameplay mechanics, and appealing character design, since these appear to contribute more to community popularity than a character's element alone. Marketing new characters and releasing more characters from fan-favorite regions may help maximize community engagement.

Study Limitations:

This analysis is based on a single popularity poll from 2024. Because the data are observational, we cannot conclude that a character's element, region, or release version directly causes their popularity. In addition, the participant demographics and voting rules of the poll may influence the results.

Future Research Directions

  • Popularity Tracking: Analyze how a character's popularity evolves over time.
  • Demographic Analysis: Include voter demographics to understand different player preferences.
  • Other Platforms: Analyze social media posts and comments or fan servers to better understand what makes certain characters more popular.
  • Popularity vs. Game Performance: Analyze whether meta relevance or the combat strength of a character correlates with popularity.
  • Character Flexibility: Analyze each character's kit and the team compositions they fit into. How flexible is each character across different teams?

Thank You :)