Analyzing the Winner Stats of RuPaul’s Drag Race Contestants

In this presentation I will be using data collected from the TV show RuPaul’s Drag Race.

The data includes information on contestants (such as their respective season, age, and placement) as well as information on each episode (such as the winner(s) of mini challenges, main challenges, and lip syncs).

Correlation Between Number of Episodes Won and Winning the Season

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \] Where…

  • i = contestant
  • \(x_i\) = number of episode wins for contestant i
  • \(y_i\) = whether contestant i won the season (1=TRUE and 0=FALSE)
  • \(\bar{x}\) = average number of episode wins across all contestants
  • \(\bar{y}\) = average number of winners per season (1 per season)

The value of \(r\) gets closer to 1 as a contestant wins more episodes, indicating that contestants who win more episodes are more likely to win the season

Probability of Winning a Season Based on Episode Wins

Individual Winner Influence on the Overall Regression Model

\[ D_i = \frac{r_i^2}{p \cdot \hat{\sigma}^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2} \]

Where…

  • \(r_i\) = residual
  • \(h_{ii}\) = leverage
  • \(p\) = number of variables (in this case, 1)
  • \(\hat{\sigma}^2\) = residual variance

Individual Winner Influence on the Overall Regression Model

Code for the Previous Plot

model <- glm(won_season ~ ep_wins, data = queen_stats, family = "binomial")

influence <- data.frame(
  residuals = residuals(model, type = "pearson"),
  leverage = hatvalues(model),
  distance = cooks.distance(model),
  name = queen_stats$contestant)

ggplot(influence, aes(x=leverage, y=residuals)) +
  geom_point(aes(size=distance, color=distance)) +
  geom_text(aes(label=ifelse(distance > 0.1, name, "")), 
            hjust=-0.15, vjust=0.3, size=3) +
  scale_color_gradient(low="skyblue", high="red") +
  labs(title="Contestant Influence on Season Outcome",
       x="Leverage",
       y="Residuals",
       color="Influence",
       size="Influence") +
  theme_minimal()

Contestant Age, Episode Wins, and Season Placement

Code for the Previous 3D Plot

queen_stats <- queens %>%
  left_join(ep_wins, by = "contestant") %>%
  mutate(ep_wins = ifelse(is.na(ep_wins), 0, ep_wins))

plot_ly(queen_stats,
        x = ~age,
        y = ~ep_wins,
        z = ~placement,
        type = "scatter3d",
        mode = "markers",
        marker=list(size=4, color=~placement, colorscale='Viridis',
                    showscale=TRUE)) %>%
        layout(scene=list(
            xaxis=list(title="Age"),
            yaxis=list(title="Episode Wins"),
            zaxis=list(title="Season Placement")),
          title="Age vs Episode Wins vs Season Placement")