1. Introduction

Twitch is a live video streaming service focused on video games and related activities. Since its inception, however, it has expanded to include streams dedicated to artwork creation, music, talk shows, and the occasional TV series. In 2015, Twitch had more than 100 million viewers per month, which encouraged us to dive deeper into its dataset, obtained from Kaggle, in order to identify viewership trends and the most viewed topics/games on the platform.

2. Preprocessing Data

Our source provides two datasets for this analysis: the Twitch global data and the Twitch top 200 game ranking data, both spanning January 2016 to March 2023.

## Rows: 87
## Columns: 9
## $ Year           <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2…
## $ Month          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6…
## $ Hours_watched  <dbl> 480241904, 441859897, 490669308, 377975447, 449836631, …
## $ Avg_viewers    <int> 646355, 635769, 660389, 525696, 605432, 620903, 600715,…
## $ Peak_viewers   <int> 1275257, 1308032, 1591551, 1775120, 1438962, 1755888, 1…
## $ Streams        <int> 7701675, 7038520, 7390957, 6869719, 7535519, 6663363, 7…
## $ Avg_channels   <int> 20076, 20427, 20271, 16791, 19394, 18818, 18030, 16592,…
## $ Games_streamed <int> 12149, 12134, 12234, 12282, 12424, 12374, 12961, 13693,…
## $ Viewer_ratio   <dbl> 29.08, 28.98, 28.92, 28.80, 28.85, 28.76, 28.62, 28.47,…

The output above shows the columns of the Twitch global data, which has one observation per month containing general viewership statistics for Twitch.

## Rows: 17,400
## Columns: 12
## $ Rank             <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
## $ Game             <chr> "League of Legends", "Counter-Strike: Global Offensiv…
## $ Month            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Year             <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,…
## $ Hours_watched    <int> 94377226, 47832863, 45185893, 39936159, 16153057, 102…
## $ Hours_streamed   <int> 1362044, 830105, 433397, 235903, 1151578, 490002, 342…
## $ Peak_viewers     <int> 530270, 372654, 315083, 131357, 71639, 64432, 46130, …
## $ Peak_channels    <int> 2903, 2197, 1100, 517, 3620, 1538, 1180, 460, 148, 75…
## $ Streamers        <int> 129172, 120849, 44074, 36170, 214054, 88820, 33375, 2…
## $ Avg_viewers      <int> 127021, 64378, 60815, 53749, 21740, 13769, 11805, 106…
## $ Avg_channels     <int> 1833, 1117, 583, 317, 1549, 659, 461, 276, 71, 274, 9…
## $ Avg_viewer_ratio <dbl> 69.29, 57.62, 104.26, 169.29, 14.03, 20.88, 25.57, 38…

The output above shows the columns of the Twitch top 200 game data, which has 200 observations per month representing the top games or categories on Twitch for that month.

Since the data are already fairly clean (no missing values and no duplicates), we only made minor adjustments and subset the data as needed for the plots in the analysis.
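As a minimal sketch of how the "no missing values and no duplicates" check could be done, the snippet below counts NA values per column and fully duplicated rows. The data frame `gl_demo` is a hypothetical stand-in built from the first three rows of the global data shown earlier; it is not the authors' actual preprocessing code.

```r
# Hypothetical stand-in for the Twitch global data (first three rows only);
# the real analysis would load the full Kaggle file instead.
gl_demo <- data.frame(
  Year = c(2016, 2016, 2016),
  Month = c(1, 2, 3),
  Hours_watched = c(480241904, 441859897, 490669308)
)

# Count missing values in each column
na_counts <- colSums(is.na(gl_demo))

# Count rows that are exact duplicates of an earlier row
dup_count <- sum(duplicated(gl_demo))

print(na_counts)
print(dup_count)
```

If both checks report zero, no imputation or deduplication step is needed before plotting.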

There are three main questions that we will try to answer:

  1. What is the global viewership trend on Twitch?
  2. What are the most viewed topics in the top 200 games data?
  3. Which variables are positively related to viewership?

3. Twitch Global Trend

We will examine the monthly and yearly trends of each variable in the Twitch global dataset (with the exception of Viewer_ratio, since it is derived from the same source as average viewers).

The monthly averages of the global data are computed by averaging each variable by calendar month across 2016-2022 (2023 is excluded because the year is incomplete). This shows the typical activity for each month (all Januaries from 2016 to 2022, all Februaries from 2016 to 2022, and so on).
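The monthly averaging described above might look roughly like the following sketch. It assumes dplyr is available and uses a small hypothetical data frame `gl_demo` with made-up values rather than the real Twitch figures; only `Hours_watched` is averaged here, but the same pattern would apply to the other variables.

```r
library(dplyr)

# Hypothetical toy data: two months in each of two complete years,
# plus one row from the incomplete year 2023 (values are made up)
gl_demo <- data.frame(
  Year = c(2021, 2021, 2022, 2022, 2023),
  Month = c(1, 2, 1, 2, 1),
  Hours_watched = c(100, 200, 300, 400, 999)
)

monthly_agg <- gl_demo %>%
  filter(Year != 2023) %>%   # drop the incomplete year
  group_by(Month) %>%        # one group per calendar month
  summarise(avg_hours_watched = mean(Hours_watched), .groups = "drop")

print(monthly_agg)
```

Grouping by `Month` alone, rather than by `Year` and `Month`, is what pools every January together, every February together, and so on.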

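# Load the packages used below; `gl` is the Twitch global data frame
library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)

# Aggregate data for yearly trend
yearly_agg <- gl %>%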
  select(-Viewer_ratio) %>%
  filter(Year != 2023) %>%
  group_by(Year) %>%
  summarise(
    avg_hours_watched = mean(Hours_watched),
    avg_avg_viewers = mean(Avg_viewers),
    avg_peak_viewers = mean(Peak_viewers),
    avg_streams = mean(Streams),
    avg_avg_channels = mean(Avg_channels),
    avg_games_streamed = mean(Games_streamed)
    )

# Then, pivot the aggregated data to long format and proceed with plotting
yearly_long_agg <- yearly_agg %>%
  pivot_longer(cols = -Year, 
               names_to = "variable", 
               values_to = "value")

# Create the faceted plot with tooltip showing year and value with thousand separator
p_yearly <- ggplot(yearly_long_agg, aes(x = Year, y = value, group = variable, 
                                  text = paste("Year:", Year, "<br>", "Value:", scales::comma(value)))) + 
  geom_line() + 
  facet_wrap(~variable, scales = "free_y", ncol = 2) + 
  theme_minimal() +
  labs(title = "Yearly Trends of Twitch Global Statistics (2016-2022)",
       x = "Year",
       y = "Value")

# Convert to interactive plot
ggplotly(p_yearly, tooltip = "text", width = 900, height = 600)

The code above examines the yearly trend of the same variables, averaging each one per year from 2016 to 2022.

4. Most Viewed Topic/Games

For the next question, we look at the top 10 most viewed and most streamed video games/activities overall, along with their yearly trends.
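A top-N ranking of this kind could be sketched as below, assuming dplyr is available. The data frame `top_demo` and its game names are hypothetical and only mimic the shape of the top 200 dataset. Because `Hours_watched` is stored as an integer column in that dataset, the sketch converts it to numeric before summing, as base R's `sum()` returns NA with a warning when an integer total overflows.

```r
library(dplyr)

# Hypothetical toy data in the shape of the top 200 dataset (values are made up)
top_demo <- data.frame(
  Game = c("A", "B", "C", "A", "B", "C"),
  Year = c(2021, 2021, 2021, 2022, 2022, 2022),
  Hours_watched = c(10L, 50L, 30L, 20L, 60L, 5L)
)

top_games <- top_demo %>%
  group_by(Game) %>%
  # as.numeric() guards against integer overflow on large totals
  summarise(total_hours_watched = sum(as.numeric(Hours_watched)),
            .groups = "drop") %>%
  slice_max(total_hours_watched, n = 2)   # use n = 10 on the real data

print(top_games)
```

Setting `.groups = "drop"` also silences the "`summarise()` has grouped output by ..." message that appears when grouping by more than one column.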

Overall, Just Chatting (a category for "others") and League of Legends (a multiplayer online battle arena game) are the most viewed content on Twitch. Each of the remaining games has only around 50% of the total viewership of either of these two. Grand Theft Auto V (an open-world action-adventure game) was viewed most in 2021, and Fortnite (an online battle royale and tower defense game) was viewed most in 2018. The top 10 most viewed content is dominated by online games and by non-gaming content in the form of Just Chatting.

On the streamer side, the most streamed content is Fortnite, with a total streamed time of 356k hours; the rest of the top games account for only around 50% of Fortnite's total streamed time. Streamer content on Twitch is dominated by multiplayer games (LOL and WOW) and shooter games (Fortnite, Apex, COD, Valorant and Counter Strike).

5. Correlation of Viewership Variables

Moving on to the final question: which variables have an indirect effect on watching time and viewership? We will answer this by looking at the correlation between:

  - Hours_watched vs. Games_streamed, from the global dataset
  - Hours_watched vs. Hours_streamed, from the top 200 dataset
  - Avg_viewers vs. Peak_viewers, from the top 200 dataset
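As a rough illustration of how the strength of such a relationship can be quantified alongside a scatter plot, the sketch below computes a Pearson correlation coefficient with base R's `cor()`. The data frame `demo` contains made-up values, not the real Twitch figures, and the column pairing is just one of the three comparisons listed above.

```r
# Hypothetical toy values (not the real Twitch figures)
demo <- data.frame(
  Hours_streamed = c(1, 2, 3, 4, 5),
  Hours_watched = c(12, 19, 33, 41, 48)
)

# cor() computes the Pearson correlation by default;
# values near +1 indicate a strong positive linear relationship
r <- cor(demo$Hours_streamed, demo$Hours_watched)

print(r)
```

A coefficient close to +1 would support the positive relationships described in this section, though it describes association only and does not by itself show that one variable drives the other.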

The relationship between the number of unique games streamed and total hours watched is positive. This makes sense: each additional unique game brings its own community, which adds viewership and hours watched.

From the scatter plot, we can see a positive relationship: the more hours streamers stream, the more likely it is that the number of hours watched increases.

Finally, we can see another positive relationship between peak viewers and average viewers. Taken together, these three plots suggest that if Twitch can increase streamer activity, viewership on the platform is likely to increase as well.

6. Conclusions

Overall, Twitch was on an increasing trend from 2016 to 2021, with a slight decrease in 2022, according to its viewership and hours watched metrics. There are some notable periods, especially around holidays and special events in June and December, when activity went down. Twitch's most popular content is dominated by multiplayer games and Just Chatting activities on both the streamer and viewer sides. For Twitch to increase these metrics, it should encourage more activity from both streamers and viewers.