Analyzing Statistical Factors for Twitch Channel Growth in a Competitive Streaming Environment

Introduction

Streaming has grown steadily since the start of the pandemic. Twitch, the dominant platform, hosts a large number of content creators, and with so many creators comes competition among them. This project analyzes the factors that separate a successful streamer from the rest of the competition, with the aim of determining how to become one.

Problem Statement

Our problem is to figure out why some streamers excel compared to the majority. What factors could have elevated their channels' growth to such an extent?

Project Goal

The goal of this project is to compare data on the top 1000 Twitch streamers from 2020 and use the available variables to see whether being a top-performing streamer correlates with those factors.

Step 1: Data Visualization

Import Data / Data Summary

#Load CSV file from https://www.kaggle.com/datasets/aayushmishra1512/twitchdata/data
twitch_data <- read.csv(file.choose())
#Data Summary of the Twitch Data
head(twitch_data)
##     Channel Watch.time.Minutes. Stream.time.minutes. Peak.viewers
## 1     xQcOW          6196161750               215250       222720
## 2  summit1g          6091677300               211845       310998
## 3    Gaules          5644590915               515280       387315
## 4  ESL_CSGO          3970318140               517740       300575
## 5      Tfue          3671000070               123660       285644
## 6 Asmongold          3668799075                82260       263720
##   Average.viewers Followers Followers.gained Views.gained Partnered Mature
## 1           27716   3246298          1734810     93036735      True  False
## 2           25610   5310163          1370184     89705964      True  False
## 3           10976   1767635          1023779    102611607      True   True
## 4            7714   3944850           703986    106546942      True  False
## 5           29602   8938903          2068424     78998587      True  False
## 6           42414   1563438           554201     61715781      True  False
##     Language
## 1    English
## 2    English
## 3 Portuguese
## 4    English
## 5    English
## 6    English
summary(twitch_data)
##    Channel          Watch.time.Minutes. Stream.time.minutes.  Peak.viewers   
##  Length:1000        Min.   :1.222e+08   Min.   :  3465       Min.   :   496  
##  Class :character   1st Qu.:1.632e+08   1st Qu.: 73759       1st Qu.:  9114  
##  Mode  :character   Median :2.350e+08   Median :108240       Median : 16676  
##                     Mean   :4.184e+08   Mean   :120515       Mean   : 37065  
##                     3rd Qu.:4.337e+08   3rd Qu.:141844       3rd Qu.: 37570  
##                     Max.   :6.196e+09   Max.   :521445       Max.   :639375  
##  Average.viewers    Followers       Followers.gained   Views.gained      
##  Min.   :   235   Min.   :   3660   Min.   : -15772   Min.   :   175788  
##  1st Qu.:  1458   1st Qu.: 170546   1st Qu.:  43758   1st Qu.:  3880602  
##  Median :  2425   Median : 318063   Median :  98352   Median :  6456324  
##  Mean   :  4781   Mean   : 570054   Mean   : 205519   Mean   : 11668166  
##  3rd Qu.:  4786   3rd Qu.: 624332   3rd Qu.: 236131   3rd Qu.: 12196762  
##  Max.   :147643   Max.   :8938903   Max.   :3966525   Max.   :670137548  
##   Partnered            Mature            Language        
##  Length:1000        Length:1000        Length:1000       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 

Pairs Plot of Twitch Data (in Millions)

#Check the structure of the original twitch_data
str(twitch_data)
## 'data.frame':    1000 obs. of  11 variables:
##  $ Channel             : chr  "xQcOW" "summit1g" "Gaules" "ESL_CSGO" ...
##  $ Watch.time.Minutes. : num  6.20e+09 6.09e+09 5.64e+09 3.97e+09 3.67e+09 ...
##  $ Stream.time.minutes.: int  215250 211845 515280 517740 123660 82260 136275 147885 122490 92880 ...
##  $ Peak.viewers        : int  222720 310998 387315 300575 285644 263720 115633 68795 89387 125408 ...
##  $ Average.viewers     : int  27716 25610 10976 7714 29602 42414 24181 18985 22381 12377 ...
##  $ Followers           : int  3246298 5310163 1767635 3944850 8938903 1563438 4074287 508816 3530767 2607076 ...
##  $ Followers.gained    : int  1734810 1370184 1023779 703986 2068424 554201 1089824 425468 951730 1532689 ...
##  $ Views.gained        : int  93036735 89705964 102611607 106546942 78998587 61715781 46084211 670137548 51349926 36350662 ...
##  $ Partnered           : chr  "True" "True" "True" "True" ...
##  $ Mature              : chr  "False" "False" "True" "False" ...
##  $ Language            : chr  "English" "English" "Portuguese" "English" ...
#Create a new data frame with all necessary columns
twitch_summary <- data.frame(
  average_viewers = twitch_data$Average.viewers / 1e6,          # Convert to millions
  followers = twitch_data$Followers / 1e6,                      # Convert to millions
  followers_gained = twitch_data$Followers.gained / 1e6,        # Convert to millions
  peak_viewers = twitch_data$Peak.viewers / 1e6,                # Convert to millions
  views_gained = twitch_data$Views.gained / 1e6,                # Convert to millions
  stream_time = twitch_data$Stream.time.minutes. / 1e6,        # Convert to millions
  watch_time = twitch_data$Watch.time.Minutes. / 1e6           # Convert to millions
)

#Use the modified frame to draw the pairs plot (scatterplot matrix) of Twitch Data
pairs(twitch_summary)

Correlation Chart of Twitch Data

#Load Libraries
library(ggplot2)  # For visualization
library(reshape2)  # For reshaping data

#Calculate the correlation matrix using only numeric columns
correlation_matrix <- cor(twitch_data[, c('Watch.time.Minutes.', 
                                          'Stream.time.minutes.', 
                                          'Followers', 
                                          'Peak.viewers', 
                                          'Average.viewers', 
                                          'Followers.gained', 
                                          'Views.gained')],
                          use = "complete.obs")

#Melt the correlation matrix for ggplot
correlation_melted <- melt(correlation_matrix)

#Define new variable names for the plot
#(the names must match the column names exactly, including the trailing dot)
variable_names <- c(
  'Watch.time.Minutes.' = 'Watch Time (Minutes)',
  'Stream.time.minutes.' = 'Stream Time (Minutes)',
  'Followers' = 'Followers',
  'Peak.viewers' = 'Peak Viewers',
  'Average.viewers' = 'Average Viewers',
  'Followers.gained' = 'Followers Gained',
  'Views.gained' = 'Views Gained')

#Replace the variable names in the melted data
#(as.character() makes the lookup go by name rather than by factor level code)
correlation_melted$Var1 <- variable_names[as.character(correlation_melted$Var1)]
correlation_melted$Var2 <- variable_names[as.character(correlation_melted$Var2)]

#Create the heatmap using ggplot2
ggplot(data = correlation_melted, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = round(value, 2)), color = "black") +
  scale_fill_gradient2(low = "purple", mid = "white", high = "red", midpoint = 0) +
  theme_minimal() +
  labs(title = "Correlation Chart for Twitch Data Set", 
       x = " ", 
       y = " ") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 10),
        axis.text.y = element_text(size = 10),
        plot.title = element_text(size = 20, hjust = 0.5))

Interpretation: The closer a tile is to red, the stronger the positive correlation. The closer a tile is to purple, the stronger the negative correlation. White indicates a correlation near zero.
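
To make the color scale concrete, the sketch below computes a small correlation matrix from only the six rows printed by head(twitch_data) above. Six channels are far too few to be representative, so the values are illustrative and will differ from the full 1000-row heatmap; the name head_rows is introduced here for illustration.

```r
# Six rows copied from the head(twitch_data) output above (illustration only)
head_rows <- data.frame(
  Stream.time.minutes. = c(215250, 211845, 515280, 517740, 123660, 82260),
  Average.viewers      = c(27716, 25610, 10976, 7714, 29602, 42414),
  Followers.gained     = c(1734810, 1370184, 1023779, 703986, 2068424, 554201)
)

# Pearson correlations: every entry lies in [-1, 1], with 1 on the diagonal
head_cor <- cor(head_rows)
print(round(head_cor, 2))

# The sign decides the side of the color scale:
# negative values map toward purple, positive values toward red
stream_vs_viewers <- head_cor["Stream.time.minutes.", "Average.viewers"]
print(stream_vs_viewers)
```

In these six rows the channels with the longest stream time have the lowest average viewers, so that entry is negative and would be drawn on the purple side of the scale.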

Step 2: Regression Analysis / Result Interpretation

Dependent Variable: Followers Gained
Independent Variables: Stream Time in Minutes and Maturity
Equation for Followers Gained: ŷ = B0 + B1x1 + B2x2
ŷ = Predicted Followers Gained
x1 = Stream Time in Minutes
x2 = Maturity (1 if the channel is marked Mature, 0 otherwise)
model <- lm(Followers.gained ~ Stream.time.minutes. + Mature, data = twitch_data)
summary(model)
## 
## Call:
## lm(formula = Followers.gained ~ Stream.time.minutes. + Mature, 
##     data = twitch_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -273217 -156609  -92118   23846 3695919 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           2.955e+05  1.902e+04  15.540  < 2e-16 ***
## Stream.time.minutes. -6.143e-01  1.242e-01  -4.948 8.81e-07 ***
## MatureTrue           -6.948e+04  2.518e+04  -2.760  0.00589 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 334700 on 997 degrees of freedom
## Multiple R-squared:  0.03241,    Adjusted R-squared:  0.03047 
## F-statistic:  16.7 on 2 and 997 DF,  p-value: 7.373e-08
Assumptions:
Level of Significance (α) = 0.05
The relationship between the independent and dependent variables is linear
Residuals are normally distributed
No Multicollinearity
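
These assumptions can be checked rather than only stated. The sketch below is a minimal illustration on synthetic data: the demo data frame, its column names, and the noise level are made up here, with coefficients loosely echoing the fitted model. It shows a Shapiro-Wilk test on the residuals and a variance inflation factor computed by hand. It is not a result about the Twitch data.

```r
set.seed(42)

# Synthetic stand-in data (assumption: NOT the Twitch data set)
n <- 200
demo <- data.frame(
  stream_time = runif(n, min = 3465, max = 521445),
  mature      = rbinom(n, size = 1, prob = 0.25)
)
demo$followers_gained <- 295500 - 0.6143 * demo$stream_time -
  69480 * demo$mature + rnorm(n, sd = 50000)

fit <- lm(followers_gained ~ stream_time + mature, data = demo)

# Normality of residuals: a small p-value suggests non-normal residuals
sw <- shapiro.test(residuals(fit))
print(sw$p.value)

# Multicollinearity: with two predictors, the VIF is 1 / (1 - R^2), where
# R^2 comes from regressing one predictor on the other; values near 1 are fine
r2 <- summary(lm(stream_time ~ mature, data = demo))$r.squared
vif_stream_time <- 1 / (1 - r2)
print(vif_stream_time)
```

On the real fit, the same checks would be applied to model, and plot(model, which = 1:2) draws the residuals-versus-fitted and normal Q-Q plots. The residual summary printed above (median -92118, maximum 3695919) already hints at a strong right skew, so the normality assumption deserves scrutiny.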

Result Interpretation: ŷ = 295500 - 0.6143(Stream Time) - 69480(Mature)
The Adjusted R-squared value of 0.03047 → about 3% of the variation in the dependent variable ("Followers Gained") is explained by the independent variables included in our regression model.
The model has limited predictive power; additional or alternative factors may better explain the variation in the dependent variable.
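
As a worked example of the fitted equation, the sketch below plugs values into it by hand. It uses the coefficients exactly as printed in the summary(model) output (note that -6.143e-01 is -0.6143). The function name predict_followers_gained and the choice of the median stream time (108,240 minutes, from summary(twitch_data)) are illustrative.

```r
# Coefficients copied from the printed summary(model) output (rounded values)
predict_followers_gained <- function(stream_minutes, mature) {
  b0 <- 2.955e+05    # (Intercept)
  b1 <- -6.143e-01   # Stream.time.minutes.
  b2 <- -6.948e+04   # MatureTrue
  b0 + b1 * stream_minutes + b2 * as.numeric(mature)
}

# Prediction at the median stream time for a non-mature and a mature channel
non_mature <- predict_followers_gained(108240, FALSE)
mature     <- predict_followers_gained(108240, TRUE)
print(c(non_mature = non_mature, mature = mature))
```

This gives roughly 229,000 followers gained for the non-mature channel and roughly 159,500 for the mature one. With an R-squared of about 3%, individual predictions like these carry very wide uncertainty.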

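Predicted Followers Gained is 295,500 for a non-mature channel with zero stream time.
Because the stream time coefficient is negative, the model's predicted follower gain is highest before a channel begins creating content. This is an extrapolation, since the smallest stream time in the data is 3,465 minutes.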

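Stream time coefficient: -0.6143
Followers gained is negatively associated with the amount of time a streamer has streamed: each additional minute of stream time is associated with about 0.61 fewer followers gained, holding maturity status constant.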

Maturity status coefficient: -69,480
Channels marked as mature are predicted to gain about 69,480 fewer followers than non-mature channels with the same stream time.
More inclusive content → higher chance a channel will sustain its follower growth.
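
To gauge how precisely these two coefficients are estimated, approximate 95% confidence intervals can be rebuilt from the estimates and standard errors printed in the summary(model) output. This is a sketch from the rounded printed values; on the fitted object, confint(model) is the usual route.

```r
# Estimates and standard errors copied from the printed summary(model) output
est <- c(stream_time = -6.143e-01, mature_true = -6.948e+04)
se  <- c(stream_time =  1.242e-01, mature_true =  2.518e+04)

# Critical t value for a two-sided 95% interval with 997 residual df
t_crit <- qt(0.975, df = 997)

# Interval: estimate +/- t_crit * standard error
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(ci)
```

Both intervals lie entirely below zero, which is consistent with the significance codes in the output, although the low R-squared means these effects explain little of the overall variation.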

Step 3: Conclusion and Recommendation

Followers Gained:
Followers gained is significantly influenced by Stream Time and Maturity Rating. Although some languages also slightly influence followers gained, the effect was not significant enough to be included in our model.

Business Recommendations:
- Stream inclusively to all audiences
- Advertise and market your channel before you start streaming
- Focus on gaining followers at the start of the channel and on maintaining your audience