#option: warning=FALSE
library(arules)


library(arulesViz)

Introduction

Streaming platforms and the streamers on them are becoming increasingly popular. Their growing reach also increases their influence and the revenue they generate, both for themselves and for platforms such as Twitch.

In this project, we analyze a Twitch user dataset to identify trends in how users interact with live streamers. By examining patterns of co-interaction between users and streamers, the project aims to uncover underlying behavioral structures within the Twitch ecosystem. In particular, we investigate whether users tend to follow groups of streamers with shared characteristics, such as content type, language, or community overlap.

To achieve this, association rule mining is applied to model user behavior in a transaction-based manner. The extracted association rules are then evaluated to determine their potential usefulness in building recommendation systems for streaming platforms. Such systems could assist platforms like Twitch in suggesting relevant streamers to users based on observed interaction patterns, thereby improving user engagement and content discoverability, which is crucial for the business.

There are multiple questions that the project aims to answer:

Which streamers are most frequently co-watched by the same users?
Are there identifiable clusters of streamers that share a common audience?
Do the extracted rules suggest potential streamer recommendation strategies for users?

Dataset description

This is a dataset of users consuming streaming content on Twitch. The dataset authors recorded all live streamers, and all users connected to their respective chats, every 10 minutes over 43 days. Link: https://cseweb.ucsd.edu/~jmcauley/datasets.html#twitch

Metadata

Start and stop times are provided as integers and represent periods of 10 minutes. Stream ID could be used to retrieve a single broadcast segment from a streamer (not used in our work).

User ID (anonymized)
Stream ID
Streamer username
Time start
Time stop

Example data

User ID	Stream ID	Streamer username	Time start	Time stop
1	34347669376	grimnax	5415	5419
1	34391109664	jtgtv	5869	5870
1	34395247264	towshun	5898	5899
1	34405646144	mithrain	6024	6025
2	33848559952	chfhdtpgus1	206	207
2	33881429664	sal_gu	519	524
2	33921292016	chfhdtpgus1	922	924

Data preparation

data <- read.csv("100k_a.csv")
head(data, 20)

##    X1 X33842865744        mithrain X154 X156
## 1   1  33846768288           alptv  166  169
## 2   1  33886469056        mithrain  587  588
## 3   1  33887624992            wtcn  589  591
## 4   1  33890145056       jrokezftw  591  594
## 5   1  33903958784     berkriptepe  734  737
## 6   1  33929318864 kendinemuzisyen 1021 1036
## 7   1  33942837056            wtcn 1165 1167
## 8   1  33955351648 kendinemuzisyen 1295 1297
## 9   1  34060922080        mithrain 2458 2459
## 10  1  34062621584         unlostv 2454 2456
## 11  1  34077379792        mithrain 2601 2603
## 12  1  34078096176            zeon 2603 2604
## 13  1  34079135968         elraenn 2600 2601
## 14  1  34082259232            zeon 2604 2605
## 15  1  34157036272        mithrain 3459 3460
## 16  1  34169481232 kendinemuzisyen 3600 3601
## 17  1  34185325968         unlostv 3739 3743
## 18  1  34188146896            wtcn 3755 3757
## 19  1  34188931888         jahrein 3757 3760
## 20  1  34195515568        mithrain 3874 3875

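After reading the dataset and displaying the first rows, we see that the file has no header row: `read.csv` used the first record as column names (`X1`, `X33842865744`, ...), so proper column names have to be assigned. Passing `header = FALSE` to `read.csv` would keep that first record as data.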
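
Because `read.csv` assumes a header row by default, the first record of a headerless file ends up in the column names, as in the output above. A minimal sketch of one way to avoid this is shown below. It reads two inline records copied from that output instead of the real file; `user_id` and `streamer_nickname` match the names used in this report, while the other three column names are placeholders chosen for illustration.

```r
# Two records in the same layout as 100k_a.csv (no header row)
csv_text <- "1,33842865744,mithrain,154,156
1,33846768288,alptv,166,169"

# header = FALSE keeps the first record as data;
# col.names assigns readable names at read time
toy <- read.csv(
  text = csv_text,
  header = FALSE,
  col.names = c("user_id", "stream_id", "streamer_nickname",
                "time_start", "time_stop")
)

nrow(toy)                  # both records survive
toy$streamer_nickname[1]   # the record that was previously lost
```

The same two arguments should work with a file path in place of `text = csv_text`.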

We need only the user ID and the streamer nickname, so we keep the 1st and 3rd columns. Each row corresponds to a period during which the user was connected to a streamer's chat, so the same streamer can appear multiple times for the same user. Therefore, we keep only unique user-streamer pairs.

data <- data[,c(1,3)]
colnames(data) <- c("user_id", "streamer_nickname")
data <- unique(data)

head(data)

##   user_id streamer_nickname
## 1       1             alptv
## 2       1          mithrain
## 3       1              wtcn
## 4       1         jrokezftw
## 5       1       berkriptepe
## 6       1   kendinemuzisyen

Note: Keeping only unique entries removed almost half of the rows, which were redundant.

Next, we remove streamers with low support (at most 1% of users) and keep only those transactions that contain more than one streamer. As a result, 45% of transactions are removed. This does not change the direction of the analysis: the data covers only 43 days, so it reflects a snapshot of Twitch rather than the whole picture. Therefore, I decided to focus on relatively popular streamers.

trans <- as(split(data$streamer_nickname, data$user_id), "transactions")
trans <- trans[, itemFrequency(trans) > 0.01]
trans <- trans[size(trans) > 1]

summary(trans)

## transactions as itemMatrix in sparse format with
##  55535 rows (elements/itemsets/transactions) and
##  137 columns (items) and a density of 0.04048542 
## 
## most frequent items:
##     ninja      tfue    shroud riotgames nickmercs   (Other) 
##     16213     13896     10289      6975      5894    254758 
## 
## element (itemset/transaction) length distribution:
## sizes
##     2     3     4     5     6     7     8     9    10    11    12    13    14 
## 12878 10104  7567  5556  4090  3120  2414  1902  1513  1284  1011   789   639 
##    15    16    17    18    19    20    21    22    23    24    25    26    27 
##   507   417   342   268   222   200   156   119   104    77    53    40    37 
##    28    29    30    31    32    33    34    35    36    37    38    39    40 
##    37    23    16    12     7     7     7     5     3     3     2     1     2 
##    42 
##     1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   4.000   5.547   7.000  42.000 
## 
## includes extended item information - examples:
##             labels
## 1167         72hrs
## 1443     a_seagull
## 2559 admiralbahroo
## 
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2             2
## 4             4

From the summary, we can already see that the data is plausible, since the most popular creators are Ninja, Tfue, and Shroud. These are game streamers, so I would expect association rules between them. Ninja and Tfue mainly play Fortnite, so they should be more closely connected, while Shroud is a more general video-game streamer. Because transactions with a single streamer were removed, the minimum transaction size is 2. The mean of about 5.5 streamers per user indicates that users are moderately active. One user interacted with 42 different streamers in 43 days, which is very impressive. The density of about 4% means that out of 55535 x 137 = 7608295 possible user-streamer pairs, about 308000 actually occur.
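
The density arithmetic can be reproduced directly from the numbers printed by `summary(trans)`. The sketch below only re-derives those figures; the variable names are illustrative.

```r
n_users     <- 55535       # transactions (rows) reported by summary(trans)
n_streamers <- 137         # items (columns) reported by summary(trans)
density     <- 0.04048542  # density reported by summary(trans)

# All user-streamer pairs that could exist, and the share that does
possible_pairs <- n_users * n_streamers
observed_pairs <- density * possible_pairs

# Cross-check: the item counts in the summary add up to the same total
item_counts <- c(ninja = 16213, tfue = 13896, shroud = 10289,
                 riotgames = 6975, nickmercs = 5894, other = 254758)

possible_pairs             # 7608295
round(observed_pairs)      # 308025
sum(item_counts)           # 308025
observed_pairs / n_users   # about 5.547, the mean transaction size
```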

We display a few example transactions to confirm that the structure is correct.

inspect(trans[1:3])

##     items              transactionID
## [1] {esl_csgo,                      
##      kendinemuzisyen,               
##      mithrain,                      
##      wtcn}                         1
## [2] {hanryang1125,                  
##      lol_ambition}                 2
## [3] {kendinemuzisyen,               
##      mithrain,                      
##      mrsavage,                      
##      ninja,                         
##      solaryfortnite,                
##      tfue,                          
##      timthetatman,                  
##      wtcn}                         4

Exploratory Data Analysis

itemFrequencyPlot(
  trans,
  topN = 20,
  type = "absolute",
  main = "Top 20 Most Watched Streamers"
)

Association rules

In this stage of the project, association rule mining is applied to discover recurring patterns in user interaction behavior on the Twitch platform. Association rules describe relationships between sets of items that frequently occur together within a collection of transactions. In the context of this project, each transaction represents a single user, while the items correspond to the streamers that the user has interacted with. The objective is to identify combinations of streamers that tend to share the same audience and to evaluate the strength of these relationships.

To assess the relevance and reliability of the extracted rules, three standard measures are used:

Support reflects how often a particular combination of streamers appears across all users in the dataset, indicating the overall prevalence of a pattern.
Confidence measures the likelihood that a user interacting with a given set of streamers will also interact with another specific streamer, thus capturing the predictive strength of the rule.
Lift compares the observed confidence of a rule to the expected probability of the consequent occurring independently, allowing us to determine whether the association represents a meaningful relationship or merely a coincidental overlap. Lift values greater than one indicate a positive association between streamers, while values close to one suggest independence.
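
The three measures can be checked by hand for the rule {tfue} => {ninja}. The sketch below uses the item counts printed by `summary(trans)` and the joint count that `apriori` reports for this rule in the results of this report; the variable names are illustrative.

```r
n       <- 55535  # number of transactions (users)
n_tfue  <- 13896  # users who interacted with tfue
n_ninja <- 16213  # users who interacted with ninja
n_both  <- 9131   # users who interacted with both

# Support: share of all users whose transaction contains both streamers
support <- n_both / n

# Confidence: share of tfue's users who also interacted with ninja
confidence <- n_both / n_tfue

# Lift: confidence relative to ninja's overall share of users
lift <- confidence / (n_ninja / n)

round(c(support = support, confidence = confidence, lift = lift), 4)
```

Swapping the roles of the two streamers changes the confidence but not the lift, because lift is symmetric in the antecedent and the consequent.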

Apriori Algorithm

To efficiently discover meaningful associations, the Apriori algorithm is used in this project. Apriori is well suited for this type of analysis because it gradually narrows down the number of possible item combinations by removing those that do not meet a minimum support requirement. The key idea behind the algorithm is that if a group of streamers does not reach the minimum support, then no larger group containing those streamers can reach it either, so such groups never need to be counted.
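
The pruning idea can be illustrated with a small level-wise search in base R. This is a simplified sketch on made-up transactions, not the implementation used by the `arules` package; the helper names `support_count` and `key` are hypothetical.

```r
# Toy transactions: one character vector of streamers per user (made-up data)
transactions <- list(
  c("ninja", "tfue", "nickmercs"),
  c("ninja", "tfue"),
  c("ninja", "shroud"),
  c("tfue", "nickmercs"),
  c("asmongold", "sodapoppin")
)
min_count <- 2  # minimum support expressed as an absolute count

# Number of transactions that contain every item of the itemset
support_count <- function(itemset) {
  sum(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Text key for an itemset, used to look itemsets up by name
key <- function(itemset) paste(itemset, collapse = ",")

# Level 1: frequent single streamers
items <- sort(unique(unlist(transactions)))
frequent <- Filter(function(s) support_count(s) >= min_count, as.list(items))
all_frequent <- frequent
pruned <- character(0)  # candidates discarded without counting

while (length(frequent) > 1) {
  k <- length(frequent[[1]])
  frequent_keys <- vapply(frequent, key, character(1))

  # Join step: unions of two frequent k-itemsets that have k + 1 items
  pairs <- combn(length(frequent), 2, simplify = FALSE)
  candidates <- unique(lapply(pairs, function(p) {
    sort(union(frequent[[p[1]]], frequent[[p[2]]]))
  }))
  candidates <- Filter(function(cand) length(cand) == k + 1, candidates)

  # Prune step: every k-subset of a candidate must itself be frequent
  keep <- vapply(candidates, function(cand) {
    subsets <- combn(cand, k, simplify = FALSE)
    all(vapply(subsets, key, character(1)) %in% frequent_keys)
  }, logical(1))
  pruned <- c(pruned, vapply(candidates[!keep], key, character(1)))

  # Count step: only the surviving candidates are checked against the data
  frequent <- Filter(function(cand) support_count(cand) >= min_count,
                     candidates[keep])
  all_frequent <- c(all_frequent, frequent)
}

vapply(all_frequent, key, character(1))  # frequent itemsets found
pruned                                   # candidates skipped by pruning
```

In this toy data the triple of nickmercs, ninja, and tfue is discarded without a pass over the transactions, because the pair of nickmercs and ninja is already known to be infrequent.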

Now, we run the algorithm to obtain trends by looking at the rules that appear.

rules <- apriori(
  trans,
  parameter = list(
    supp = 0.04,
    conf = 0.4
  )
)

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.4    0.1    1 none FALSE            TRUE       5    0.04      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2221 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[137 item(s), 55535 transaction(s)] done [0.01s].
## sorting and recoding items ... [41 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [53 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

Analysis of rules by key metrics

rules_support <- sort(rules, by = "support", decreasing = TRUE)
inspect(rules_support[1:10])

##      lhs               rhs     support    confidence coverage   lift     count
## [1]  {tfue}         => {ninja} 0.16441883 0.6570956  0.25022058 2.250774 9131 
## [2]  {ninja}        => {tfue}  0.16441883 0.5631900  0.29194202 2.250774 9131 
## [3]  {nickmercs}    => {ninja} 0.08214639 0.7740075  0.10613127 2.651237 4562 
## [4]  {nickmercs}    => {tfue}  0.08124606 0.7655243  0.10613127 3.059398 4512 
## [5]  {dakotaz}      => {ninja} 0.07685244 0.7348485  0.10458270 2.517104 4268 
## [6]  {fortnite}     => {tfue}  0.07415144 0.7333927  0.10110741 2.930985 4118 
## [7]  {symfuhny}     => {tfue}  0.07325110 0.7812560  0.09376069 3.122269 4068 
## [8]  {fortnite}     => {ninja} 0.07206266 0.7127337  0.10110741 2.441354 4002 
## [9]  {timthetatman} => {ninja} 0.07080220 0.7395148  0.09574142 2.533088 3932 
## [10] {symfuhny}     => {ninja} 0.07056811 0.7526407  0.09376069 2.578048 3919

rules_conf <- sort(rules, by = "confidence", decreasing = TRUE)
inspect(rules_conf[1:10])

##      lhs                     rhs     support    confidence coverage   lift    
## [1]  {drlupo, tfue}       => {ninja} 0.04013685 0.8937450  0.04490862 3.061378
## [2]  {tfue, timthetatman} => {ninja} 0.05016656 0.8847253  0.05670298 3.030483
## [3]  {aydan, ninja}       => {tfue}  0.04211758 0.8634182  0.04878005 3.450628
## [4]  {chap}               => {tfue}  0.04582696 0.8416005  0.05445215 3.363434
## [5]  {cloakzy}            => {tfue}  0.04256775 0.8350406  0.05097686 3.337218
## [6]  {cloakzy}            => {ninja} 0.04256775 0.8350406  0.05097686 2.860296
## [7]  {ninja, symfuhny}    => {tfue}  0.05866571 0.8313345  0.07056811 3.322407
## [8]  {nickmercs, tfue}    => {ninja} 0.06716485 0.8266844  0.08124606 2.831673
## [9]  {drlupo}             => {ninja} 0.05587467 0.8206824  0.06808319 2.811114
## [10] {tfue, tsm_myth}     => {ninja} 0.04208157 0.8200000  0.05131899 2.808777
##      count
## [1]  2229 
## [2]  2786 
## [3]  2339 
## [4]  2545 
## [5]  2364 
## [6]  2364 
## [7]  3258 
## [8]  3730 
## [9]  3103 
## [10] 2337

rules_lift <- sort(rules, by = "lift", decreasing = TRUE)
inspect(rules_lift[1:10])

##      lhs                  rhs          support    confidence coverage  
## [1]  {asmongold}       => {sodapoppin} 0.05371387 0.6642173  0.08086792
## [2]  {sodapoppin}      => {asmongold}  0.05371387 0.5127191  0.10476276
## [3]  {symfuhny}        => {nickmercs}  0.04179346 0.4457461  0.09376069
## [4]  {ninja, tfue}     => {nickmercs}  0.06716485 0.4084985  0.16441883
## [5]  {fortnite}        => {nickmercs}  0.04116323 0.4071238  0.10110741
## [6]  {aydan, ninja}    => {tfue}       0.04211758 0.8634182  0.04878005
## [7]  {chap}            => {tfue}       0.04582696 0.8416005  0.05445215
## [8]  {cloakzy}         => {tfue}       0.04256775 0.8350406  0.05097686
## [9]  {ninja, symfuhny} => {tfue}       0.05866571 0.8313345  0.07056811
## [10] {drdisrespect}    => {shroud}     0.04150536 0.6145028  0.06754299
##      lift     count
## [1]  6.340204 2983 
## [2]  6.340204 2983 
## [3]  4.199951 2321 
## [4]  3.848993 3730 
## [5]  3.836040 2286 
## [6]  3.450628 2339 
## [7]  3.363434 2545 
## [8]  3.337218 2364 
## [9]  3.322407 3258 
## [10] 3.316786 2305

Instead of going through all generated association rules one by one, it is much more informative to analyze them using the key measures of support, confidence, and lift. Each of these metrics highlights a different aspect of the relationships between streamers and helps us better understand user viewing behavior on Twitch.

Rules with the highest support represent streamer combinations that appear most frequently among users. A high support value means that a given pair or group of streamers is commonly watched together by a large portion of the audience. In this dataset, such rules mainly reflect mainstream viewing patterns and popular streamer combinations that could be used for platform-wide recommendations. It is easy to notice that the streamers Tfue and Ninja dominate this group and appear in symmetric rules. This is not surprising, as both streamers focus on similar gaming content and have attracted overlapping communities, largely due to their popularity in Fortnite. Based on these rules, users who watch NickMercs, Dakotaz, or the official Fortnite channel are also likely to watch Tfue or Ninja, with confidence between roughly 0.71 and 0.77 for the rules shown.

The confidence measure shows how reliably the presence of one streamer (or a group of streamers) predicts the presence of another. High-confidence rules indicate that when users watch the antecedent streamer(s), they are very likely to also watch the consequent streamer. In this analysis, the highest confidence values are usually associated with rules that include multiple streamers on the left-hand side, suggesting more specific and focused viewing behavior. Although these rules often have lower support and therefore apply to a smaller group of users, they are particularly valuable for personalized recommendation systems, where accuracy is more important than reaching a broad audience. A good example of such a rule is that users who watch both DrLupo and Tfue are also very likely to watch Ninja. Similarly, users who watch Ninja together with Aydan tend to also watch Tfue.

The lift metric helps assess whether a relationship between streamers is stronger than what would be expected by chance. Lift values greater than one indicate a positive association, meaning that the streamers are watched together more often than if user choices were independent. In this dataset, the highest lift values are often observed for rules involving less popular streamers that form more niche communities. Even though these rules may have relatively low support, their high lift values suggest strong and meaningful relationships. From a recommendation perspective, such rules are especially interesting, as they can reveal hidden connections between streamers and support content discovery within specific user segments. One example of this type of relationship is the pair Asmongold and Sodapoppin, which did not stand out in the previous analyses based on support or confidence alone.

plot(rules,
     measure = c("support", "confidence"),
     shading = "lift",
     main = "Scatter Plot for Twitch rules")

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

From the plot above, two rules stand out for combining high confidence with comparatively high support. Judging by the tables above, these appear to be {nickmercs, tfue} => {ninja} (confidence 0.83, support 0.067) and {ninja, symfuhny} => {tfue} (confidence 0.83, support 0.059). The single-antecedent rules nickmercs => ninja and nickmercs => tfue have higher support (about 0.08) but confidence just below 0.8. Two rules also have a much higher lift than all others: the symmetric pair of rules between sodapoppin and asmongold, with a lift of 6.34.

rules_top <- head(sort(rules, by = "lift"), 20)

plot(
  rules_top,
  method = "graph",
  engine = "igraph",
  control = list(
    arrowSize = 0.5,  # size of the arrow heads
    cex = 0.8         # scaling of the node labels
  ),
  shading = "lift"
)

From the graph above, we can even make out clusters of streamers. The big cluster in the middle contains streamers who focus on shooter games such as Fortnite and Call of Duty Warzone. The cluster on the bottom right contains streamers from an older generation who play a variety of games. On the top right is the previously mentioned pair, Asmongold and Sodapoppin, who create similar content and are therefore watched together.

rules_focus <- subset(rules, rhs %in% "tfue" | rhs %in% "ninja")
plot(rules_focus, method = "grouped")

The grouped matrix plot highlights rules where Tfue and Ninja appear as consequents. The size of each circle represents how frequently a combination of antecedent streamers occurs (support), while the color indicates the strength of the association relative to chance (lift). This visualization makes it easy to see which streamer combinations are most strongly associated with these popular streamers, and can help identify patterns for targeted recommendations. We can note from the matrix that rules with Tfue as a consequent are especially strong in terms of lift.

ECLAT Algorithm

I will quickly run the ECLAT algorithm to see whether it yields different rules or gives more insight into the data.

itemsets <- eclat(
  trans, 
  parameter = list(
    supp = 0.04
  )
)

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.04      1     10 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 2221 
## 
## create itemset ... 
## set transactions ...[137 item(s), 55535 transaction(s)] done [0.01s].
## sorting and recoding items ... [41 item(s)] done [0.00s].
## creating sparse bit matrix ... [41 row(s), 55535 column(s)] done [0.00s].
## writing  ... [85 set(s)] done [0.02s].
## Creating S4 object  ... done [0.00s].

rules <- ruleInduction(itemsets, trans, confidence = 0.4)
inspect(sort(rules, by = "support")[1:10])

##      lhs               rhs     support    confidence lift     itemset
## [1]  {tfue}         => {ninja} 0.16441883 0.6570956  2.250774 44     
## [2]  {ninja}        => {tfue}  0.16441883 0.5631900  2.250774 44     
## [3]  {nickmercs}    => {ninja} 0.08214639 0.7740075  2.651237 40     
## [4]  {nickmercs}    => {tfue}  0.08124606 0.7655243  3.059398 41     
## [5]  {dakotaz}      => {ninja} 0.07685244 0.7348485  2.517104 37     
## [6]  {fortnite}     => {tfue}  0.07415144 0.7333927  2.930985 27     
## [7]  {symfuhny}     => {tfue}  0.07325110 0.7812560  3.122269 34     
## [8]  {fortnite}     => {ninja} 0.07206266 0.7127337  2.441354 26     
## [9]  {timthetatman} => {ninja} 0.07080220 0.7395148  2.533088 30     
## [10] {symfuhny}     => {ninja} 0.07056811 0.7526407  2.578048 33

inspect(sort(rules, by = "confidence")[1:10])

##      lhs                     rhs     support    confidence lift     itemset
## [1]  {drlupo, tfue}       => {ninja} 0.04013685 0.8937450  3.061378 17     
## [2]  {tfue, timthetatman} => {ninja} 0.05016656 0.8847253  3.030483 29     
## [3]  {aydan, ninja}       => {tfue}  0.04211758 0.8634182  3.450628 14     
## [4]  {chap}               => {tfue}  0.04582696 0.8416005  3.363434 11     
## [5]  {cloakzy}            => {ninja} 0.04256775 0.8350406  2.860296  3     
## [6]  {cloakzy}            => {tfue}  0.04256775 0.8350406  3.337218  4     
## [7]  {ninja, symfuhny}    => {tfue}  0.05866571 0.8313345  3.322407 32     
## [8]  {nickmercs, tfue}    => {ninja} 0.06716485 0.8266844  2.831673 39     
## [9]  {drlupo}             => {ninja} 0.05587467 0.8206824  2.811114 18     
## [10] {tfue, tsm_myth}     => {ninja} 0.04208157 0.8200000  2.808777 20

inspect(sort(rules, by = "lift")[1:10])

##      lhs                  rhs          support    confidence lift     itemset
## [1]  {sodapoppin}      => {asmongold}  0.05371387 0.5127191  6.340204  7     
## [2]  {asmongold}       => {sodapoppin} 0.05371387 0.6642173  6.340204  7     
## [3]  {symfuhny}        => {nickmercs}  0.04179346 0.4457461  4.199951 35     
## [4]  {ninja, tfue}     => {nickmercs}  0.06716485 0.4084985  3.848993 39     
## [5]  {fortnite}        => {nickmercs}  0.04116323 0.4071238  3.836040 28     
## [6]  {aydan, ninja}    => {tfue}       0.04211758 0.8634182  3.450628 14     
## [7]  {chap}            => {tfue}       0.04582696 0.8416005  3.363434 11     
## [8]  {cloakzy}         => {tfue}       0.04256775 0.8350406  3.337218  4     
## [9]  {ninja, symfuhny} => {tfue}       0.05866571 0.8313345  3.322407 32     
## [10] {drdisrespect}    => {shroud}     0.04150536 0.6145028  3.316786  2

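Comparing these rules with the Apriori output, the top-10 lists by support, confidence, and lift contain the same rules with the same metric values; only the order of tied rules differs. This is expected: for the same minimum support and confidence, ECLAT and Apriori enumerate the same frequent itemsets and differ only in how they search for them.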
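
The difference in search strategy can be sketched in a few lines. ECLAT works on a vertical layout, in which each item stores the IDs of the transactions that contain it, and obtains the support of an itemset by intersecting these ID lists. The example below is a simplified illustration on made-up transactions, not the `arules` implementation.

```r
# Toy transactions (made-up data), one character vector per user
transactions <- list(
  c("ninja", "tfue", "nickmercs"),
  c("ninja", "tfue"),
  c("ninja", "shroud"),
  c("tfue", "nickmercs")
)

# Vertical layout: for each streamer, the IDs of transactions containing it
items <- sort(unique(unlist(transactions)))
tid_lists <- lapply(items, function(item) {
  which(vapply(transactions, function(t) item %in% t, logical(1)))
})
names(tid_lists) <- items

# Support count of {ninja, tfue} = size of the intersection of the ID lists
tids_both <- intersect(tid_lists[["ninja"]], tid_lists[["tfue"]])
eclat_count <- length(tids_both)

# The horizontal count, as Apriori would obtain it by scanning transactions
scan_count <- sum(vapply(transactions,
                         function(t) all(c("ninja", "tfue") %in% t),
                         logical(1)))

tids_both    # transactions 1 and 2
eclat_count  # same value as scan_count
```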

Conclusion

This analysis provides clear insights into user behavior on Twitch by addressing the main research questions:

Which streamers are most frequently co-watched by the same users?
Both Apriori and Eclat algorithms revealed that popular streamers like Tfue and Ninja are frequently watched together. Niche pairs, such as Asmongold and Sodapoppin, were also identified, highlighting smaller communities with shared audiences.
Are there identifiable clusters of streamers that share a common audience?
Visualizations of the rules and frequent itemsets revealed distinct clusters of streamers. For example, a large cluster centers around shooter game streamers, while other clusters represent variety content or niche communities, demonstrating patterns of audience overlap.
Do the extracted rules suggest potential streamer recommendation strategies for users?
The rules with high confidence and lift indicate strong associations that could inform recommendation strategies. Users who watch certain streamers are likely to watch specific others, suggesting opportunities to improve personalized content suggestions based on observed co-interaction patterns.
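
As a rough illustration of how such rules could feed a recommender, the sketch below stores a few rules from the confidence-sorted table in a plain data frame and looks up suggestions for a user. The `recommend` helper and the data frame layout are hypothetical and chosen for this example only; a production system would need to handle far more than this.

```r
# A few rules copied from the confidence-sorted table in this report
rules_df <- data.frame(
  lhs = c("drlupo,tfue", "aydan,ninja", "chap", "drlupo"),
  rhs = c("ninja", "tfue", "tfue", "ninja"),
  confidence = c(0.8937450, 0.8634182, 0.8416005, 0.8206824),
  stringsAsFactors = FALSE
)

# Hypothetical helper: suggest consequents of rules whose antecedent is fully
# contained in the watched set, skipping streamers the user already watches
recommend <- function(watched, rules) {
  applies <- vapply(strsplit(rules$lhs, ","),
                    function(lhs) all(lhs %in% watched),
                    logical(1))
  hits <- rules[applies & !(rules$rhs %in% watched), ]
  hits <- hits[order(-hits$confidence), ]
  # Keep only the strongest rule for each suggested streamer
  hits[!duplicated(hits$rhs), c("rhs", "confidence")]
}

recommend(c("drlupo", "tfue"), rules_df)  # suggests ninja
recommend(c("chap", "tfue"), rules_df)    # no suggestion: tfue already watched
```

Ranking by confidence favors accuracy; ranking by lift instead would favor less obvious suggestions such as niche pairs.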

Overall, Apriori and Eclat produced the same rules, which is expected for identical thresholds and serves as a consistency check on the analysis. This analysis shows that association rule mining can effectively discover viewing trends and provide insights for recommendation systems on streaming platforms.

Twitch Users’ Preferences Analysis

Osman Aliyev

2026-01-16