1 Introduction

1.1 The problem

In this analysis we use association rules to assess which NBA players account for the best offences, defences and overall net rating of a team. We focus on the different lineups of each NBA team, where a lineup is the five players who are on the floor together at one time. Using association rules and three indicators of offensive and defensive strength, we look for players who affect their teams positively or negatively regardless of the lineup they play in (that is, regardless of which teammates play alongside them). To do this we first have to calculate three indicators:

Is a lineup net positive? \(\Rightarrow\) Is the lineup's net rating positive or negative? Net rating is most commonly defined as the difference between points scored and points conceded per 100 possessions. To simplify this study we only check the sign of the raw point difference (not scaled per 100 possessions), since we only need to know whether a lineup is positive or negative, not by how much.
Offensive rating \(\Rightarrow\) We split lineups into “Great”, “Okay” and “Bad” offences based on points scored per 100 possessions. We cut the distribution into three roughly equal parts (at the 33rd and 66th percentiles) and check which part each lineup falls into.
Defensive rating \(\Rightarrow\) We split lineups into “Great”, “Okay” and “Bad” defences based on points conceded per 100 possessions. We cut the distribution into three roughly equal parts (at the 33rd and 66th percentiles) and check which part each lineup falls into.

We look at the second half of the 2021/22 season (only games played in 2022), as this is the most recent data available on Kaggle: https://www.kaggle.com/datasets/xocelyk/nba-pbp.

1.2 The dataset

We start with play-by-play data, meaning that the initial unit of observation is a single play within a possession (so there can be multiple observations per possession), for example a personal foul or a three-point attempt.

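library(dplyr)   # %>%, arrange(), group_by(), mutate(), summarise()
library(tidyr)   # separate()
library(arules)  # read.transactions(), eclat(), apriori()

data_hoop <- read.csv("nba_plays_2022_now.csv")
# kable(head(data))
head(data_hoop)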

##   PlayNum       GameID       Date Period Possession    Time AwayName AwayScore
## 1       0 202202150MIA 02/15/2022      1          0 12:00.0      DAL         0
## 2       1 202202150MIA 02/15/2022      1          1 11:43.0      DAL         0
## 3       2 202202150MIA 02/15/2022      1          2 11:18.0      DAL         2
## 4       3 202202150MIA 02/15/2022      1          3 10:56.0      DAL         2
## 5       4 202202150MIA 02/15/2022      1          4 10:38.0      DAL         2
## 6       5 202202150MIA 02/15/2022      1          5 10:34.0      DAL         2
##                                                          AwayEvent HomeName
## 1 Jump ball: D. Powell vs. B. Adebayo (P. Tucker gains possession)      MIA
## 2                                                                       MIA
## 3   D. Powell makes 2-pt jump shot from 19 ft (assist by J. Green)      MIA
## 4                                                                       MIA
## 5                           J. Brunson misses 2-pt layup from 6 ft      MIA
## 6                                                                       MIA
##   HomeScore                                                        HomeEvent
## 1         0 Jump ball: D. Powell vs. B. Adebayo (P. Tucker gains possession)
## 2         2  J. Butler makes 2-pt jump shot from 8 ft (assist by B. Adebayo)
## 3         2                                                                 
## 4         4    B. Adebayo makes 2-pt layup from 1 ft (assist by D. Robinson)
## 5         4                                                                 
## 6         4                                    Defensive rebound by K. Lowry
##   AwayIn AwayOut HomeIn HomeOut              ActivePlayers        A1        A2
## 1                                                       [] greenjo02 finnedo01
## 2                               ['butleji01', 'adebaba01'] greenjo02 finnedo01
## 3                               ['poweldw01', 'greenjo02'] greenjo02 finnedo01
## 4                               ['adebaba01', 'robindu01'] greenjo02 finnedo01
## 5                                            ['brunsja01'] greenjo02 finnedo01
## 6                                            ['lowryky01'] greenjo02 finnedo01
##          A3        A4        A5        H1        H2        H3        H4
## 1 brunsja01 doncilu01 poweldw01 robindu01 tuckepj01 adebaba01 lowryky01
## 2 brunsja01 doncilu01 poweldw01 robindu01 tuckepj01 adebaba01 lowryky01
## 3 brunsja01 doncilu01 poweldw01 robindu01 tuckepj01 adebaba01 lowryky01
## 4 brunsja01 doncilu01 poweldw01 robindu01 tuckepj01 adebaba01 lowryky01
## 5 brunsja01 doncilu01 poweldw01 robindu01 tuckepj01 adebaba01 lowryky01
## 6 brunsja01 doncilu01 poweldw01 robindu01 tuckepj01 adebaba01 lowryky01
##          H5 Date_Object
## 1 butleji01  2022-02-15
## 2 butleji01  2022-02-15
## 3 butleji01  2022-02-15
## 4 butleji01  2022-02-15
## 5 butleji01  2022-02-15
## 6 butleji01  2022-02-15

1.3 Preprocessing

For the purpose of this analysis we need to calculate points scored on each possession for offence and defence:

# keep only the necessary columns
nba_filtered <- data_hoop[, c(1:5, 8, 11, 18:28)]

# points scored on each play = change in the running score within a game
nba_stats <- nba_filtered %>%
  arrange(GameID, PlayNum) %>%
  group_by(GameID) %>%
  mutate(
    HomePoints = HomeScore - lag(HomeScore, default = 0),
    AwayPoints = AwayScore - lag(AwayScore, default = 0)
  ) %>%
  ungroup()
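
To make the differencing step concrete, here is a minimal base-R sketch on a made-up running score (the vector `home_score` is hypothetical, not taken from the dataset). Subtracting the previous value, with 0 assumed before the first play, turns the cumulative scoreboard into points per play, which is the same idea as `HomeScore - lag(HomeScore, default = 0)` applied within one game.

```r
# Hypothetical cumulative home score over six consecutive plays of one game
home_score <- c(0, 2, 2, 4, 4, 7)

# Points scored on each play: current score minus previous score,
# assuming the score before the first play is 0
home_points <- diff(c(0, home_score))

print(home_points)
# [1] 0 2 0 2 0 3
```

The per-play points add back up to the final score, which is a quick sanity check worth running on real games as well.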

The next step is to calculate points scored and lost for each home and away lineup:

# create columns for away and home lineups - sorting to ensure order doesn't duplicate lineups
nba_stats$HomeLineup <- apply(nba_stats[, c("H1", "H2", "H3", "H4", "H5")], 1, function(x) paste(sort(x), collapse = ", "))
nba_stats$AwayLineup <- apply(nba_stats[, c("A1", "A2", "A3", "A4", "A5")], 1, function(x) paste(sort(x), collapse = ", "))

# for each lineup we calculate how many points they scored each possession and lost each possession
home_games <- nba_stats %>%
  select(lineup = HomeLineup, points_for = HomePoints, points_against = AwayPoints, game_id = GameID, possession = Possession)
away_games <- nba_stats %>%
  select(lineup = AwayLineup, points_for = AwayPoints, points_against = HomePoints, game_id = GameID, possession = Possession)
all_games <- bind_rows(home_games, away_games)
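
The sorting inside `paste(sort(x), collapse = ", ")` matters because the same five players can be listed in a different slot order from row to row. The sketch below illustrates this with the Miami player IDs from the sample output; the two orderings and the helper name `lineup_key` are assumptions made for the example.

```r
# Two hypothetical rows listing the same five players in a different slot order
row_a <- c("butleji01", "adebaba01", "lowryky01", "tuckepj01", "robindu01")
row_b <- c("robindu01", "tuckepj01", "adebaba01", "butleji01", "lowryky01")

# Hypothetical helper: sort the IDs first so the key does not depend on slot order
lineup_key <- function(players) paste(sort(players), collapse = ", ")

key_a <- lineup_key(row_a)
key_b <- lineup_key(row_b)

print(key_a)
# [1] "adebaba01, butleji01, lowryky01, robindu01, tuckepj01"
print(key_a == key_b)
# [1] TRUE
```

Without the sort, the two rows would produce different strings and be counted as two separate lineups.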

Now we need to aggregate the data, as right now each lineup is repeated for every play it has played together:

# group by lineup (one row per lineup) and sum points scored and conceded; their difference is the lineup's +/-
# points scored and conceded per 100 possessions give the offensive and defensive ratings, which are categorised below
lineup_summary <- all_games %>%
  group_by(lineup) %>%
  summarise(
    total_points_for = sum(points_for, na.rm = TRUE),
    total_points_against = sum(points_against, na.rm = TRUE),
    total_possessions = n_distinct(paste(game_id, possession)) 
  ) %>%
  filter(total_possessions > 50) %>% # get rid of very rare lineups which could have unbalanced statistics
  mutate(
    plus_minus = total_points_for - total_points_against,
    is_net_positive = ifelse(plus_minus > 0, "Net Positive", "Net Negative"),
    off_rating = (total_points_for / total_possessions) * 100,
    def_rating = (total_points_against / total_possessions) * 100,
  )

# we need relative values to split offences and defences into bad, okay and great
off_quants <- quantile(lineup_summary$off_rating, probs = c(0.33, 0.66))
def_quants <- quantile(lineup_summary$def_rating, probs = c(0.33, 0.66))

final_dataset <- lineup_summary %>%
  mutate(
    offence_level = case_when(
      off_rating >= off_quants[2] ~ "Great O",
      off_rating <= off_quants[1] ~ "Bad O",
      TRUE ~ "Okay O"
    ),
    defence_level = case_when(
      def_rating <= def_quants[1] ~ "Great D",
      def_rating >= def_quants[2] ~ "Bad D",
      TRUE ~ "Okay D"
    )
  )
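
As a small illustration of the rating and labelling logic, the base-R sketch below applies the same per-100-possessions formula and the same 0.33 / 0.66 quantile cut-offs to six made-up lineups. The data frame `toy` and all of its numbers are hypothetical.

```r
# Hypothetical lineups: total points scored and possessions played
toy <- data.frame(
  lineup = c("A", "B", "C", "D", "E", "F"),
  points_for = c(54, 75, 90, 110, 66, 130),
  possessions = c(60, 70, 80, 100, 55, 100)
)

# Offensive rating: points scored per 100 possessions
toy$off_rating <- toy$points_for / toy$possessions * 100

# Cut-offs at the 33rd and 66th percentiles, as in the code above
cuts <- unname(quantile(toy$off_rating, probs = c(0.33, 0.66)))

# Top part is "Great O", bottom part is "Bad O", the rest is "Okay O"
toy$offence_level <- ifelse(toy$off_rating >= cuts[2], "Great O",
                     ifelse(toy$off_rating <= cuts[1], "Bad O", "Okay O"))

print(toy[, c("lineup", "off_rating", "offence_level")])
```

For defence the comparison is reversed, because a lower number of points conceded per 100 possessions is better.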

For cleaner results, only lineups that played more than 50 possessions together were kept in the dataset. Finally, we split the lineup column into one variable per player and keep only the columns with nominal values for the association-rule analysis:

final_dataset <- final_dataset %>%
  separate(lineup,
           into = c("player1", "player2", "player3", "player4", "player5"),
           sep = ",\\s*")

final_dataset <- final_dataset[c("player1", "player2", "player3", "player4", "player5", "is_net_positive", "offence_level", "defence_level")]
final_dataset_net <- final_dataset[c("player1", "player2", "player3", "player4", "player5", "is_net_positive")]
final_dataset_offence <- final_dataset[c("player1", "player2", "player3", "player4", "player5", "offence_level")]
final_dataset_defence <- final_dataset[c("player1", "player2", "player3", "player4", "player5", "defence_level")]
final_dataset_no_labels <- final_dataset[c("player1", "player2", "player3", "player4", "player5")]
# write.csv(final_dataset_net, file="lineups_data_net.csv", row.names = FALSE)
# write.csv(final_dataset_offence, file="lineups_data_offence.csv", row.names = FALSE)
# write.csv(final_dataset_defence, file="lineups_data_defence.csv", row.names = FALSE)
# write.csv(final_dataset, file="lineups_data.csv", row.names = FALSE)
# write.csv(final_dataset_no_labels, file="lineups_data_no_labels.csv", row.names = FALSE)

2 Analysis with association rules

2.1 First look

After saving the newly created data to CSV files, we can load it back, this time not as a data.frame but as transactions:

trans1<-read.transactions("lineups_data.csv", format="basket", sep=",", skip=1)
trans1_nolab<-read.transactions("lineups_data_no_labels.csv", format="basket", sep=",", skip=1)
trans1_net <- read.transactions("lineups_data_net.csv", format="basket", sep=",", skip=1)
trans1_offence <- read.transactions("lineups_data_offence.csv", format="basket", sep=",", skip=1)
trans1_defence <- read.transactions("lineups_data_defence.csv", format="basket", sep=",", skip=1)

inspect(head(trans1))

##     items          
## [1] {achiupr01,    
##      anunoog01,    
##      bantoda01,    
##      barnesc01,    
##      Net Positive, 
##      Okay D,       
##      Okay O,       
##      siakapa01}    
## [2] {achiupr01,    
##      anunoog01,    
##      barnesc01,    
##      bouchch01,    
##      Great D,      
##      Net Positive, 
##      Okay O,       
##      siakapa01}    
## [3] {achiupr01,    
##      anunoog01,    
##      barnesc01,    
##      bouchch01,    
##      Great D,      
##      Great O,      
##      Net Positive, 
##      vanvlfr01}    
## [4] {achiupr01,    
##      anunoog01,    
##      Bad D,        
##      Bad O,        
##      barnesc01,    
##      Net Negative, 
##      siakapa01,    
##      trentga02}    
## [5] {achiupr01,    
##      anunoog01,    
##      barnesc01,    
##      Great D,      
##      Net Positive, 
##      Okay O,       
##      siakapa01,    
##      vanvlfr01}    
## [6] {achiupr01,    
##      anunoog01,    
##      Bad O,        
##      bouchch01,    
##      champju01,    
##      Net Negative, 
##      Okay D,       
##      vanvlfr01}

length(trans1)

## [1] 1461

We can also see who (and what) appears in the largest number of different lineups:

round(head(sort(itemFrequency(trans1), decreasing = TRUE), 20),3)

## Net Positive Net Negative        Bad D      Great O        Bad O      Great D 
##        0.516        0.484        0.341        0.340        0.330        0.330 
##       Okay O       Okay D    poolejo01    wiggian01    halibty01    siakapa01 
##        0.330        0.329        0.031        0.031        0.031        0.031 
##    hieldbu01    derozde01    herroty01    maxeyty01    harrito02    tatumja01 
##        0.029        0.029        0.028        0.028        0.027        0.027 
##    brownja02    whiteco01 
##        0.026        0.025

round(head(sort(itemFrequency(trans1, type="absolute"), decreasing = TRUE), 20),3)

## Net Positive Net Negative        Bad D      Great O        Bad O      Great D 
##          754          707          498          497          482          482 
##       Okay O       Okay D    poolejo01    wiggian01    halibty01    siakapa01 
##          482          481           46           46           45           45 
##    hieldbu01    derozde01    herroty01    maxeyty01    harrito02    tatumja01 
##           43           42           41           41           39           39 
##    brownja02    whiteco01 
##           38           37

First, we can see that offence slightly dominates defence in our dataset, as around 51.6% of lineups are net positive. Additionally, bad defences and great offences each appear slightly more often than one third of the time. As for players, the top spot is practically a four-way tie: Jordan Poole and Andrew Wiggins (46 lineups each), both from the Golden State Warriors (who would go on to win the NBA championship that season), followed by Tyrese Haliburton from the Indiana Pacers and Pascal Siakam from the Toronto Raptors (45 lineups each; Siakam is currently playing in Indiana alongside Haliburton!). All four were, and still are, crucial players for their respective teams. We can now add that they were the most versatile players in terms of their role on the floor, as they appear in the largest number of different lineups. Now let us see which two players were most commonly on the floor together, but first let us check how many unique players appeared on the floor during that time (in lineups with more than 50 possessions).

Unique players:

nitems(trans1_nolab)

## [1] 474

Since we have 474 players, a cross table would be too big to display, so we just list the most common duos:

pairs_eclat <- eclat(trans1_nolab, 
                     parameter = list(supp = 0.001, minlen = 2, maxlen = 2))

inspect(head(sort(pairs_eclat, by = "count"), 10))

##      items                  support    count
## [1]  {halibty01, hieldbu01} 0.02464066 36   
## [2]  {poolejo01, wiggian01} 0.02258727 33   
## [3]  {harrito02, maxeyty01} 0.02053388 30   
## [4]  {siakapa01, trentga02} 0.01916496 28   
## [5]  {barnesc01, siakapa01} 0.01916496 28   
## [6]  {embiijo01, maxeyty01} 0.01916496 28   
## [7]  {derozde01, whiteco01} 0.01916496 28   
## [8]  {curryst01, wiggian01} 0.01848049 27   
## [9]  {lavinza01, vucevni01} 0.01711157 25   
## [10] {looneke01, wiggian01} 0.01711157 25

Of course, the most common pairs are not necessarily those who spent the most time or possessions playing together, but those who were surrounded by the largest number of different teammates while playing together. Nonetheless, we now know that it is the stars (Tyrese Haliburton, Tyrese Maxey, Pascal Siakam) alongside a good role player (Buddy Hield, Tobias Harris, Gary Trent Jr.) who most commonly appear together in different lineups. We can also see the support of each pair: the number of baskets (lineups) in which the pair appears, relative to the total number of transactions (the whole dataset). The most common pair appears in almost 2.5% of the lineups in our data.
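
As a quick check of this definition, the support of the top pair can be reproduced from its count in the table and the 1461 transactions reported by length(trans1):

\[\text{Support}(\{\text{halibty01}, \text{hieldbu01}\}) = \frac{36}{1461} \approx 0.0246\]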

2.2 Basic measures

We have already covered support, now it is time to discuss confidence, expected confidence and lift:

\[\text{Confidence} = \frac{\text{Number of transactions with both } A \text{ and } B}{\text{Total number of transactions with } A} = \frac{P(A \cap B)}{P(A)}\]

\[\text{ExpectedConfidence} = \frac{\text{Number of transactions with } B}{\text{Total number of transactions}} = P(B)\]

\[\text{Lift} = \frac{\text{Confidence}}{\text{Expected Confidence}} = \frac{P(A \cap B)}{P(A) \cdot P(B)}\]

To see an example of these measures, we first need to create rules for our lineups, which requires finding sets of players who commonly appear on the floor together. There are two algorithms we can use. eclat() does not create rules: it only mines frequent itemsets, so we obtain the frequent sets and the measures computed for them (e.g. support), and rules have to be induced afterwards with ruleInduction(). apriori() creates frequent itemsets and then generates rules from them. Let us see the rules for our lineups when we restrict the results to only two players:

sets1 <- eclat(trans1_nolab, parameter = list(support = 0.001, minlen = 2, maxlen = 2))
rules1 <- ruleInduction(sets1, trans1_nolab)
ins1 <- inspect(sort(rules1, by = "confidence", decreasing = TRUE))

head(ins1, 10)

##              lhs            rhs     support confidence      lift itemset
## [1]  {tuckera01} => {nworajo01} 0.001368925          1 162.33333       1
## [2]  {tuckera01} => {antetth01} 0.001368925          1 292.20000       2
## [3]   {hoodro01} => {coffeam01} 0.001368925          1  48.70000       3
## [4]   {hoodro01} => {harteis01} 0.001368925          1  73.05000       4
## [5]   {hoodro01} => {bostobr01} 0.001368925          1 132.81818       5
## [6]   {dukeda01} => {thomaca02} 0.001368925          1 182.62500       6
## [7]   {dukeda01} => {sharpda01} 0.001368925          1 208.71429       7
## [8]  {dowtije01} => {wagnemo01} 0.001368925          1 121.75000       8
## [9]  {fraziti01} => {harriga01} 0.001368925          1  58.44000       9
## [10] {fraziti01} => {wagnefr01} 0.001368925          1  69.57143      10

Two very important conclusions follow. First, we see that most rules with high confidence (A -> B) consist of bench or role players, who sometimes appear only in the company of particular other role players or stars, which is consistent with intuition. Star players spend almost the whole game on the floor in many different lineups, whereas role and bench players appear only in specific situations, for example complementing stars with their particular skills, or during so-called “garbage time”, when the outcome of the game is already decided. Of course, these must be teams that often win or lose by big margins, as we are only left with lineups that have played more than 50 possessions together. In those cases the bench players appear alongside each other. The second important conclusion is that lift will not be useful to us in this analysis. Our dataset contains all lineups of all teams, so the expected confidence of a player is computed over the whole league, where most lineups belong to other teams, and the lift of any pair of teammates is therefore very large. For us lift serves mostly as an indicator of whether two players are on the same roster.
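
The first rule in the table can be worked through by hand to show why lift becomes so large here. The sketch below uses the reported support and confidence together with the 1461 transactions. The count for nworajo01 (`n_b`) is not shown in the output and is an assumption inferred from the reported lift, and the team-level baseline of 50 lineups is purely hypothetical.

```r
n_lineups <- 1461   # length(trans1), as reported earlier
n_a_and_b <- 2      # lineups with both players: support 0.001368925 * 1461
n_a <- 2            # lineups with tuckera01: confidence 1 means n_a equals n_a_and_b
n_b <- 9            # ASSUMPTION: lineups with nworajo01, inferred from the reported lift

confidence <- n_a_and_b / n_a
expected_confidence <- n_b / n_lineups   # league-wide share, well under 1%
lift <- confidence / expected_confidence
print(lift)
# [1] 162.3333

# HYPOTHETICAL: with only the 50 lineups of one team as the baseline,
# the same pair would get a far smaller lift
team_lineups <- 50
lift_team <- confidence / (n_b / team_lineups)
print(round(lift_team, 2))
# [1] 5.56
```

The denominator is driven almost entirely by the size of the league-wide dataset, which is why high lift here says little beyond "these two players share a roster".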

2.3 Closed and maximal lineups

In this section we look at closed and maximal sets. A closed set is a frequent set that has no proper superset with the same support (a superset can never have higher support). In our case these are groups of players that have played with at least two different sets of additional teammates, so that no single extra player accompanies them every time. A maximal set is a frequent set that has no frequent proper superset. For us these would simply be all the five-man lineups, but we will work around this to actually learn something new.
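
The difference between the two definitions is easiest to see on a tiny example. The sketch below is a brute-force base-R illustration on five made-up "lineups" of players a to f with a minimum count of 2. It is not how arules computes these sets, and every name in it is hypothetical.

```r
# Hypothetical toy transactions
toy_lineups <- list(
  c("a", "b", "c"),
  c("a", "b", "d"),
  c("a", "b", "e"),
  c("c", "d"),
  c("a", "b", "c", "f")
)
min_count <- 2

items <- sort(unique(unlist(toy_lineups)))

# Enumerate every non-empty itemset (fine for a toy, infeasible for real data)
candidates <- unlist(
  lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
  recursive = FALSE
)

# Count the transactions that contain all items of a set
count_of <- function(set) {
  sum(vapply(toy_lineups, function(tr) all(set %in% tr), logical(1)))
}
counts <- vapply(candidates, count_of, integer(1))

frequent <- candidates[counts >= min_count]
freq_counts <- counts[counts >= min_count]

# Does frequent set i have a frequent proper superset (optionally with equal count)?
has_superset <- function(i, same_count_only) {
  for (j in seq_along(frequent)) {
    bigger <- length(frequent[[j]]) > length(frequent[[i]]) &&
      all(frequent[[i]] %in% frequent[[j]])
    if (bigger && (!same_count_only || freq_counts[j] == freq_counts[i])) return(TRUE)
  }
  FALSE
}

# Closed: no proper superset with the same count
closed <- !vapply(seq_along(frequent), has_superset, logical(1), same_count_only = TRUE)
# Maximal: no frequent proper superset at all
maximal <- !vapply(seq_along(frequent), has_superset, logical(1), same_count_only = FALSE)

label <- function(sets) vapply(sets, paste, character(1), collapse = "")
closed_sets <- label(frequent[closed])
maximal_sets <- label(frequent[maximal])

print(closed_sets)
# [1] "c"   "d"   "ab"  "abc"
print(maximal_sets)
# [1] "d"   "abc"
```

Here the pair ab is closed but not maximal: it appears four times, more often than any of its supersets, yet abc is still frequent. Every maximal set is closed, but not the other way round.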

2.3.1 Closed lineups

First let us have a look at closed sets:

closed_sets1 <- apriori(trans1_nolab, parameter = list(target = "closed frequent itemsets", 
                                                    supp = 0.0001,
                                                    minlen = 2))
closed_ins1 <- inspect(head(sort(closed_sets1, by = "support"), 10))

closed_sets_3man <- subset(closed_sets1, size(closed_sets1) == 3)
closed_3man_ins <- inspect(head(sort(closed_sets_3man, by = "support"), 10))

head(closed_ins1, 10)

##                       items    support count
## [1]  {halibty01, hieldbu01} 0.02464066    36
## [2]  {poolejo01, wiggian01} 0.02258727    33
## [3]  {harrito02, maxeyty01} 0.02053388    30
## [4]  {siakapa01, trentga02} 0.01916496    28
## [5]  {barnesc01, siakapa01} 0.01916496    28
## [6]  {embiijo01, maxeyty01} 0.01916496    28
## [7]  {derozde01, whiteco01} 0.01916496    28
## [8]  {curryst01, wiggian01} 0.01848049    27
## [9]  {lavinza01, vucevni01} 0.01711157    25
## [10] {looneke01, wiggian01} 0.01711157    25

head(closed_3man_ins, 10)

##                                  items     support count
## [1]  {brissos01, halibty01, hieldbu01} 0.013004791    19
## [2]  {embiijo01, harrito02, maxeyty01} 0.013004791    19
## [3]  {derozde01, lavinza01, vucevni01} 0.010951403    16
## [4]  {curryst01, looneke01, wiggian01} 0.010951403    16
## [5]  {looneke01, poolejo01, wiggian01} 0.010951403    16
## [6]  {barnesc01, siakapa01, trentga02} 0.010951403    16
## [7]  {derozde01, dosunay01, vucevni01} 0.010951403    16
## [8]  {greendr01, poolejo01, wiggian01} 0.010266940    15
## [9]  {embiijo01, maxeyty01, niangge01} 0.010266940    15
## [10] {holidjr01, middlkh01, portibo01} 0.009582478    14

The two- and three-player groups all consist of key starters (Haliburton, Poole, Wiggins, etc.) or rotation players (Trent Jr., Hield) who appear with at least two different teammates or groups of teammates. In effect, the algorithm removed redundant subsets: pairs or trios that never play apart, most likely “garbage time” groups who always come on together once the score is decided and play alongside each other until the game ends. That is why crucial duos and trios, who stay on the floor with different teammates, still appear. The four-man groups are those that are still crucial, but also flexible enough to keep playing well together when only the fifth player changes:

closed_sets_4man <- subset(closed_sets1, size(closed_sets1) == 4)
closed_4man_ins <- inspect(sort(closed_sets_4man, by = "support"))

head(closed_4man_ins, 10)

##                                             items     support count
## [1]  {brissos01, halibty01, hieldbu01, smithja04} 0.004791239     7
## [2]  {derozde01, dosunay01, lavinza01, vucevni01} 0.004791239     7
## [3]  {derozde01, lavinza01, vucevni01, whiteco01} 0.004791239     7
## [4]  {curryst01, looneke01, thompkl01, wiggian01} 0.004791239     7
## [5]  {curryst01, looneke01, poolejo01, wiggian01} 0.004791239     7
## [6]  {achiupr01, barnesc01, siakapa01, trentga02} 0.004791239     7
## [7]  {derozde01, dosunay01, vucevni01, whiteco01} 0.004791239     7
## [8]  {barnesc01, birchkh01, siakapa01, trentga02} 0.004106776     6
## [9]  {bridgmi02, martico01, plumlma01, roziete01} 0.004106776     6
## [10] {bartowi01, gordoaa01, jokicni01, morrimo01} 0.004106776     6

2.3.2 Maximal lineups

As said before, for this analysis we need to exclude the five-man lineups, as each of them appears only once and they would all become our maximal sets. Requiring a count of at least two excludes them, so we can run the analysis on the remaining sets and see which player groups are the most persistent:

supp_threshold <- 2 / length(trans1_nolab)

max_sets1 <- apriori(trans1_nolab, 
                    parameter = list(target = "maximally frequent itemsets", 
                                     supp = supp_threshold, 
                                     minlen = 2))

max_sets1_ins1 <- inspect(sort(max_sets1, by = "count"))

max_sets_3man <- subset(max_sets1, size(max_sets1) == 3)
max_3man_ins <- inspect(sort(max_sets_3man, by = "support"))

max_sets_2man <- subset(max_sets1, size(max_sets1) == 2)
max_2man_ins <- inspect(sort(max_sets_2man, by = "support"))

head(max_sets1_ins1, 10)

##                                             items     support count
## [1]  {brissos01, halibty01, hieldbu01, smithja04} 0.004791239     7
## [2]  {derozde01, dosunay01, lavinza01, vucevni01} 0.004791239     7
## [3]  {derozde01, lavinza01, vucevni01, whiteco01} 0.004791239     7
## [4]  {curryst01, looneke01, thompkl01, wiggian01} 0.004791239     7
## [5]  {curryst01, looneke01, poolejo01, wiggian01} 0.004791239     7
## [6]  {achiupr01, barnesc01, siakapa01, trentga02} 0.004791239     7
## [7]  {derozde01, dosunay01, vucevni01, whiteco01} 0.004791239     7
## [8]  {bridgmi02, martico01, plumlma01, roziete01} 0.004106776     6
## [9]  {barnesc01, birchkh01, siakapa01, trentga02} 0.004106776     6
## [10] {bartowi01, gordoaa01, jokicni01, morrimo01} 0.004106776     6

For the four-man groups, we see consistent units being used with at least two different accompanying players. Every player in the group must be important enough to complement at least two different teammates. Of course, at the end of the table we will most likely find groups who appeared together only twice, which might be a coincidence given the low count. For higher counts, however, we can say with more confidence that the group is important to its team and plays together while the remaining pieces rotate.

head(max_3man_ins, 10)

##                                  items     support count
## [1]  {aldrila01, brownbr01, millspa02} 0.002053388     3
## [2]  {antetth01, ibakase01, nworajo01} 0.002053388     3
## [3]  {ellebcj01, smithde03, watfotr01} 0.002053388     3
## [4]  {looneke01, moodymo01, poolejo01} 0.002053388     3
## [5]  {clarkjo01, forretr01, pascher01} 0.002053388     3
## [6]   {pokusal01, robyis01, wiggiaa01} 0.002053388     3
## [7]  {bambamo01, hamptrj01, wagnemo01} 0.002053388     3
## [8]  {hamptrj01, okekech01, wagnemo01} 0.002053388     3
## [9]  {grahade01, marshna01, mccolcj01} 0.002053388     3
## [10] {martica02, oladivi01, robindu01} 0.002053388     3

head(max_2man_ins, 10)

##                       items     support count
## [1]  {dowtije01, wagnemo01} 0.001368925     2
## [2]   {hillma01, thomama02} 0.001368925     2
## [3]  {edwarca01, mcgruro01} 0.001368925     2
## [4]    {leesa01, pickeja01} 0.001368925     2
## [5]  {bouknja01, oubreke01} 0.001368925     2
## [6]  {brownch02, korkmfu01} 0.001368925     2
## [7]  {gabriwe01, westbru01} 0.001368925     2
## [8]  {martico01, thomais02} 0.001368925     2
## [9]  {butleja02, whiteha01} 0.001368925     2
## [10]  {niangge01, reedpa01} 0.001368925     2

The two- and three-man maximal groups consist of (or at least contain one) bench players. A pair that appears together twice or more, but never with the same third teammate, most likely includes one player who is subbed in essentially at random: not because of specific skills, but because there happens to be space for them to enter the game at that moment.

2.4 On court presence

In this section we will finally focus on the indicators of how players influence their team’s performance. By manually locking in the right hand side of our rules (for example choosing to see what makes an offence “Great”) we will be able to analyse the left hand side (the players), to see who influences their team to be great!

2.4.1 Net rating

First let us get rid of extremely rare players:

trans1_netf <- trans1_net[, itemFrequency(trans1_net) > 0.005]
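
It can help to translate the relative threshold into a number of lineups. The short sketch below does this for the 0.005 item-frequency cut-off, using the 1461 transactions reported earlier; the variable names are chosen for the example only.

```r
n_lineups <- 1461        # length(trans1), as reported earlier
min_item_freq <- 0.005   # threshold used in the filter above

# An item is kept only if count / n_lineups > 0.005, i.e. count > 7.305,
# so the smallest count that passes is the next whole number
min_item_count <- floor(min_item_freq * n_lineups) + 1

print(min_item_count)
# [1] 8
```

In other words, a player has to appear in at least 8 different lineups to stay in the filtered transactions.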

Now we can see which players positively impact their team no matter who they are surrounded by:

rules_positive1 <- apriori(trans1_netf, 
                         parameter = list(supp = 0.001, conf = 0.8),
                         appearance = list(default = "lhs", rhs = "Net Positive"),
                         control = list(verbose = FALSE))

rules_positive1_sort <- sort(rules_positive1, by = "count", decreasing = TRUE)
inspect(head(rules_positive1_sort, 20))

##      lhs                                  rhs            support     confidence
## [1]  {antetgi01}                       => {Net Positive} 0.016427105 0.8275862 
## [2]  {brownja02, tatumja01}            => {Net Positive} 0.013004791 0.8260870 
## [3]  {antetgi01, holidjr01}            => {Net Positive} 0.011635866 0.8095238 
## [4]  {tatumja01, williro04}            => {Net Positive} 0.010951403 0.8888889 
## [5]  {horfoal01, tatumja01}            => {Net Positive} 0.010951403 0.8421053 
## [6]  {antetgi01, portibo01}            => {Net Positive} 0.010951403 0.9411765 
## [7]  {tatumja01, whitede01}            => {Net Positive} 0.010266940 0.8333333 
## [8]  {paytoga02, wiggian01}            => {Net Positive} 0.009582478 0.8235294 
## [9]  {antetgi01, middlkh01}            => {Net Positive} 0.008898015 1.0000000 
## [10] {butleji01, strusma01}            => {Net Positive} 0.008898015 0.8125000 
## [11] {bridgmi01, paulch01}             => {Net Positive} 0.008213552 0.8000000 
## [12] {pritcpa01, tatumja01}            => {Net Positive} 0.008213552 0.8000000 
## [13] {adebaba01, vincega01}            => {Net Positive} 0.008213552 0.8000000 
## [14] {porteot01, wiggian01}            => {Net Positive} 0.007529090 0.9166667 
## [15] {brownja02, pritcpa01}            => {Net Positive} 0.007529090 0.9166667 
## [16] {brownja02, tatumja01, williro04} => {Net Positive} 0.007529090 0.9166667 
## [17] {horfoal01, pritcpa01}            => {Net Positive} 0.006844627 1.0000000 
## [18] {horfoal01, whitede01}            => {Net Positive} 0.006844627 0.8333333 
## [19] {paytoga02, poolejo01, wiggian01} => {Net Positive} 0.006844627 0.9090909 
## [20] {brownja02, horfoal01, tatumja01} => {Net Positive} 0.006844627 0.8333333 
##      coverage    lift     count
## [1]  0.019849418 1.603585 24   
## [2]  0.015742642 1.600680 19   
## [3]  0.014373717 1.568587 17   
## [4]  0.012320329 1.722370 16   
## [5]  0.013004791 1.631719 16   
## [6]  0.011635866 1.823685 16   
## [7]  0.012320329 1.614721 15   
## [8]  0.011635866 1.595725 14   
## [9]  0.008898015 1.937666 13   
## [10] 0.010951403 1.574353 13   
## [11] 0.010266940 1.550133 12   
## [12] 0.010266940 1.550133 12   
## [13] 0.010266940 1.550133 12   
## [14] 0.008213552 1.776194 11   
## [15] 0.008213552 1.776194 11   
## [16] 0.008213552 1.776194 11   
## [17] 0.006844627 1.937666 10   
## [18] 0.008213552 1.614721 10   
## [19] 0.007529090 1.761514 10   
## [20] 0.008213552 1.614721 10

one_man <- subset(rules_positive1_sort, size(rules_positive1_sort) == 2)
inspect(sort(one_man, by = "count"))

##     lhs            rhs            support     confidence coverage    lift    
## [1] {antetgi01} => {Net Positive} 0.016427105 0.8275862  0.019849418 1.603585
## [2] {yurtsom01} => {Net Positive} 0.006160164 0.9000000  0.006844627 1.743899
## [3] {nancela02} => {Net Positive} 0.006160164 0.8181818  0.007529090 1.585363
##     count
## [1] 24   
## [2]  9   
## [3]  9

Although he doesn’t manage to win with every lineup, Giannis Antetokounmpo emerges as the most positively impactful player. He appears in 29 different lineups (coverage 0.0198 of 1461) and stays positive in 24 of them, or 83%. Many other groups good enough and frequent enough to make it into this part of the analysis also include Giannis. Other than him, we can see many lineups from the Boston Celtics (Brown, Tatum), Miami Heat (Butler, Adebayo) and Golden State Warriors (Wiggins). One surprise is the absence of Stephen Curry, despite the presence of GSW lineups. He is arguably one of the best players of his generation, yet it is Wiggins who emerges as the most positively impactful player for the Warriors. Additionally, the only other individual players who are net positive in more than 80% of the lineups they appear in are Ömer Yurtseven and Larry Nance Jr., who turn out to be our secret heroes: players who had limited roles on the floor, but still managed to impact the game in a big way.
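
The rule table reports coverage and count rather than the raw number of lineups a player appears in, so it is worth showing how the two relate. The sketch below recovers that number for the {antetgi01} rule from the values in the table and the 1461 transactions; the variable names are chosen for the example only.

```r
n_lineups <- 1461         # length(trans1), as reported earlier
coverage <- 0.019849418   # share of lineups containing antetgi01 (from the table)
count <- 24               # lineups containing antetgi01 that are also Net Positive

# Coverage is a share of all transactions, so multiplying back gives a count
lineups_with_player <- round(coverage * n_lineups)
confidence <- count / lineups_with_player

print(lineups_with_player)
# [1] 29
print(round(confidence, 3))
# [1] 0.828
```

The `count` column is therefore the number of positive lineups, while coverage times the number of transactions gives the number of lineups the left-hand side appears in.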

2.4.2 Best offences

Let us repeat the process from the previous subsection, this time focusing strictly on offensive rating:

trans1_offence_f <- trans1_offence[, itemFrequency(trans1_offence) > 0.005]

rules_goff1 <- apriori(trans1_offence_f, 
                         parameter = list(supp = 0.001, conf = 0.8),
                         appearance = list(default = "lhs", rhs = "Great O"),
                         control = list(verbose = FALSE))

rules_off1_sort <- sort(rules_goff1, by = "count", decreasing = TRUE)
inspect(head(rules_off1_sort, 20))

##      lhs             rhs           support confidence    coverage     lift count
## [1]  {yurtsom01}  => {Great O} 0.006160164  0.9000000 0.006844627 2.645674     9
## [2]  {johnsst04,                                                                
##       reaveau01}  => {Great O} 0.005475702  0.8000000 0.006844627 2.351710     8
## [3]  {martica02,                                                                
##       yurtsom01}  => {Great O} 0.004106776  0.8571429 0.004791239 2.519690     6
## [4]  {hortota01,                                                                
##       johnsst04,                                                                
##       reaveau01}  => {Great O} 0.004106776  1.0000000 0.004106776 2.939638     6
## [5]  {johnske04,                                                                
##       mcderdo01}  => {Great O} 0.003422313  0.8333333 0.004106776 2.449698     5
## [6]  {forretr01,                                                                
##       mitchdo01}  => {Great O} 0.003422313  1.0000000 0.003422313 2.939638     5
## [7]  {mccolcj01,                                                                
##       murphtr02}  => {Great O} 0.003422313  1.0000000 0.003422313 2.939638     5
## [8]  {duranke01,                                                                
##       edwarke02,                                                                
##       irvinky01}  => {Great O} 0.003422313  0.8333333 0.004106776 2.449698     5
## [9]  {adebaba01,                                                                
##       herroty01,                                                                
##       lowryky01,                                                                
##       strusma01}  => {Great O} 0.003422313  0.8333333 0.004106776 2.449698     5
## [10] {mcgruro01,                                                                
##       olynyke01}  => {Great O} 0.002737851  0.8000000 0.003422313 2.351710     4
## [11] {robindu01,                                                                
##       yurtsom01}  => {Great O} 0.002737851  0.8000000 0.003422313 2.351710     4
## [12] {herroty01,                                                                
##       yurtsom01}  => {Great O} 0.002737851  1.0000000 0.002737851 2.939638     4
## [13] {greenje02,                                                                
##       riverau01}  => {Great O} 0.002737851  0.8000000 0.003422313 2.351710     4
## [14] {forretr01,                                                                
##       onealro01}  => {Great O} 0.002737851  0.8000000 0.003422313 2.351710     4
## [15] {ingrabr01,                                                                
##       murphtr02}  => {Great O} 0.002737851  0.8000000 0.003422313 2.351710     4
## [16] {joneshe01,                                                                
##       murphtr02}  => {Great O} 0.002737851  1.0000000 0.002737851 2.939638     4
## [17] {dragigo01,                                                                
##       drumman01}  => {Great O} 0.002737851  0.8000000 0.003422313 2.351710     4
## [18] {martike04,                                                                
##       woodch01}   => {Great O} 0.002737851  0.8000000 0.003422313 2.351710     4
## [19] {covinro01,                                                                
##       jacksre01}  => {Great O} 0.002737851  0.8000000 0.003422313 2.351710     4
## [20] {pritcpa01,                                                                
##       smartma01}  => {Great O} 0.002737851  1.0000000 0.002737851 2.939638     4

one_man_o <- subset(rules_off1_sort, size(rules_off1_sort) == 2)
inspect(sort(one_man_o, by = "count"))

##     lhs            rhs       support     confidence coverage    lift     count
## [1] {yurtsom01} => {Great O} 0.006160164 0.9        0.006844627 2.645674 9
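The filter `size(...) == 2` above isolates single players because, in `arules`, the size of a rule counts the items on both sides: one player on the left plus the fixed label on the right. The following is a minimal sketch on a toy data set, not the data used in this analysis; the names `playerA` to `playerD` and the five lineups are hypothetical and exist only for illustration.

```r
library(arules)

# Hypothetical toy lineups: two players each plus an offensive label
lineups <- list(
  c("playerA", "playerB", "Great O"),
  c("playerA", "playerC", "Great O"),
  c("playerA", "playerD", "Great O"),
  c("playerB", "playerC", "Bad O"),
  c("playerB", "playerD", "Okay O")
)
trans <- as(lineups, "transactions")

# Same call pattern as in the analysis: rhs fixed to "Great O"
rules <- apriori(trans,
                 parameter = list(supp = 0.2, conf = 0.8),
                 appearance = list(default = "lhs", rhs = "Great O"),
                 control = list(verbose = FALSE))

# Rule size = items on lhs + items on rhs, so size 2 means exactly one player
single <- subset(rules, size(rules) == 2)
inspect(single)
```

In this toy set only `playerA` passes the 80% confidence threshold alone, which mirrors how a single row survives the filter in the real output.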

On offence it is once again Yurtseven who shines. Despite his small role, he is the only individual player whose lineups are “Great” scorers at least 80% of the time (9 of the 10 lineups he appears in). Even though he was only a backup playing in place of injured players, the Miami Heat scored well in almost every lineup that included him. Among the other players there are a few changes: Milwaukee (with Giannis) and Boston no longer dominate the list. In their place we see the New Orleans Pelicans (McCollum, Murphy III), Utah Jazz (Mitchell, a famously great scorer), Brooklyn Nets (Durant, Irving, Drummond) and once again the Miami Heat (Yurtseven, Herro, Adebayo).
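The quality measures printed for the Yurtseven rule can be tied together with a few lines of arithmetic. This is a sketch that uses only the numbers shown in the output above; the total number of lineups is back-calculated from count and support and is therefore an assumption, not a value read from the data.

```r
# Values read off the printed rule {yurtsom01} => {Great O}
count_rule    <- 9            # lineups with Yurtseven that are "Great O"
support_rule  <- 0.006160164  # count_rule / total number of lineups
coverage_rule <- 0.006844627  # share of lineups that contain Yurtseven
lift_rule     <- 2.645674

# Assumption: total number of lineups, back-calculated from count / support
n_lineups <- round(count_rule / support_rule)

# Number of lineups containing Yurtseven, and the resulting confidence
count_lhs  <- round(coverage_rule * n_lineups)
confidence <- count_rule / count_lhs

# lift = confidence / P("Great O"), so the base rate follows from the lift
base_rate <- confidence / lift_rule

print(c(n_lineups = n_lineups, count_lhs = count_lhs,
        confidence = confidence, base_rate = round(base_rate, 3)))
```

The implied base rate is close to one third, as expected for a label built from three quantiles, so a lift of about 2.6 means Yurtseven's lineups are "Great" offences roughly 2.6 times as often as an average lineup.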

We can also visualise those relations:

plot(head(rules_off1_sort, 10), method="graph", engine="htmlwidget")

Here we can see the different rules, covering players from several teams. Interestingly, Yurtseven appears on the opposite side of the graph from the rest of Miami's best offensive players, which is consistent with him playing mostly alongside the second unit rather than the starters.

2.4.3 Best defences

Finally, we repeat the same process once more, this time for defensive performance:

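# Keep only items that occur in more than 0.5% of transactions
trans1_defence_f <- trans1_defence[, itemFrequency(trans1_defence) > 0.005]

# Mine rules whose right-hand side is fixed to "Great D";
# every other item may only appear on the left-hand side
rules_gdef1 <- apriori(trans1_defence_f, 
                         parameter = list(supp = 0.001, conf = 0.8),
                         appearance = list(default = "lhs", rhs = "Great D"),
                         control = list(verbose = FALSE))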

rules_def1_sort <- sort(rules_gdef1, by = "count", decreasing = TRUE)
inspect(head(rules_def1_sort, 20))

##      lhs             rhs           support confidence    coverage     lift count
## [1]  {paytoga02,                                                                
##       poolejo01,                                                                
##       wiggian01}  => {Great D} 0.006160164  0.8181818 0.007529090 2.480008     9
## [2]  {poolejo01,                                                                
##       porteot01,                                                                
##       wiggian01}  => {Great D} 0.004791239  0.8750000 0.005475702 2.652230     7
## [3]  {butleji01,                                                                
##       oladivi01}  => {Great D} 0.004106776  0.8571429 0.004791239 2.598103     6
## [4]  {anderky01,                                                                
##       jonesty01}  => {Great D} 0.004106776  0.8571429 0.004791239 2.598103     6
## [5]  {clarkbr01,                                                                
##       jonesty01,                                                                
##       meltode01}  => {Great D} 0.004106776  0.8571429 0.004791239 2.598103     6
## [6]  {tatumja01,                                                                
##       whitede01,                                                                
##       williro04}  => {Great D} 0.004106776  0.8571429 0.004791239 2.598103     6
## [7]  {bertada01,                                                                
##       klebima01}  => {Great D} 0.003422313  1.0000000 0.003422313 3.031120     5
## [8]  {bjeline01,                                                                
##       leeda03}    => {Great D} 0.003422313  0.8333333 0.004106776 2.525934     5
## [9]  {cartewe01,                                                                
##       okekech01}  => {Great D} 0.003422313  0.8333333 0.004106776 2.525934     5
## [10] {lowryky01,                                                                
##       vincega01}  => {Great D} 0.003422313  0.8333333 0.004106776 2.525934     5
## [11] {bertada01,                                                                
##       dinwisp01,                                                                
##       klebima01}  => {Great D} 0.003422313  1.0000000 0.003422313 3.031120     5
## [12] {butleji01,                                                                
##       oladivi01,                                                                
##       strusma01}  => {Great D} 0.003422313  1.0000000 0.003422313 3.031120     5
## [13] {adebaba01,                                                                
##       butleji01,                                                                
##       oladivi01}  => {Great D} 0.003422313  0.8333333 0.004106776 2.525934     5
## [14] {edwaran01,                                                                
##       mclaujo01,                                                                
##       princta02}  => {Great D} 0.003422313  0.8333333 0.004106776 2.525934     5
## [15] {greendr01,                                                                
##       paytoga02,                                                                
##       poolejo01}  => {Great D} 0.003422313  1.0000000 0.003422313 3.031120     5
## [16] {brownja02,                                                                
##       whitede01,                                                                
##       williro04}  => {Great D} 0.003422313  0.8333333 0.004106776 2.525934     5
## [17] {curryst01,                                                                
##       kuminjo01,                                                                
##       poolejo01}  => {Great D} 0.003422313  0.8333333 0.004106776 2.525934     5
## [18] {butleji01,                                                                
##       herroty01,                                                                
##       strusma01}  => {Great D} 0.003422313  0.8333333 0.004106776 2.525934     5
## [19] {greendr01,                                                                
##       paytoga02,                                                                
##       poolejo01,                                                                
##       wiggian01}  => {Great D} 0.003422313  1.0000000 0.003422313 3.031120     5
## [20] {brownja02,                                                                
##       smartma01,                                                                
##       tatumja01,                                                                
##       williro04}  => {Great D} 0.003422313  1.0000000 0.003422313 3.031120     5

On defence, no single player stands out above the rest, and there are not even many duos (6 of the top 20 rules). We mostly see groups of three players from teams like the Miami Heat, Golden State Warriors and Boston Celtics, all of which appeared earlier in this analysis, which confirms once again that they were great during the second half of the 2021/22 season. Additionally, a couple of teams appear now for the first time and stand out strictly on defence: the Memphis Grizzlies (Tyus Jones, Brandon Clarke) and the Dallas Mavericks (Davis Bertans and Maxi Kleber), who needed good defence to complement Luka Dončić's offence. Overall, this section suggests that defence is less about individual talent and more about the common effort of a larger group of players.
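The claim about group sizes can be checked with a quick tally. This is a sketch based only on the 20 rules printed above: the vector of left-hand-side sizes was transcribed by hand from that output, in order from rule [1] to rule [20], and the variable names are hypothetical.

```r
# Number of players on the left-hand side of each printed defensive rule,
# transcribed from the output above (rules [1] to [20])
lhs_players <- c(3, 3, 2, 2, 3, 3, 2, 2, 2, 2,
                 3, 3, 3, 3, 3, 3, 3, 3, 4, 4)

# How many rules involve duos, trios and quartets
group_sizes <- table(lhs_players)
print(group_sizes)

# Share of the top 20 rules that involve three or more players
share_large <- mean(lhs_players >= 3)
print(share_large)
```

On the full rule set, a similar tally could presumably be obtained with `table(size(rules_def1_sort) - 1)`, since the rule size also counts the single right-hand-side item.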

Once again we can have a look at the visualization:

plot(head(rules_def1_sort, 10), method="graph", engine="htmlwidget")

3 Conclusions

We have successfully identified the players who impact their teams the most on offence, on defence and overall. We showed that overall and offensive success can be driven by a single player, such as Giannis Antetokounmpo (net rating) or Ömer Yurtseven (offence), whereas defence is a common effort. Additionally, we found a key rotational player in Ömer Yurtseven.

On court presence analysis in the NBA with association rules

Antoni Rosiecki

2026-01-14