Load packages needed for the analysis

library(tidyverse)
library(scales)
library(arules)

Introduction

The aim of this project is to help viewers find worthwhile films based on the movies they have already seen. To achieve this, user ratings are analyzed to uncover association rules among movies. Association rule mining is a data mining technique that identifies relationships between items in large datasets, typically of the form “if one item is present, another is likely to be present as well.” Because the rules are derived from patterns in user preferences, they can serve as a basis for personalized movie suggestions.

Dataset

The data consists of two tables: “movie”, which contains information about 27278 movies, and “rating”, which contains movie ratings from 138493 users, each of whom rated at least 20 movies, made between January 09, 1995 and March 31, 2015. The data comes primarily from the MovieLens site and is hosted on Kaggle as a set of several files. Direct link to the dataset: https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset

Loading of the data

movies<- read.csv("movie.csv")
ratings<- read.csv("rating.csv")

Information about our datasets

dim(movies)
## [1] 27278     3
dim(ratings)
## [1] 20000263        4
head(movies)
##   movieId                              title
## 1       1                   Toy Story (1995)
## 2       2                     Jumanji (1995)
## 3       3            Grumpier Old Men (1995)
## 4       4           Waiting to Exhale (1995)
## 5       5 Father of the Bride Part II (1995)
## 6       6                        Heat (1995)
##                                        genres
## 1 Adventure|Animation|Children|Comedy|Fantasy
## 2                  Adventure|Children|Fantasy
## 3                              Comedy|Romance
## 4                        Comedy|Drama|Romance
## 5                                      Comedy
## 6                       Action|Crime|Thriller
head(ratings)
##   userId movieId rating           timestamp
## 1      1       2    3.5 2005-04-02 23:53:47
## 2      1      29    3.5 2005-04-02 23:31:16
## 3      1      32    3.5 2005-04-02 23:33:39
## 4      1      47    3.5 2005-04-02 23:32:07
## 5      1      50    3.5 2005-04-02 23:29:40
## 6      1     112    3.5 2004-09-10 03:09:00

Preparing the dataset

To keep only movies worth watching, only ratings of 4 and above are considered.

ratings<- ratings %>% filter(rating>=4)

To reduce computation time, the number of observations was cut from over 20 million in the original file to the first 2 million rows that remain after filtering.

ratings<- ratings[1:2000000,]

Merging datasets

ratings <- ratings %>%
  left_join(movies %>% select(movieId, title), by = c("movieId" = "movieId"), relationship = "many-to-many")

head(ratings)
##   userId movieId rating           timestamp
## 1      1     151      4 2004-09-10 03:08:54
## 2      1     223      4 2005-04-02 23:46:13
## 3      1     253      4 2005-04-02 23:35:40
## 4      1     260      4 2005-04-02 23:33:46
## 5      1     293      4 2005-04-02 23:31:43
## 6      1     296      4 2005-04-02 23:32:47
##                                                            title
## 1                                                 Rob Roy (1995)
## 2                                                  Clerks (1994)
## 3      Interview with the Vampire: The Vampire Chronicles (1994)
## 4                      Star Wars: Episode IV - A New Hope (1977)
## 5 Léon: The Professional (a.k.a. The Professional) (Léon) (1994)
## 6                                            Pulp Fiction (1994)

Limiting the data to the columns we need and checking whether it is complete.

ratings<-select(ratings, userId, title)

missing_in_cols <- sapply(ratings, function(x) sum(is.na(x))/nrow(ratings))
percent(missing_in_cols)
## userId  title 
##   "0%"   "0%"
head(ratings)
##   userId                                                          title
## 1      1                                                 Rob Roy (1995)
## 2      1                                                  Clerks (1994)
## 3      1      Interview with the Vampire: The Vampire Chronicles (1994)
## 4      1                      Star Wars: Episode IV - A New Hope (1977)
## 5      1 Léon: The Professional (a.k.a. The Professional) (Léon) (1994)
## 6      1                                            Pulp Fiction (1994)

Creating a sparse matrix suitable for analysing our data.

reviews <- as(split(ratings[ , "title"], ratings[ , "userId"]), "transactions")
## Warning in asMethod(object): removing duplicated items in transactions
reviews
## transactions in sparse format with
##  27380 transactions (rows) and
##  14960 items (columns)

The 27380 rows correspond to the users who rated movies, and the 14960 columns correspond to the distinct movies. Each cell in the matrix is 1 if the corresponding user gave the movie a rating of 4 or above, and 0 otherwise.
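
To make the structure of such a matrix concrete, here is a minimal base R sketch on hypothetical toy data (the `toy` data frame and its values are invented for illustration). It builds a dense 0/1 user-by-title matrix; the `transactions` object above stores the same information in sparse form.

```r
# Hypothetical toy data: three users; user 3 has a duplicated (userId, title) pair.
toy <- data.frame(
  userId = c(1, 1, 2, 3, 3, 3),
  title  = c("A", "B", "A", "B", "C", "C")
)

# Cross-tabulate and turn counts into 0/1, so duplicates collapse to a single 1.
incidence <- 1L * (table(toy$userId, toy$title) > 0)
print(incidence)

# Share of non-zero cells, i.e. the density of the matrix.
toy_density <- sum(incidence) / length(incidence)
```

The duplicated pair for user 3 is counted once, which mirrors the "removing duplicated items" warning shown above.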

Inspecting and selecting Movies

It would be very difficult to cover every viewer's taste, so in order to reduce the workload and keep the final results readable and useful for everyone, I decided to group the films by genre and then select the 100 most-watched ones. To make it likely that any given person has seen at least one of these 100 films, I selected films from each genre. The number of films chosen from a particular genre depended on its percentage share among all films.

Checking which movies were rated, how many times they were rated, and what their genres are.

rated_movies <- data.frame(
  title = colnames(reviews),            
  count = colSums(as(reviews, "matrix"))
)
## Warning in asMethod(object): sparse->dense coercion: allocating vector of size
## 1.5 GiB
rated_movies <- rated_movies %>% arrange(desc(count))

rated_movies <- rated_movies %>%
  left_join(movies %>% select(title, genres), by = c("title" = "title"))

head(rated_movies)
##                                       title count                      genres
## 1          Shawshank Redemption, The (1994) 11125                 Crime|Drama
## 2                       Pulp Fiction (1994) 10363 Comedy|Crime|Drama|Thriller
## 3          Silence of the Lambs, The (1991)  9898       Crime|Horror|Thriller
## 4                       Forrest Gump (1994)  9454    Comedy|Drama|Romance|War
## 5 Star Wars: Episode IV - A New Hope (1977)  8523     Action|Adventure|Sci-Fi
## 6                   Schindler's List (1993)  8346                   Drama|War

First, I would like to look at the genres represented by the movies.

count(unique(select(rated_movies,genres)))
##      n
## 1 1082

There are too many genre combinations, so this has to be simplified.

The “genres” column of the “rated_movies” dataset is reduced to the primary (first listed) genre of every movie.

rated_movies$genres <- sub("\\|.*", "", rated_movies$genres)
head(rated_movies)
##                                       title count genres
## 1          Shawshank Redemption, The (1994) 11125  Crime
## 2                       Pulp Fiction (1994) 10363 Comedy
## 3          Silence of the Lambs, The (1991)  9898  Crime
## 4                       Forrest Gump (1994)  9454 Comedy
## 5 Star Wars: Episode IV - A New Hope (1977)  8523 Action
## 6                   Schindler's List (1993)  8346  Drama
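
The pattern `"\\|.*"` matches the first `|` and everything after it, so `sub()` keeps only the text before the first separator. A small sketch on three genre strings taken from the output above illustrates this. Note that in the rows shown here the genres appear in alphabetical order, so the "primary" genre may simply be the alphabetically first label rather than the dominant one.

```r
# Genre strings as they appear in the `genres` column.
genres <- c("Crime|Drama", "Comedy|Crime|Drama|Thriller", "Drama")

# Drop the first "|" and everything that follows it.
primary <- sub("\\|.*", "", genres)
print(primary)
```

Strings without a separator, such as "Drama", are returned unchanged.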

Counting films in terms of the genre they belong to.

genre_counts <- rated_movies %>%
  count(genres, sort = TRUE) %>%
  mutate(
    percent = n / sum(n) * 100,
    cum_percent = cumsum(percent)
  )

ggplot(genre_counts, aes(x = reorder(genres, -n), y = n)) +
  geom_bar(stat = "identity", fill = "steelblue") + 
  geom_text(aes(label = n), vjust = -0.5, size = 4, color = "black") +
  geom_text(aes(label = paste0(round(percent, 1), "%")), 
            vjust = 1.5, size = 3, color = "white") +
  labs(
    title = "Percentage of movie genres",
    x = "Genre",
    y = "Number of film appearances"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

List of selected movies

Selecting films and defining the number of films for each genre

genre_counts <- c(
  Drama = 30, Comedy = 27, Action = 15, 
  Documentary = 7, Crime = 6, Adventure = 6, 
  Horror = 5, Animation = 2, Children = 3)
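
The quotas above add up to 101 (30 + 27 + 15 + 7 + 6 + 6 + 5 + 2 + 3). One way to derive quotas that sum exactly to a target is largest-remainder rounding of the genre shares. The sketch below uses hypothetical genre counts and a hypothetical target of 10 films; it is an illustration, not the procedure used in this analysis.

```r
# Hypothetical genre counts (not taken from the dataset).
n <- c(Drama = 420, Comedy = 330, Action = 150, Horror = 100)
target <- 10                       # hypothetical number of films to select

exact <- target * n / sum(n)       # fractional quotas: 4.2, 3.3, 1.5, 1.0
quota <- floor(exact)              # round down first
left  <- target - sum(quota)       # slots still to hand out

# Give the remaining slots to the genres with the largest fractional parts.
bump <- order(exact - quota, decreasing = TRUE)[seq_len(left)]
quota[bump] <- quota[bump] + 1
print(quota)
```

Rounding each share independently can overshoot or undershoot the target, which is what this scheme avoids.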

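selected_movies <- character(0)

for (genre in names(genre_counts)) {
  num_movies <- genre_counts[[genre]]
  
  genre_movies <- rated_movies %>%
    # Compare the `genres` column with the loop variable `genre`.
    # A bare `genre == genre` compares the loop variable with itself, is
    # always TRUE, and returns the most-rated titles regardless of genre
    # (which is what the list printed below reflects).
    filter(genres == genre & !(title %in% selected_movies)) %>%
    head(num_movies) %>%
    pull(title)
  
  selected_movies <- c(selected_movies, genre_movies)
}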

List of chosen movies

print(selected_movies)
##   [1] "Shawshank Redemption, The (1994)"                                              
##   [2] "Pulp Fiction (1994)"                                                           
##   [3] "Silence of the Lambs, The (1991)"                                              
##   [4] "Forrest Gump (1994)"                                                           
##   [5] "Star Wars: Episode IV - A New Hope (1977)"                                     
##   [6] "Schindler's List (1993)"                                                       
##   [7] "Matrix, The (1999)"                                                            
##   [8] "Usual Suspects, The (1995)"                                                    
##   [9] "Braveheart (1995)"                                                             
##  [10] "Terminator 2: Judgment Day (1991)"                                             
##  [11] "Star Wars: Episode V - The Empire Strikes Back (1980)"                         
##  [12] "Fugitive, The (1993)"                                                          
##  [13] "Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)"
##  [14] "Godfather, The (1972)"                                                         
##  [15] "American Beauty (1999)"                                                        
##  [16] "Star Wars: Episode VI - Return of the Jedi (1983)"                             
##  [17] "Toy Story (1995)"                                                              
##  [18] "Fargo (1996)"                                                                  
##  [19] "Jurassic Park (1993)"                                                          
##  [20] "Seven (a.k.a. Se7en) (1995)"                                                   
##  [21] "Fight Club (1999)"                                                             
##  [22] "Apollo 13 (1995)"                                                              
##  [23] "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)"                                     
##  [24] "Lord of the Rings: The Fellowship of the Ring, The (2001)"                     
##  [25] "Sixth Sense, The (1999)"                                                       
##  [26] "Back to the Future (1985)"                                                     
##  [27] "Saving Private Ryan (1998)"                                                    
##  [28] "Monty Python and the Holy Grail (1975)"                                        
##  [29] "Dances with Wolves (1990)"                                                     
##  [30] "Princess Bride, The (1987)"                                                    
##  [31] "Lord of the Rings: The Two Towers, The (2002)"                                 
##  [32] "One Flew Over the Cuckoo's Nest (1975)"                                        
##  [33] "Memento (2000)"                                                                
##  [34] "Lord of the Rings: The Return of the King, The (2003)"                         
##  [35] "Lion King, The (1994)"                                                         
##  [36] "Blade Runner (1982)"                                                           
##  [37] "Aladdin (1992)"                                                                
##  [38] "Alien (1979)"                                                                  
##  [39] "Terminator, The (1984)"                                                        
##  [40] "Gladiator (2000)"                                                              
##  [41] "Indiana Jones and the Last Crusade (1989)"                                     
##  [42] "Godfather: Part II, The (1974)"                                                
##  [43] "Goodfellas (1990)"                                                             
##  [44] "Groundhog Day (1993)"                                                          
##  [45] "Die Hard (1988)"                                                               
##  [46] "Reservoir Dogs (1992)"                                                         
##  [47] "Good Will Hunting (1997)"                                                      
##  [48] "L.A. Confidential (1997)"                                                      
##  [49] "Shrek (2001)"                                                                  
##  [50] "True Lies (1994)"                                                              
##  [51] "Independence Day (a.k.a. ID4) (1996)"                                          
##  [52] "Aliens (1986)"                                                                 
##  [53] "Speed (1994)"                                                                  
##  [54] "Casablanca (1942)"                                                             
##  [55] "E.T. the Extra-Terrestrial (1982)"                                             
##  [56] "Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)"                          
##  [57] "Léon: The Professional (a.k.a. The Professional) (Léon) (1994)"                
##  [58] "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)"   
##  [59] "Beauty and the Beast (1991)"                                                   
##  [60] "Being John Malkovich (1999)"                                                   
##  [61] "Taxi Driver (1976)"                                                            
##  [62] "Batman (1989)"                                                                 
##  [63] "American History X (1998)"                                                     
##  [64] "Babe (1995)"                                                                   
##  [65] "Men in Black (a.k.a. MIB) (1997)"                                              
##  [66] "Clockwork Orange, A (1971)"                                                    
##  [67] "Apocalypse Now (1979)"                                                         
##  [68] "Ghostbusters (a.k.a. Ghost Busters) (1984)"                                    
##  [69] "2001: A Space Odyssey (1968)"                                                  
##  [70] "Trainspotting (1996)"                                                          
##  [71] "Rock, The (1996)"                                                              
##  [72] "Eternal Sunshine of the Spotless Mind (2004)"                                  
##  [73] "Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)"                       
##  [74] "Shining, The (1980)"                                                           
##  [75] "Dark Knight, The (2008)"                                                       
##  [76] "Rain Man (1988)"                                                               
##  [77] "Fifth Element, The (1997)"                                                     
##  [78] "Wizard of Oz, The (1939)"                                                      
##  [79] "Full Metal Jacket (1987)"                                                      
##  [80] "Pirates of the Caribbean: The Curse of the Black Pearl (2003)"                 
##  [81] "Willy Wonka & the Chocolate Factory (1971)"                                    
##  [82] "Four Weddings and a Funeral (1994)"                                            
##  [83] "Clear and Present Danger (1994)"                                               
##  [84] "Die Hard: With a Vengeance (1995)"                                             
##  [85] "Ferris Bueller's Day Off (1986)"                                               
##  [86] "Shakespeare in Love (1998)"                                                    
##  [87] "Clerks (1994)"                                                                 
##  [88] "Heat (1995)"                                                                   
##  [89] "Monsters, Inc. (2001)"                                                         
##  [90] "Finding Nemo (2003)"                                                           
##  [91] "Amadeus (1984)"                                                                
##  [92] "Stand by Me (1986)"                                                            
##  [93] "Truman Show, The (1998)"                                                       
##  [94] "Mission: Impossible (1996)"                                                    
##  [95] "Green Mile, The (1999)"                                                        
##  [96] "Kill Bill: Vol. 1 (2003)"                                                      
##  [97] "Rear Window (1954)"                                                            
##  [98] "Sense and Sensibility (1995)"                                                  
##  [99] "Psycho (1960)"                                                                 
## [100] "Monty Python's Life of Brian (1979)"

Inspecting the dataset with simple statistics

summary(reviews)
## transactions as itemMatrix in sparse format with
##  27380 rows (elements/itemsets/transactions) and
##  14960 columns (items) and a density of 0.004882721 
## 
## most frequent items:
##          Shawshank Redemption, The (1994) 
##                                     11125 
##                       Pulp Fiction (1994) 
##                                     10363 
##          Silence of the Lambs, The (1991) 
##                                      9898 
##                       Forrest Gump (1994) 
##                                      9454 
## Star Wars: Episode IV - A New Hope (1977) 
##                                      8523 
##                                   (Other) 
##                                   1950623 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
##   50   59   94  123  173  238  291  326  397  427  479  543  567  581  607  575 
##   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
##  604  579  515  508  455  483  444  437  399  366  401  352  328  307  315  275 
##   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
##  295  264  277  261  259  232  241  236  212  224  227  208  205  185  193  183 
##   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
##  187  192  160  143  154  170  150  174  147  150  161  119  132  130  123   99 
##   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80 
##   97  123  115  125  125  119  100  116  107  115   95  107   97  117  103   93 
##   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96 
##   88   95   94  106   88   86   73   79   62   74   83   65   59   84   56   67 
##   97   98   99  100  101  102  103  104  105  106  107  108  109  110  111  112 
##   79   67   63   68   60   53   63   59   52   56   60   58   57   57   50   53 
##  113  114  115  116  117  118  119  120  121  122  123  124  125  126  127  128 
##   49   50   42   54   52   54   57   57   55   61   51   47   29   48   47   49 
##  129  130  131  132  133  134  135  136  137  138  139  140  141  142  143  144 
##   33   50   50   40   34   42   26   43   32   44   37   44   38   30   38   43 
##  145  146  147  148  149  150  151  152  153  154  155  156  157  158  159  160 
##   42   38   40   35   30   31   43   27   28   31   28   36   28   37   25   32 
##  161  162  163  164  165  166  167  168  169  170  171  172  173  174  175  176 
##   27   31   33   30   27   24   31   20   20   22   22   37   31   18   31   22 
##  177  178  179  180  181  182  183  184  185  186  187  188  189  190  191  192 
##   31   28   28   22   20   32   23   22   23   21   19   27   17   25   19   20 
##  193  194  195  196  197  198  199  200  201  202  203  204  205  206  207  208 
##   21   15   17   24   22   20   19   24   18   14   18   16   16   14   21   23 
##  209  210  211  212  213  214  215  216  217  218  219  220  221  222  223  224 
##   12   12   17   24   21   11   15   21   20   13   20   11   16    9   11    9 
##  225  226  227  228  229  230  231  232  233  234  235  236  237  238  239  240 
##   15   21    8   17   19   14   11   13   14   14   14    7   14    9   14    9 
##  241  242  243  244  245  246  247  248  249  250  251  252  253  254  255  256 
##    7   16   11   14    9    9    7   14   14    8   11   16   10    7    4   10 
##  257  258  259  260  261  262  263  264  265  266  267  268  269  270  271  272 
##    7   13    6    7   10   12    7    5   11   12   14    5    9    8    7    5 
##  273  274  275  276  277  278  279  280  281  282  283  284  285  286  287  288 
##    8    8    5    8   13   10    9   10    8    6    9   11   12    7    7    9 
##  289  290  291  292  293  294  295  296  297  298  299  300  301  302  303  304 
##    9    9    8   11    9    6    8    7    8    9    6    5    6    2   12    4 
##  305  306  307  308  309  310  311  312  313  314  315  316  317  318  319  320 
##   10    5    6    5    7    9    9    5    7    6    5    1    6   10    6    5 
##  321  322  323  324  325  326  327  328  329  330  331  332  333  334  335  336 
##    3   15    7    4    4    8    8   10    3    3    8    7   11    4    5    4 
##  337  338  339  340  341  342  343  344  345  346  347  348  349  350  351  352 
##    6    6    1    2    5    5    6    3    6    3    4    8    3    3    6    4 
##  353  354  355  356  357  358  359  360  361  362  363  364  365  366  367  368 
##    5    4    5    8    4    4    5    3    4    4    3    2    4    3    4    8 
##  369  370  371  372  373  374  375  376  377  378  379  380  381  382  383  384 
##    7    1    7    5    7    4    3    5    4    3    3    2    3    7    5    3 
##  385  386  387  389  390  391  392  393  394  395  396  397  398  399  400  401 
##    1    5    3    5    3    2    5    6    2    7    5    6    2    6    1    5 
##  402  403  405  406  407  408  409  410  411  412  413  414  415  416  417  418 
##    4    4    2    4    3    6    1    1    1    5    1    2    4    4    2    4 
##  419  420  421  422  423  424  425  426  427  428  429  430  431  433  435  436 
##    8    3    2    2    2    2    4    1    2    2    4    1    2    2    1    2 
##  437  438  439  440  441  442  443  445  446  447  448  450  451  453  454  455 
##    5    4    3    4    3    1    2    2    2    1    2    1    2    2    1    1 
##  456  458  459  460  461  462  463  464  465  466  467  469  472  473  475  477 
##    3    3    1    1    1    3    2    1    2    3    1    2    2    1    1    2 
##  478  479  480  481  483  484  485  486  487  489  490  491  492  493  494  495 
##    1    1    2    1    2    2    1    3    1    2    3    4    4    1    3    3 
##  496  497  498  501  502  504  505  506  507  508  510  512  514  515  516  517 
##    1    5    1    2    2    2    3    2    1    3    2    2    3    1    1    3 
##  520  521  522  523  525  526  527  528  529  530  531  532  533  534  535  536 
##    1    1    1    5    2    1    1    1    1    4    2    2    2    2    2    2 
##  537  538  543  544  545  547  550  551  553  554  555  556  559  560  561  562 
##    1    2    2    3    2    1    1    1    1    2    3    2    2    1    1    1 
##  563  567  568  571  576  578  579  581  584  588  589  592  593  594  595  597 
##    1    1    2    1    1    2    2    1    1    1    1    2    3    1    2    1 
##  599  600  601  602  603  604  605  608  609  610  612  613  614  615  617  620 
##    2    2    1    2    1    2    2    1    1    1    1    3    1    1    1    1 
##  621  623  624  626  628  629  633  637  638  640  641  643  644  647  650  652 
##    2    1    1    1    1    2    1    1    1    2    1    2    2    1    2    1 
##  653  655  659  660  661  668  672  675  678  681  687  688  689  693  699  705 
##    1    1    1    1    1    1    1    1    2    1    1    1    1    1    1    2 
##  708  712  713  716  721  722  723  726  731  736  741  750  755  757  762  773 
##    1    1    1    1    1    1    2    1    1    1    1    1    1    2    2    1 
##  778  781  784  787  788  789  791  793  795  797  798  800  803  804  809  822 
##    1    2    1    1    2    2    1    2    2    1    1    1    1    1    1    1 
##  824  831  841  846  851  853  859  861  862  869  872  878  884  887  890  899 
##    1    1    1    2    1    1    1    1    1    1    1    1    1    1    2    1 
##  921  944  951  952  954  972  993 1001 1003 1004 1011 1016 1038 1048 1049 1058 
##    1    1    1    1    1    2    1    1    1    1    2    1    1    1    1    1 
## 1065 1090 1125 1137 1158 1171 1190 1203 1204 1211 1246 1247 1253 1298 1337 1342 
##    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1 
## 1575 1751 1935 1984 2502 
##    1    1    1    1    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   19.00   38.00   73.05   85.00 2502.00 
## 
## includes extended item information - examples:
##                          labels
## 1         ¡Three Amigos! (1986)
## 2       ...And God Spoke (1993)
## 3 ...And Justice for All (1979)
## 
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2             2
## 3             3

A density of 0.004882721 (about 0.5%) is the proportion of non-zero cells in the matrix.
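
As a sanity check, the density can be turned back into a number of non-zero cells and compared with the item frequencies printed by `summary(reviews)`. All figures in this sketch are copied from the output above; only the variable names are mine.

```r
n_users  <- 27380
n_movies <- 14960
density  <- 0.004882721

# Number of non-zero cells implied by the density (roughly two million).
nonzero <- density * n_users * n_movies

# The five most frequent items plus "(Other)" from summary(reviews).
item_total <- sum(11125, 10363, 9898, 9454, 8523, 1950623)

# Average number of movies per user, comparable to the reported mean of 73.05.
mean_per_user <- item_total / n_users
```

The two totals agree up to the rounding of the printed density, and the total is slightly below 2,000,000 because duplicated items were removed when the transactions were created.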

Simple statistics

summary(itemFrequency(reviews, type="relative"))
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000365 0.0000730 0.0003652 0.0048827 0.0019357 0.4063185
summary(itemFrequency(reviews, type="absolute"))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     2.0    10.0   133.7    53.0 11125.0
itemFrequencyPlot(reviews, topN = 10, main="The most frequently rated movies")

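On average, a user gave 73 movies a rating of 4 or above (the median is 38). There is also one user who rated 2502 movies this way, which is over 16% of all movies in the matrix!

The most watched film turned out to be The Shawshank Redemption, with 11125 such ratings (about 41% of users).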

Apriori algorithm

Investigating associations for movies from selected genres and the rest of the movies through the application of the Apriori algorithm.

After getting familiar with the summary statistics, I decided to set the following thresholds for the algorithm:

  • support (the proportion of users whose ratings contain all movies in a rule) at 1% (the movies in a rule had to be rated by at least 274 of the same users, since 0.01 × 27380 ≈ 273.8)

  • confidence (how often the consequent is present when the antecedent occurs) at 30%

  • rule length equal to 2
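
The measures behind these thresholds can be illustrated with a small base R sketch on hypothetical baskets (the `baskets` list and the helper `has()` are invented for illustration and are not part of `arules`). It computes support, confidence and lift for the rule {A} => {B}, plus the user count implied by a 1% support threshold.

```r
# Hypothetical baskets: one character vector of liked titles per user.
baskets <- list(
  c("A", "B"), c("A", "B", "C"), c("A"), c("B", "C"), c("C")
)

# TRUE for every basket that contains all of the given items.
has <- function(items) sapply(baskets, function(b) all(items %in% b))

supp_A  <- mean(has("A"))          # 3 of 5 users
supp_B  <- mean(has("B"))          # 3 of 5 users
supp_AB <- mean(has(c("A", "B")))  # 2 of 5 users

confidence <- supp_AB / supp_A     # share of users who liked A and also liked B
lift       <- confidence / supp_B  # above 1: B is more common among users who liked A

# User count implied by a 1% support threshold on 27380 transactions.
min_count <- ceiling(0.01 * 27380)
```

A lift close to 1 would mean that liking A tells us little about liking B, which is why lift is a useful complement to confidence for very popular films.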

Learning association rules with selected movies as antecedent (lhs)

selected_movies <- unlist(selected_movies)
rules <- apriori(
  data = reviews,
  parameter = list(supp = 0.01, conf = 0.3, minlen = 2, maxlen = 2),
  appearance = list(lhs = selected_movies, default = "rhs"),
  control = list(verbose = FALSE)
)

rules
## set of 360 rules
summary(rules)
## set of 360 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2 
## 360 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       2       2       2       2       2       2 
## 
## summary of quality measures:
##     support          confidence        coverage           lift      
##  Min.   :0.03170   Min.   :0.3000   Min.   :0.1050   Min.   :2.866  
##  1st Qu.:0.03724   1st Qu.:0.3115   1st Qu.:0.1098   1st Qu.:3.303  
##  Median :0.04125   Median :0.3281   Median :0.1188   Median :3.591  
##  Mean   :0.04294   Mean   :0.3380   Mean   :0.1278   Mean   :3.716  
##  3rd Qu.:0.04645   3rd Qu.:0.3523   3rd Qu.:0.1391   3rd Qu.:4.016  
##  Max.   :0.07663   Max.   :0.7033   Max.   :0.2390   Max.   :7.958  
##      count     
##  Min.   : 868  
##  1st Qu.:1020  
##  Median :1130  
##  Mean   :1176  
##  3rd Qu.:1272  
##  Max.   :2098  
## 
## mining info:
##     data ntransactions support confidence
##  reviews         27380    0.01        0.3
##                                                                                                                                                                                    call
##  apriori(data = reviews, parameter = list(supp = 0.01, conf = 0.3, minlen = 2, maxlen = 2), appearance = list(lhs = selected_movies, default = "rhs"), control = list(verbose = FALSE))

Reordering the rules to be able to select the most meaningful ones

inspect(sort(rules, by = "lift")[1:10])
##      lhs                           rhs                            support confidence  coverage     lift count
## [1]  {Kill Bill: Vol. 1 (2003)} => {Kill Bill: Vol. 2 (2004)}  0.07542001  0.7033379 0.1072316 7.957600  2065
## [2]  {Rear Window (1954)}       => {Vertigo (1958)}            0.05288532  0.4943667 0.1069759 6.226200  1448
## [3]  {Dark Knight, The (2008)}  => {Iron Man (2008)}           0.04408327  0.3652042 0.1207085 6.203035  1207
## [4]  {Kill Bill: Vol. 1 (2003)} => {Sin City (2005)}           0.04612856  0.4301771 0.1072316 5.790683  1263
## [5]  {Rear Window (1954)}       => {North by Northwest (1959)} 0.05525931  0.5165586 0.1069759 5.559502  1513
## [6]  {Dark Knight, The (2008)}  => {Inception (2010)}          0.05171658  0.4284418 0.1207085 5.522945  1416
## [7]  {Dark Knight, The (2008)}  => {WALL·E (2008)}             0.04258583  0.3527988 0.1207085 5.354563  1166
## [8]  {Dark Knight, The (2008)}  => {Prestige, The (2006)}      0.03772827  0.3125567 0.1207085 5.269583  1033
## [9]  {Finding Nemo (2003)}      => {Incredibles, The (2004)}   0.05460190  0.4975042 0.1097516 5.245154  1495
## [10] {Psycho (1960)}            => {Vertigo (1958)}            0.04349890  0.4132547 0.1052593 5.204652  1191
inspect(sort(rules, by = "confidence")[1:10])
##      lhs                                                                rhs                             support confidence  coverage     lift count
## [1]  {Kill Bill: Vol. 1 (2003)}                                      => {Kill Bill: Vol. 2 (2004)}   0.07542001  0.7033379 0.1072316 7.957600  2065
## [2]  {Rear Window (1954)}                                            => {North by Northwest (1959)}  0.05525931  0.5165586 0.1069759 5.559502  1513
## [3]  {Finding Nemo (2003)}                                           => {Incredibles, The (2004)}    0.05460190  0.4975042 0.1097516 5.245154  1495
## [4]  {Rear Window (1954)}                                            => {Vertigo (1958)}             0.05288532  0.4943667 0.1069759 6.226200  1448
## [5]  {Monsters, Inc. (2001)}                                         => {Incredibles, The (2004)}    0.05230095  0.4743292 0.1102630 5.000822  1432
## [6]  {Clear and Present Danger (1994)}                               => {Crimson Tide (1995)}        0.05193572  0.4598965 0.1129291 4.505176  1422
## [7]  {Dark Knight, The (2008)}                                       => {Batman Begins (2005)}       0.05405405  0.4478064 0.1207085 4.703083  1480
## [8]  {Ferris Bueller's Day Off (1986)}                               => {Breakfast Club, The (1985)} 0.04974434  0.4423514 0.1124543 4.620977  1362
## [9]  {Pirates of the Caribbean: The Curse of the Black Pearl (2003)} => {Ocean's Eleven (2001)}      0.05021914  0.4338908 0.1157414 4.181602  1375
## [10] {Eternal Sunshine of the Spotless Mind (2004)}                  => {Donnie Darko (2001)}        0.05354273  0.4334713 0.1235208 4.568301  1466
inspect(sort(rules, by = "count")[1:10])
##      lhs                                                            rhs                                              support confidence  coverage     lift count
## [1]  {Toy Story (1995)}                                          => {Toy Story 2 (1999)}                          0.07662527  0.3205990 0.2390066 3.060670  2098
## [2]  {Kill Bill: Vol. 1 (2003)}                                  => {Kill Bill: Vol. 2 (2004)}                    0.07542001  0.7033379 0.1072316 7.957600  2065
## [3]  {Fight Club (1999)}                                         => {Snatch (2000)}                               0.07081812  0.3065613 0.2310080 3.308493  1939
## [4]  {Fight Club (1999)}                                         => {Donnie Darko (2001)}                         0.07074507  0.3062451 0.2310080 3.227479  1937
## [5]  {Apollo 13 (1995)}                                          => {Crimson Tide (1995)}                         0.06680058  0.3026142 0.2207451 2.964428  1829
## [6]  {Lord of the Rings: The Fellowship of the Ring, The (2001)} => {Ocean's Eleven (2001)}                       0.06486486  0.3115789 0.2081812 3.002827  1776
## [7]  {Lord of the Rings: The Fellowship of the Ring, The (2001)} => {Batman Begins (2005)}                        0.06406136  0.3077193 0.2081812 3.231820  1754
## [8]  {Lord of the Rings: The Fellowship of the Ring, The (2001)} => {Beautiful Mind, A (2001)}                    0.06391527  0.3070175 0.2081812 2.945389  1750
## [9]  {Indiana Jones and the Last Crusade (1989)}                 => {Indiana Jones and the Temple of Doom (1984)} 0.06336742  0.3891007 0.1628561 4.558655  1735
## [10] {Lord of the Rings: The Return of the King, The (2003)}     => {Batman Begins (2005)}                        0.06292915  0.3633488 0.1731921 3.816068  1723

Many of the resulting rules link films that belong to the same series, such as the two Kill Bill volumes. Such pairs are unsurprising, but they serve as a useful sanity check that the algorithm recovers sensible associations. Because the titles include the production year, we can also see that films tend to be associated with others from the same period: older films have their strongest associations with other older films, and newer films with other newer films. “The Dark Knight” appears several times among the rules with the highest lift, so it is worth taking a closer look at.
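The columns reported by inspect() are linked by simple identities, which makes individual rows easy to sanity-check. The sketch below is plain base R (no arules required) and recomputes the figures for the Kill Bill rule from the values printed above. The variable names are illustrative, and the derived transaction count and consequent support are inferences from the rounded output, not values reported by the analysis itself.

```r
# Values copied from the printed rule
# {Kill Bill: Vol. 1 (2003)} => {Kill Bill: Vol. 2 (2004)}
support    <- 0.07542001  # share of transactions containing both films
coverage   <- 0.1072316   # share of transactions containing the LHS film
confidence <- 0.7033379
lift       <- 7.957600
count      <- 2065        # number of transactions containing both films

# confidence = support(LHS and RHS) / support(LHS)
confidence_check <- support / coverage

# count = support * number of transactions, so the total can be backed out
n_transactions <- round(count / support)

# lift = confidence / support(RHS), so the RHS support can be backed out
rhs_support <- confidence / lift

print(c(confidence_check = confidence_check,
        n_transactions   = n_transactions,
        rhs_support      = rhs_support))
```

Under these identities, a lift close to 8 says that someone who rated the first volume was roughly eight times more likely to have rated the second one than the baseline share of all transactions would suggest.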

What drives people to watch The Dark Knight?

# Mine rules whose right-hand side is fixed to "The Dark Knight"
rules.knight <- apriori(
  data = reviews,
  parameter = list(supp = 0.05, conf = 0.05),
  appearance = list(default = "lhs", rhs = "Dark Knight, The (2008)"),
  control = list(verbose = FALSE))
rules.knight <- sort(rules.knight, by = "confidence", decreasing = TRUE)

inspect(rules.knight[1:10])
##      lhs                                                             rhs                          support confidence   coverage     lift count
## [1]  {Inception (2010)}                                           => {Dark Knight, The (2008)} 0.05171658  0.6666667 0.07757487 5.522945  1416
## [2]  {Batman Begins (2005)}                                       => {Dark Knight, The (2008)} 0.05405405  0.5677023 0.09521549 4.703083  1480
## [3]  {Lord of the Rings: The Fellowship of the Ring, The (2001),                                                                              
##       Lord of the Rings: The Return of the King, The (2003),                                                                                  
##       Lord of the Rings: The Two Towers, The (2002)}              => {Dark Knight, The (2008)} 0.05200877  0.3854900 0.13491600 3.193560  1424
## [4]  {Lord of the Rings: The Fellowship of the Ring, The (2001),                                                                              
##       Lord of the Rings: The Return of the King, The (2003)}      => {Dark Knight, The (2008)} 0.05715851  0.3830152 0.14923302 3.173058  1565
## [5]  {Lord of the Rings: The Return of the King, The (2003),                                                                                  
##       Lord of the Rings: The Two Towers, The (2002)}              => {Dark Knight, The (2008)} 0.05562454  0.3829520 0.14525201 3.172534  1523
## [6]  {Fight Club (1999),                                                                                                                      
##       Shawshank Redemption, The (1994)}                           => {Dark Knight, The (2008)} 0.05270270  0.3720031 0.14167275 3.081829  1443
## [7]  {Lord of the Rings: The Return of the King, The (2003)}      => {Dark Knight, The (2008)} 0.06387874  0.3688317 0.17319211 3.055556  1749
## [8]  {Fight Club (1999),                                                                                                                      
##       Matrix, The (1999)}                                         => {Dark Knight, The (2008)} 0.05463842  0.3646113 0.14985391 3.020592  1496
## [9]  {Lord of the Rings: The Fellowship of the Ring, The (2001),                                                                              
##       Matrix, The (1999)}                                         => {Dark Knight, The (2008)} 0.05058437  0.3644737 0.13878744 3.019452  1385
## [10] {Fight Club (1999),                                                                                                                      
##       Pulp Fiction (1994)}                                        => {Dark Knight, The (2008)} 0.05208181  0.3495098 0.14901388 2.895485  1426

Interestingly, “Inception” ranks first, ahead of “Batman Begins,” the previous installment of the series. Films from the Lord of the Rings trilogy appear in five of the ten rules with the highest confidence. The 21st-century titles are joined by several films from the 1990s, such as “The Shawshank Redemption,” “Pulp Fiction,” “Fight Club,” and “The Matrix.”
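Reading a rule in the opposite direction changes its confidence but not its lift. The following base R sketch illustrates this with the support and coverage values printed in the outputs (the coverage of “The Dark Knight” is taken from the rules in which it appears on the LHS). The variable names are illustrative only.

```r
support_both      <- 0.05171658  # transactions containing both films
support_inception <- 0.07757487  # coverage of {Inception (2010)}
support_knight    <- 0.1207085   # coverage of {Dark Knight, The (2008)}

# Confidence depends on which film is the antecedent
conf_inception_to_knight <- support_both / support_inception
conf_knight_to_inception <- support_both / support_knight

# Lift is symmetric: it divides by both marginal supports
lift_pair <- support_both / (support_inception * support_knight)

print(c(conf_inception_to_knight = conf_inception_to_knight,
        conf_knight_to_inception = conf_knight_to_inception,
        lift_pair                = lift_pair))
```

The asymmetry arises because fewer transactions contain “Inception” than “The Dark Knight,” so the same set of joint raters makes up a larger share of the smaller group.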

100 associations

A list of films with something for everyone (one association for each of the 100 selected films)

solo_rules <- apriori(
  data = reviews,
  parameter = list(supp = 0.005, conf = 0.25, minlen = 2, maxlen = 2),
  appearance = list(lhs = selected_movies, default = "rhs"),
  control = list(verbose = FALSE))

Selecting the best association for each film on the LHS

solo_rules <- sort(solo_rules, by = "confidence", decreasing = TRUE)

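# Keep the first (highest-confidence) rule for each distinct LHS;
# the mask must be built from solo_rules, the object being subset
best_rules <- solo_rules[!duplicated(lhs(solo_rules))]

# head() avoids an out-of-bounds error if fewer than 100 rules remain
best_rules <- head(best_rules, 100)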
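The "sort, then drop repeated antecedents" idea can be sketched on a small data frame in base R, without arules. The toy data and all names below are hypothetical. The point is that the duplicate mask has to be computed on the same, already sorted collection that is being subset, so that the surviving row for each antecedent is its highest-confidence one.

```r
# Hypothetical toy rules: one row per (lhs, rhs) pair
toy <- data.frame(
  lhs        = c("A", "B", "A", "C", "B"),
  rhs        = c("X", "Y", "Z", "X", "Z"),
  confidence = c(0.40, 0.55, 0.70, 0.30, 0.35),
  stringsAsFactors = FALSE
)

# Sort by confidence, highest first
toy_sorted <- toy[order(toy$confidence, decreasing = TRUE), ]

# duplicated() flags every repeat after the first occurrence,
# so negating it keeps the top-ranked row for each antecedent
best <- toy_sorted[!duplicated(toy_sorted$lhs), ]

print(best)
```

Building the mask from a different object would produce a logical vector that does not line up with the rows being filtered, and repeated antecedents could survive.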

List of movies

inspect(best_rules)
##       lhs                                                                              rhs                                              support confidence  coverage     lift count
## [1]   {Kill Bill: Vol. 1 (2003)}                                                    => {Kill Bill: Vol. 2 (2004)}                    0.07542001  0.7033379 0.1072316 7.957600  2065
## [2]   {Rear Window (1954)}                                                          => {North by Northwest (1959)}                   0.05525931  0.5165586 0.1069759 5.559502  1513
## [3]   {Finding Nemo (2003)}                                                         => {Incredibles, The (2004)}                     0.05460190  0.4975042 0.1097516 5.245154  1495
## [4]   {Monsters, Inc. (2001)}                                                       => {Incredibles, The (2004)}                     0.05230095  0.4743292 0.1102630 5.000822  1432
## [5]   {Rear Window (1954)}                                                          => {Citizen Kane (1941)}                         0.04598247  0.4298395 0.1069759 4.444489  1259
## [6]   {Dark Knight, The (2008)}                                                     => {Inception (2010)}                            0.05171658  0.4284418 0.1207085 5.522945  1416
## [7]   {Rear Window (1954)}                                                          => {Chinatown (1974)}                            0.04488678  0.4195971 0.1069759 4.718098  1229
## [8]   {Pirates of the Caribbean: The Curse of the Black Pearl (2003)}               => {Incredibles, The (2004)}                     0.04663988  0.4029662 0.1157414 4.248446  1277
## [9]   {Monsters, Inc. (2001)}                                                       => {Toy Story 2 (1999)}                          0.04441198  0.4027824 0.1102630 3.845252  1216
## [10]  {Kill Bill: Vol. 1 (2003)}                                                    => {Batman Begins (2005)}                        0.04236669  0.3950954 0.1072316 4.149486  1160
## [11]  {Kill Bill: Vol. 1 (2003)}                                                    => {Donnie Darko (2001)}                         0.04203798  0.3920300 0.1072316 4.131555  1151
## [12]  {Psycho (1960)}                                                               => {Graduate, The (1967)}                        0.04123448  0.3917418 0.1052593 4.232791  1129
## [13]  {Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)} => {Citizen Kane (1941)}                         0.05328707  0.3907338 0.1363769 4.040140  1459
## [14]  {Indiana Jones and the Last Crusade (1989)}                                   => {Indiana Jones and the Temple of Doom (1984)} 0.06336742  0.3891007 0.1628561 4.558655  1735
## [15]  {Kill Bill: Vol. 1 (2003)}                                                    => {Snatch (2000)}                               0.04156318  0.3876022 0.1072316 4.183109  1138
## [16]  {Rear Window (1954)}                                                          => {Graduate, The (1967)}                        0.04035793  0.3772619 0.1069759 4.076334  1105
## [17]  {Reservoir Dogs (1992)}                                                       => {Big Lebowski, The (1998)}                    0.05668371  0.3741562 0.1514974 3.570721  1552
## [18]  {Being John Malkovich (1999)}                                                 => {Big Lebowski, The (1998)}                    0.05080351  0.3739247 0.1358656 3.568511  1391
## [19]  {Ferris Bueller's Day Off (1986)}                                             => {Office Space (1999)}                         0.04200146  0.3734979 0.1124543 3.656193  1150
## [20]  {Dark Knight, The (2008)}                                                     => {Departed, The (2006)}                        0.04506939  0.3733737 0.1207085 4.768177  1234
## [21]  {Kill Bill: Vol. 1 (2003)}                                                    => {Ocean's Eleven (2001)}                       0.04002922  0.3732970 0.1072316 3.597632  1096
## [22]  {Finding Nemo (2003)}                                                         => {Batman Begins (2005)}                        0.04039445  0.3680532 0.1097516 3.865477  1106
## [23]  {Four Weddings and a Funeral (1994)}                                          => {Sleepless in Seattle (1993)}                 0.04185537  0.3678973 0.1137692 3.723855  1146
## [24]  {Kill Bill: Vol. 1 (2003)}                                                    => {Bourne Identity, The (2002)}                 0.03937180  0.3671662 0.1072316 3.730245  1078
## [25]  {Kill Bill: Vol. 1 (2003)}                                                    => {Incredibles, The (2004)}                     0.03929876  0.3664850 0.1072316 3.863827  1076
## [26]  {Shrek (2001)}                                                                => {Ocean's Eleven (2001)}                       0.05412710  0.3653846 0.1481373 3.521377  1482
## [27]  {Dark Knight, The (2008)}                                                     => {Iron Man (2008)}                             0.04408327  0.3652042 0.1207085 6.203035  1207
## [28]  {Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)}                     => {O Brother, Where Art Thou? (2000)}           0.04437546  0.3642086 0.1218408 3.813397  1215
## [29]  {Full Metal Jacket (1987)}                                                    => {Big Lebowski, The (1998)}                    0.04229364  0.3641509 0.1161432 3.475236  1158
## [30]  {Rain Man (1988)}                                                             => {Dead Poets Society (1989)}                   0.04306063  0.3624347 0.1188093 3.725023  1179
## [31]  {Rear Window (1954)}                                                          => {To Kill a Mockingbird (1962)}                0.03871439  0.3618983 0.1069759 4.163351  1060
## [32]  {Wizard of Oz, The (1939)}                                                    => {Graduate, The (1967)}                        0.04229364  0.3615361 0.1169832 3.906416  1158
## [33]  {Finding Nemo (2003)}                                                         => {Toy Story 2 (1999)}                          0.03929876  0.3580699 0.1097516 3.418394  1076
## [34]  {Pirates of the Caribbean: The Curse of the Black Pearl (2003)}               => {X-Men (2000)}                                0.04141709  0.3578416 0.1157414 3.586275  1134
## [35]  {Rear Window (1954)}                                                          => {Sting, The (1973)}                           0.03823959  0.3574599 0.1069759 4.004604  1047
## [36]  {Pirates of the Caribbean: The Curse of the Black Pearl (2003)}               => {Spider-Man (2002)}                           0.04108839  0.3550016 0.1157414 4.442387  1125
## [37]  {Kill Bill: Vol. 1 (2003)}                                                    => {Big Lebowski, The (1998)}                    0.03802045  0.3545640 0.1072316 3.383745  1041
## [38]  {Trainspotting (1996)}                                                        => {Big Lebowski, The (1998)}                    0.04452155  0.3544635 0.1256026 3.382785  1219
## [39]  {Truman Show, The (1998)}                                                     => {Edward Scissorhands (1990)}                  0.03838568  0.3532773 0.1086560 3.853678  1051
## [40]  {Stand by Me (1986)}                                                          => {Jaws (1975)}                                 0.03845873  0.3527638 0.1090212 3.525063  1053
## [41]  {Apocalypse Now (1979)}                                                       => {Platoon (1986)}                              0.04477721  0.3527043 0.1269540 4.502118  1226
## [42]  {Casablanca (1942)}                                                           => {Chinatown (1974)}                            0.04937911  0.3507134 0.1407962 3.943545  1352
## [43]  {Monsters, Inc. (2001)}                                                       => {Minority Report (2002)}                      0.03853178  0.3494535 0.1102630 3.858079  1055
## [44]  {Eternal Sunshine of the Spotless Mind (2004)}                                => {Requiem for a Dream (2000)}                  0.04276844  0.3462448 0.1235208 4.423791  1171
## [45]  {Clockwork Orange, A (1971)}                                                  => {Big Lebowski, The (1998)}                    0.04422936  0.3460989 0.1277940 3.302958  1211
## [46]  {Apocalypse Now (1979)}                                                       => {Graduate, The (1967)}                        0.04379109  0.3449367 0.1269540 3.727059  1199
## [47]  {Shrek (2001)}                                                                => {Beautiful Mind, A (2001)}                    0.05058437  0.3414694 0.1481373 3.275905  1385
## [48]  {Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)} => {North by Northwest (1959)}                   0.04638422  0.3401178 0.1363769 3.660545  1270
## [49]  {Amadeus (1984)}                                                              => {North by Northwest (1959)}                   0.03725347  0.3398867 0.1096056 3.658057  1020
## [50]  {Shrek (2001)}                                                                => {X-Men (2000)}                                0.04915997  0.3318540 0.1481373 3.325829  1346
## [51]  {Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)} => {Graduate, The (1967)}                        0.04474069  0.3280664 0.1363769 3.544774  1225
## [52]  {Men in Black (a.k.a. MIB) (1997)}                                            => {X-Men (2000)}                                0.04225712  0.3272984 0.1291088 3.280173  1157
## [53]  {Rain Man (1988)}                                                             => {When Harry Met Sally... (1989)}              0.03875091  0.3261605 0.1188093 3.263989  1061
## [54]  {Pirates of the Caribbean: The Curse of the Black Pearl (2003)}               => {Kill Bill: Vol. 2 (2004)}                    0.03772827  0.3259703 0.1157414 3.688045  1033
## [55]  {True Lies (1994)}                                                            => {Crimson Tide (1995)}                         0.04696859  0.3257345 0.1441928 3.190917  1286
## [56]  {Casablanca (1942)}                                                           => {Vertigo (1958)}                              0.04579985  0.3252918 0.1407962 4.096822  1254
## [57]  {Gladiator (2000)}                                                            => {Bourne Identity, The (2002)}                 0.05292184  0.3231490 0.1637692 3.283050  1449
## [58]  {Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)}                     => {Traffic (2000)}                              0.03904310  0.3204436 0.1218408 4.284056  1069
## [59]  {Kill Bill: Vol. 1 (2003)}                                                    => {V for Vendetta (2006)}                       0.03400292  0.3170981 0.1072316 4.615707   931
## [60]  {Crouching Tiger, Hidden Dragon (Wo hu cang long) (2000)}                     => {X-Men (2000)}                                0.03813002  0.3129496 0.1218408 3.136369  1044
## [61]  {Eternal Sunshine of the Spotless Mind (2004)}                                => {O Brother, Where Art Thou? (2000)}           0.03794741  0.3072147 0.1235208 3.216649  1039
## [62]  {Stand by Me (1986)}                                                          => {Platoon (1986)}                              0.03341855  0.3065327 0.1090212 3.912757   915
## [63]  {Truman Show, The (1998)}                                                     => {Catch Me If You Can (2002)}                  0.03305332  0.3042017 0.1086560 3.748444   905
## [64]  {Monty Python's Life of Brian (1979)}                                         => {Brazil (1985)}                               0.03173850  0.3023660 0.1049671 4.040401   869
## [65]  {Groundhog Day (1993)}                                                        => {Fish Called Wanda, A (1988)}                 0.04653031  0.3020389 0.1540541 3.217831  1274
## [66]  {Aliens (1986)}                                                               => {Total Recall (1990)}                         0.04306063  0.3000000 0.1435354 3.909567  1179
## [67]  {Fifth Element, The (1997)}                                                   => {Total Recall (1990)}                         0.03553689  0.2998459 0.1185172 3.907559   973
## [68]  {Ferris Bueller's Day Off (1986)}                                             => {Christmas Story, A (1983)}                   0.03371074  0.2997727 0.1124543 4.475341   923
## [69]  {Apocalypse Now (1979)}                                                       => {North by Northwest (1959)}                   0.03805698  0.2997699 0.1269540 3.226297  1042
## [70]  {Shakespeare in Love (1998)}                                                  => {Graduate, The (1967)}                        0.03363769  0.2996096 0.1122717 3.237297   921
## [71]  {Goodfellas (1990)}                                                           => {Jaws (1975)}                                 0.04605551  0.2987444 0.1541636 2.985263  1261
## [72]  {Clockwork Orange, A (1971)}                                                  => {Jaws (1975)}                                 0.03816654  0.2986568 0.1277940 2.984388  1045
## [73]  {Ghostbusters (a.k.a. Ghost Busters) (1984)}                                  => {Big (1988)}                                  0.03761870  0.2986373 0.1259679 4.039866  1030
## [74]  {Wizard of Oz, The (1939)}                                                    => {Sound of Music, The (1965)}                  0.03484295  0.2978458 0.1169832 4.713883   954
## [75]  {Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)} => {Jaws (1975)}                                 0.04061359  0.2978040 0.1363769 2.975866  1112
## [76]  {Shrek (2001)}                                                                => {Shrek 2 (2004)}                              0.04408327  0.2975838 0.1481373 5.626965  1207
## [77]  {Finding Nemo (2003)}                                                         => {Shrek 2 (2004)}                              0.03265157  0.2975042 0.1097516 5.625458   894
## [78]  {Apocalypse Now (1979)}                                                       => {This Is Spinal Tap (1984)}                   0.03776479  0.2974684 0.1269540 3.521264  1034
## [79]  {Trainspotting (1996)}                                                        => {Donnie Darko (2001)}                         0.03732652  0.2971794 0.1256026 3.131937  1022
## [80]  {Stand by Me (1986)}                                                          => {Sting, The (1973)}                           0.03239591  0.2971524 0.1090212 3.328983   887
## [81]  {Good Will Hunting (1997)}                                                    => {Bourne Identity, The (2002)}                 0.04470416  0.2970153 0.1505113 3.017543  1224
## [82]  {Ghostbusters (a.k.a. Ghost Busters) (1984)}                                  => {Fish Called Wanda, A (1988)}                 0.03725347  0.2957379 0.1259679 3.150702  1020
## [83]  {Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)}                        => {Big Lebowski, The (1998)}                    0.04101534  0.2953709 0.1388605 2.818841  1123
## [84]  {Wizard of Oz, The (1939)}                                                    => {Fish Called Wanda, A (1988)}                 0.03455077  0.2953481 0.1169832 3.146549   946
## [85]  {Amadeus (1984)}                                                              => {Raising Arizona (1987)}                      0.03235939  0.2952349 0.1096056 3.970301   886
## [86]  {Monsters, Inc. (2001)}                                                       => {Kill Bill: Vol. 2 (2004)}                    0.03254200  0.2951308 0.1102630 3.339125   891
## [87]  {Lord of the Rings: The Fellowship of the Ring, The (2001)}                   => {Incredibles, The (2004)}                     0.06143170  0.2950877 0.2081812 3.111090  1682
## [88]  {Stand by Me (1986)}                                                          => {This Is Spinal Tap (1984)}                   0.03210373  0.2944724 0.1090212 3.485799   879
## [89]  {Being John Malkovich (1999)}                                                 => {Office Space (1999)}                         0.03995617  0.2940860 0.1358656 2.878826  1094
## [90]  {Rear Window (1954)}                                                          => {This Is Spinal Tap (1984)}                   0.03144631  0.2939570 0.1069759 3.479698   861
## [91]  {Memento (2000)}                                                              => {Kill Bill: Vol. 2 (2004)}                    0.05120526  0.2939203 0.1742148 3.325429  1402
## [92]  {Green Mile, The (1999)}                                                      => {Dead Poets Society (1989)}                   0.03151936  0.2938372 0.1072681 3.019994   863
## [93]  {Clockwork Orange, A (1971)}                                                  => {Graduate, The (1967)}                        0.03754565  0.2937982 0.1277940 3.174505  1028
## [94]  {Truman Show, The (1998)}                                                     => {Office Space (1999)}                         0.03192111  0.2937815 0.1086560 2.875845   874
## [95]  {Groundhog Day (1993)}                                                        => {Big Lebowski, The (1998)}                    0.04525201  0.2937411 0.1540541 2.803287  1239
## [96]  {American History X (1998)}                                                   => {Departed, The (2006)}                        0.03882396  0.2933223 0.1323594 3.745879  1063
## [97]  {Shakespeare in Love (1998)}                                                  => {Toy Story 2 (1999)}                          0.03290723  0.2931034 0.1122717 2.798177   901
## [98]  {Full Metal Jacket (1987)}                                                    => {Donnie Darko (2001)}                         0.03403944  0.2930818 0.1161432 3.088752   932
## [99]  {Monsters, Inc. (2001)}                                                       => {Office Space (1999)}                         0.03228634  0.2928122 0.1102630 2.866356   884
## [100] {Stand by Me (1986)}                                                          => {Airplane! (1980)}                            0.03192111  0.2927973 0.1090212 3.502311   874

Conclusions

In this project, I used association rule learning to identify and describe the key dependencies between films. The analysis, limited to 100 selected films, is intended to help film enthusiasts choose the next film to watch.
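As a rough illustration of how such a rule list could be queried, the base R sketch below looks up a suggestion for a film the viewer has already seen. The three rows are copied from the list above; the data frame layout and the recommend() helper are hypothetical and not part of the original analysis.

```r
# A few rules copied from the printed list (watched film => suggestion)
recs <- data.frame(
  watched    = c("Rear Window (1954)", "Rear Window (1954)",
                 "Finding Nemo (2003)"),
  suggestion = c("North by Northwest (1959)", "Citizen Kane (1941)",
                 "Incredibles, The (2004)"),
  confidence = c(0.5165586, 0.4298395, 0.4975042),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top-n suggestions for a watched title,
# ranked by rule confidence
recommend <- function(title, rules_df, n = 1) {
  hits <- rules_df[rules_df$watched == title, ]
  hits <- hits[order(hits$confidence, decreasing = TRUE), ]
  head(hits$suggestion, n)
}

print(recommend("Rear Window (1954)", recs))
```

A title without any matching rule simply yields an empty result, which a caller would need to handle.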