What recommendations can be built from the similarity of user ratings for readers of Marvel comics?

Building recommendations for the entire dataset of 777 books from a community breakdown would not be very informative, so I chose one subgroup: a specific category of books.

Let’s examine the field that records which other “shelf” (category) a book is often assigned to (popular_shelves.3.name); this is where the “marvel” category is found.

books_net_info_check2 = books_net_info_copy %>%
  group_by(popular_shelves.3.name) %>%
  summarize(number_of_books_per_category = n()) %>%
  arrange(desc(number_of_books_per_category)) %>%
  rename("Category" = popular_shelves.3.name,
         "N_books" = number_of_books_per_category)
books_net_info_check2 
## # A tibble: 41 × 2
##    Category          N_books
##    <chr>               <int>
##  1 graphic-novel         232
##  2 fantasy                96
##  3 marvel                 52
##  4 horror                 46
##  5 graphic-novels         43
##  6 currently-reading      42
##  7 comics                 39
##  8 batman                 34
##  9 cómics                 30
## 10 favorites              28
## # ℹ 31 more rows

There are 52 books in the “marvel” category - a sufficient number to analyze.

# initial graph
comics_marvel = comics_net

# Filtering data in the dataset by "marvel" category
books_net_info_2 = books_net_info_copy %>% filter(popular_shelves.3.name=="marvel")

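# Remove non-"marvel" vertices from the graph.
# Note: this assumes the rows of books_net_info_copy are in the same order as the vertices of comics_net.
V(comics_marvel)$category = books_net_info_copy$popular_shelves.3.name
# %in% treats a missing category as "not marvel", which matches the filter() call above
comics_marvel1 = delete_vertices(comics_marvel, which(!(V(comics_marvel)$category %in% "marvel")))

# Add ID (filter() keeps the row order, so the rows still line up with the remaining vertices)
V(comics_marvel1)$book_id = books_net_info_2$book_id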
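
The subsetting step relies on the rows of the info table lining up with the vertices of the graph. One way to make this less fragile is to attach every attribute before subsetting, so that the attributes travel with the vertices. The sketch below illustrates the idea on made-up data: toy_net, toy_info and the shelf column are hypothetical stand-ins, not objects from this report.

```r
library(igraph)

# Hypothetical stand-ins for the graph and its info table.
toy_net <- make_ring(5)
toy_info <- data.frame(
  book_id = c(101, 102, 103, 104, 105),
  shelf   = c("marvel", "batman", "marvel", NA, "marvel")
)

# Assumption: row i of toy_info describes vertex i of toy_net.
stopifnot(nrow(toy_info) == vcount(toy_net))

# Attach both attributes BEFORE subsetting, so they stay aligned.
V(toy_net)$category <- toy_info$shelf
V(toy_net)$book_id  <- toy_info$book_id

# %in% treats NA as "not marvel", so vertices without a shelf are dropped too.
keep <- which(V(toy_net)$category %in% "marvel")
toy_marvel <- induced_subgraph(toy_net, keep)

V(toy_marvel)$book_id
```

With this ordering there is no need to assign book_id again after the vertices have been removed.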

Betweenness

To answer the recommendation question, we first use the betweenness centrality measure.

Betweenness centrality measures how many shortest paths between other pairs of nodes pass through a given node. Here the links between nodes represent similarity of book ratings, so the highest betweenness values mark books whose ratings fall between two otherwise unrelated or weakly related groups of books, while the lowest values mark books that do not link any groups, including those with the most unusual ratings. For example, if one group of books has ratings between 3 and 3.2 and another has ratings between 3.6 and 3.8, a book rated 3.4 will have the highest betweenness among these nodes (provided it is connected to both groups). The larger the groups and the fewer such bridge nodes there are (down to a single one), the higher the betweenness of a node acting as a bridge.

Thus, a node with high betweenness signals a comic that can be recommended to more than one group, whereas a node with low betweenness signals a comic that can be recommended to at most one group.
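
The bridge intuition can be checked on a small toy graph. The sketch below is only an illustration: the graph and the vertex names (a1 to a3, b1 to b3, bridge) are made up and do not come from the comics data.

```r
library(igraph)

# Toy graph (hypothetical): two triangles of similarly rated books
# that are joined only through a single "bridge" book.
toy <- graph_from_literal(
  a1 - a2, a2 - a3, a1 - a3,   # group A
  b1 - b2, b2 - b3, b1 - b3,   # group B
  a1 - bridge, bridge - b1     # the only links between the groups
)

toy_betw <- betweenness(toy)

# Vertices ordered from the strongest bridge to the weakest.
sort(toy_betw, decreasing = TRUE)
```

Every shortest path between group A and group B has to pass through the bridge vertex, so it ends up with the highest betweenness, while the vertices that only sit inside a triangle get zero.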

marvel_betw = books_net_info_2 %>% 
  transmute(book_id,
            betw = betweenness(comics_marvel1)) %>% 
  arrange(desc(betw)) %>% rename("Book ID" = book_id, "Betweenness"=betw)

head(marvel_betw)
##     Book ID Betweenness
## 1:  4645370       119.0
## 2: 23017961        89.0
## 3: 25066770        87.0
## 4: 17899546        60.0
## 5:    59962        48.5
## 6:    31981        18.0
plot(comics_marvel1,
     vertex.size=0.2*betweenness(comics_marvel1),
     vertex.label = V(comics_marvel1)$book_id,
     vertex.label.cex = 0.8)

Only 9 of the 52 comics have a non-zero betweenness value. This means that very few books act as bridges between groups: the graph is sparse and fragmented.
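
A count like this can be read directly off the betweenness vector. The sketch below uses a made-up toy graph rather than the comics data; on the report's graph the same expressions would presumably be applied to comics_marvel1.

```r
library(igraph)

# Toy graph (hypothetical): a path 1-2-3 plus two isolated vertices.
toy <- make_graph(c(1, 2, 2, 3), n = 5, directed = FALSE)

toy_betw <- betweenness(toy)

# Number and share of vertices that lie on at least one shortest path
# between two other vertices.
n_bridging <- sum(toy_betw > 0)
share_bridging <- n_bridging / vcount(toy)

c(n_bridging = n_bridging, share_bridging = share_bridging)
```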

Closeness

We can also use closeness centrality, which indicates how close a node is to the other nodes; the number of steps needed to get from one node to another plays the main role here. In this network, closeness depends on how common a rating is. The most common ratings have the highest closeness, because many other nodes can be reached from them in few steps. The rarest ratings have the lowest closeness, because the paths from them to most other nodes are the longest.

Accordingly, the higher the closeness, the more books can be recommended; the lower the closeness, the narrower the circle of possible recommendations.
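
As a small illustration of how closeness rewards nodes that reach everything in few steps, consider a chain of five books. This is a made-up toy graph, not the comics data.

```r
library(igraph)

# Toy graph (hypothetical): a chain of five books, 1-2-3-4-5.
chain <- make_ring(5, circular = FALSE)

# Normalized closeness: (n - 1) divided by the sum of distances to all others.
clo <- closeness(chain, normalized = TRUE)
round(clo, 3)

# The middle vertex reaches all the others in the fewest steps.
which.max(clo)
```

The two end vertices need up to four steps to reach the rest of the chain, so their closeness is the lowest.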

options(scipen = 9999)
marvel_betw_clo = books_net_info_2 %>% 
  transmute(book_id,
            closeness = closeness(comics_marvel1,normalized = TRUE),
            betw = betweenness(comics_marvel1)
            ) %>% 
  arrange(desc(closeness)) %>% rename("Book ID" = book_id, "Betweenness"=betw)
head(marvel_betw_clo,15)
##      Book ID closeness Betweenness
##  1: 17251115 1.0000000           2
##  2:   211461 1.0000000           0
##  3: 17182373 1.0000000           0
##  4:  9293295 1.0000000           0
##  5:   485381 1.0000000           0
##  6: 17277815 1.0000000           1
##  7:   207585 1.0000000           0
##  8:   105973 1.0000000           0
##  9: 17251114 0.7500000           0
## 10: 25066773 0.7500000           0
## 11: 18478257 0.6666667           0
## 12:   105925 0.6666667           0
## 13: 23018001 0.6000000           0
## 14:  4645370 0.4871795         119
## 15: 23017961 0.4222222          89

19 of the 52 vertices are isolated, which is why many books have no closeness value.
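
Isolated vertices can be counted from the degree vector. The sketch below shows this on a made-up toy graph (not the comics data) and computes closeness only on the part of the graph that has edges.

```r
library(igraph)

# Toy graph (hypothetical): one connected pair plus three isolated vertices.
toy <- make_graph(c(1, 2), n = 5, directed = FALSE)

# A vertex is isolated when it has no edges at all.
isolated <- degree(toy) == 0
sum(isolated)

# Closeness is only meaningful for vertices that can reach something,
# so compute it on the subgraph without the isolated vertices.
connected_part <- induced_subgraph(toy, which(!isolated))
pair_closeness <- closeness(connected_part, normalized = TRUE)
pair_closeness
```

Both vertices of the pair get a closeness of 1, even though they are cut off from everything else. This is worth keeping in mind when reading very high closeness values in a fragmented graph.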

vertex_size = 10*closeness(comics_marvel1, normalized = TRUE)
vertex_size[is.na(vertex_size)] = 0

plot(comics_marvel1,
     vertex.size= vertex_size,
     vertex.label = V(comics_marvel1)$book_id,
     vertex.label.cex = 0.8)

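A few groups of nodes have high closeness, but most nodes lie in a rather narrow range of values, from 0.2435897 to 0.3877551, which indicates that the nodes are not very close to each other. The nodes with a closeness of 1 have a betweenness of only 0 to 2 in the table above, so they appear to belong to very small groups rather than to the core of the network.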

Communities

Partitioning the graph into communities makes it easy to decide what to recommend: if a person liked book A, we can recommend the other books from the community that contains book A. To choose a partitioning method, let’s compare the modularity that different methods achieve on this graph.

Walktrap modularity

wt <- walktrap.community(comics_marvel1)
modularity(wt)
## [1] 0.7288781

Fast Greedy modularity

fg <- fastgreedy.community(comics_marvel1)
modularity(fg)
## [1] 0.7288781

Edge Betweenness modularity

eb = edge.betweenness.community(comics_marvel1)
modularity(eb)
## [1] 0.7288781

Modularity is equally high for all three methods, so any of them could be used; we use Walktrap.
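
The three modularity values can also be collected in a single step, which makes the comparison easier to read. The sketch below is a minimal illustration on a made-up toy graph; the methods list and its names are assumptions introduced here, not part of the original analysis.

```r
library(igraph)

# Toy graph (hypothetical): two triangles joined by a single edge.
toy <- graph_from_literal(
  a1 - a2, a2 - a3, a1 - a3,
  b1 - b2, b2 - b3, b1 - b3,
  a1 - b1
)

# Community detection functions to compare, keyed by a readable label.
methods <- list(
  walktrap         = cluster_walktrap,
  fast_greedy      = cluster_fast_greedy,
  edge_betweenness = cluster_edge_betweenness
)

# Run every method on the same graph and keep only its modularity.
mods <- sapply(methods, function(f) modularity(f(toy)))
mods
```

Equal modularity values suggest, but do not by themselves prove, that the methods produced the same partition; comparing the membership vectors would settle that.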

set.seed(12346)
plot(wt, comics_marvel1,
     vertex.label = books_net_info_2$title_without_series,
     vertex.label.cex = 0.9,
     vertex.color = membership(wt),
     vertex.size=0.2*betweenness(comics_marvel1),
     edge.width=0.0001)

Conclusion

Using the Walktrap partitioning method, 27 communities were formed. Only 8 of them contain more than one node; the remaining 19 communities consist of a single node, so these books cannot be recommended on the basis of similarity of user ratings, regardless of which other book from this graph a person liked. Only 3 books (“The Invincible Iron Man, Volume 1: The Five Nightmares”, “Storm, Vol. 1: Make it Rain”, “Hawkeye, Volume 5: All-New Hawkeye”) can be recommended in more than one community based on similarity of user ratings. The other recommendation relationships are shown in the plot above.
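
The rule "recommend the other books from the same community" can be written as a small helper. The sketch below is only an illustration under stated assumptions: recommend_same_community is a hypothetical function, the toy graph is made up, and vertex names stand in for book titles.

```r
library(igraph)

# Hypothetical helper (not part of the report): the other books that share
# a community with `book`. Vertex names are used as book titles.
recommend_same_community <- function(comm, graph, book) {
  memb <- as.integer(membership(comm))
  idx <- match(book, V(graph)$name)
  if (is.na(idx)) stop("unknown book: ", book)
  peers <- V(graph)$name[memb == memb[idx]]
  setdiff(peers, book)
}

# Toy graph (made up): two separate triangles and one isolated book.
toy <- graph_from_literal(
  a1 - a2, a2 - a3, a1 - a3,
  b1 - b2, b2 - b3, b1 - b3
)
toy <- add_vertices(toy, 1, name = "lonely")

toy_comm <- cluster_walktrap(toy)

# Number of communities that consist of a single book.
n_singletons <- sum(sizes(toy_comm) == 1)
n_singletons

rec_a1 <- recommend_same_community(toy_comm, toy, "a1")
rec_lonely <- recommend_same_community(toy_comm, toy, "lonely")
rec_a1
rec_lonely
```

A book in a singleton community yields an empty recommendation list, which mirrors the situation of the single-node communities described above.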

Similarity of user ratings is not a very good criterion for selecting recommendations among comics in the “marvel” category: there are too few links between nodes in the graph (in other words, user ratings vary too much).