Narrative Summary
Set Up
Network Plot
In-Degree, Out-Degree, Degree, Reciprocity, and Clustering Coefficient of Follow
Betweenness
K-Step Reach
Top 15 Books
Followers Analysis
Summary

Narrative Summary

For this project, I used my book-focused Instagram account, which has over 12,000 followers, to explore my followers and their top 10 books of last year. I want to see whether shared reading preferences influence their social circle (who they follow) and whether there are any trends in the books they read. Specifically, I ask what my followers’ reading preferences are, how similar those preferences are to each other, and whether shared reading preferences among my followers influence their social connections within the network. I also hope to see how connected the accounts in the network are and whether any nodes stand out as especially important to it.

Set Up

The nodes in this network are the accounts of 100 followers. In the book network, an edge connects two accounts that listed at least one of the same books; in the following network, an edge records that one account follows another. I collected the data from May 6 to May 18. For the book data, the accounts sent me lists of their top 10 books, which I entered manually. For the following data, I copied each account’s following list and used a website to compare it against the list of the other accounts in the network and find the account names that appeared in both. Here are the links to my data sets: Books, Following

library(dplyr)

library(knitr)
library(igraph)

library(ggplot2)

knitr::opts_chunk$set(
  fig.width = 12,  
  fig.height = 8, 
  dpi = 300 # plots less blurry when converted to word 
)

setwd("~/Desktop/stats/Final")
getwd()

## [1] "/Users/audriellestaples/Desktop/stats/Final"

# Account-by-book matrix: rows are accounts, columns are books (1 = listed)
books <- read.csv("Books.csv", header = TRUE, row.names = 1)
books <- as.matrix(books)

# Account-by-account counts of shared books
same_books <- books %*% t(books)

# Two-mode (account-book) network and its one-mode projections
booknetwork <- graph_from_biadjacency_matrix(books)
booknetwork.bp <- bipartite_projection(booknetwork)

# Weighted, undirected network of accounts tied by shared books
same_booksnet <- graph_from_adjacency_matrix(same_books, mode = "undirected",
                                             diag = FALSE, weighted = TRUE)

# Directed follow network built from the following matrix
following <- read.csv("_Key 2.csv", header = TRUE, row.names = 1)
following <- as.matrix(following)
graph <- graph_from_adjacency_matrix(following)
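
The matrix product used to build `same_books` can be easier to follow on a small case. The sketch below uses a hypothetical three-account, three-book matrix (`toy_books` and its values are made up for illustration, not taken from Books.csv) to show that multiplying the matrix by its transpose counts the books each pair of accounts has in common.

```r
# Toy account-by-book matrix (hypothetical data): 1 = the account listed the book
toy_books <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A1", "A2", "A3"), c("Book.X", "Book.Y", "Book.Z"))
)

# Entry [i, j] is the number of books that accounts i and j both listed;
# the diagonal is the length of each account's own list
toy_shared <- toy_books %*% t(toy_books)
print(toy_shared)
```

Here A1 and A2 share one book, A1 and A3 share none, and the diagonal is dropped later by `diag = FALSE` when the network is built.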

Network Plot

For this network plot, I wanted to see how connected the accounts are through shared favorite books. Node size is scaled by the number of other accounts in this network that each account follows (out-degree).

set.seed(2020)
follow_degree <- degree(graph, mode="total")
follow_indeg <- degree(graph, mode="in") 
follow_outdeg <- degree(graph, mode = "out")
dat <- data.frame(follow_degree, follow_indeg, follow_outdeg)

plot(booknetwork.bp$proj1,
     vertex.color = "pink", edge.color = "lightgrey", vertex.label = NA,
     main = "Book Network",
     sub = "Nodes sized by no. of people they're following",
     vertex.size = follow_outdeg * 0.7)

In-Degree, Out-Degree, Degree, Reciprocity, and Clustering Coefficient of Follow

The degree measures show that not everyone follows each other back. Comparing in-degree with out-degree, 22 accounts have the same value for both, while the large majority differ, with the largest difference being 10. Many of these are also book accounts, so accounts with a greater in-degree may be more particular about who they follow, or their content may be more visible and better known to other accounts: they attract more follows than they return. Accounts with a greater out-degree may be less selective about following other accounts and may not mind whether they are followed back, or they may follow many accounts in the hope of being noticed and followed in return. While collecting the data, I noticed that many of the accounts with more followers than following overall are the same accounts that have a higher in-degree than out-degree in the printed table, and vice versa. I think the in-degree and out-degree in this network reflect each account’s overall followers and following.

The default reciprocity score for the following network is 0.6825939, meaning that about 68% of follow edges are returned: if one account follows another, there is a good chance the follow is reciprocated. The ratio reciprocity score is 0.5181347, meaning that just over half of the connected pairs of accounts follow each other, so one-directional edges are still prominent.

The clustering coefficient is 0.2240185, a low score: only about 22% of the connected triplets in the graph are closed. Overall, there is a low density of close-knit communities of accounts following each other.

#the accounts that have a difference in out-degree and in-degree in a bar graph
degree_difference <- abs(follow_indeg - follow_outdeg)
degree_df <- data.frame(Node = V(graph)$name, Degree_Difference = degree_difference)
degree_df_positive <- degree_df[degree_df$Degree_Difference > 0, ]
degree_df_positive_ordered <- degree_df_positive[order(-degree_df_positive$Degree_Difference), ]
#print(degree_df_positive_ordered)

ggplot(degree_df_positive_ordered, aes(x = reorder(Node, -Degree_Difference), y = Degree_Difference)) +
  geom_bar(stat = "identity", fill = "pink") +
  theme_minimal() +
  labs(title = "Degree Difference Bar Graph",
       x = "Account",
       y = "Degree Difference") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1), plot.title = element_text(hjust = 0.5))
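
The bar graph uses the absolute difference, which hides whether an account is followed more than it follows. As a minimal sketch on a hypothetical three-account graph (the account names and edges are made up, and `signed_diff` is not part of the analysis above), the signed difference keeps that direction:

```r
library(igraph)

# Hypothetical follow edges: each row is "follower -> followed"
toy_edges <- matrix(c("A", "B",
                      "C", "B",
                      "B", "A"), ncol = 2, byrow = TRUE)
toy_graph <- graph_from_edgelist(toy_edges, directed = TRUE)

toy_in  <- degree(toy_graph, mode = "in")   # follows received
toy_out <- degree(toy_graph, mode = "out")  # follows sent

# Positive: followed by more accounts than it follows; negative: the reverse
signed_diff <- toy_in - toy_out
print(signed_diff)
```

In this toy graph, B receives more follows than it sends, C sends one follow and receives none, and A is balanced.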

#reciprocity for following
reciprocity(graph, mode = c("default"))

## [1] 0.6825939

reciprocity(graph, mode = c("ratio"))

## [1] 0.5181347

#clustering coefficient 
transitivity(graph, type="global")

## [1] 0.2240185
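
The two reciprocity modes answer slightly different questions, which a small case may make clearer. This is a sketch on a hypothetical graph (`toy_follow` is made up): the default mode is the share of directed edges that are returned, while the ratio mode is the share of connected pairs that are mutual. The counts in the last lines are also hypothetical; they are chosen only because they happen to reproduce the two reported values, and the report itself does not state the actual counts.

```r
library(igraph)

# Hypothetical follows: A and B follow each other, A also follows C
toy_follow <- graph_from_edgelist(
  matrix(c("A", "B",
           "B", "A",
           "A", "C"), ncol = 2, byrow = TRUE),
  directed = TRUE
)

# Default: 2 of the 3 directed edges are returned
rec_default <- reciprocity(toy_follow, mode = "default")
# Ratio: 1 of the 2 connected pairs is mutual
rec_ratio <- reciprocity(toy_follow, mode = "ratio")

# Same formulas written out by hand with hypothetical pair counts
mutual  <- 100  # pairs that follow each other (assumed)
one_way <- 93   # pairs with a follow in one direction only (assumed)
by_hand_default <- 2 * mutual / (2 * mutual + one_way)
by_hand_ratio   <- mutual / (mutual + one_way)

print(c(rec_default, rec_ratio))
print(round(c(by_hand_default, by_hand_ratio), 7))
```

The default score is always at least as large as the ratio score, because each mutual pair contributes two edges to the numerator but only one pair.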

par(mfrow=c(1,3), mar=c(1,1,1,1))

set.seed(2022)
plot(graph,vertex.size= follow_indeg *0.7, edge.arrow.size=0.28 ,
      vertex.label =NA, edge.width = 0.7, main = "In Degree", vertex.color = "pink"
      )

set.seed(2022)
plot(graph,
     vertex.size = follow_outdeg*0.7, edge.arrow.size=0.28, edge.width = 0.7,
     vertex.label =NA, main = "Out Degree", vertex.color = "pink"
      )

set.seed(2022)
plot(graph, vertex.size = follow_degree*0.5, edge.arrow.size=0.28, 
      edge.width = 0.7, main = "Degree", vertex.color = "pink",
      vertex.label =NA 
      )

Betweenness

In the betweenness plot for the book network, a few nodes are clearly much larger than the rest. For the books, I wanted to see whether any accounts lie on the shortest paths between other accounts, and I then did the same for the following network. Nodes with high betweenness sit on many of the shortest paths between other accounts, so their central position suggests they act as “middlemen” connecting different parts of the network. The larger node sizes could also imply that these accounts listed several of the most popular books in the network, which would place them on the shortest paths between many other nodes. About three or four nodes stand out as very large, indicating their high importance in the network. In the betweenness plot for the following network, one node is clearly larger than the rest. This node is important because it creates connections between various followers, and its position in the center shows its role in maintaining the network’s connectivity and flow. Comparing the two plots side by side, the book network has many more edges than the following network.

# Betweenness for books
# same_booksnet is undirected, so the `directed` argument has no effect and is omitted.
# igraph uses the "weight" edge attribute by default and treats it as a distance,
# so pairs that share more books count as farther apart.
between1 <- betweenness(same_booksnet, normalized = FALSE)
bet_normalized1 <- betweenness(same_booksnet, normalized = TRUE)
bet.data1 <- data.frame(between1, bet_normalized1)
#bet.data1

# Betweenness for follow (directed paths)
between <- betweenness(graph, directed = TRUE, normalized = FALSE)
bet_normalized <- betweenness(graph, directed = TRUE, normalized = TRUE)
bet.dat <- data.frame(between, bet_normalized)
#bet.dat

par(mfrow=c(1,2), mar=c(1,1,1,1))
set.seed(2022)
plot(same_booksnet, edge.arrow.size=.2, vertex.label=NA, vertex.color="pink",
     vertex.size=between1*.1, main="Betweenness (books)")
set.seed(2022)
plot(graph, edge.arrow.size=.2, vertex.label=NA, vertex.color="pink",
     vertex.size=between*.02, main="Betweenness (following)")
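
The book network carries a `weight` attribute (the number of shared books), and igraph's `betweenness()` treats that attribute as a distance by default. The sketch below uses a hypothetical three-account network (`toy_net` and its weights are made up) to show how that choice can change which node looks like a middleman. Ignoring the weights or inverting them are two possible alternatives, not what the analysis above does.

```r
library(igraph)

# Hypothetical shared-book counts: A and C share 5 books, the other pairs share 1
toy_ties <- data.frame(from   = c("A", "B", "A"),
                       to     = c("B", "C", "C"),
                       weight = c(1, 1, 5))
toy_net <- graph_from_data_frame(toy_ties, directed = FALSE)

# Default: weights act as distances, so the heavy A-C tie looks "long"
# and the shortest A-to-C path runs through B
bet_default <- betweenness(toy_net)

# Alternative 1: ignore the weights and count steps only
bet_unweighted <- betweenness(toy_net, weights = NA)

# Alternative 2: invert the weights so more shared books means a shorter distance
bet_inverse <- betweenness(toy_net, weights = 1 / E(toy_net)$weight)

print(rbind(bet_default, bet_unweighted, bet_inverse))
```

In this toy case B has betweenness 1 only under the default, so which accounts stand out can depend on how the weights are interpreted.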

K-Step Reach

For the book network, most of the accounts have a score above 0.8, which means they can reach more than 80% of the network within two steps. In the plot, the nodes with scores close to 0 sit on the outside of the network and have one connection or none to the other nodes. With over 500 books listed across the network, almost every account shares at least one favorite book with someone else, so it makes sense that the majority have a score close to 1. I highlighted the one node with a score of 0.01, which has no edges to any other node, because it is otherwise hard to see. Because the score counts the node itself, 0.01 (1 out of 100) means the node reaches no one else. For the following plot, the nodes in the middle have higher scores and can easily reach other nodes because of their connections, which also affects their position in the network. The outer nodes again have lower scores, so they are drawn smaller than the nodes toward the middle. I also highlighted and grouped the three nodes with a score of 0.01, since they don’t follow anyone and no one follows them in this network.

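# K-step reach for books: share of the network within 2 steps of each node
# (the count includes the node itself)
reach3 <- function(x) {
  r <- numeric(vcount(x))
  for (i in seq_len(vcount(x))) {
    n <- neighborhood(x, 2, nodes = i)
    r[i] <- length(unlist(n)) / vcount(x)
  }
  r
}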

reach3(same_booksnet)

##   [1] 0.72 0.01 0.94 0.45 0.98 0.91 0.98 0.98 0.98 0.89 0.95 0.95 0.98 0.93 0.98
##  [16] 0.98 0.95 0.98 0.54 0.98 0.95 0.93 0.95 0.98 0.96 0.77 0.99 0.98 0.88 0.98
##  [31] 0.92 0.95 0.94 0.96 0.98 0.86 0.98 0.97 0.91 0.89 0.98 0.96 0.98 0.91 0.98
##  [46] 0.94 0.05 0.98 0.98 0.98 0.91 0.95 0.89 0.94 0.92 0.75 0.94 0.89 0.98 0.98
##  [61] 0.98 0.98 0.94 0.94 0.93 0.91 0.93 0.95 0.98 0.84 0.93 0.98 0.98 0.72 0.95
##  [76] 0.96 0.87 0.91 0.93 0.94 0.56 0.82 0.89 0.64 0.92 0.98 0.87 0.94 0.98 0.98
##  [91] 0.94 0.92 0.96 0.98 0.96 0.98 0.89 0.98 0.98 0.98

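# K-step reach for follow: share of the network within 2 steps of each node
# (the count includes the node itself; neighborhood() defaults to mode = "all",
# so follow direction is ignored)
reach2 <- function(x) {
  r <- numeric(vcount(x))
  for (i in seq_len(vcount(x))) {
    n <- neighborhood(x, 2, nodes = i)
    r[i] <- length(unlist(n)) / vcount(x)
  }
  r
}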
reach2(graph)

##   [1] 0.03 0.03 0.47 0.03 0.41 0.45 0.60 0.68 0.69 0.51 0.40 0.05 0.35 0.40 0.57
##  [16] 0.64 0.41 0.47 0.33 0.68 0.42 0.01 0.38 0.45 0.41 0.55 0.81 0.20 0.60 0.27
##  [31] 0.59 0.65 0.62 0.52 0.55 0.57 0.56 0.23 0.01 0.53 0.24 0.30 0.27 0.14 0.50
##  [46] 0.60 0.60 0.67 0.52 0.61 0.46 0.63 0.01 0.48 0.25 0.38 0.67 0.33 0.62 0.55
##  [61] 0.73 0.62 0.75 0.38 0.71 0.46 0.47 0.41 0.41 0.40 0.56 0.58 0.56 0.25 0.47
##  [76] 0.68 0.80 0.62 0.44 0.58 0.08 0.32 0.28 0.15 0.59 0.73 0.48 0.19 0.59 0.38
##  [91] 0.28 0.35 0.57 0.62 0.67 0.57 0.32 0.54 0.32 0.47

par(mfrow=c(1,2), mar=c(1,1,1,1))

highlight_nodes <- c("A2")
highlight_indices <- which(V(same_booksnet)$name %in% highlight_nodes)
mark_groups <- list(highlight_indices)
mark_colors <- c("#C5E5E7")

book_reach3 <- reach3(same_booksnet)
set.seed(3022)
plot(same_booksnet, edge.arrow.size=.2, vertex.label=NA, vertex.color="pink",
     vertex.size=book_reach3*7, main = "Reach (books)", mark.groups = mark_groups, 
     mark.col = mark_colors, 
     mark.border = NA)


highlight_node1 <- c("C4", "E3", "F8")
highlight_indices1 <- which(V(graph)$name %in% highlight_node1)
mark_groups1 <- list(highlight_indices1)
mark_colors1 <- c("#C5E5E7")

follow_reach2 <- reach2(graph)
set.seed(3022)
plot(graph, edge.arrow.size=.19, vertex.label=NA, vertex.color="pink",
     vertex.size=follow_reach2*10, main = "Reach (following)", mark.groups = mark_groups1, 
     mark.col = mark_colors1, 
     mark.border = NA)
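
The two-step reach score can also be computed without a loop. The sketch below is one possible alternative, not the functions used above: it relies on igraph's `ego_size()`, which counts the nodes within a given number of steps (including the node itself), and runs on a hypothetical five-account network (`toy_reach_net` is made up) with one isolated account.

```r
library(igraph)

# Hypothetical network: a chain A - B - C - D plus an isolated account E
toy_reach_net <- graph_from_data_frame(
  d = data.frame(from = c("A", "B", "C"), to = c("B", "C", "D")),
  directed = FALSE,
  vertices = data.frame(name = c("A", "B", "C", "D", "E"))
)

n <- vcount(toy_reach_net)

# Nodes within 2 steps of each node, counting the node itself
within_two <- ego_size(toy_reach_net, order = 2, mode = "all")
names(within_two) <- V(toy_reach_net)$name

# Same definition as the reach functions above: the isolate scores 1 / n
reach_with_self <- within_two / n

# Variant that excludes the node itself, so an isolate scores exactly 0
reach_without_self <- (within_two - 1) / (n - 1)

print(reach_with_self)
print(reach_without_self)
```

With 100 accounts, the first definition gives an isolated account a score of 0.01, which matches the lowest scores printed above.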

Top 15 Books

Here I wanted to explore the 15 most-liked books of 2023 among these 100 followers. Ten of the 15 books listed here were released in 2023, which suggests that many of these readers keep up with new releases and that books tend to be most widely read in the year they are released. Even though this is a small network, I think it highlights last year’s social trends in books. Twelve of the 15 books are romances, which makes sense because romance is one of the best-selling genres. A majority of my followers are women, and the majority of romance readers are women.

book_edges <- igraph::as_data_frame(booknetwork, what = "edges")

book_counts <- book_edges %>%
  group_by(to) %>%
  summarise(count = n())

sorted_books <- book_counts[order(-book_counts$count), ]
top_15_books <- head(sorted_books, 15)
print(top_15_books)

## # A tibble: 15 × 2
##    to                       count
##    <chr>                    <int>
##  1 Fourth.Wing                 36
##  2 Divine.Rivals               23
##  3 Happy.Place                 19
##  4 Heartless                   13
##  5 The.Seven.Year.Slip         12
##  6 Yours.Truly                 12
##  7 Love..Theoretically.        11
##  8 Better.Than.the.Movies      10
##  9 Caught.Up                   10
## 10 A.Court.of.Mist.and.Fury     9
## 11 Final.Offer                  9
## 12 Archer.s.Voice               8
## 13 Out.On.a.Limb                8
## 14 The.Right.Move               8
## 15 Beach.Read                   7
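
Because each column of the account-by-book matrix is one book, the same kind of ranking can also be read straight from column sums. This is a minimal sketch on hypothetical data (`toy_lists` is made up), offered as an alternative to the edge-list approach above rather than a replacement for it.

```r
# Toy account-by-book matrix (hypothetical data): 1 = the account listed the book
toy_lists <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A1", "A2", "A3"), c("Book.X", "Book.Y", "Book.Z"))
)

# Number of accounts that listed each book, most-listed first
toy_counts <- sort(colSums(toy_lists), decreasing = TRUE)

# Keep the top two books in this toy example
toy_top <- head(toy_counts, 2)
print(toy_top)
```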

Followers Analysis

Here I wanted to see whether accounts that share several favorite books with other accounts are more likely to follow each other. I took every pair of accounts that shared two or more books: 397 pairs out of the 4,950 possible pairs among the 100 accounts. The first table lists these pairs, where “Follows” indicates whether the first account follows the second, “Followed_by” indicates whether the second account follows the first, and the last column gives the number of favorite books they share. Of the 397 pairs, only 52 (about 13%) had at least one account following the other, as shown in the second table. Based on these findings, I don’t believe that whether accounts follow each other has much impact on their favorite books.

edges <- igraph::as_data_frame(same_booksnet, what = "edges")

# Keep each unordered pair of accounts once
edges <- edges %>%
  dplyr::mutate(pair = apply(edges[, c("from", "to")], 1, function(x) paste(sort(x), collapse = "-"))) %>%
  dplyr::distinct(pair, .keep_all = TRUE) %>%
  dplyr::select(-pair)

# Pairs of accounts that share two or more books
pairs_two_or_more <- edges[edges$weight >= 2, ]

result_table <- data.frame(Accounts = character(),
                           Follows = character(),
                           Followed_by = character(),
                           Same_Books = numeric())

# For each pair, look up both follow directions in the following network
for (i in seq_len(nrow(pairs_two_or_more))) {
  from <- pairs_two_or_more$from[i]
  to <- pairs_two_or_more$to[i]
  follows <- graph[from, to] == 1
  followed_by <- graph[to, from] == 1

  result_table <- rbind(result_table, data.frame(
    Accounts = paste(from, "-", to),
    Follows = ifelse(follows, "Yes", "No"),
    Followed_by = ifelse(followed_by, "Yes", "No"),
    Same_Books = pairs_two_or_more$weight[i]
  ))
}

result_table <- result_table[order(-result_table$Same_Books), ]
#print(result_table)

#summarized version of number of accounts that follow each other and don't
counts <- result_table %>%
  group_by(Follows, Followed_by) %>%
  summarize(Count = n())

## `summarise()` has grouped output by 'Follows'. You can override using the
## `.groups` argument.

print(counts)

## # A tibble: 4 × 3
## # Groups:   Follows [2]
##   Follows Followed_by Count
##   <chr>   <chr>       <int>
## 1 No      No            345
## 2 No      Yes            14
## 3 Yes     No              9
## 4 Yes     Yes            29
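
One way to put the counts above in context is to compare them with how often any two accounts in the network are connected at all. The sketch below only illustrates that comparison: `toy_adj` is a hypothetical four-account follow matrix, so `baseline_rate` is a toy value, while `shared_rate` uses the four counts printed in the table above.

```r
# Hypothetical follow matrix: toy_adj[i, j] = 1 means account i follows account j
toy_adj <- matrix(
  c(0, 1, 0, 0,
    1, 0, 1, 0,
    0, 0, 0, 0,
    0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("A", 1:4), paste0("A", 1:4))
)

# A pair counts as connected if at least one account follows the other
connected <- (toy_adj + t(toy_adj)) > 0
baseline_rate <- sum(connected[upper.tri(connected)]) / choose(nrow(toy_adj), 2)

# Share of pairs with two or more books in common that have at least one follow
shared_rate <- (14 + 9 + 29) / (345 + 14 + 9 + 29)

print(baseline_rate)  # toy value only
print(round(shared_rate, 3))
```

Running the same `baseline_rate` calculation on the real following matrix would give the number to compare against `shared_rate`.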

Summary

These plots and measures give insight into the dynamics of my Instagram network, focusing on shared reading preferences among my followers and their social connections. Within this network, most accounts have unequal in-degree and out-degree, which shows that follows are not always returned. Using measures like degree and betweenness, I found key nodes that are important within the network, suggesting their influence. The K-step reach scores show the reachability of nodes within the network, with many scores close to 1 in the book network indicating strong connectivity due to shared book preferences. The reciprocity scores show that if one account follows another, there is a good chance the follow is returned, although one-directional edges are still prominent. The clustering coefficient shows that only about 22% of the connected triplets in the graph are closed, so overall there is a low density of close-knit communities of accounts following each other. The network visualizations also help identify isolated nodes while highlighting the importance of others.

Overall, my findings shed light on how shared interests, such as reading preferences, relate to social connections within my Instagram network, offering insight into community dynamics and interactions and revealing trends in the most-listed books. Since there were 100 accounts and over 500 books listed, it is highly likely that there would be many connections between accounts. Looking further into the pairs of accounts that shared the most books, who accounts follow doesn’t impact their favorite books as much as I initially thought.

Additional work could focus less on who follows whom and more on interactions. Many accounts interact with each other and see each other’s posts and reviews through the explore page or recommended posts, which could influence the books they read. A limitation is that the data are people’s top 10 books: people may have read many of the same books but not enjoyed them, so those books weren’t listed. I don’t think following or interacting changes our personal opinion of whether we like a book, but it could have a large impact on which books we decide to pick up. Looking at everything an account read in a year would give a fuller picture of its reading preferences.

Exploring My Instagram Followers’ Reading Preferences and Their Following

Audrielle Staples

2024-05-31