In a previous case study, we completed a sentiment analysis of four popular Learning Management Systems (LMS): Google Classroom, Canvas, Moodle, and Blackboard. We evaluated emotions and sentiment towards each LMS by assessing the most common unigrams (single words), and we evaluated strengths and weaknesses using public opinion pulled from the Twitter REST API.
In this case study, we evaluate connections between the most frequent word pairs (bigrams) through Social Network Analysis, using the English-language tweets about the four LMSs observed previously.
Guiding Questions:
Let’s first load the packages that we will use to answer our questions. We will also define a regular expression called replace_reg that matches the strings we want to delete from the tweets: URLs, HTML entities, a few manually chosen noise tokens, and the retweet marker RT.
library(tidytext)
library(tidyverse)
library(network)
library(sna)
library(visNetwork)
library(threejs)
library(ndtv)
library(qgraph)
library(splitstackshape)
library(tidyr)
library(stringr)
library(readxl)
library(readr)
# For visualizations
library(vtree)
library(igraph)
library(ggraph)
library(tidygraph)
library(networkD3)
library(ggplot2)
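# regex for cleaning tweets: URLs, HTML entities, noise tokens, and the RT marker
# (\\b must be double-escaped in an R string; a single \b is a backspace character)
replace_reg <- "https?://[^\\s]+|&amp;|&lt;|&gt;|&d2l;|&aristotlemrs;|&aleks;|\\bRT\\b"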
# Custom Color Palette
my_colors <- c("#05A4C0", "#85CEDA", "#D2A7D8", "#A67BC5", "#BB1C8B", "#8D266E")
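To see what a cleaning pattern like this does, here is a minimal sketch on a made-up tweet. It uses base R's gsub() in place of str_replace_all() so that it runs without extra packages. The names clean_reg and sample_tweet are hypothetical, and the pattern is a shortened version that assumes HTML entities appear in the raw text as &amp;amp;, &amp;lt;, and &amp;gt; style tokens.

```r
# Shortened cleaning pattern (illustrative; not the exact pattern used above)
clean_reg <- "https?://[^\\s]+|&amp;|&lt;|&gt;|\\bRT\\b"

# A made-up tweet with a retweet marker, an HTML entity, and a URL
sample_tweet <- "RT Check Canvas &amp; Moodle https://example.com now"

# Delete every match, then collapse the leftover whitespace
cleaned <- gsub(clean_reg, "", sample_tweet, perl = TRUE)
cleaned <- trimws(gsub("\\s+", " ", cleaned))
cleaned   # "Check Canvas Moodle now"
```

Only the standalone token RT is removed, because \\b anchors the match at word boundaries; a word such as "SMART" would be left alone.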
Our initial data frame contains 4521 tweets to evaluate. After tokenizing the tweets into bigrams and restructuring the data, we keep only bigrams mentioned at least five times. Once tidied, the data consists of 1407 bigrams left to evaluate with a social network approach.
tweets <- read_excel("data/tweets.xlsx")
#select lms, text and index, grouping by lms
tweets_data2 <- tweets %>%
select(c('index', 'lms', 'text')) %>%
group_by(lms) %>%
na.omit()
vtree(tweets_data2, "lms", horiz=FALSE, palette = 4, sortfill = TRUE, title="Initial LMS Tweets Data")
# split into word pairs
lms_bigrams <- tweets_data2 %>%
mutate(text = str_replace_all(text, replace_reg, "")) %>%
unnest_tokens(bigram, text, token = "ngrams", n = 2)
# remove stop words
lms_bigrams <- lms_bigrams %>%
separate(bigram, into = c("first","second"), sep = " ", remove = FALSE) %>%
anti_join(stop_words, by = c("first" = "word")) %>%
anti_join(stop_words, by = c("second" = "word")) %>%
filter(str_detect(first, "[a-z]") &
str_detect(second, "[a-z]"))
#count the bigrams in a new column called n and keep only those that occur at least 5 times
lms_bigrams_count <- lms_bigrams %>%
group_by(lms, bigram, first, second)%>%
summarise(n=n())%>%
filter(n >= 5)%>%
arrange(-n)%>%
ungroup()
#visualize new number of rows (previously counting tweets)
vtree(lms_bigrams_count, "lms", horiz=FALSE, palette = 4, sortfill = TRUE, title="Bigram LMS Tweets Data")
# Keep only the two words and their count (so we can make the graphs more easily)
lms_bigram_tbl <- lms_bigrams_count %>%
dplyr::select(c('first','second', 'n'))
bigram_graph <- lms_bigram_tbl %>%
filter(n > 35) %>%
graph_from_data_frame()
set.seed(123)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))
ggraph(bigram_graph, layout = "fr") +
geom_edge_link(aes(edge_alpha = n), show.legend = FALSE, arrow = a) +
geom_node_point(color = "lightblue", size = 5) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
theme_void()
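The tokenizing and counting steps above can be sketched on two made-up tweets. This is a simplified base R illustration, not the tidytext implementation: it only lowercases and splits on whitespace, whereas unnest_tokens() also handles punctuation. The names toy, make_bigrams, toy_bigrams, and toy_counts are hypothetical.

```r
# Two made-up tweets for illustration
toy <- c("google classroom is great", "i love google classroom")

# Build bigrams by pairing each word with the word that follows it
make_bigrams <- function(txt) {
  words <- strsplit(tolower(txt), "\\s+")[[1]]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Collect the bigrams from every tweet and count how often each occurs
toy_bigrams <- unlist(lapply(toy, make_bigrams))
toy_counts <- sort(table(toy_bigrams), decreasing = TRUE)

# "google classroom" appears once in each tweet
as.integer(toy_counts["google classroom"])   # 2
```

Each counted bigram becomes one directed edge from its first word to its second word, with the count as the edge attribute n.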
To evaluate the social network of bigrams, we will create the network as an igraph object and then as a tbl_graph (tidygraph) object. This makes it easy to visualize the connections and to describe the network mathematically.
Nodes are the unique words, and each word has an identification ID. Edges are the bigrams: they show how frequently we find a combination of two words (represented by their unique IDs).
#filter to Bigrams that are mentioned more than 35 times.
lms_df <- lms_bigram_tbl %>%
filter(n > 35)
# Distinct first (part of bigram)
sources <- lms_df%>%
distinct(first) %>%
rename(label = first)
# Distinct second (part of bigram)
destinations <- lms_df %>%
distinct(second) %>%
rename(label = second)
#NODES AND EDGES BELOW:
# ----- NODES -----
# Unique Items + create unique ID
nodes <- full_join(sources, destinations, by="label") %>% rowid_to_column("id")
# ----- EDGES -----
# Adds unique ID of Item 1 to data
edges <- lms_df %>%
left_join(nodes, by = c("first" = "label")) %>%
rename(from = id)
# Adds unique ID of Item 2 to data
edges <- edges %>%
left_join(nodes, by = c("second" = "label")) %>%
rename(to = id) %>%
rename(weight = n)
# Select only From | To | Weight (frequency)
edges <- edges %>% select(from, to, weight)
# inspect head of nodes and edges
nodes %>%
head(5)
## # A tibble: 5 × 2
## id label
## <int> <chr>
## 1 1 google
## 2 2 canvas
## 3 3 essay
## 4 4 pearson
## 5 5 aleks
edges %>%
head(5)
## # A tibble: 5 × 3
## from to weight
## <int> <int> <int>
## 1 1 83 1688
## 2 2 4 309
## 3 3 9 232
## 4 3 10 187
## 5 4 7 182
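The node and edge construction above can be sketched in base R on a tiny made-up edge list. The data and the names toy_edges, labels, and toy_nodes are hypothetical; match() plays the role of the two left_join() calls by looking up each word's ID.

```r
# Made-up bigram counts: first word, second word, frequency
toy_edges <- data.frame(
  first  = c("google", "canvas", "online"),
  second = c("classroom", "lms", "classroom"),
  n      = c(10, 4, 3),
  stringsAsFactors = FALSE
)

# Nodes: every distinct word, first words listed before second words
labels <- unique(c(toy_edges$first, toy_edges$second))
toy_nodes <- data.frame(id = seq_along(labels), label = labels,
                        stringsAsFactors = FALSE)

# Edges: replace each word with the ID of its node
toy_edges$from <- match(toy_edges$first, toy_nodes$label)
toy_edges$to   <- match(toy_edges$second, toy_nodes$label)
toy_edges[, c("from", "to", "n")]
```

Here "classroom" receives a single ID (4) even though it appears in two bigrams, so both edges point to the same node.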
# Create network
net1 <- graph_from_data_frame(d = edges, vertices = nodes, directed = TRUE)
E(net1) # The edges of the "net1" object
## + 115/115 edges from c497fd1 (vertex names):
## [1] 1 ->83 2 ->4 3 ->9 3 ->10 4 ->7 5 ->2 6 ->13 7 ->5 8 ->11
## [10] 9 ->10 4 ->5 10->3 11->10 12->89 13->14 14->90 15->2 16->72
## [19] 17->91 2 ->92 18->93 19->1 5 ->7 20->26 21->20 22->25 23->21
## [28] 24->94 25->33 26->27 27->29 28->32 29->30 30->31 31->28 10->95
## [37] 2 ->71 32->66 16->51 2 ->16 33->28 34->16 35->37 36->39 37->36
## [46] 38->70 39->41 40->2 28->7 41->43 42->40 43->42 44->48 16->50
## [55] 45->46 46->49 47->44 48->45 49->16 50->38 51->47 52->96 2 ->75
## [64] 53->97 6 ->98 54->99 55->2 56->55 9 ->59 57->100 58->3 22->4
## [73] 14->95 11->9 59->53 60->10 61->101 62->63 63->2 64->102 65->79
## [82] 66->67 67->68 68->69 69->65 70->35 53->103 71->2 72->78 9 ->104
## + ... omitted several edges
V(net1) # The vertices of the "net1" object
## + 112/112 vertices, named, from c497fd1:
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112
edge.start <- ends(net1, es=E(net1), names=F)[,1]
edge.col <- V(net1)$color[edge.start]
plot(net1, edge.color=edge.col, edge.curved=.1)
We can see that net1 is an object of class igraph. To go further, we convert it to a tbl_graph, which keeps the edge weight as a column that we can map to plot aesthetics. We then visualize the network with the edges colored by weight.
#check class
class(net1)
## [1] "igraph"
#convert to a tbl_graph so that weight is available as an edge column
net2 <- as_tbl_graph(net1)
class(net2)
## [1] "tbl_graph" "igraph"
ggraph(net2, layout = "fr") +
geom_node_point(size = 3) +
geom_edge_link(aes(colour = weight)) +
theme_graph()
The size of a network is described by its number of nodes and edges. Here the network has 112 vertices and 115 edges.
#number of vertices
gorder(net2)
## [1] 112
#number of edges
gsize(net2)
## [1] 115
Degree refers to the number of ties an actor sends (out-degree), receives (in-degree), or, when direction is ignored, has in total (simply “degree”). Degree centralization measures the extent to which those ties are focused on one or a small set of actors.
The bigram network appears very decentralized: all three centralization scores are below 0.08, with in-degree (0.072) slightly higher than all-degree (0.054) and out-degree (0.036).
#calculate all-degree score
centr_degree(net2, mode = "all")
## $res
## [1] 2 14 5 4 4 2 6 1 7 8 4 1 2 3 2 8 2 1 1 2 2 2 1 1 2
## [26] 2 2 4 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [51] 2 1 5 1 2 1 2 1 2 2 1 1 2 1 2 2 2 2 2 2 3 2 1 1 2
## [76] 1 2 3 2 2 1 2 2 1 1 2 1 1 1 1 1 1 1 2 2 1 1 1 1 1
## [101] 1 1 1 1 1 1 1 1 1 1 1 1
##
## $centralization
## [1] 0.05429754
##
## $theoretical_max
## [1] 24642
#store each node's all-degree in the degree column
net2 <- net2 |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "all"))
#calculate in-degree score
centr_degree(net2, mode = "in")
## $res
## [1] 1 9 3 2 2 0 3 0 3 4 2 0 1 1 0 3 1 0 0 1 1 0 0 0 1 1 1 2 1 1 1 1 1 0 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 2 0 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 1 2 1 0 0
## [75] 1 0 1 2 1 1 0 1 1 0 0 1 0 0 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1
##
## $centralization
## [1] 0.07183076
##
## $theoretical_max
## [1] 12432
#store each node's in-degree in the degree column (overwrites the all-degree stored above)
net2 <- net2 |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "in"))
#calculate out-degree score
centr_degree(net2, mode = "out")
## $res
## [1] 1 5 2 2 2 2 3 1 4 4 2 1 1 2 2 5 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0
##
## $centralization
## [1] 0.03579472
##
## $theoretical_max
## [1] 12432
#store each node's out-degree in the degree column (overwrites the in-degree stored above)
net2 <- net2 |>
activate(nodes) |>
mutate(degree = centrality_degree(mode = "out"))
#Inspect degree visually (the degree column now holds out-degree)
ggraph(net2) +
geom_node_point(aes(size = degree, color = degree)) +
geom_edge_link(aes(color = weight)) +
theme_graph()
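The centralization scores above can be reproduced by hand from the printed output. The following is a sketch of that arithmetic, assuming the Freeman-style definition that centr_degree() uses: the sum of the differences between the largest degree and each node's degree, divided by the theoretical maximum T reported in the output. The symbols n, d_i, d_max, and T are notation introduced here for illustration.

$$
C = \frac{\sum_{i=1}^{n} (d_{\max} - d_i)}{T} = \frac{n\, d_{\max} - \sum_{i=1}^{n} d_i}{T}
$$

With n = 112 nodes and 115 edges, the in-degrees and the out-degrees each sum to 115, and the all-degrees sum to 230:

$$
C_{\text{in}} = \frac{112 \cdot 9 - 115}{12432} = \frac{893}{12432} \approx 0.0718, \qquad
C_{\text{out}} = \frac{112 \cdot 5 - 115}{12432} = \frac{445}{12432} \approx 0.0358, \qquad
C_{\text{all}} = \frac{112 \cdot 14 - 230}{24642} = \frac{1338}{24642} \approx 0.0543
$$

The in-degree score is the largest of the three because one word (node 2, canvas) receives 9 ties while no other word receives more than 4.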
We can see that the network has a very low density of 0.0093. The closer this number is to 1.0, the denser the network, so our network has very few ties relative to the number possible.
#calculate edge density
edge_density(net2)
## [1] 0.009250322
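The density value can be checked by hand. For a directed network without self-loops, density is the number of observed edges m divided by the number of possible ordered pairs of distinct nodes, n(n - 1). The symbols D, m, and n are notation introduced here.

$$
D = \frac{m}{n(n-1)} = \frac{115}{112 \times 111} = \frac{115}{12432} \approx 0.00925
$$

In other words, fewer than 1 in 100 of the possible word-to-word ties are present.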
Reciprocity reveals whether ties between dyads flow in one direction only or in both directions.
The reciprocity of 0.087 means that only about 9% of the edges are reciprocated; that is, few word pairs appear in both orders (both “A B” and “B A”).
#calculate reciprocity
reciprocity(net2)
## [1] 0.08695652
net2 <- net2 |>
activate(edges) |>
mutate(reciprocated = edge_is_mutual())
net2
## # A tbl_graph: 112 nodes and 115 edges
## #
## # A directed simple graph with 13 components
## #
## # Edge Data: 115 × 4 (active)
## from to weight reciprocated
## <int> <int> <int> <lgl>
## 1 1 83 1688 FALSE
## 2 2 4 309 FALSE
## 3 3 9 232 TRUE
## 4 3 10 187 TRUE
## 5 4 7 182 FALSE
## 6 5 2 170 FALSE
## # … with 109 more rows
## #
## # Node Data: 112 × 3
## name label degree
## <chr> <chr> <dbl>
## 1 1 google 1
## 2 2 canvas 5
## 3 3 essay 2
## # … with 109 more rows
ggraph(net2) +
geom_node_point(aes(size = degree)) +
geom_edge_link(aes(color = reciprocated)) +
theme_graph()
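By default, igraph's reciprocity() reports the share of directed edges that have a matching edge in the opposite direction. Working backwards from the reported value gives the number of reciprocated edges; this is an inference from the output, not a figure stated in the analysis. The symbols r, m, and m_recip are notation introduced here.

$$
r = \frac{m_{\text{recip}}}{m} \quad\Longrightarrow\quad m_{\text{recip}} = 0.08695652 \times 115 = 10, \qquad r = \frac{10}{115} \approx 0.087
$$

Ten reciprocated edges correspond to five word pairs that occur in both orders, such as the pair between nodes 3 and 10 shown in the edge list.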
Transitivity focuses on triads, or any “triple” of actors. Transitivity is connected to actors’ tendencies to divide into exclusive subgroups or cluster over time, especially around positive relations such as friendship.
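The transitivity of 0.105 indicates that only about 11% of the connected triples of words close into triangles, so there is little clustering in the bigram network.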
#calculate transitivity
transitivity(net2)
## [1] 0.1052632
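To make the measure concrete, here is a minimal sketch on a made-up four-node graph, assuming the igraph package is installed. The graph g and its node names are hypothetical. igraph's documentation notes that directed graphs are treated as undirected for this measure, so the toy graph is undirected.

```r
library(igraph)

# Toy undirected graph: a triangle A-B-C plus one extra tie C-D
g <- graph_from_literal(A - B, B - C, A - C, C - D)

# Connected triples: 1 centred on A, 1 on B, 3 on C, 0 on D (5 in total).
# The single triangle closes 3 of them, so global transitivity is 3 / 5.
transitivity(g)   # 0.6
```

Adding the tie C-D lowers the score from 1 to 0.6 because it creates two open triples (A-C-D and B-C-D) without creating a new triangle.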
A network diameter is the longest geodesic distance (length of the shortest path between two nodes) in the network.
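Below we compute the diameter path and then inspect it visually by highlighting it in the network plot.
# Diameter calculation (get_diameter() uses the weight edge attribute by default when one exists)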
diam <- get_diameter(net2, directed=T)
diam
## + 4/112 vertices, named, from c497fd1:
## [1] 19 1 83 109
vcol <- rep("gray40", vcount(net2))
vcol[diam] <- "gold"
ecol <- rep("gray80", ecount(net2))
ecol[E(net2, path=diam)] <- "orange"
# E(net, path=diam) finds edges along a path, here 'diam'
plot(net2, vertex.label=NA, vertex.color=vcol, edge.color=ecol, edge.arrow.mode=0)
The average path length measures the mean distance between all pairs of actors in the network that are connected by a path. Here it is about 6.88 steps.
#calculate the mean distance
mean_distance(net2)
## [1] 6.883668
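Diameter and mean distance are easier to read on a small example. The sketch below uses a made-up, unweighted directed chain and assumes the igraph package is installed; the object name chain is hypothetical.

```r
library(igraph)

# Toy directed chain with no edge weights: 1 -> 2 -> 3 -> 4
chain <- make_graph(edges = c(1, 2, 2, 3, 3, 4), directed = TRUE)

# The longest shortest path runs from node 1 to node 4: three steps
diameter(chain, directed = TRUE)        # 3

# Reachable ordered pairs have distances 1, 2, 3, 1, 2, 1; their mean is 10 / 6
mean_distance(chain, directed = TRUE)   # about 1.67
```

Pairs with no connecting path, such as 4 to 1, are left out of the average under the default settings, which is why only six distances enter the mean.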
Purpose - The purpose of this case study is to look at the social network of bigrams from a tweet dataset pulled previously for a sentiment analysis of four popular Learning Management Systems (LMS): Google Classroom, Canvas, Moodle, and Blackboard. Understanding how information is shared within the network helps explain why one LMS may be mentioned more than another on Twitter.
Methods - For this independent analysis I explored tweet bigrams, social networks, and mathematical measures of the network.
Findings - The bigram network for Google Classroom, Canvas, Moodle, and Blackboard did not seem to have a very good flow of information.
The words are paired by co-occurrence and linked into a network.
Canvas is the most connected word in the network (all-degree of 14).
In-degree centralization is slightly higher than out-degree centralization.
The authorities did not receive very much coming from the hubs.
Discussion - A bigram network might show the general idea of the content of the information gathered in Twitter posts. Insights from a case study like this may be used to guide public and private organizations looking to monitor how information regarding a product is transmitted. A bigram analysis of collected tweets may surface terms that other analyses do not.
Hachaj, T., & Ogiela, M. R. (2018, October). What Can Be Learned from Bigrams Analysis of Messages in Social Network?. In 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (pp. 1-4). IEEE.