1. PREPARE

Our Unit 4 Case Study: Hashtag Common Core is inspired by the work of Jonathan Supovitz, Alan Daly, Miguel del Fresno and Christian Kolouch who examined the intense debate surrounding the Common Core State Standards education reform as it played out on Twitter. As noted on their expansive and interactive website for the #COMMONCORE Project, the Common Core was a major education policy initiative of the early 21st century. A primary aim of the Common Core was to strengthen education systems across the United States through a set of specific and challenging education standards. Although these standards once enjoyed bipartisan support, we saw in our previous case study how these standards have become a political punching bag.

In Unit 4, we continue our investigation of tweets around these controversial state standards through social network analysis. Specifically, this case study will cover the following topics pertaining to each data-intensive workflow process:

  1. Prepare: Prior to analysis, we’ll take a look at the context from which our data came, formulate some research questions, and get introduced to the {tidygraph} and {ggraph} packages for analyzing relational data.

  2. Wrangle: In the wrangling section of our case study, we will learn some basic techniques for manipulating, cleaning, transforming, and merging network data.

  3. Explore: With our network data tidied, we learn to calculate some key network measures and to illustrate some of these stats through network visualization.

  4. Model: We conclude our analysis by introducing community detection algorithms for identifying groups and revisiting sentiment about the Common Core.

  5. Communicate: We briefly reflect on our walkthrough and what we learned from our analysis.

1a. Review the Research

Recall from Social Network Analysis and Education: Theory, Methods & Applications that Carolan (2014) cited the following four features used by Freeman (2004) to define the social network perspective:

  1. Social network analysis is motivated by a relational intuition based on ties connecting social actors.

  2. It is firmly grounded in systematic empirical data.

  3. It makes use of graphic imagery to represent actors and their relations with one another.

  4. It relies on mathematical and/or computational models to succinctly represent the complexity of social life.

The #COMMONCORE Project that we’ll examine next is an exemplary illustration of these four defining features of the social network perspective.

The #commoncore Project

Supovitz, J., Daly, A.J., del Fresno, M., & Kolouch, C. (2017). #commoncore Project. Retrieved from http://www.hashtagcommoncore.com.

Prologue

As noted by Supovitz et al. (2017), the Common Core State Standards have been a “persistent flashpoint in the debate over the direction of American education.” The #commoncore Project explores the Common Core debate on Twitter using a combination of social network analyses and psychological investigations which help to reveal both the underlying social structure of the conversation and the motivations of the participants.

The central question guiding this investigation was:

How are social media-enabled social networks changing the discourse in American politics that produces and sustains education policy?

Data Sources & Analyses

The methods page of the #COMMONCORE Project provides a detailed discussion of the data and analyses used to arrive at the conclusions in #commoncore: How social media is changing the politics of education. Provided below is a summary of how the authors retrieved data from Twitter and the analyses applied for each of the five acts on the website. I highly encourage you to take a look at this section if you’d like to learn more about their approach, particularly if you’re unfamiliar with how users can interact and communicate on Twitter.

Data Collection

To collect data on keywords related to the Common Core, the project used a customized data collection tool developed by two of the co-authors, Miguel del Fresno and Alan J. Daly, called Social Runner Lab™. Similar to an approach we used in Unit 3, the authors downloaded data in real time directly from Twitter’s Application Programming Interface (API) based on tweets using specified keywords, keyphrases, or hashtags and then restricted their analysis to the following terms: commoncore, ccss and stopcommoncore. They also captured Twitter profile names, or user names, as well as the tweets, retweets, and mentions posted. Data included messages that are public on Twitter, but not private direct messages between individuals, nor messages from accounts that users have made private.

Analyses

In order to address their research question, the authors applied social network analysis techniques in addition to qualitative and automated text mining approaches. For social network analyses, each node is an individual Twitter user (person, group, institution, etc.) and the connection between each node is the tweet, retweet, or mention/reply. After retrieving data from the Twitter API, the authors created a file that could be analyzed in Gephi, an open-source software program which depicts the relations as networks and provides metrics for describing features of the network.

In addition to data visualization and network descriptives, the authors examined group development and lexical tendencies among users. For group development, they used a community detection algorithm to identify and represent structural sub-communities, or “factions,” which they describe as groups with more ties within than across groups (even though those group boundaries are somewhat porous). For lexical tendencies, the authors used the Linguistic Inquiry and Word Count (LIWC) lexicons to determine users’ psychological drive, diagnose their level of conviction, make inferences about thinking styles, and even determine sentiment similar to what we examined in Unit 2.

For a nice summary of the data used for the analysis, as well as the samples of actors and tweets, the keywords, and the methods that were utilized, see Table 1. Data and Method for Each Act in the Methods section of the #commoncore website.

Key Findings

In the #commoncore Project, analyses of almost 1 million tweets sent by about 190,000 distinct actors over a period of 32 months revealed the following:

  • In Act 1, The Giant Network, the authors identified five major sub-communities, or factions, in the Twitter debate surrounding the Common Core, including: (1) supporters of the Common Core, (2) opponents of the standards from inside education, and (3) opponents from outside of education.
  • In Act 2, Central Actors, they noted that most of these participants were casual contributors – almost 95% of them made fewer than 10 tweets in any given six-month period. They also distinguished between two types of influence on Twitter: Transmitters who tweeted a lot, regardless of the extent of their followership; and Transceivers, those who gained their influence by being frequently retweeted and mentioned.
  • In Act 3, Key Events, the authors identified issues driving the major spikes in the conversation, like when Secretary of Education Duncan spoke about white suburban moms’ opposition to the Common Core, or the debate over the authorization of the Every Student Succeeds Act in November 2015. They also found evidence of manufactured controversies spurred by sensationalizing minor issues and outright fake news stories.
  • In Act 4, Lexical Tendencies, the authors examined the linguistic tendencies of the three major factions and found that Common Core supporters used the highest number of conviction words, tended to use more achievement-oriented language, and used more words associated with a formal and analytic thinking style. By contrast, opponents of the Common Core from within education tended to use more words associated with sadness, and used more narrative thinking style language. Opponents of the Common Core from outside of education made the highest use of words associated with peer affiliation, used the largest number of angry words, and exhibited the lowest level of conviction in their word choices.
  • In Act 5, The Tweet Machine, the authors examined five frames that opponents of the Common Core used to appeal to the values of particular subgroups, including the government frame, business frame, war frame, experiment frame, and propaganda frame. By combining these constituencies, the authors illustrated how the Common Core developed a strong transpartisan coalition of opposition.

👉 Your Turn

For our Unit 4 Walkthrough, we’ll apply some of the same techniques used by this study including calculating some basic network measures of centrality. For example, take a look at the Explore the Networks section from Act 2: Central Actors and the Transmitters, Transceivers and Transcenders.

Click on each of the boxes for a select time period and in the space below list a couple Twitter users identified by their analysis as the following:

  • Transmitters:

  • Transceivers:

  • Transcenders:

Now visit Act 2 of the methods section and, in the space below, identify the network measure used for the first two central actors (hint: it’s italicized) and copy the definition used. Note that Transcenders has been completed for you.

  • Transmitters

  • Transceivers

  • Transcenders are measured by degree and consist of actors (i.e. Twitter users) who have both high out-degree and high in-degree.

1b. Identify a Question(s)

Recall from above that the central question guiding the #COMMONCORE Project was:

How are social media-enabled social networks changing the discourse in American politics that produces and sustains education policy?

For Unit 4, we are going to focus our questions on something a bit less ambitious but inspired by this work:

  1. Who are the transmitters, transceivers, and transcenders in our Common Core Twitter network?
  2. What subgroups, or factions, exist in our network?
  3. Which actors in our network tend to be more opposed to the Common Core?

To address the last question, we’ll revisit the techniques we learned from our Unit 3 VADER sentiment analysis.

👉 Your Turn

Based on what you know about networks and the context so far, what other research question(s) might we ask in this context that a social network perspective might be able to answer?

In the space below, type a brief response to the following questions:

  • YOUR RESPONSE HERE

1c. Set Up Project

As highlighted in Chapter 6 of Data Science in Education Using R (DSIEUR), one of the first steps of every workflow should be to set up your “Project” within RStudio. Recall that:

A Project is the home for all of the files, images, reports, and code that are used in any given project

Since we are working in RStudio Cloud, a Project has already been set up for you as indicated by the .Rproj file in your main directory in the Files pane.

Load Libraries

In Unit 1, we also learned about packages, or libraries, which are shareable collections of R code that can contain functions, data, and/or documentation and extend the functionality of R. You can always check to see which packages have already been installed and loaded into RStudio Cloud by looking at the Files, Plots, & Packages Pane in the lower right-hand corner.

👉 Your Turn

First, load the {tidyverse}, {tidytext} and {vader} packages from previous case studies. We’ll be using several of these packages to wrangle and explore our data in later sections:

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(tidytext)
library(vader)

Next, we will introduce a few new packages that will help us with our social network analyses.

tidygraph 📦

The {tidygraph} package is a huge package that exports 280 different functions and methods, including access to almost all of the dplyr verbs plus a few more, developed for use with relational data. While network data itself is not tidy, it can be envisioned as two tidy tables, one for node data and one for edge data. The {tidygraph} package provides a way to switch between the two tables and uses dplyr verbs to manipulate them. Furthermore it provides access to a lot of graph algorithms with return values that facilitate their use in a tidy workflow.

Let’s go ahead and load the {tidygraph} library:

library(tidygraph)
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:stats':
## 
##     filter
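
To make the "two tidy tables" idea concrete, here is a minimal sketch on made-up data. The user names, the is_sender column, and the type column are hypothetical and exist only for illustration; the point is simply that activate() chooses whether dplyr verbs act on the node table or the edge table.

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy data: three made-up users and two ties between them
toy_nodes <- tibble(actors = c("user_a", "user_b", "user_c"))
toy_edges <- tibble(from = c("user_a", "user_a"),
                    to   = c("user_b", "user_c"))

toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

# activate() selects the table that the following dplyr verbs operate on
toy_graph <- toy_graph %>%
  activate(nodes) %>%
  mutate(is_sender = actors == "user_a") %>%   # adds a column to the node table
  activate(edges) %>%
  mutate(type = "mention")                     # adds a column to the edge table

# Each table can be pulled back out as an ordinary tibble
toy_node_table <- toy_graph %>% activate(nodes) %>% as_tibble()
toy_edge_table <- toy_graph %>% activate(edges) %>% as_tibble()

toy_node_table
toy_edge_table
```

Note that the edge table stores from and to as row positions in the node table rather than as user names.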

ggraph 📦

Created by the same developer as {tidygraph}, {ggraph} – pronounced gg-raph or g-giraffe hence the logo – is an extension of {ggplot2} aimed at supporting relational data structures such as networks, graphs, and trees. Both packages are more modern and widely adopted approaches to data visualization in R.

While ggraph builds upon the foundation of ggplot2 and its API, it comes with its own self-contained set of geoms, facets, etc., as well as adding the concept of layouts to the grammar of graphics, i.e. the “gg” in ggplot2 and ggraph.

Let’s go ahead and load the {ggraph} library:

library(ggraph)
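
As a quick preview of how layouts and network-specific geoms fit together, the following is a minimal sketch on a made-up three-node graph. The user names are hypothetical, and the "fr" (Fruchterman-Reingold) layout is just one of several layouts you could try.

```r
library(tidygraph)
library(ggraph)

# Hypothetical toy graph: three made-up users and three directed ties
toy_graph <- tbl_graph(
  nodes = data.frame(actors = c("user_a", "user_b", "user_c"),
                     stringsAsFactors = FALSE),
  edges = data.frame(from = c("user_a", "user_a", "user_b"),
                     to   = c("user_b", "user_c", "user_c"),
                     stringsAsFactors = FALSE),
  directed = TRUE
)

# The layout positions the nodes; geoms then draw edges, points, and labels
toy_plot <- ggraph(toy_graph, layout = "fr") +
  geom_edge_link() +
  geom_node_point(size = 4) +
  geom_node_text(aes(label = actors), vjust = -1) +
  theme_void()

toy_plot
```

Because the layout involves some randomness, the node positions may differ from run to run.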

igraph Package 📦

Both {tidygraph} and {ggraph} depend heavily on the {igraph} network analysis package. The main goals of the igraph package and the collection of network analysis tools it contains are to provide a set of data types and functions for:

  1. pain-free implementation of graph algorithms,

  2. fast handling of large graphs, with millions of vertices (i.e., actors or nodes) and edges,

  3. allowing rapid prototyping via high level languages like R.

Run the code chunk below to load the {igraph} library:

library(igraph)
## 
## Attaching package: 'igraph'
## The following object is masked from 'package:tidygraph':
## 
##     groups
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## The following object is masked from 'package:tidyr':
## 
##     crossing
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union

Import Data

Finally, we’ll be using a fresh data download from Twitter that only includes a relatively small set of tweets from the past week to help keep things simple. Network graphs can quickly get unwieldy as the number of nodes in your network grows. The #commoncore project does an excellent job visualizing large networks using advanced techniques. We’ll focus on a relatively small network since we’ll be introducing some very basic techniques for visualization.

👉 Your Turn

Use the read_csv() function from the {readr} package to read the ccss-tweets-fresh.csv file from the data folder and assign to a new data frame named ccss_tweets:

ccss_tweets <- read_csv("data/ccss-tweets-fresh.csv")
## Rows: 48 Columns: 90
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (49): screen_name, text, source, reply_to_screen_name, hashtags, urls_u...
## dbl  (26): user_id, status_id, display_text_width, reply_to_status_id, reply...
## lgl  (11): is_quote, is_retweet, quote_count, reply_count, symbols, ext_medi...
## dttm  (4): created_at, quoted_created_at, retweet_created_at, account_create...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
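
The column specification above shows that user_id and status_id were parsed as dbl. Tweet IDs of this length can exceed what a double stores exactly, so if exact IDs matter for your analysis, one option is to read them as text. The sketch below uses a hypothetical two-row file written to a temporary location, not our actual data file; the same col_types argument could be passed when reading the real file.

```r
library(readr)

# Hypothetical two-row file standing in for a Twitter export
toy_file <- tempfile(fileext = ".csv")
writeLines(
  c("screen_name,status_id",
    "user_a,1457612345678901234",
    "user_b,1457612345678901235"),
  toy_file
)

# Keep the ID column as text; all other columns are still guessed by readr
toy_tweets <- read_csv(
  toy_file,
  col_types = cols(status_id = col_character())
)

toy_tweets$status_id
```

We do not need the IDs for the network built in this walkthrough, so the default import works fine here.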

2. WRANGLE

In general, data wrangling involves some combination of cleaning, reshaping, transforming, and merging data (Wickham and Grolemund 2016). As highlighted in Estrellado et al. (2020), wrangling network data can be even more challenging than other data sources since network data often includes variables about both individuals and their relationships.

For our data wrangling this week, we’re keeping it simple since working with relational data is a bit of a departure from our work with rectangular data frames. Our primary goals for Unit 4 are learning how to:

  1. Create an Edgelist. In this section, we introduce a new tidy data format called the “edgelist” which contains information about each edge (i.e. connections or ties between actors) in our network.

  2. Format Network Data. We learn to shape our data files into two common formats for storing network data: edgelists and nodelists.

  3. Create a Network Object. Finally, we’ll need to convert our data frames into a special data format, an R network object, for working with relational data.

2b. Format Network Data

Recall from Chapter 1 of Carolan (2014) that ties, or relations, are what connect actors to one another. These ties are often referred to as “edges” when formatting and graphing network data and the range of ties in a network can be extensive. Some of the more common ties, or edges, used to denote connections among actors in educational research include:

  • Behavioral interaction (e.g., talking to each other or sending messages)

  • Physical connection (e.g., sitting together at lunch, living in the same neighborhood)

  • Association or affiliation (e.g., taking the same courses, belonging to the same peer group)

  • Evaluation of one person by another (e.g., considering someone a friend or enemy)

  • Formal relations (e.g., knowing who has authority over whom)

  • Moving between places or status (e.g., school choice preferences, dating patterns among adolescents)

Create Edgelist

The edgelist format is slightly different from other formats you have likely worked with in that the values in the first two columns of each row represent a dyad, or tie between two nodes in a network. An edgelist can also contain other information regarding the strength, duration, or frequency of the relationship, sometimes called weight, in addition to other “edge attributes.”

For our analysis of tweets in this case study, we’ll use an approach similar to that used by Supovitz et al. (2017). Specifically, each node is an individual Twitter user (person, group, institution, etc.) and the connection between each node is the tweet, retweet, or mention/reply.
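
To picture the format before we build one from real tweets, here is a minimal sketch of an edgelist on made-up data. All user names, weights, and tie types below are hypothetical.

```r
library(tibble)

# Hypothetical edgelist: the first two columns define each dyad,
# and the remaining columns are edge attributes
toy_edgelist <- tribble(
  ~sender,  ~receiver, ~weight, ~type,
  "user_a", "user_b",  2,       "mention",
  "user_a", "user_c",  1,       "reply",
  "user_b", "user_c",  1,       "mention"
)

toy_edgelist
```

Each row is one tie: the first row, for example, says that user_a directed two mentions at user_b.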

👉 Your Turn

Use the code chunk below to look at the data we just imported using one of your favorite methods for inspecting data. In the space that follows, identify the columns you think could be used to construct an edge list.

view(ccss_tweets)
  • YOUR RESPONSE HERE

Extract Edges and Nodes

If one of the columns you indicated in your response above included screen_name, nice work! You may also have noticed that the mentions_screen_name column includes the names of those in the reply column, as well as those “mentioned” in each tweet, i.e. whose username was included in a post.

As noted above, an edgelist includes the nodes that make up a tie or dyad. Since the only two columns we need to construct our edgelist are the screen_name of the tweet author and the screen names of those included in the mentions, let’s:

  1. relocate() and rename those columns to sender and target to indicate the “direction” of the tweet;

  2. select() those columns along with any attributes that we think might be useful for analysis later on, like the timestamp or content of the tweet, i.e. text;

  3. assign our new data frame to ties_1.

ties_1 <-  ccss_tweets %>%
  relocate(sender = screen_name, # rename screen_name to sender
           target = mentions_screen_name) %>% # rename mentions_screen_name to target
  select(sender,
         target,
         created_at,
         text)

ties_1
## # A tibble: 48 × 4
##    sender          target              created_at          text                 
##    <chr>           <chr>               <dttm>              <chr>                
##  1 ben_combes      <NA>                2021-11-08 07:26:01 "I never got the gas…
##  2 NYSCER2021      TeachLiberty1 nana… 2021-11-08 01:49:17 "@TeachLiberty1 @nan…
##  3 SchnauzieLover  <NA>                2021-11-07 21:41:16 "Education Failures …
##  4 2bux15cents     <NA>                2021-11-07 17:32:08 "At what point will …
##  5 AlexiosAsText   MaggieEThornton     2021-11-07 15:22:14 "@MaggieEThornton If…
##  6 tx_granny       BanjoTanJoe Millen… 2021-11-06 19:29:41 "@MillennialOther @R…
##  7 PaprBagPrincss  <NA>                2021-11-05 20:31:28 "#CCChat (Common Cor…
##  8 Tech4Learning   <NA>                2021-11-05 18:01:00 "6 ways to incorpora…
##  9 listland        <NA>                2021-11-05 12:30:40 "Top 10 Reasons Why …
## 10 CapitolAdvocate <NA>                2021-11-05 12:28:25 "\"I’ve had students…
## # … with 38 more rows

Our edgelist with attributes is almost ready, but we have a couple of issues we still need to deal with.

As you may have noticed, our target column contains the names of multiple Twitter users, but a dyad or tie can only be between two actors or nodes.

“Unnest” User Names

In order to place each target user in a separate row that corresponds with each sender of the tweet, we will need to “unnest” these names using the {tidytext} package from Unit 3.

In Unit 3 we used the unnest_tokens() function to extract individual words from tweets. This time we’ll use it to place the target of each tweet into separate rows corresponding to the sender of each tweet.

Run the following code and take a look at the new data frame to make sure it looks as expected and to spot any potential issues:

ties_2 <- ties_1 %>%
  unnest_tokens(input = target,
                output = receiver,
                to_lower = FALSE) %>%
  relocate(sender, receiver)

ties_2
## # A tibble: 166 × 4
##    sender         receiver        created_at          text                      
##    <chr>          <chr>           <dttm>              <chr>                     
##  1 ben_combes     <NA>            2021-11-08 07:26:01 I never got the gas bridg…
##  2 NYSCER2021     TeachLiberty1   2021-11-08 01:49:17 @TeachLiberty1 @nanalatin…
##  3 NYSCER2021     nanalatinaAA    2021-11-08 01:49:17 @TeachLiberty1 @nanalatin…
##  4 NYSCER2021     rweingarten     2021-11-08 01:49:17 @TeachLiberty1 @nanalatin…
##  5 NYSCER2021     AFTunion        2021-11-08 01:49:17 @TeachLiberty1 @nanalatin…
##  6 SchnauzieLover <NA>            2021-11-07 21:41:16 Education Failures since …
##  7 2bux15cents    <NA>            2021-11-07 17:32:08 At what point will the #c…
##  8 AlexiosAsText  MaggieEThornton 2021-11-07 15:22:14 @MaggieEThornton If enoug…
##  9 tx_granny      BanjoTanJoe     2021-11-06 19:29:41 @MillennialOther @Richard…
## 10 tx_granny      MillennialOther 2021-11-06 19:29:41 @MillennialOther @Richard…
## # … with 156 more rows
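
As an aside, unnest_tokens() is not the only way to get one receiver per row. The sketch below shows a possible alternative using separate_rows() from {tidyr} on made-up data, assuming the names in the target column are separated by single spaces as they appear to be above. The toy user names are hypothetical, and in recent versions of {tidyr} the function separate_longer_delim() is the recommended successor to separate_rows().

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one tweet with two targets, one with none, one with one
toy_ties <- tibble(
  sender = c("user_a", "user_b", "user_c"),
  target = c("user_x user_y", NA, "user_z")
)

# Split the space-separated names so that each target gets its own row
toy_ties_long <- toy_ties %>%
  separate_rows(target, sep = " ")

toy_ties_long
```

We stick with unnest_tokens() in this walkthrough since it is already familiar from Unit 3.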

You probably noticed one issue that we could deal with in a couple of different ways. Specifically, many tweets are not directed at other users and hence the value for receiver is NA. We could keep these users as “isolates” in our network, but let’s simply remove them since our primary goal is to identify transmitters, transceivers and transcenders.

👉 Your Turn

In the code chunk below, use the drop_na() function to remove the rows with missing values from our receiver column in our ties_2 data frame since they are incomplete dyads. Save your final edgelist as ties using the <- assignment operator.

ties <- ties_2 %>%
  drop_na(receiver)

Now use the code chunk below to take a quick look at our final edgelist and answer the questions that follow:

ties
  1. How many edges are in our CCSS network?

    • YOUR RESPONSE HERE
  2. What do these ties/edges/connections represent?

    • YOUR RESPONSE HERE
  3. When do they start and end?

    • YOUR RESPONSE HERE
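
If the same sender reaches the same receiver more than once, those repeated ties can be summarized as an edge weight. The following is a minimal sketch on made-up data using count() from {dplyr}; the user names are hypothetical, and this step is optional rather than part of the edgelist we built above.

```r
library(dplyr)

# Hypothetical toy ties: user_a reaches user_b twice
toy_ties <- tibble(
  sender   = c("user_a", "user_a", "user_b", "user_a"),
  receiver = c("user_b", "user_b", "user_c", "user_c")
)

# Collapse repeated dyads into one row each and record how often they occur
toy_weighted <- toy_ties %>%
  count(sender, receiver, name = "weight")

toy_weighted
```

The trade-off is that collapsing ties this way drops per-tweet attributes such as created_at and text.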

Create Nodelist

The second file we need to create is a data frame that contains all the nodes, often referred to as actors, in our network. This list sometimes includes attributes that we might want to examine as part of our analysis as well, like the number of Twitter followers, demographic group, country, gender, etc. This file or data frame is sometimes referred to as a nodelist or node attribute file.

To construct our basic nodelist, we will use the pivot_longer() function from the {tidyr} package, an updated version of the gather() function introduced in the RStudio Reshape Data Primer.

Run the following code to select the usernames from our ties edgelist, merge our sender and receiver columns into a single column, and take a quick look at our new data frame.

actors_1 <- ties %>%
  select(sender, receiver) %>%
  pivot_longer(cols = c(sender,receiver))

actors_1
## # A tibble: 286 × 2
##    name     value          
##    <chr>    <chr>          
##  1 sender   NYSCER2021     
##  2 receiver TeachLiberty1  
##  3 sender   NYSCER2021     
##  4 receiver nanalatinaAA   
##  5 sender   NYSCER2021     
##  6 receiver rweingarten    
##  7 sender   NYSCER2021     
##  8 receiver AFTunion       
##  9 sender   AlexiosAsText  
## 10 receiver MaggieEThornton
## # … with 276 more rows

As you can see, our sender and receiver usernames have been combined into a single column called value with a new column indicating whether they were a sender or receiver called name by default.

Since we’re only interested in usernames for our nodelist, and since we have a number of duplicate names, let’s select just the value column, rename it to actors, and use the distinct() function from {dplyr} to keep only unique names and remove duplicates:

actors <- actors_1 %>%
  select(value) %>%
  rename(actors = value) %>% 
  distinct()
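
The same result can be reached in fewer steps by naming the new columns inside pivot_longer() itself. Here is a minimal sketch on made-up data; the user names and the role column name are hypothetical choices for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy edgelist
toy_ties <- tibble(
  sender   = c("user_a", "user_a", "user_b"),
  receiver = c("user_b", "user_c", "user_c")
)

# names_to and values_to replace the default "name" and "value" columns,
# and distinct(actors) keeps one row per unique user name
toy_actors <- toy_ties %>%
  pivot_longer(cols = c(sender, receiver),
               names_to = "role",
               values_to = "actors") %>%
  distinct(actors)

toy_actors
```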

👉 Your Turn

Use the code chunk below to take a quick look at our final actors data frame and answer the question that follows:

actors

How many unique node/actors are in our CCSS network?

  • YOUR RESPONSE HERE

2c. Create Network Object

Before we can begin using many of the functions from the {tidygraph} and {ggraph} packages for summarizing and visualizing our Common Core Twitter network, we first need to convert our node and edge lists into a network object.

Combine Edges & Nodes

The {tidygraph} package contains a tbl_graph() function that includes the following arguments:

  • edges = expects a data frame, in our case ties, containing information about the edges in the graph. The nodes of each edge must either be in a to and from column, or in the two first columns like the data frame we provided.

  • nodes = expects a data frame, in our case actors, containing information about the nodes in the graph. If to and/or from are characters or names, like in our data frames, then they will be matched to the column named according to node_key in nodes, if it exists, or matched to the first column in the node list.

  • directed = specifies whether the constructed graph should be directed, i.e. include information about whether each node is the sender or target of a connection. By default this is set to TRUE.
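
To see how these three arguments interact, here is a minimal sketch on made-up data. The followers and handle columns are hypothetical; because the user names are not in the first column of the node table, node_key is used to say which column the edges should be matched against.

```r
library(tidygraph)

# Hypothetical node table where the user name is NOT the first column
toy_nodes <- data.frame(
  followers = c(10, 250, 3),
  handle    = c("user_a", "user_b", "user_c"),
  stringsAsFactors = FALSE
)

# Hypothetical edges, stored in the first two columns
toy_edges <- data.frame(
  sender   = c("user_a", "user_a"),
  receiver = c("user_b", "user_c"),
  stringsAsFactors = FALSE
)

# node_key tells tbl_graph() which node column holds the names used in the edges
toy_directed <- tbl_graph(nodes = toy_nodes, edges = toy_edges,
                          directed = TRUE, node_key = "handle")

toy_undirected <- tbl_graph(nodes = toy_nodes, edges = toy_edges,
                            directed = FALSE, node_key = "handle")

igraph::is_directed(toy_directed)
igraph::is_directed(toy_undirected)
```

In our case the actors data frame has a single column, so the default matching to the first column is all we need.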

Let’s go ahead and create our network graph, name it ccss_network_1, and print the output:

ccss_network_1 <- tbl_graph(edges = ties, 
                            nodes = actors)

ccss_network_1
## # A tbl_graph: 94 nodes and 143 edges
## #
## # A directed multigraph with 13 components
## #
## # Node Data: 94 × 1 (active)
##   actors       
##   <chr>        
## 1 NYSCER2021   
## 2 TeachLiberty1
## 3 nanalatinaAA 
## 4 rweingarten  
## 5 AFTunion     
## 6 AlexiosAsText
## # … with 88 more rows
## #
## # Edge Data: 143 × 4
##    from    to created_at          text                                          
##   <int> <int> <dttm>              <chr>                                         
## 1     1     2 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## 2     1     3 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## 3     1     4 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## # … with 140 more rows

👉 Your Turn

Take a look at the output for our simple graph above and answer the following questions:

  1. Are the numbers of nodes and edges, and the names of the actors, consistent with our actors and ties data frames? What about the integers included in the from and to columns of the Edge Data?

    • YOUR RESPONSE HERE
  2. What do you think “components” refers to? Hint: see Chapter 6 of (Carolan 2014).

    • YOUR RESPONSE HERE
  3. Is our network directed or undirected?

    • YOUR RESPONSE HERE

A quick note about “directed” networks. A directed network indicates that a connection or tie is not necessarily reciprocated or mutual. For example, our network is directed because even though a user may follow, mention or reply to someone else, they may not necessarily receive a follow, mention or reply back. A Facebook friendship network is undirected, however, since someone cannot “friend” someone on Facebook without them also indicating a “friend” relationship, so the friendship is mutual or reciprocated.
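
To illustrate what "not necessarily reciprocated" looks like in practice, the following sketch builds a made-up directed network in which only one pair of users mention each other. The user names are hypothetical, and the functions come from the {igraph} package loaded earlier.

```r
library(igraph)

# Hypothetical toy ties: user_a and user_b mention each other,
# but user_c never responds to user_a
toy_edges <- data.frame(
  from = c("user_a", "user_b", "user_a"),
  to   = c("user_b", "user_a", "user_c"),
  stringsAsFactors = FALSE
)

toy_directed <- graph_from_data_frame(toy_edges, directed = TRUE)

# Is the graph directed?
is_directed(toy_directed)

# For each edge, is there a matching edge in the opposite direction?
toy_mutual <- which_mutual(toy_directed)
toy_mutual

# Overall share of ties that are reciprocated
reciprocity(toy_directed)
```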


3. EXPLORE

As noted in our course readings, exploratory data analysis involves the processes of describing your data (such as by calculating the means and standard deviations of numeric variables, or counting the frequency of categorical variables) and, often, visualizing your data prior to modeling.

In Section 3, we use the {tidygraph} package for retrieving network descriptives and introduce the {ggraph} package to create a network visualization to help illustrate these metrics. Specifically, in this section we’ll learn to:

  1. Examine Basic Descriptives. We focus primarily on actors and edges in this walkthrough, including the edge weights we added in the previous section as well as node degree, an important and fairly intuitive measure of centrality.

  2. Make a Sociogram. Finally, we wrap up the Explore phase by learning to plot a network and tweak key elements like the size, shape, and position of nodes and edges to better communicate key findings.

3a. Examine Basic Descriptives

Many analyses of social networks are primarily descriptive. As Carolan (2014) notes, these descriptive studies aim either to:

  1. represent the network’s underlying social structure through data-reduction techniques; or,

  2. characterize network properties through network measures.

Centrality

A key structural property of networks is the concept of centralization. A network that is highly centralized is one in which relations are focused on a small number of actors or even a single actor in a network, whereas ties in a decentralized network are diffuse and spread over a number of actors. One of the most common descriptives reported in network studies and a primary measure of centralization is degree.

Degree is the number of ties to and from an ego. In a directed network, in-degree is the number of ties received, whereas out-degree is the number of ties sent.
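
A small worked example may help make these three counts concrete. The sketch below uses a made-up network of three hypothetical users and the degree() function from {igraph}.

```r
library(igraph)

# Hypothetical toy ties: user_a sends two, user_b sends one, user_c sends none
toy_edges <- data.frame(
  from = c("user_a", "user_a", "user_b"),
  to   = c("user_b", "user_c", "user_c"),
  stringsAsFactors = FALSE
)

toy_graph <- graph_from_data_frame(toy_edges, directed = TRUE)

# Collect the three degree measures side by side for each actor
toy_degree <- data.frame(
  actor      = V(toy_graph)$name,
  in_degree  = unname(degree(toy_graph, mode = "in")),   # ties received
  out_degree = unname(degree(toy_graph, mode = "out")),  # ties sent
  degree     = unname(degree(toy_graph, mode = "all")),  # ties sent + received
  stringsAsFactors = FALSE
)

toy_degree
```

Here every actor has the same total degree, yet user_a only sends and user_c only receives, which is why in-degree and out-degree are worth examining separately in a directed network.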

Node Degree

The {tidygraph} package has a unique function called activate() that allows us to treat the nodes in our network object as if they were a typical data frame to which we can then apply standard tidyverse functions like select(), filter(), and mutate().

The latter function, mutate(), we can use to create new variables for nodes such as measures of degree, in-degree, and out-degree used by Supovitz et al. (2017) to identify transcenders, transceivers, and transmitters respectively. These measures can be added by using the centrality_degree() function in the {tidygraph} package.

Run the following code to add degree measures to each of our nodes and print the output for inspection:

ccss_network <- ccss_network_1 %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(mode = "all")) %>%
  mutate(in_degree = centrality_degree(mode = "in"))

ccss_network
## # A tbl_graph: 94 nodes and 143 edges
## #
## # A directed multigraph with 13 components
## #
## # Node Data: 94 × 3 (active)
##   actors        degree in_degree
##   <chr>          <dbl>     <dbl>
## 1 NYSCER2021         4         0
## 2 TeachLiberty1      1         1
## 3 nanalatinaAA       1         1
## 4 rweingarten        1         1
## 5 AFTunion           1         1
## 6 AlexiosAsText      1         0
## # … with 88 more rows
## #
## # Edge Data: 143 × 4
##    from    to created_at          text                                          
##   <int> <int> <dttm>              <chr>                                         
## 1     1     2 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## 2     1     3 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## 3     1     4 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## # … with 140 more rows

We now see that these simple measures of centrality have been added to the nodes in our network.
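
The same activate() pattern works for edges as well as nodes. The sketch below uses a small made-up network (toy_network and its actors are hypothetical) to show that whichever table is active is the one that tidyverse verbs operate on:

```r
library(tidygraph)
library(dplyr)

# Hypothetical three-actor network: a mentions b and c, b mentions c
toy_network <- tbl_graph(
  nodes = data.frame(actors = c("a", "b", "c")),
  edges = data.frame(from = c(1, 1, 2), to = c(2, 3, 3)),
  directed = TRUE
)

# With nodes active, mutate() adds columns to the node table
toy_nodes <- toy_network %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(mode = "all"),
         in_degree = centrality_degree(mode = "in")) %>%
  as_tibble()

# With edges active, the same verbs apply to the edge table instead
toy_edges <- toy_network %>%
  activate(edges) %>%
  filter(from == 1) %>%
  as_tibble()

toy_nodes
toy_edges
```

Here as_tibble() returns only the active table, which is handy when you want to inspect nodes or edges on their own.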

👉 Your Turn

As you may have noted above, we forgot to include out-degree in our measures of centrality. Copy the code from above and add an out-degree measure to each node and print the output:

ccss_network <- ccss_network_1 %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(mode = "all")) %>%
  mutate(in_degree = centrality_degree(mode = "in")) %>%
  mutate(out_degree = centrality_degree(mode = "out"))

ccss_network
## # A tbl_graph: 94 nodes and 143 edges
## #
## # A directed multigraph with 13 components
## #
## # Node Data: 94 × 4 (active)
##   actors        degree in_degree out_degree
##   <chr>          <dbl>     <dbl>      <dbl>
## 1 NYSCER2021         4         0          4
## 2 TeachLiberty1      1         1          0
## 3 nanalatinaAA       1         1          0
## 4 rweingarten        1         1          0
## 5 AFTunion           1         1          0
## 6 AlexiosAsText      1         0          1
## # … with 88 more rows
## #
## # Edge Data: 143 × 4
##    from    to created_at          text                                          
##   <int> <int> <dttm>              <chr>                                         
## 1     1     2 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## 2     1     3 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## 3     1     4 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## # … with 140 more rows

Summarize Centrality Measures

We can also combine the activate() function with the data.frame() function to extract our new measures into a separate data frame, so that we can inspect our nodes individually and create some summary statistics using the handy summary() function.

node_measures <- ccss_network %>% 
  activate(nodes) %>%
  data.frame()

summary(node_measures)
##     actors              degree         in_degree       out_degree    
##  Length:94          Min.   : 1.000   Min.   :0.000   Min.   : 0.000  
##  Class :character   1st Qu.: 1.000   1st Qu.:1.000   1st Qu.: 0.000  
##  Mode  :character   Median : 2.000   Median :2.000   Median : 0.000  
##                     Mean   : 3.043   Mean   :1.521   Mean   : 1.521  
##                     3rd Qu.: 2.000   3rd Qu.:2.000   3rd Qu.: 0.000  
##                     Max.   :99.000   Max.   :6.000   Max.   :99.000

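We see that nodes in this network have about 3 ties on average (a mean degree of 3.04) and receive about 1.5 mentions or replies on average (a mean in-degree of 1.52). The maximum degree of 99, compared with a median of 2, also suggests that ties are concentrated on a single actor.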
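
The large gap between the maximum and median degree in the summary hints at a centralized network. One way to condense that impression into a single number is degree centralization. The sketch below is not part of the original analysis; it assumes the {igraph} package is installed and uses its centr_degree() function on two made-up networks at opposite extremes:

```r
library(igraph)

# Two hypothetical 10-actor networks at opposite extremes
star <- make_star(10, mode = "undirected")  # one hub tied to everyone else
ring <- make_ring(10)                       # every actor has exactly two ties

# Normalized degree centralization: values near 1 mean ties are focused
# on very few actors, values near 0 mean ties are spread evenly
star_centralization <- centr_degree(star, loops = FALSE)$centralization
ring_centralization <- centr_degree(ring, loops = FALSE)$centralization

star_centralization
ring_centralization
```

The same call could be applied to ccss_network, since a tbl_graph is also an igraph object, but treat the result as a rough descriptive summary rather than a definitive measure.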

👉 Your Turn

Recall from the Prepare section that one of the questions guiding this analysis was:

Who are the transmitters, transceivers, and transcenders in our Common Core Twitter network?

Use the code chunk below to view() your node_measures data frame and answer the questions that follow:

view(node_measures)

Identify the Twitter user with the highest value for each of the following types of central actors:

  • Transmitters:

  • Transceivers:

  • Transcenders:
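
If scrolling through the viewer gets tedious, one alternative is to rank the rows programmatically. The sketch below uses slice_max() from {dplyr} on a made-up data frame (toy_measures and its values are hypothetical) that has the same columns as node_measures:

```r
library(dplyr)

# Hypothetical node measures with the same columns as node_measures
toy_measures <- data.frame(
  actors     = c("a", "b", "c", "d"),
  degree     = c(5, 3, 3, 2),
  in_degree  = c(1, 3, 1, 2),
  out_degree = c(4, 0, 2, 0)
)

# Highest out-degree: the main transmitter
top_transmitter <- slice_max(toy_measures, out_degree, n = 1)

# Highest in-degree: the main transceiver
top_transceiver <- slice_max(toy_measures, in_degree, n = 1)

# Highest overall degree: the main transcender
top_transcender <- slice_max(toy_measures, degree, n = 1)

top_transmitter
top_transceiver
top_transcender
```

By default slice_max() keeps ties, so more than one row can come back if several actors share the top value.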

3b. Make a Sociogram

Recall from the 1a. Review the Research section that one of the defining characteristics of the social network perspective is its use of graphic imagery to represent actors and their relations with one another. To emphasize this point, Carolan (2014) reported that:

The visualization of social networks has been a core practice since its foundation more than 90 years ago and remains a hallmark of contemporary social network analysis. 

Network visualization can be used for a variety of purposes, ranging from highlighting key actors to even serving as works of art.

This figure from Katya Ognyanova’s excellent tutorial on Static and Dynamic Network Visualization with R helps illustrate the variety of goals a good network visualization can accomplish:

This visual representation of the actors and their relations, i.e. the network, is called a sociogram. Actors who are most central to the network, such as those with higher node degrees, are usually placed in the center of the sociogram and their ties are placed near them. As we’ll see in just a bit, the actors with the most ties will be placed by most graph layout algorithms in the center of the graph.

👉 Your Turn

The plot() function from R’s built-in {graphics} package can be used to make a simple sociogram, but as you’ll see, it’s a bit lacking.

In the code chunk below, use the plot() function with your ccss_network object to see what the basic plot function produces:

plot(ccss_network)

If this had been a smaller network like one generated from a single classroom, this might have been a little more useful, but for larger networks like our Twitter data, this doesn’t communicate much. In fact, it’s visualizations like these that give sociograms the unflattering nickname of “hair ball” plots!

Fortunately, the {ggraph} package includes a plethora of plotting parameters for graph layouts, edges and nodes to improve the visual design of network graphs.

For the remainder of this section, we’re going to focus on creating a sociogram that highlights the main “transmitters” in our network.

Build a network

One thing to keep in mind when building a network viz with {ggraph} is that, just like its ggplot() counterpart, the ggraph() function takes care of setting up the plot object along with creating the layout for the plot based on the network object and the layout specification provided.

Let’s first pass our ccss_network object to ggraph and see what happens.

ggraph(ccss_network)
## Using `stress` as default layout

As you can see, just like the ggplot() function, this didn’t produce much on its own. All that the ggraph() function does is set up the network object to make a sociogram and create a layout for our network, in this case the default “stress” layout.
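
To see what “creating a layout” actually means, it can help to look at the layout directly. The sketch below uses the create_layout() function from {ggraph} on a small made-up network (toy_network is hypothetical) to show that a layout is essentially a table with one row per node and a pair of x and y coordinates:

```r
library(tidygraph)
library(ggraph)

# Hypothetical four-actor network used only for illustration
toy_network <- tbl_graph(
  nodes = data.frame(actors = c("a", "b", "c", "d")),
  edges = data.frame(from = c(1, 1, 2, 3), to = c(2, 3, 3, 4)),
  directed = TRUE
)

# create_layout() computes plot coordinates for every node
toy_layout <- create_layout(toy_network, layout = "stress")

# Each node keeps its attributes and gains an x and y position
data.frame(
  actors = toy_layout$actors,
  x      = toy_layout$x,
  y      = toy_layout$y
)
```

The geoms we add next simply draw points, labels, and lines at these coordinates.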

Add Nodes

Very similar to how ggplot() uses the + operator to “layer” functions together to progressively build graphs, ggraph() uses the + operator to progressively build a sociogram.

To add our nodes, we’ll add the geom_node_point() function. Again, just like with {ggplot2}, the “geom” in the geom_node_point() function stands for “geometric elements,” or geoms for short, and represents what you actually see in the plot.

👉 Your Turn

Now “add” the geom_node_point() function to our code using the + operator:

ggraph(ccss_network) + 
  geom_node_point() 
## Using `stress` as default layout

Well, at least we have our nodes now! But the default “stress” layout for our sociogram is not so great. Let’s fix that.

Add Layout

One of the major advances in visualization since the first hand-drawn sociograms developed by Jacob Moreno (1934) to represent relations among children in school is the use of software and algorithms to automatically lay out networks on a grid.

There are many different layout methods, but we’ll start with the Fruchterman-Reingold (FR) layout, which is one of the most used layout algorithms for network visualization. These types of force-directed algorithms generally work well with large networks and try to lay out graphs in “an aesthetically-pleasing way” by making edges roughly equal in length and minimizing overlap.

Let’s go ahead and include the layout argument, which in addition to including its own unique layouts, can incorporate layouts from the {igraph} package like fr for the Fruchterman-Reingold (FR) layout:

ggraph(ccss_network, layout = "fr") +
  geom_node_point()

That’s a little better. We can also start to make out our separate network “components” as distinct groups, with one larger connected group in the center. In our Model section, we’ll introduce community detection algorithms to automatically identify groups and use color to show group membership.
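
One practical note: force-directed layouts such as FR start from random node positions, so the same code can draw a slightly different picture each time it runs. A common workaround is to call set.seed() right before the layout is computed. The sketch below illustrates this with a made-up ring network; the seed value 2021 is arbitrary:

```r
library(tidygraph)
library(ggraph)

# Hypothetical 8-node network used only for illustration
toy_network <- create_ring(8)

# Fixing the random seed before each call makes the layout repeatable
set.seed(2021)
layout_1 <- create_layout(toy_network, layout = "fr")

set.seed(2021)
layout_2 <- create_layout(toy_network, layout = "fr")

# Compare the coordinates from the two runs
all.equal(layout_1$x, layout_2$x)
all.equal(layout_1$y, layout_2$y)
```

Adding a set.seed() line above your ggraph() call is usually enough to keep a sociogram stable between knits.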

Tweak Nodes

Also like {ggplot2}, geoms can include aesthetics, or aes for short, such as alpha for transparency, as well as color, shape and size.

Let’s now add some “aesthetics” to our points by including the aes() function and arguments such as size = and color =, which we can set to our out_degree measures to help highlight our primary transmitters:

ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree))

We can clearly see we have one main transmitter in this network! Unfortunately, without labels we don’t know who this Twitter user is.

Let’s fix that by adding another layer with some node text and labels. Since node labels are a geometric element, we can apply aesthetics to them as well, like color and size. Let’s also include the repel = argument that when set to TRUE will avoid overlapping text.

ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree)) +
  geom_node_text(aes(label = actors,
                     size = out_degree/2,
                     color = out_degree),
                 repel=TRUE)

Much better! Even without the edges, using size and color has helped to highlight the main “transmitter” in our network.
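
When a plot has many nodes, labeling every one of them can get crowded. One option, sketched below on a made-up network (toy_network is hypothetical), is the filter aesthetic that {ggraph} geoms accept, which draws a layer only for the nodes that meet a condition:

```r
library(tidygraph)
library(ggraph)
library(dplyr)

# Hypothetical network in which actor a sends most of the ties
toy_network <- tbl_graph(
  nodes = data.frame(actors = c("a", "b", "c", "d")),
  edges = data.frame(from = c(1, 1, 1, 2), to = c(2, 3, 4, 3)),
  directed = TRUE
) %>%
  activate(nodes) %>%
  mutate(out_degree = centrality_degree(mode = "out"))

set.seed(2021)  # arbitrary seed so the layout is repeatable
p <- ggraph(toy_network, layout = "fr") +
  geom_node_point(aes(size = out_degree,
                      color = out_degree)) +
  # Only label actors who sent at least one tie
  geom_node_text(aes(filter = out_degree > 0,
                     label = actors),
                 repel = TRUE)

p
```

The threshold of 0 is just an example; for our Twitter network a higher cutoff might be a better fit for highlighting only the main transmitters.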

Add Edges

Now, let’s connect the dots and add some edges using the geom_edge_link() function. We’ll also include some arguments like arrow = to add arrows 1mm in length, an end_cap = around each node to keep arrows from overlapping them, and alpha = .2 to set the transparency of our edges so they fade more into the background and help keep the focus on our nodes:

ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree)) +
  geom_node_text(aes(label = actors,
                     size = out_degree/2,
                     color = out_degree),
                 repel=TRUE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2)
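
Two details about edges are worth knowing. First, layers are drawn in the order they are added, so adding the edge layer before the node layers keeps lines from being drawn over points and labels. Second, our network is a multigraph, and geom_edge_link() draws parallel ties directly on top of each other. The sketch below, on a made-up network (toy_network is hypothetical), tries geom_edge_fan() from {ggraph} as one possible alternative that spreads parallel ties apart:

```r
library(tidygraph)
library(ggraph)

# Hypothetical multigraph: the tie from a to b appears twice
toy_network <- tbl_graph(
  nodes = data.frame(actors = c("a", "b", "c")),
  edges = data.frame(from = c(1, 1, 2), to = c(2, 2, 3)),
  directed = TRUE
)

set.seed(2021)  # arbitrary seed so the layout is repeatable
p <- ggraph(toy_network, layout = "fr") +
  # Edges first, so they sit underneath the nodes and labels
  geom_edge_fan(arrow = arrow(length = unit(1, "mm")),
                end_cap = circle(3, "mm"),
                alpha = .2) +
  geom_node_point(size = 3) +
  geom_node_text(aes(label = actors),
                 repel = TRUE)

p
```

Whether fanned edges are an improvement depends on how many repeated ties the network has, so treat this as something to experiment with.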

Add a Theme

Finally, let’s add a theme, which controls the finer points of display, like the font size and background color. The theme_graph() function adds a theme specially tuned for graph visualizations. This function removes redundant elements in order to put the focus on the data. If you type ?theme_graph in the console, you will get a sense of the level of fine tuning you can do if desired.

Let’s add theme_graph() to our sociogram, remove the legends since they are not especially useful, and call it good for now:

ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree),
                  show.legend = FALSE) +
  geom_node_text(aes(label = actors,
                     size = out_degree/2,
                     color = out_degree),
                 repel=TRUE,
                 show.legend = FALSE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) + 
  theme_graph()

Note: If you’re having difficulty seeing the sociogram in the small R Markdown code chunk, you can copy and paste the code into the console. The plot will then appear in the Plots pane, where you can enlarge it and even save it as an image file.
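
You can also save a sociogram from code rather than by pointing and clicking. The sketch below assigns a plot of a made-up network (toy_network is hypothetical) to an object and writes it to a temporary file with ggsave() from {ggplot2}; the file name and dimensions are arbitrary choices:

```r
library(tidygraph)
library(ggraph)

# Hypothetical 6-node star network used only for illustration
toy_network <- create_star(6)

set.seed(2021)  # arbitrary seed so the layout is repeatable
p <- ggraph(toy_network, layout = "fr") +
  geom_edge_link(alpha = .2) +
  geom_node_point() +
  # base_family = "sans" avoids relying on a font that may not be installed
  theme_graph(base_family = "sans")

# Write the plot to a temporary PNG file; width and height are in inches
out_file <- file.path(tempdir(), "sociogram.png")
ggsave(out_file, plot = p, width = 8, height = 6, dpi = 300)

out_file
```

In your own project you would likely replace the temporary path with a file name in your project folder.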

👉 Your Turn

Try modifying the code below by tweaking the included functions and arguments or adding new ones for layouts, nodes, and edges to highlight the main “transcenders” or “transceivers” in our network. See if you can also make your plot more “aesthetically pleasing” and/or more purposeful in what it’s trying to communicate.

There are no right or wrong answers, just have some fun trying out different approaches!

ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree),
                  show.legend = FALSE) +
  geom_node_text(aes(label = actors,
                     size = out_degree/2,
                     color = out_degree),
                 repel=TRUE,
                 show.legend = FALSE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) + 
  theme_graph()

Congrats! You made it to the end of the Explore section and are ready to learn a little about network modeling! Before proceeding further, knit your document and check to see if you encounter any errors.


4. MODEL

As highlighted in Chapter 3 of Data Science in Education Using R, the Model step of the data science process entails “using statistical models, from simple to complex, to understand trends and patterns in the data.”

4a. Identify Groups

Chapter 6: Groups and Positions in Complete Networks of SNA and Education (Carolan 2014) introduces both “bottom up” and “top down” approaches for identifying groups in a network, as well as why researchers may be interested in exploring these groups. Carolan also notes that:

Unlike most social science, the idea is to identify these groups through their relational data, not an exogenous attribute such as grade level, departmental affiliation, or years of experience. 

In this section, we’ll briefly explore a “top down” approach to identifying these groups through the use of community detection algorithms.

Community Detection

Similar to the range of functions included for calculating node and edge centrality, the {tidygraph} package includes various clustering functions provided by the {igraph} package. Some of these algorithms are designed for directed graphs, while others are for undirected graphs.

Also similar to calculating centrality measures, we need to activate() our nodes first before applying these community detection algorithms to assign our nodes to groups.
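
As a rough illustration of what one of these algorithms looks like in use, the sketch below applies group_louvain() from {tidygraph} to a made-up undirected network of two triangles joined by a single tie (toy_network is hypothetical). Louvain is one of the algorithms intended for undirected graphs, so it would not apply to our directed Twitter network without first converting it:

```r
library(tidygraph)
library(dplyr)

# Hypothetical undirected network: triangle a-b-c and triangle d-e-f,
# joined by a single bridging tie between c and d
toy_network <- tbl_graph(
  nodes = data.frame(actors = letters[1:6]),
  edges = data.frame(from = c(1, 2, 1, 4, 5, 4, 3),
                     to   = c(2, 3, 3, 5, 6, 6, 4)),
  directed = FALSE
)

# Assign each node to a community based only on its ties
toy_groups <- toy_network %>%
  activate(nodes) %>%
  mutate(group = group_louvain()) %>%
  as_tibble()

toy_groups
```

We would expect densely tied actors to land in the same group, but the exact assignment depends on the algorithm, so it is worth comparing a few before drawing conclusions.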

For the sake of simplicity, and because there seem to be some obvious groups based on network components, let’s group our nodes using the group_components() function along with the mutate() function from {dplyr} to add a new group variable to our node list that provides a value based on the component to which each node belongs.

Run the following code to group our nodes, then print our new ccss_network_groups object and take a quick look:

ccss_network_groups <- ccss_network %>%
  activate(nodes) %>%
  mutate(group = group_components())
  
ccss_network_groups
## # A tbl_graph: 94 nodes and 143 edges
## #
## # A directed multigraph with 13 components
## #
## # Node Data: 94 × 5 (active)
##   actors        degree in_degree out_degree group
##   <chr>          <dbl>     <dbl>      <dbl> <int>
## 1 NYSCER2021         4         0          4     3
## 2 TeachLiberty1      1         1          0     3
## 3 nanalatinaAA       1         1          0     3
## 4 rweingarten        1         1          0     3
## 5 AFTunion           1         1          0     3
## 6 AlexiosAsText      1         0          1    11
## # … with 88 more rows
## #
## # Edge Data: 143 × 4
##    from    to created_at          text                                          
##   <int> <int> <dttm>              <chr>                                         
## 1     1     2 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## 2     1     3 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## 3     1     4 2021-11-08 01:49:17 @TeachLiberty1 @nanalatinaAA We are happy to …
## # … with 140 more rows
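
Because our network is directed, it is worth knowing that components can be defined in two ways. By default, group_components() uses type = "weak", which ignores the direction of ties; type = "strong" requires that actors can reach each other in both directions. The sketch below contrasts the two on a made-up chain of three actors (toy_network is hypothetical):

```r
library(tidygraph)
library(dplyr)

# Hypothetical chain: a mentions b, and b mentions c, with no replies back
toy_network <- tbl_graph(
  nodes = data.frame(actors = c("a", "b", "c")),
  edges = data.frame(from = c(1, 2), to = c(2, 3)),
  directed = TRUE
)

toy_components <- toy_network %>%
  activate(nodes) %>%
  mutate(weak = group_components(type = "weak"),
         strong = group_components(type = "strong")) %>%
  as_tibble()

toy_components
```

Ignoring direction, all three actors belong to a single weak component. Since none of the mentions are reciprocated, each actor ends up in its own strong component.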

Now that we’ve assigned our nodes to a group, let’s add one final layer to our sociogram from above. We’ll use the geom_node_voronoi() function to shade the area around our nodes by group assignment, and convert our group numbers to factors so that each group gets a distinct color:

ccss_network_groups %>%
  ggraph(layout = "fr") + 
  geom_node_point(aes(size = out_degree, 
                      color = out_degree),
                  show.legend = FALSE) +
  geom_node_text(aes(label = actors, 
                     color = out_degree,
                     size = out_degree), 
                 repel=TRUE,
                 show.legend = FALSE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) + 
  theme_graph() + 
  geom_node_voronoi(aes(fill = factor(group),
                         alpha = .05), 
                    max.radius = .5,
                    show.legend = FALSE) 
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

I’m not entirely sure this was an improvement to our sociogram but it definitely is colorful!!

4b. Identify Sentiment

Recall from the Prepare section that the final question guiding this analysis was:

Which actors in our network tend to be more opposed to the Common Core?

In this section, we’ll use the {vader} package for sentiment analysis introduced in Unit 3 to help answer this question.

👉 Your Turn

In the code chunk below, use the vader_df() function to perform sentiment analysis on the text in our ccss_tweets data frame created above, save as summary_vader and inspect the results. Hint: You will need to use the $ operator to select the text column containing our tweets.

summary_vader <- vader_df(ccss_tweets$text)

summary_vader
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                text
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     I never got the gas bridge argument for new #gas plants … who would invest billions in new CCGT plants to then only use them for half their productive life as deep #decarbonization, on the road to #NetZero, takes hold. I suppose there is always the elusive carbon capture #CCSS https://t.co/wUk4xHXnSt
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    @TeachLiberty1 @nanalatinaAA We are happy to support teachers but will NEVER support teacher unions any longer.  The screwed our kids during #CommonCore and .@rweingarten   .@AFTunion did the same with the mask mandates.  They only talk about our kids when they need something.  No more
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Education Failures since 2012 can be traced back to #CommonCore and we MUST abolish it!! Our children are becoming STUPID and parents are unable to assist their children with this warped way of teaching. It’s NOT a way to learn! #AbolishCommonCore
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          At what point will the #commoncore integrate #compsci and #netizenship into curriculums for countries that are deeply dependent on #technology as a learning ‘solution’?
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               @MaggieEThornton If enough were gathered could run a discourse analysis, but the gold standard would be to do something like Supovitz's #CommonCore project https://t.co/qROBwDpJCZ
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   @MillennialOther @Richard_Harambe @RepMikeJohnson This is the result of a #commoncore education
## 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             #CCChat (Common Core Chat)\n90) #CCSS (Common Core State Standards)\n91) #CommonCore\n92) #EdPolicy\n93) #EdReform\n94) #ESSA https://t.co/IRYo2HCr41
## 8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    6 ways to incorporate digital writing across the curriculum https://t.co/ISp4J9Pwnk #ccss #eBooks #digitalstorytelling https://t.co/tJpBZCdbMf
## 9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Top 10 Reasons Why Standardized Testing Is Here to Stay https://t.co/5cSYQragyv #commoncore
## 10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              "I’ve had students who take home essays and they will misspell ‘American’ and forget to capitalize it.” Thanks #commoncore! https://t.co/3b4ZRrI6pj
## 11                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              "Again, I can’t be sure that I’m right. Maybe it was #CommonCore" @MichaelPetrilli \n"It was #commoncore." @CapitolAdvocate https://t.co/bKFbvxElDX
## 12                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  @MillennialOther @Richard_Harambe @RepMikeJohnson This is the result of a #commoncore education
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 @4Patrick7 @BookerSparticus What an ignoramus. What possible excuse is there for someone too old to blame #commoncore? Willful blindness, partisanship and tearing the social fabric of America.
## 14                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           A lot of the crt debate reminds me of the common core math debate. Parents just upset something is new and different combined with isolated examples of bad lesson plans. Been a while since I heard complaints about common core math....  #CRT #commoncore #teaching
## 15                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          #Republicans have been uniting us around #Florida #CommonCore for over 10 years now and #NewMexico launched "it" @ the grassroots level so not sure why #CRTinschools would be a threat to the #energy holding the #Anonymous #QANON Attack the Capitol agct #DNA connections together?
## 16                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Listen to "Episode 19: Big Lies in K-12"  (Nov. 3, 2021) at  https://t.co/UWaMsLUGAt \n#phonics   #constructivism   #commoncore  #dumbingdown  #socialism
## 17                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The Shrewd Administrator\n#E320CannabisStudy\n#E320Leadership \n#E320Education \n#NVACS #CCSS #NGSS #ISTE \nhttps://t.co/eUGV6whHGv
## 18                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Never forget #CommonCore! https://t.co/co0UCdphI5
## 19                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         @Tori_Sachs @GovWhitmer Charter backers hold no moral high ground \n\n@MiGOP's (@onetoughnerd @MISenate @MI_Republicans @pjcolbeck ) had an opportunity to give parents a voice - primarily by removing DC's &amp; Lansing's stranglehold over local curriculum\n\nBut they threw #StopCommonCore advocates #UnderTheBus
## 20                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      🥜 job #AWFL on airplane doesn’t want to breathe the “unvaccinated oxygen”of the passenger next to her.  \n#commoncore #Science #edumacation #education #COVIDIOTS #COVID19
## 21                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     If you replace "harvest the blood of children" with "indoctrinates children with socialist propaganda" it's little more than a rerun of the vacuous hysterics we saw in response to #CommonCore. #edreform #politics https://t.co/QOd6skgSuH
## 22                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Is #commoncore math delaying the counting in #NewJersey?  Get it done! 🔥
## 23                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Dear Dems, please be careful not to pass that 100% mark!!! #CommonCore
## 24                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              In the past couple of elections, the Dirty Democrats have proved that teaching &amp; learning #CommonCore math is screwing up the vote counting counters. Weird that the “discrepancies” always happen in key areas. #VirginiaVotes
## 25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Someone at FadeawayWorld doesn't know how to count. They put L.James at 3 &amp; K.A.Jabbar at 4. Both have 10 appearances w/ KAJ winning 6 rings &amp; LJ winning 4 rings. Shouldn't KAJ be ranked 3rd and LJ be 4th?? Is this due to common core math?? #fadeawayworld #commoncore #CommonSense
## 26                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              groups attempt ban"true history.#commoncore #Tennessee Nowhere instructed to make students feel less than as retired #teacher I initiated learning promoted critical thinking so history wouldn't repeat itself \n #TheView https://t.co/Ra54AYmnjO
## 27                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Trump: "#CommonCore is gone [when I'm electd]." #IACaucus https://t.co/Rb7UrTIEtc
## 28                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        @jonrappoport How the #CommonCore technocrats take your future-adult children's guns by taking their privacy now. https://t.co/CZmGf4qod5
## 29                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Trump: "#CommonCore is gone [when I'm electd]." #IACaucus https://t.co/Rb7UrTIEtc
## 30                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Michael acts like the captain of a titanic know as #commoncore. Defends it even after it sunk. Tells people it wasn't the ships fault despite the overwhelming evidence that it was a complete and utter #failure. https://t.co/Fl7j4LeMRE
## 31                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Why are we still accepting #commoncore #math? It was created by a dozen money grabbing fools who never taught math in a K-12 setting. \n\nThis has me absolutely murderous right now. And I love math god damn it. https://t.co/QjOsjZ0y69
## 32                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      There's more to learning than ABC's &amp; 123's https://t.co/qrHcv0JY2P no #commoncore #playmatters https://t.co/gZi3zcSPcB
## 33                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           @unusual_whales stable.. what? is that something to do with the vaccines? #CBDC #CCSS #BuildBackBetter
## 34                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  @NorthmanTrader also it was unable to collect the taxes.. thats why they need #WEFagenda, cashless payments, #CBDC #CCSS  #GreenPass #GreenDeal\nand minorities type socialism like #MeToo etc.. and #COVID19 to #DigitalTransformation #ID2020 \n#ToPayHisTaxes\n#BuildBackBetter #BBB https://t.co/iaKOOMsZTb
## 35                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       @zerohedge #CCSS China Credit Score System\n🥕💉🏏🍆🥒🥕💉🏏🥒🍆🇪🇺🌈🦄🍭🌈🍭🌈 #DoNotComply #LetsGoBrandon
## 36                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      There's more to learning than ABC's &amp; 123's https://t.co/qrHcv0JY2P no #commoncore #playmatters https://t.co/gZi3zcSPcB
## 37                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Sending your child to a public school is like telling them walk across the road during rush hour traffic. Nothing good can come from it! #Parents #CRT #CommonCore\n#Indoctrination \n#Pastors #Church https://t.co/P1Cnw0iyEU
## 38                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        @johnpavlovitz @nazuzuwin Similar to their fear of #CommonCore. Knowledge terrifies them.
## 39                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Cape Coral schools, Chesterfield Public Schools, #commoncore\nCity of Falls Church, Colts Neck School District\n(K/DRA 3) Read Aloud for Small and Large\nFREE activity at\nhttps://t.co/48MLi94V7C\nCheck out our site: https://t.co/t6LySc6rZ3 https://t.co/iEZMwZbbof
## 40                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 I LOVE being in the classroom with the students, but completing the @danielson_group questionnaire is burn out central.  \n#danielsonframework #teacher #teachertwitter #administration #life #over #content #motivator #inspirer #education #commoncore https://t.co/vOEvx0aE5h
## 41                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 I LOVE being in the classroom with the students, but completing the @danielson_group questionnaire is burn out central.  \n#danielsonframework #teacher #teachertwitter #administration #life #over #content #motivator #inspirer #education #commoncore https://t.co/DjT45PZOBq
## 42                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 I LOVE being in the classroom with the students, but completing the @danielson_group questionnaire is burn out central.  \n#danielsonframework #teacher #teachertwitter #administration #life #over #content #motivator #inspirer #education #commoncore https://t.co/fEO7VEDsfM
## 43                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 I LOVE being in the classroom with the students, but completing the @danielson_group questionnaire is burn out central.  \n#danielsonframework #teacher #teachertwitter #administration #life #over #content #motivator #inspirer #education #commoncore https://t.co/vOEvx0aE5h
## 44                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 I LOVE being in the classroom with the students, but completing the @danielson_group questionnaire is burn out central.  \n#danielsonframework #teacher #teachertwitter #administration #life #over #content #motivator #inspirer #education #commoncore https://t.co/DjT45PZOBq
## 45                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 I LOVE being in the classroom with the students, but completing the @danielson_group questionnaire is burn out central.  \n#danielsonframework #teacher #teachertwitter #administration #life #over #content #motivator #inspirer #education #commoncore https://t.co/fEO7VEDsfM
## 46                 @EatAtJoes7 @upperleftwino @TosoFlash @BDemocratsfor @manifesto2000 @DynastyEnd @pixfiber @Dallas4Bernie @BySallyAlbright @MiddleMolly @BobForDinner @TruthRuththe @shelly_laughing @Sailing_fool @SaginawsSong @banquet_beggars @BubblesToBurst @CyborgBjorn @elevenstars @pepper26potts @itsjustkay0k @darealwiles @FatherOfTheBir9 @McCadeMiller2 @prettynobodyco @ulcerative @M4M4ALL @ReadeAlexandra @TaraReadeTruth @glennkirschner2 @DNC @TaranaBurke @RTAmericaNews @ChrisLynnHedges @maxkeiser @LeeCamp @GovJVentura @JohnRalstonSaul @OurRevolutionPR @OurRevolution @ProudSocialist @TRNshow @ninaturner @Bioregional @kthalps @EdProgress @bc_disability @candisability @GSVteam @NewsHour Still holding a candle for #BillGates' Corporate, er, I mean Common Core State Standards? That's too bad, because the white supremacist #CCSS trash is essentially dead. https://t.co/IRX2ZUf7jf\n\n#JoeBiden \n#Bidenflation \n#BidenIsAFailure \n#JoeBidensAmerica \n#JimCrowJoe #Biden
## 47 @EatAtJoes7 @BDemocratsfor @manifesto2000 @upperleftwino @DynastyEnd @pixfiber @Dallas4Bernie @BySallyAlbright @MiddleMolly @BobForDinner @TruthRuththe @shelly_laughing @Sailing_fool @SaginawsSong @banquet_beggars @BubblesToBurst @CyborgBjorn @elevenstars @pepper26potts @itsjustkay0k @darealwiles @FatherOfTheBir9 @McCadeMiller2 @prettynobodyco @ulcerative @M4M4ALL @ReadeAlexandra @TaraReadeTruth @glennkirschner2 @DNC @TaranaBurke @RTAmericaNews @ChrisLynnHedges @maxkeiser @LeeCamp @GovJVentura @JohnRalstonSaul @OurRevolutionPR @OurRevolution @ProudSocialist @TRNshow @ninaturner @Bioregional @kthalps @EdProgress @bc_disability @candisability @GSVteam @NewsHour LOL. Yeah, my opposition to a white supremacist, corporate curriculum (#CCSS) based on the "great man" theory of history and a literature list that intentionally eschewed POC really drives home your point #BidenBoi.\n\n#JoeBiden \n#Bidenflation \n#BidenIsAFailure \n#JimCrowJoe #Biden https://t.co/ztrG8yx07i
## 48                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Check out Common Core Math in Action Book Grades K-2 Catherine Jones Kuhns Marrie Lasater  https://t.co/BQd6VYWDJS via @eBay #math #commoncore #commoncoremath #classroom #homeschool #classroom
##                                                                                                                                                                                                                                                                           word_scores
## 1                                                                                                                        {0, 0, 0, 0, 0, 0, -1.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 2                                                                                                                               {0, 0, 0, 0, 1.35, 0, 0.85, 0, 0, 0, 0, -1.887, 0, 0, 0, 0, 0, -3.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.8, 0}
## 3                                                                                                                                                   {0, -2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -3.133, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 4                                                                                                                                                                                                          {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.3}
## 5                                                                                                                                                                                                         {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.25, 0, 0, 0, 0}
## 6                                                                                                                                                                                                                                                   {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 7                                                                                                                                                                                                                                                {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 8                                                                                                                                                                                                                                          {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 9                                                                                                                                                                                                                                              {0.8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 10                                                                                                                                                                                                                  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.9, 0, 0, 0, 1.9, 0, 0}
## 11                                                                                                                                                                                                                           {0, 0, 0, 0, 1.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 12                                                                                                                                                                                                                                                  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 13                                                                                                                                                                                          {0, 0, 0, 0, -1.9, 0, 0, 0.3, 0, 0, 0, 0, 0, 0, 0, -1.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 14                                                                                                                                   {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.6, 0, 0, 0, 0, 0, 0, 0, -1.3, 0, 0, -2.5, 0, 0, 0, 0, 0, 0, 0, 0, -1.7, 0, 0, 0, 0, 0, 0, 0}
## 15                                                                                                                             {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, -1.167979, 0, 0, 0, 0, 0, -2.4, 0, 0, 1.1, 0, 0, 0, 0, -2.1, 0, 0, 0, 0, 0, 0}
## 16                                                                                                                                                                                                                          {0, 0, 0, 0, 0, -1.8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 17                                                                                                                                                                                                                                                  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 18                                                                                                                                                                                                                                                                   {0, 0.666, 0, 0}
## 19                                                                                                                                                      {0, 0, 0, 0, 0, -0.6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 20                                                                                                                                                                                                            {0, 0, 0, 0, 0, 0, 0.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 21                                                                                                                                                                         {0, 0, 0, 0, 0, 0, 0, 0, 0, -0.6, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.8, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 22                                                                                                                                                                                                                                               {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 23                                                                                                                                                                                                                                         {1.6, 0, 1.3, 0, 0.6, 0, 0, 0, 0, 0, 0, 0}
## 24                                                                                                                                                                    {0, 0, 0, 0, 0, 0, 0, -1.9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.9, 0, 0, 0, 0, 0, -0.7, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 25                                                                                                                         {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.4, 0, 0, 0, 0, 2.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 26                                                                                                                                                                                       {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.8, -1.3, 0, 0, 0, 0, 0, 0, 0, 0}
## 27                                                                                                                                                                                                                                                        {0, 0, 0, 0, 0, 0, 0, 0, 0}
## 28                                                                                                                                                                                                                                   {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 29                                                                                                                                                                                                                                                        {0, 0, 0, 0, 0, 0, 0, 0, 0}
## 30                                                                                                                                                         {0, 0, 1.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.258, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2.593, 0}
## 31                                                                                                                                             {0, 0, 0, 0, 1.6, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -2.2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -3.493, 0, 0, 0, 0, 3.2, 0, 1.1, -1.7, 0, 0}
## 32                                                                                                                                                                                                                                         {0, 0, 0, 0, 0, 0, 0, 0, 0, -1.2, 0, 0, 0}
## 33                                                                                                                                                                                                                                       {0, 1.2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 34                                                                                                                                                                     {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 35                                                                                                                                                                                                                                                      {0, 0, 0, 1.6, 0, 0, 0, 0, 0}
## 36                                                                                                                                                                                                                                         {0, 0, 0, 0, 0, 0, 0, 0, 0, -1.2, 0, 0, 0}
## 37                                                                                                                                                                            {0, 0, 0, 0, 0, 0, 0, 0, 1.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.406, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 38                                                                                                                                                                                                                                            {0, 0, 0, 0, 0, -2.2, 0, 0, 0, -2.6, 0}
## 39                                                                                                                                                                            {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.033, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 40                                                                                                                                                                                  {0, 1.9665, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.3, 0, 0, 0}
## 41                                                                                                                                                                                  {0, 1.9665, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.3, 0, 0, 0}
## 42                                                                                                                                                                                  {0, 1.9665, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.3, 0, 0, 0}
## 43                                                                                                                                                                                  {0, 1.9665, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.3, 0, 0, 0}
## 44                                                                                                                                                                                  {0, 1.9665, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.3, 0, 0, 0}
## 45                                                                                                                                                                                  {0, 1.9665, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.3, 0, 0, 0}
## 46                  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2.5, 0, 0, 0, 0.5, 0, 0, 0, 0, -3.3, 0, 0, 0, 0, 0, 0, 0}
## 47 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.083, 1.2, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 3.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 48                                                                                                                                                                                                           {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
##    compound   pos   neu   neg but_count
## 1    -0.361 0.000 0.952 0.048         0
## 2    -0.777 0.079 0.733 0.188         1
## 3    -0.841 0.000 0.830 0.170         0
## 4     0.318 0.091 0.909 0.000         0
## 5     0.502 0.124 0.876 0.000         1
## 6     0.000 0.000 1.000 0.000         0
## 7     0.000 0.000 1.000 0.000         0
## 8     0.000 0.000 1.000 0.000         0
## 9     0.202 0.141 0.859 0.000         0
## 10    0.316 0.138 0.779 0.082         0
## 11    0.318 0.119 0.881 0.000         0
## 12    0.000 0.000 1.000 0.000         0
## 13   -0.612 0.042 0.784 0.173         0
## 14   -0.878 0.000 0.783 0.217         0
## 15   -0.724 0.070 0.761 0.169         0
## 16   -0.421 0.000 0.859 0.141         0
## 17    0.000 0.000 1.000 0.000         0
## 18    0.240 0.395 0.605 0.000         0
## 19    0.077 0.046 0.916 0.039         1
## 20    0.077 0.056 0.944 0.000         0
## 21   -0.660 0.000 0.824 0.176         0
## 22    0.000 0.000 1.000 0.000         0
## 23    0.749 0.450 0.550 0.000         0
## 24   -0.670 0.000 0.827 0.173         0
## 25    0.830 0.139 0.861 0.000         0
## 26    0.128 0.087 0.841 0.072         0
## 27    0.000 0.000 1.000 0.000         0
## 28    0.000 0.000 1.000 0.000         0
## 29    0.000 0.000 1.000 0.000         0
## 30    0.043 0.112 0.803 0.085         0
## 31   -0.126 0.205 0.600 0.195         0
## 32   -0.296 0.000 0.845 0.155         0
## 33    0.374 0.165 0.835 0.000         0
## 34    0.361 0.067 0.933 0.000         0
## 35    0.382 0.245 0.755 0.000         0
## 36   -0.296 0.000 0.845 0.155         0
## 37    0.099 0.079 0.852 0.068         0
## 38   -0.778 0.000 0.570 0.430         0
## 39    0.617 0.112 0.888 0.000         0
## 40    0.806 0.206 0.794 0.000         1
## 41    0.806 0.206 0.794 0.000         1
## 42    0.806 0.206 0.794 0.000         1
## 43    0.806 0.206 0.794 0.000         1
## 44    0.806 0.206 0.794 0.000         1
## 45    0.806 0.206 0.794 0.000         1
## 46   -0.807 0.017 0.896 0.087         0
## 47    0.898 0.124 0.876 0.000         0
## 48    0.000 0.000 1.000 0.000         0

Recall that the compound score is the measure most commonly used for sentiment analysis. It is computed by summing the valence scores of each word found in the lexicon, adjusted according to VADER's rules, and then normalized to fall between -1 (most extreme negative) and +1 (most extreme positive).
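To make the normalization step concrete, here is a minimal sketch of how a summed valence score can be squashed into the -1 to +1 range. It assumes the normalization used in the VADER reference implementation, x / sqrt(x^2 + alpha) with alpha = 15; the function name `normalize_valence` is hypothetical and is not part of the {vader} package.

```r
# Sketch of VADER-style normalization (assumption: alpha = 15, as in the
# reference implementation; normalize_valence is a hypothetical helper).
normalize_valence <- function(total_valence, alpha = 15) {
  total_valence / sqrt(total_valence^2 + alpha)
}

# In the output above, the word scores for tweet 4 sum to 1.3 and
# the word scores for tweet 1 sum to -1.5.
round(normalize_valence(1.3), 3)   # 0.318, the compound score in row 4
round(normalize_valence(-1.5), 3)  # -0.361, the compound score in row 1
```

Because the denominator always exceeds the absolute value of the numerator, the result can approach but never reach -1 or +1.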

Now use the inner_join() function introduced in the Join Data Sets Primer to add these sentiment scores back to our tweets by joining summary_vader and ccss_tweets, using by = "text" as the key for matching both data frames:

tweet_sentiment <- inner_join(summary_vader,
                              ccss_tweets,
                              by = "text") %>%
  select(screen_name, compound)
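One thing to watch for when joining on tweet text: if the same text appears more than once in both data frames, an inner join returns every pairing of the matching rows. Rows 40 through 45 of the VADER output above have identical scores, which suggests repeated text in our data. The sketch below uses small made-up data frames (`toy_scores` and `toy_tweets` are hypothetical) to show the effect and one possible way to guard against it.

```r
library(dplyr)

# Hypothetical toy data: the text "b" appears twice in each data frame.
toy_scores <- tibble(
  text     = c("a", "b", "b"),
  compound = c(0.5, -0.2, -0.2)
)
toy_tweets <- tibble(
  text        = c("a", "b", "b", "c"),
  screen_name = c("user_1", "user_2", "user_3", "user_4")
)

# Duplicate keys on both sides multiply: 1 row for "a" plus 2 x 2 rows for "b".
# Recent versions of dplyr may warn about this, so the warning is suppressed here.
joined_all <- suppressWarnings(inner_join(toy_scores, toy_tweets, by = "text"))
nrow(joined_all)

# Keeping one score per distinct text avoids the extra rows.
joined_distinct <- inner_join(distinct(toy_scores), toy_tweets, by = "text")
nrow(joined_distinct)
```

Whether de-duplicating is appropriate depends on your question; here the scores for identical text are identical, so nothing is lost.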

Summarize Sentiment

Now that you have these joined, let’s group our data by each user’s screen_name and summarize their mean compound sentiment score:

user_sentiment <- tweet_sentiment %>%
  group_by(screen_name) %>%
  summarise(sentiment = mean(compound))
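A mean based on a single tweet tells us much less than a mean based on many. As a sketch on made-up data (`toy_tweet_sentiment` is hypothetical), we could also count tweets per user with n() and sort so the most negative users appear first:

```r
library(dplyr)

# Hypothetical tweet-level sentiment for three users.
toy_tweet_sentiment <- tibble(
  screen_name = c("user_a", "user_a", "user_b", "user_c"),
  compound    = c(-0.8, -0.4, 0.5, 0.0)
)

toy_user_sentiment <- toy_tweet_sentiment %>%
  group_by(screen_name) %>%
  summarise(
    sentiment = mean(compound),  # average compound score per user
    n_tweets  = n()              # number of tweets behind that average
  ) %>%
  arrange(sentiment)             # most negative users first

toy_user_sentiment
```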

Note that we have effectively just created some new node and edge “attributes” that could be incorporated into our network visualization to potentially help understand why groups may have formed as they did.
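As a rough sketch of how such an attribute might be attached to a network, the example below builds a tiny made-up graph and joins a user-level sentiment table onto its nodes. The object names (`toy_nodes`, `toy_edges`, `toy_sentiment`) are hypothetical, and the sketch assumes the node table stores user names in a column called `name`.

```r
library(dplyr)
library(tidygraph)

# Hypothetical node list, edge list, and user-level sentiment table.
toy_nodes <- tibble(name = c("user_a", "user_b", "user_c"))
toy_edges <- tibble(from = c("user_a", "user_b"),
                    to   = c("user_b", "user_c"))
toy_sentiment <- tibble(screen_name = c("user_a", "user_b"),
                        sentiment   = c(-0.6, 0.3))

toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE) %>%
  activate(nodes) %>%
  # Match node names to screen names; users without a score get NA.
  left_join(toy_sentiment, by = c("name" = "screen_name"))

node_table <- toy_graph %>%
  activate(nodes) %>%
  as_tibble()

node_table
```

With sentiment stored on the nodes, it could then be mapped to an aesthetic such as node color in a network plot.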

👉 Your Turn

Recall from the Prepare section that the final question guiding this analysis was:

Which actors in our network tend to be more opposed to the Common Core?

Use the code chunk below to inspect our data frame and answer the question above in the space below:

user_sentiment
  • YOUR RESPONSE HERE

5. COMMUNICATE

In this case study, we focused on the literature guiding our analysis; wrangling our data into a network object; examining basic network centrality measures; and identifying groups and sentiment in tweets about the CCSS curriculum standards.

In response to the following research questions driving this analysis, write 2-3 sentences summarizing our findings:

  1. Who are the main transmitters, transceivers, and transcenders in our Common Core Twitter network?

  2. What subgroups, or factions, existed in our network?

  3. Which actors in our network tended to be more opposed to the Common Core?

Finally, add 1-2 sentences in response to the following prompts:

  1. One important thing I took away from this unit about network analysis:

  2. One thing about network analysis I want to learn more about:

🧶 Knit & Check ✅

Congratulations - you’ve completed your social network analysis case study! To complete your work, you can click the drop down arrow at the top of the file, then select “Knit to HTML.” This will create a report in your Files pane that serves as a record of your code and its output you can open or share.

References

Carolan, Brian. 2014. “Social Network Analysis and Education: Theory, Methods & Applications.” https://doi.org/10.4135/9781452270104.
Estrellado, Ryan A., Emily A. Freer, Jesse Mostipak, Joshua M. Rosenberg, and Isabella C. Velásquez. 2020. Data Science in Education Using R. Routledge. https://doi.org/10.4324/9780367822842.
Supovitz, Jonathan, Alan J. Daly, Miguel del Fresno, and Christian Kolouch. 2017. “#commoncore Project.” Retrieved April 3, 2019. http://www.hashtagcommoncore.com/.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc. https://r4ds.had.co.nz.
---
title: 'Unit 4 Case Study: Hashtag Common Core'
subtitle: "ECI 586 Introduction to Learning Analytics"
author: "Dr. Shaun Kellogg"
date: "`r format(Sys.Date(),'%B %e, %Y')`"
output:
  html_document:
    toc: yes
    toc_depth: 4
    toc_float: yes
    code_folding: show
    code_download: TRUE
editor_options:
  markdown:
    wrap: 72
bibliography: lit/references.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# 1. PREPARE

Our Unit 4 Case Study: Hashtag Common Core is inspired by the work of
Jonathan Supovitz, Alan Daly, Miguel del Fresno and Christian Kolouch
who examined the intense debate surrounding the Common Core State
Standards education reform as it played out on Twitter. As noted on
their expansive and interactive website for the [\#COMMONCORE
Project](https://www.hashtagcommoncore.com), the Common Core was a major
education policy initiative of the early 21st century. A primary aim of
the Common Core was to strengthen education systems across the United
States through a set of specific and challenging education standards.
Although these standards once enjoyed bipartisan support, we saw in our
previous case study how these standards have become a political punching
bag.

In Unit 4, we continue our investigation of tweets around these
controversial state standards through social network analysis.
Specifically, this case study will cover the following topics pertaining
to each data-intensive workflow process:

1.  **Prepare**: Prior to analysis, we'll take a look at the context
    from which our data came, formulate some research questions, and get
    introduced to the {tidygraph} and {ggraph} packages for analyzing
    relational data.

2.  **Wrangle**: In the wrangling section of our case study, we will
    learn some basic techniques for manipulating, cleaning,
    transforming, and merging network data.

3.  **Explore**: With our network data tidied, we learn to calculate
    some key network measures and to illustrate some of these stats
    through network visualization.

4.  **Model**: We conclude our analysis by introducing community
    detection algorithms for identifying groups and revisiting sentiment
    about the Common Core.

5.  **Communicate**: We briefly reflect on our walkthrough and what
    we learned from our analysis.

## 1a. Review the Research

Recall from [Social Network Analysis and Education: Theory, Methods &
Applications](https://methods.sagepub.com/book/social-network-analysis-and-education)
that Carolan (2014) cited the following four features used by Freeman
(2004) to define the social network perspective:

1.  Social network analysis is **motivated by a relational intuition**
    based on ties connecting social actors.

2.  It is firmly **grounded in systematic empirical data**.

3.  It **makes use of graphic imagery** to represent actors and
    their relations with one another.

4.  It **relies on mathematical and/or computational models** to
    succinctly represent the complexity of social life.

The [\#COMMONCORE Project](https://www.hashtagcommoncore.com) that we'll
examine next is an exemplary illustration of these four defining
features of the social network perspective.

### The \#commoncore Project

![](img/commoncore.jpeg){width="50%"}

Supovitz, J., Daly, A.J., del Fresno, M., & Kolouch, C.
(2017). *\#commoncore Project.* Retrieved from
[http://www.hashtagcommoncore.com](http://www.hashtagcommoncore.com/).

#### Prologue

As noted by @supovitz2017commoncore, the Common Core State Standards
have been a "persistent flashpoint in the debate over the direction of
American education." The \#commoncore Project explores the Common Core
debate on Twitter using a combination of social network analyses and
psychological investigations which help to reveal both the underlying
social structure of the conversation and the motivations of the
participants.

The central question guiding this investigation was:

> How are social media-enabled social networks changing the discourse in
> American politics that produces and sustains education policy?

#### Data Sources & Analyses

The [methods
page](https://www.hashtagcommoncore.com/project/methodology) of the
\#COMMONCORE Project provides a detailed discussion of the data and
analyses used to arrive at the conclusions in *\#commoncore: How social
media is changing the politics of education*. Provided below is a
summary of how authors retrieved data from Twitter and the analyses
applied for each of the five acts in the website. I highly encourage you
to take a look at this section if you'd like to learn more about their
approach and in particular if you're unfamiliar with how users can
interact and communicate on Twitter.

#### Data Collection

To collect data on keywords related to the Common Core, the project used
a customized data collection tool developed by two of the co-authors,
Miguel del Fresno and Alan J. Daly, called *Social Runner
Lab^TM^*. Similar to an approach we used in Unit 3, the authors
downloaded data in real time directly from Twitter's Application
Programming Interface (API) based on tweets using specified keywords,
keyphrases, or hashtags and then restricted their analysis to the
following terms: *commoncore*, *ccss* and *stopcommoncore.* They also
captured Twitter profile names, or user names, as well as the tweets,
retweets, and mentions posted. Data included messages that are public on
Twitter, but not private direct messages between individuals, nor tweets
from accounts which users have made private.
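To get a feel for what restricting tweets to a set of terms looks like in practice, here is a minimal sketch that filters a small made-up data frame by the three terms listed above. This is not the authors' collection tool; `toy_tweets` and its contents are hypothetical, and the filter runs on already-downloaded text rather than against the Twitter API.

```r
library(dplyr)
library(stringr)

# Hypothetical tweets; only the first and third mention the terms of interest.
toy_tweets <- tibble(
  screen_name = c("user_a", "user_b", "user_c"),
  text = c(
    "Loving our #CommonCore math lesson today",
    "New #edtech tools for teachers",
    "Time to #StopCommonCore and rethink #CCSS testing"
  )
)

# Terms from the study, combined into one case-insensitive pattern.
keywords <- c("commoncore", "ccss", "stopcommoncore")
keyword_pattern <- regex(str_c(keywords, collapse = "|"), ignore_case = TRUE)

matched_tweets <- toy_tweets %>%
  filter(str_detect(text, keyword_pattern))

matched_tweets
```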

#### Analyses

In order to address their research question, the authors applied social
network analysis techniques in addition to qualitative and automated
text mining approaches. For social network analyses, each node is an
individual Twitter user (person, group, institution, etc.) and the
connection between each node is the tweet, retweet, or mention/reply.
After retrieving data from the Twitter API, the authors created a file
that could be analyzed in Gephi, an open-source software program which
depicts the relations as networks and provides metrics for describing
features of the network.
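The authors used Gephi, but the same idea of users as nodes and tweets, retweets, or mentions as ties can be sketched in R. The example below is a minimal illustration with made-up data (`toy_ties` is hypothetical) that turns an edge list into a network object using {tidygraph}.

```r
library(dplyr)
library(tidygraph)

# Hypothetical edge list: who directed a tweet at whom, and the tie type.
toy_ties <- tibble(
  from = c("user_a", "user_a", "user_b"),
  to   = c("user_b", "user_c", "user_a"),
  type = c("mention", "retweet", "reply")
)

# The first two columns are treated as the sender and receiver of each tie;
# any remaining columns are kept as edge attributes.
toy_network <- as_tbl_graph(toy_ties, directed = TRUE)

node_table <- toy_network %>% activate(nodes) %>% as_tibble()
edge_table <- toy_network %>% activate(edges) %>% as_tibble()

node_table
edge_table
```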

In addition to data visualization and network descriptives, the authors
examined group development and lexical tendencies among users. For group
development, they used a community detection algorithm to identify and
represent structural sub-communities, or "factions", which they describe
as groups with more ties within than across groups (even though those
group boundaries are somewhat porous). For lexical tendencies, the authors
used the Linguistic Inquiry and Word Count (LIWC) lexicons to determine
psychological drive, diagnose their level of conviction, make inferences
about thinking styles, and even determine sentiment similar to what we
examined in Unit 2.
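To illustrate what "more ties within than across" looks like, here is a minimal sketch on a made-up network of two tightly knit triangles joined by a single bridging tie. It uses the Louvain algorithm via `group_louvain()` from {tidygraph} as one example of community detection; this is an assumption for illustration and not necessarily the algorithm the authors used.

```r
library(dplyr)
library(tidygraph)

# Hypothetical ties: a-b-c form one triangle, d-e-f another,
# and a single tie between c and d bridges the two.
toy_ties <- tibble(
  from = c("a", "a", "b", "d", "d", "e", "c"),
  to   = c("b", "c", "c", "e", "f", "f", "d")
)

toy_factions <- as_tbl_graph(toy_ties, directed = FALSE) %>%
  activate(nodes) %>%
  mutate(faction = group_louvain()) %>%  # community membership per node
  as_tibble()

toy_factions
```

Each triangle should land in its own faction even though the bridge makes the boundary between them porous.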

For a nice summary of the data used for the analysis, as well as the
samples of actors and tweets, the keywords, and the methods that were
utilized, see [Table 1. Data and Method for Each
Act](https://www.hashtagcommoncore.com/project/methodology) in the
Methods section of the \#commoncore website.

#### Key Findings

In the \#commoncore Project, analyses of almost 1 million tweets sent by
about 190,000 distinct actors over a period of 32 months revealed the
following:

-   In **Act 1**, **The Giant Network**, the authors identified five
    major sub-communities, or factions, in the Twitter debate
    surrounding the Common Core, including: (1) supporters of the Common
    Core, (2) opponents of the standards from inside education, and (3)
    opponents from outside of education.
-   In **Act 2**, **Central Actors**, they noted that most of these
    participants were casual contributors -- almost 95% of them made
    fewer than 10 tweets in any given six-month period. They also
    distinguished between two types of influence on
    Twitter: *Transmitters* who tweeted a lot, regardless of the extent
    of their followership; and *Transceivers*, those who gained their
    influence by being frequently retweeted and mentioned.
-   In **Act 3**, **Key Events,** the authors identified issues driving
    the major spikes in the conversation, like when Secretary of
    Education Duncan spoke about white suburban moms' opposition to the
    Common Core, or the debate over the authorization of the Every
    Student Succeeds Act in November 2015. They also offered evidence
    of manufactured controversies spurred by sensationalizing minor
    issues and outright fake news stories.
-   In **Act 4**, **Lexical Tendencies**, the authors examined the
    linguistic tendencies of the three major factions and found that
    Common Core supporters used the highest number of conviction words,
    tended to use more achievement-oriented language, and used more
    words associated with a formal and analytic thinking style. By
    contrast, opponents of the Common Core from within education tended
    to use more words associated with sadness, and used more narrative
    thinking style language. Opponents of the Common Core from outside
    of education made the highest use of words associated with peer
    affiliation, used the largest number of angry words, and exhibited
    the lowest level of conviction in their word choices.
-   In **Act 5**, **The Tweet Machine**, the authors examined five frames that
    opponents of the Common Core used to appeal to values of particular
    subgroups including the government frame, business frame, war frame,
    experiment frame, and propaganda frame. By combining these
    constituencies, the authors illustrated how the Common Core
    developed a strong transpartisan coalition of opposition.

### **👉 Your Turn** **⤵**

For our Unit 4 Walkthrough, we'll apply some of the same techniques used
by this study including calculating some basic network measures of
centrality. For example, take a look at the [*Explore the
Networks*](https://www.hashtagcommoncore.com/#2-1)
section from Act 2: Central Actors and the Transmitters, Transceivers
and Transcenders.

Click on each of the boxes for a select time period and in the space
below list a couple Twitter users identified by their analysis as the
following:

-   **Transmitters:**

-   **Transceivers:**

-   **Transcenders:**

Now visit Act 2 of the methods section and, in the space below, identify
the network measure used for the first two central actors (hint: it's
italicized) and copy the definition used. Note that Transcenders has been
completed for you.

-   ***Transmitters***

-   ***Transceivers***

-   ***Transcenders*** is measured by *degree* and consists of actors
    (i.e., Twitter users) who have both high out-degree and high
    in-degree.
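As a preview of how in-degree and out-degree can be computed, here is a minimal sketch on a made-up directed network (`toy_ties` is hypothetical). It uses `centrality_degree()` from {tidygraph}, where `mode = "out"` counts ties a node sends and `mode = "in"` counts ties a node receives.

```r
library(dplyr)
library(tidygraph)

# Hypothetical directed ties: user "a" sends three ties and receives two.
toy_ties <- tibble(
  from = c("a", "a", "a", "b", "c", "d"),
  to   = c("b", "c", "d", "a", "a", "b")
)

toy_degree <- as_tbl_graph(toy_ties, directed = TRUE) %>%
  activate(nodes) %>%
  mutate(
    out_degree = centrality_degree(mode = "out"),  # ties sent
    in_degree  = centrality_degree(mode = "in")    # ties received
  ) %>%
  as_tibble() %>%
  arrange(desc(out_degree), desc(in_degree))

toy_degree
```

In this toy network, user "a" is high on both measures, which is the pattern the Transcenders definition describes.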

## 1b. Identify a Question(s)

Recall from above that the central question guiding the \#COMMONCORE
Project was:

> How are social media-enabled social networks changing the discourse in
> American politics that produces and sustains education policy?

For Unit 4, we are going to focus our questions on something a bit less
ambitious but inspired by this work:

1.  Who are the transmitters, transceivers, and transcenders in our
    Common Core Twitter network?
2.  What subgroups, or factions, exist in our network?
3.  Which actors in our network tend to be more opposed to the Common
    Core?

To address the last question, we'll revisit the techniques we learned
in our Unit 3 VADER sentiment analysis.

### **👉 Your Turn** **⤵**

Based on what you know about networks and the context so far, what other
research question(s) might we ask in this context that a social
network perspective might be able to answer?

In the space below, type a brief response to the question above:

-   YOUR RESPONSE HERE

## 1c. Set Up Project

As highlighted in [Chapter 6 of Data Science in Education Using
R](https://datascienceineducation.com/c06.html) (DSIEUR), one of the
first steps of every workflow should be to set up your "Project" within
RStudio. Recall that:

> A **Project** is the home for all of the files, images, reports, and
> code that are used in any given project

Since we are working in RStudio Cloud, a Project has already been set up
for you as indicated by the `.Rproj` file in your main directory in the
Files pane.

### Load Libraries

In Unit 1, we also learned about **packages**, or libraries, which are
shareable collections of R code that can contain functions, data, and/or
documentation and extend the functionality of R. You can always check to
see which packages have already been installed and loaded into RStudio
Cloud by looking at the Files, Plots, & Packages pane in the lower
right-hand corner.

### **👉 Your Turn** **⤵**

First, load the {tidyverse}, {tidytext} and {vader} packages from
previous case studies. We'll be using several of these packages to
wrangle and explore our data in later sections:

```{r}
library(tidyverse)
library(tidytext)
library(vader)
```

Next, we will introduce a few new packages that will help us with our
social network analyses.

### tidygraph 📦

![](img/tidygraph.png){width="20%"}

The {[tidygraph](https://tidygraph.data-imaginist.com)} package is a
huge package that exports 280 different functions and methods, including
access to almost all of the `dplyr` verbs plus a few more, developed for
use with relational data. While network data itself is not tidy, it can
be envisioned as two tidy tables, one for node data and one for edge
data. The {tidygraph} package provides a way to switch between the two
tables and uses `dplyr` verbs to manipulate them. Furthermore it
provides access to a lot of graph algorithms with return values that
facilitate their use in a tidy workflow.

Let's go ahead and load the {tidygraph} library:

```{r}
library(tidygraph)
```
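
To make the "two tidy tables" idea concrete, here is a minimal sketch using a made-up three-person network. The names `toy_graph`, `toy_nodes`, and `toy_edges` and the actors themselves are invented for illustration and are not part of our Common Core data. The `activate()` function, which we'll use again in the Explore section, is what switches between the node table and the edge table:

```{r}
library(dplyr)
library(tidygraph)

# Hypothetical three-person network, invented for illustration only
toy_graph <- tbl_graph(
  nodes = tibble(name = c("ana", "ben", "cy")),
  edges = tibble(from = c("ana", "ana", "ben"),
                 to   = c("ben", "cy", "cy"))
)

# Pull out the node table as an ordinary tibble
toy_nodes <- toy_graph %>%
  activate(nodes) %>%
  as_tibble()

# Switch to the edge table and pull it out the same way
toy_edges <- toy_graph %>%
  activate(edges) %>%
  as_tibble()

toy_nodes
toy_edges
```

Notice that the edge table stores each tie as a pair of row positions in the node table rather than as names.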

### ggraph 📦

![](img/ggraph.png){width="20%"}

Created by the same developer as {tidygraph},
{[ggraph](https://ggraph.data-imaginist.com/index.html)} -- pronounced
gg-raph or g-giraffe hence the logo -- is an extension of
{[ggplot](https://ggplot2.tidyverse.org)} aimed at supporting relational
data structures such as networks, graphs, and trees. Both packages are
more modern and widely adopted approaches to data visualization in R.

While ggraph builds upon the foundation of ggplot and its API, it comes
with its own self-contained set of geoms, facets, etc., as well as
adding the concept of *layouts* to the [grammar of
graphics](https://ggplot2-book.org/introduction.html?q=grammar#what-is-the-grammar-of-graphics),
i.e. the "gg" in ggplot and ggraph.

Let's go ahead and load the {ggraph} library:

```{r}
library(ggraph)
```

### igraph 📦

![](img/igraph.png){width="20%"}

Both {tidygraph} and {ggraph} depend heavily on the [igraph network
analysis package](https://igraph.org). The main goals of the igraph package and
the collection of network analysis tools it contains are to provide a
set of data types and functions for:

1.  pain-free implementation of graph algorithms,

2.  fast handling of large graphs, with millions of vertices (i.e.,
    actors or nodes) and edges,

3.  allowing rapid prototyping via high level languages like R.

Run the code chunk below to load the {igraph} library:

```{r}
library(igraph)
```

### Import Data

Finally, we'll be using a fresh data download from Twitter that only
includes a relatively small set of tweets from the past week to help
keep things simple. Network graphs can quickly get unwieldy as the
number of nodes in your network grows. The \#commoncore project does an
excellent job visualizing large networks using advanced techniques.
We'll focus on a relatively small network since we'll be introducing
some very basic techniques for visualization.

### **👉 Your Turn** **⤵**

Use the `read_csv()` function from the {readr} package to read the
`ccss-tweets-fresh.csv` file from the data folder and assign to a new
data frame named `ccss_tweets`:

```{r}
ccss_tweets <- read_csv("data/ccss-tweets-fresh.csv")
```

------------------------------------------------------------------------

# 2. WRANGLE

In general, data wrangling involves some combination of cleaning,
reshaping, transforming, and merging data [@wickham2016r]. As
highlighted in @estrellado2020e, wrangling network data can be even more
challenging than other data sources since network data often includes
variables about both individuals and their relationships.

For our data wrangling this week, we're keeping it simple since working
with relational data is a bit of a departure from our work with
rectangular data frames. Our primary goals for Unit 4 are learning how
to:

a.  **Create an Edgelist**. In this section, we introduce a new tidy
    data format called the "edgelist" which contains information about
    each edge (i.e. connections or ties between actors) in our network.

b.  **Format Network Data**. We learn to shape our data files into two
    common formats for storing network data: edgelists and nodelists.

c.  **Create a Network Object**. Finally, we'll need to convert our data
    frames into a special data format, an R network object, for working
    with relational data.

## 2b. Format Network Data

Recall from Chapter 1 of @carolan2014 that ties, or relations, are what
connect actors to one another. These ties are often referred to as
"edges" when formatting and graphing network data and the range of ties
in a network can be extensive. Some of the more common ties, or edges,
used to denote connections among actors in educational research include:

-   Behavioral interaction (e.g., talking to each other or sending
    messages)

-   Physical connection (e.g., sitting together at lunch, living in the
    same neighborhood)

-   Association or affiliation (e.g., taking the same courses, belonging
    to the same peer group)

-   Evaluation of one person by another (e.g., considering someone a
    friend or enemy)

-   Formal relations (e.g., knowing who has authority over whom)

-   Moving between places or status (e.g., school choice preferences,
    dating patterns among adolescents)

### Create Edgelist

The **edgelist** format is slightly different than other formats you
have likely worked with in that the values in the first two columns of
each row represent a dyad, or tie between two nodes in a network. An
edgelist can also contain other information regarding the strength,
duration, or frequency of the relationship, sometimes called weight, in
addition to other "edge attributes."
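
As a rough illustration of what a weighted edgelist might look like, the sketch below starts from a few invented mentions (the `toy_mentions` data and its user names are hypothetical) and uses `count()` from {dplyr} to collapse repeated dyads into a single row with a `weight` column. This is only one possible way to define weight; we won't be weighting edges in our own analysis:

```{r}
library(dplyr)

# Hypothetical mentions, one row per sender-receiver pair (invented data)
toy_mentions <- tibble(
  sender   = c("ana", "ana", "ben", "ana"),
  receiver = c("ben", "ben", "cy",  "cy")
)

# Collapse repeated dyads and record how often each occurred as a weight
toy_edgelist <- toy_mentions %>%
  count(sender, receiver, name = "weight")

toy_edgelist
```

Each row is now a unique dyad, and the first two columns still identify the two nodes that make up the tie.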

For our analysis of tweets in this case study, we'll use an approach
similar to that used by @supovitz2017commoncore. Specifically, each node
is an individual Twitter user (person, group, institution, etc.) and the
connection between each node is the tweet, retweet, or mention/reply.

### **👉 Your Turn** **⤵**

Use the code chunk below to look at the data we just imported using one
of your favorite methods for inspecting data. In the space that follows,
identify the columns you think could be used to construct an edge list.

```{r, eval=FALSE}
view(ccss_tweets)
```

-   YOUR RESPONSE HERE

#### Extract Edges and Nodes

If one of the columns you indicated in your response above included
`screen_name`, nice work! You may also have noticed that the
`mentions_screen_name` column includes the names of those in the reply
column, as well as those "mentioned" in each tweet, i.e. whose username
was included in a post.

As noted above, an edgelist includes the nodes that make up a tie or
dyad. Since the only two columns we need to construct our edgelist are
the `screen_name` of the tweet author and the screen names of those
included in the mentions, let's:

1.  `relocate()` and rename those columns to `sender` and `target` to
    indicate the "direction" of the tweet;

2.  `select()` those columns along with any attributes that we think
    might be useful for analysis later on, like the timestamp or content
    of the tweet, i.e. `text`;

3.  assign our new data frame to `ties_1`.

```{r}
ties_1 <- ccss_tweets %>%
  relocate(sender = screen_name, # rename screen_name to sender
           target = mentions_screen_name) %>% # rename mentions_screen_name to target
  select(sender,
         target,
         created_at,
         text)

ties_1
```

Our edgelist with attributes is almost ready, but we have a couple of
issues we still need to deal with.

As you may have noticed, our `target` column contains the names of
multiple Twitter users, but a dyad or tie can only be between two actors
or nodes.

#### "Unnest" User Names

In order to place each target user in a separate row that corresponds
with each sender of the tweet, we will need to "unnest" these names
using the {tidytext} package from Unit 3.

In Unit 3 we used the `unnest_tokens()` function to extract individual
words from tweets. This time we'll use it to place the target of each
tweet into separate rows corresponding to the sender of each tweet.
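
Here is a small sketch of that idea on invented data, assuming the mentioned user names in a row are separated by spaces. The `toy_ties` data frame and its user names are hypothetical:

```{r}
library(dplyr)
library(tidytext)

# Hypothetical edgelist where one sender mentions two users at once
toy_ties <- tibble(
  sender = c("ana", "ben"),
  target = c("ben cy", "ana")
)

# Split each target string into one row per mentioned user;
# to_lower = FALSE keeps user names in their original case
toy_unnested <- toy_ties %>%
  unnest_tokens(input = target,
                output = receiver,
                to_lower = FALSE)

toy_unnested
```

The tweet from `ana` that mentioned two users becomes two rows, one for each dyad.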

Run the following code and take a look at the new data frame to make
sure it looks as expected and to spot any potential issues:

```{r}
ties_2 <- ties_1 %>%
  unnest_tokens(input = target,
                output = receiver,
                to_lower = FALSE) %>%
  relocate(sender, receiver)

ties_2
```

You probably noticed one issue that we could deal with in a couple of
different ways. Specifically, many tweets are not directed at other
users and hence the value for receiver is *`NA`*. We could keep these
users as "isolates" in our network, but let's simply remove them since
our primary goal is to identify transmitters, transceivers and
transcenders.

### **👉 Your Turn** **⤵**

In the code chunk below, use the `drop_na()` function to remove the rows
with missing values from our `receiver` column in our `ties_2` data
frame since they are incomplete dyads. Save your final edgelist as
`ties` using the `<-` assignment operator.

```{r}
ties <- ties_2 %>%
  drop_na(receiver)
```

Now use the code chunk below to take a quick look at our final edgelist
and answer the questions that follow:

```{r, eval=FALSE}
ties
```

1.  How many edges are in our CCSS network?

    -   YOUR RESPONSE HERE

2.  What do these ties/edges/connections represent?

    -   YOUR RESPONSE HERE

3.  When do they start and end?

    -   YOUR RESPONSE HERE

### Create Nodelist

The second file we need to create is a data frame that contains all the
nodes, often referred to as actors, in our network. This list sometimes
includes attributes that we might want to examine as part of our
analysis as well, like the number of Twitter followers, demographic
group, country, gender, etc. This file or data frame is sometimes
referred to as a **nodelist** or node attribute file.

To construct our basic nodelist, we will use the `pivot_longer()`
function from the {tidyr} package, an updated version of the `gather()`
function introduced in the [RStudio Reshape Data
Primer](https://rstudio.cloud/learn/primers/4.1).

Run the following code to select the usernames from our `ties` edgelist,
merge our sender and receiver columns into a single column, and take a
quick look at our new data frame.

```{r}
actors_1 <- ties %>%
  select(sender, receiver) %>%
  pivot_longer(cols = c(sender,receiver))

actors_1
```

As you can see, our sender and receiver usernames have been combined
into a single column called `value` with a new column indicating whether
they were a sender or receiver called `name` by default.

Since we're only interested in usernames for our nodelist, and since we
have a number of duplicate names, let's select just the `value` column,
rename it to actors, and use the `distinct()` function from {dplyr} to
keep only unique names and remove duplicates:

```{r}
actors <- actors_1 %>%
  select(value) %>%
  rename(actors = value) %>% 
  distinct()
```
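
As an aside, the same result can be reached in fewer steps. The sketch below, run on a small invented edgelist (`toy_dyads` is hypothetical), uses the `names_to =` and `values_to =` arguments of `pivot_longer()` to name the new columns directly, and passes the column name to `distinct()` so only that column is kept:

```{r}
library(dplyr)
library(tidyr)

# Hypothetical edgelist, invented for illustration only
toy_dyads <- tibble(
  sender   = c("ana", "ana", "ben"),
  receiver = c("ben", "cy", "cy")
)

toy_actors <- toy_dyads %>%
  pivot_longer(cols = c(sender, receiver),
               names_to = "role",       # sender or receiver
               values_to = "actors") %>% # the user name itself
  distinct(actors)                       # keep one row per unique name

toy_actors
```

Six sender and receiver entries reduce to three unique actors.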

### **👉 Your Turn** **⤵**

Use the code chunk below to take a quick look at our final `actors` data
frame and answer the question that follows:

```{r, eval=FALSE}
actors
```

How many unique nodes/actors are in our CCSS network?

-   YOUR RESPONSE HERE

## 2c. Create Network Object

Before we can begin using many of the functions from the {tidygraph} and
{ggraph} packages for summarizing and visualizing our Common Core
Twitter network, we first need to convert our node and edge lists into a
network object.

### Combine Edges & Nodes

The {tidygraph} package contains a `tbl_graph()` function that includes
the following arguments:

-   `edges =` expects a data frame, in our case `ties`, containing
    information about the edges in the graph. The nodes of each edge
    must either be in a `to` and `from` column, or in the two first
    columns like the data frame we provided.

-   `nodes =` expects a data frame, in our case `actors`, containing
    information about the nodes in the graph. If `to` and/or `from` are
    characters or names, like in our data frames, then they will be
    matched to the column named according to `node_key` in nodes, if it
    exists, or matched to the first column in the node list.

-   `directed =` specifies whether the constructed graph should be
    directed, i.e. include information about whether each node is the
    sender or target of a connection. By default this is set to `TRUE`.
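
To see how these arguments fit together, here is a minimal sketch with invented data. The `toy_actor_list` and `toy_tie_list` data frames are hypothetical, and neither uses the column names `name`, `from`, or `to`, so the matching falls back on column position as described above:

```{r}
library(dplyr)
library(tidygraph)

# Hypothetical nodelist: user names sit in the first column
toy_actor_list <- tibble(actors = c("ana", "ben", "cy"))

# Hypothetical edgelist: the two nodes of each tie sit in the first two
# columns, followed by an edge attribute
toy_tie_list <- tibble(
  sender   = c("ana", "ana", "ben"),
  receiver = c("ben", "cy", "cy"),
  text     = c("tweet 1", "tweet 2", "tweet 3")
)

toy_network <- tbl_graph(edges = toy_tie_list,
                         nodes = toy_actor_list,
                         directed = TRUE)

toy_network
```

In the printed edge data, the user names have been replaced by integers in the `from` and `to` columns. Each integer is the row number of that actor in the node data.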

Let's go ahead and create our network graph, name it `ccss_network_1`,
and print the output:

```{r}
ccss_network_1 <- tbl_graph(edges = ties, 
                            nodes = actors)

ccss_network_1
```

### **👉 Your Turn** **⤵**

Take a look at the output for our simple graph above and answer the
following questions:

1.  Are the numbers and names of nodes and edges consistent with our
    `actors` and `ties` data frames? What about the integers included in
    the `from` and `to` columns of the Edge Data?

    -   YOUR RESPONSE HERE

2.  What do you think "components" refers to? **Hint:** see Chapter 6 of
    [@carolan2014].

    -   YOUR RESPONSE HERE

3.  Is our network directed or undirected?

    -   YOUR RESPONSE HERE

A quick note about "directed" networks. A **directed network** indicates
that a connection or tie is not necessarily reciprocated or mutual. For
example, our network is directed because even though a user may follow,
mention or reply to someone else, they may not necessarily receive a
follow, mention or reply back. A Facebook friendship network is
undirected, however, since someone cannot "friend" someone on Facebook
without them also indicating a "friend" relationship, so the friendship
is mutual or reciprocated.
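
The sketch below illustrates the difference on an invented network in which `ana` and `ben` mention each other but `cy` never mentions `ana` back. The data are hypothetical, and the `is_directed()` and `reciprocity()` functions come from the {igraph} package rather than {tidygraph}:

```{r}
library(dplyr)
library(tidygraph)

# Hypothetical mentions: ana <-> ben is mutual, ana -> cy is one-way
toy_people <- tibble(actors = c("ana", "ben", "cy"))
toy_mention_ties <- tibble(
  sender   = c("ana", "ben", "ana"),
  receiver = c("ben", "ana", "cy")
)

toy_directed <- tbl_graph(edges = toy_mention_ties,
                          nodes = toy_people,
                          directed = TRUE)

# Confirm the graph keeps track of who sent and who received each tie
igraph::is_directed(toy_directed)

# Share of directed ties that are returned by the other actor
toy_recip <- igraph::reciprocity(toy_directed)
toy_recip
```

In an undirected network such as a Facebook friendship network, every tie is mutual by definition, so a measure like reciprocity is only informative for directed networks like ours.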

------------------------------------------------------------------------

# 3. EXPLORE

As noted in our course readings, exploratory data analysis involves
the processes of describing your data (such as by calculating the means
and standard deviations of numeric variables, or counting the frequency
of categorical variables) and, often, visualizing your data prior to
modeling.

In Section 3, we use the {tidygraph} package for retrieving network
descriptives and introduce the {ggraph} package to create a network
visualization to help illustrate these metrics. Specifically, in this
section we'll learn to:

a.  **Examine Basic Descriptives**. We focus primarily on actors and
    edges in this walkthrough, including the edge attributes we added in
    the previous section as well as node degree, an important and fairly
    intuitive measure of centrality.

b.  **Make a Sociogram**. Finally, we wrap up the Explore phase by
    learning to plot a network and tweak key elements like the size,
    shape, and position of nodes and edges to better communicate key
    findings.

## 3a. Examine Basic Descriptives

Many analyses of social networks are primarily descriptive. As
@carolan2014 notes, these descriptive studies aim either to:

1.  represent the network's underlying social structure through
    data-reduction techniques; or,

2.  characterize network properties through network measures.

### Centrality

A key structural property of networks is the concept of centralization.
A network that is highly centralized is one in which relations are
focused on a small number of actors or even a single actor in a network,
whereas ties in a decentralized network are diffuse and spread over a
number of actors. One of the most common descriptives reported in
network studies and a primary measure of centralization is **degree**.

> **Degree** is the number of ties to and from an ego. In a directed
> network, in-degree is the number of ties received, whereas out-degree
> is the number of ties sent.
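
Before handing this calculation over to a network package, it may help to see that degree is just counting. The sketch below works from an invented edgelist (`toy_sent` and its user names are hypothetical) and counts how often each actor appears as a sender and as a receiver:

```{r}
library(dplyr)

# Hypothetical edgelist: ana -> ben, ana -> cy, ben -> cy
toy_sent <- tibble(
  sender   = c("ana", "ana", "ben"),
  receiver = c("ben", "cy", "cy")
)

# Out-degree: number of ties each actor sent
toy_out <- count(toy_sent, actor = sender, name = "out_degree")

# In-degree: number of ties each actor received
toy_in <- count(toy_sent, actor = receiver, name = "in_degree")

# Combine, treating actors missing from one table as having zero ties
toy_degree <- full_join(toy_out, toy_in, by = "actor") %>%
  mutate(across(c(out_degree, in_degree), ~ coalesce(.x, 0L)),
         degree = out_degree + in_degree)

toy_degree
```

Here `ana` only sends ties, `cy` only receives them, and `ben` does a bit of both, yet all three end up with the same total degree.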

#### Node Degree

The {tidygraph} package has a unique function called `activate()` that
allows us to treat the nodes in our network object as if they were a
typical data frame to which we can then apply standard tidyverse
functions like `select()`, `filter()`, and `mutate()`.

The latter function, `mutate()`, we can use to create new variables for
nodes such as measures of degree, in-degree, and out-degree used by
@supovitz2017commoncore to identify transcenders, transceivers, and
transmitters respectively. These measures can be added by using the
`centrality_degree()` function in the {tidygraph} package.

Run the following code to add degree measures to each of our nodes and
print the output for inspection:

```{r}
ccss_network <- ccss_network_1 %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(mode = "all")) %>%
  mutate(in_degree = centrality_degree(mode = "in"))

ccss_network
```

We now see that these simple measures of centrality have been added to
the nodes in our network.

### **👉 Your Turn** **⤵**

As you may have noted above, we forgot to include out-degree in our
measures of centrality. Copy the code from above and add an out-degree
measure to each node and print the output:

```{r}
ccss_network <- ccss_network_1 %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(mode = "all")) %>%
  mutate(in_degree = centrality_degree(mode = "in")) %>%
  mutate(out_degree = centrality_degree(mode = "out"))

ccss_network
```

#### Summarize Centrality Measures

We can also use the `activate()` function combined with the
`data.frame()` function to extract our new measures to a separate data
frame so we can inspect our nodes individually and create some summary
statistics using the handy `summary()` function.

```{r}
node_measures <- ccss_network %>% 
  activate(nodes) %>%
  data.frame()

summary(node_measures)
```

We see that typical nodes in this network are connected on average with
3 other Twitter users and have received on average mentions or replies
from 1.5 users.
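
Averages describe the typical node, but central actors are the ones at the top of each measure. As a sketch of how those might be pulled out with code rather than by eye, the example below builds an invented three-person network (all names are hypothetical) and uses `slice_max()` from {dplyr} to keep the node with the highest value on a measure:

```{r}
library(dplyr)
library(tidygraph)

# Hypothetical network: ana -> ben, ana -> cy, ben -> cy
toy_measures <- tbl_graph(
  nodes = tibble(actors = c("ana", "ben", "cy")),
  edges = tibble(sender   = c("ana", "ana", "ben"),
                 receiver = c("ben", "cy", "cy"))
) %>%
  activate(nodes) %>%
  mutate(in_degree  = centrality_degree(mode = "in"),
         out_degree = centrality_degree(mode = "out")) %>%
  as_tibble()

# Actor who sent the most ties
top_sender <- slice_max(toy_measures, out_degree, n = 1)

# Actor who received the most ties
top_receiver <- slice_max(toy_measures, in_degree, n = 1)

top_sender
top_receiver
```

Increasing `n` returns the top few actors instead of only the first, and ties for the top spot are all returned by default.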

### **👉 Your Turn** **⤵**

Recall from the Prepare section that one of the questions guiding this
analysis was:

> Who are the transmitters, transceivers, and transcenders in our Common
> Core Twitter network?

Use the code chunk below to `view()` your `node_measures` data frame and
answer the questions that follow:

```{r, eval=FALSE}
view(node_measures)
```

Identify the Twitter user with the highest value for each of the
following types of central actors:

-   **Transmitters:**

-   **Transceivers:**

-   **Transcenders:**

## 3b. Make a Sociogram

If you recall from the [1a. Review the Research] section, one of the
defining characteristics of the social network perspective is its use of
graphic imagery to represent actors and their relations with one
another. To emphasize this point, @carolan2014 reported that:

> The visualization of social networks has been a core practice since
> its foundation more than 90 years ago and remains a hallmark of
> contemporary social network analysis. 

Network visualization can be used for a variety of purposes, ranging
from highlighting key actors to even serving as works of art.

This excellent figure from Katya Ognyanova's also excellent tutorial on
[Static and Dynamic Network Visualization with
R](https://kateto.net/network-visualization/) helps illustrate the
variety of goals a good network visualization can accomplish:

![](img/viz-goals.jpeg){width="80%"}

These visual representations of the actors and their relations, i.e. the
network, are called sociograms. Actors who are most central to the
network, such as those with higher node degrees, are usually placed in
the center of the sociogram and their ties are placed near them. As
we'll see in just a bit, those two actors with hundreds of ties will be
placed by most graph layout algorithms in the center of the graph.

### **👉 Your Turn** **⤵**

The `plot()` function from R's built-in {graphics} package can be used
to make a simple sociogram, but as you'll see, it's a bit lacking.

In the code chunk below, use the `plot()` function with your
`ccss_network` object to see what the basic plot function produces:

```{r}
plot(ccss_network)
```

If this had been a smaller network like one generated from a single
classroom, this might have been a little more useful, but for larger
networks like our Twitter data, this doesn't communicate much. In fact,
it's visualizations like these that give sociograms the unflattering
nickname of "hair ball" plots!

Fortunately, the {ggraph} package includes a plethora of plotting
parameters for graph
[layouts](https://ggraph.data-imaginist.com/articles/Layouts.html),
[edges](https://ggraph.data-imaginist.com/articles/Edges.html) and
[nodes](https://ggraph.data-imaginist.com/articles/Nodes.html) to
improve the visual design of network graphs.

For the remainder of this section, we're going to focus on creating a
sociogram that highlights the main "transmitters" in our network.

### Build a network

One thing to keep in mind when building a network viz with {ggraph} is
that, just like its `ggplot()` counterpart, the `ggraph()` function
takes care of setting up the plot object along with creating the layout
for the plot based on the network object and the layout specification
provided.

Let's first pass our `ccss_network` object to ggraph and see what
happens.

```{r}
ggraph(ccss_network)
```

As you can see, just like the `ggplot()` function, this didn't produce
much on its own. All that the `ggraph()` function does is set up the
network object to make a sociogram, and create a layout for our network,
in this case the default "stress" layout.

#### Add Nodes

Very similar to how `ggplot()` uses the `+` operator to "layer"
functions together to progressively build graphs, `ggraph()` uses the
`+` operator to progressively build a sociogram.

To add our nodes, we'll add the `geom_node_point()` function. Again,
just like with {ggplot2}, the "geom" in the `geom_node_point()` function
stands for "geometric elements", or geoms for short, and represents what
you actually see in the plot.

### **👉 Your Turn** **⤵**

Now "add" the `geom_node_point()` function to our code using the `+`
operator:

```{r}
ggraph(ccss_network) + 
  geom_node_point() 
```

Well, at least we have our nodes now! But the default "stress" layout
for our sociogram is not so great. Let's fix that.

#### Add Layout

One of the major advances in visualization since the first hand-drawn
sociograms developed by Jacob Moreno (1934) to represent relations among
children in school is the use of software and algorithms to
automatically lay out networks on a grid.

There are many different [layout
methods](https://en.wikipedia.org/wiki/Graph_drawing#Layout_methods),
but we'll start with the Fruchterman-Reingold (FR) layout, which is one
of the most used layout algorithms for network visualization. These
types of force-directed algorithms generally work well with large
networks and try to lay out graphs in "an aesthetically-pleasing way" by
making edges roughly equal in length and minimizing overlap.

Let's go ahead and include the layout argument, which in addition to
including its own unique layouts, can incorporate layouts from the
{igraph} package like `fr` for the Fruchterman-Reingold (FR) layout:

```{r}
ggraph(ccss_network, layout = "fr") +
  geom_node_point()
```

That's a little better. We can also start to make out our separate
network "components" as distinct groups, with one larger connected group
in the center. In our Model section, we'll introduce community detection
algorithms to automatically identify groups and use color to show group
membership.
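
One practical note: force-directed layouts like FR start from random node positions, so your plot may look a little different each time you run it. A common workaround, sketched below on an invented ring network rather than our Twitter data, is to call `set.seed()` right before computing the layout. The seed value and the `toy_ring` graph are arbitrary choices for illustration, and `create_layout()` from {ggraph} is used here only to expose the node coordinates:

```{r}
library(tidygraph)
library(ggraph)

# Hypothetical network: eight nodes connected in a ring
toy_ring <- create_ring(8)

# Fix the random starting positions, then compute the FR layout
set.seed(2021)
layout_a <- create_layout(toy_ring, layout = "fr")

# Same seed again, so the coordinates should be reproduced
set.seed(2021)
layout_b <- create_layout(toy_ring, layout = "fr")

# Compare the x and y coordinates from the two runs
all.equal(layout_a$x, layout_b$x)
all.equal(layout_a$y, layout_b$y)
```

The same trick works when calling `ggraph()` directly: place `set.seed()` on the line just before the plot code.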

#### Tweak Nodes

Also like {ggplot2}, geoms can include aesthetics, or aes for short,
such as `alpha` for transparency, as well as `color`, `shape` and
`size`.

Let's now add some "aesthetics" to our points by including the `aes()`
function and arguments such as `size =` and `color =`, which we can set
to our `out_degree` measures to help highlight our primary transmitters:

```{r}
ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree))
```

We can clearly see we have one main transmitter in this network!
Unfortunately, without labels we don't know who this Twitter user is.

Let's fix that by adding another layer with some node text and labels.
Since node labels are a geometric element, we can apply aesthetics to
them as well, like color and size. Let's also include the `repel =`
argument that when set to `TRUE` will avoid overlapping text.

```{r}
ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree)) +
  geom_node_text(aes(label = actors,
                     size = out_degree/2,
                     color = out_degree),
                 repel=TRUE)
```

Much better! Even without the edges, using size and color has helped to
highlight the main "transmitter" in our network.

#### Add Edges

Now, let's connect the dots and add some
[edges](https://ggraph.data-imaginist.com/articles/Edges.html) using the
`geom_edge_link()` function. We'll also include some arguments like
`arrow =` to include some arrows 1mm in length, an `end_cap =` around
each node to keep arrows from overlapping them, and set the
transparency of our edges to `alpha = .2` so our edges fade more into
the background and help keep the focus on our nodes:

```{r}
ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree)) +
  geom_node_text(aes(label = actors,
                     size = out_degree/2,
                     color = out_degree),
                 repel=TRUE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2)
```

#### Add a Theme

Finally, let's add a **theme**, which controls the finer points of
display, like the font size and background color. The `theme_graph()`
function adds a theme specially tuned for graph visualizations. This
function removes redundant elements in order to put focus on the data,
and if you type `?theme_graph` in the console you will get a sense of
the level of fine tuning you can do if desired.

Let's add `theme_graph()` to our sociogram, remove the legends since
they are not especially useful, and call it good for now:

```{r}
ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree),
                  show.legend = FALSE) +
  geom_node_text(aes(label = actors,
                     size = out_degree/2,
                     color = out_degree),
                 repel=TRUE,
                 show.legend = FALSE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) + 
  theme_graph()
```

**Note:** If you're having difficulty seeing the sociogram in the small
R Markdown code chunk, you can copy and paste the code into the console.
The plot will then show in the Plots pane, where you can enlarge it and
even save it as an image file.

### **👉 Your Turn** **⤵**

Try modifying the code below by tweaking the included functions and
arguments or adding new ones for
[layouts](https://ggraph.data-imaginist.com/articles/Layouts.html),
[nodes](https://ggraph.data-imaginist.com/articles/Nodes.html), and
[edges](https://ggraph.data-imaginist.com/articles/Edges.html) to
highlight the main "transcenders" or "transceivers" in our network. See
if you can also make your plot more "aesthetically pleasing" and/or
more purposeful in what it's trying to communicate.

There are no right or wrong answers, just have some fun trying out
different approaches!

```{r}
ggraph(ccss_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree),
                  show.legend = FALSE) +
  geom_node_text(aes(label = actors,
                     size = out_degree/2,
                     color = out_degree),
                 repel=TRUE,
                 show.legend = FALSE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) + 
  theme_graph()
```

Congrats! You made it to the end of the Explore section and are ready to
learn a little about network modeling! Before proceeding further, knit
your document and check to see if you encounter any errors.

------------------------------------------------------------------------

# 4. MODEL

As highlighted in [Chapter 3 of Data Science in Education Using
R](https://datascienceineducation.com/c03.html), the **Model** step of
the data science process entails "using statistical models, from simple
to complex, to understand trends and patterns in the data."

## 4a. Identify Groups

Chapter 6: Groups and Positions in Complete Networks of SNA and
Education [@carolan2014] introduces both "bottom up" and "top down"
approaches for identifying groups in a network, as well as why
researchers may be interested in exploring these groups. He also notes
that:

> Unlike most social science, the idea is to identify these groups
> through their relational data, not an exogenous attribute such as
> grade level, departmental affiliation, or years of experience. 

In this section, we'll briefly explore a "top down" approach to
identifying these groups through the use of community detection
algorithms.

### Community Detection

Similar to the range of functions included for calculating node and edge
centrality, the {tidygraph} package includes various clustering
functions provided by the {igraph} package. Some of these algorithms are
designed for directed graphs, while others are for undirected graphs.

Also similar to calculating centrality measures, we need to `activate()`
our nodes first before applying these community detection algorithms to
assign our nodes to groups.

For the sake of simplicity, and because there seem to be some obvious
groups based on network components, let's group our nodes using the
`group_components()` function along with the `mutate()` function from
{dplyr} to add a new `group` variable to our nodelist that provides a
value based on the component to which each node belongs.

Run the following code to group our nodes, print our new
`ccss_network_groups` object, and take a quick look:

```{r}
ccss_network_groups <- ccss_network %>%
  activate(nodes) %>%
  mutate(group = group_components())
  
ccss_network_groups
```
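
To get a feel for what `group_components()` is doing, here is a small sketch on an invented network with two disconnected clusters (all names are hypothetical). It also shows the `type =` argument. The default, `"weak"`, ignores the direction of ties when deciding who is connected, while `"strong"` requires that actors can reach each other in both directions:

```{r}
library(dplyr)
library(tidygraph)

# Hypothetical network: ana -> ben -> cy, and separately dee -> eli
toy_grouped <- tbl_graph(
  nodes = tibble(actors = c("ana", "ben", "cy", "dee", "eli")),
  edges = tibble(sender   = c("ana", "ben", "dee"),
                 receiver = c("ben", "cy", "eli"))
) %>%
  activate(nodes) %>%
  mutate(weak_group   = group_components(type = "weak"),
         strong_group = group_components(type = "strong")) %>%
  as_tibble()

toy_grouped
```

Ignoring direction, the five actors fall into two groups. Because none of these invented ties are returned, no two actors can reach each other in both directions, so the strong version places every actor in a group of their own.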

Now that we've assigned our nodes to a group, let's add one final layer
to our sociogram from above by using the `geom_node_voronoi()` function
to color our nodes by group assignment, and change our group numbers to
factors so each group is a distinct color:

```{r}
ccss_network_groups %>%
  ggraph(layout = "fr") + 
  geom_node_point(aes(size = out_degree, 
                      color = out_degree),
                  show.legend = FALSE) +
  geom_node_text(aes(label = actors, 
                     color = out_degree,
                     size = out_degree), 
                 repel = TRUE,
                 show.legend = FALSE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) + 
  theme_graph() + 
  # alpha is a fixed setting, not a mapped variable, so it sits outside aes()
  geom_node_voronoi(aes(fill = factor(group)),
                    alpha = .05, 
                    max.radius = .5,
                    show.legend = FALSE) 
```

I'm not entirely sure this was an improvement to our sociogram but it
definitely is colorful!!

## 4b. Identify Sentiment

Recall from the Prepare section that the final question guiding this
analysis was:

> Which actors in our network tend to be more opposed to the Common
> Core?

In this section, we'll use the {vader} package for sentiment analysis
introduced in Unit 3 to help answer this question.

### **👉 Your Turn** **⤵**

In the code chunk below, use the `vader_df()` function to perform
sentiment analysis on the `text` in our `ccss_tweets` data frame created
above, save as `summary_vader` and inspect the results. **Hint:** You
will need to use the `$` operator to select the `text` column containing
our tweets.

```{r}
summary_vader <- vader_df(ccss_tweets$text)

summary_vader
```

Recall that the `compound` score is the measure most commonly used in
sentiment analysis. It is computed by summing the valence scores of each
word in the text that appears in the lexicon, adjusting that sum
according to VADER's rules, and then normalizing it to fall between -1
(most extreme negative) and +1 (most extreme positive).
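
To make the normalization step concrete, here is a minimal sketch of the formula used in the reference Python implementation of VADER, where the adjusted sum `x` is rescaled as `x / sqrt(x^2 + alpha)` with `alpha = 15`. The helper name `normalize_compound()` is hypothetical, and we are assuming the {vader} R package follows the same formula:

```r
# Hypothetical helper illustrating VADER-style normalization.
# raw_sum: the rule-adjusted sum of word valence scores for one text
# alpha:   smoothing constant (15 in the reference implementation)
normalize_compound <- function(raw_sum, alpha = 15) {
  raw_sum / sqrt(raw_sum^2 + alpha)
}

# Larger sums move toward -1 or +1 but never reach them
normalize_compound(c(-10, -2, 0, 2, 10))
```

Because of this rescaling, a very long, strongly worded tweet and a short one can both land near the extremes without exceeding them.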

Now use the `inner_join()` function introduced in the [Join Data Sets
Primer](https://rstudio.cloud/learn/primers/4.3) to add these sentiment
scores back to our tweets by joining `summary_vader` and `ccss_tweets`,
using `by = "text"` as the key for matching both data frames:

```{r}
tweet_sentiment <- inner_join(summary_vader, 
                              ccss_tweets,
                              by = "text") %>%
  # keep only the user and their tweet-level compound score
  select(screen_name, compound)
```

#### Summarize Sentiment

Now that you have these joined, let's group our data by each user's
`screen_name` and summarize their mean `compound` sentiment score:

```{r}
user_sentiment <- tweet_sentiment %>%
  group_by(screen_name) %>%
  summarise(sentiment = mean(compound))
```
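
A mean based on a single tweet is less informative than one based on many, so it can be useful to keep a tweet count next to each average. The following is a minimal sketch on hypothetical data (the `toy_tweet_sentiment` values and the `n_tweets` column are illustrative assumptions) that also sorts users from most negative to most positive:

```r
library(dplyr)

# Hypothetical tweet-level scores standing in for tweet_sentiment
toy_tweet_sentiment <- tibble(
  screen_name = c("user_a", "user_a", "user_a", "user_b"),
  compound    = c(-0.5, -0.3, -0.1, 0.8)
)

toy_user_sentiment <- toy_tweet_sentiment %>%
  group_by(screen_name) %>%
  summarise(sentiment = mean(compound),  # average compound score per user
            n_tweets  = n()) %>%         # number of tweets behind that average
  arrange(sentiment)                     # most negative users first

toy_user_sentiment
```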

Note that we have effectively just created some new node and edge
"attributes" that could be incorporated into our network visualization
to potentially help understand why groups may have formed as they did.
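
One way this could be done is sketched below on a hypothetical toy network. The `toy_edges` and `toy_sentiment` objects are illustrative assumptions. In our own network, the node identifier column is `actors` rather than `name`, so the `by` argument would need to be adjusted accordingly:

```r
library(tidygraph)
library(dplyr)

# Hypothetical edge list and user-level sentiment scores
toy_edges <- tibble(
  from = c("user_a", "user_b", "user_c"),
  to   = c("user_b", "user_c", "user_a")
)

toy_sentiment <- tibble(
  screen_name = c("user_a", "user_b"),
  sentiment   = c(-0.6, 0.3)
)

# Attach sentiment to the node list as a new node attribute.
# Nodes without a score (here user_c) are kept and receive NA.
toy_network <- as_tbl_graph(toy_edges) %>%
  activate(nodes) %>%
  left_join(toy_sentiment, by = c("name" = "screen_name"))

toy_network
```

A `left_join()` is used here instead of an `inner_join()` so that actors with no scored tweets stay in the network.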

### **👉 Your Turn** **⤵**

Recall from the Prepare section that the final question guiding this
analysis was:

> Which actors in our network tend to be more opposed to the Common
> Core?

Use the code chunk below to inspect our data frame and answer the
question above in the space below:

```{r, eval=FALSE}
user_sentiment
```

-   YOUR RESPONSE HERE

------------------------------------------------------------------------

# 5. COMMUNICATE

In this case study, we focused on the literature guiding our analysis;
wrangling our data into a network object; examining basic network
centrality measures; and identifying groups and sentiment in tweets
about the CCSS curriculum standards.

In response to the following research questions driving this analysis,
write 2-3 sentences summarizing our findings:

1.  Who are the main transmitters, transceivers, and transcenders in our
    Common Core Twitter network?

    -   

2.  What subgroups, or factions, existed in our network?

    -   

3.  Which actors in our network tended to be more opposed to the Common
    Core?

    -   

Finally, add 1-2 sentences in response to the following prompts:

1.  One important thing I took away from this unit about network
    analysis:

    -   

2.  One thing about network analysis I want to learn more about:

    -   

### 🧶 Knit & Check ✅

Congratulations - you've completed this social network analysis case
study! To complete your work, click the drop-down arrow at the top of
the file, then select "Knit to HTML". This will create a report in your
Files pane that serves as a record of your code and its output, which
you can open or share.

### References
