This project is inspired by our work in Units 3 and 4 with text mining and social network analysis (SNA). Specifically, this project borrows from the Week 10 Case Study, which references Dr. Rosenberg’s research using publicly accessible data from Twitter, as well as the Week 12 Case Study, which references Jonathan Supovitz, Alan Daly, Miguel del Fresno, and Christian Kolouch’s research on the Common Core Project.
For the #ThinkandDo Network project, we will continue to investigate how text mining and SNA can be used to track engagement and sentiment of the NC State brand theme. Specifically, this case study will cover the following topics pertaining to each data-intensive workflow process:
Prepare: Prior to analysis, we’ll take a look at the context from which our data came, formulate some research questions, and describe the target audience of this analysis.
Wrangle: In the wrangling section of our case study, we will walk through the data pre-processing steps, the variables or features created to address our research questions, and data transformations.
Explore: With our network data tidied, we calculate some key network measures and illustrate some of these statistics through network visualization.
Model: We conclude our analysis by identifying the key actors in our network, the roles they occupy, and the sentiment of their tweets.
Communicate: We highlight key findings and insights, suggest potential actions, and discuss limitations and any ethical/legal issues.
As a current NC State grad student (and a previous undergrad student), I have a general curiosity about the #ThinkandDo brand theme used by the university. In the wake of the NC State-UNC game, I was reminded of how social media, and hashtags in particular, are used to share ideas and information. My sister attended NC State prior to “Think and Do” being adopted by the university. This concept extends to so many areas of NC State’s culture: from scholarships to annual giving drives to athletics, #ThinkandDo is an integral part of the university’s mission. When I was exploring project ideas, I became interested in how this motto is used by actors and the links between them.
For the #ThinkandDo Network project, we are going to focus our questions on the following areas:
What roles do the actors play in this network (transmitters, transceivers, transcenders)?
How centralized is the network?
Do actors in this network tend to be positive or negative in their sentiment?
This data was collected using the rtweet library and Twitter API guided by Rosenberg et al. (2020).
It’s my assumption that the university would benefit the most from this analysis. Based on my research questions, this would reveal actors in their network, show how focused the relations are in the network (focused on a small number of actors, a large number of actors, a single actor), and reveal if the sentiment of these tweets is positive or negative. This could be beneficial information from a marketing standpoint as well as uncovering new information about how users connect with NC State through Twitter.
We’ll start by installing the packages needed for the following steps.
install.packages("twitteR")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("tidytext")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("vader")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("rtweet")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("rtweet")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("igraph")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("ggraph")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("wordcloud2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
Now that our packages are installed, we’ll load our libraries.
library("twitteR")
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.1.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::id() masks twitteR::id()
## x dplyr::lag() masks stats::lag()
## x dplyr::location() masks twitteR::location()
library("tidytext")
library("vader")
library("rtweet")
##
## Attaching package: 'rtweet'
## The following object is masked from 'package:purrr':
##
## flatten
## The following object is masked from 'package:twitteR':
##
## lookup_statuses
library("ggplot2")
library("dplyr")
library("rtweet")
library("igraph")
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
##
## compose, simplify
## The following object is masked from 'package:tidyr':
##
## crossing
## The following object is masked from 'package:tibble':
##
## as_data_frame
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
library("ggraph")
library("wordcloud2")
Next, we’ll enter information about our app including our API and access keys.
think_token <- create_token(
  app = "thinkanddo",
  consumer_key = consumer_key,
  consumer_secret = consumer_secret,
  access_token = access_token,
  access_secret = access_secret)
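Note that this chunk assumes consumer_key, consumer_secret, access_token, and access_secret already exist as objects. One way to define them without hard-coding secrets in the script (a sketch; the TWITTER_* variable names are placeholders of my choosing) is to store them in your .Renviron file and read them with Sys.getenv().
# Sketch: pull credentials from environment variables set in .Renviron.
consumer_key    <- Sys.getenv("TWITTER_CONSUMER_KEY")
consumer_secret <- Sys.getenv("TWITTER_CONSUMER_SECRET")
access_token    <- Sys.getenv("TWITTER_ACCESS_TOKEN")
access_secret   <- Sys.getenv("TWITTER_ACCESS_SECRET")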
Let’s generate a search using our hashtag. I’m also using the head() function to get a quick look at the results being generated.
think_tweets <- search_tweets(q = "#ThinkAndDo",
                              n = 500)
head(think_tweets)
## # A tibble: 6 × 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 316530285 1468834065465942016 2021-12-09 06:45:05 NikGalanopoulos "#Ne… Twitt…
## 2 316530285 1467745860146638848 2021-12-06 06:40:57 NikGalanopoulos "#Ne… Twitt…
## 3 316530285 1466675130420113415 2021-12-03 07:46:15 NikGalanopoulos "#Ne… Twitt…
## 4 316530285 1468487804732723201 2021-12-08 07:49:10 NikGalanopoulos "#Ne… Twitt…
## 5 316530285 1467074173826736131 2021-12-04 10:11:54 NikGalanopoulos "#Ne… Twitt…
## 6 316530285 1468123734879916035 2021-12-07 07:42:29 NikGalanopoulos "#Ne… Twitt…
## # … with 84 more variables: display_text_width <dbl>, reply_to_status_id <chr>,
## # reply_to_user_id <chr>, reply_to_screen_name <chr>, is_quote <lgl>,
## # is_retweet <lgl>, favorite_count <int>, retweet_count <int>,
## # quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>,
## # urls_url <list>, urls_t.co <list>, urls_expanded_url <list>,
## # media_url <list>, media_t.co <list>, media_expanded_url <list>,
## # media_type <list>, ext_media_url <list>, ext_media_t.co <list>, …
Now, we’ll be viewing the data downloaded from Twitter.
view(think_tweets)
For the wrangle step, our primary tasks are to:
Restructure Data. We focus on removing extraneous data using the select(), filter(), and relocate() functions.
Tidy Text. We then introduce the {tidytext} package to “tidy” and tokenize our tweets in order to create our data frame for analysis. We also introduce a new join function to remove “stop words” that don’t add much value to our analysis.
In the above code chunk, we can identify a few tweets that don’t fit our intended target. Before we start addressing our research questions, we need to make some changes to our search criteria. First, let’s remove any tweets that aren’t in English, and exclude retweets and replies while we’re at it. Almost all (if not all) of NC State’s social media marketing is done in English.
think_tweets <- search_tweets(q = "#ThinkAndDo",
                              n = 18000,
                              include_rts = FALSE,
                              `-filter` = "replies",
                              lang = "en")
head(think_tweets)
## # A tibble: 6 × 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 918113257478590464 146861168… 2021-12-08 16:01:26 SupplyChain… "A rec… Hoots…
## 2 1095716063390695425 146825825… 2021-12-07 16:37:01 NCState_PMB "Our D… Twitt…
## 3 1095716063390695425 146685728… 2021-12-03 19:50:03 NCState_PMB "Pleas… Twitt…
## 4 1095716063390695425 146685746… 2021-12-03 19:50:47 NCState_PMB "Pleas… Twitt…
## 5 2684135433 146805034… 2021-12-07 02:50:51 Anthony1All… "Proud… Twitt…
## 6 907382738 146795103… 2021-12-06 20:16:13 KathieDello "A sho… Twitt…
## # … with 84 more variables: display_text_width <dbl>, reply_to_status_id <chr>,
## # reply_to_user_id <chr>, reply_to_screen_name <chr>, is_quote <lgl>,
## # is_retweet <lgl>, favorite_count <int>, retweet_count <int>,
## # quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>,
## # urls_url <list>, urls_t.co <list>, urls_expanded_url <list>,
## # media_url <list>, media_t.co <list>, media_expanded_url <list>,
## # media_type <list>, ext_media_url <list>, ext_media_t.co <list>, …
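One aside before we look at the results: n = 18000 is roughly the most tweets search_tweets() will return within a single 15-minute rate-limit window. If we ever needed more, the function’s retryonratelimit argument tells {rtweet} to wait and resume (a sketch, not run here; think_tweets_full is a hypothetical name).
# Sketch (not run): wait out rate limits for larger pulls.
# think_tweets_full <- search_tweets(q = "#ThinkAndDo", n = 50000,
#                                    include_rts = FALSE, lang = "en",
#                                    retryonratelimit = TRUE)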
Using the head() function, we can see that our additional filters were helpful. Since we’re not merging data sets, we don’t have to worry about combining data. As an additional step, we’ll export our tweets as a CSV.
write_as_csv(think_tweets, "think-tweets.csv")
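If we need these tweets again in a later session, {rtweet} can read the archive back in with its companion function (a sketch; think_tweets_archived is a hypothetical name).
# Sketch: reload the saved tweets with column types preserved.
think_tweets_archived <- read_twitter_csv("think-tweets.csv")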
We will tidy this text more, but we’ll save that for a future step. For now, let’s focus on formatting network data.
One of the research questions we’re working toward is determining the roles of the actors in the network. We also want to identify how centralized the network is. As a first step, we’ll reduce the number of columns so we can see a bit more about the relationships between our actors.
edge <- think_tweets %>%
  relocate(sender = screen_name, target = mentions_screen_name) %>%
  select(sender, target, created_at, text)
edge
## # A tibble: 15 × 4
## sender target created_at text
## <chr> <list> <dttm> <chr>
## 1 SupplyChainNCSU <chr [3]> 2021-12-08 16:01:26 "A recent study from researche…
## 2 NCState_PMB <chr [1]> 2021-12-07 16:37:01 "Our Department is hiring an a…
## 3 NCState_PMB <chr [1]> 2021-12-03 19:50:03 "Please join us in wishing luc…
## 4 NCState_PMB <chr [1]> 2021-12-03 19:50:47 "Please join us in wishing luc…
## 5 Anthony1Allison <chr [2]> 2021-12-07 02:50:51 "Proud to have Grant in our in…
## 6 KathieDello <chr [1]> 2021-12-06 20:16:13 "A short read on our invaluabl…
## 7 drhipp <chr [2]> 2021-12-06 18:30:36 "@NCStateCNR @NCSUgeospatial T…
## 8 NCStateDELTA <chr [2]> 2021-12-03 19:00:41 "Meredith DiMattina has a driv…
## 9 NCSUCheer <chr [1]> 2021-12-03 18:10:05 "Thank you to the Boys and Gir…
## 10 buxton977 <chr [2]> 2021-12-03 16:58:10 "Final presentations in the @n…
## 11 QuesadaLabNCSU <chr [3]> 2021-12-02 20:40:56 "Big news! congrats @CALS_Dean…
## 12 NCStateCHASS <chr [1]> 2021-12-02 16:00:42 "How do you distill a novel — …
## 13 NCSU_MMB <chr [1]> 2021-12-02 14:49:52 "Want to see what our practicu…
## 14 Nathan_Goldman <chr [2]> 2021-12-02 04:46:43 "I am excited to share the acc…
## 15 NCSU_YFCS <chr [1]> 2021-12-01 21:05:00 "Only two days remain to visit…
We’re replicating the steps here from the Week 12 Case Study. Let’s run this chunk and see what sort of results we generate.
##edge_2 <- edge %>%
##unnest_tokens(input = target, output = receiver, to_lower = FALSE) %>%
##relocate(sender, receiver)
##edge_2
Hmm…this doesn’t work quite as expected. Instead of populating results like those we saw in the Week 12 Case Study, our target column is a list-column rather than a plain character column, so unnest_tokens() can’t split it the same way. Instead, we’ll look at other ways to determine the roles of users in the #ThinkandDo network. We’ll check out these techniques in Section 4 (Model).
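That said, one possible workaround (a sketch, not used in the rest of this analysis) is to expand the list-column directly with tidyr’s unnest(), which yields one row per sender-receiver pair; edge_2 here just reuses the name from the commented-out chunk above.
# Sketch: expand the `target` list-column into one row per mention.
edge_2 <- edge %>%
  unnest(cols = target) %>%     # one row per mentioned account
  rename(receiver = target) %>% # match the naming from the case study
  relocate(sender, receiver)
edge_2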
For now, we’ll shift gears and focus on preparing for our sentiment analysis using the {vader} package.
Tokenize Text
In this step, we’ll be splitting our text into tokens (words or bi-grams). We’ll also arrange our tokens and see how many observations we yield. First, let’s tidy our tweets using the unnest_tokens() function.
tidy_tweets <- think_tweets %>%
  unnest_tokens(output = word, input = text) %>%
  relocate(word)
head(tidy_tweets)
## # A tibble: 6 × 90
## word user_id status_id created_at screen_name source
## <chr> <chr> <chr> <dttm> <chr> <chr>
## 1 a 9181132574… 14686116860… 2021-12-08 16:01:26 SupplyChain… Hootsui…
## 2 recent 9181132574… 14686116860… 2021-12-08 16:01:26 SupplyChain… Hootsui…
## 3 study 9181132574… 14686116860… 2021-12-08 16:01:26 SupplyChain… Hootsui…
## 4 from 9181132574… 14686116860… 2021-12-08 16:01:26 SupplyChain… Hootsui…
## 5 researchers 9181132574… 14686116860… 2021-12-08 16:01:26 SupplyChain… Hootsui…
## 6 across 9181132574… 14686116860… 2021-12-08 16:01:26 SupplyChain… Hootsui…
## # … with 84 more variables: display_text_width <dbl>, reply_to_status_id <chr>,
## # reply_to_user_id <chr>, reply_to_screen_name <chr>, is_quote <lgl>,
## # is_retweet <lgl>, favorite_count <int>, retweet_count <int>,
## # quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>,
## # urls_url <list>, urls_t.co <list>, urls_expanded_url <list>,
## # media_url <list>, media_t.co <list>, media_expanded_url <list>,
## # media_type <list>, ext_media_url <list>, ext_media_t.co <list>, …
We’ll do the same for the following code chunk, except we won’t use the relocate() function. Let’s view the tokens created in the same chunk.
think_tokens <- think_tweets %>%
  unnest_tokens(output = word, input = text)
think_tokens
## # A tibble: 418 × 90
## user_id status_id created_at screen_name source display_text_wi…
## <chr> <chr> <dttm> <chr> <chr> <dbl>
## 1 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 2 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 3 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 4 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 5 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 6 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 7 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 8 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 9 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 10 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## # … with 408 more rows, and 84 more variables: reply_to_status_id <chr>,
## # reply_to_user_id <chr>, reply_to_screen_name <chr>, is_quote <lgl>,
## # is_retweet <lgl>, favorite_count <int>, retweet_count <int>,
## # quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>,
## # urls_url <list>, urls_t.co <list>, urls_expanded_url <list>,
## # media_url <list>, media_t.co <list>, media_expanded_url <list>,
## # media_type <list>, ext_media_url <list>, ext_media_t.co <list>, …
Next, run the following code chunk.
think_tokens_1 <- think_tweets %>%
  unnest_tokens(output = word, input = text, token = "tweets")
## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
think_tokens_1
## # A tibble: 388 × 90
## user_id status_id created_at screen_name source display_text_wi…
## <chr> <chr> <dttm> <chr> <chr> <dbl>
## 1 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 2 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 3 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 4 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 5 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 6 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 7 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 8 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 9 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## 10 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hoots… 215
## # … with 378 more rows, and 84 more variables: reply_to_status_id <chr>,
## # reply_to_user_id <chr>, reply_to_screen_name <chr>, is_quote <lgl>,
## # is_retweet <lgl>, favorite_count <int>, retweet_count <int>,
## # quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>,
## # urls_url <list>, urls_t.co <list>, urls_expanded_url <list>,
## # media_url <list>, media_t.co <list>, media_expanded_url <list>,
## # media_type <list>, ext_media_url <list>, ext_media_t.co <list>, …
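Earlier we mentioned bi-grams as another possible token. We’ll stick with single words for the rest of this analysis, but for reference, here’s a sketch of how the same function could produce them (think_bigrams is a hypothetical name).
# Sketch: two-word tokens instead of single words.
think_bigrams <- think_tweets %>%
  unnest_tokens(output = bigram, input = text, token = "ngrams", n = 2)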
Now that we’ve created our tokens, let’s examine the most common words being used in the tweets from our data set.
think_tokens_1 %>%
  count(word, sort = TRUE)
## # A tibble: 264 × 2
## word n
## <chr> <int>
## 1 to 16
## 2 #thinkanddo 14
## 3 the 14
## 4 a 9
## 5 in 9
## 6 of 7
## 7 and 6
## 8 our 6
## 9 on 5
## 10 for 4
## # … with 254 more rows
Remove Stop Words
As you’d expect, the most common words are usually prepositions and other ‘filler’ words that don’t really serve our analysis. At this point, we’ll remove these stop words to help analyze sentiment in Section 4. First, let’s view the stop words using the View() function.
View(stop_words)
Next, let’s use the anti_join() function to start removing stop words. We’ll also use head() to take a quick peek at the results.
think_tokens_2 <- anti_join(think_tokens_1, stop_words, by = "word")
head(think_tokens_2)
## # A tibble: 6 × 90
## user_id status_id created_at screen_name source display_text_wi…
## <chr> <chr> <dttm> <chr> <chr> <dbl>
## 1 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hootsu… 215
## 2 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hootsu… 215
## 3 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hootsu… 215
## 4 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hootsu… 215
## 5 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hootsu… 215
## 6 91811325… 1468611686… 2021-12-08 16:01:26 SupplyChai… Hootsu… 215
## # … with 84 more variables: reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>,
## # ext_media_url <list>, ext_media_t.co <list>, …
We’ll also remove a few more stop words using a custom list of our own.
my_stopwords <- c("amp", "=", "+")
think_tokens_3 <- think_tokens_2 %>%
  filter(!word %in% my_stopwords)
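An equivalent approach (a sketch) is to bind our custom words onto the stop_words data frame and remove everything with a single anti_join(); think_tokens_3_alt is a hypothetical name.
# Sketch: fold custom stop words into one lookup table.
custom_stop <- tibble(word = my_stopwords, lexicon = "custom")
think_tokens_3_alt <- think_tokens_1 %>%
  anti_join(bind_rows(stop_words, custom_stop), by = "word")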
Let’s view our tibble now that we’ve removed stop words using the anti_join() function and our own custom stop word list.
think_tokens_3 %>%
  count(word, sort = TRUE)
## # A tibble: 198 × 2
## word n
## <chr> <int>
## 1 #thinkanddo 14
## 2 @ncstate 3
## 3 titled 3
## 4 @ncstatecnr 2
## 5 @ncstatepoole 2
## 6 #ncstate 2
## 7 #plantpack 2
## 8 alumni 2
## 9 carbon 2
## 10 county 2
## # … with 188 more rows
Looks great! Now we have some data that we can work with for our sentiment analysis.
This step will involve both describing our data and creating visualizations to reflect that information. Specifically, in this section we’ll focus on the following:
Top Tokens. We’ll take a look at the most common words in our tokens data and use these counts when constructing the word cloud in part b.
Create a Word Cloud. To provide a visual illustration of the results we yield from our analysis in part a, we will construct a word cloud.
In the code chunk below, we’ll store the most common words as think_top_tokens. We’re selecting the top 25 words since our data set is smaller in size (note that top_n() keeps ties, which is why the output below contains more than 25 rows).
think_top_tokens <- think_tokens_3 %>%
  count(word, sort = TRUE) %>%
  top_n(25)
## Selecting by n
think_top_tokens
## # A tibble: 198 × 2
## word n
## <chr> <int>
## 1 #thinkanddo 14
## 2 @ncstate 3
## 3 titled 3
## 4 @ncstatecnr 2
## 5 @ncstatepoole 2
## 6 #ncstate 2
## 7 #plantpack 2
## 8 alumni 2
## 9 carbon 2
## 10 county 2
## # … with 188 more rows
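Before building the word cloud, a simple bar chart (a sketch using the {ggplot2} package we loaded earlier) can make the top counts easier to compare.
# Sketch: bar chart of the ten most frequent tokens.
think_top_tokens %>%
  slice_max(n, n = 10, with_ties = FALSE) %>%
  ggplot(aes(x = n, y = reorder(word, n))) +
  geom_col() +
  labs(x = "Count", y = NULL, title = "Top tokens in #ThinkAndDo tweets")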
Now that we have results from our top tokens, we’ll create a word cloud using the wordcloud2() function.
wordcloud2(think_top_tokens)
Since our attempt in Section 2 didn’t yield the results we expected, we’ll be looking at other methods to identify who our actors are and the roles they occupy. Remember, we’re looking to address the following research question:
What roles do the actors play in this network (transmitters, transceivers, transcenders)?
Let’s start by identifying who our unique users are. Here, we’re using the unique() function.
users <- unique(think_tweets$screen_name)
users
## [1] "SupplyChainNCSU" "NCState_PMB" "Anthony1Allison" "KathieDello"
## [5] "drhipp" "NCStateDELTA" "NCSUCheer" "buxton977"
## [9] "QuesadaLabNCSU" "NCStateCHASS" "NCSU_MMB" "Nathan_Goldman"
## [13] "NCSU_YFCS"
Next, let’s look at the most retweeted tweets. I’m only looking at the top 5 considering the size of our network. The most retweeted tweets will be at the top of our results.
think_tweets %>%
  arrange(-retweet_count) %>%
  top_n(5, retweet_count) %>%
  select(retweet_count, screen_name, text, created_at)
## # A tibble: 7 × 4
## retweet_count screen_name text created_at
## <int> <chr> <chr> <dttm>
## 1 3 KathieDello "A short read on our invalua… 2021-12-06 20:16:13
## 2 3 buxton977 "Final presentations in the … 2021-12-03 16:58:10
## 3 3 QuesadaLabNCSU "Big news! congrats @CALS_De… 2021-12-02 20:40:56
## 4 2 Nathan_Goldman "I am excited to share the a… 2021-12-02 04:46:43
## 5 1 NCState_PMB "Our Department is hiring an… 2021-12-07 16:37:01
## 6 1 NCState_PMB "Please join us in wishing l… 2021-12-03 19:50:03
## 7 1 NCStateCHASS "How do you distill a novel … 2021-12-02 16:00:42
Great! We have a solid list of the top 5 most retweeted tweets using our hashtag (ties at one retweet expand the output to seven rows). Now, we’ll repeat the same step, except this time we’ll find the most favorited tweets. Again, when we consider the size of our network, we’re just going to find the top 5.
think_tweets %>%
  arrange(-favorite_count) %>%
  top_n(5, favorite_count) %>%
  select(favorite_count, screen_name, text, created_at)
## # A tibble: 5 × 4
## favorite_count screen_name text created_at
## <int> <chr> <chr> <dttm>
## 1 22 QuesadaLabNCSU "Big news! congrats @CALS_D… 2021-12-02 20:40:56
## 2 17 buxton977 "Final presentations in the… 2021-12-03 16:58:10
## 3 15 NCSUCheer "Thank you to the Boys and … 2021-12-03 18:10:05
## 4 12 NCState_PMB "Please join us in wishing … 2021-12-03 19:50:03
## 5 7 KathieDello "A short read on our invalu… 2021-12-06 20:16:13
At this point, we have a list of unique users, the most retweeted tweets, and the most favorited tweets. The last thing we’ll identify is the highest tweet count by screen name.
think_tweets %>%
  count(screen_name, sort = TRUE) %>%
  top_n(10) %>%
  mutate(screen_name = paste0("@", screen_name))
## Selecting by n
## # A tibble: 13 × 2
## screen_name n
## <chr> <int>
## 1 @NCState_PMB 3
## 2 @Anthony1Allison 1
## 3 @buxton977 1
## 4 @drhipp 1
## 5 @KathieDello 1
## 6 @Nathan_Goldman 1
## 7 @NCStateCHASS 1
## 8 @NCStateDELTA 1
## 9 @NCSU_MMB 1
## 10 @NCSU_YFCS 1
## 11 @NCSUCheer 1
## 12 @QuesadaLabNCSU 1
## 13 @SupplyChainNCSU 1
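Before we move on, here is a sketch of how we could put numbers behind the role and centralization questions using the {igraph} and {ggraph} packages we loaded earlier. It assumes the unnested edge_2 data frame sketched back in the Wrangle section, and the role mapping in the comments is my rough proxy rather than a definitive classification.
# Sketch: build a directed mention graph (assumes edge_2 exists).
think_graph <- edge_2 %>%
  filter(!is.na(receiver)) %>%
  select(sender, receiver) %>%
  graph_from_data_frame(directed = TRUE)

# Rough proxy (assumption): high out-degree suggests transmitters,
# high in-degree suggests receivers, and high values of both point
# toward transceiver/transcender-type roles.
sort(degree(think_graph, mode = "out"), decreasing = TRUE)
sort(degree(think_graph, mode = "in"), decreasing = TRUE)

# Network-level degree centralization (0 = evenly spread, 1 = star-like).
centr_degree(think_graph, mode = "all")$centralization

# A quick picture of the network.
ggraph(think_graph, layout = "fr") +
  geom_edge_link(alpha = 0.3) +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE, size = 3) +
  theme_void()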
Recall from the Prepare section that the final question guiding this analysis was:
Do actors in this network tend to be positive or negative in their sentiment?
In this section, we’ll use the {vader} package for sentiment analysis to answer this question.
summary_vader <- vader_df(think_tweets$text)
## Warning in tolower(wpe[i]) %in% vaderLexicon$V1: input string ':-Þ' cannot be
## translated to UTF-8, is it valid in 'ANSI_X3.4-1968'?
## Warning in tolower(wpe[i]) %in% vaderLexicon$V1: input string ':Þ' cannot be
## translated to UTF-8, is it valid in 'ANSI_X3.4-1968'?
summary_vader
## text
## 1 A recent study from researchers across the @NCState campus details baseline needs to bring factories into compliance with labor standards: https://t.co/AzARaCe7l4 @NCStatePoole #ThinkandDo #laborethics @Robhandfield
## 2 Our Department is hiring an assistant professor in microbial physiology. It's a great opportunity to work at @NCState and #ThinkAndDo the extraordinary. \nYou can learn more about the position and apply at:\nhttps://t.co/H5rMzinIqf
## 3 Please join us in wishing luck to Nathan Wilson on their defense Monday, titled: "A Synthetic Carbon Fixation Cycle to Increase Carbon Assimilation in Plants"! 🌿🎉#ThinkAndDo #PlantPack https://t.co/aCTGXNcCVL
## 4 Please join us in wishing luck to Lee Kimmel on their defense Monday, titled: "Guide to the Vascular Flora of William B. Umstead State Park (Wake County, North Carolina)"! 🌿🎉\n#ThinkAndDo #PlantPack https://t.co/DkqiIzuRUb
## 5 Proud to have Grant in our inaugural MMA cohort! @NCStatePooleMMA #ThinkandDo #GoPack @BigGrant73_ https://t.co/XJowQ4M4oE
## 6 A short read on our invaluable partnership with @TheScienceHouse #thinkanddo 🐺 https://t.co/x26rNoZwsg
## 7 @NCStateCNR @NCSUgeospatial Thanks! I'm excited for what we can #thinkanddo https://t.co/Ck1HLa5w8Y
## 8 Meredith DiMattina has a drive for going above and beyond. She shares how her determination for learning more led her to pursue a master's degree in the online MGIST program and how it has impacted her career. @NCStateOnline @NCStateCNR #ThinkandDo\nhttps://t.co/VeyOVsyWLn
## 9 Thank you to the Boys and Girls Club of Wake County for recognizing our team as their “Volunteer of the Year”. #ThinkAndDo ✨✨✨ https://t.co/WTnRNzQbRo
## 10 Final presentations in the @ncsulibraries Viz Studio were a success! So proud of my #Datavisualization students! \n\n@ncstate #ncsu #ncstate #thinkanddo #datascience #gephi #Tableau #datascienceacademy https://t.co/BFwvf4ZQ8I
## 11 Big news! congrats @CALS_Dean \U{01f973} @NCSU_DEPP @NCStateCALS #ThinkAndDo https://t.co/nCCt6aIUVt
## 12 How do you distill a novel — one with strong connections to your own life — into a three-minute summary? Ask MFA student Catey Christiansen, co-winner of @ncsugradschool's 3MT competition. #ThinkAndDo 📖 https://t.co/8bcBlIlK3B https://t.co/1ssaQMoZD6
## 13 Want to see what our practicum projects entail? Tune in right now to watch MMB students present on their practicum projects. #ThinkAndDo @NCState_PMB https://t.co/buMIHY02Bf
## 14 I am excited to share the acceptance of our paper at AOS titled "Does task-specific knowledge improve audit quality: Evidence from audits of income tax accounts" (coauthored w/Kat Harris and Tom Omer).\n\n@ncstatemac @NCStatePoole \n\n#thinkanddo \n\nhttps://t.co/fNOw923CUV
## 15 Only two days remain to visit the Kurudi Nyumbani Black Alumni Art Exhibition in the African American Cultural Center! Head to the gallery by December 3rd to see alumni artwork on the theme “The Return Home”. \n#ncstate #thinkanddo #aacc #howlback https://t.co/H93ep1zeJU
## word_scores
## 1 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 2 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.1, 1.8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 3 {1.3, 1.2, 0, 0, 0.9, 2, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0, 1.3, 0, 0, 0, 0, 0, 0, 0}
## 4 {1.3, 1.2, 0, 0, 0.9, 2, 0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 5 {2.1, 0, 0, 1.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 6 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 7 {0, 0, 1.9, 0, 1.4, 0, 0, 0, 0, 0, 0}
## 8 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.2, 0, 0, 1.7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 9 {1.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 10 {0, 0, 0, 0, 0, 0, 0, 0, 0, 2.7, 0, 2.393, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 11 {0, 0, 2.4, 0, 0, 0, 0, 0, 0}
## 12 {0, 0, 0, 0, 0, 1.3, 0, 0, 0, 2.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 13 {0.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 14 {0, 0, 1.4, 0, 1.2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## 15 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
## compound pos neu neg but_count
## 1 0.000 0.000 1.000 0 0
## 2 0.785 0.182 0.818 0 0
## 3 0.888 0.380 0.620 0 0
## 4 0.848 0.286 0.714 0 0
## 5 0.709 0.329 0.671 0 0
## 6 0.000 0.000 1.000 0 0
## 7 0.680 0.383 0.617 0 0
## 8 0.599 0.112 0.888 0 0
## 9 0.361 0.098 0.902 0 0
## 10 0.826 0.250 0.750 0 0
## 11 0.571 0.316 0.684 0 0
## 12 0.681 0.149 0.851 0 0
## 13 0.077 0.053 0.947 0 0
## 14 0.859 0.247 0.753 0 0
## 15 0.000 0.000 1.000 0 0
Now, we’ll use inner_join() to add sentiment scores to our tweets by joining summary_vader (created in the chunk above) with our think_tweets data set.
tweet_sentiment <- inner_join(summary_vader,
                              think_tweets,
                              by = "text") %>%
  select(screen_name, compound)
Finally, let’s summarize the sentiment of our tweets. We’ll group data by users and summarize the mean compound sentiment score.
user_sentiment <- tweet_sentiment %>%
  group_by(screen_name) %>%
  summarise(sentiment = mean(compound))
user_sentiment
## # A tibble: 13 × 2
## screen_name sentiment
## <chr> <dbl>
## 1 Anthony1Allison 0.709
## 2 buxton977 0.826
## 3 drhipp 0.68
## 4 KathieDello 0
## 5 Nathan_Goldman 0.859
## 6 NCState_PMB 0.840
## 7 NCStateCHASS 0.681
## 8 NCStateDELTA 0.599
## 9 NCSU_MMB 0.077
## 10 NCSU_YFCS 0
## 11 NCSUCheer 0.361
## 12 QuesadaLabNCSU 0.571
## 13 SupplyChainNCSU 0
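As a quick check on this summary (a sketch), we can bucket users with the conventional VADER thresholds, where a compound score of 0.05 or above counts as positive and -0.05 or below counts as negative.
# Sketch: label each user by the standard VADER compound cutoffs.
user_sentiment %>%
  mutate(label = case_when(
    sentiment >= 0.05 ~ "positive",
    sentiment <= -0.05 ~ "negative",
    TRUE ~ "neutral"
  )) %>%
  count(label)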
This final section aims to clearly and effectively communicate the most important and most useful findings of the analysis.
In response to the following research questions driving this analysis, here is a summary of our findings:
What roles do the actors play in this network (transmitters, transceivers, transcenders)?
The top transmitters were @NCState_PMB and @buxton977. The top transceivers were @KathieDello and @buxton977. The top transcenders were @QuesadaLabNCSU and @NCSUCheer.
How centralized is the network?
There does not seem to be much centralization in this network. While these actors all used the #ThinkandDo hashtag, the connections between them seem distant. There is little to no engagement between the users identified in our data sets.
Do actors in this network tend to be positive or negative in their sentiment?
Actors in the network tended to be positive in their sentiment. In fact, there were no instances of negative sentiment. The number of users was limited, so this is unsurprising; however, three of the 13 users in our sentiment analysis had a sentiment score of 0, which is neither negative nor positive. The most positive actors were @Nathan_Goldman, who had a sentiment score of 0.859, and @NCState_PMB, who had a score of 0.84.
Finally, this data could be improved or expanded on in future work.