Sentiment Analysis Badge

The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts for two parts:

In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply a data analysis technique introduced in this learning lab.

Part I: Reflect and Plan

Use the institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. More specifically, locate a text mining study that visualizes text data.

Provide an APA citation for your selected study.
- Zucco, C., Calabrese, B., & Cannataro, M. (2017, November). Sentiment analysis and affective computing for depression monitoring. In 2017 IEEE international conference on bioinformatics and biomedicine (BIBM) (pp. 1988-1995). IEEE.
How does the sentiment analysis address research questions?
- Sentiment analysis is used to help identify and monitor depression.

Draft a research question for a population you may be interested in studying, or that would be of interest to educational researchers, and that would require the collection of text data and answer the following questions:

What are preservice teachers’ attitudes toward technology integration?

What text data would need to be collected?
- Responses to open-ended questions may help answer this question.
For what reason would text data need to be collected in order to address this question?
- The preservice teachers may express their attitudes, as either positive or negative emotions, in their responses.
Explain the analytical level at which these text data would need to be collected and analyzed.
- The data would be analyzed at the sentiment level, classifying the text as expressing positive or negative emotion.

Part II: Data Product

Use your case study file to create small multiples like the following figure:

I highly recommend creating a new R script in your lab-2 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.

# YOUR FINAL CODE HERE
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(readr)
library(tidyr)
library(rtweet)
library(writexl)
library(readxl)
library(tidytext)
library(textdata)
library(ggplot2)
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:readr':
## 
##     col_factor

ngss_tweets <- read_xlsx("data/ngss_tweets.xlsx")
ccss_tweets <- read_xlsx("data/csss_tweets.xlsx")

ngss_text <-
  ngss_tweets %>%
  filter(lang == "en") %>%
  select(status_id, screen_name, created_at, text) %>%
  mutate(standards = "ngss") %>%
  relocate(standards)

ccss_text <-
  ccss_tweets %>%
  filter(lang == "en") %>%
  select(status_id, screen_name, created_at, text) %>%
  mutate(standards = "ccss") %>%
  relocate(standards)

tweets <- bind_rows(ngss_text, ccss_text)
head(tweets)

## # A tibble: 6 × 5
##   standards status_id           screen_name  created_at          text           
##   <chr>     <chr>               <chr>        <dttm>              <chr>          
## 1 ngss      1365716690336645124 loyr2662     2021-02-27 17:33:27 "Switching gea…
## 2 ngss      1363217513761415171 loyr2662     2021-02-20 20:02:37 "Was just intr…
## 3 ngss      1365709122763653133 Furlow_teach 2021-02-27 17:03:23 "@IBchemmilam …
## 4 ngss      1365673294360420353 Furlow_teach 2021-02-27 14:41:01 "@IBchemmilam …
## 5 ngss      1365667393188601857 TdiShelton   2021-02-27 14:17:34 "I am so honor…
## 6 ngss      1365690477266284545 TdiShelton   2021-02-27 15:49:17 "Thank you @br…

tail(tweets)

## # A tibble: 6 × 5
##   standards status_id           screen_name   created_at          text          
##   <chr>     <chr>               <chr>         <dttm>              <chr>         
## 1 ccss      1362923643924316162 JosiePaul8807 2021-02-20 00:34:53 "@SenatorHick…
## 2 ccss      1362910913855160320 ctwittnc      2021-02-19 23:44:18 "@winningatmy…
## 3 ccss      1362906588021989376 the_rbeagle   2021-02-19 23:27:06 "@dmarush @el…
## 4 ccss      1362902622445862912 silea         2021-02-19 23:11:21 "@LizerReal I…
## 5 ccss      1362899370199445508 JodyCoyote12  2021-02-19 22:58:25 "@CarlaRK3 @N…
## 6 ccss      1362894990813188096 Ryan_Hawes    2021-02-19 22:41:01 "I just got a…

tweet_tokens <- 
  tweets %>%
  unnest_tokens(output = word, 
                input = text)
tidy_tweets <-
  tweet_tokens %>%
  anti_join(stop_words, by = "word") %>%
  filter(word != "amp")  # drop "amp", the residue of the HTML entity "&amp;"
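
The tokenizing and stop-word steps above can be tried on a much smaller input. The sketch below is only an illustration: `toy_tweets` and its two tweets are made up, and it assumes the `dplyr` and `tidytext` packages are installed.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-tweet data set, made up for illustration
toy_tweets <- tibble(
  status_id = c("1", "2"),
  text = c("I love the new science standards &amp; the labs!",
           "The math standards are a tough win")
)

# One lowercase word per row; punctuation is stripped, so "&amp;" becomes "amp"
toy_tokens <- toy_tweets %>%
  unnest_tokens(output = word, input = text)

toy_tidy <- toy_tokens %>%
  anti_join(stop_words, by = "word") %>%  # drop common words such as "the"
  filter(word != "amp")                   # drop the leftover HTML entity

toy_tidy
```

Comparing `toy_tokens` with `toy_tidy` shows which rows each cleaning step removes.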

afinn <- get_sentiments("afinn")
afinn

## # A tibble: 2,477 × 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # ℹ 2,467 more rows

bing <- get_sentiments("bing")
bing

## # A tibble: 6,786 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # ℹ 6,776 more rows

nrc <- get_sentiments("nrc")
nrc

## # A tibble: 13,872 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # ℹ 13,862 more rows

loughran <- get_sentiments("loughran")
loughran

## # A tibble: 4,150 × 2
##    word         sentiment
##    <chr>        <chr>    
##  1 abandon      negative 
##  2 abandoned    negative 
##  3 abandoning   negative 
##  4 abandonment  negative 
##  5 abandonments negative 
##  6 abandons     negative 
##  7 abdicated    negative 
##  8 abdicates    negative 
##  9 abdicating   negative 
## 10 abdication   negative 
## # ℹ 4,140 more rows

sentiment_afinn <- inner_join(tidy_tweets, afinn, by = "word")
sentiment_afinn

## # A tibble: 1,540 × 6
##    standards status_id           screen_name  created_at          word     value
##    <chr>     <chr>               <chr>        <dttm>              <chr>    <dbl>
##  1 ngss      1365716690336645124 loyr2662     2021-02-27 17:33:27 win          4
##  2 ngss      1365709122763653133 Furlow_teach 2021-02-27 17:03:23 love         3
##  3 ngss      1365709122763653133 Furlow_teach 2021-02-27 17:03:23 sweet        2
##  4 ngss      1365709122763653133 Furlow_teach 2021-02-27 17:03:23 signifi…     1
##  5 ngss      1365667393188601857 TdiShelton   2021-02-27 14:17:34 honored      2
##  6 ngss      1365667393188601857 TdiShelton   2021-02-27 14:17:34 opportu…     2
##  7 ngss      1365667393188601857 TdiShelton   2021-02-27 14:17:34 wonderf…     4
##  8 ngss      1365667393188601857 TdiShelton   2021-02-27 14:17:34 powerful     2
##  9 ngss      1365690477266284545 TdiShelton   2021-02-27 15:49:17 loved        3
## 10 ngss      1365706140496130050 TdiShelton   2021-02-27 16:51:32 share        1
## # ℹ 1,530 more rows

sentiment_bing <- inner_join(tidy_tweets, bing, by = "word")
sentiment_bing

## # A tibble: 1,668 × 6
##    standards status_id           screen_name created_at          word  sentiment
##    <chr>     <chr>               <chr>       <dttm>              <chr> <chr>    
##  1 ngss      1365716690336645124 loyr2662    2021-02-27 17:33:27 win   positive 
##  2 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 love  positive 
##  3 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 help… positive 
##  4 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet positive 
##  5 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 tough positive 
##  6 ngss      1365667393188601857 TdiShelton  2021-02-27 14:17:34 hono… positive 
##  7 ngss      1365667393188601857 TdiShelton  2021-02-27 14:17:34 appr… positive 
##  8 ngss      1365667393188601857 TdiShelton  2021-02-27 14:17:34 wond… positive 
##  9 ngss      1365667393188601857 TdiShelton  2021-02-27 14:17:34 powe… positive 
## 10 ngss      1365690477266284545 TdiShelton  2021-02-27 15:49:17 loved positive 
## # ℹ 1,658 more rows

sentiment_nrc <- inner_join(tidy_tweets, nrc, by = "word")

## Warning in inner_join(tidy_tweets, nrc, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 26 of `x` matches multiple rows in `y`.
## ℹ Row 7509 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

sentiment_nrc

## # A tibble: 7,841 × 6
##    standards status_id           screen_name created_at          word  sentiment
##    <chr>     <chr>               <chr>       <dttm>              <chr> <chr>    
##  1 ngss      1363217513761415171 loyr2662    2021-02-20 20:02:37 math… trust    
##  2 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 fami… positive 
##  3 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 fami… trust    
##  4 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 love  joy      
##  5 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 love  positive 
##  6 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet anticipa…
##  7 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet joy      
##  8 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet positive 
##  9 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet surprise 
## 10 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet trust    
## # ℹ 7,831 more rows
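
The many-to-many warning above most likely appears because the nrc lexicon lists some words under several sentiments, while the same word can also occur in several tweets. The sketch below reproduces that situation with made-up data (`toy_words` and `toy_lexicon` are hypothetical) and assumes dplyr 1.1.0 or later, where `inner_join()` accepts the `relationship` argument named in the warning.

```r
library(dplyr)

# Hypothetical tokens: "sweet" appears in two different tweets
toy_words <- tibble(
  status_id = c("1", "1", "2"),
  word      = c("sweet", "love", "sweet")
)

# Hypothetical NRC-style lexicon: each word carries two sentiment labels
toy_lexicon <- tibble(
  word      = c("sweet", "sweet", "love", "love"),
  sentiment = c("joy", "positive", "joy", "positive")
)

# Declaring the relationship states that the row multiplication is expected
toy_joined <- inner_join(toy_words, toy_lexicon, by = "word",
                         relationship = "many-to-many")

nrow(toy_joined)
```

Each of the three tokens matches two lexicon rows, so the joined table has more rows than `toy_words`. This is one reason nrc word counts tend to run higher than counts from single-label lexicons.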

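# The loughran lexicon belongs in this join. The output printed below, and the
# "loughran" rows later in this document, came from a run that joined `bing`
# here, which is why they match the bing results exactly.
sentiment_loughran <- inner_join(tidy_tweets, loughran, by = "word")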
sentiment_loughran

## # A tibble: 1,668 × 6
##    standards status_id           screen_name created_at          word  sentiment
##    <chr>     <chr>               <chr>       <dttm>              <chr> <chr>    
##  1 ngss      1365716690336645124 loyr2662    2021-02-27 17:33:27 win   positive 
##  2 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 love  positive 
##  3 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 help… positive 
##  4 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet positive 
##  5 ngss      1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 tough positive 
##  6 ngss      1365667393188601857 TdiShelton  2021-02-27 14:17:34 hono… positive 
##  7 ngss      1365667393188601857 TdiShelton  2021-02-27 14:17:34 appr… positive 
##  8 ngss      1365667393188601857 TdiShelton  2021-02-27 14:17:34 wond… positive 
##  9 ngss      1365667393188601857 TdiShelton  2021-02-27 14:17:34 powe… positive 
## 10 ngss      1365690477266284545 TdiShelton  2021-02-27 15:49:17 loved positive 
## # ℹ 1,658 more rows

ts_plot(tweets, by = "days")

tweets %>%
  group_by(standards) %>%
  ts_plot(by = "days")

summary_bing <- sentiment_bing %>% 
  group_by(standards) %>% 
  count(sentiment) 

summary_bing

## # A tibble: 4 × 3
## # Groups:   standards [2]
##   standards sentiment     n
##   <chr>     <chr>     <int>
## 1 ccss      negative    926
## 2 ccss      positive    446
## 3 ngss      negative     66
## 4 ngss      positive    230

summary_bing <- sentiment_bing %>% 
  group_by(standards) %>% 
  count(sentiment, sort = TRUE) %>% 
  spread(sentiment, n) 

summary_bing

## # A tibble: 2 × 3
## # Groups:   standards [2]
##   standards negative positive
##   <chr>        <int>    <int>
## 1 ccss           926      446
## 2 ngss            66      230

summary_bing <- sentiment_bing %>% 
  group_by(standards) %>% 
  count(sentiment, sort = TRUE) %>% 
  spread(sentiment, n) %>%
  mutate(sentiment = positive - negative) %>%
  mutate(lexicon = "bing") %>%
  relocate(lexicon)

summary_bing

## # A tibble: 2 × 5
## # Groups:   standards [2]
##   lexicon standards negative positive sentiment
##   <chr>   <chr>        <int>    <int>     <int>
## 1 bing    ccss           926      446      -480
## 2 bing    ngss            66      230       164
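
As a check on the net sentiment arithmetic, the counts printed above can be typed in by hand and reshaped. This is a minimal sketch rather than the lab's own code: `bing_counts` and `net_bing` are illustrative names, and it uses `tidyr::pivot_wider()`, the function the tidyr documentation recommends in place of `spread()`.

```r
library(dplyr)
library(tidyr)

# Counts copied from the summary_bing output above
bing_counts <- tibble(
  standards = c("ccss", "ccss", "ngss", "ngss"),
  sentiment = c("negative", "positive", "negative", "positive"),
  n         = c(926L, 446L, 66L, 230L)
)

# One row per standard, one column per sentiment, then the difference
net_bing <- bing_counts %>%
  pivot_wider(names_from = sentiment, values_from = n) %>%
  mutate(net = positive - negative)

net_bing
```

The `net` column should reproduce the `sentiment` column shown above: 446 - 926 for ccss and 230 - 66 for ngss.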

head(sentiment_afinn)

## # A tibble: 6 × 6
##   standards status_id           screen_name  created_at          word      value
##   <chr>     <chr>               <chr>        <dttm>              <chr>     <dbl>
## 1 ngss      1365716690336645124 loyr2662     2021-02-27 17:33:27 win           4
## 2 ngss      1365709122763653133 Furlow_teach 2021-02-27 17:03:23 love          3
## 3 ngss      1365709122763653133 Furlow_teach 2021-02-27 17:03:23 sweet         2
## 4 ngss      1365709122763653133 Furlow_teach 2021-02-27 17:03:23 signific…     1
## 5 ngss      1365667393188601857 TdiShelton   2021-02-27 14:17:34 honored       2
## 6 ngss      1365667393188601857 TdiShelton   2021-02-27 14:17:34 opportun…     2

summary_afinn <- sentiment_afinn %>% 
  group_by(standards) %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(lexicon = "AFINN") %>%
  relocate(lexicon)
summary_afinn

## # A tibble: 2 × 3
##   lexicon standards sentiment
##   <chr>   <chr>         <dbl>
## 1 AFINN   ccss           -808
## 2 AFINN   ngss            503

afinn_score <- sentiment_afinn %>% 
  group_by(standards, status_id) %>% 
  summarise(value = sum(value))

## `summarise()` has grouped output by 'standards'. You can override using the
## `.groups` argument.

afinn_score

## # A tibble: 857 × 3
## # Groups:   standards [2]
##    standards status_id           value
##    <chr>     <chr>               <dbl>
##  1 ccss      1362894990813188096     2
##  2 ccss      1362899370199445508     4
##  3 ccss      1362906588021989376    -2
##  4 ccss      1362910494487535618    -9
##  5 ccss      1362910913855160320    -1
##  6 ccss      1362928225379250179     2
##  7 ccss      1362933982074073090    -1
##  8 ccss      1362947497258151945    -3
##  9 ccss      1362949805694013446     3
## 10 ccss      1362970614282264583     3
## # ℹ 847 more rows

afinn_sentiment <- afinn_score %>%
  filter(value != 0) %>%
  mutate(sentiment = if_else(value < 0, "negative", "positive"))

afinn_sentiment

## # A tibble: 820 × 4
## # Groups:   standards [2]
##    standards status_id           value sentiment
##    <chr>     <chr>               <dbl> <chr>    
##  1 ccss      1362894990813188096     2 positive 
##  2 ccss      1362899370199445508     4 positive 
##  3 ccss      1362906588021989376    -2 negative 
##  4 ccss      1362910494487535618    -9 negative 
##  5 ccss      1362910913855160320    -1 negative 
##  6 ccss      1362928225379250179     2 positive 
##  7 ccss      1362933982074073090    -1 negative 
##  8 ccss      1362947497258151945    -3 negative 
##  9 ccss      1362949805694013446     3 positive 
## 10 ccss      1362970614282264583     3 positive 
## # ℹ 810 more rows

afinn_ratio <- afinn_sentiment %>% 
  group_by(standards) %>% 
  count(sentiment) %>% 
  spread(sentiment, n) %>%
  mutate(ratio = negative/positive)

afinn_ratio

## # A tibble: 2 × 4
## # Groups:   standards [2]
##   standards negative positive ratio
##   <chr>        <int>    <int> <dbl>
## 1 ccss           421      211 2.00 
## 2 ngss            21      167 0.126
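
The tweet-level steps above (sum the word values per tweet, drop neutral tweets, label the rest, count) can be traced on a few made-up rows. In this sketch `toy_scores` is hypothetical data, and `.groups = "drop"` is one way to avoid the grouping message that `summarise()` printed earlier.

```r
library(dplyr)

# Hypothetical word-level scores in the AFINN style
toy_scores <- tibble(
  standards = c("ccss", "ccss", "ccss", "ngss", "ngss", "ngss"),
  status_id = c("a", "a", "b", "c", "d", "d"),
  value     = c(-2, -3, 2, 3, 1, -1)
)

toy_counts <- toy_scores %>%
  group_by(standards, status_id) %>%
  summarise(value = sum(value), .groups = "drop") %>%  # one score per tweet
  filter(value != 0) %>%                               # tweet "d" sums to 0
  mutate(sentiment = if_else(value < 0, "negative", "positive")) %>%
  count(standards, sentiment)

toy_counts
```

Tweet "d" contains one positive and one negative word that cancel out, so it is dropped before counting. That is the same reason the tweet-level table above has fewer rows after the `value != 0` filter.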

afinn_counts <- afinn_sentiment %>%
  group_by(standards) %>% 
  count(sentiment) %>%
  filter(standards == "ngss")

afinn_counts %>%
  ggplot(aes(x="", y=n, fill=sentiment)) +
  geom_bar(width = .6, stat = "identity") +
  labs(title = "Next Gen Science Standards",
       subtitle = "Proportion of Positive & Negative Tweets") +
  coord_polar(theta = "y") +
  theme_void()

summary_afinn2 <- sentiment_afinn %>% 
  group_by(standards) %>% 
  filter(value != 0) %>%
  mutate(sentiment = if_else(value < 0, "negative", "positive")) %>% 
  count(sentiment, sort = TRUE) %>% 
  mutate(method = "AFINN")

summary_bing2 <- sentiment_bing %>% 
  group_by(standards) %>% 
  count(sentiment, sort = TRUE) %>% 
  mutate(method = "bing")

summary_nrc2 <- sentiment_nrc %>% 
  filter(sentiment %in% c("positive", "negative")) %>%
  group_by(standards) %>% 
  count(sentiment, sort = TRUE) %>% 
  mutate(method = "nrc") 

summary_loughran2 <- sentiment_loughran %>% 
  filter(sentiment %in% c("positive", "negative")) %>%
  group_by(standards) %>% 
  count(sentiment, sort = TRUE) %>% 
  mutate(method = "loughran") 

summary_sentiment <- bind_rows(summary_afinn2,
                               summary_bing2,
                               summary_nrc2,
                               summary_loughran2) %>%
  arrange(method, standards) %>%
  relocate(method)

summary_sentiment

## # A tibble: 16 × 4
## # Groups:   standards [2]
##    method   standards sentiment     n
##    <chr>    <chr>     <chr>     <int>
##  1 AFINN    ccss      negative    740
##  2 AFINN    ccss      positive    477
##  3 AFINN    ngss      positive    278
##  4 AFINN    ngss      negative     45
##  5 bing     ccss      negative    926
##  6 bing     ccss      positive    446
##  7 bing     ngss      positive    230
##  8 bing     ngss      negative     66
##  9 loughran ccss      negative    926
## 10 loughran ccss      positive    446
## 11 loughran ngss      positive    230
## 12 loughran ngss      negative     66
## 13 nrc      ccss      positive   2294
## 14 nrc      ccss      negative    766
## 15 nrc      ngss      positive    571
## 16 nrc      ngss      negative     79

total_counts <- summary_sentiment %>%
  group_by(method, standards) %>%
  summarise(total = sum(n))

## `summarise()` has grouped output by 'method'. You can override using the
## `.groups` argument.

sentiment_counts <- left_join(summary_sentiment, total_counts)

## Joining with `by = join_by(method, standards)`

sentiment_counts

## # A tibble: 16 × 5
## # Groups:   standards [2]
##    method   standards sentiment     n total
##    <chr>    <chr>     <chr>     <int> <int>
##  1 AFINN    ccss      negative    740  1217
##  2 AFINN    ccss      positive    477  1217
##  3 AFINN    ngss      positive    278   323
##  4 AFINN    ngss      negative     45   323
##  5 bing     ccss      negative    926  1372
##  6 bing     ccss      positive    446  1372
##  7 bing     ngss      positive    230   296
##  8 bing     ngss      negative     66   296
##  9 loughran ccss      negative    926  1372
## 10 loughran ccss      positive    446  1372
## 11 loughran ngss      positive    230   296
## 12 loughran ngss      negative     66   296
## 13 nrc      ccss      positive   2294  3060
## 14 nrc      ccss      negative    766  3060
## 15 nrc      ngss      positive    571   650
## 16 nrc      ngss      negative     79   650

sentiment_percents <- sentiment_counts %>%
  mutate(percent = n/total * 100)

sentiment_percents

## # A tibble: 16 × 6
## # Groups:   standards [2]
##    method   standards sentiment     n total percent
##    <chr>    <chr>     <chr>     <int> <int>   <dbl>
##  1 AFINN    ccss      negative    740  1217    60.8
##  2 AFINN    ccss      positive    477  1217    39.2
##  3 AFINN    ngss      positive    278   323    86.1
##  4 AFINN    ngss      negative     45   323    13.9
##  5 bing     ccss      negative    926  1372    67.5
##  6 bing     ccss      positive    446  1372    32.5
##  7 bing     ngss      positive    230   296    77.7
##  8 bing     ngss      negative     66   296    22.3
##  9 loughran ccss      negative    926  1372    67.5
## 10 loughran ccss      positive    446  1372    32.5
## 11 loughran ngss      positive    230   296    77.7
## 12 loughran ngss      negative     66   296    22.3
## 13 nrc      ccss      positive   2294  3060    75.0
## 14 nrc      ccss      negative    766  3060    25.0
## 15 nrc      ngss      positive    571   650    87.8
## 16 nrc      ngss      negative     79   650    12.2
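
The percentages above come from joining a separate totals table. As an alternative sketch, the same figures can be computed in one grouped `mutate()`. The counts below are the bing rows copied from the output above; `bing_counts` and `bing_percents` are illustrative names.

```r
library(dplyr)

# bing counts copied from the output above
bing_counts <- tibble(
  standards = c("ccss", "ccss", "ngss", "ngss"),
  sentiment = c("negative", "positive", "negative", "positive"),
  n         = c(926L, 446L, 66L, 230L)
)

# sum(n) is evaluated within each standard because of group_by()
bing_percents <- bing_counts %>%
  group_by(standards) %>%
  mutate(total = sum(n),
         percent = n / total * 100) %>%
  ungroup()

round(bing_percents$percent, 1)
```

Rounded to one decimal place, the values should match the bing rows in the table above (67.5, 32.5, 22.3, 77.7).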

sentiment_percents %>%
  ggplot(aes(x = standards, y = percent, fill=sentiment)) +
  geom_bar(width = .8, stat = "identity", position = "dodge") +
  facet_wrap(~method, ncol = 2) +
  labs(title = "Public Sentiment on Twitter", 
       subtitle = "The Common Core & Next Gen Science Standards",
       x = "State Standards", 
       y = "Percentage of Words")
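
To see how `facet_wrap()` produces the small multiples, it can help to rebuild the plot from a hand-typed table. The sketch below copies only the AFINN and bing percentages from the output above into a hypothetical `toy_percents` data frame, and it uses `geom_col()`, which draws the same bars as `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# AFINN and bing percentages copied from the sentiment_percents output above
toy_percents <- data.frame(
  method    = rep(c("AFINN", "bing"), each = 4),
  standards = rep(c("ccss", "ccss", "ngss", "ngss"), times = 2),
  sentiment = rep(c("negative", "positive"), times = 4),
  percent   = c(60.8, 39.2, 13.9, 86.1, 67.5, 32.5, 22.3, 77.7)
)

# One panel per lexicon, with dodged bars for each sentiment
p <- ggplot(toy_percents, aes(x = standards, y = percent, fill = sentiment)) +
  geom_col(width = 0.8, position = "dodge") +
  facet_wrap(~method, ncol = 2) +
  labs(x = "State Standards", y = "Percentage of Words")

p
```

Each distinct value of `method` gets its own panel, so adding the nrc and loughran rows to the table would extend the figure to four panels.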

Knit & Submit

Congratulations, you’ve completed your Sentiment Analysis Badge! Complete the following steps in the orientation to submit your work for review.

LASER Institute TM Learning Lab 2

Suzhen Duan

July 20, 2023
