The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.
To earn a badge for each lab, you are required to respond to a set of prompts for two parts:
In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply a data analysis technique introduced in this learning lab.
Use the institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. More specifically, locate a text mining study that visualizes text data.
Provide an APA citation for your selected study.
How does the sentiment analysis address research questions?
Draft a research question for a population you may be interested in studying, or that would be of interest to educational researchers, and that would require the collection of text data and answer the following questions:
What are preservice teachers’ attitudes toward technology integration?
What text data would need to be collected?
For what reason would text data need to be collected in order to address this question?
Explain the analytical level at which these text data would need to be collected and analyzed.
Use your case study file to create small multiples like the following figure:
I highly recommend creating a new R script in your lab-2 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.
# YOUR FINAL CODE HERE
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(tidyr)
library(rtweet)
library(writexl)
library(readxl)
library(tidytext)
library(textdata)
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
##
## col_factor
ngss_tweets <- read_xlsx("data/ngss_tweets.xlsx")
ccss_tweets <- read_xlsx("data/csss_tweets.xlsx")
ngss_text <-
ngss_tweets %>%
filter(lang == "en") %>%
select(status_id, screen_name, created_at, text) %>%
mutate(standards = "ngss") %>%
relocate(standards)
ccss_text <-
ccss_tweets %>%
filter(lang == "en") %>%
select(status_id, screen_name, created_at, text) %>%
mutate(standards = "ccss") %>%
relocate(standards)
tweets <- bind_rows(ngss_text, ccss_text)
head(tweets)
## # A tibble: 6 × 5
## standards status_id screen_name created_at text
## <chr> <chr> <chr> <dttm> <chr>
## 1 ngss 1365716690336645124 loyr2662 2021-02-27 17:33:27 "Switching gea…
## 2 ngss 1363217513761415171 loyr2662 2021-02-20 20:02:37 "Was just intr…
## 3 ngss 1365709122763653133 Furlow_teach 2021-02-27 17:03:23 "@IBchemmilam …
## 4 ngss 1365673294360420353 Furlow_teach 2021-02-27 14:41:01 "@IBchemmilam …
## 5 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 "I am so honor…
## 6 ngss 1365690477266284545 TdiShelton 2021-02-27 15:49:17 "Thank you @br…
tail(tweets)
## # A tibble: 6 × 5
## standards status_id screen_name created_at text
## <chr> <chr> <chr> <dttm> <chr>
## 1 ccss 1362923643924316162 JosiePaul8807 2021-02-20 00:34:53 "@SenatorHick…
## 2 ccss 1362910913855160320 ctwittnc 2021-02-19 23:44:18 "@winningatmy…
## 3 ccss 1362906588021989376 the_rbeagle 2021-02-19 23:27:06 "@dmarush @el…
## 4 ccss 1362902622445862912 silea 2021-02-19 23:11:21 "@LizerReal I…
## 5 ccss 1362899370199445508 JodyCoyote12 2021-02-19 22:58:25 "@CarlaRK3 @N…
## 6 ccss 1362894990813188096 Ryan_Hawes 2021-02-19 22:41:01 "I just got a…
tweet_tokens <-
tweets %>%
unnest_tokens(output = word,
input = text)
tidy_tweets <-
tweet_tokens %>%
anti_join(stop_words, by = "word") %>%
filter(word != "amp")
afinn <- get_sentiments("afinn")
afinn
## # A tibble: 2,477 × 2
## word value
## <chr> <dbl>
## 1 abandon -2
## 2 abandoned -2
## 3 abandons -2
## 4 abducted -2
## 5 abduction -2
## 6 abductions -2
## 7 abhor -3
## 8 abhorred -3
## 9 abhorrent -3
## 10 abhors -3
## # ℹ 2,467 more rows
bing <- get_sentiments("bing")
bing
## # A tibble: 6,786 × 2
## word sentiment
## <chr> <chr>
## 1 2-faces negative
## 2 abnormal negative
## 3 abolish negative
## 4 abominable negative
## 5 abominably negative
## 6 abominate negative
## 7 abomination negative
## 8 abort negative
## 9 aborted negative
## 10 aborts negative
## # ℹ 6,776 more rows
nrc <- get_sentiments("nrc")
nrc
## # A tibble: 13,872 × 2
## word sentiment
## <chr> <chr>
## 1 abacus trust
## 2 abandon fear
## 3 abandon negative
## 4 abandon sadness
## 5 abandoned anger
## 6 abandoned fear
## 7 abandoned negative
## 8 abandoned sadness
## 9 abandonment anger
## 10 abandonment fear
## # ℹ 13,862 more rows
loughran <- get_sentiments("loughran")
loughran
## # A tibble: 4,150 × 2
## word sentiment
## <chr> <chr>
## 1 abandon negative
## 2 abandoned negative
## 3 abandoning negative
## 4 abandonment negative
## 5 abandonments negative
## 6 abandons negative
## 7 abdicated negative
## 8 abdicates negative
## 9 abdicating negative
## 10 abdication negative
## # ℹ 4,140 more rows
sentiment_afinn <- inner_join(tidy_tweets, afinn, by = "word")
sentiment_afinn
## # A tibble: 1,540 × 6
## standards status_id screen_name created_at word value
## <chr> <chr> <chr> <dttm> <chr> <dbl>
## 1 ngss 1365716690336645124 loyr2662 2021-02-27 17:33:27 win 4
## 2 ngss 1365709122763653133 Furlow_teach 2021-02-27 17:03:23 love 3
## 3 ngss 1365709122763653133 Furlow_teach 2021-02-27 17:03:23 sweet 2
## 4 ngss 1365709122763653133 Furlow_teach 2021-02-27 17:03:23 signifi… 1
## 5 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 honored 2
## 6 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 opportu… 2
## 7 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 wonderf… 4
## 8 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 powerful 2
## 9 ngss 1365690477266284545 TdiShelton 2021-02-27 15:49:17 loved 3
## 10 ngss 1365706140496130050 TdiShelton 2021-02-27 16:51:32 share 1
## # ℹ 1,530 more rows
sentiment_bing <- inner_join(tidy_tweets, bing, by = "word")
sentiment_bing
## # A tibble: 1,668 × 6
## standards status_id screen_name created_at word sentiment
## <chr> <chr> <chr> <dttm> <chr> <chr>
## 1 ngss 1365716690336645124 loyr2662 2021-02-27 17:33:27 win positive
## 2 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 love positive
## 3 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 help… positive
## 4 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet positive
## 5 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 tough positive
## 6 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 hono… positive
## 7 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 appr… positive
## 8 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 wond… positive
## 9 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 powe… positive
## 10 ngss 1365690477266284545 TdiShelton 2021-02-27 15:49:17 loved positive
## # ℹ 1,658 more rows
sentiment_nrc <- inner_join(tidy_tweets, nrc, by = "word")
## Warning in inner_join(tidy_tweets, nrc, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 26 of `x` matches multiple rows in `y`.
## ℹ Row 7509 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
sentiment_nrc
## # A tibble: 7,841 × 6
## standards status_id screen_name created_at word sentiment
## <chr> <chr> <chr> <dttm> <chr> <chr>
## 1 ngss 1363217513761415171 loyr2662 2021-02-20 20:02:37 math… trust
## 2 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 fami… positive
## 3 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 fami… trust
## 4 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 love joy
## 5 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 love positive
## 6 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet anticipa…
## 7 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet joy
## 8 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet positive
## 9 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet surprise
## 10 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet trust
## # ℹ 7,831 more rows
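The warning above appears because the nrc lexicon lists some words under several sentiments while the same word also occurs in many tweets. As a minimal sketch, assuming dplyr 1.1.0 or later and using small made-up stand-ins for `tidy_tweets` and `nrc` (the `toy_*` names are hypothetical), the `relationship` argument named in the warning can declare that this many-to-many match is expected:

```r
library(dplyr)

# Hypothetical toy stand-ins for tidy_tweets and the nrc lexicon
toy_tokens <- tibble(
  status_id = c("1", "1", "2"),
  word      = c("love", "sweet", "love")
)
toy_nrc <- tibble(
  word      = c("love", "love", "sweet"),
  sentiment = c("joy", "positive", "joy")
)

# "love" occurs twice on each side, so the join is many-to-many;
# declaring the relationship keeps the join quiet about it.
toy_joined <- inner_join(toy_tokens, toy_nrc,
                         by = "word",
                         relationship = "many-to-many")
toy_joined
```

Each token is repeated once per matching lexicon row, which is also why `sentiment_nrc` has far more rows than the AFINN and bing joins.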
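# Join against the loughran lexicon rather than bing; the loughran
# output printed in this document came from a bing join.
sentiment_loughran <- inner_join(tidy_tweets, loughran, by = "word")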
sentiment_loughran
## # A tibble: 1,668 × 6
## standards status_id screen_name created_at word sentiment
## <chr> <chr> <chr> <dttm> <chr> <chr>
## 1 ngss 1365716690336645124 loyr2662 2021-02-27 17:33:27 win positive
## 2 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 love positive
## 3 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 help… positive
## 4 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 sweet positive
## 5 ngss 1365709122763653133 Furlow_tea… 2021-02-27 17:03:23 tough positive
## 6 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 hono… positive
## 7 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 appr… positive
## 8 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 wond… positive
## 9 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 powe… positive
## 10 ngss 1365690477266284545 TdiShelton 2021-02-27 15:49:17 loved positive
## # ℹ 1,658 more rows
ts_plot(tweets, by = "days")
tweets %>%
group_by(standards) %>%
ts_plot(by = "days")
summary_bing <- sentiment_bing %>%
group_by(standards) %>%
count(sentiment)
summary_bing
## # A tibble: 4 × 3
## # Groups: standards [2]
## standards sentiment n
## <chr> <chr> <int>
## 1 ccss negative 926
## 2 ccss positive 446
## 3 ngss negative 66
## 4 ngss positive 230
summary_bing <- sentiment_bing %>%
group_by(standards) %>%
count(sentiment, sort = TRUE) %>%
spread(sentiment, n)
summary_bing
## # A tibble: 2 × 3
## # Groups: standards [2]
## standards negative positive
## <chr> <int> <int>
## 1 ccss 926 446
## 2 ngss 66 230
summary_bing <- sentiment_bing %>%
group_by(standards) %>%
count(sentiment, sort = TRUE) %>%
spread(sentiment, n) %>%
mutate(sentiment = positive - negative) %>%
mutate(lexicon = "bing") %>%
relocate(lexicon)
summary_bing
## # A tibble: 2 × 5
## # Groups: standards [2]
## lexicon standards negative positive sentiment
## <chr> <chr> <int> <int> <int>
## 1 bing ccss 926 446 -480
## 2 bing ngss 66 230 164
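The reshaping step above uses `spread()`, which tidyr now marks as superseded in favor of `pivot_wider()`. The sketch below is one possible equivalent, not the lab's own code. It rebuilds the bing counts printed earlier in a small tibble (`toy_counts` and `toy_summary` are hypothetical names) and computes the same net sentiment:

```r
library(dplyr)
library(tidyr)

# Counts copied from the summary_bing output shown earlier
toy_counts <- tibble(
  standards = c("ccss", "ccss", "ngss", "ngss"),
  sentiment = c("negative", "positive", "negative", "positive"),
  n         = c(926L, 446L, 66L, 230L)
)

# pivot_wider() turns each sentiment label into its own column,
# after which net sentiment is positive minus negative.
toy_summary <- toy_counts %>%
  pivot_wider(names_from = sentiment, values_from = n) %>%
  mutate(sentiment = positive - negative)

toy_summary
```

The resulting net scores should match the `sentiment` column in the table above.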
head(sentiment_afinn)
## # A tibble: 6 × 6
## standards status_id screen_name created_at word value
## <chr> <chr> <chr> <dttm> <chr> <dbl>
## 1 ngss 1365716690336645124 loyr2662 2021-02-27 17:33:27 win 4
## 2 ngss 1365709122763653133 Furlow_teach 2021-02-27 17:03:23 love 3
## 3 ngss 1365709122763653133 Furlow_teach 2021-02-27 17:03:23 sweet 2
## 4 ngss 1365709122763653133 Furlow_teach 2021-02-27 17:03:23 signific… 1
## 5 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 honored 2
## 6 ngss 1365667393188601857 TdiShelton 2021-02-27 14:17:34 opportun… 2
summary_afinn <- sentiment_afinn %>%
group_by(standards) %>%
summarise(sentiment = sum(value)) %>%
mutate(lexicon = "AFINN") %>%
relocate(lexicon)
summary_afinn
## # A tibble: 2 × 3
## lexicon standards sentiment
## <chr> <chr> <dbl>
## 1 AFINN ccss -808
## 2 AFINN ngss 503
afinn_score <- sentiment_afinn %>%
group_by(standards, status_id) %>%
summarise(value = sum(value))
## `summarise()` has grouped output by 'standards'. You can override using the
## `.groups` argument.
afinn_score
## # A tibble: 857 × 3
## # Groups: standards [2]
## standards status_id value
## <chr> <chr> <dbl>
## 1 ccss 1362894990813188096 2
## 2 ccss 1362899370199445508 4
## 3 ccss 1362906588021989376 -2
## 4 ccss 1362910494487535618 -9
## 5 ccss 1362910913855160320 -1
## 6 ccss 1362928225379250179 2
## 7 ccss 1362933982074073090 -1
## 8 ccss 1362947497258151945 -3
## 9 ccss 1362949805694013446 3
## 10 ccss 1362970614282264583 3
## # ℹ 847 more rows
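The message from `summarise()` above is informational: after summarising by two grouping variables, the result stays grouped by the first one. If that is not wanted, the `.groups` argument mentioned in the message can drop the grouping. Here is a minimal sketch on made-up scores (`toy_scores` and `toy_by_tweet` are hypothetical names):

```r
library(dplyr)

# Hypothetical word-level AFINN values for two tweets
toy_scores <- tibble(
  standards = c("ccss", "ccss", "ngss"),
  status_id = c("a", "a", "b"),
  value     = c(2, -3, 4)
)

# Sum word values within each tweet and return an ungrouped tibble,
# which also silences the "grouped output" message.
toy_by_tweet <- toy_scores %>%
  group_by(standards, status_id) %>%
  summarise(value = sum(value), .groups = "drop")

is_grouped_df(toy_by_tweet)
```

In this lab the later steps call `group_by(standards)` again, so keeping or dropping the leftover grouping should not change those results.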
afinn_sentiment <- afinn_score %>%
filter(value != 0) %>%
mutate(sentiment = if_else(value < 0, "negative", "positive"))
afinn_sentiment
## # A tibble: 820 × 4
## # Groups: standards [2]
## standards status_id value sentiment
## <chr> <chr> <dbl> <chr>
## 1 ccss 1362894990813188096 2 positive
## 2 ccss 1362899370199445508 4 positive
## 3 ccss 1362906588021989376 -2 negative
## 4 ccss 1362910494487535618 -9 negative
## 5 ccss 1362910913855160320 -1 negative
## 6 ccss 1362928225379250179 2 positive
## 7 ccss 1362933982074073090 -1 negative
## 8 ccss 1362947497258151945 -3 negative
## 9 ccss 1362949805694013446 3 positive
## 10 ccss 1362970614282264583 3 positive
## # ℹ 810 more rows
afinn_ratio <- afinn_sentiment %>%
group_by(standards) %>%
count(sentiment) %>%
spread(sentiment, n) %>%
mutate(ratio = negative/positive)
afinn_ratio
## # A tibble: 2 × 4
## # Groups: standards [2]
## standards negative positive ratio
## <chr> <int> <int> <dbl>
## 1 ccss 421 211 2.00
## 2 ngss 21 167 0.126
afinn_counts <- afinn_sentiment %>%
group_by(standards) %>%
count(sentiment) %>%
filter(standards == "ngss")
afinn_counts %>%
ggplot(aes(x="", y=n, fill=sentiment)) +
geom_bar(width = .6, stat = "identity") +
labs(title = "Next Gen Science Standards",
subtitle = "Proportion of Positive & Negative Tweets") +
coord_polar(theta = "y") +
theme_void()
summary_afinn2 <- sentiment_afinn %>%
group_by(standards) %>%
filter(value != 0) %>%
mutate(sentiment = if_else(value < 0, "negative", "positive")) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "AFINN")
summary_bing2 <- sentiment_bing %>%
group_by(standards) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "bing")
summary_nrc2 <- sentiment_nrc %>%
filter(sentiment %in% c("positive", "negative")) %>%
group_by(standards) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "nrc")
summary_loughran2 <- sentiment_loughran %>%
filter(sentiment %in% c("positive", "negative")) %>%
group_by(standards) %>%
count(sentiment, sort = TRUE) %>%
mutate(method = "loughran")
summary_sentiment <- bind_rows(summary_afinn2,
summary_bing2,
summary_nrc2,
summary_loughran2) %>%
arrange(method, standards) %>%
relocate(method)
summary_sentiment
## # A tibble: 16 × 4
## # Groups: standards [2]
## method standards sentiment n
## <chr> <chr> <chr> <int>
## 1 AFINN ccss negative 740
## 2 AFINN ccss positive 477
## 3 AFINN ngss positive 278
## 4 AFINN ngss negative 45
## 5 bing ccss negative 926
## 6 bing ccss positive 446
## 7 bing ngss positive 230
## 8 bing ngss negative 66
## 9 loughran ccss negative 926
## 10 loughran ccss positive 446
## 11 loughran ngss positive 230
## 12 loughran ngss negative 66
## 13 nrc ccss positive 2294
## 14 nrc ccss negative 766
## 15 nrc ngss positive 571
## 16 nrc ngss negative 79
total_counts <- summary_sentiment %>%
group_by(method, standards) %>%
summarise(total = sum(n))
## `summarise()` has grouped output by 'method'. You can override using the
## `.groups` argument.
sentiment_counts <- left_join(summary_sentiment, total_counts)
## Joining with `by = join_by(method, standards)`
sentiment_counts
## # A tibble: 16 × 5
## # Groups: standards [2]
## method standards sentiment n total
## <chr> <chr> <chr> <int> <int>
## 1 AFINN ccss negative 740 1217
## 2 AFINN ccss positive 477 1217
## 3 AFINN ngss positive 278 323
## 4 AFINN ngss negative 45 323
## 5 bing ccss negative 926 1372
## 6 bing ccss positive 446 1372
## 7 bing ngss positive 230 296
## 8 bing ngss negative 66 296
## 9 loughran ccss negative 926 1372
## 10 loughran ccss positive 446 1372
## 11 loughran ngss positive 230 296
## 12 loughran ngss negative 66 296
## 13 nrc ccss positive 2294 3060
## 14 nrc ccss negative 766 3060
## 15 nrc ngss positive 571 650
## 16 nrc ngss negative 79 650
sentiment_percents <- sentiment_counts %>%
mutate(percent = n/total * 100)
sentiment_percents
## # A tibble: 16 × 6
## # Groups: standards [2]
## method standards sentiment n total percent
## <chr> <chr> <chr> <int> <int> <dbl>
## 1 AFINN ccss negative 740 1217 60.8
## 2 AFINN ccss positive 477 1217 39.2
## 3 AFINN ngss positive 278 323 86.1
## 4 AFINN ngss negative 45 323 13.9
## 5 bing ccss negative 926 1372 67.5
## 6 bing ccss positive 446 1372 32.5
## 7 bing ngss positive 230 296 77.7
## 8 bing ngss negative 66 296 22.3
## 9 loughran ccss negative 926 1372 67.5
## 10 loughran ccss positive 446 1372 32.5
## 11 loughran ngss positive 230 296 77.7
## 12 loughran ngss negative 66 296 22.3
## 13 nrc ccss positive 2294 3060 75.0
## 14 nrc ccss negative 766 3060 25.0
## 15 nrc ngss positive 571 650 87.8
## 16 nrc ngss negative 79 650 12.2
sentiment_percents %>%
ggplot(aes(x = standards, y = percent, fill=sentiment)) +
geom_bar(width = .8, stat = "identity", position = "dodge") +
facet_wrap(~method, ncol = 2) +
labs(title = "Public Sentiment on Twitter",
subtitle = "The Common Core & Next Gen Science Standards",
x = "State Standards",
y = "Percentage of Words")
Congratulations, you’ve completed your Intro to Text Mining badge! Complete the following steps in the orientation to submit your work for review.