The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts for two parts: 

Part I: Reflect and Plan

Use the institutional library (e.g. NCSU Library), Google Scholar or search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. More specifically, locate a text mining study that visualize text data.

  1. Provide an APA citation for your selected study.

    • Liu, M., Jiang, X., Zhang, B., Song, T., Yu, G., Liu, G., ... & Zhou, Z. (2023). How do topics and emotions develop in elementary school children? A text mining perspective based on free-writing text over 6 years. Frontiers in Psychology14, 1109126.
  2. How does the sentiment analysis address research questions?

    • Across the topics identified, they reported the percentages of positive/negative/neutral responses.

Draft a research question for a population you may be interested in studying, or that would be of interest to educational researchers, and that would require the collection of text data and answer the following questions:

  1. What text data would need to be collected?

    • I want to collect open-ended written responses from my students during a peer assessment activitiy. The data will be collected after the activity to reflect on their participation experiences.
  2. For what reason would text data need to be collected in order to address this question?

    • I think the text-data will be super helpful to learn if they arem motivated during the activity through sentiment analysis.
  3. Explain the analytical level at which these text data would need to be collected and analyzed.

    • One student on one row with their written responses - this is my unit of analysis.

Part II: Data Product

Use your case study file to create small multiples like the following figure:

I highly recommend creating a new R script in your lab-2 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.

# YOUR FINAL CODE HERE

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
library(tidyr)
library(rtweet)
library(writexl)
library(readxl)
library(tidytext)
library(textdata)
library(ggplot2)
library(textdata)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
## 
##     col_factor
ngss_tweets <- read_xlsx("data/ngss_tweets.xlsx")
ccss_tweets <- read_xlsx("data/csss_tweets.xlsx")

ngss_text <-
  ngss_tweets %>%
  filter(lang == "en") %>%
  select(status_id, text) %>%
  mutate(standards = "ngss") %>%
  relocate(standards)

ccss_text <-
  ccss_tweets %>%
  filter(lang == "en") %>%
  select(status_id, text) %>%
  mutate(standards = "ccss") %>%
  relocate(standards)

tweets <- bind_rows(ngss_text, ccss_text)

afinn <- get_sentiments("afinn")
sentiment_afinn <- tweets %>%
  unnest_tokens(output = word, 
                input = text)  %>% 
  anti_join(stop_words, by = "word") %>%
  filter(!word == "amp") %>%
  inner_join(afinn, by = "word")


summary_afinn2 <- sentiment_afinn %>% 
  group_by(standards) %>% 
  filter(value != 0) %>%
  mutate(sentiment = if_else(value < 0, "negative", "positive")) %>% 
  count(sentiment, sort = TRUE) %>% 
  mutate(method = "AFINN")


bing <- get_sentiments("bing")
sentiment_bing <- tweets %>%
  unnest_tokens(output = word, 
                input = text)  %>% 
  anti_join(stop_words, by = "word") %>%
  filter(!word == "amp") %>%
  inner_join(bing, by = "word")

summary_bing2 <- sentiment_bing %>% 
  group_by(standards) %>% 
  count(sentiment, sort = TRUE) %>% 
  mutate(method = "bing")

nrc <- get_sentiments("nrc")
sentiment_nrc <- tweets %>%
  unnest_tokens(output = word, 
                input = text)  %>% 
  anti_join(stop_words, by = "word") %>%
  filter(!word == "amp") %>%
  inner_join(nrc, by = "word")
## Warning in inner_join(., nrc, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 26 of `x` matches multiple rows in `y`.
## ℹ Row 7509 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
summary_nrc2 <- sentiment_nrc %>% 
  filter(sentiment %in% c("positive", "negative")) %>%
  group_by(standards) %>% 
  count(sentiment, sort = TRUE) %>% 
  mutate(method = "nrc") 

loughran<- get_sentiments("loughran")
sentiment_loughran <- tweets %>%
  unnest_tokens(output = word, 
                input = text)  %>% 
  anti_join(stop_words, by = "word") %>%
  filter(!word == "amp") %>%
  inner_join(loughran, by = "word")
## Warning in inner_join(., loughran, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2417 of `x` matches multiple rows in `y`.
## ℹ Row 2589 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
summary_loughran2 <- sentiment_loughran %>% 
  filter(sentiment %in% c("positive", "negative")) %>%
  group_by(standards) %>% 
  count(sentiment, sort = TRUE) %>% 
  mutate(method = "loughran")



summary_sentiment <- bind_rows(summary_afinn2,
                               summary_bing2,
                               summary_nrc2,
                               summary_loughran2) %>%
  arrange(method, standards) %>%
  relocate(method)


total_counts <- summary_sentiment %>%
  group_by(method, standards) %>%
  summarise(total = sum(n))
## `summarise()` has grouped output by 'method'. You can override using the
## `.groups` argument.
sentiment_counts <- left_join(summary_sentiment, total_counts)
## Joining with `by = join_by(method, standards)`
sentiment_percents <- sentiment_counts %>%
  mutate(percent = n/total * 100)

sentiment_percents %>%
  ggplot(aes(x = standards, y = percent, fill=sentiment)) +
  geom_bar(width = .8, stat = "identity",position = "dodge") +
  facet_wrap(~method, nrow = 2) +
 # coord_flip() +
  labs(title = "Public Sentiment on Twitter", 
       subtitle = "The Common Core & Next Gen Science Standards",
       x = "State Standards", 
       y = "Percent")

Knit & Submit

Congratulations, you’ve completed your Intro to text mining Badge! Complete the following steps in the orientation to submit your work for review.