The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you must respond to a set of prompts in two parts:

Part I: Reflect and Plan

Use your institution's library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. Specifically, locate a text mining study that visualizes text data.

  1. Provide an APA citation for your selected study.
  2. How does the visualization address the study's research questions?

Draft a research question about a population you may be interested in studying, or one that would interest educational researchers, that would require collecting text data. Then answer the following questions:

  1. What text data would need to be collected?
  2. Why would text data need to be collected to address this question?
  3. Explain the analytical level at which these text data would need to be collected and analyzed.

Part II: Data Product

Use your case study file to create a new word cloud that excludes words that would not give you important information about teachers’ experiences with professional development. (For example, we did not include “University” in the word cloud describing where scholars came from, because it occurs everywhere.)
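A minimal sketch of one way to do this, assuming you pick the excluded words by hand rather than by a frequency cutoff: keep a custom stop-word vector and drop those words before plotting. The counts are the top ten rows of `opd_counts` from the output in this lab; the `custom_stop` list is hypothetical and only illustrates the idea. In the lab's tidyverse style, the same step is `filter(opd_counts, !word %in% custom_stop)`.

```r
# Top word counts copied from the opd_counts output in this lab (word, n)
word_counts <- data.frame(
  word = c("information", "learning", "videos", "resources", "online",
           "examples", "understanding", "time", "students", "data"),
  n    = c(1885, 1520, 1385, 1286, 1139, 1105, 1092, 1082, 1013, 971),
  stringsAsFactors = FALSE
)

# Hypothetical custom stop list: terms judged too generic to say much
# about teachers' experiences with professional development
custom_stop <- c("information", "learning", "resources", "online", "time")

# Keep only the rows whose word is not in the custom stop list
word_counts_filtered <- word_counts[!word_counts$word %in% custom_stop, ]
print(word_counts_filtered$word)
```

The filtered data frame keeps the `word` and `n` columns, so it can be passed to `wordcloud2()` directly. Which words count as uninformative is a judgment call you should justify for your own research question.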

I highly recommend creating a new R script in your lab-1 folder to complete this task. When your code is ready, use the code chunk below to share your final code and answer the questions that follow.

# YOUR FINAL CODE HERE
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidytext)
opd_survey <- read_csv("data/opd_survey.csv")
## New names:
## • `Q16` -> `Q16...5`
## • `Resource` -> `Resource...6`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...9`
## • `Resource` -> `Resource...10`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...11`
## • `Q16` -> `Q16...12`
## Rows: 57054 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (19): RecordedDate, ResponseId, Role, Q14, Q16...5, Resource...6, Resour...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# enter the name of your data frame and view directly in console 
#opd_survey 

# view your data frame transposed so you can see every column and the first few entries
#glimpse(opd_survey) 

# look at just the first six entries
#head(opd_survey) 

# or the last six entries
#tail(opd_survey) 

# view the names of your variables or columns
#names(opd_survey) 

# or view in source pane
view(opd_survey) 
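# read_csv() renamed the duplicated Resource column to Resource...6; restore its original name
opd_survey <- rename(opd_survey, Resource = Resource...6)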
opd_teacher <- opd_survey %>%
  select(Role, Resource, Q21) %>%
  rename(text = Q21) %>%
  slice(-1, -2) %>%
  na.omit() %>%
  filter(Role == "Teacher")

head(opd_teacher)
## # A tibble: 6 × 3
##   Role    Resource                                                         text 
##   <chr>   <chr>                                                            <chr>
## 1 Teacher Live Webinar                                                     "lev…
## 2 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "Non…
## 3 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "In …
## 4 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "Und…
## 5 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "ove…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "onl…
opd_tidy <- opd_survey %>%
  select(Role, Resource, Q21) %>%
  rename(text = Q21) %>%
  slice(-1, -2) %>%
  na.omit() %>%
  filter(Role == "Teacher") %>%
  unnest_tokens(word, text)

head(opd_tidy)
## # A tibble: 6 × 3
##   Role    Resource                                                         word 
##   <chr>   <chr>                                                            <chr>
## 1 Teacher Live Webinar                                                     leve…
## 2 Teacher Live Webinar                                                     ofqu…
## 3 Teacher Live Webinar                                                     and  
## 4 Teacher Live Webinar                                                     revi…
## 5 Teacher Live Webinar                                                     bloo…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… none
view(opd_tidy)
opd_clean <- anti_join(opd_tidy, stop_words)
## Joining with `by = join_by(word)`
head(opd_clean)
## # A tibble: 6 × 3
##   Role    Resource                                                         word 
##   <chr>   <chr>                                                            <chr>
## 1 Teacher Live Webinar                                                     leve…
## 2 Teacher Live Webinar                                                     ofqu…
## 3 Teacher Live Webinar                                                     revi…
## 4 Teacher Live Webinar                                                     bloo…
## 5 Teacher Online Learning Module (e.g. Call for Change, Understanding the… modu…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… teac…
opd_counts <- opd_clean %>% 
  count(word, sort = TRUE)

opd_counts
## # A tibble: 5,352 × 2
##    word              n
##    <chr>         <int>
##  1 information    1885
##  2 learning       1520
##  3 videos         1385
##  4 resources      1286
##  5 online         1139
##  6 examples       1105
##  7 understanding  1092
##  8 time           1082
##  9 students       1013
## 10 data            971
## # ℹ 5,342 more rows
opd_resource_counts <- opd_clean %>%
  count(Resource, word, sort = TRUE)

view(opd_resource_counts)
opd_frequencies <- opd_clean %>%
  count(Resource, word, sort = TRUE) %>%
  group_by(Resource) %>%
  mutate(proportion = n / sum(n))

opd_frequencies
## # A tibble: 7,210 × 4
## # Groups:   Resource [10]
##    Resource                                               word      n proportion
##    <chr>                                                  <chr> <int>      <dbl>
##  1 Online Learning Module (e.g. Call for Change, Underst… info…  1782     0.0238
##  2 Online Learning Module (e.g. Call for Change, Underst… lear…  1445     0.0193
##  3 Online Learning Module (e.g. Call for Change, Underst… vide…  1336     0.0179
##  4 Online Learning Module (e.g. Call for Change, Underst… reso…  1209     0.0162
##  5 Online Learning Module (e.g. Call for Change, Underst… onli…  1082     0.0145
##  6 Online Learning Module (e.g. Call for Change, Underst… unde…  1053     0.0141
##  7 Online Learning Module (e.g. Call for Change, Underst… time   1036     0.0139
##  8 Online Learning Module (e.g. Call for Change, Underst… exam…  1025     0.0137
##  9 Online Learning Module (e.g. Call for Change, Underst… stud…   951     0.0127
## 10 Online Learning Module (e.g. Call for Change, Underst… data    915     0.0122
## # ℹ 7,200 more rows
opd_words <- opd_teacher %>%
  unnest_tokens(word, text) %>%
  count(Resource, word, sort = TRUE)

head(opd_words)
## # A tibble: 6 × 3
##   Resource                                                           word      n
##   <chr>                                                              <chr> <int>
## 1 Online Learning Module (e.g. Call for Change, Understanding the S… the   13058
## 2 Online Learning Module (e.g. Call for Change, Understanding the S… to     7933
## 3 Online Learning Module (e.g. Call for Change, Understanding the S… of     6132
## 4 Online Learning Module (e.g. Call for Change, Understanding the S… and    5560
## 5 Online Learning Module (e.g. Call for Change, Understanding the S… i      3861
## 6 Online Learning Module (e.g. Call for Change, Understanding the S… it     3087
total_words <- opd_words %>%
  group_by(Resource) %>%
  summarise(total = sum(n))

total_words
## # A tibble: 10 × 2
##    Resource                                                                total
##    <chr>                                                                   <int>
##  1 Calendar                                                                  137
##  2 Document, please specify (i.e. Facilitator's Guide, Crosswalks, Sampl…    500
##  3 Live Webinar                                                              316
##  4 Online Learning Module (e.g. Call for Change, Understanding the Stand… 181197
##  5 Other, please specify                                                    3363
##  6 Promotional Video                                                         149
##  7 Recorded Webinar or Presentation (e.g. Strategic Staffing, Standards …   1083
##  8 Summer Institute/RESA PowerPoint Presentations                            883
##  9 Website, please specify                                                  1860
## 10 Wiki                                                                     1039
opd_totals <- left_join(opd_words, total_words)
## Joining with `by = join_by(Resource)`
opd_totals
## # A tibble: 8,833 × 4
##    Resource                                                   word      n  total
##    <chr>                                                      <chr> <int>  <int>
##  1 Online Learning Module (e.g. Call for Change, Understandi… the   13058 181197
##  2 Online Learning Module (e.g. Call for Change, Understandi… to     7933 181197
##  3 Online Learning Module (e.g. Call for Change, Understandi… of     6132 181197
##  4 Online Learning Module (e.g. Call for Change, Understandi… and    5560 181197
##  5 Online Learning Module (e.g. Call for Change, Understandi… i      3861 181197
##  6 Online Learning Module (e.g. Call for Change, Understandi… it     3087 181197
##  7 Online Learning Module (e.g. Call for Change, Understandi… my     2649 181197
##  8 Online Learning Module (e.g. Call for Change, Understandi… was    2520 181197
##  9 Online Learning Module (e.g. Call for Change, Understandi… a      2473 181197
## 10 Online Learning Module (e.g. Call for Change, Understandi… in     2378 181197
## # ℹ 8,823 more rows
opd_tf_idf <- opd_totals %>%
  bind_tf_idf(word, Resource, n)

opd_tf_idf
## # A tibble: 8,833 × 7
##    Resource                              word      n  total     tf   idf  tf_idf
##    <chr>                                 <chr> <int>  <int>  <dbl> <dbl>   <dbl>
##  1 Online Learning Module (e.g. Call fo… the   13058 181197 0.0721 0     0      
##  2 Online Learning Module (e.g. Call fo… to     7933 181197 0.0438 0     0      
##  3 Online Learning Module (e.g. Call fo… of     6132 181197 0.0338 0     0      
##  4 Online Learning Module (e.g. Call fo… and    5560 181197 0.0307 0.105 0.00323
##  5 Online Learning Module (e.g. Call fo… i      3861 181197 0.0213 0     0      
##  6 Online Learning Module (e.g. Call fo… it     3087 181197 0.0170 0     0      
##  7 Online Learning Module (e.g. Call fo… my     2649 181197 0.0146 0     0      
##  8 Online Learning Module (e.g. Call fo… was    2520 181197 0.0139 0     0      
##  9 Online Learning Module (e.g. Call fo… a      2473 181197 0.0136 0     0      
## 10 Online Learning Module (e.g. Call fo… in     2378 181197 0.0131 0.105 0.00138
## # ℹ 8,823 more rows
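The idf values in this output can be checked by hand. As documented for `tidytext::bind_tf_idf()`, tf is a word's count divided by the total words in its document (here, a Resource category), idf is the natural log of the number of documents divided by the number of documents containing the word, and tf-idf is their product. The sketch below reproduces the "the" and "and" rows; the number of resources containing "and" (9) is inferred from its reported idf of 0.105 ≈ ln(10/9), not read from the data.

```r
# Reproduce bind_tf_idf() arithmetic for two rows of the opd_tf_idf output
# (Online Learning Module resource; 10 Resource categories in total)
n_resources  <- 10      # number of Resource categories ("documents")
module_total <- 181197  # total tokens in Online Learning Module responses

tf_idf_row <- function(n, docs_with_word) {
  tf  <- n / module_total                    # term frequency within the resource
  idf <- log(n_resources / docs_with_word)   # natural-log inverse document frequency
  c(tf = tf, idf = idf, tf_idf = tf * idf)
}

# "the" occurs in every resource, so its idf (and tf-idf) is exactly 0
the_row <- tf_idf_row(n = 13058, docs_with_word = 10)
# "and" is inferred to be absent from one of the 10 resources
and_row <- tf_idf_row(n = 5560, docs_with_word = 9)

round(the_row, 5)
round(and_row, 5)
```

This is why very common words such as "the" drop out of a tf-idf ranking: a word used in every resource says nothing about which resource a response describes.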
view(opd_tf_idf)
opd_quotes <- opd_teacher %>%
  select(text) %>% 
  filter(grepl('online', text))

view(opd_quotes)

sample_n(opd_quotes, 20)
## # A tibble: 20 × 1
##    text                                                                         
##    <chr>                                                                        
##  1 online learning module                                                       
##  2 It was online.                                                               
##  3 Learning about the online journal.                                           
##  4 The online journal via Penzu.                                                
##  5 lots of online resources where we can find more information related to the t…
##  6 online resources                                                             
##  7 Did not complete online, rather it was in a workshop. It was most beneficial…
##  8 This online resource gave clarity as to the purpose of the MSL's.  The video…
##  9 The easy access to pertinent information regarding administrative responsibi…
## 10 online                                                                       
## 11 Taking it online                                                             
## 12 online                                                                       
## 13 Its online.                                                                  
## 14 The most valuable aspect of this online resource was the understanding of re…
## 15 The section that discussed evaluation of online resources.                   
## 16 online                                                                       
## 17 it was online                                                                
## 18 The most beneficial aspect of this online resource was the handout giving sp…
## 19 All of the online resources, the teacher videos, and lessons.                
## 20 That it could be completed online and not take-up important planning or intr…
opd_quotes <- opd_teacher %>%
  select(text) %>% 
  # grepl() takes a regular expression: 'inform' matches inform, information, informative, ...
  filter(grepl('inform', text))

view(opd_quotes)

sample_n(opd_quotes, 20)
## # A tibble: 20 × 1
##    text                                                                         
##    <chr>                                                                        
##  1 I found the videos both informative and intriguing.                          
##  2 Information on the video was useful and informative. Learning about TED.com  
##  3 The information that was avaliable.                                          
##  4 Learned great new information about the new revised Bloom's Taxonomy.        
##  5 New information not yet seen in previous modules                             
##  6 It had a link to the DPI information about the MSLs                          
##  7 I like the ease that one can access the information online at any given time.
##  8 I like the examples, especially the example with the articoke dip.  I believ…
##  9 NC teaching standard information                                             
## 10 Information was very informative.                                            
## 11 The links to resources and other information to learn more about the content.
## 12 The information regarding differences between digital and traditional litera…
## 13 The way the information was presented and the somewhat useful content.       
## 14 The graphic organizers that the information was presented in                 
## 15 The complete module was great and informative.                               
## 16 great information.                                                           
## 17 It had a lot of valuable information about assessments and  testing.  I love…
## 18 Attempt was made to present valuable information.                            
## 19 Great information, I thought the videos were helpful.                        
## 20 The information of what is changing and how that applies to my teaching.
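`grepl()` treats its pattern as a regular expression, not a shell-style wildcard, so a trailing `*` repeats only the preceding character: `"inform*"` matches "infor" followed by zero or more "m". The hypothetical responses below show how three patterns differ, including `ignore.case = TRUE`, which also catches upper-case variants.

```r
# Hypothetical responses used only to compare grepl() patterns
responses <- c("Information was very informative.",
               "INFORMATION about standards",
               "I could not find the infor I needed",
               "Great videos")

# Regex, not a wildcard: "inform*" is "infor" plus zero or more "m",
# so it also matches the truncated "infor"
grepl("inform*", responses)

# A plain substring already matches inform, information, informative, ...
grepl("inform", responses)

# Matching is case-sensitive by default; ignore.case = TRUE catches "INFORMATION"
grepl("inform", responses, ignore.case = TRUE)
```

Because matching is case-sensitive by default, the `'online'` filter above also skips responses whose only use of the word is capitalized, such as "Online modules".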
library(wordcloud2)

#wordcloud2(opd_counts)
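# color words used more than 1,000 times black and the rest gray;
# use opd_counts$n (a vector), because opd_counts[, 2] on a tibble returns a tibble
wordcloud2(opd_counts,
           color = ifelse(opd_counts$n > 1000, 'black', 'gray'))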
opd_counts %>%
  # keep rows with word counts greater than 500
  filter(n > 500) %>%
  # reorder the levels of word by n so the bars are sorted by count
  mutate(word = reorder(word, n)) %>%
  # create a plot with n on x axis and word on y axis
  ggplot(aes(n, word)) +
  # make it a bar plot
  geom_col() + 
  labs(x = "Word Counts", y = NULL, title = "Words Used More Than 500 Times to Describe the Value of Online Resources") + 
  theme_minimal()

library(forcats)

opd_frequencies %>%
  filter(Resource != "Calendar") %>% # remove Calendar responses, too few. 
  group_by(Resource) %>%
  slice_max(proportion, n = 5) %>%
  ungroup() %>%
  ggplot(aes(proportion, fct_reorder(word, proportion), fill = Resource)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~Resource, ncol = 3, scales = "free")

opd_resource_counts <- opd_clean %>%
  count(Resource, word)

total_words <- opd_resource_counts %>%
  group_by(Resource) %>%
  summarize(total = sum(n))

opd_words <- left_join(opd_resource_counts, total_words)
## Joining with `by = join_by(Resource)`
opd_tf_idf <- opd_words %>%
  bind_tf_idf(word, Resource, n)
opd_tf_idf %>%
  filter(Resource != "Calendar") %>%
  group_by(Resource) %>%
  slice_max(tf_idf, n = 5) %>%
  ungroup() %>%
  mutate(Resource=as.factor(Resource),
         word=reorder_within(word, tf_idf, Resource)) %>%
  ggplot(aes(word, tf_idf, fill = Resource)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~Resource, ncol = 3, scales = "free") +
  coord_flip() +
  scale_x_reordered() +
  # after coord_flip(), x (word) is drawn vertically and y (tf_idf) horizontally
  labs(title = "Words Unique to Each Online Learning Resource", x = NULL, y = "tf-idf value")

view(opd_counts)
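# drop words used 1,000 or more times (e.g., information, learning, videos),
# which are too generic to say much about teachers' experiences
opd_counts_mid <- opd_counts %>%
  filter(n < 1000)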

view(opd_counts_mid)
wordcloud2(opd_counts_mid)
#wordcloud2(opd_counts_mid,
#           color = ifelse(opd_counts_mid$n > 500, 'black', 'gray'))

Knit & Submit

Congratulations, you’ve completed your Intro to Text Mining badge! Complete the following steps in the orientation to submit your work for review.