The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts in two parts:

Part I: Reflect and Plan

Use your institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. More specifically, locate a text mining study that visualizes text data.

  1. Provide an APA citation for your selected study.

    • Mohammadi, E., & Karami, A. (2022). Exploring research trends in big data across disciplines: A text mining analysis. Journal of Information Science, 48(1), 44–56. https://doi.org/10.1177/0165551520932855

  2. How does the visualization address research questions?

    • Data Visualisation entails methods to visually depict relationships within large volumes of data. While this topic is one of the most frequently applied in our document corpus, its popularity was mostly in 2012 and it has experienced diminishing
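The study's visualization shows how the popularity of a topic changes across publication years. As a rough illustration of that kind of trend question, the R sketch below computes, for each year, the share of documents that mention a single term and builds a line chart of it. This is a simplified stand-in, not the authors' topic-modeling method: the `abstracts` tibble is hypothetical toy data and `term_share_by_year()` is a hypothetical helper; only `unnest_tokens()` (tidytext), the dplyr verbs, and the ggplot2 functions are real library calls.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Hypothetical toy corpus: one abstract per row, with its publication year
abstracts <- tibble(
  doc_id = 1:6,
  year   = c(2012, 2012, 2014, 2014, 2017, 2017),
  text   = c("data visualization of large data",
             "visualization methods for big data",
             "machine learning on big data",
             "data visualization dashboards",
             "deep learning for big data",
             "cloud computing and big data")
)

# Hypothetical helper: share of documents per year that mention `term`
term_share_by_year <- function(docs, term) {
  docs %>%
    unnest_tokens(word, text) %>%   # one lowercase token per row
    group_by(year, doc_id) %>%
    summarise(has_term = any(word == term), .groups = "drop") %>%
    group_by(year) %>%
    summarise(share = mean(has_term), .groups = "drop")
}

viz_trend <- term_share_by_year(abstracts, "visualization")
viz_trend  # shares: 2012 = 1, 2014 = 0.5, 2017 = 0

# Line chart of the trend; call print(trend_plot) to display it
trend_plot <- ggplot(viz_trend, aes(x = year, y = share)) +
  geom_line() +
  geom_point() +
  labs(x = "Publication year",
       y = "Share of abstracts mentioning the term")
```

A declining line like this one is the visual signal the quoted passage describes for data visualization as a research topic; a topic model would replace the single-term check with per-document topic proportions.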

Draft a research question for a population you may be interested in studying, or that would be of interest to educational researchers, and that would require the collection of text data; then answer the following questions:

  1. What text data would need to be collected?

    • The data were limited to papers published between 2012 and 2017. Due to the low number of publications before 2012 (86 papers), we chose 2012 as the starting point.

  2. For what reason would text data need to be collected in order to address this question?

    • The purpose of this research project is to explore the scope and structure of big data across disciplines.

  3. Explain the analytical level at which these text data would need to be collected and analyzed.

    • The authors applied topic modeling and word co-occurrence analysis methods to identify key topics from more than 36,000 big data publications across all academic disciplines between 2012 and 2017.
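Word co-occurrence analysis works at the document level: it counts how many documents contain each pair of words. A minimal sketch, assuming a hypothetical three-document toy corpus in `docs`: tokenize with tidytext, keep one row per word per document, build a 0/1 document-term matrix with base R's `table()`, and take `crossprod()` so each cell holds the number of documents containing both words. This is not the authors' pipeline; it only illustrates the counting idea.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus standing in for publication abstracts
docs <- tibble(
  doc_id = 1:3,
  text = c("topic modeling health data",
           "data mining health records",
           "topic modeling text mining")
)

# One row per distinct word per document
doc_words <- docs %>%
  unnest_tokens(word, text) %>%
  distinct(doc_id, word)

# 0/1 document-term matrix (rows = documents, columns = words)
dtm <- unclass(table(doc_words$doc_id, doc_words$word))

# Word-by-word matrix: cell [a, b] = number of documents containing both a and b;
# the diagonal holds each word's document frequency
co <- crossprod(dtm)

co["data", "health"]     # 2 (documents 1 and 2)
co["modeling", "topic"]  # 2 (documents 1 and 3)
```

The same matrix can be reshaped into a word-pair table and plotted as a network, which is a common way to visualize co-occurrence results.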

Part II: Data Product

Use your case study file to create a new word cloud that excludes words that would not give you important information about teachers’ experiences with professional development. (For example, we did not include “University” in the word cloud describing where scholars came from, as it occurs everywhere.)

I highly recommend creating a new R script in your lab-1 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your word cloud and answer the questions that follow.

# YOUR FINAL CODE HERE
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(tidytext)

opd_survey <- read_csv("data/opd_survey.csv")
## New names:
## • `Q16` -> `Q16...5`
## • `Resource` -> `Resource...6`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...9`
## • `Resource` -> `Resource...10`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...11`
## • `Q16` -> `Q16...12`
## Rows: 57054 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (19): RecordedDate, ResponseId, Role, Q14, Q16...5, Resource...6, Resour...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
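# Keep role, resource, and the open-ended response (Q21) for teachers only
opd_teacher <- opd_survey %>%
  select(Role, Resource...6, Q21) %>%
  rename(text = Q21) %>%
  slice(-1, -2) %>%   # drop the first two rows (survey metadata, not responses)
  na.omit() %>%       # drop rows with a missing value in any column
  filter(Role == "Teacher")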

# Tokenize: one row per word (lowercased, punctuation stripped)
opd_tidy <- unnest_tokens(opd_teacher, word, text)

head(opd_tidy)
## # A tibble: 6 × 3
##   Role    Resource...6                                                     word 
##   <chr>   <chr>                                                            <chr>
## 1 Teacher Live Webinar                                                     leve…
## 2 Teacher Live Webinar                                                     ofqu…
## 3 Teacher Live Webinar                                                     and  
## 4 Teacher Live Webinar                                                     revi…
## 5 Teacher Live Webinar                                                     bloo…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… none
head(opd_teacher)
## # A tibble: 6 × 3
##   Role    Resource...6                                                     text 
##   <chr>   <chr>                                                            <chr>
## 1 Teacher Live Webinar                                                     "lev…
## 2 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "Non…
## 3 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "In …
## 4 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "Und…
## 5 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "ove…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "onl…
head(stop_words)
## # A tibble: 6 × 2
##   word      lexicon
##   <chr>     <chr>  
## 1 a         SMART  
## 2 a's       SMART  
## 3 able      SMART  
## 4 about     SMART  
## 5 above     SMART  
## 6 according SMART
view(stop_words)
# Remove common English stop words such as "and" and "the"
opd_clean <- anti_join(opd_tidy, stop_words)
## Joining, by = "word"
head(opd_clean)
## # A tibble: 6 × 3
##   Role    Resource...6                                                     word 
##   <chr>   <chr>                                                            <chr>
## 1 Teacher Live Webinar                                                     leve…
## 2 Teacher Live Webinar                                                     ofqu…
## 3 Teacher Live Webinar                                                     revi…
## 4 Teacher Live Webinar                                                     bloo…
## 5 Teacher Online Learning Module (e.g. Call for Change, Understanding the… modu…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… teac…
# Word frequencies, most common first
opd_counts <- opd_clean %>% 
  count(word, sort = TRUE)

opd_counts
## # A tibble: 5,352 × 2
##    word              n
##    <chr>         <int>
##  1 information    1885
##  2 learning       1520
##  3 videos         1385
##  4 resources      1286
##  5 online         1139
##  6 examples       1105
##  7 understanding  1092
##  8 time           1082
##  9 students       1013
## 10 data            971
## # … with 5,342 more rows
# Custom stop words: corpus-specific terms to leave out of the word cloud
excludewords <- data.frame(word = c("teachers", "children"))
opd_clean.2 <- anti_join(opd_clean, excludewords) # remove the custom words
## Joining, by = "word"
opd_counts.2 <- opd_clean.2 %>%
  count(word, sort = TRUE)
head(opd_counts.2)
## # A tibble: 6 × 2
##   word            n
##   <chr>       <int>
## 1 information  1885
## 2 learning     1520
## 3 videos       1385
## 4 resources    1286
## 5 online       1139
## 6 examples     1105
library(wordcloud2)
wordcloud2(opd_counts.2)
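The code above removes the custom words with a second `anti_join()`. One alternative, sketched below under the assumption that you want a single reusable filtering step, is to append the corpus-specific words to tidytext's `stop_words` and filter once. `remove_stop_words()` and the `responses` tibble are hypothetical illustrations, not part of the lab; with the lab data you would pass `opd_tidy` instead.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: drop tidytext's stop_words plus extra corpus-specific
# words that occur everywhere and carry little information
remove_stop_words <- function(tidy_tbl, extra_words = character()) {
  custom <- tibble(word = extra_words, lexicon = "custom")
  all_stops <- bind_rows(stop_words, custom)
  anti_join(tidy_tbl, all_stops, by = "word")
}

# Hypothetical toy responses standing in for opd_teacher
responses <- tibble(
  id = 1:3,
  text = c("Teachers liked the videos",
           "Videos and modules helped teachers",
           "Modules for children")
)

word_counts <- responses %>%
  unnest_tokens(word, text) %>%
  remove_stop_words(extra_words = c("teachers", "children")) %>%
  count(word, sort = TRUE)

word_counts
```

With the lab data, `remove_stop_words(opd_tidy, c("teachers", "children"))` should keep the same rows as the two `anti_join()` calls combined, and the resulting `word`/`n` table can be passed to `wordcloud2()` as before, since it reads words from the first column and frequencies from the second.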

Knit & Submit

Congratulations, you’ve completed your Intro to Text Mining badge! Complete the following steps to submit your work for review:

  1. Change the author: field in the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.

  2. Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.

  3. Commit your changes in GitHub Desktop and push them to your online GitHub repository.

  4. Publish your HTML page to the web using one of the following publishing methods:

    • Publish on RPubs by clicking the “Publish” button located in the Viewer Pane when you knit your document. Note that you will need to create an RPubs account first.

    • Publish on GitHub using either GitHub Pages or the HTML previewer.

  5. Post a new discussion on GitHub to our Text mining Badges forum. In your post, include a link to your published web page and a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.