The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.
To earn a badge for each lab, you are required to respond to a set of prompts organized in two parts:
In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply a data analysis technique introduced in this learning lab.
Use your institutional library (e.g., the NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. More specifically, locate a text mining study that visualizes text data.
Provide an APA citation for your selected study.
Mohammadi, E., & Karami, A. (2022). Exploring research trends in big data across disciplines: A text mining analysis. Journal of Information Science, 48(1), 44–56. https://doi.org/10.1177/0165551520932855
How does the visualization address the research questions?
Draft a research question for a population you may be interested in studying, or one that would be of interest to educational researchers, that would require the collection of text data. Then answer the following questions:
What text data would need to be collected?
For what reason would text data need to be collected in order to address this question?
Explain the analytical level at which these text data would need to be collected and analyzed.
Use your case study file to create a new word cloud that excludes words that do not give you important information about teachers’ experiences with professional development. (For example, we did not include “University” in the word cloud describing where scholars came from, since it occurs everywhere.)
I highly recommend creating a new R script in your lab-1 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.
# YOUR FINAL CODE HERE
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tidytext)
opd_survey <- read_csv("data/opd_survey.csv")  # read in the survey responses
## New names:
## • `Q16` -> `Q16...5`
## • `Resource` -> `Resource...6`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...9`
## • `Resource` -> `Resource...10`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...11`
## • `Q16` -> `Q16...12`
## Rows: 57054 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (19): RecordedDate, ResponseId, Role, Q14, Q16...5, Resource...6, Resour...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# keep only the role, resource type, and open-ended response (Q21) columns,
# drop the first two non-response rows, remove missing responses,
# and keep responses from teachers only
opd_teacher <- opd_survey %>%
  select(Role, Resource...6, Q21) %>%
  rename(text = Q21) %>%
  slice(-1, -2) %>%
  na.omit() %>%
  filter(Role == "Teacher")

# tokenize: split each open-ended response into one word per row
opd_tidy <- unnest_tokens(opd_teacher, word, text)
head(opd_tidy)
## # A tibble: 6 × 3
## Role Resource...6 word
## <chr> <chr> <chr>
## 1 Teacher Live Webinar leve…
## 2 Teacher Live Webinar ofqu…
## 3 Teacher Live Webinar and
## 4 Teacher Live Webinar revi…
## 5 Teacher Live Webinar bloo…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… none
head(opd_teacher)
## # A tibble: 6 × 3
## Role Resource...6 text
## <chr> <chr> <chr>
## 1 Teacher Live Webinar "lev…
## 2 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "Non…
## 3 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "In …
## 4 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "Und…
## 5 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "ove…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… "onl…
head(stop_words)
## # A tibble: 6 × 2
## word lexicon
## <chr> <chr>
## 1 a SMART
## 2 a's SMART
## 3 able SMART
## 4 about SMART
## 5 above SMART
## 6 according SMART
view(stop_words)
# remove common stop words from the tokenized responses
opd_clean <- anti_join(opd_tidy, stop_words)
## Joining, by = "word"
head(opd_clean)
## # A tibble: 6 × 3
## Role Resource...6 word
## <chr> <chr> <chr>
## 1 Teacher Live Webinar leve…
## 2 Teacher Live Webinar ofqu…
## 3 Teacher Live Webinar revi…
## 4 Teacher Live Webinar bloo…
## 5 Teacher Online Learning Module (e.g. Call for Change, Understanding the… modu…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… teac…
# count how often each remaining word appears, most frequent first
opd_counts <- opd_clean %>%
  count(word, sort = TRUE)
opd_counts
## # A tibble: 5,352 × 2
## word n
## <chr> <int>
## 1 information 1885
## 2 learning 1520
## 3 videos 1385
## 4 resources 1286
## 5 online 1139
## 6 examples 1105
## 7 understanding 1092
## 8 time 1082
## 9 students 1013
## 10 data 971
## # … with 5,342 more rows
# additional words to exclude because they do not give important information
excludewords <- data.frame(word = c("teachers", "children"))
opd_clean.2 <- anti_join(opd_clean, excludewords)  # remove the additional words
## Joining, by = "word"
opd_counts.2 <- opd_clean.2 %>%
  count(word, sort = TRUE)
head(opd_counts.2)
## # A tibble: 6 × 2
## word n
## <chr> <int>
## 1 information 1885
## 2 learning 1520
## 3 videos 1385
## 4 resources 1286
## 5 online 1139
## 6 examples 1105
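Before rendering the word cloud, an optional sanity check (not part of the lab instructions) is to confirm the excluded words really were dropped; the words filtered for below simply mirror the exclusion list defined above:

# these should return zero rows if the anti_join removed them
opd_counts.2 %>%
  filter(word %in% c("teachers", "children"))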
library(wordcloud2)
wordcloud2(opd_counts.2)  # render an interactive word cloud of the remaining word counts
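If you want to adjust the appearance of the cloud, wordcloud2() also accepts optional styling arguments such as size, color, and backgroundColor; the values below are just illustrative choices, not settings the lab requires:

# a lightly styled version of the same word cloud (argument values are arbitrary)
wordcloud2(opd_counts.2, size = 0.5, color = "random-dark", backgroundColor = "white")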
Congratulations, you’ve completed your Intro to Text Mining badge! Complete the following steps to submit your work for review:
Change the name of the author: in the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
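For reference, the YAML header you edit might look roughly like the snippet below; the title and output values are placeholders, and only the author field needs your name:

---
title: "Intro to Text Mining Badge"
author: "Your Name"
output: html_document
---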
Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.
Commit your changes in GitHub Desktop and push them to your online GitHub repository.
Publish your HTML page to the web using one of the following methods:
Publish on RPubs by clicking the “Publish” button located in the Viewer Pane when you knit your document. Note that you will need to quickly create an RPubs account.
Publish on GitHub using either GitHub Pages or the HTML previewer.
Post a new discussion on GitHub to our Text mining Badges forum. In your post, include a link to your published web page and a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.