The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.
To earn a badge for each lab, you are required to respond to a set of prompts for two parts:
In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will create a simple data product in R that demonstrates your ability to apply a data analysis technique introduced in this learning lab.
Use the institutional library (e.g. NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. More specifically, locate a text mining study that visualizes text data.
Provide an APA citation for your selected study.
How does the visualization address research questions?
Draft a research question for a population you may be interested in studying, or one that would be of interest to educational researchers, that would require the collection of text data. Then answer the following questions:
What text data would need to be collected?
For what reason would text data need to be collected in order to address this question?
Explain the analytical level at which these text data would need to be collected and analyzed.
Use your case study file to create a new word cloud that excludes words that would not give you important information about teachers’ experiences with professional development. (For example, we did not include “University” in the word cloud describing where scholars came from because it occurs everywhere.)
I highly recommend creating a new R script in your lab-1 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.0.1 ✔ forcats 0.5.1
## Warning: package 'tibble' was built under R version 4.1.2
## Warning: package 'tidyr' was built under R version 4.1.2
## Warning: package 'dplyr' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tidytext)
## Warning: package 'tidytext' was built under R version 4.1.2
library(wordcloud2)
## Warning: package 'wordcloud2' was built under R version 4.1.2
opd_survey <- read_csv("data/opd_survey.csv")
## New names:
## • `Q16` -> `Q16...5`
## • `Resource` -> `Resource...6`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...9`
## • `Resource` -> `Resource...10`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...11`
## • `Q16` -> `Q16...12`
## Rows: 57054 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (19): RecordedDate, ResponseId, Role, Q14, Q16...5, Resource...6, Resour...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
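The `New names` messages above appear because the export repeats some column headers, and readr makes each repeated name unique by appending its column position. The following is a minimal sketch of that behavior on made-up inline data (the `toy_csv` text and its values are hypothetical, not taken from the survey). It also shows the `show_col_types = FALSE` argument mentioned in the message above, and assumes readr 2.0 or later so that literal text can be passed with `I()`.

```r
library(readr)

# Hypothetical CSV text with a repeated header (Q16 appears twice)
toy_csv <- I("Q16,Resource,Q16\nMath,Wiki,Science\n")

# Repeated names are made unique on import;
# show_col_types = FALSE quiets the column specification message
toy <- read_csv(toy_csv, show_col_types = FALSE)

names(toy)
```

The repaired names are what you then pass to `select()` or `rename()`, which is why the code further down refers to `Resource...6` rather than `Resource`.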
opd_survey
## # A tibble: 57,054 × 19
## RecordedDate ResponseId Role Q14 Q16...5 Resource...6 Resource_8_TEXT
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 "Recorded Date" "Response… "Wha… "Ple… "Which… "Please ind… "Please indica…
## 2 "{\"ImportId\":\… "{\"Impor… "{\"… "{\"… "{\"Im… "{\"ImportI… "{\"ImportId\"…
## 3 "3/14/12 12:41" "R_6fKCyE… "Cen… "K-1… "Eleme… "Summer Ins… <NA>
## 4 "3/14/12 13:31" "R_09rHle… "Cen… "Ele… "Eleme… "Online Lea… <NA>
## 5 "3/14/12 14:51" "R_1BlKt2… "Sch… "Hig… "Not A… "Online Lea… <NA>
## 6 "3/14/12 15:02" "R_bPGUVT… "Sch… "Ele… "Guida… "Calendar" <NA>
## 7 "3/14/12 17:24" "R_egJEHM… "Tea… "Ele… "Engli… "Live Webin… <NA>
## 8 "3/15/12 9:18" "R_4PEbT4… "Tea… "Ele… "Infor… "Online Lea… <NA>
## 9 "3/15/12 10:54" "R_eqB1bM… "Sch… "Mid… "Guida… "Live Webin… <NA>
## 10 "3/15/12 14:20" "R_ewwUwi… "Cen… "Pre… "Other… "Wiki" <NA>
## # … with 57,044 more rows, and 12 more variables: Resource_9_TEXT <chr>,
## # Resource_10_TEXT...9 <chr>, Resource...10 <chr>,
## # Resource_10_TEXT...11 <chr>, Q16...12 <chr>, Q16_9_TEXT <chr>, Q19 <chr>,
## # Q20 <chr>, Q21 <chr>, Q26 <chr>, Q37 <chr>, Q8 <chr>
glimpse(opd_survey)
## Rows: 57,054
## Columns: 19
## $ RecordedDate <chr> "Recorded Date", "{\"ImportId\":\"recordedDate\"…
## $ ResponseId <chr> "Response ID", "{\"ImportId\":\"_recordId\"}", "…
## $ Role <chr> "What is your role within your school district o…
## $ Q14 <chr> "Please select the school level(s) you work with…
## $ Q16...5 <chr> "Which content area(s) do you specialize in? (S…
## $ Resource...6 <chr> "Please indicate the online professional develop…
## $ Resource_8_TEXT <chr> "Please indicate the online professional develop…
## $ Resource_9_TEXT <chr> "Please indicate the online professional develop…
## $ Resource_10_TEXT...9 <chr> "Please indicate the online professional develop…
## $ Resource...10 <chr> "What was the primary focus of the webinar you a…
## $ Resource_10_TEXT...11 <chr> "What was the primary focus of the webinar you a…
## $ Q16...12 <chr> "Which primary content area(s) did the webinar a…
## $ Q16_9_TEXT <chr> "Which primary content area(s) did the webinar a…
## $ Q19 <chr> "Please specify the online learning module you a…
## $ Q20 <chr> "How are you using this resource?", "{\"ImportId…
## $ Q21 <chr> "What was the most beneficial/valuable aspect of…
## $ Q26 <chr> "What recommendations do you have for improving …
## $ Q37 <chr> "What recommendations do you have for making thi…
## $ Q8 <chr> "Which of the following best describe(s) how you…
head(opd_survey)
## # A tibble: 6 × 19
## RecordedDate ResponseId Role Q14 Q16...5 Resource...6 Resource_8_TEXT
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 "Recorded Date" "Response… "Wha… "Ple… "Which… "Please ind… "Please indica…
## 2 "{\"ImportId\":\"… "{\"Impor… "{\"… "{\"… "{\"Im… "{\"ImportI… "{\"ImportId\"…
## 3 "3/14/12 12:41" "R_6fKCyE… "Cen… "K-1… "Eleme… "Summer Ins… <NA>
## 4 "3/14/12 13:31" "R_09rHle… "Cen… "Ele… "Eleme… "Online Lea… <NA>
## 5 "3/14/12 14:51" "R_1BlKt2… "Sch… "Hig… "Not A… "Online Lea… <NA>
## 6 "3/14/12 15:02" "R_bPGUVT… "Sch… "Ele… "Guida… "Calendar" <NA>
## # … with 12 more variables: Resource_9_TEXT <chr>, Resource_10_TEXT...9 <chr>,
## # Resource...10 <chr>, Resource_10_TEXT...11 <chr>, Q16...12 <chr>,
## # Q16_9_TEXT <chr>, Q19 <chr>, Q20 <chr>, Q21 <chr>, Q26 <chr>, Q37 <chr>,
## # Q8 <chr>
tail(opd_survey)
## # A tibble: 6 × 19
## RecordedDate ResponseId Role Q14 Q16...5 Resource...6 Resource_8_TEXT
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 7/2/13 10:20 R_0cggNPIobej2k… Teac… Midd… World … Online Lear… <NA>
## 2 7/2/13 12:32 R_bpZ2jPQV1BOta… <NA> <NA> <NA> <NA> <NA>
## 3 7/2/13 12:32 R_4SbNuxFI6qv8p… Teac… Elem… Mathem… Online Lear… <NA>
## 4 7/2/13 12:32 R_1TT9rRNolK2xS… <NA> <NA> <NA> <NA> <NA>
## 5 7/2/13 12:32 R_8raUHcIydALnR… <NA> <NA> <NA> <NA> <NA>
## 6 7/2/13 12:32 R_2bs3lzLBdGWjk… <NA> <NA> <NA> <NA> <NA>
## # … with 12 more variables: Resource_9_TEXT <chr>, Resource_10_TEXT...9 <chr>,
## # Resource...10 <chr>, Resource_10_TEXT...11 <chr>, Q16...12 <chr>,
## # Q16_9_TEXT <chr>, Q19 <chr>, Q20 <chr>, Q21 <chr>, Q26 <chr>, Q37 <chr>,
## # Q8 <chr>
names(opd_survey)
## [1] "RecordedDate" "ResponseId" "Role"
## [4] "Q14" "Q16...5" "Resource...6"
## [7] "Resource_8_TEXT" "Resource_9_TEXT" "Resource_10_TEXT...9"
## [10] "Resource...10" "Resource_10_TEXT...11" "Q16...12"
## [13] "Q16_9_TEXT" "Q19" "Q20"
## [16] "Q21" "Q26" "Q37"
## [19] "Q8"
view(opd_survey)
write_csv(opd_survey, "data/opd_survey_copy.csv")
opd_selected <- select(opd_survey, Role, Resource...6, Q21)
head(opd_selected)
## # A tibble: 6 × 3
## Role Resource...6 Q21
## <chr> <chr> <chr>
## 1 "What is your role within your school district or organiza… "Please ind… "Wha…
## 2 "{\"ImportId\":\"QID2\"}" "{\"ImportI… "{\"…
## 3 "Central Office Staff (e.g. Superintendents, Tech Director… "Summer Ins… <NA>
## 4 "Central Office Staff (e.g. Superintendents, Tech Director… "Online Lea… "Glo…
## 5 "School Support Staff (e.g. Counselors, Technology Facilit… "Online Lea… <NA>
## 6 "School Support Staff (e.g. Counselors, Technology Facilit… "Calendar" "com…
opd_renamed <- rename(opd_selected, Resource = Resource...6) # rename Resource...6 to Resource; Q21 is renamed to text in the tokenization pipeline
opd_sliced <- slice(opd_renamed, -1, -2) # the - sign indicates to NOT keep rows 1 and 2, drop these two rows
opd_complete <- na.omit(opd_sliced) ## delete missing values
opd_teacher <- filter(opd_complete, Role == "Teacher") ##only include teacher responses
## tokenization
opd_tidy <- opd_renamed %>%
  select(Role, Resource, Q21) %>%
  rename(text = Q21) %>%
  slice(-1, -2) %>%
  na.omit() %>%
  filter(Role == "Teacher") %>%
  unnest_tokens(word, text)
head(opd_tidy)
## # A tibble: 6 × 3
## Role Resource word
## <chr> <chr> <chr>
## 1 Teacher Live Webinar leve…
## 2 Teacher Live Webinar ofqu…
## 3 Teacher Live Webinar and
## 4 Teacher Live Webinar revi…
## 5 Teacher Live Webinar bloo…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… none
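To make the tokenization step easier to see, here is a small sketch on two made-up responses (the `toy_responses` data is hypothetical, not from the survey). It illustrates that `unnest_tokens()` produces one row per word, lowercases the tokens, and strips punctuation by default.

```r
library(dplyr)
library(tidytext)

# Hypothetical responses, not taken from the survey
toy_responses <- tibble(
  Role = c("Teacher", "Teacher"),
  text = c("The videos were helpful.", "More examples, please!")
)

# One row per word: the text column is replaced by a word column,
# while the other columns (here Role) are repeated for each token
toy_tokens <- unnest_tokens(toy_responses, word, text)

toy_tokens$word
```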
head(stop_words)
## # A tibble: 6 × 2
## word lexicon
## <chr> <chr>
## 1 a SMART
## 2 a's SMART
## 3 able SMART
## 4 about SMART
## 5 above SMART
## 6 according SMART
view(stop_words)
opd_clean <- anti_join(opd_tidy, stop_words)
## Joining, by = "word"
head(opd_clean)
## # A tibble: 6 × 3
## Role Resource word
## <chr> <chr> <chr>
## 1 Teacher Live Webinar leve…
## 2 Teacher Live Webinar ofqu…
## 3 Teacher Live Webinar revi…
## 4 Teacher Live Webinar bloo…
## 5 Teacher Online Learning Module (e.g. Call for Change, Understanding the… modu…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… teac…
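The stop word removal above can be sketched on a tiny made-up token table (the `toy_tokens` data is hypothetical). `anti_join()` keeps only the rows whose `word` has no match in `stop_words`. Naming the join column with `by = "word"` is optional, but it avoids the "Joining, by" message.

```r
library(dplyr)
library(tidytext)

# Hypothetical tokens, not taken from the survey
toy_tokens <- tibble(word = c("the", "videos", "were", "examples"))

# Drop every row whose word appears in the stop_words lexicons
toy_clean <- anti_join(toy_tokens, stop_words, by = "word")

toy_clean$word
```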
opd_counts <- count(opd_clean, word, sort = TRUE)
opd_counts <- opd_clean %>%
  count(word, sort = TRUE)
opd_counts
## # A tibble: 5,352 × 2
## word n
## <chr> <int>
## 1 information 1885
## 2 learning 1520
## 3 videos 1385
## 4 resources 1286
## 5 online 1139
## 6 examples 1105
## 7 understanding 1092
## 8 time 1082
## 9 students 1013
## 10 data 971
## # … with 5,342 more rows
exclude_words <- data.frame(word = c("videos", "online"))
opd_excluded <- anti_join(opd_clean, exclude_words)
## Joining, by = "word"
opd_counts2 <- opd_excluded %>%
  count(word, sort = TRUE)
head(opd_counts2)
## # A tibble: 6 × 2
## word n
## <chr> <int>
## 1 information 1885
## 2 learning 1520
## 3 resources 1286
## 4 examples 1105
## 5 understanding 1092
## 6 time 1082
wordcloud2(opd_counts2)
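With 5,352 distinct words in the counts, a word cloud can become crowded. One possible variation, sketched here on made-up counts (`toy_counts` and `exclude` are hypothetical stand-ins, not the survey results), drops the excluded words with `filter()` instead of `anti_join()` and keeps only the most frequent words before plotting.

```r
library(dplyr)

# Hypothetical word counts standing in for opd_counts
toy_counts <- tibble(
  word = c("information", "videos", "learning", "online", "rubric"),
  n    = c(1885L, 1385L, 1520L, 1139L, 12L)
)

# Words judged uninformative for this question
exclude <- c("videos", "online")

toy_top <- toy_counts %>%
  filter(!word %in% exclude) %>%   # same effect as an anti_join on word
  slice_max(order_by = n, n = 2)   # keep the two most frequent remaining words

toy_top
```

The resulting two-column table of words and counts could then be passed to `wordcloud2()` in the same way as `opd_counts2`.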
Congratulations, you’ve completed your Intro to Text Mining Badge! Complete the following steps to submit your work for review:
Change the name of the author: in the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.
Commit your changes in GitHub Desktop and push them to your online GitHub repository.
Publish your HTML page to the web using one of the following publishing methods:
Publish on RPubs by clicking the “Publish” button located in the Viewer Pane when you knit your document. Note that you will first need to create an RPubs account.
Publish on GitHub using either GitHub Pages or the HTML previewer.
Post a new discussion on GitHub to our Text mining Badges forum. In your post, include a link to your published web page and a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.