The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts in two parts:

Part I: Reflect and Plan

Use the institutional library (e.g. NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or topic of interest. More specifically, locate a text mining study that visualizes text data.

  1. Provide an APA citation for your selected study.

    • Cook, J., Chen, C., & Griffin, A. (2019). Using text mining and data mining techniques for applied learning assessment. Journal of Effective Teaching in Higher Education, 2(1), 60-79.
  2. How does the visualization address research questions?

    • The authors used text mining to analyze 672 student evaluations collected from 40 different applied learning courses from fall 2013 to spring 2015, in order to evaluate the impact on instructional practice and student learning.

Draft a research question for a population you may be interested in studying, or that would be of interest to educational researchers, and that would require the collection of text data. Then answer the following questions:

  1. What text data would need to be collected?

    • RQ: How do students find jobs after receiving a Federal Work-Study (FWS) offer? I will use student survey data.
  2. For what reason would text data need to be collected in order to address this question?

    • Because it’s not clear how students get jobs on campus after receiving an FWS offer, and there is no quantitative data to address that research question. We have collected student survey data in which we asked students about their experiences of working while enrolled.
  3. Explain the analytical level at which these text data would need to be collected and analyzed.

    • Student-level data

Part II: Data Product

Use your case study file to create a new word cloud that does not include words that would not give you important information about teachers’ experiences with professional development. (For example, we did not include “University” in the word cloud describing where scholars came from, as it occurs everywhere.)

I highly recommend creating a new R script in your lab-1 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.0.1     ✔ forcats 0.5.1
## Warning: package 'tibble' was built under R version 4.1.2
## Warning: package 'tidyr' was built under R version 4.1.2
## Warning: package 'dplyr' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(tidytext)
## Warning: package 'tidytext' was built under R version 4.1.2
library(wordcloud2)
## Warning: package 'wordcloud2' was built under R version 4.1.2
opd_survey <- read_csv("data/opd_survey.csv")
## New names:
## • `Q16` -> `Q16...5`
## • `Resource` -> `Resource...6`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...9`
## • `Resource` -> `Resource...10`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...11`
## • `Q16` -> `Q16...12`
## Rows: 57054 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (19): RecordedDate, ResponseId, Role, Q14, Q16...5, Resource...6, Resour...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
opd_survey
## # A tibble: 57,054 × 19
##    RecordedDate      ResponseId Role  Q14   Q16...5 Resource...6 Resource_8_TEXT
##    <chr>             <chr>      <chr> <chr> <chr>   <chr>        <chr>          
##  1 "Recorded Date"   "Response… "Wha… "Ple… "Which… "Please ind… "Please indica…
##  2 "{\"ImportId\":\… "{\"Impor… "{\"… "{\"… "{\"Im… "{\"ImportI… "{\"ImportId\"…
##  3 "3/14/12 12:41"   "R_6fKCyE… "Cen… "K-1… "Eleme… "Summer Ins…  <NA>          
##  4 "3/14/12 13:31"   "R_09rHle… "Cen… "Ele… "Eleme… "Online Lea…  <NA>          
##  5 "3/14/12 14:51"   "R_1BlKt2… "Sch… "Hig… "Not A… "Online Lea…  <NA>          
##  6 "3/14/12 15:02"   "R_bPGUVT… "Sch… "Ele… "Guida… "Calendar"    <NA>          
##  7 "3/14/12 17:24"   "R_egJEHM… "Tea… "Ele… "Engli… "Live Webin…  <NA>          
##  8 "3/15/12 9:18"    "R_4PEbT4… "Tea… "Ele… "Infor… "Online Lea…  <NA>          
##  9 "3/15/12 10:54"   "R_eqB1bM… "Sch… "Mid… "Guida… "Live Webin…  <NA>          
## 10 "3/15/12 14:20"   "R_ewwUwi… "Cen… "Pre… "Other… "Wiki"        <NA>          
## # … with 57,044 more rows, and 12 more variables: Resource_9_TEXT <chr>,
## #   Resource_10_TEXT...9 <chr>, Resource...10 <chr>,
## #   Resource_10_TEXT...11 <chr>, Q16...12 <chr>, Q16_9_TEXT <chr>, Q19 <chr>,
## #   Q20 <chr>, Q21 <chr>, Q26 <chr>, Q37 <chr>, Q8 <chr>
glimpse(opd_survey)
## Rows: 57,054
## Columns: 19
## $ RecordedDate          <chr> "Recorded Date", "{\"ImportId\":\"recordedDate\"…
## $ ResponseId            <chr> "Response ID", "{\"ImportId\":\"_recordId\"}", "…
## $ Role                  <chr> "What is your role within your school district o…
## $ Q14                   <chr> "Please select the school level(s) you work with…
## $ Q16...5               <chr> "Which content area(s) do you specialize in?  (S…
## $ Resource...6          <chr> "Please indicate the online professional develop…
## $ Resource_8_TEXT       <chr> "Please indicate the online professional develop…
## $ Resource_9_TEXT       <chr> "Please indicate the online professional develop…
## $ Resource_10_TEXT...9  <chr> "Please indicate the online professional develop…
## $ Resource...10         <chr> "What was the primary focus of the webinar you a…
## $ Resource_10_TEXT...11 <chr> "What was the primary focus of the webinar you a…
## $ Q16...12              <chr> "Which primary content area(s) did the webinar a…
## $ Q16_9_TEXT            <chr> "Which primary content area(s) did the webinar a…
## $ Q19                   <chr> "Please specify the online learning module you a…
## $ Q20                   <chr> "How are you using this resource?", "{\"ImportId…
## $ Q21                   <chr> "What was the most beneficial/valuable aspect of…
## $ Q26                   <chr> "What recommendations do you have for improving …
## $ Q37                   <chr> "What recommendations do you have for making thi…
## $ Q8                    <chr> "Which of the following best describe(s) how you…
head(opd_survey)
## # A tibble: 6 × 19
##   RecordedDate       ResponseId Role  Q14   Q16...5 Resource...6 Resource_8_TEXT
##   <chr>              <chr>      <chr> <chr> <chr>   <chr>        <chr>          
## 1 "Recorded Date"    "Response… "Wha… "Ple… "Which… "Please ind… "Please indica…
## 2 "{\"ImportId\":\"… "{\"Impor… "{\"… "{\"… "{\"Im… "{\"ImportI… "{\"ImportId\"…
## 3 "3/14/12 12:41"    "R_6fKCyE… "Cen… "K-1… "Eleme… "Summer Ins…  <NA>          
## 4 "3/14/12 13:31"    "R_09rHle… "Cen… "Ele… "Eleme… "Online Lea…  <NA>          
## 5 "3/14/12 14:51"    "R_1BlKt2… "Sch… "Hig… "Not A… "Online Lea…  <NA>          
## 6 "3/14/12 15:02"    "R_bPGUVT… "Sch… "Ele… "Guida… "Calendar"    <NA>          
## # … with 12 more variables: Resource_9_TEXT <chr>, Resource_10_TEXT...9 <chr>,
## #   Resource...10 <chr>, Resource_10_TEXT...11 <chr>, Q16...12 <chr>,
## #   Q16_9_TEXT <chr>, Q19 <chr>, Q20 <chr>, Q21 <chr>, Q26 <chr>, Q37 <chr>,
## #   Q8 <chr>
tail(opd_survey)
## # A tibble: 6 × 19
##   RecordedDate ResponseId       Role  Q14   Q16...5 Resource...6 Resource_8_TEXT
##   <chr>        <chr>            <chr> <chr> <chr>   <chr>        <chr>          
## 1 7/2/13 10:20 R_0cggNPIobej2k… Teac… Midd… World … Online Lear… <NA>           
## 2 7/2/13 12:32 R_bpZ2jPQV1BOta… <NA>  <NA>  <NA>    <NA>         <NA>           
## 3 7/2/13 12:32 R_4SbNuxFI6qv8p… Teac… Elem… Mathem… Online Lear… <NA>           
## 4 7/2/13 12:32 R_1TT9rRNolK2xS… <NA>  <NA>  <NA>    <NA>         <NA>           
## 5 7/2/13 12:32 R_8raUHcIydALnR… <NA>  <NA>  <NA>    <NA>         <NA>           
## 6 7/2/13 12:32 R_2bs3lzLBdGWjk… <NA>  <NA>  <NA>    <NA>         <NA>           
## # … with 12 more variables: Resource_9_TEXT <chr>, Resource_10_TEXT...9 <chr>,
## #   Resource...10 <chr>, Resource_10_TEXT...11 <chr>, Q16...12 <chr>,
## #   Q16_9_TEXT <chr>, Q19 <chr>, Q20 <chr>, Q21 <chr>, Q26 <chr>, Q37 <chr>,
## #   Q8 <chr>
names(opd_survey)
##  [1] "RecordedDate"          "ResponseId"            "Role"                 
##  [4] "Q14"                   "Q16...5"               "Resource...6"         
##  [7] "Resource_8_TEXT"       "Resource_9_TEXT"       "Resource_10_TEXT...9" 
## [10] "Resource...10"         "Resource_10_TEXT...11" "Q16...12"             
## [13] "Q16_9_TEXT"            "Q19"                   "Q20"                  
## [16] "Q21"                   "Q26"                   "Q37"                  
## [19] "Q8"
view(opd_survey)
write_csv(opd_survey, "data/opd_survey_copy.csv")
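
The `New names` message printed by `read_csv()` above comes from readr's default name repair: when a header repeats a column name, each copy gets a `...<position>` suffix, which is why `Resource` shows up as `Resource...6` and `Resource...10`. A minimal sketch of that behavior, assuming a made-up three-column CSV (the values are invented for illustration and are not taken from the survey):

```r
library(readr)

# Hypothetical literal CSV with a duplicated header. I() tells readr to
# treat the string as data rather than a file path (readr 2.0 or later).
toy_csv <- I("Resource,Role,Resource\nWiki,Teacher,Math\nCalendar,Teacher,Science\n")

toy <- read_csv(toy_csv, show_col_types = FALSE)

# Only the duplicated names are changed; each gets its column position appended.
names(toy)
```

Here the two `Resource` columns should come back as `Resource...1` and `Resource...3`, while `Role` keeps its name, mirroring the renaming reported for `opd_survey`.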

opd_selected <- select(opd_survey, Role, Resource...6, Q21)
head(opd_selected)
## # A tibble: 6 × 3
##   Role                                                        Resource...6 Q21  
##   <chr>                                                       <chr>        <chr>
## 1 "What is your role within your school district or organiza… "Please ind… "Wha…
## 2 "{\"ImportId\":\"QID2\"}"                                   "{\"ImportI… "{\"…
## 3 "Central Office Staff (e.g. Superintendents, Tech Director… "Summer Ins…  <NA>
## 4 "Central Office Staff (e.g. Superintendents, Tech Director… "Online Lea… "Glo…
## 5 "School Support Staff (e.g. Counselors, Technology Facilit… "Online Lea…  <NA>
## 6 "School Support Staff (e.g. Counselors, Technology Facilit… "Calendar"   "com…
# Rename Resource...6 to Resource; Q21 is renamed to text inside the pipeline below
opd_renamed <- rename(opd_selected, Resource = Resource...6)
opd_sliced <- slice(opd_renamed, -1, -2) # negative positions drop rows 1 and 2 (the extra header rows)
opd_complete <- na.omit(opd_sliced) # drop rows with missing values
opd_teacher <- filter(opd_complete, Role == "Teacher") # keep teacher responses only
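
The cleaning steps above can be tried on a small stand-in table before running them on the full survey. The sketch below assumes a hypothetical tibble, `toy_survey`, whose first two rows mimic the extra header rows (question text and import ID) visible in `opd_survey`; all values are invented for illustration.

```r
library(dplyr)

# Hypothetical data: rows 1 and 2 imitate the two extra header rows.
toy_survey <- tibble(
  Role = c("What is your role?", "{\"ImportId\":\"QID2\"}",
           "Teacher", "Central Office Staff", "Teacher", "Teacher"),
  Q21  = c("What was most beneficial?", "{\"ImportId\":\"QID21\"}",
           "The videos", "Global examples", NA, "Clear examples")
)

toy_teacher <- toy_survey %>%
  slice(-1, -2) %>%           # negative positions drop rows 1 and 2
  na.omit() %>%               # drop rows with any missing value
  filter(Role == "Teacher")   # keep teacher responses only

toy_teacher
```

Note that `na.omit()` removes a row if any of its columns is missing, so in this sketch the teacher who left `Q21` blank is dropped along with the non-teacher row.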

# Tokenization
opd_tidy <- opd_renamed %>%
  select(Role, Resource, Q21) %>%
  rename(text = Q21) %>%
  slice(-1, -2) %>%
  na.omit() %>%
  filter(Role == "Teacher") %>%
  unnest_tokens(word, text)
head(opd_tidy)
## # A tibble: 6 × 3
##   Role    Resource                                                         word 
##   <chr>   <chr>                                                            <chr>
## 1 Teacher Live Webinar                                                     leve…
## 2 Teacher Live Webinar                                                     ofqu…
## 3 Teacher Live Webinar                                                     and  
## 4 Teacher Live Webinar                                                     revi…
## 5 Teacher Live Webinar                                                     bloo…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… none
head(stop_words)
## # A tibble: 6 × 2
##   word      lexicon
##   <chr>     <chr>  
## 1 a         SMART  
## 2 a's       SMART  
## 3 able      SMART  
## 4 about     SMART  
## 5 above     SMART  
## 6 according SMART
view(stop_words)
opd_clean <- anti_join(opd_tidy, stop_words)
## Joining, by = "word"
head(opd_clean)
## # A tibble: 6 × 3
##   Role    Resource                                                         word 
##   <chr>   <chr>                                                            <chr>
## 1 Teacher Live Webinar                                                     leve…
## 2 Teacher Live Webinar                                                     ofqu…
## 3 Teacher Live Webinar                                                     revi…
## 4 Teacher Live Webinar                                                     bloo…
## 5 Teacher Online Learning Module (e.g. Call for Change, Understanding the… modu…
## 6 Teacher Online Learning Module (e.g. Call for Change, Understanding the… teac…
opd_counts <- opd_clean %>%
  count(word, sort = TRUE)
opd_counts
## # A tibble: 5,352 × 2
##    word              n
##    <chr>         <int>
##  1 information    1885
##  2 learning       1520
##  3 videos         1385
##  4 resources      1286
##  5 online         1139
##  6 examples       1105
##  7 understanding  1092
##  8 time           1082
##  9 students       1013
## 10 data            971
## # … with 5,342 more rows
exclude_words <- data.frame(word = c("videos", "online"))
opd_excluded <- anti_join(opd_clean, exclude_words)
## Joining, by = "word"
opd_counts2 <- opd_excluded %>%
  count(word, sort = TRUE)
head(opd_counts2)
## # A tibble: 6 × 2
##   word              n
##   <chr>         <int>
## 1 information    1885
## 2 learning       1520
## 3 resources      1286
## 4 examples       1105
## 5 understanding  1092
## 6 time           1082
wordcloud2(opd_counts2)
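
The tokenizing, stop-word removal, and custom exclusion steps above can be sketched end to end on a couple of invented responses. `toy_responses` and the words in `toy_exclude` are assumptions made for illustration; they are not drawn from the survey.

```r
library(dplyr)
library(tidytext)

# Two hypothetical open-ended responses.
toy_responses <- tibble(
  text = c("The videos and the online resources had examples for students.",
           "More examples and more time with the videos.")
)

# Words judged uninformative for this question (a hypothetical choice).
toy_exclude <- tibble(word = c("videos", "online"))

toy_counts <- toy_responses %>%
  unnest_tokens(word, text) %>%            # one lowercase word per row
  anti_join(stop_words, by = "word") %>%   # drop common English stop words
  anti_join(toy_exclude, by = "word") %>%  # drop the custom exclusions
  count(word, sort = TRUE)                 # most frequent words first

toy_counts
```

Supplying `by = "word"` explicitly silences the `Joining, by = "word"` message. A table shaped like `toy_counts` (a word column followed by a count column) is what `wordcloud2()` receives in the code above.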

Knit & Submit

Congratulations, you’ve completed your Intro to Text Mining badge! Complete the following steps to submit your work for review:

  1. Change the name after author: in the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.

  2. Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.

  3. Commit your changes in GitHub Desktop and push them to your online GitHub repository.

  4. Publish your HTML page to the web using one of the following publishing methods:

    • Publish on RPubs by clicking the “Publish” button located in the Viewer Pane when you knit your document. Note that you will first need to create an RPubs account.

    • Publish on GitHub using either GitHub Pages or the HTML previewer.

  5. Post a new discussion on GitHub to our Text Mining Badges forum. In your post, include a link to your published web page and a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.