The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts in two parts:

Part I: Reflect and Plan

Use your institutional library (e.g., NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies text mining to an educational context or a topic of interest. More specifically, locate a text mining study that visualizes text data.

  1. Provide an APA citation for your selected study.

  2. How does the visualization address the study's research questions?

    • word clouds; plots

Draft a research question for a population you may be interested in studying, or one that would be of interest to educational researchers, that would require the collection of text data. Then answer the following questions:

  1. What text data would need to be collected?

    • school improvement plans; community center brochures; library flyers
  2. For what reason would text data need to be collected in order to address this question?

    • to figure out what services and activities are provided in a given community
  3. Explain the analytical level at which these text data would need to be collected and analyzed.

    • service providers

    • institutions

Part II: Data Product

Use your case study file to create a new word cloud that excludes words unlikely to give you important information about teachers’ experiences with professional development. (For example, we did not include “University” in the word cloud describing where scholars came from, since it appears everywhere.)

I highly recommend creating a new R script in your lab-1 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your word cloud and answer the questions that follow.

# YOUR FINAL CODE HERE
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidytext)
opd_survey <- read_csv("data/opd_survey.csv")
## New names:
## • `Q16` -> `Q16...5`
## • `Resource` -> `Resource...6`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...9`
## • `Resource` -> `Resource...10`
## • `Resource_10_TEXT` -> `Resource_10_TEXT...11`
## • `Q16` -> `Q16...12`
## Rows: 57054 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (19): RecordedDate, ResponseId, Role, Q14, Q16...5, Resource...6, Resour...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# keep only the respondent ID, role, and open-ended Q8 response
opd_selected <- select(opd_survey, ResponseId, Role, Q8)
# drop the first two rows, which are not actual survey responses
opd_sliced <- slice(opd_selected, -1, -2)
# remove rows with missing responses
opd_complete <- na.omit(opd_sliced)
# keep only responses from teachers
opd_teacher <- filter(opd_complete, Role == "Teacher")
# tokenize Q8 into one word per row (data, output column, input column)
opd_tidy <- unnest_tokens(opd_teacher, word, Q8)
head(stop_words) # preview the stop_words dataset included with tidytext
## # A tibble: 6 × 2
##   word      lexicon
##   <chr>     <chr>  
## 1 a         SMART  
## 2 a's       SMART  
## 3 able      SMART  
## 4 about     SMART  
## 5 above     SMART  
## 6 according SMART
# remove standard English stop words from the tokenized responses
opd_clean <- anti_join(opd_tidy, stop_words)
## Joining with `by = join_by(word)`
# count how often each remaining word occurs, most frequent first
opd_counts <- count(opd_clean, word, sort = TRUE)
# create an interactive word cloud from the word counts
library(wordcloud2)
wordcloud2(opd_counts)
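
One way to drop additional, uninformative words before plotting (a minimal sketch, not the only approach) is to build a small custom stop-word tibble and remove it with another anti_join() before counting. The words listed in my_stop_words below are hypothetical placeholders; substitute whichever terms you judge uninformative for your own word cloud.

# a sketch assuming the objects created above; the words here are examples only
my_stop_words <- tibble(
  word = c("online", "resources", "professional", "development"),
  lexicon = "custom"
)

# remove the custom words, then recount and plot
opd_custom <- anti_join(opd_clean, my_stop_words, by = "word")
opd_custom_counts <- count(opd_custom, word, sort = TRUE)
wordcloud2(opd_custom_counts)

An equivalent option is to filter the counts directly, e.g. filter(opd_counts, !word %in% my_stop_words$word); either approach yields the same word cloud.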

Knit & Submit

Congratulations, you’ve completed your Intro to Text Mining badge! Complete the following steps from the orientation to submit your work for review.