12/02/2021

Objective

Overall

  • Bridge quantitative and qualitative research
    • Provide examples that cross paradigms
    • What can we learn from each paradigm?

Today

  • Would some of these topics be relevant across paradigms?
  • How could they be presented in an engaging way?
    • Real examples

Background

Motivation

  • How can NLP assist researchers in extracting meaning from large volumes of text?
    • Journal articles
    • Open-ended survey responses
    • Social media posts
    • News articles
    • Other

Articles Used in the Demo

  • Cuddapah, J. L. (2002). The Teachers College New Teacher Institute: Supporting New Teachers through Mentoring Relationships.
  • Cuddapah, J. L., & Clayton, C. D. (2011). Using Wenger’s communities of practice to explore a new teacher cohort. Journal of Teacher Education, 62(1), 62-75.
  • Cuddapah, J. L., & Stanford, B. H. (2015). Career-changers’ ideal teacher images and grounded classroom perspectives. Teaching and Teacher Education, 51, 27-37.
  • Gifford, J., Snyder, M. G., & Cuddapah, J. L. (2013). Novice career changers weather the classroom weather. Phi Delta Kappan, 94(6), 50-54.
  • Masci, F. J., Cuddapah, J. L., & Pajak, E. F. (2008). Becoming an agent of stability: Keeping your school in balance in the perfect storm. American Secondary Education, 36(2), 57-68.

Outline of Basic Steps

  • Import the Text
  • Clean the Text
  • Perform Statistical Analysis
  • Create Visualizations

Import the Text

  • Text Corpus
    • Can store text from multiple documents
    • Holds additional metadata (e.g. title, author, date, page numbers)
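
A minimal sketch, in Python, of what such a corpus might look like as a data structure. The `Document` and `Corpus` classes and their field names are hypothetical, chosen for illustration only; they are not the structure used in the demo.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One document: raw text lines plus free-form metadata."""
    doc_id: str
    lines: list[str]
    meta: dict[str, str] = field(default_factory=dict)


@dataclass
class Corpus:
    """A collection of documents, addressable by id."""
    docs: dict[str, Document] = field(default_factory=dict)

    def add(self, doc: Document) -> None:
        self.docs[doc.doc_id] = doc

    def __len__(self) -> int:
        return len(self.docs)


corpus = Corpus()
corpus.add(Document(
    doc_id="IdealTeacherImages.pdf",
    lines=["Contents lists available at ScienceDirect"],
    meta={"author": "Jennifer Locraft Cuddapah", "language": "en"},
))
```

Keeping metadata next to the text lets later steps group or filter results by author, date, or source.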

Import the Text - Example Corpus

##   author       : Jennifer Locraft Cuddapah
##   datetimestamp: 2106-02-07 01:28:15
##   description  : Teaching and Teacher Education, 51 (2015) 27-37. doi:10.1016/j.tate.2015.05.004
##   heading      : Career-changers’ ideal teacher images and grounded classroom perspectives
##   id           : IdealTeacherImages.pdf
##   language     : en
##   origin       : Elsevier
##  [1] "Teaching and Teacher Education 51 (2015) 27e37"                         
##  [2] ""                                                                       
##  [3] "Contents lists available at ScienceDirect"                              
##  [4] ""                                                                       
##  [5] "Teaching and Teacher Education journal homepage:"                       
##  [6] "www.elsevier.com/locate/tate"                                           
##  [7] ""                                                                       
##  [8] "Career-changers’ ideal teacher images and grounded classroom"           
##  [9] "perspectives Jennifer Locraft Cuddapah a, *, Beverly Hardcastle"        
## [10] "Stanford b, 1 a Hood College, 401 Rosemont Avenue, Frederick, MD 21701,"

Clean the Text

  • Remove numbers, punctuation, symbols, etc.
    • Remove irrelevant clutter
  • Convert all terms to lower case
    • e.g. Should “Principal” be treated as the same word as “principal”?
  • Remove stopwords
    • e.g. the, a, and, if, is, are
  • Stemming / Lemmatization
    • e.g. Should “train”, “trained”, “training”, etc., be treated as the same term?
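
The cleaning steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the demo's implementation: the stopword list is only the handful of examples from the slide, and `naive_stem` is a hypothetical toy suffix-stripper standing in for a real stemmer or lemmatizer.

```python
import re

# Tiny illustrative stopword list (the examples from the slide);
# real stopword lists are much longer.
STOPWORDS = {"the", "a", "and", "if", "is", "are"}


def naive_stem(word: str) -> str:
    """Strip one common suffix. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text: str) -> list[str]:
    text = text.lower()                    # "Principal" -> "principal"
    # Drop digits, punctuation, and symbols. Note: this also drops
    # non-ASCII letters, which a real pipeline would want to keep.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return [naive_stem(t) for t in tokens]


print(clean("The Principal trained 3 teachers; training is key."))
```

Here “trained” and “training” both reduce to “train”, which is the behaviour the stemming question on the slide is asking about.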

Clean the Text - Example

Term Document Matrix

At what level should the text be analyzed?

  • Terms
    • Single words (unigrams)
    • N-grams (bigrams, trigrams, etc.)
  • Sentences
  • Paragraphs
  • Pages
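
For the term level, a minimal Python sketch of how unigrams and n-grams can be generated from a token list. The `ngrams` helper is a hypothetical name used for illustration.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous runs of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["ideal", "teacher", "images"]
unigrams = ngrams(tokens, 1)   # one tuple per word
bigrams = ngrams(tokens, 2)    # overlapping pairs of adjacent words
```

A request for n-grams longer than the token list simply returns an empty list.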

Sample TDM

term          AgentOfStability.pdf  IdealTeacherImages.pdf  MentoringRelationships.pdf
continues     1                     NA                      1
culture       8                     NA                      3
depart        1                     NA                      NA
impacted      1                     7                       NA
participated  NA                    2                       NA
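
A minimal sketch of how a term-document matrix like the sample can be built from cleaned tokens. The document names and contents below are made up for illustration; `None` plays the role of the NA cells.

```python
from collections import Counter

# Toy, already-cleaned documents; names and contents are hypothetical.
docs = {
    "a.txt": ["culture", "school", "culture"],
    "b.txt": ["school", "mentor"],
}

counts = {name: Counter(tokens) for name, tokens in docs.items()}
terms = sorted(set().union(*docs.values()))

# Rows are terms, columns are documents; None marks a term that does
# not occur in that document.
tdm = {
    term: {name: counts[name].get(term) for name in docs}
    for term in terms
}
```

Storing the matrix as nested dictionaries keeps the sketch dependency-free; larger corpora would typically call for a sparse matrix.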

Perform Statistical Analysis

  • Term Frequency \[ tf_w^i = \frac{n_w^i}{n^i} \]
    • n_w^i: number of occurrences of term w in document i; n^i: total number of terms in document i

term          document                    count  tf
spark         AgentOfStability.pdf        1      0.0003702
predicting    IdealTeacherImages.pdf      1      0.0001691
scientists    IdealTeacherImages.pdf      1      0.0001691
individual    MentoringRelationships.pdf  9      0.0016138
multifaceted  MentoringRelationships.pdf  2      0.0003586
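
The term-frequency formula translates directly into code. A minimal sketch on a made-up token list (the function name `term_frequencies` is hypothetical):

```python
from collections import Counter


def term_frequencies(tokens: list[str]) -> dict[str, float]:
    """tf = occurrences of the term in the document / total terms in the document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


# Toy document of four tokens; "mentor" appears twice.
tf = term_frequencies(["mentor", "teacher", "mentor", "school"])
```

Because every count is divided by the same document length, the tf values within one document sum to 1.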

Perform Statistical Analysis - Continued

  • Inverse Document Frequency

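\[ idf_w = \log\left(\frac{D}{|\{ j : t_w \in d_j \}|}\right) = \log\left(\frac{1}{df_w}\right) \]

    • D: number of documents; df_w: fraction of documents that contain term w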

Perform Statistical Analysis - Continued

  • Inverse Document Frequency

term          document                    count  idf
spark         AgentOfStability.pdf        1      1.6094379
predicting    IdealTeacherImages.pdf      1      1.6094379
scientists    IdealTeacherImages.pdf      1      0.9162907
individual    MentoringRelationships.pdf  9      0.0000000
multifaceted  MentoringRelationships.pdf  2      1.6094379
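
The idf values in the table are consistent with a natural logarithm and D = 5, presumably the five demo articles. Under that assumption, a short Python check shows what a term found in one, two, or all five documents would receive:

```python
import math


def idf(num_docs: int, docs_with_term: int) -> float:
    """Natural-log idf: log(D / number of documents containing the term)."""
    return math.log(num_docs / docs_with_term)


D = 5  # assumption: the five articles listed for the demo
for n in (1, 2, 5):
    print(n, round(idf(D, n), 7))
```

A term that appears in every document gets an idf of 0, which is why “individual” drops out once tf is weighted by idf.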

Perform Statistical Analysis - Continued

  • Tf-Idf \[ TfIdf_w^i = tf_w^i \times idf_w \]

term          document                    count  tf         idf        tf_idf
spark         AgentOfStability.pdf        1      0.0003702  1.6094379  0.0005959
predicting    IdealTeacherImages.pdf      1      0.0001691  1.6094379  0.0002721
scientists    IdealTeacherImages.pdf      1      0.0001691  0.9162907  0.0001549
individual    MentoringRelationships.pdf  9      0.0016138  0.0000000  0.0000000
multifaceted  MentoringRelationships.pdf  2      0.0003586  1.6094379  0.0005772
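
As a worked check, the product tf × idf can be recomputed from the displayed columns. The sketch below copies the table values; small differences in the last digit are expected because the displayed tf and idf are already rounded.

```python
import math

# (term, tf, idf, tf_idf) exactly as displayed in the table above.
rows = [
    ("spark",        0.0003702, 1.6094379, 0.0005959),
    ("predicting",   0.0001691, 1.6094379, 0.0002721),
    ("scientists",   0.0001691, 0.9162907, 0.0001549),
    ("individual",   0.0016138, 0.0000000, 0.0000000),
    ("multifaceted", 0.0003586, 1.6094379, 0.0005772),
]

# Recompute tf * idf and compare with the displayed value. The tolerance
# absorbs the rounding already applied to the displayed tf and idf.
recomputed = {term: tf * idf for term, tf, idf, _ in rows}
all_match = all(
    math.isclose(recomputed[term], shown, abs_tol=1e-6)
    for term, _, _, shown in rows
)
```

Note how “individual” has the highest tf of the five rows but a tf-idf of 0, since it occurs in every document.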

Visualizations

  • Top Words per Document by TF

Visualizations - Continued

  • Top Words per Document by TF-IDF
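
Behind a "top words" chart is a simple ranking of terms within each document by tf or tf-idf. A minimal sketch, with made-up scores and a hypothetical `top_terms` helper:

```python
from heapq import nlargest

# Hypothetical per-document scores (tf or tf-idf); values are made up.
scores = {
    "doc1": {"mentor": 0.004, "school": 0.001, "culture": 0.003},
    "doc2": {"career": 0.005, "school": 0.002},
}


def top_terms(term_scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (term, score) pairs, best first."""
    return nlargest(k, term_scores.items(), key=lambda kv: kv[1])


top = {doc: top_terms(s, 2) for doc, s in scores.items()}
```

The resulting lists are what a bar chart per document would plot.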

Visualizations - Continued

  • n-Grams

Visualizations - Continued

  • Word Clouds

Visualizations - Continued

  • Term Correlation Graphs
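
One common way to build such a graph, assumed here rather than taken from the demo, is to compute the phi coefficient between the presence/absence vectors of each pair of terms and keep pairs above a threshold as edges. The terms, the presence data, and the 0.5 threshold below are made up for illustration.

```python
import math
from itertools import combinations


def phi(x: list[int], y: list[int]) -> float:
    """Phi coefficient of two equal-length 0/1 presence vectors."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Undefined when a term is present (or absent) in every unit.
    return (n11 * n00 - n10 * n01) / denom if denom else float("nan")


# 1 = the term occurs in that text unit (e.g. a section), 0 = it does not.
presence = {
    "mentor": [1, 1, 0, 0, 1],
    "novice": [1, 1, 0, 0, 0],
    "storm":  [0, 0, 1, 1, 0],
}

pairs = [
    (a, b, phi(presence[a], presence[b]))
    for a, b in combinations(sorted(presence), 2)
]
edges = [p for p in pairs if p[2] > 0.5]  # keep strongly co-occurring pairs
```

Each retained pair becomes one edge in the graph, with the coefficient available as an edge weight.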

Visualizations - Continued

  • Latent Dirichlet Allocation (LDA)

Visualizations - Continued

  • Latent Dirichlet Allocation (LDA) Topic by Document

Visualizations - Continued

  • Sentiment Analysis

word        sentiment
meltdown    negative
memorable   positive
menace      negative
menacing    negative
menacingly  negative
mendacious  negative
mendacity   negative
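
A lexicon like the sample above can be applied by looking up each token and counting the labels. A minimal sketch, using only the rows shown on the slide as the lexicon; the token list and the net-score convention (positive minus negative) are assumptions for illustration.

```python
from collections import Counter

# Lexicon entries copied from the sample rows above; a real lexicon
# contains thousands of words.
LEXICON = {
    "meltdown": "negative",
    "memorable": "positive",
    "menace": "negative",
    "menacing": "negative",
    "menacingly": "negative",
    "mendacious": "negative",
    "mendacity": "negative",
}


def sentiment_counts(tokens: list[str]) -> Counter:
    """Count positive and negative labels for tokens found in the lexicon."""
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)


counts = sentiment_counts(["memorable", "lesson", "after", "meltdown", "menace"])
net = counts["positive"] - counts["negative"]
```

Tokens missing from the lexicon are ignored, so coverage of the lexicon strongly affects the result.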
