Sentiment Analysis on Science of Reading Legislation

1. PREPARE

Background

This independent analysis examines public sentiment toward the Science of Reading (SoR). The Science of Reading refers to evidence-based reading instruction practices that can be designed to meet the needs of individual learners (Snowling & Hulme, 2005). These practices cover language acquisition, phonological and phonemic awareness, phonics and spelling, fluency, vocabulary, oral language, and comprehension. The main motivation for this analysis was the recent debate over modifying the “Read to Achieve” legislation, which calls for the adoption of Science of Reading curricula in North Carolina (Pondiscio, 2021). In April 2021, North Carolina’s Democratic governor mandated that schools use a phonics-based approach to improve reading instruction. According to the North Carolina Department of Public Instruction, implementing this bill means that teachers are expected to be trained in SoR and to base their instruction on it. The main purpose of this brief analysis is to highlight public opinion about the Science of Reading as a construct. The study followed the Data-Intensive Research Workflow presented by Krumm et al. (2018) to perform the analysis and communicate the findings.

Research Questions

The research questions that guided this study include:

RQ1: What are the most frequent words that represent Science of Reading discussions on Twitter?

RQ2: What is the overall sentiment toward Science of Reading on social network platforms such as Twitter?

The Dataset

The analysis is based on Twitter data pulled through a developer account. The dataset included 781 observations. Because of the limitations of the account, these are tweets posted in the previous nine days.

After loading the libraries, the next step involved pulling Twitter data relating to the Science of Reading through the developer account.

2. WRANGLE

In this study, the wrangling process involved pulling Twitter data by searching for tweets that correspond to the Science of Reading. I then read the data into R, selected the variables of interest, tokenized the tweets, and loaded the lexicons for sentiment analysis. I have included comments in the code chunk to describe the manipulations performed.
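A minimal sketch of how the selection and tokenization steps could look, assuming the dplyr and tidytext packages. The `tweets` data frame and its example text are hypothetical stand-ins for the pulled data; only the column names `status_id` and `text` come from the output shown later in this report. The warning printed in the next section suggests the original call used `token = 'tweets'`, an option that recent tidytext releases no longer provide, so this sketch falls back to the default word tokenizer.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the tweets pulled from the Twitter API
tweets <- tibble(
  status_id = c("1", "2"),
  text = c(
    "We love the science of reading!",
    "Teachers need more training in phonics."
  )
)

tidy_tweets <- tweets %>%
  select(status_id, text) %>%          # keep only the variables of interest
  unnest_tokens(word, text) %>%        # one lowercase word per row
  anti_join(stop_words, by = "word")   # drop common English stop words

tidy_tweets
```

The default tokenizer strips punctuation, so hashtags lose their leading `#`; a custom tokenizer would be needed to keep them intact.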

3. EXPLORE

## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.

## # A tibble: 2,867 × 2
##    word                  n
##    <chr>             <int>
##  1 reading             648
##  2 science             546
##  3 #scienceofreading   180
##  4 students            123
##  5 literacy            116
##  6 teachers            112
##  7 amp                 103
##  8 read                 80
##  9 learning             72
## 10 learn                64
## # … with 2,857 more rows

Some words in this output are expected to appear in the discussions and do not add meaning to the analysis. I therefore customized the stop words to remove “reading”, “science”, “amp”, and “#scienceofreading”. Most of these simply repeat the main concept, while “amp” is a remnant of the HTML-encoded ampersand.
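The custom stop-word step can be sketched as follows, assuming dplyr. The counts are copied from the first frequency table above; the object names `word_counts`, `custom_stop`, and `filtered` are illustrative and not taken from the original code.

```r
library(dplyr)

# Top ten rows of the first frequency table in this report
word_counts <- tibble(
  word = c("reading", "science", "#scienceofreading", "students", "literacy",
           "teachers", "amp", "read", "learning", "learn"),
  n = c(648L, 546L, 180L, 123L, 116L, 112L, 103L, 80L, 72L, 64L)
)

# Words that repeat the search concept or are encoding debris
custom_stop <- tibble(
  word = c("reading", "science", "amp", "#scienceofreading")
)

filtered <- word_counts %>%
  anti_join(custom_stop, by = "word") %>%  # drop the custom stop words
  arrange(desc(n))                         # most frequent first

filtered
```

After filtering, “students”, “literacy”, and “teachers” lead the list, which agrees with the second frequency table.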

## # A tibble: 2,854 × 2
##    word            n
##    <chr>       <int>
##  1 students      123
##  2 literacy      116
##  3 teachers      112
##  4 read           80
##  5 learning       72
##  6 learn          64
##  7 teacher        51
##  8 training       49
##  9 school         46
## 10 instruction    43
## # … with 2,844 more rows

To keep the word cloud readable, I selected the top 100 words to include in the visualization.

## Selecting by n

The word cloud shows that words such as “debates”, “learn”, “media”, and “instruction” are prevalent in discussions about the Science of Reading. Although I found the word cloud informative, plotting the word counts can provide further insight. I therefore created a bar chart to give a clearer count of the words in use.
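A sketch of the top-word selection and bar chart, assuming dplyr and ggplot2. The counts come from the second frequency table above. The cutoff of 5 is only for illustration, since the report used the top 100, and `slice_max()` is used here as a current equivalent of the selection step, not necessarily the function in the original code.

```r
library(dplyr)
library(ggplot2)

# Top ten rows of the filtered frequency table in this report
counts <- tibble(
  word = c("students", "literacy", "teachers", "read", "learning",
           "learn", "teacher", "training", "school", "instruction"),
  n = c(123L, 116L, 112L, 80L, 72L, 64L, 51L, 49L, 46L, 43L)
)

# Keep the most frequent words (illustrative cutoff of 5)
top_words <- counts %>%
  slice_max(n, n = 5)

# Horizontal bar chart, ordered by frequency
p <- ggplot(top_words, aes(x = reorder(word, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Number of occurrences")

p
```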

After exploring the frequent words with the word cloud and bar chart, I loaded the lexicons (AFINN, Bing, and NRC) to compute the sentiment of public tweets. Using three lexicons strengthens the validity of the study, especially in answering the second research question.
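A minimal sketch of joining word counts to a lexicon, assuming dplyr and tidytext. It uses only the Bing lexicon, which ships with tidytext; AFINN and NRC are fetched through the separate textdata package and require accepting a license prompt, so they are left out here. The word counts are taken from the tables in this report, and `word_counts` and `sentiment_bing` are illustrative names.

```r
library(dplyr)
library(tidytext)

# The Bing lexicon is bundled with tidytext
bing <- get_sentiments("bing")

# A few word counts from this report; "students" carries no sentiment label
word_counts <- tibble(
  word = c("support", "free", "struggling", "students"),
  n = c(37L, 24L, 24L, 123L)
)

# Keep only words that appear in the lexicon, with their polarity
sentiment_bing <- word_counts %>%
  inner_join(bing, by = "word")

sentiment_bing
```

The inner join silently drops words without a lexicon entry, which is why the sentiment tables below have far fewer rows than the frequency tables.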

## # A tibble: 2,477 × 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # … with 2,467 more rows

## # A tibble: 6,786 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # … with 6,776 more rows

## # A tibble: 13,875 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # … with 13,865 more rows

## # A tibble: 4,150 × 2
##    word         sentiment
##    <chr>        <chr>    
##  1 abandon      negative 
##  2 abandoned    negative 
##  3 abandoning   negative 
##  4 abandonment  negative 
##  5 abandonments negative 
##  6 abandons     negative 
##  7 abdicated    negative 
##  8 abdicates    negative 
##  9 abdicating   negative 
## 10 abdication   negative 
## # … with 4,140 more rows

## # A tibble: 205 × 3
##    word            n value
##    <chr>       <int> <dbl>
##  1 support        37     2
##  2 free           24     1
##  3 struggling     24    -2
##  4 excited        20     3
##  5 love           20     3
##  6 join           15     1
##  7 opportunity    15     2
##  8 growth         14     2
##  9 importance     12     2
## 10 amazing        11     4
## # … with 195 more rows

## # A tibble: 213 × 3
##    word           n sentiment
##    <chr>      <int> <chr>    
##  1 support       37 positive 
##  2 free          24 positive 
##  3 struggling    24 negative 
##  4 balanced      20 positive 
##  5 excited       20 positive 
##  6 love          20 positive 
##  7 lead          18 positive 
##  8 skill         16 positive 
##  9 amazing       11 positive 
## 10 gains         10 positive 
## # … with 203 more rows

## # A tibble: 803 × 3
##    word            n sentiment
##    <chr>       <int> <chr>    
##  1 learning       72 positive 
##  2 learn          64 positive 
##  3 teacher        51 positive 
##  4 teacher        51 trust    
##  5 school         46 trust    
##  6 instruction    43 positive 
##  7 instruction    43 trust    
##  8 teach          31 joy      
##  9 teach          31 positive 
## 10 teach          31 surprise 
## # … with 793 more rows

## # A tibble: 2 × 2
##   sentiment     n
##   <chr>     <int>
## 1 positive    129
## 2 negative     84

## # A tibble: 1 × 4
##   lexicon negative positive sentiment
##   <chr>      <int>    <int>     <int>
## 1 bing          84      129        45

## # A tibble: 1 × 2
##   lexicon sentiment
##   <chr>       <dbl>
## 1 AFINN         137
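The Bing summary above reports a net score of 45, which is consistent with subtracting the negative count from the positive count. A sketch of that calculation, assuming dplyr and tidyr (the original code may have reshaped the data differently):

```r
library(dplyr)
library(tidyr)

# Bing polarity counts as reported above
bing_counts <- tibble(
  sentiment = c("positive", "negative"),
  n = c(129L, 84L)
)

summary_bing <- bing_counts %>%
  pivot_wider(names_from = sentiment, values_from = n) %>%  # one column per polarity
  mutate(lexicon = "bing",
         sentiment = positive - negative) %>%               # net sentiment
  relocate(lexicon)

summary_bing
```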

4. MODEL

## # A tibble: 698 × 2
##    status_id           text                                                     
##    <chr>               <chr>                                                    
##  1 1492248230003937287 "#dyslexia,#reading-intervention\n#scienceofreading #ear…
##  2 1492230303464792065 "“There is NO comprehension strategy powerful enough to …
##  3 1492228001521709059 "Today I was at @FoxNews talking about how NY politician…
##  4 1492224881949384710 "We LOVE to see this!! #unanimous #earlyliteracy #scienc…
##  5 1492222836123000838 "Great news out of Virginia today! Delegate @CarrieCoyne…
##  6 1492219473755095041 "@overtimerules The District is putting resources toward…
##  7 1490677380372959239 "A little #MondayMotivation for any literacy leaders out…
##  8 1492213780616552448 "We're adding some style to our uniforms with our new Pr…
##  9 1492181809727262721 "7th and 8th grade Earthworm Dissection in our Science L…
## 10 1492209877644681218 "#Educators: do you remember the first time you learned …
## # … with 688 more rows

## Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.

## # A tibble: 716 × 3
##    status_id           word        value
##    <chr>               <chr>       <dbl>
##  1 1492248230003937287 recommended     2
##  2 1492230303464792065 powerful        2
##  3 1492224881949384710 love            3
##  4 1492219473755095041 supporting      1
##  5 1492219473755095041 support         2
##  6 1492213780616552448 spirit          1
##  7 1492181809727262721 fun             4
##  8 1492181809727262721 admire          3
##  9 1492209877644681218 shame          -2
## 10 1492209877644681218 free            1
## # … with 706 more rows

## # A tibble: 246 × 2
##    status_id           value
##    <chr>               <dbl>
##  1 1489301156442431489     4
##  2 1489314374732763137    -4
##  3 1489342994885103616     4
##  4 1489356657457041408     0
##  5 1489357753508368393     2
##  6 1489385427803025408     4
##  7 1489400946362765313    -2
##  8 1489456081449500676     6
##  9 1489472827925610497     2
## 10 1489590243871318017    -1
## # … with 236 more rows

## # A tibble: 231 × 3
##    status_id           value sentiment
##    <chr>               <dbl> <chr>    
##  1 1489301156442431489     4 positive 
##  2 1489314374732763137    -4 negative 
##  3 1489342994885103616     4 positive 
##  4 1489357753508368393     2 positive 
##  5 1489385427803025408     4 positive 
##  6 1489400946362765313    -2 negative 
##  7 1489456081449500676     6 positive 
##  8 1489472827925610497     2 positive 
##  9 1489590243871318017    -1 negative 
## 10 1489607503998554112    -4 negative 
## # … with 221 more rows

## # A tibble: 1 × 3
##   negative positive ratio
##      <int>    <int> <dbl>
## 1       54      177 0.305
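The tables above suggest a tweet-level procedure: AFINN word scores are summed per tweet, tweets with a net score of zero are dropped (246 rows become 231), and the remainder are labeled by sign. The following sketch reproduces that logic with dplyr. The first ten rows are copied from the word-level output above; the last two rows are a hypothetical neutral tweet added only to show the zero filter.

```r
library(dplyr)

afinn_words <- tibble(
  status_id = c("1492248230003937287", "1492230303464792065",
                "1492224881949384710", "1492219473755095041",
                "1492219473755095041", "1492213780616552448",
                "1492181809727262721", "1492181809727262721",
                "1492209877644681218", "1492209877644681218",
                "hypothetical_neutral", "hypothetical_neutral"),
  word = c("recommended", "powerful", "love", "supporting", "support",
           "spirit", "fun", "admire", "shame", "free",
           "good", "bad"),
  value = c(2, 2, 3, 1, 2, 1, 4, 3, -2, 1,
            3, -3)  # last two values are hypothetical
)

tweet_scores <- afinn_words %>%
  group_by(status_id) %>%
  summarise(value = sum(value), .groups = "drop") %>%  # net score per tweet
  filter(value != 0) %>%                               # drop neutral tweets
  mutate(sentiment = if_else(value > 0, "positive", "negative"))

tweet_summary <- tweet_scores %>% count(sentiment)

# Ratio of negative to positive tweets from the counts reported above
ratio <- 54 / 177
round(ratio, 3)
```

The reported ratio of 0.305 therefore means roughly three negative tweets for every ten positive ones.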

## # A tibble: 6 × 3
##   method sentiment     n
##   <chr>  <chr>     <int>
## 1 AFINN  positive    506
## 2 AFINN  negative    210
## 3 bing   positive    129
## 4 bing   negative     84
## 5 nrc    positive    244
## 6 nrc    negative     86

## Joining, by = "method"

## # A tibble: 6 × 4
##   method sentiment     n total
##   <chr>  <chr>     <int> <int>
## 1 AFINN  positive    506   716
## 2 AFINN  negative    210   716
## 3 bing   positive    129   213
## 4 bing   negative     84   213
## 5 nrc    positive    244   330
## 6 nrc    negative     86   330

## # A tibble: 6 × 5
##   method sentiment     n total percent
##   <chr>  <chr>     <int> <int>   <dbl>
## 1 AFINN  positive    506   716    70.7
## 2 AFINN  negative    210   716    29.3
## 3 bing   positive    129   213    60.6
## 4 bing   negative     84   213    39.4
## 5 nrc    positive    244   330    73.9
## 6 nrc    negative     86   330    26.1
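The percentages in the final table can be reproduced from the counts alone. This sketch, assuming dplyr, uses a grouped `mutate()` in place of the join that the output above indicates; the result is the same.

```r
library(dplyr)

# Positive and negative counts per lexicon, as reported above
counts <- tibble(
  method = rep(c("AFINN", "bing", "nrc"), each = 2),
  sentiment = rep(c("positive", "negative"), times = 3),
  n = c(506L, 210L, 129L, 84L, 244L, 86L)
)

percents <- counts %>%
  group_by(method) %>%
  mutate(total = sum(n),                         # words matched per lexicon
         percent = round(n / total * 100, 1)) %>%
  ungroup()

percents
```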

5. COMMUNICATE

Discussions

This independent analysis provided baseline information that reveals useful insights into public sentiment toward the Science of Reading as a concept and the words that were frequent in those discussions. In this discussion, I use the research questions to guide the presentation of the findings.

What are the most frequent words that represent Science of Reading discussions on Twitter?

The analysis revealed that words such as “literacy”, “debates”, “students”, and “instruction” were prevalent in Twitter posts about the Science of Reading. These words suggest that the tweets mostly focused on students, who are the intended target group.

What is the overall sentiment toward Science of Reading on social network platforms such as Twitter?

The sentiment analysis indicates that the overall sentiment toward the Science of Reading is positive. The share of positive sentiment for each lexicon was about 71% for AFINN, 61% for Bing, and 74% for NRC. Using three lexicons was important for strengthening the validity of the sentiment analysis. The visualization shows that the findings from these lexicons did not vary much; AFINN and NRC yielded similar scores for the tweets analyzed.

Limitations

Given its scope, this analysis has a number of limitations. The major one is the number of observations used. Because of the limitations of the developer account, only tweets from the past nine days could be imported. An analysis of this kind requires data collected over a longer period so that the corpus is large enough to support more accurate conclusions. Second, this analysis was very general in how it gleaned sentiment from the public. It would have been useful to classify the tweets according to who posted them, because classifying sentiment by position and role would have added meaning to the study. For instance, the opinion of teachers who are expected to undergo training to implement the legislation may differ from the sentiment of parents or policy makers who are already convinced of the positive results of the Science of Reading (Pondiscio, 2021). These limitations also point to opportunities for further research and ways that this study could be improved.

Implications

Although the findings are limited, they can still provide foundational information for policy, research, and practice. Public sentiment on the Science of Reading can offer entry-level information for reformers, policy makers, educators, and researchers who are looking into the Science of Reading and the implementation of the legislation in the North Carolina education system. The keywords can indicate where discussions on social networks are headed and which issues are of primary concern.

References

Krumm, A., Means, B., & Bienkowski, M. (2018). Learning analytics goes to school: A collaborative approach to improving education. Routledge.

Snowling, M. J., & Hulme, C. E. (2005). The science of reading: A handbook. Blackwell Publishing.