Using Dictionaries and Sentiment Analysis
This project explores whether, and how, online sentiment changed within the Christian, Jewish, and Muslim subreddit communities during the pandemic. The goal is a comparative sentiment analysis of three major faith subreddits, r/Christianity, r/Judaism, and r/Islam, during 2020-2021. Initial questions guiding the research include:
• How were content and sentiment impacted within these subreddit groups over the first two years of the pandemic?
• How do emotionality and sentiment compare between the subreddit faith groups during this time?
• How did sentiment within the faith communities trend in comparison to that of r/Covid19?
Given my research questions, I sought dictionaries that provide both detailed positive and negative sentiment as well as defined emotions. I used the NRC Word-Emotion Association Lexicon to capture sentiment as either positive or negative and to begin categorizing emotions using its eight basic emotion types: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. The dictionary includes approximately 13,901 entries.
The aim in using such a dictionary was to get a sense of the scale, frequency, and underlying emotions of sentiment in these subreddit groups during 2020-2021.
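To make the dictionary approach concrete, here is a minimal Python sketch of how a lexicon lookup turns a text into counts for the ten NRC categories (eight emotions plus positive and negative). It is not the code used for this post. The `TOY_LEXICON` entries and the `count_categories` helper are hypothetical stand-ins invented for illustration, and the word-to-category assignments are not copied from the NRC lexicon.

```python
import re
from collections import Counter

# The ten categories reported by the NRC lexicon: eight emotions plus two sentiments.
CATEGORIES = ["anger", "anticipation", "disgust", "fear", "joy",
              "negative", "positive", "sadness", "surprise", "trust"]

# Hypothetical toy lexicon: word -> categories. Invented for illustration only;
# the real lexicon has thousands of entries and may label these words differently.
TOY_LEXICON = {
    "happy": {"joy", "positive"},
    "love": {"joy", "positive"},
    "prison": {"fear", "negative", "sadness"},
    "martyr": {"fear", "negative", "sadness"},
}

def count_categories(text, lexicon=TOY_LEXICON):
    """Count how many tokens in `text` fall into each category."""
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    counts = Counter()
    for token in tokens:
        # A single word can belong to several categories at once.
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return {category: counts[category] for category in CATEGORIES}

if __name__ == "__main__":
    print(count_categories("Love torah memes"))
```

A text with no lexicon matches comes back with zeros in every category, which matters for interpreting the neutral scores later in this post.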
Plotting Polarity Distribution using Liwcalike and the NRC Lexicon
Corpus consisting of 2,179 documents and 6 docvars.
text2 :
"Lettered Bible verse I drew for encouragement_"
text5 :
"A WTF tweet from the LA Times._"
text6 :
"Happy 160th birthday to Eliezer Yitzhak Perlman, a.k.a. Elie..."
text7 :
"Ukraine: Bar Mitzvah celebration for boys who discovered the..."
text8 :
"... it never occurred to me that people didn t know this_"
text9 :
"Jesus expels the bankers_"
[ reached max_ndoc ... 2,173 more documents ]
Interestingly, the plot shows that the majority of documents, 2,179 of 3,481 (about 63%), had a polarity value of zero. A sample of these zero-polarity documents, shown above, suggests that the underlying text still varies in positivity and negativity. The next step is to compute emotion counts, again using the NRC Emotion Lexicon.
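For readers who want to see what a polarity score of this kind looks like, here is a small Python sketch. It assumes polarity is computed as (positive - negative) / (positive + negative) and set to zero when a document has no sentiment words. That formula is an assumption that matches the -1 to +1 range reported in this post, not a confirmed description of the original calculation. The function names are hypothetical.

```python
def polarity(positive, negative):
    """Assumed polarity: (pos - neg) / (pos + neg), or 0.0 with no sentiment words."""
    total = positive + negative
    if total == 0:
        # No positive or negative matches at all: treated as neutral here.
        return 0.0
    return (positive - negative) / total

def zero_share(zero_count, document_count):
    """Percentage of documents whose polarity is exactly zero."""
    return 100 * zero_count / document_count

if __name__ == "__main__":
    print(polarity(3, 1))                     # mostly positive document
    print(polarity(2, 2))                     # positive and negative cancel out
    print(round(zero_share(2179, 3481), 1))   # share of zero-polarity documents
```

Note that under this formula a zero can mean two different things: no sentiment words at all, or equal numbers of positive and negative words.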
Plotting Polarity using DFM and the NRC Lexicon
Once again a large proportion of documents are valued at zero, but this time clusters also emerge at the two poles, with far more positive sentiment captured than negative. Samples from each pole, and from the neutral center, are below.
Sample of positive (+1) statements:
Corpus consisting of 6 documents and 6 docvars.
text2 :
"Lettered Bible verse I drew for encouragement_"
text6 :
"Happy 160th birthday to Eliezer Yitzhak Perlman, a.k.a. Elie..."
text7 :
"Ukraine: Bar Mitzvah celebration for boys who discovered the..."
text10 :
"Love torah memes_"
text13 :
"I visited one of the church in Montreal!_"
text16 :
"Excited to submerge myself in Christ with my new bible._"
Sample of neutral (0) statements:
Corpus consisting of 6 documents and 6 docvars.
text3 :
"Unexpected consequences_"
text5 :
"A WTF tweet from the LA Times._"
text8 :
"... it never occurred to me that people didn t know this_"
text9 :
"Jesus expels the bankers_"
text11 :
"a spicy Jewish meme_"
text14 :
"Grandad s Tudor bible, published in the 1500s_"
Sample of negative (-1) statements:
Corpus consisting of 6 documents and 6 docvars.
text19 :
"Unknown German Jew avoids Nazi captivity by escaping through..."
text26 :
"Arabs in Jerusalem broke into a synagogue, threw Torah scrol..."
text40 :
"16-year old Jewish girl in prison for having a letter in Heb..."
text46 :
"Remember the martyr._"
text49 :
"The Dutch had banned all religion in South Africa in the 170..."
text65 :
"Find the verse that fits your day best a_"
DFM using the 10 NRC Features
Document-feature matrix of: 3,481 documents, 10 features (67.81% sparse) and 6 docvars.
docs   anger anticipation disgust fear joy negative positive sadness surprise trust
text1      3           13       1    9  11        8       23       4        7    17
text2      0            1       0    0   0        0        1       0        0     1
text3      0            1       0    1   1        1        1       0        1     0
text4      1            2       0    2   4        1        5       0        0     3
text5      0            0       0    0   0        0        0       0        0     0
text6      0            2       0    0   2        0        3       0        1     1
[ reached max_ndoc ... 3,475 more documents ]
Upon review of this data (sampled above), three things explain the earlier cluster of neutral scores. First, texts with equal positive and negative counts cancel each other out to zero; this accounts for approximately 270 of the 1,421 zero scores. Second, a few documents registered an emotion but no positive or negative sentiment. Finally, a large share of texts have zeros across all 8 emotions and both sentiment categories. Together, these suggest that emotional valence is not very prevalent in the data.
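The cases described above can be separated mechanically. The Python sketch below sorts rows of category counts into "cancelled" (equal, non-zero positive and negative counts), "emotion_only" (an emotion registered but no sentiment), and "no_matches" (zeros everywhere). The first four rows reuse counts from the matrix printed above. The last row is a hypothetical example added to illustrate the emotion-only case, and the function and label names are my own.

```python
FEATURES = ["anger", "anticipation", "disgust", "fear", "joy",
            "negative", "positive", "sadness", "surprise", "trust"]
EMOTIONS = [f for f in FEATURES if f not in ("negative", "positive")]

def make_row(*counts):
    """Pair the ten counts with their feature names."""
    return dict(zip(FEATURES, counts))

ROWS = {
    "text1": make_row(3, 13, 1, 9, 11, 8, 23, 4, 7, 17),
    "text2": make_row(0, 1, 0, 0, 0, 0, 1, 0, 0, 1),
    "text3": make_row(0, 1, 0, 1, 1, 1, 1, 0, 1, 0),
    "text5": make_row(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    # Hypothetical row (not from the data): one emotion word, no sentiment words.
    "example_emotion_only": make_row(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
}

def classify(row):
    """Explain why a document does or does not end up with zero polarity."""
    positive, negative = row["positive"], row["negative"]
    if positive != negative:
        return "nonzero"       # polarity is not zero
    if positive > 0:
        return "cancelled"     # equal positive and negative counts
    if any(row[emotion] > 0 for emotion in EMOTIONS):
        return "emotion_only"  # emotion registered, but no sentiment
    return "no_matches"        # zeros across every category

if __name__ == "__main__":
    for doc_id, row in ROWS.items():
        print(doc_id, classify(row))
```

Tallying these labels across all documents would show how many of the zero scores come from each case.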
For the final steps of dictionary analysis, I will look at how covid-related terms were discussed.
Covid in Context
Covid-related terms selected for review include: “covid”, “covid-19”, “coronavirus”, “Sars-CoV-2”, “virus”, “vaccine”, “Pfizer”, “Moderna”, “Johnson and Johnson”, “vax”.
[1] "We have 619 threads that mention positive or negative words in the context of covid-related terms."
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.0000 -0.5000 0.3333 0.1904 1.0000 1.0000
After removing the threads with zero positive and zero negative sentiment, we see a substantial amount of positive context surrounding covid. This may be because a covid information subreddit, r/Covid19, was included in the data set. However, many threads still register as negative, neutral, or somewhere in between. Once again, the neutral scores here likely come from positive and negative counts cancelling each other out rather than from an absence of sentiment words, since those cases were removed above. At this point, we don't have much information about which group the text comes from or its context, but it is evident that emotional valence is not the defining factor in the text. This is further emphasized by the three word clouds below, one for each year from 2019 to 2021.
Top words over these years include those referencing the pandemic, but there remains a consistent thread of discussion around faith and religion. This goes back to my first post, which referenced previous studies on religious behavior suggesting a correlation between fear and belief and pointing to causal links in which fear motivates religious faith and faith in turn mitigates fear. So, in one last effort, I will look into faith-based terms in context.
Faith in Context
Faith-based terms selected for review include: “God”, “pray”, “prayer”, “Jesus”, “Christ”, “Allah”, “Bible”, “Quran”, “church”, “mosque”, “temple”, “faith”, “baptized”, “Christians”, “Jews”, “Muslims”, “Christianity”, “Jewish”, “Islam”.
[1] "We have 857 threads that mention positive or negative words in the context of faith-related terms."
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.00000 0.09091 0.54545 0.45539 1.00000 1.00000
Here in the summary and plot, we can see noticeably more positive sentiment surrounding faith-based terms than covid-related terms: the mean polarity is about 0.46 versus 0.19, and the median about 0.55 versus 0.33.
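The "in context" scores in the covid and faith sections can be sketched as a keyword-in-context calculation: find each selected term, keep a window of words around it, and score only the sentiment words inside those windows. The Python sketch below is an illustration under several assumptions and is not the code behind the numbers above. The window size of five tokens, the tiny `POSITIVE` and `NEGATIVE` word sets, and the function names are all hypothetical, and multi-word terms such as "Johnson and Johnson" are not handled.

```python
import re

# Hypothetical mini word lists, invented for illustration (not the NRC lexicon).
POSITIVE = {"hope", "grateful", "safe"}
NEGATIVE = {"fear", "sick", "death"}

def tokenize(text):
    """Lowercase and split into word tokens, keeping hyphens (e.g. covid-19)."""
    return re.findall(r"[a-z0-9'-]+", text.lower())

def context_polarity(text, keywords, window=5):
    """Polarity of sentiment words within `window` tokens of any keyword.

    Returns None when no sentiment word appears in any window, which mirrors
    dropping documents with zero positive and zero negative matches.
    """
    tokens = tokenize(text)
    in_context = set()
    for i, token in enumerate(tokens):
        if token in keywords:
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            # A set avoids double counting where two windows overlap.
            in_context.update(range(start, stop))
    positive = sum(1 for i in in_context if tokens[i] in POSITIVE)
    negative = sum(1 for i in in_context if tokens[i] in NEGATIVE)
    if positive + negative == 0:
        return None
    return (positive - negative) / (positive + negative)

if __name__ == "__main__":
    covid_terms = {"covid", "covid-19", "coronavirus", "virus", "vaccine", "vax"}
    print(context_polarity("Grateful the vaccine keeps us safe", covid_terms))
```

Summarizing the non-`None` scores across threads would produce a minimum, median, mean, and maximum like the summaries reported for the covid and faith terms.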
Next steps will be to refine my research question and uncover what topics are actually being discussed within these groups and when.