Structural Topic Models
Research Questions - Updated
After reviewing my previous observations, I updated my research questions to investigate what changes, if any, occurred in conversations about faith and religion in r/Christianity, r/Islam, and r/Judaism during the covid-19 pandemic (2020 to 2021). My questions are:
- Did conversations about faith and religion strengthen, weaken, or remain the same in these subreddits during 2020-2021 compared with 2019?
- How did faith and religion permeate discussions of covid-19?
Following the previous post and a conversation with Dr. Song, I concluded that the text did not register enough emotional valence to support a sentiment analysis on its own. It was also apparent that I would need a method that could account for defined speakers (in this case, subreddit names and/or years) so that I could compare them and document changes over time. These conclusions led me to adopt structural topic modeling (STM) as the next vehicle for this research.
I began by finding an appropriate number of topics. Below is a plot of Diagnostic Values by Number of Topics.
Here we can see that models with between 5 and 10 topics have higher held-out likelihood and semantic coherence as well as lower residuals, all of which indicate a better model. I also plotted the diagnostics for up to 50 topics, where the values declined sharply after 25 topics.
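To make one of these diagnostics concrete, here is a minimal sketch of semantic coherence as it is usually defined for topic models: for a topic's top words, sum log((D(v_i, v_j) + 1) / D(v_j)) over word pairs, where D counts the documents containing the words. The function name, the toy documents, and the word lists are all hypothetical and chosen only for illustration. This is not the code that produced the plot.

```python
import math


def semantic_coherence(top_words, documents):
    """Sum log((D(v_i, v_j) + 1) / D(v_j)) over pairs of a topic's top words.

    top_words: the topic's most probable words, most probable first.
    documents: list of sets of tokens, one set per document.
    Every top word must appear in at least one document.
    """
    def doc_count(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for doc in documents if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            co_occurrences = doc_count(top_words[i], top_words[j])
            score += math.log((co_occurrences + 1) / doc_count(top_words[j]))
    return score


# Hypothetical toy corpus (NOT the Reddit data).
toy_documents = [
    {"covid", "vaccine", "patients"},
    {"covid", "vaccine", "study"},
    {"god", "pray", "church"},
    {"god", "church", "love"},
    {"covid", "patients"},
]

coherent_topic = ["covid", "vaccine", "patients"]  # words that co-occur
mixed_topic = ["covid", "god", "pray"]             # words that rarely co-occur

print("coherent:", semantic_coherence(coherent_topic, toy_documents))
print("mixed:", semantic_coherence(mixed_topic, toy_documents))
```

A topic whose top words tend to appear in the same documents scores closer to zero, while a topic that mixes unrelated words scores more negative, which is why higher values are read as a better model.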
Correlated Topic Models
Topic 1 Top Words:
Highest Prob: covid, sars, cov, vaccine, coronavirus, patients, study
FREX: covid, sars, cov, vaccine, coronavirus, patients, infection
Lift: 1080x1350, 12yrs, 1500s, 1890s, 1940s, 19th, 20covid
Score: covid, sars, cov, vaccine, patients, infection, clinical
Topic 2 Top Words:
Highest Prob: people, just, can, like, don, allah, us
FREX: drew, shahada, hi, users, allah, sub, lebanon
Lift: appropriate, blanket, bother, christchurch, dua, everytime, hajj
Score: don, allah, islam, thank, religion, feel, think
Topic 3 Top Words:
Highest Prob: jewish, jews, muslim, israel, mosque, al, beautiful
FREX: jewish, israeli, shabbat, holocaust, palestinian, challah, hanukkah
Lift: 22nd, 6th, abdullah, adha, æ, afghan, ahmed
Score: jewish, jews, israeli, shalom, palestinian, israel, shabbat
Topic 4 Top Words:
Highest Prob: god, life, just, time, pray, day, got
FREX: baptized, surgery, watching, gone, chemo, path, suicidal
Lift: 13s, 160th, 17ocqakzil8, 1800s, 18f, 1900s, 22m
Score: god, felt, life, started, thank, went, baptized
Topic 5 Top Words:
Highest Prob: god, jesus, love, church, https, christ, us
FREX: lgbt, mark, yaqeeninstitute, abuaminaelias, abortion, pope, mum
Lift: 3a, 40mg, aaron, acknowledge, addicts, affirming, aisha
Score: god, jesus, yaqeeninstitute, abuaminaelias, love, lgbt, christians
Here we can see many of the same terms that appeared in the previous post's word clouds, along with key terms that put the earlier dictionary approaches in context.
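The four label types above rank the same vocabulary in different ways, which is why the Lift rows are full of rare tokens such as "1890s" while Highest Prob and FREX look more readable. The sketch below illustrates the general idea, assuming FREX is a weighted harmonic mean of a word's within-topic frequency rank and exclusivity rank, and Lift is the topic-word probability divided by the word's overall corpus frequency. It is a simplified illustration, not the stm package's implementation, and the `beta` matrix, vocabulary, and frequencies are made-up toy values.

```python
def ecdf_positions(values):
    """Map each value to the share of values that are less than or equal to it."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def frex_scores(beta, topic, weight=0.5):
    """Weighted harmonic mean of exclusivity rank and frequency rank.

    beta: list of topics, each a list of word probabilities.
    weight: weight placed on exclusivity (a convention of this sketch).
    """
    vocab_size = len(beta[topic])
    frequency = beta[topic]
    # Exclusivity: this topic's share of the word's probability across topics.
    exclusivity = [
        beta[topic][v] / sum(row[v] for row in beta) for v in range(vocab_size)
    ]
    freq_rank = ecdf_positions(frequency)
    excl_rank = ecdf_positions(exclusivity)
    return [
        1.0 / (weight / e + (1.0 - weight) / f)
        for e, f in zip(excl_rank, freq_rank)
    ]


def lift_scores(beta, topic, corpus_frequency):
    """Topic-word probability divided by the word's overall corpus frequency."""
    return [b / c for b, c in zip(beta[topic], corpus_frequency)]


def top_words(scores, vocab, n=3):
    """Return the n words with the highest scores."""
    order = sorted(range(len(vocab)), key=lambda i: scores[i], reverse=True)
    return [vocab[i] for i in order[:n]]


# Hypothetical toy values (NOT estimated from the Reddit data).
vocab = ["god", "just", "covid", "vaccine", "1890s"]
beta = [
    [0.05, 0.20, 0.45, 0.29, 0.01],  # a covid-like topic
    [0.55, 0.35, 0.05, 0.05, 0.00],  # a faith-like topic
]
corpus_frequency = [0.30, 0.28, 0.25, 0.169, 0.001]

print("Highest Prob:", top_words(beta[0], vocab))
print("FREX:", top_words(frex_scores(beta, 0), vocab))
print("Lift:", top_words(lift_scores(beta, 0, corpus_frequency), vocab))
```

In this toy example the rare token "1890s" tops the Lift ranking because dividing by a tiny corpus frequency inflates it, even though it contributes almost nothing to the topic. That mirrors the pattern in the Lift rows above.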
Plot Examples
In the plot above, “covid” is by far the dominant term across the two topics, followed by other covid-related terms. Interestingly, the remaining terms in Topics 1 and 4 are loosely grouped together, while “covid” sits apart from those groupings.
This graph shows little correlation among Topics 1, 3, and 5; Topics 2 and 4, however, are connected. I am not sure whether this is because the topics are largely isolated by subreddit, which I have not yet been able to distinguish within the topics.
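For readers unfamiliar with how such a graph is built, the sketch below shows one common approach: compute the correlation between each pair of topics' document-level proportions and draw an edge only when it exceeds a small cutoff. This assumes the graph came from thresholded correlations of that kind. The `theta` matrix, the 0.01 cutoff, and the function names are illustrative assumptions, and the numbers are toy values arranged so that Topics 2 and 4 tend to appear in the same documents.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_topic_pairs(theta, cutoff=0.01):
    """Return 1-indexed topic pairs whose proportions correlate above cutoff.

    theta: list of documents, each a list of topic proportions summing to 1.
    """
    n_topics = len(theta[0])
    columns = [[row[k] for row in theta] for k in range(n_topics)]
    return [
        (a + 1, b + 1)
        for a, b in combinations(range(n_topics), 2)
        if pearson(columns[a], columns[b]) > cutoff
    ]


# Hypothetical document-topic proportions (NOT the fitted model's values).
toy_theta = [
    [0.80, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.40, 0.05, 0.45, 0.05],
    [0.05, 0.05, 0.80, 0.05, 0.05],
    [0.05, 0.45, 0.05, 0.40, 0.05],
    [0.05, 0.05, 0.05, 0.05, 0.80],
    [0.70, 0.10, 0.05, 0.10, 0.05],
]

print(correlated_topic_pairs(toy_theta))
```

Because topic proportions within a document sum to one, most pairs correlate negatively by construction. An edge therefore signals that two topics appear together more often than that constraint alone would suggest, which fits the possibility that Topics 2 and 4 are shared by posts from the same community.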
The next step will be to separate the data into pandemic years (2020-2021) and the pre-pandemic year (2019) to see what comparisons can be made between faith-based conversations before and during the pandemic. I would also like to parse out the covid topic by subreddit to uncover any differences or similarities.
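As a rough sketch of that first step, each post could be tagged with a period label derived from its timestamp, so that the label is available as a grouping variable alongside the subreddit. The field names `subreddit` and `created_utc`, the helper names, and the toy posts below are assumptions made for illustration; the real data may be structured differently.

```python
from collections import Counter
from datetime import datetime, timezone


def period_for(created_utc):
    """Label a Unix timestamp as 'pre-pandemic' (2019) or 'pandemic' (2020-2021).

    Returns None for posts outside the study window.
    """
    year = datetime.fromtimestamp(created_utc, tz=timezone.utc).year
    if year == 2019:
        return "pre-pandemic"
    if year in (2020, 2021):
        return "pandemic"
    return None


def utc(year, month, day):
    """Helper that builds a UTC Unix timestamp for the toy data."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Hypothetical posts (NOT the collected Reddit data).
toy_posts = [
    {"subreddit": "Christianity", "created_utc": utc(2019, 7, 1)},
    {"subreddit": "Christianity", "created_utc": utc(2020, 4, 12)},
    {"subreddit": "Islam", "created_utc": utc(2021, 5, 13)},
    {"subreddit": "Judaism", "created_utc": utc(2019, 12, 22)},
    {"subreddit": "Judaism", "created_utc": utc(2022, 1, 5)},  # outside window
]

# Count posts per (subreddit, period), skipping posts outside the window.
counts = Counter()
for post in toy_posts:
    period = period_for(post["created_utc"])
    if period is not None:
        counts[(post["subreddit"], period)] += 1

for (subreddit, period), n in sorted(counts.items()):
    print(subreddit, period, n)
```

Checking the counts per subreddit and period before modeling also shows whether any group is too small to support a comparison.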