For this assignment, you will use the same data as the Factor Analysis assignment to discover the important topics in U.S. Governors’ tweets about the pandemic. The dataframe has four columns: the State, Name, and Party of each Governor, plus the Text of their tweets. The Text column is the one you should process and analyze.
Load all the libraries or functions that you will use for the rest of the assignment. Defining libraries and functions at the top of a report lets others see what they need for the report to compile correctly.
Load the Python libraries or functions that you will use for this section.
##r chunk
library(readr)
library(tidyr)
library(dplyr)
library(tm)
#library(tidyverse)
library(tidytext)
library(slam)
library(ggplot2)
#install.packages('topicmodels') for LDA function
#install.packages('tidyverse')
library(topicmodels)
library(reticulate)
use_condaenv('r-reticulate')
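# Installing into the reticulate environment only needs to run once; these lines can be commented out afterwards
#conda_install("pyLDAvis")
py_install("pandas")
py_install("pyLDAvis")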
##python chunk
import string
import pyLDAvis
import gensim.corpora as corpora
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
# If NLTK's stopword list or tokenizer models are missing, download them once:
# nltk.download('stopwords'); nltk.download('punkt')
ps = PorterStemmer()
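The chunk imports gensim.corpora. This suggests the processed tweets will later become a gensim Dictionary and a bag-of-words corpus, in which each document is a list of (token_id, count) pairs. The following is a plain-Python sketch of that representation, so you can inspect it without gensim. build_bow is a hypothetical helper, not part of gensim, and its id numbering is not guaranteed to match what corpora.Dictionary assigns.

```python
from collections import Counter


def build_bow(docs):
    """Map each token to an integer id and turn each tokenized document
    into a sorted list of (token_id, count) pairs."""
    token2id = {}
    corpus = []
    for doc in docs:
        counts = Counter(doc)
        # give unseen tokens the next free id, in alphabetical order within this document
        for token in sorted(counts):
            if token not in token2id:
                token2id[token] = len(token2id)
        corpus.append(sorted((token2id[t], c) for t, c in counts.items()))
    return token2id, corpus


docs = [["covid", "test", "test"], ["covid", "updat"]]
token2id, bow = build_bow(docs)
print(token2id)  # {'covid': 0, 'test': 1, 'updat': 2}
print(bow)       # [[(0, 1), (1, 2)], [(0, 1), (2, 1)]]
```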
Create the corpus for the model in R.
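##r chunk
# Path from the author's machine; change it to wherever your copy of the CSV lives
gov_tweets<- read.csv("C:/Users/JIANWEI LI/Desktop/anly540_temp/rstudio_python/anly540/topic_modeling_hw8/Gov Tweets Clean.csv")
# Build a corpus with one document per row of the Text column
import_corpus<- Corpus(VectorSource(gov_tweets$Text))
import_matrix<- DocumentTermMatrix(
  import_corpus,
  control = list(
    stemming = TRUE,
    stopwords = TRUE,
    wordLengths = c(3, Inf), # keep words of 3+ characters; replaces the obsolete minWordLength option
    removeNumbers = TRUE,
    removePunctuation = TRUE
  )
)
# Mean tf-idf of each term: relative frequency (count / document length) averaged over
# the documents that contain the term, times log2(number of documents / document frequency)
import_weight<- tapply(import_matrix$v / row_sums(import_matrix)[import_matrix$i],
  import_matrix$j,
  mean) * log2(nDocs(import_matrix) / col_sums(import_matrix > 0))
# Note: import_weight is not used to subset the columns of import_matrix, so every
# model below is fit on the unweighted term counts
import_matrix<- import_matrix[row_sums(import_matrix)>0,] # drop documents with no terms left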
k=10
seed = 234
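# lda_fit: variational EM (the default method) with alpha estimated from the data
lda_fit<-LDA(import_matrix,k = k,control=list(seed=seed))
# lda_fixed: variational EM with alpha held at its starting value
lda_fixed<- LDA(import_matrix,
  k =k,
  control = list(estimate.alpha = FALSE, seed = seed))
# lda_gibbs: topic assignments drawn by Gibbs sampling
lda_gibbs<- LDA(
  import_matrix,
  k = k,
  method = 'Gibbs',
  control = list(
    seed = seed
  )
)
# ctm_fit: correlated topic model, which allows topic proportions to be correlated
ctm_fit<- CTM(import_matrix,
  k = k,
  control = list(
    seed = seed,
    var = list(tol=10^-4),
    em = list(tol = 10^-3)
  )
)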
## warning: cg didn't converge (lambda)
lda_fit@alpha
## [1] 0.01563735
lda_fixed@alpha
## [1] 5
lda_gibbs@alpha
## [1] 5
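# Mean entropy of each model's per-document topic distributions:
# lower values mean documents are concentrated on fewer topics
sapply(list(lda_fit,lda_fixed,lda_gibbs,ctm_fit),
  function(x)
    mean(apply(posterior(x)$topics, 1, function(z)-sum(z*log(z)))))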
## [1] 0.1571801 0.8283624 1.0663677 0.5634535
terms(lda_fit,20)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6
## [1,] "covid" "covid" "covid" "covid" "covid" "covid"
## [2,] "updat" "live" "new" "will" "arizona" "health"
## [3,] "today" "today" "test" "lagov" "health" "will"
## [4,] "connecticut" "state" "updat" "state" "aztogeth" "help"
## [5,] "live" "updat" "posit" "laleg" "live" "spread"
## [6,] "test" "test" "weve" "louisiana" "will" "stay"
## [7,] "will" "watch" "will" "spread" "thank" "home"
## [8,] "maryland" "will" "case" "today" "work" "work"
## [9,] "watch" "press" "total" "health" "state" "can"
## [10,] "state" "can" "now" "test" "public" "need"
## [11,] "respons" "fight" "jersey" "respons" "arizonan" "care"
## [12,] "brief" "hold" "jerseyan" "help" "watch" "state"
## [13,] "covidma" "help" "may" "work" "provid" "protect"
## [14,] "news" "brief" "lost" "provid" "help" "safe"
## [15,] "discuss" "respons" "bergen" "updat" "protect" "public"
## [16,] "latest" "facebook" "live" "new" "updat" "keep"
## [17,] "posit" "provid" "health" "case" "can" "busi"
## [18,] "peopl" "spread" "watch" "continu" "azdh" "continu"
## [19,] "health" "inform" "bring" "need" "continu" "get"
## [20,] "effort" "work" "state" "care" "spread" "live"
## Topic 7 Topic 8 Topic 9 Topic 10
## [1,] "covid" "covid" "covid" "covid"
## [2,] "will" "new" "covidohioreadi" "provid"
## [3,] "missouri" "updat" "ohio" "today"
## [4,] "state" "will" "inthistogetherohio" "state"
## [5,] "today" "teamkentucki" "will" "respons"
## [6,] "test" "togetherki" "stayhomeohio" "updat"
## [7,] "updat" "today" "case" "will"
## [8,] "posit" "live" "can" "live"
## [9,] "health" "mexico" "today" "health"
## [10,] "case" "state" "confirm" "watch"
## [11,] "respons" "test" "updat" "can"
## [12,] "continu" "governor" "data" "spread"
## [13,] "director" "inform" "httpstcolwxirscb" "work"
## [14,] "work" "case" "hospit" "confer"
## [15,] "help" "health" "test" "new"
## [16,] "hospit" "home" "work" "help"
## [17,] "governor" "beshear" "take" "hold"
## [18,] "provid" "stay" "help" "test"
## [19,] "care" "announc" "care" "care"
## [20,] "can" "kentuckian" "dramyacton" "virginia"
terms(lda_gibbs,20)
## Topic 1 Topic 2 Topic 3 Topic 4
## [1,] "lagov" "arizona" "new" "live"
## [2,] "laleg" "aztogeth" "covid" "covid"
## [3,] "louisiana" "thank" "posit" "updat"
## [4,] "gov" "arizonan" "weve" "brief"
## [5,] "tennesse" "azdh" "case" "hold"
## [6,] "oklahoma" "health" "total" "watch"
## [7,] "tennessean" "provid" "test" "press"
## [8,] "brief" "alpolit" "lost" "connecticut"
## [9,] "test" "discuss" "jersey" "news"
## [10,] "edward" "media" "jerseyan" "respons"
## [11,] "pennsylvanian" "donat" "bring" "provid"
## [12,] "site" "drcarachrist" "bergen" "today"
## [13,] "free" "together" "essex" "confer"
## [14,] "phase" "hutchinson" "burlington" "facebook"
## [15,] "media" "public" "camden" "virginia"
## [16,] "bulletin" "governor" "now" "discuss"
## [17,] "pennsylvania" "news" "hudson" "latest"
## [18,] "group" "effort" "may" "test"
## [19,] "httpstcodtnqgjl" "grate" "gloucest" "ill"
## [20,] "read" "kid" "cumberland" "peopl"
## Topic 5 Topic 6 Topic 7 Topic 8
## [1,] "missouri" "covid" "covidohioreadi" "covid"
## [2,] "covidma" "updat" "ohio" "will"
## [3,] "director" "watch" "inthistogetherohio" "today"
## [4,] "governor" "teamkentucki" "covid" "state"
## [5,] "nevada" "togetherki" "stayhomeohio" "health"
## [6,] "mike" "governor" "case" "can"
## [7,] "brief" "live" "data" "spread"
## [8,] "missourian" "beshear" "httpstcolwxirscb" "help"
## [9,] "parson" "share" "confirm" "test"
## [10,] "texa" "kentuckian" "will" "work"
## [11,] "healthylivingmo" "will" "dramyacton" "updat"
## [12,] "resid" "inform" "can" "live"
## [13,] "texan" "httpstconhomytsv" "age" "continu"
## [14,] "montanan" "healthyathom" "hospit" "care"
## [15,] "posit" "nebraska" "onlin" "respons"
## [16,] "regard" "south" "peopl" "provid"
## [17,] "facebook" "facebook" "number" "stay"
## [18,] "idaho" "rickett" "ohioan" "home"
## [19,] "respond" "httpstcoefuktzn" "check" "need"
## [20,] "montana" "patriot" "counti" "order"
## Topic 9 Topic 10
## [1,] "maryland" "new"
## [2,] "ill" "mexico"
## [3,] "pandem" "case"
## [4,] "hampshir" "total"
## [5,] "minnesota" "posit"
## [6,] "minnesotan" "test"
## [7,] "delawar" "health"
## [8,] "respons" "state"
## [9,] "vermont" "announc"
## [10,] "oregon" "offici"
## [11,] "wyom" "addit"
## [12,] "ongo" "home"
## [13,] "main" "statewid"
## [14,] "affect" "nmdoh"
## [15,] "unpreced" "stay"
## [16,] "granit" "today"
## [17,] "aggress" "alltogethernm"
## [18,] "michigan" "alaska"
## [19,] "resid" "spread"
## [20,] "stater" "akgov"
##r chunk
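lda_fit_topics<- tidy(lda_fit, matrix = 'beta') # per-topic term probabilities (beta)
top_terms<- lda_fit_topics %>%
  group_by(topic) %>%
  top_n(10, beta) %>% # ten highest-probability terms per topic (ties can add more)
  ungroup() %>%
  arrange(topic, -beta)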
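The sapply() comparison in the chunk above scores each model by the mean entropy of its document-topic distributions: -sum(p * log(p)), averaged over documents. With k = 10, the largest possible entropy is log(10) ≈ 2.30. The 0.157 for lda_fit therefore means most documents are assigned almost entirely to one topic. This fits its small estimated alpha (0.0156, versus 5 for lda_fixed and lda_gibbs), which is consistent with topicmodels' default starting value of 50/k.

The following Python sketch computes the same measure on made-up probabilities. mean_topic_entropy is a hypothetical helper. Unlike the R one-liner, which returns NaN for a zero probability, it skips zero entries.

```python
import math


def mean_topic_entropy(doc_topic):
    """Average Shannon entropy (natural log) of each document's topic distribution."""
    entropies = []
    for row in doc_topic:
        # zero probabilities are skipped: p * log(p) tends to 0 as p -> 0
        entropies.append(-sum(p * math.log(p) for p in row if p > 0))
    return sum(entropies) / len(entropies)


# Hypothetical 2-topic posteriors for three documents
concentrated = [[0.99, 0.01], [0.98, 0.02], [0.01, 0.99]]
spread_out = [[0.5, 0.5], [0.6, 0.4], [0.4, 0.6]]
print(round(mean_topic_entropy(concentrated), 3))
print(round(mean_topic_entropy(spread_out), 3))
```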
Clean up the text and create the Document Term Matrix.
##r chunk
# ggplot2 theme with no grid lines or panel background, used for the topic plot
cleanup<- theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line.x = element_line(color = 'black'),
axis.line.y = element_line(color = 'black'),
legend.key = element_rect(fill = 'white'),
text = element_text(size = 10)
)
Weight the matrix to remove all the high and low frequency words.
##r chunk
# import_weight (mean tf-idf per term) is computed in the corpus chunk above
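import_weight is computed in the corpus chunk, right after the DocumentTermMatrix. For each term, it takes the relative frequency (count divided by document length), averages it over the documents that contain the term, and multiplies by log2 of the number of documents over the term's document frequency.

- A term that appears in nearly every document, as covid likely does here, scores close to 0.
- A very rare term also tends to score low, unless it is used heavily where it occurs.

Keeping only terms above a cutoff therefore tends to trim both ends. The pure-Python sketch below applies that calculation to toy counts. mean_tfidf, filter_terms, and the 0.1 cutoff are illustrative assumptions. The R counterpart of the filter would be something along the lines of import_matrix[, import_weight >= cutoff], which the corpus chunk does not apply.

```python
import math


def mean_tfidf(doc_term_counts):
    """doc_term_counts: list of {term: count} dicts, one per document.
    Returns {term: mean_tf * log2(N / df)}, where mean_tf averages
    count / doc_length over the documents that contain the term."""
    n_docs = len(doc_term_counts)
    tf_values = {}  # term -> relative frequencies in the documents containing it
    for counts in doc_term_counts:
        doc_len = sum(counts.values())
        for term, count in counts.items():
            if count > 0:
                tf_values.setdefault(term, []).append(count / doc_len)
    return {
        term: (sum(tfs) / len(tfs)) * math.log2(n_docs / len(tfs))
        for term, tfs in tf_values.items()
    }


def filter_terms(weights, cutoff):
    """Keep only the terms whose mean tf-idf reaches the (hypothetical) cutoff."""
    return {term for term, w in weights.items() if w >= cutoff}


docs = [
    {"covid": 2, "test": 2},
    {"covid": 1, "ohio": 3},
    {"covid": 4},
]
weights = mean_tfidf(docs)
print(weights["covid"])  # 0.0, because covid is in all 3 documents and log2(3/3) = 0
print(sorted(filter_terms(weights, 0.1)))  # ['ohio', 'test']
```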
Run an LDA Fit model (only!).
##r chunk
# lda_fit is fit in the corpus chunk above: LDA(import_matrix, k = k, control = list(seed = seed))
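topicmodels::LDA fits lda_fit by variational EM, its default method. The corpus chunk also fits lda_gibbs with method = 'Gibbs'. Gibbs sampling is the easier of the two to write out, so here is a minimal collapsed Gibbs sampler for LDA in plain Python to illustrate what that method does.

This is a teaching sketch, not topicmodels' implementation. It assumes:

- symmetric alpha and beta priors,
- a fixed number of sweeps,
- no burn-in or thinning.

gibbs_lda and the toy corpus are hypothetical.

```python
import random


def gibbs_lda(docs, k, alpha, beta, iterations=200, seed=234):
    """Collapsed Gibbs sampling for LDA with symmetric priors.

    docs: list of documents, each a list of integer word ids (0 .. V-1).
    Returns (theta, phi): per-document topic probabilities and per-topic
    word probabilities, estimated from the final sample.
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    ndk = [[0] * k for _ in docs]               # tokens in document d assigned to topic j
    nkw = [[0] * vocab_size for _ in range(k)]  # tokens of word w assigned to topic j
    nk = [0] * k                                # tokens assigned to topic j overall
    z = []                                      # current topic of every token

    def update(d, w, t, delta):
        ndk[d][t] += delta
        nkw[t][w] += delta
        nk[t] += delta

    # random initial assignment
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            update(d, w, t, 1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # take this token out of the counts
                # p(z = j | all other assignments), up to a normalizing constant
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + vocab_size * beta)
                    for j in range(k)
                ]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, w, z[d][i], 1)

    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[j][w] + beta) / (nk[j] + vocab_size * beta) for w in range(vocab_size)]
           for j in range(k)]
    return theta, phi


# Toy corpus: words 0 and 1 co-occur, as do words 2 and 3
docs = [[0, 0, 1, 1], [0, 1, 0, 1], [2, 3, 2, 3], [3, 2, 3, 2]]
theta, phi = gibbs_lda(docs, k=2, alpha=0.1, beta=0.01, iterations=100)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
```

Fixing the seed makes the run reproducible, the same role that seed = 234 plays in the R calls.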
Create a plot of the top ten terms for each topic.
##r chunk
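top_terms %>%
  # order terms by beta within each topic so every facet sorts correctly
  mutate(term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap( ~ topic, scales = "free") +
  scale_x_reordered() + # drop the topic suffix that reorder_within() adds to labels
  cleanup +
  coord_flip()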
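top_terms keeps, within each topic, the ten terms with the largest beta: the per-topic term probability that tidy() extracts from lda_fit. The following sketch performs the same selection in Python on a small made-up topic-by-term matrix. top_terms_per_topic is a hypothetical helper. One difference: dplyr::top_n() keeps ties, so a topic can contribute more than ten rows, whereas slicing a sorted list always keeps exactly n terms.

```python
def top_terms_per_topic(phi, vocab, n=10):
    """Return {topic_number: [(term, probability), ...]} with the n most
    probable terms of each topic (one row of phi), highest first."""
    result = {}
    for topic, row in enumerate(phi, start=1):  # number topics from 1, as R does
        ranked = sorted(zip(vocab, row), key=lambda pair: pair[1], reverse=True)
        result[topic] = ranked[:n]
    return result


vocab = ["covid", "test", "ohio", "updat"]
phi = [
    [0.50, 0.30, 0.05, 0.15],  # topic 1
    [0.40, 0.05, 0.45, 0.10],  # topic 2
]
for topic, pairs in top_terms_per_topic(phi, vocab, n=2).items():
    print(topic, [term for term, _ in pairs])
# 1 ['covid', 'test']
# 2 ['ohio', 'covid']
```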
Transfer the Text column of gov_tweets to Python and convert it to a list for processing.
##python chunk
tweets = list(r.gov_tweets["Text"])
len(tweets)
## 49
Process the text using Python.
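##python chunk
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))  # build the stopword set once, not once per word
processed_tweets = []
for tweet in tweets:
    tweet = tweet.lower()  # lowercase
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))  # strip ASCII punctuation
    tweet = nltk.word_tokenize(tweet)  # split into word tokens
    tweet = [word for word in tweet if word not in stop_words]  # drop English stopwords
    tweet = [ps.stem(word) for word in tweet]  # Porter-stem each token
    processed_tweets.append(tweet)
len(processed_tweets)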
## 49
processed_tweets[0]
## ['ive', 'extend', 'covid', 'public', 'health', 'disast', 'emerg', 'anoth', '45day', 'everi', 'industri', 'sector', 'ar', 'affect', 'crisi', 'import', 'continu', 'support', 'protect', 'industri', 'peopl', 'threat', 'longer', 'immin', 'httpstco2an22n0gwr', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050520', 'httpstcocmlypdifq2', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstco1cws1vdebg', 'faith', 'commun', 'support', 'effort', 'fight', 'covid', 'miss', 'inperson', 'fellowship', 'mani', 'church', 'continu', 'meet', 'remot', 'present', 'guidanc', 'give', 'hous', 'worship', 'option', 'minist', 'congreg', 'httpstco9rvgmtf6zr', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050420', 'httpstcoh3uasas2y', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcouphoummna4', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050220', 'httpstcobizdnvigiz', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcobizdnvigiz', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050120', 'httpstcojkdxbuh5qn', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '043020', 'httpstcobg9lhjfbay', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcotlb60qwcgv', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042920', 'httpstcontbjk8vtw4', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcontbjk8vtw4', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042820', 'httpstcojo5hvzntai', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcojo5hvzntai', 'want', 'thank', 'walmart', 
'quest', 'diagnost', 'open', 'drivethru', 'covid', 'test', 'site', 'central', 'ar', 'symptomat', 'arkansan', 'health', 'care', 'worker', 'first', 'respond', 'increas', 'test', 'capac', 'enhanc', 'gather', 'data', 'look', 'lift', 'restrict', 'ar', 'last', 'week', 'encourag', 'symptomat', 'arkansan', 'get', 'test', 'weekend', 'thank', 'respons', 'partnership', 'hospit', 'test', 'site', 'exceed', 'goal', 'conduct', 'gt1500', 'test', 'day', 'give', 'us', 'accur', 'sampl', 'covid', 'number', 'ar', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042720', 'httpstcomxdnv1jf9n', 'im', 'hold', 'news', 'confer', 'noon', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcomxdnv1jf9n', 'arkansa', 'surg', 'campaign', 'continu', 'today', 'think', 'symptom', 'covid19', 'dont', 'wait', 'get', 'test', 'httpstcosgcmw5rcac', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042520', 'httpstcoocrlbfa0u', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoocrlbfa0u', 'wwii', 'veteran', 'loui', 'strickland', '100', 'year', 'old', 'today', 'fought', 'normandi', 'daughter', 'fellow', 'vet', 'arkansa', 'state', 'veteran', 'home', 'threw', 'parti', 'today', 'daughter', 'famili', 'couldnt', 'attend', 'covid', 'restrict', 'happi', 'birthday', 'loui', 'thank', 'serv', 'httpstco1iefo9qnna', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042420', 'httpstcod4s9w018fa', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcod4s9w018fa', 'encourag', 'symptom', 'fever', 'cough', 'short', 'breath', 'get', 'test', 'covid19', 'within', 'next', 'two', 'day', 'think', 'symptom', 'dont', 'wait', 'get', 'test', 'httpstcoekhxdj9e6f', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042320', 'httpstcooiqojakjpn', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 
'covid19', 'respons', 'watch', 'httpstco497r9iiwqa', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042220', 'httpstcomiq8tuvsmd', 'outbreak', 'covid19', 'import', 'ever', 'arkansan', 'particip', '2020censu', 'respond', 'censu', 'socialdistanc', 'friendli', 'submit', 'respons', 'phone', 'mail', 'onlin', 'httpstcoj9doqyuu0q', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcomiq8tuvsmd', 'today', 'announc', 'creation', 'covid19', 'test', 'work', 'group', 'ensur', 'arkansa', 'adequ', 'test', 'process', 'place', 'pursu', 'publichealth', 'econom', 'recoveri', 'strategi', 'join', 'work', 'group', 'first', 'meet', 'afternoon', 'httpstcoaetnuswqbp', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042120', 'httpstcorpg0ncb3si', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcorpg0ncb3si', 'proof', 'arkansan', 'care', '2', 'peopl', 'jonesboro', 'organ', 'freepizza', 'night', 'encourag', 'town', 'covid', 'plu', 'tornado', 'word', 'spread', 'donat', 'arriv', 'becam', 'oper', 'full', 'belli', '2', 'week', '12', 'restaur', '2500', 'free', 'supper', 'that', 'spirit', 'arkansa', 'httpstcoe7xiqq9vyy', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoziardpdz8d', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041920', 'httpstcorztbs9ljlo', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcorztbs9ljlo', '23', 'task', 'forc', 'includ', '27', 'leader', 'privat', 'sector', 'public', 'agenc', 'examin', 'impact', 'covid19', 'busi', 'industri', 'state', 'select', 'steuart', 'walton', 'chairman', '13', 'today', 'creat', 'governor', 'econom', 'recoveri', 'task', 'forc', 'develop', 'industryspecif', 'strategi', 'make', 'recommend', 'arkansass', 'econom', 'recoveri', 'effect', 
'covid19', 'httpstco4brjy6dhb7', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041820', 'httpstcod5rdrkf4gc', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcod5rdrkf4gc', 'offer', 'condol', 'famili', 'chief', 'petti', 'offic', 'charl', 'robert', 'thacker', 'jr', 'fort', 'smith', 'nativ', 'lost', 'life', 'covid19', '42', 'year', 'old', 'grate', 'servic', 'countri', 'httpstcooi8ouren6q', '1st', 'report', 'governor', 'medic', 'advisori', 'committe', 'postpeak', 'covid19', 'respons', 'priorit', 'restor', 'arkansass', 'economi', 'time', 'fashion', 'protect', 'vulner', 'maintain', 'adequ', 'health', 'care', 'public', 'health', 'capac', 'prevent', 'resurg', 'covid19', '12', 'care', 'review', 'realdonaldtrump', 'model', 'reopen', 'economi', 'first', 'report', 'governor', 'medic', 'advisori', 'committe', 'postpeak', 'covid19', 'respons', 'base', 'arkansass', 'current', 'public', 'health', 'data', 'hope', 'begin', 'lift', 'restrict', 'may', '4', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041720', 'httpstcoqgnx0gucqj', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoqgnx0gucqj', 'arkansasdw', 'launch', 'new', 'websit', 'provid', 'regularli', 'updat', 'inform', 'regard', 'covid19rel', 'unemploy', 'benefit', 'httpstcoa3obxp2vot', 'onestop', 'shop', 'answer', 'frequentlyask', 'question', 'portal', 'file', 'claim', 'recent', 'news', 'articl', 'meet', 'newlyform', 'governor', 'medic', 'advisori', 'committe', 'postpeak', 'covid19', 'respons', 'first', 'time', 'tomorrow', 'discuss', 'public', 'health', 'strategi', 'futur', 'finish', 'confer', 'call', 'presid', 'covid19', 'task', 'forc', 'brief', 'governor', 'open', 'america', 'talk', 'mean', 'arkansa', 'tomorrow', 'daili', 'updat', 'httpstcotocnxbhnqd', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041620', 'httpstcocd49nd7us', 
'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcocd49nd7us', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041520', 'httpstcobuqqnb7rk', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcobuqqnb7rk', 'look', 'forward', 'join', 'shannonbream', 'foxnewsnight', 'discuss', 'arkansass', 'covid19', 'respons', 'httpstcoug13ii4zv7', 'issu', '2', 'execut', 'order', 'today', '1st', 'allow', 'first', 'respond', 'frontlin', 'health', 'care', 'worker', 'qualifi', 'worker', 'comp', 'work', 'respons', 'caus', 'contract', 'covid19', '2nd', 'provid', 'liabil', 'immun', 'medic', 'emerg', 'respond', 'crisi', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041420', 'httpstcog7ot43zqbv', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcog7ot43zqbv', 'ive', 'form', 'medic', 'advisori', 'committe', 'help', 'guid', 'public', 'health', 'strategi', 'arkansa', 'reach', 'peak', 'number', 'covid19', 'case', 'committe', 'examin', 'protocol', 'make', 'recommend', 'necessari', 'avoid', 'resurg', 'covid19', 'peak', 'httpstcoaggw9p05q7', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041320', 'httpstcor6rwpmrudr', 'someth', 'bear', 'mind', 'face', 'covid19', 'arkansass', 'popul', 'isnt', 'fulli', 'repres', '2020censu', 'arkansa', 'hospit', 'servic', 'commun', 'health', 'center', 'could', 'impact', 'neg', 'next', 'decad', 'make', 'sure', 'count', 'httpstcoj9doqyuu0q', 'thank', 'dr', 'fauci', 'recogn', 'effort', 'fight', 'covid19', 'arkansa', 'let', 'stay', 'commit', 'win', 'fight', 'httpstcorrunjg0ovz', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041020', 'httpstcoxphibyj9d7', 'live', 'kark4newsfox16new', 'even', 'answer', 'question', 'covid19', 'public', 'health', 'emerg', 'arkansa', 'answer', 'mani', 'question', 
'tune', 'virtual', 'town', 'hall', '7pm', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040920', 'httpstcon8jincdhqd', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040820', 'httpstcoepxqn3fsfg', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040720', 'httpstcovsknrrbty3', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcovsfmoen6n3', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040620', 'httpstcom3ymchnmx3', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcom3ymchnmx3', '23', 'appreci', 'entergyark', 'support', 'new', 'covid19', 'relief', 'fund', 'announc', 'today', 'news', 'brief', 'temporarili', 'suspend', 'disconnect', 'servic', 'custom', 'cant', 'pay', '13', 'crisi', 'place', 'hardship', 'mani', 'arkansan', 'incred', 'respons', 'need', 'other', 'im', 'pleas', 'state', 'partner', 'w', 'smartgivingar', 'support', 'covid19', 'relief', 'fund', 'fund', 'arkansan', 'donat', 'help', 'neighbor', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040520', 'httpstcoh42a1l4w92', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcohaewl5zjmo', '22', 'grate', 'arkansan', 'taken', 'extraordinari', 'effort', 'practic', 'social', 'distanc', 'flatten', 'curv', 'day', 'evalu', 'new', 'step', 'take', 'measur', 'provid', 'protect', 'arkansan', 'prepar', 'peak', 'number', 'covid19', 'case', '12', 'issu', 'execut', 'order', 'mandat', 'new', 'safeti', 'measur', 'commerci', 'lodg', 'shortterm', 'rental', 'institut', 'oper', 'order', 'protect', 'public', 'health', 'covid19', 'crisi', 'httpstcorplzmo2o7p', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040420', 'httpstcohg5h89y2tt', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 
'covid19', 'respons', 'watch', 'httpstcohg5h89y2tt', 'arkansass', 'request', 'feder', 'disast', 'assist', 'result', 'covid19', 'approv', 'thank', 'feder', 'deleg', 'fema', 'potu', 'approv', 'assist', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040320', 'httpstcoja0ry2vrpi', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcohplzst7npc', 'linglin42108652', 'great', 'suggest', 'linglin42108652', 'test', 'posit', 'covid19', 'selfquarantin', 'leav', 'home', 'design', 'friend', 'famili', 'member', 'care', 'includ', 'get', 'groceri', 'would', 'help', 'mitig', 'spread', 'viru', 'askgovhutchinson', 'chris70909106', 'start', 'tomorrow', 'arkansa', 'state', 'park', 'implement', 'new', 'safeti', 'measur', 'like', 'dayus', 'oper', 'reduc', 'risk', 'overcrowd', 'park', 'discourag', 'visitor', 'outofst', 'threat', 'covid19', 'pass', 'askgovhutchinson', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040220', 'httpstcoyumrweehi6', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoduh0dg5udr', 'covid19', 'relat', 'question', 'ill', 'particip', 'nationwid', 'askthegovernor', 'twitter', 'q', 'today', '500', 'ill', 'answer', 'mani', 'question', 'repli', 'tweet', 'question', 'askgovhutchinson', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040120', 'httpstcopugz6bjjdz', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcopugz6bau57', '13', 'current', 'number', 'covid19', 'case', 'arkansa', 'lower', 'project', 'number', 'case', 'provid', 'adhpio', 'last', 'week', 'httpstcocthvdfxd13', 'suspect', 'covid19', 'symptom', 'question', 'regard', 'children', 'covid19', 'call', 'adhpio', '18008037847', 'archildren', '18007433616', 'httpstco4bksjpaizf', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '033120', 
'httpstco8ka5d0nv2j', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstco8ka5d0wje9', 'need', 'covid19', 'data', 'ar', 'test', 'give', 'us', 'data', 'increas', 'test', 'immedi', 'need', 'right', 'well', 'protect', 'health', 'care', 'worker', 'whole', 'team', 'includ', 'aremerg', 'adhpio', 'work', 'procur', 'addl', 'test', 'nationaldoctorsday', 'commend', 'extraordinari', 'physician', 'protect', 'heal', 'daili', 'especi', 'grate', 'sacrific', 'covid19', 'outbreak', 'year', 'support', 'doctor', 'social', 'distanc', 'save', 'live', 'crisi', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '033020', 'httpstcoenrkzchmdv', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoenrkzchmdv', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032920', 'httpstcokh8irkzaen', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcokh8irkhypn', 'great', 'guidelin', 'arkansa', 'depart', 'park', 'heritag', 'tourism', 'must', 'part', 'flatten', 'curv', 'covid19ark', 'httpstcorf66jnabev', 'uplift', 'arkansa', 'provid', 'free', 'covid19', 'resourc', 'busi', 'organ', 'individu', 'thank', 'littlerockcvb', 'manganholcomb', 'weareteamsi', 'creat', 'commun', 'websit', 'httpstcon3rxnyajlt', 'httpstco8foo9cgtcf', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032820', 'httpstcoyvhcciet8a', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoyvhcciet8a', 'live', 'governor', 'hutchinson', 'sign', 'legisl', 'establish', 'covid19', 'raini', 'day', 'fund', 'httpstcoyghs1rkazn', 'sign', 'legisl', 'establish', 'covid19', 'raini', 'day', 'fund', 'immedi', 'upon', 'passag', 'around', 'midnight', 'tonight', 'watch', 'bill', 'sign', 'httpstcoyghs1rkazn', 'week', 'radio', 'address', 'share', 'initi', 'launch', 
'assist', 'rural', 'hospit', 'provid', 'support', 'front', 'line', 'treat', 'covid19', 'patient', 'learn', 'httpstcoqiwyd8yuxd', 'httpstcon56acyckeg', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032720', 'httpstcotnebva6c', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcotnebva6c', 'import', 'arkansan', 'stay', 'home', 'possibl', 'work', 'mitig', 'spread', 'covid19', 'necessari', 'home', 'mind', 'public', 'space', 'keep', 'safe', 'distanc', 'least', 'six', 'feet', 'other', 'need', 'crowd', 'order', 'us', 'success', 'arkansa', 'slow', 'upward', 'trend', 'line', 'covid19', 'public', 'need', 'abid', 'guidanc', 'arkansa', 'depart', 'health', 'adhpio', 'guidelin', 'direct', 'found', 'httpstcotfzzurkedi', 'grate', 'feder', 'deleg', 'commun', 'support', 'theyv', 'given', 'state', 'especi', 'passag', 'covid19', 'relief', 'bill', 'senat', 'bill', 'provid', 'confid', 'arkansan', 'whose', 'employ', 'small', 'busi', 'affect', 'outbreak', '44', 'plan', 'also', 'propos', 'addit', 'payment', '250', 'week', 'nonphysician', 'direct', 'care', 'worker', '500', 'week', 'nonphysician', 'direct', 'care', 'worker', 'work', 'facil', 'covid19', 'present', 'read', 'initi', 'httpstcorolc0f71zr', '14', 'today', 'announc', '116m', 'initi', 'directli', 'address', 'covid19', 'crisi', 'burden', 'rural', 'hospit', 'health', 'care', 'provid', 'propos', 'provid', 'improv', 'access', 'care', 'citizen', 'keep', 'provid', 'open', 'workforc', 'employ', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032620', 'httpstcofhvh593kkm', 'im', 'hold', 'news', 'confer', '230pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcofhvh593kkm', 'grate', 'member', 'arkansasguard', 'assist', 'uamshealth', 'screen', 'patient', 'administ', 'covid19', 'test', 'drivethrough', 'test', 'site', 'respond', 'faith', 'urgent', 'need', 'state', 'httpstcoyrkk6nalx1', '14', 'grate', 'doctor', 
'nurs', 'lab', 'technician', 'health', 'care', 'profession', 'front', 'line', 'covid19', 'look', 'futur', 'fatigu', 'among', 'individu', 'increas', 'demand', 'health', 'care', 'worker', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032520', 'httpstco49lubn4lki', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstco49lubn4lki', 'arkansasedc', 'work', 'hard', 'behalf', 'arkansa', 'busi', 'covid19', 'outbreak', 'covid19', 'busi', 'employ', 'resourc', 'visit', 'httpstcoqly8bu2tcm', 'issu', 'execut', 'order', '2005', 'leverag', 'telehealth', 'ar', 'covid19', 'outbreak', 'doctor', 'establish', 'new', 'patient', 'phone', 'minim', 'number', 'sick', 'patient', 'wait', 'room', 'mitig', 'spread', 'viru', 'httpstcolrccgwf3d', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032420', 'httpstcojfq4za5g9b', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcojfq4za5g9b', 'arkansa', 'blood', 'institut', 'abi', 'blood', 'collect', 'organ', 'central', 'arkansa', 'donat', 'center', 'littl', 'rock', 'north', 'littl', 'rock', 'hot', 'spring', 'abl', 'pleas', 'consid', 'donat', 'blood', 'face', 'covid19', 'public', 'health', 'emerg', 'abiblood', '23', 'due', 'covid19', 'outbreak', 'chang', 'individu', 'tax', 'file', 'deadlin', 'offici', 'state', 'revenu', 'forecast', 'lower', '3531m', 'necessit', 'special', 'session', 'gener', 'assembl', 'address', 'shortfal', 'war', '173m', 'unalloc', 'surplu', 'last', 'year', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032320', 'httpstcoabl6dguzlh', 'im', 'hold', 'news', 'confer', '230pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoabl6dguzlh', 'busi', 'owner', 'encourag', 'employe', 'protect', 'person', 'health', 'safeti', 'other', 'clean', 'surfac', 'frequent', 'touch', 'avoid', 'meet', 'requir', 'close', 'proxim', 'post', 
'symptom', 'covid19', 'fever', 'cough', 'short', 'breath', '14', 'today', 'total', '165', 'posit', 'covid19', 'case', 'ar', 'largest', 'increas', 'case', '24hour', 'period', 'weve', 'seen', 'far', 'number', 'reflect', 'increas', 'adhpio', 'test', 'capac', 'covid19', 'spread', 'import', 'arkansan', 'follow', 'cdc', 'guidelin', 'keep', 'famili', 'safe', 'encourag', 'peopl', 'state', 'practic', 'cdc', 'current', 'recommend', 'avoid', 'social', 'gather', '10', 'help', 'mitig', 'viru', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032220', 'httpstcowryqkexpeq', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcowryqkexpeq', '13', 'today', 'announc', 'total', '118', 'posit', 'covid19', 'case', 'arkansa', 'base', 'seen', 'state', 'arkansa', 'like', 'reach', 'peak', 'covid19', 'case', '6', '8', 'week', 'project', 'peak', '1000', 'patient', 'hospit', 'social', 'distanc', 'mean', 'stay', 'home', 'although', 'effect', 'way', 'mitig', 'spread', 'covid19', 'walk', 'hike', 'fish', 'outdoor', 'activ', 'consist', 'social', 'distanc', 'practic', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032120', 'httpstcoag7kynpyss', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoag7kynpyss', 'protect', 'arkansass', 'workforc', 'busi', 'covid19', 'outbreak', 'top', 'prioriti', 'arkansa', 'busi', 'follow', 'guidelin', 'ensur', 'safeti', 'employe', 'continu', 'oper', 'httpstcobmiudr9f8v', 'team', 'top', 'health', 'care', 'profession', 'adhpio', 'uamshealth', 'agenc', 'hospit', 'across', 'state', 'work', 'hard', 'keep', 'inform', 'covid19', 'outbreak', 'develop', 'guidelin', 'arkansan', 'follow', 'stay', 'safe', 'healthi', 'httpstco245ghogm2h', 'accord', 'cdc', 'best', 'way', 'prevent', 'spread', 'covid19', 'avoid', 'expos', 'viru', 'covid19', 'thought', 'spread', 'peopl', 'within', '6', 'feet', 'practic', 'social', 
'distanc', 'help', 'mitig', 'outbreak', 'state', 'rise', 'number', 'posit', 'covid19', 'case', 'arkansa', 'reflect', 'addit', 'test', 'capac', 'want', 'number', 'rise', 'mean', 'locat', 'isol', 'case', 'across', 'state', 'otherwis', 'would', 'go', 'undetect', 'slow', 'spread', 'viru', 'week', 'radio', 'address', 'discuss', 'execut', 'order', 'issu', 'last', 'week', 'expand', 'telemedicin', 'covid19', 'outbreak', 'learn', 'httpstcosybmcinpsx', 'httpstcosv6tqbg2tq', '15', 'today', 'announc', 'current', '96', 'posit', 'case', 'covid19', 'ar', 'affect', '3', 'longterm', 'care', 'facil', 'appl', 'creek', 'nurs', 'rehab', 'centerton', 'villag', 'gener', 'baptist', 'west', 'pine', 'bluff', 'briarwood', 'nurs', 'home', 'rehab', 'littl', 'rock', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032020', 'httpstcohxq5ei1rfp', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcohxq5ei1rfp', 'last', 'week', 'activ', 'arkansasguard', 'assist', 'covid19', 'respons', 'grate', 'work', 'nation', 'guard', 'medic', '39th', 'infantri', 'brigad', 'combat', 'team', 'support', 'adhpio', 'emerg', 'oper', 'center', 'respond', 'question', 'arkansan', 'httpstcoa10wheke6f', 'thank', 'team', 'member', 'adhpio', 'emerg', 'oper', 'center', 'hard', 'work', 'dedic', 'covid19', 'outbreak', 'commend', 'servic', 'peopl', 'arkansa', 'face', 'public', 'health', 'emerg', 'httpstcotcvxd3ddmd', 'join', 'call', 'potu', 'vp', 'governor', 'afternoon', 'covid19', 'pandem', 'take', 'everi', 'step', 'possibl', 'make', 'nation', 'emerg', 'shortliv', 'possibl', 'httpstcow4r7vpt1bu', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', 'today', '031920', 'httpstcowkarmnbjiy', 'im', 'hold', 'news', 'confer', '230pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcodq4tv1lknb', 'today', 'announc', 'relief', 'arkansa', 'busi', 'childcar', 'provid', 'eas', 'covid19', 'impact', 'read', 
'httpstcoithklggfa0', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '031820', 'httpstco2bwhflgv66', '33', 'discourag', 'unnecessari', 'outofst', 'travel', 'time', 'mitig', 'slow', 'spread', 'covid19', 'student', 'educ', 'return', 'school', 'spring', 'break', 'circumst', 'allow', '13', 'today', 'announc', 'six', 'new', 'posit', 'covid19', 'case', 'arkansa', 'strategi', 'ahead', 'curv', 'mitig', 'includ', 'close', 'school', 'two', 'week', 'prevent', 'larg', 'gather', 'dont', 'signific', 'commun', 'spread', 'state', 'im', 'hold', 'news', 'confer', 'fayettevil', '1115', 'provid', 'updat', 'ongo', 'covid19', 'respons', 'watch', 'httpstcolis0p97i2v', '13', 'today', 'arkansa', '16', 'confirm', 'case', 'covid19', 'individu', 'isol', 'situat', 'monitor', 'close', 'announc', 'yesterday', 'activ', 'arkansa', 'nation', 'guard', 'assist', 'covid19', 'respons', 'live', 'daili', 'media', 'brief', 'covid19', '031420', 'httpstco6m2t2zqq60', 'week', 'radio', 'address', 'commend', 'respons', 'respons', 'leader', 'arkansa', 'work', 'togeth', 'prevent', 'spread', 'covid19', 'state', 'httpstcob7hf02q5lm', 'httpstcozcwpcdsolz', 'live', 'daili', 'media', 'brief', 'covid19', 'httpstcoga1wy1cwg', '44', 'ar', 'dept', 'health', 'uamshealth', 'resourc', 'inform', 'regard', 'covid19', 'question', 'show', 'symptom', 'call', '247', 'hotlin', '18008037847', 'visit', 'httpstcoauhqi6j1b7', 'httpstcov7ipofjpoq', 'speak', 'health', 'care', 'profession', '14', 'yesterday', 'announc', 'first', 'presumpt', 'posit', 'case', 'covid19', 'arkansa', 'learn', 'today', 'five', 'addit', 'presumpt', 'posit', 'case', 'state', 'uncommon', 'seen', 'spread', 'progress', 'similar', 'fashion', 'state', '22', 'month', 'ar', 'prepar', 'respond', 'covid19', 'take', 'measur', 'mitig', 'spread', 'viru', 'practic', 'healthi', 'habit', 'wash', 'hand', 'frequent', 'stay', 'home', 'your', 'feel', 'well', 'beyond', 'continu', 'conduct', 'busi', 'normal', 'activ', '55', 'take', 'covid19', 'outbreak', 
'serious', 'take', 'precaut', 'im', 'keep', 'normal', 'schedul', 'continu', 'busi', 'go', 'school', 'enjoy', 'beauti', 'spring', '25', 'today', 'confirm', 'case', 'covid19', 'state', 'current', 'monitor', '100', 'travel', 'daili', 'adhpio', 'checkin', 'guidanc', '12', 'neg', 'test', 'result', 'state', 'lab', 'equip', 'test', 'hous']
Create the dictionary and term document matrix in Python.
##python chunk
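# Map every unique token in the processed tweets to an integer id
dic = corpora.Dictionary(processed_tweets)
# Convert each tokenized document to a bag of words: a list of (token_id, count) pairs
doc_term_matrix = [dic.doc2bow(doc) for doc in processed_tweets]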
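To make these two objects concrete, here is a minimal pure-Python sketch of what a dictionary and a bag-of-words conversion produce, using two toy token lists (hypothetical, not taken from the tweet data). The helpers `build_vocab` and `to_bow` are illustrative names only and are not part of gensim; gensim's own id assignment may differ in detail, but the result has the same shape: one list of `(token_id, count)` pairs per document.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_vocab(docs: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every unique token (hypothetical helper)."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        # New tokens in each document get the next free ids, in sorted order
        for token in sorted(set(doc)):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_bow(doc: List[str], vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    """Count known tokens and return (token_id, count) pairs sorted by id."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


docs = [["stay", "home", "stay", "safe"], ["covid19", "test", "home"]]
vocab = build_vocab(docs)
bows = [to_bow(d, vocab) for d in docs]
print(vocab)  # {'home': 0, 'safe': 1, 'stay': 2, 'covid19': 3, 'test': 4}
print(bows)   # [[(0, 1), (1, 1), (2, 2)], [(0, 1), (3, 1), (4, 1)]]
```

The repeated token "stay" collapses into a single pair with count 2, which is exactly the word-order-free representation LDA consumes.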
Create the LDA Topics model in Python using the same number of topics as used in the Factor Analysis assignment.
##python chunk
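import gensim
# Fit LDA with 10 topics (the number used in the Factor Analysis assignment).
# random_state fixes the seed; chunksize/passes control online training;
# alpha='auto' learns an asymmetric document-topic prior from the data.
lda_model = gensim.models.ldamodel.LdaModel(
    corpus=doc_term_matrix, id2word=dic, num_topics=10, random_state=666,
    update_every=1, chunksize=100, passes=10, alpha='auto', per_word_topics=True)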
lda_model
## <gensim.models.ldamodel.LdaModel object at 0x0000000004A40880>
print(lda_model.print_topics())
## [(0, '0.045*"covid19" + 0.021*"new" + 0.014*"today" + 0.013*"health" + 0.012*"mexico" + 0.012*"state" + 0.011*"updat" + 0.011*"test" + 0.011*"case" + 0.010*"provid"'), (1, '0.049*"covid19" + 0.016*"new" + 0.012*"test" + 0.009*"health" + 0.007*"arizona" + 0.007*"state" + 0.007*"work" + 0.006*"updat" + 0.006*"today" + 0.006*"care"'), (2, '0.001*"covid19" + 0.000*"state" + 0.000*"spread" + 0.000*"case" + 0.000*"new" + 0.000*"live" + 0.000*"test" + 0.000*"today" + 0.000*"health" + 0.000*"updat"'), (3, '0.053*"covid19" + 0.011*"covid19ohioreadi" + 0.011*"ohio" + 0.010*"today" + 0.009*"updat" + 0.008*"live" + 0.008*"state" + 0.007*"inthistogetherohio" + 0.006*"inform" + 0.006*"help"'), (4, '0.037*"covid19" + 0.032*"lagov" + 0.029*"laleg" + 0.026*"louisiana" + 0.012*"today" + 0.012*"pm" + 0.010*"respons" + 0.009*"state" + 0.008*"spread" + 0.008*"gov"'), (5, '0.002*"covid19" + 0.001*"updat" + 0.001*"today" + 0.001*"live" + 0.001*"new" + 0.000*"state" + 0.000*"case" + 0.000*"health" + 0.000*"test" + 0.000*"respons"'), (6, '0.045*"covid19" + 0.010*"stay" + 0.009*"home" + 0.009*"spread" + 0.009*"help" + 0.007*"state" + 0.007*"work" + 0.007*"health" + 0.006*"test" + 0.006*"need"'), (7, '0.042*"covid19" + 0.022*"live" + 0.017*"updat" + 0.016*"watch" + 0.014*"today" + 0.010*"pm" + 0.010*"provid" + 0.009*"state" + 0.009*"press" + 0.008*"health"'), (8, '0.050*"covid19" + 0.014*"updat" + 0.012*"state" + 0.011*"today" + 0.010*"live" + 0.009*"respons" + 0.009*"test" + 0.008*"provid" + 0.008*"watch" + 0.007*"connecticut"'), (9, '0.013*"idaho" + 0.010*"idahocovid19" + 0.009*"covid19" + 0.003*"order" + 0.003*"stayhom" + 0.003*"help" + 0.003*"time" + 0.003*"follow" + 0.003*"busi" + 0.003*"state"')]
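Every topic above is led by `covid19`, which makes the printed strings hard to compare. The sketch below parses one `print_topics()` string (topic 4, copied from the output above) into `(word, weight)` pairs and optionally drops a chosen term so the distinguishing words stand out. The function `parse_topic` and its `drop` parameter are hypothetical; when the model object is available, gensim's `lda_model.show_topic(topicid, topn)` already returns such pairs, so a parser like this is only needed when working from printed output.

```python
import re
from typing import Iterable, List, Tuple

# Matches terms like 0.032*"lagov" in gensim's printed topic strings
TERM = re.compile(r'([0-9.]+)\*"([^"]+)"')


def parse_topic(topic: str, drop: Iterable[str] = ()) -> List[Tuple[str, float]]:
    """Turn a printed topic into (word, weight) pairs, skipping words in `drop`."""
    skip = set(drop)
    return [(word, float(weight))
            for weight, word in TERM.findall(topic)
            if word not in skip]


# Topic 4 as printed by lda_model.print_topics() above
topic_4 = ('0.037*"covid19" + 0.032*"lagov" + 0.029*"laleg" + 0.026*"louisiana" + '
           '0.012*"today" + 0.012*"pm" + 0.010*"respons" + 0.009*"state" + '
           '0.008*"spread" + 0.008*"gov"')

print(parse_topic(topic_4, drop={"covid19"})[:3])
# [('lagov', 0.032), ('laleg', 0.029), ('louisiana', 0.026)]
```

With `covid19` removed, topic 4 reads as a Louisiana-specific topic, which mirrors the state-specific terms (ohio, idaho, connecticut, mexico, arizona) visible in other topics.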
Create the interactive graphics HTML file. Note that this file is saved in the same folder as your markdown document; upload both the knitted file and the LDA visualization HTML file.
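In the saved visualization, the terms listed for a topic are ranked by the LDAvis relevance score, which the λ slider controls: relevance(w | t) = λ·log p(w | t) + (1 − λ)·log(p(w | t) / p(w)). At λ = 1 terms are ranked purely by within-topic probability, so a term that is frequent everywhere (such as `covid19` in this corpus) stays on top; lowering λ favors terms that are distinctive to the topic. The minimal sketch below computes the score in plain Python on made-up probabilities (illustrative numbers only, not taken from this model); `relevance` and `rank_terms` are hypothetical helper names.

```python
import math
from typing import Dict, List


def relevance(p_w_t: float, p_w: float, lam: float) -> float:
    """LDAvis relevance: lam*log p(w|t) + (1-lam)*log(p(w|t)/p(w))."""
    return lam * math.log(p_w_t) + (1 - lam) * math.log(p_w_t / p_w)


def rank_terms(p_w_given_t: Dict[str, float], p_w: Dict[str, float],
               lam: float) -> List[str]:
    """Order a topic's terms from most to least relevant."""
    return sorted(p_w_given_t,
                  key=lambda w: relevance(p_w_given_t[w], p_w[w], lam),
                  reverse=True)


# Illustrative numbers: "covid19" is frequent in this topic AND overall,
# while "louisiana" is rare overall but concentrated in this topic.
topic = {"covid19": 0.037, "louisiana": 0.026, "today": 0.012}
overall = {"covid19": 0.040, "louisiana": 0.003, "today": 0.012}

print(rank_terms(topic, overall, lam=1.0))  # ['covid19', 'louisiana', 'today']
print(rank_terms(topic, overall, lam=0.0))  # ['louisiana', 'today', 'covid19']
```

Sliding λ toward 0 in the HTML view is therefore a quick way to look past the shared `covid19` term when labelling topics.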
##python chunk
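import pyLDAvis.gensim_models
# Prepare the topic-term data for pyLDAvis; n_jobs = 1 runs it in a single process
vis = pyLDAvis.gensim_models.prepare(lda_model, doc_term_matrix, dic, n_jobs = 1)
# Write the interactive visualization to an HTML file in the working directory
pyLDAvis.save_html(vis, 'LDA_Visualization_JL.html')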
# from gensim.models import CoherenceModel
# # Compute the c_v coherence score; texts must be the tokenized documents used to build dic
# coherence_model_lda = CoherenceModel(model=lda_model, texts=processed_tweets, dictionary=dic, coherence='c_v')
# coherence_lda = coherence_model_lda.get_coherence()
# print('\nCoherence Score: ', coherence_lda)
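The commented-out block scores the model with gensim's `c_v` coherence. As a simpler illustration of what a coherence score measures, here is a pure-Python sketch of UMass coherence, which is a different measure from `c_v`: for a topic's top words in rank order, it sums log((D(wᵢ, wⱼ) + 1) / D(wⱼ)) over every pair where wⱼ ranks above wᵢ, with D counting the documents that contain the given word(s). Scores closer to zero mean the top words co-occur more often. The toy documents and the helper names are hypothetical; gensim's `CoherenceModel` also accepts `coherence='u_mass'`, though its implementation details may differ from this sketch.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set


def doc_freq(docs: List[Set[str]], *words: str) -> int:
    """Number of documents that contain all of the given words."""
    return sum(1 for d in docs if all(w in d for w in words))


def umass_coherence(top_words: Sequence[str], docs: List[Set[str]]) -> float:
    """UMass coherence of a topic; assumes each top word occurs in some document."""
    score = 0.0
    # combinations keeps input order, so `high` always ranks above `low`
    for high, low in combinations(top_words, 2):
        score += math.log((doc_freq(docs, low, high) + 1) / doc_freq(docs, high))
    return score


toy_docs = [{"stay", "home", "safe"}, {"stay", "home"}, {"test", "case"}, {"test", "home"}]
print(umass_coherence(["stay", "home"], toy_docs))  # log(3 / 2), words co-occur
print(umass_coherence(["stay", "test"], toy_docs))  # log(1 / 2), words never co-occur
```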
Interpret your topics and compare to MEM themes with PCA. Explain the results from your analysis (at least 5 sentences).
ANSWER: The alpha value for LDA is low, meaning that most documents are assigned predominantly to a single topic. The LDA_fixed and LDA_gibbs models show a wider spread across topics. The lower entropy values for LDA_fit imply that a single topic has relatively more influence than the others within each document.
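The entropy argument above can be made concrete. For one document with topic proportions θ₁, …, θ_K, the entropy is H = −Σₖ θₖ log θₖ: it is near 0 when almost all weight sits on one topic and reaches its maximum, log K, when weight is spread evenly. The minimal sketch below compares two hypothetical 10-topic distributions (illustrative numbers, not output from the models above); `topic_entropy` is a hypothetical helper name.

```python
import math
from typing import Sequence


def topic_entropy(theta: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one document's topic proportions."""
    return -sum(p * math.log(p) for p in theta if p > 0)


# Hypothetical document-topic proportions over 10 topics
concentrated = [0.91] + [0.01] * 9   # one dominant topic
spread = [0.1] * 10                  # evenly spread across topics

print(round(topic_entropy(concentrated), 3))
print(round(topic_entropy(spread), 3))  # log(10) ≈ 2.303
```

A low estimated alpha pushes documents toward the `concentrated` pattern, which is why it goes hand in hand with lower average entropy.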