For this assignment, you will use the same data as the Factor Analysis assignment to discover the important topics in U.S. governors’ tweets about the pandemic. The dataframe has four columns: the State, Name, and Party of each governor, plus the Text of their tweets. The Text column is the one to process and analyze.
Load all the libraries and functions that you will use for the rest of the assignment. Defining libraries and functions at the top of a report lets others see what they need for the report to compile correctly.
Load the Python libraries and functions that you will use in the Python sections of the report.
##r chunk
library(readr)
library(reticulate)
library(tidyr)
library(tm)
library(topicmodels)
library(tidyverse)
library(tidytext)
library(slam)
import string
import pyLDAvis
import pyLDAvis.gensim  # this module is named pyLDAvis.gensim_models in pyLDAvis 3.3 and later
import matplotlib.pyplot as plt
import gensim
import gensim.corpora as corpora
import nltk  # one-time setup: nltk.download('punkt') (punkt_tab in recent NLTK) and nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()
Gov_Tweets <-
read_csv(
"C:/Users/raavi/Dropbox/Harrisburg University/Semester 3/ANLY 540 Language Modeling/Week 10/Gov Tweets Clean.csv"
)
import_corpus = Corpus(VectorSource(Gov_Tweets$Text))
import_matrix = DocumentTermMatrix(
import_corpus,
control = list(
stemming = TRUE,
stopwords = TRUE,
wordLengths = c(3, Inf),  # tm's documented option for a minimum word length of 3
removeNumbers = TRUE,
removePunctuation = TRUE
)
)
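# Mean tf-idf of each term: relative frequency averaged over the documents that
# contain the term, times log2(number of documents / documents containing it).
# Note: import_weight is not used below, so it does not filter out any terms.
import_weight = tapply(import_matrix$v / row_sums(import_matrix)[import_matrix$i],
import_matrix$j,
mean) * log2(nDocs(import_matrix) / col_sums(import_matrix > 0))
# Drop documents with no terms left after preprocessing
import_matrix = import_matrix[row_sums(import_matrix) > 0,]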
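import_weight reads the sparse matrix directly: $i, $j and $v hold the row (document), column (term) and count of every nonzero cell. For each term it averages count / document length over the documents that contain the term, then multiplies by log2(number of documents / number of documents containing the term), so a term found in every document scores 0. The vector is computed but never applied, so all terms stay in import_matrix; a filtering step on it is what would remove such terms. The plain-Python sketch below mirrors the calculation; to_triplets, mean_tfidf, the toy documents and the 0.1 threshold are all illustrative assumptions.

```python
import math
from collections import Counter


def to_triplets(docs):
    """Sparse (i, j, v) triplets of a document-term matrix, one per nonzero cell."""
    vocab = {}
    i_idx, j_idx, vals = [], [], []
    for i, tokens in enumerate(docs):
        for term, count in Counter(tokens).items():
            j = vocab.setdefault(term, len(vocab))
            i_idx.append(i)
            j_idx.append(j)
            vals.append(count)
    return i_idx, j_idx, vals, vocab


def mean_tfidf(docs):
    """Mean tf-idf per term, mirroring the tapply(...) * log2(...) expression."""
    i_idx, j_idx, vals, vocab = to_triplets(docs)
    row_sums = [len(tokens) for tokens in docs]  # total tokens per document
    # Relative frequencies of the nonzero cells, grouped by term (tapply over $j).
    tf_by_term = {}
    for i, j, v in zip(i_idx, j_idx, vals):
        tf_by_term.setdefault(j, []).append(v / row_sums[i])
    weights = {}
    for term, j in vocab.items():
        tfs = tf_by_term[j]
        doc_freq = len(tfs)  # documents containing the term, col_sums(dtm > 0)
        weights[term] = sum(tfs) / doc_freq * math.log2(len(docs) / doc_freq)
    return weights


# Toy documents (made up for illustration).
docs = [["covid", "test", "test"], ["covid", "ohio"], ["covid", "live", "updat"]]
weights = mean_tfidf(docs)
# Hypothetical filter step with an illustrative threshold of 0.1.
kept = sorted(term for term, w in weights.items() if w >= 0.1)
```

Here "covid" appears in all three toy documents and gets weight 0, so the filter drops it while keeping the rarer terms.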
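K = 10     # number of topics
SEED = 42  # fixed seed so the fits are reproducible
# VEM with alpha estimated from the data (the default)
LDA_fit = LDA(import_matrix, k = K, control = list(seed = SEED))
# VEM with alpha held at its starting value
LDA_fixed = LDA(import_matrix,
k = K,
control = list(estimate.alpha = FALSE, seed = SEED))
# Gibbs sampling; burnin, thin and iter control the length of the chain
LDA_gibbs = LDA(
import_matrix,
k = K,
method = "Gibbs",
control = list(
seed = SEED,
burnin = 1000,
thin = 100,
iter = 1000
)
)
# Correlated topic model with convergence tolerances for the variational (var) and EM (em) steps
CTM_fit = CTM(import_matrix, k = K,
control = list(seed = SEED,
var = list(tol = 10^-4),
em = list(tol = 10^-3)))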
LDA_fit@alpha
## [1] 0.01117366
LDA_fixed@alpha
## [1] 5
LDA_gibbs@alpha
## [1] 5
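LDA_fixed and LDA_gibbs both report alpha = 5, which equals 50/K for K = 10; LDA_fit starts from that value but estimates it, ending at about 0.011. To make the Gibbs method concrete, the sketch below is a minimal collapsed Gibbs sampler for LDA in plain Python. It is not topicmodels' implementation: lda_gibbs, the word prior beta = 0.1, the number of sweeps, and the absence of burn-in and thinning are assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, k, alpha=None, beta=0.1, iters=200, seed=42):
    """Collapsed Gibbs sampling for LDA on a list of token lists (illustrative sketch)."""
    rng = random.Random(seed)
    if alpha is None:
        alpha = 50 / k  # the 50/K value reported above for K = 10
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: n for n, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # tokens per (document, topic)
    nkw = [[0] * V for _ in range(k)]  # tokens per (topic, word)
    nk = [0] * k                       # tokens per topic

    # Random initial topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w_id[w]] += 1
            nk[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi, t = w_id[w], z[d][n]
                # Remove the token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                # Full conditional p(z = j | all other assignments), unnormalised.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + V * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][n] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1

    # Point estimates from the final sample (no burn-in or thinning here).
    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[j][v] + beta) / (nk[j] + V * beta) for v in range(V)]
           for j in range(k)]
    return theta, phi, vocab


docs = [["covid", "test", "posit"] * 5, ["ohio", "stayhomeohio", "ohio"] * 5]
theta, phi, vocab = lda_gibbs(docs, k=2, iters=30)
```

The (ndk + alpha) factor shows why a large alpha such as 5 pulls every document toward a mixture of topics, while a small estimated alpha like LDA_fit's lets a document concentrate on one topic.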
# Mean entropy of each model's per-document topic distributions
sapply(list(LDA_fit, LDA_fixed, LDA_gibbs, CTM_fit),
function(x)
mean(apply(posterior(x)$topics, 1, function(z) -sum(z * log(z)))))
## [1] 0.08919543 0.77702312 1.10194591 0.55809578
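The sapply() call scores each model by the average entropy of its document-topic distributions. Entropy is 0 when a document sits entirely in one topic and ln(10) ≈ 2.30 when it is spread evenly over all K = 10 topics. So LDA_fit (0.089) assigns documents almost entirely to single topics, while LDA_gibbs (1.10) gives the most mixed assignments. A plain-Python sketch of the same measure follows; mean_topic_entropy is a hypothetical helper, and its p > 0 guard skips zero probabilities, for which the R expression 0 * log(0) would return NaN.

```python
import math


def mean_topic_entropy(theta):
    """Average Shannon entropy (natural log, like R's log()) of each row of theta."""
    entropies = [-sum(p * math.log(p) for p in row if p > 0) for row in theta]
    return sum(entropies) / len(entropies)


uniform = [[0.1] * 10]           # spread evenly over K = 10 topics
one_topic = [[1.0] + [0.0] * 9]  # all mass on a single topic
```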
terms(LDA_fit, 20)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6
## [1,] "covid" "covid" "covid" "covid" "covid" "covid"
## [2,] "will" "will" "live" "updat" "new" "state"
## [3,] "help" "health" "updat" "today" "state" "watch"
## [4,] "stay" "state" "today" "connecticut" "test" "updat"
## [5,] "spread" "live" "provid" "test" "health" "today"
## [6,] "health" "watch" "state" "live" "today" "provid"
## [7,] "home" "today" "hold" "brief" "will" "live"
## [8,] "can" "updat" "watch" "watch" "spread" "will"
## [9,] "state" "help" "test" "will" "mexico" "respons"
## [10,] "work" "respons" "respons" "covidma" "stay" "health"
## [11,] "live" "brief" "press" "news" "home" "help"
## [12,] "need" "alaska" "confer" "respons" "can" "new"
## [13,] "safe" "can" "can" "posit" "announc" "spread"
## [14,] "virginia" "provid" "will" "latest" "help" "can"
## [15,] "test" "spread" "fight" "state" "work" "nebraska"
## [16,] "keep" "continu" "brief" "new" "case" "order"
## [17,] "today" "care" "facebook" "peopl" "continu" "work"
## [18,] "protect" "akgov" "spread" "discuss" "get" "thank"
## [19,] "new" "public" "read" "health" "posit" "test"
## [20,] "care" "test" "health" "effort" "live" "now"
## Topic 7 Topic 8 Topic 9 Topic 10
## [1,] "covid" "covid" "covid" "covid"
## [2,] "new" "will" "arizona" "covidohioreadi"
## [3,] "posit" "today" "health" "ohio"
## [4,] "test" "state" "aztogeth" "will"
## [5,] "case" "lagov" "thank" "inthistogetherohio"
## [6,] "weve" "respons" "work" "updat"
## [7,] "total" "laleg" "will" "stayhomeohio"
## [8,] "updat" "missouri" "arizonan" "can"
## [9,] "jersey" "louisiana" "azdh" "teamkentucki"
## [10,] "jerseyan" "maryland" "help" "live"
## [11,] "now" "updat" "provid" "togetherki"
## [12,] "lost" "live" "can" "case"
## [13,] "will" "spread" "protect" "today"
## [14,] "bergen" "case" "public" "inform"
## [15,] "may" "health" "need" "governor"
## [16,] "hospit" "provid" "state" "beshear"
## [17,] "bring" "test" "continu" "help"
## [18,] "burlington" "continu" "spread" "test"
## [19,] "camden" "work" "discuss" "share"
## [20,] "essex" "posit" "busi" "kentuckian"
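terms() lists each topic's words in order of their probability within that topic. Ranking by raw probability is one plausible reason "covid" heads all ten LDA_fit topics: a word that is frequent in every document receives a high probability in every topic. A minimal sketch of the ranking, assuming a topic-word probability matrix phi whose rows follow the order of vocab; top_terms and the toy values are illustrative.

```python
def top_terms(phi, vocab, n=10):
    """n highest-probability terms per topic, analogous to topicmodels::terms()."""
    tops = []
    for row in phi:
        # Sort word indices by probability, highest first (ties keep vocab order).
        order = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)
        tops.append([vocab[w] for w in order[:n]])
    return tops


phi = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
vocab = ["covid", "test", "ohio"]
```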
terms(LDA_gibbs, 20)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## [1,] "live" "covid" "arizona" "covidma" "maryland"
## [2,] "updat" "will" "aztogeth" "test" "case"
## [3,] "watch" "today" "thank" "watch" "nevada"
## [4,] "covid" "state" "arizonan" "new" "action"
## [5,] "governor" "health" "azdh" "tennesse" "coronavirus"
## [6,] "teamkentucki" "can" "health" "today" "colorado"
## [7,] "togetherki" "help" "work" "site" "discuss"
## [8,] "facebook" "spread" "oklahoma" "tennessean" "oregon"
## [9,] "fight" "work" "discuss" "read" "montanan"
## [10,] "beshear" "updat" "protect" "care" "live"
## [11,] "confer" "test" "public" "tune" "idaho"
## [12,] "press" "provid" "drcarachrist" "commonwealth" "montana"
## [13,] "share" "continu" "donat" "updat" "main"
## [14,] "kentuckian" "live" "partnership" "texa" "nevadan"
## [15,] "inform" "care" "resourc" "expand" "hawaii"
## [16,] "tune" "respons" "latest" "brief" "oregonian"
## [17,] "livestream" "need" "minnesota" "bulletin" "idahocovid"
## [18,] "httpstconhomytsv" "order" "minnesotan" "delawar" "march"
## [19,] "healthyathom" "take" "access" "capac" "youtub"
## [20,] "ill" "home" "small" "support" "announc"
## Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
## [1,] "new" "lagov" "updat" "covidohioreadi" "new"
## [2,] "covid" "laleg" "brief" "ohio" "case"
## [3,] "posit" "watch" "covid" "inthistogetherohio" "stay"
## [4,] "test" "louisiana" "connecticut" "stayhomeohio" "mexico"
## [5,] "total" "respons" "live" "missouri" "home"
## [6,] "weve" "live" "news" "case" "announc"
## [7,] "case" "gov" "respons" "data" "posit"
## [8,] "lost" "state" "hold" "confirm" "test"
## [9,] "jersey" "updat" "today" "covid" "total"
## [10,] "jerseyan" "provid" "watch" "httpstcolwxirscb" "health"
## [11,] "bring" "press" "test" "will" "addit"
## [12,] "may" "brief" "virginia" "director" "spread"
## [13,] "bergen" "hold" "press" "hospit" "today"
## [14,] "essex" "nebraska" "discuss" "counti" "nmdoh"
## [15,] "burlington" "alaska" "latest" "peopl" "death"
## [16,] "camden" "akgov" "posit" "dramyacton" "north"
## [17,] "hudson" "will" "peopl" "age" "statewid"
## [18,] "gloucest" "outbreak" "confer" "can" "confirm"
## [19,] "cumberland" "rickett" "state" "alpolit" "current"
## [20,] "atlant" "hampshir" "provid" "number" "offici"
LDA_fit_topics = tidy(LDA_fit, matrix = "beta")
top_terms = LDA_fit_topics %>% group_by(topic) %>% top_n(10, beta) %>% ungroup() %>% arrange(topic, -beta)
cleanup = theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line.x = element_line(color = "black"),
axis.line.y = element_line(color = "black"),
legend.key = element_rect(fill = "white"),
text = element_text(size = 10)
)
top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%  # sort bars within each facet
ggplot(aes(term, beta, fill = factor(topic))) +
geom_bar(stat = "identity", show.legend = FALSE) +
facet_wrap( ~ topic, scales = "free") +
scale_x_reordered() +
cleanup +
coord_flip()
Transfer the Text column of Gov_Tweets to Python (reticulate exposes R objects through r.) and convert it to a list for processing.
tweets = list(r.Gov_Tweets["Text"])
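Process the text in Python: lowercase each tweet, strip punctuation, tokenize, remove English stopwords, and stem each token with the Porter stemmer.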
stop_words = set(stopwords.words('english'))  # build once; set lookups are fast
processed_text = []
for tweet in tweets:
    tweet = tweet.lower()
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))
    tweet = nltk.word_tokenize(tweet)
    tweet = [word for word in tweet if word not in stop_words]
    tweet = [ps.stem(word) for word in tweet]
    processed_text.append(tweet)
processed_text[0]
## ['ive', 'extend', 'covid', 'public', 'health', 'disast', 'emerg', 'anoth', '45day', 'everi', 'industri', 'sector', 'ar', 'affect', 'crisi', 'import', 'continu', 'support', 'protect', 'industri', 'peopl', 'threat', 'longer', 'immin', 'httpstco2an22n0gwr', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050520', 'httpstcocmlypdifq2', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstco1cws1vdebg', 'faith', 'commun', 'support', 'effort', 'fight', 'covid', 'miss', 'inperson', 'fellowship', 'mani', 'church', 'continu', 'meet', 'remot', 'present', 'guidanc', 'give', 'hous', 'worship', 'option', 'minist', 'congreg', 'httpstco9rvgmtf6zr', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050420', 'httpstcoh3uasas2y', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcouphoummna4', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050220', 'httpstcobizdnvigiz', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcobizdnvigiz', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050120', 'httpstcojkdxbuh5qn', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '043020', 'httpstcobg9lhjfbay', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcotlb60qwcgv', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042920', 'httpstcontbjk8vtw4', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcontbjk8vtw4', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042820', 'httpstcojo5hvzntai', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcojo5hvzntai', 'want', 'thank', 'walmart', 
'quest', 'diagnost', 'open', 'drivethru', 'covid', 'test', 'site', 'central', 'ar', 'symptomat', 'arkansan', 'health', 'care', 'worker', 'first', 'respond', 'increas', 'test', 'capac', 'enhanc', 'gather', 'data', 'look', 'lift', 'restrict', 'ar', 'last', 'week', 'encourag', 'symptomat', 'arkansan', 'get', 'test', 'weekend', 'thank', 'respons', 'partnership', 'hospit', 'test', 'site', 'exceed', 'goal', 'conduct', 'gt1500', 'test', 'day', 'give', 'us', 'accur', 'sampl', 'covid', 'number', 'ar', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042720', 'httpstcomxdnv1jf9n', 'im', 'hold', 'news', 'confer', 'noon', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcomxdnv1jf9n', 'arkansa', 'surg', 'campaign', 'continu', 'today', 'think', 'symptom', 'covid19', 'dont', 'wait', 'get', 'test', 'httpstcosgcmw5rcac', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042520', 'httpstcoocrlbfa0u', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoocrlbfa0u', 'wwii', 'veteran', 'loui', 'strickland', '100', 'year', 'old', 'today', 'fought', 'normandi', 'daughter', 'fellow', 'vet', 'arkansa', 'state', 'veteran', 'home', 'threw', 'parti', 'today', 'daughter', 'famili', 'couldnt', 'attend', 'covid', 'restrict', 'happi', 'birthday', 'loui', 'thank', 'serv', 'httpstco1iefo9qnna', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042420', 'httpstcod4s9w018fa', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcod4s9w018fa', 'encourag', 'symptom', 'fever', 'cough', 'short', 'breath', 'get', 'test', 'covid19', 'within', 'next', 'two', 'day', 'think', 'symptom', 'dont', 'wait', 'get', 'test', 'httpstcoekhxdj9e6f', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042320', 'httpstcooiqojakjpn', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 
'covid19', 'respons', 'watch', 'httpstco497r9iiwqa', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042220', 'httpstcomiq8tuvsmd', 'outbreak', 'covid19', 'import', 'ever', 'arkansan', 'particip', '2020censu', 'respond', 'censu', 'socialdistanc', 'friendli', 'submit', 'respons', 'phone', 'mail', 'onlin', 'httpstcoj9doqyuu0q', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcomiq8tuvsmd', 'today', 'announc', 'creation', 'covid19', 'test', 'work', 'group', 'ensur', 'arkansa', 'adequ', 'test', 'process', 'place', 'pursu', 'publichealth', 'econom', 'recoveri', 'strategi', 'join', 'work', 'group', 'first', 'meet', 'afternoon', 'httpstcoaetnuswqbp', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042120', 'httpstcorpg0ncb3si', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcorpg0ncb3si', 'proof', 'arkansan', 'care', '2', 'peopl', 'jonesboro', 'organ', 'freepizza', 'night', 'encourag', 'town', 'covid', 'plu', 'tornado', 'word', 'spread', 'donat', 'arriv', 'becam', 'oper', 'full', 'belli', '2', 'week', '12', 'restaur', '2500', 'free', 'supper', 'that', 'spirit', 'arkansa', 'httpstcoe7xiqq9vyy', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoziardpdz8d', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041920', 'httpstcorztbs9ljlo', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcorztbs9ljlo', '23', 'task', 'forc', 'includ', '27', 'leader', 'privat', 'sector', 'public', 'agenc', 'examin', 'impact', 'covid19', 'busi', 'industri', 'state', 'select', 'steuart', 'walton', 'chairman', '13', 'today', 'creat', 'governor', 'econom', 'recoveri', 'task', 'forc', 'develop', 'industryspecif', 'strategi', 'make', 'recommend', 'arkansass', 'econom', 'recoveri', 'effect', 
'covid19', 'httpstco4brjy6dhb7', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041820', 'httpstcod5rdrkf4gc', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcod5rdrkf4gc', 'offer', 'condol', 'famili', 'chief', 'petti', 'offic', 'charl', 'robert', 'thacker', 'jr', 'fort', 'smith', 'nativ', 'lost', 'life', 'covid19', '42', 'year', 'old', 'grate', 'servic', 'countri', 'httpstcooi8ouren6q', '1st', 'report', 'governor', 'medic', 'advisori', 'committe', 'postpeak', 'covid19', 'respons', 'priorit', 'restor', 'arkansass', 'economi', 'time', 'fashion', 'protect', 'vulner', 'maintain', 'adequ', 'health', 'care', 'public', 'health', 'capac', 'prevent', 'resurg', 'covid19', '12', 'care', 'review', 'realdonaldtrump', 'model', 'reopen', 'economi', 'first', 'report', 'governor', 'medic', 'advisori', 'committe', 'postpeak', 'covid19', 'respons', 'base', 'arkansass', 'current', 'public', 'health', 'data', 'hope', 'begin', 'lift', 'restrict', 'may', '4', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041720', 'httpstcoqgnx0gucqj', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoqgnx0gucqj', 'arkansasdw', 'launch', 'new', 'websit', 'provid', 'regularli', 'updat', 'inform', 'regard', 'covid19rel', 'unemploy', 'benefit', 'httpstcoa3obxp2vot', 'onestop', 'shop', 'answer', 'frequentlyask', 'question', 'portal', 'file', 'claim', 'recent', 'news', 'articl', 'meet', 'newlyform', 'governor', 'medic', 'advisori', 'committe', 'postpeak', 'covid19', 'respons', 'first', 'time', 'tomorrow', 'discuss', 'public', 'health', 'strategi', 'futur', 'finish', 'confer', 'call', 'presid', 'covid19', 'task', 'forc', 'brief', 'governor', 'open', 'america', 'talk', 'mean', 'arkansa', 'tomorrow', 'daili', 'updat', 'httpstcotocnxbhnqd', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041620', 'httpstcocd49nd7us', 
'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcocd49nd7us', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041520', 'httpstcobuqqnb7rk', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcobuqqnb7rk', 'look', 'forward', 'join', 'shannonbream', 'foxnewsnight', 'discuss', 'arkansass', 'covid19', 'respons', 'httpstcoug13ii4zv7', 'issu', '2', 'execut', 'order', 'today', '1st', 'allow', 'first', 'respond', 'frontlin', 'health', 'care', 'worker', 'qualifi', 'worker', 'comp', 'work', 'respons', 'caus', 'contract', 'covid19', '2nd', 'provid', 'liabil', 'immun', 'medic', 'emerg', 'respond', 'crisi', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041420', 'httpstcog7ot43zqbv', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcog7ot43zqbv', 'ive', 'form', 'medic', 'advisori', 'committe', 'help', 'guid', 'public', 'health', 'strategi', 'arkansa', 'reach', 'peak', 'number', 'covid19', 'case', 'committe', 'examin', 'protocol', 'make', 'recommend', 'necessari', 'avoid', 'resurg', 'covid19', 'peak', 'httpstcoaggw9p05q7', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041320', 'httpstcor6rwpmrudr', 'someth', 'bear', 'mind', 'face', 'covid19', 'arkansass', 'popul', 'isnt', 'fulli', 'repres', '2020censu', 'arkansa', 'hospit', 'servic', 'commun', 'health', 'center', 'could', 'impact', 'neg', 'next', 'decad', 'make', 'sure', 'count', 'httpstcoj9doqyuu0q', 'thank', 'dr', 'fauci', 'recogn', 'effort', 'fight', 'covid19', 'arkansa', 'let', 'stay', 'commit', 'win', 'fight', 'httpstcorrunjg0ovz', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041020', 'httpstcoxphibyj9d7', 'live', 'kark4newsfox16new', 'even', 'answer', 'question', 'covid19', 'public', 'health', 'emerg', 'arkansa', 'answer', 'mani', 'question', 
'tune', 'virtual', 'town', 'hall', '7pm', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040920', 'httpstcon8jincdhqd', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040820', 'httpstcoepxqn3fsfg', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040720', 'httpstcovsknrrbty3', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcovsfmoen6n3', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040620', 'httpstcom3ymchnmx3', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcom3ymchnmx3', '23', 'appreci', 'entergyark', 'support', 'new', 'covid19', 'relief', 'fund', 'announc', 'today', 'news', 'brief', 'temporarili', 'suspend', 'disconnect', 'servic', 'custom', 'cant', 'pay', '13', 'crisi', 'place', 'hardship', 'mani', 'arkansan', 'incred', 'respons', 'need', 'other', 'im', 'pleas', 'state', 'partner', 'w', 'smartgivingar', 'support', 'covid19', 'relief', 'fund', 'fund', 'arkansan', 'donat', 'help', 'neighbor', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040520', 'httpstcoh42a1l4w92', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcohaewl5zjmo', '22', 'grate', 'arkansan', 'taken', 'extraordinari', 'effort', 'practic', 'social', 'distanc', 'flatten', 'curv', 'day', 'evalu', 'new', 'step', 'take', 'measur', 'provid', 'protect', 'arkansan', 'prepar', 'peak', 'number', 'covid19', 'case', '12', 'issu', 'execut', 'order', 'mandat', 'new', 'safeti', 'measur', 'commerci', 'lodg', 'shortterm', 'rental', 'institut', 'oper', 'order', 'protect', 'public', 'health', 'covid19', 'crisi', 'httpstcorplzmo2o7p', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040420', 'httpstcohg5h89y2tt', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 
'covid19', 'respons', 'watch', 'httpstcohg5h89y2tt', 'arkansass', 'request', 'feder', 'disast', 'assist', 'result', 'covid19', 'approv', 'thank', 'feder', 'deleg', 'fema', 'potu', 'approv', 'assist', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040320', 'httpstcoja0ry2vrpi', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcohplzst7npc', 'linglin42108652', 'great', 'suggest', 'linglin42108652', 'test', 'posit', 'covid19', 'selfquarantin', 'leav', 'home', 'design', 'friend', 'famili', 'member', 'care', 'includ', 'get', 'groceri', 'would', 'help', 'mitig', 'spread', 'viru', 'askgovhutchinson', 'chris70909106', 'start', 'tomorrow', 'arkansa', 'state', 'park', 'implement', 'new', 'safeti', 'measur', 'like', 'dayus', 'oper', 'reduc', 'risk', 'overcrowd', 'park', 'discourag', 'visitor', 'outofst', 'threat', 'covid19', 'pass', 'askgovhutchinson', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040220', 'httpstcoyumrweehi6', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoduh0dg5udr', 'covid19', 'relat', 'question', 'ill', 'particip', 'nationwid', 'askthegovernor', 'twitter', 'q', 'today', '500', 'ill', 'answer', 'mani', 'question', 'repli', 'tweet', 'question', 'askgovhutchinson', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040120', 'httpstcopugz6bjjdz', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcopugz6bau57', '13', 'current', 'number', 'covid19', 'case', 'arkansa', 'lower', 'project', 'number', 'case', 'provid', 'adhpio', 'last', 'week', 'httpstcocthvdfxd13', 'suspect', 'covid19', 'symptom', 'question', 'regard', 'children', 'covid19', 'call', 'adhpio', '18008037847', 'archildren', '18007433616', 'httpstco4bksjpaizf', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '033120', 
'httpstco8ka5d0nv2j', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstco8ka5d0wje9', 'need', 'covid19', 'data', 'ar', 'test', 'give', 'us', 'data', 'increas', 'test', 'immedi', 'need', 'right', 'well', 'protect', 'health', 'care', 'worker', 'whole', 'team', 'includ', 'aremerg', 'adhpio', 'work', 'procur', 'addl', 'test', 'nationaldoctorsday', 'commend', 'extraordinari', 'physician', 'protect', 'heal', 'daili', 'especi', 'grate', 'sacrific', 'covid19', 'outbreak', 'year', 'support', 'doctor', 'social', 'distanc', 'save', 'live', 'crisi', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '033020', 'httpstcoenrkzchmdv', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoenrkzchmdv', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032920', 'httpstcokh8irkzaen', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcokh8irkhypn', 'great', 'guidelin', 'arkansa', 'depart', 'park', 'heritag', 'tourism', 'must', 'part', 'flatten', 'curv', 'covid19ark', 'httpstcorf66jnabev', 'uplift', 'arkansa', 'provid', 'free', 'covid19', 'resourc', 'busi', 'organ', 'individu', 'thank', 'littlerockcvb', 'manganholcomb', 'weareteamsi', 'creat', 'commun', 'websit', 'httpstcon3rxnyajlt', 'httpstco8foo9cgtcf', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032820', 'httpstcoyvhcciet8a', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoyvhcciet8a', 'live', 'governor', 'hutchinson', 'sign', 'legisl', 'establish', 'covid19', 'raini', 'day', 'fund', 'httpstcoyghs1rkazn', 'sign', 'legisl', 'establish', 'covid19', 'raini', 'day', 'fund', 'immedi', 'upon', 'passag', 'around', 'midnight', 'tonight', 'watch', 'bill', 'sign', 'httpstcoyghs1rkazn', 'week', 'radio', 'address', 'share', 'initi', 'launch', 
'assist', 'rural', 'hospit', 'provid', 'support', 'front', 'line', 'treat', 'covid19', 'patient', 'learn', 'httpstcoqiwyd8yuxd', 'httpstcon56acyckeg', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032720', 'httpstcotnebva6c', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcotnebva6c', 'import', 'arkansan', 'stay', 'home', 'possibl', 'work', 'mitig', 'spread', 'covid19', 'necessari', 'home', 'mind', 'public', 'space', 'keep', 'safe', 'distanc', 'least', 'six', 'feet', 'other', 'need', 'crowd', 'order', 'us', 'success', 'arkansa', 'slow', 'upward', 'trend', 'line', 'covid19', 'public', 'need', 'abid', 'guidanc', 'arkansa', 'depart', 'health', 'adhpio', 'guidelin', 'direct', 'found', 'httpstcotfzzurkedi', 'grate', 'feder', 'deleg', 'commun', 'support', 'theyv', 'given', 'state', 'especi', 'passag', 'covid19', 'relief', 'bill', 'senat', 'bill', 'provid', 'confid', 'arkansan', 'whose', 'employ', 'small', 'busi', 'affect', 'outbreak', '44', 'plan', 'also', 'propos', 'addit', 'payment', '250', 'week', 'nonphysician', 'direct', 'care', 'worker', '500', 'week', 'nonphysician', 'direct', 'care', 'worker', 'work', 'facil', 'covid19', 'present', 'read', 'initi', 'httpstcorolc0f71zr', '14', 'today', 'announc', '116m', 'initi', 'directli', 'address', 'covid19', 'crisi', 'burden', 'rural', 'hospit', 'health', 'care', 'provid', 'propos', 'provid', 'improv', 'access', 'care', 'citizen', 'keep', 'provid', 'open', 'workforc', 'employ', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032620', 'httpstcofhvh593kkm', 'im', 'hold', 'news', 'confer', '230pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcofhvh593kkm', 'grate', 'member', 'arkansasguard', 'assist', 'uamshealth', 'screen', 'patient', 'administ', 'covid19', 'test', 'drivethrough', 'test', 'site', 'respond', 'faith', 'urgent', 'need', 'state', 'httpstcoyrkk6nalx1', '14', 'grate', 'doctor', 
'nurs', 'lab', 'technician', 'health', 'care', 'profession', 'front', 'line', 'covid19', 'look', 'futur', 'fatigu', 'among', 'individu', 'increas', 'demand', 'health', 'care', 'worker', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032520', 'httpstco49lubn4lki', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstco49lubn4lki', 'arkansasedc', 'work', 'hard', 'behalf', 'arkansa', 'busi', 'covid19', 'outbreak', 'covid19', 'busi', 'employ', 'resourc', 'visit', 'httpstcoqly8bu2tcm', 'issu', 'execut', 'order', '2005', 'leverag', 'telehealth', 'ar', 'covid19', 'outbreak', 'doctor', 'establish', 'new', 'patient', 'phone', 'minim', 'number', 'sick', 'patient', 'wait', 'room', 'mitig', 'spread', 'viru', 'httpstcolrccgwf3d', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032420', 'httpstcojfq4za5g9b', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcojfq4za5g9b', 'arkansa', 'blood', 'institut', 'abi', 'blood', 'collect', 'organ', 'central', 'arkansa', 'donat', 'center', 'littl', 'rock', 'north', 'littl', 'rock', 'hot', 'spring', 'abl', 'pleas', 'consid', 'donat', 'blood', 'face', 'covid19', 'public', 'health', 'emerg', 'abiblood', '23', 'due', 'covid19', 'outbreak', 'chang', 'individu', 'tax', 'file', 'deadlin', 'offici', 'state', 'revenu', 'forecast', 'lower', '3531m', 'necessit', 'special', 'session', 'gener', 'assembl', 'address', 'shortfal', 'war', '173m', 'unalloc', 'surplu', 'last', 'year', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032320', 'httpstcoabl6dguzlh', 'im', 'hold', 'news', 'confer', '230pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoabl6dguzlh', 'busi', 'owner', 'encourag', 'employe', 'protect', 'person', 'health', 'safeti', 'other', 'clean', 'surfac', 'frequent', 'touch', 'avoid', 'meet', 'requir', 'close', 'proxim', 'post', 
'symptom', 'covid19', 'fever', 'cough', 'short', 'breath', '14', 'today', 'total', '165', 'posit', 'covid19', 'case', 'ar', 'largest', 'increas', 'case', '24hour', 'period', 'weve', 'seen', 'far', 'number', 'reflect', 'increas', 'adhpio', 'test', 'capac', 'covid19', 'spread', 'import', 'arkansan', 'follow', 'cdc', 'guidelin', 'keep', 'famili', 'safe', 'encourag', 'peopl', 'state', 'practic', 'cdc', 'current', 'recommend', 'avoid', 'social', 'gather', '10', 'help', 'mitig', 'viru', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032220', 'httpstcowryqkexpeq', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcowryqkexpeq', '13', 'today', 'announc', 'total', '118', 'posit', 'covid19', 'case', 'arkansa', 'base', 'seen', 'state', 'arkansa', 'like', 'reach', 'peak', 'covid19', 'case', '6', '8', 'week', 'project', 'peak', '1000', 'patient', 'hospit', 'social', 'distanc', 'mean', 'stay', 'home', 'although', 'effect', 'way', 'mitig', 'spread', 'covid19', 'walk', 'hike', 'fish', 'outdoor', 'activ', 'consist', 'social', 'distanc', 'practic', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032120', 'httpstcoag7kynpyss', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoag7kynpyss', 'protect', 'arkansass', 'workforc', 'busi', 'covid19', 'outbreak', 'top', 'prioriti', 'arkansa', 'busi', 'follow', 'guidelin', 'ensur', 'safeti', 'employe', 'continu', 'oper', 'httpstcobmiudr9f8v', 'team', 'top', 'health', 'care', 'profession', 'adhpio', 'uamshealth', 'agenc', 'hospit', 'across', 'state', 'work', 'hard', 'keep', 'inform', 'covid19', 'outbreak', 'develop', 'guidelin', 'arkansan', 'follow', 'stay', 'safe', 'healthi', 'httpstco245ghogm2h', 'accord', 'cdc', 'best', 'way', 'prevent', 'spread', 'covid19', 'avoid', 'expos', 'viru', 'covid19', 'thought', 'spread', 'peopl', 'within', '6', 'feet', 'practic', 'social', 
'distanc', 'help', 'mitig', 'outbreak', 'state', 'rise', 'number', 'posit', 'covid19', 'case', 'arkansa', 'reflect', 'addit', 'test', 'capac', 'want', 'number', 'rise', 'mean', 'locat', 'isol', 'case', 'across', 'state', 'otherwis', 'would', 'go', 'undetect', 'slow', 'spread', 'viru', 'week', 'radio', 'address', 'discuss', 'execut', 'order', 'issu', 'last', 'week', 'expand', 'telemedicin', 'covid19', 'outbreak', 'learn', 'httpstcosybmcinpsx', 'httpstcosv6tqbg2tq', '15', 'today', 'announc', 'current', '96', 'posit', 'case', 'covid19', 'ar', 'affect', '3', 'longterm', 'care', 'facil', 'appl', 'creek', 'nurs', 'rehab', 'centerton', 'villag', 'gener', 'baptist', 'west', 'pine', 'bluff', 'briarwood', 'nurs', 'home', 'rehab', 'littl', 'rock', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032020', 'httpstcohxq5ei1rfp', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcohxq5ei1rfp', 'last', 'week', 'activ', 'arkansasguard', 'assist', 'covid19', 'respons', 'grate', 'work', 'nation', 'guard', 'medic', '39th', 'infantri', 'brigad', 'combat', 'team', 'support', 'adhpio', 'emerg', 'oper', 'center', 'respond', 'question', 'arkansan', 'httpstcoa10wheke6f', 'thank', 'team', 'member', 'adhpio', 'emerg', 'oper', 'center', 'hard', 'work', 'dedic', 'covid19', 'outbreak', 'commend', 'servic', 'peopl', 'arkansa', 'face', 'public', 'health', 'emerg', 'httpstcotcvxd3ddmd', 'join', 'call', 'potu', 'vp', 'governor', 'afternoon', 'covid19', 'pandem', 'take', 'everi', 'step', 'possibl', 'make', 'nation', 'emerg', 'shortliv', 'possibl', 'httpstcow4r7vpt1bu', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', 'today', '031920', 'httpstcowkarmnbjiy', 'im', 'hold', 'news', 'confer', '230pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcodq4tv1lknb', 'today', 'announc', 'relief', 'arkansa', 'busi', 'childcar', 'provid', 'eas', 'covid19', 'impact', 'read', 
'httpstcoithklggfa0', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '031820', 'httpstco2bwhflgv66', '33', 'discourag', 'unnecessari', 'outofst', 'travel', 'time', 'mitig', 'slow', 'spread', 'covid19', 'student', 'educ', 'return', 'school', 'spring', 'break', 'circumst', 'allow', '13', 'today', 'announc', 'six', 'new', 'posit', 'covid19', 'case', 'arkansa', 'strategi', 'ahead', 'curv', 'mitig', 'includ', 'close', 'school', 'two', 'week', 'prevent', 'larg', 'gather', 'dont', 'signific', 'commun', 'spread', 'state', 'im', 'hold', 'news', 'confer', 'fayettevil', '1115', 'provid', 'updat', 'ongo', 'covid19', 'respons', 'watch', 'httpstcolis0p97i2v', '13', 'today', 'arkansa', '16', 'confirm', 'case', 'covid19', 'individu', 'isol', 'situat', 'monitor', 'close', 'announc', 'yesterday', 'activ', 'arkansa', 'nation', 'guard', 'assist', 'covid19', 'respons', 'live', 'daili', 'media', 'brief', 'covid19', '031420', 'httpstco6m2t2zqq60', 'week', 'radio', 'address', 'commend', 'respons', 'respons', 'leader', 'arkansa', 'work', 'togeth', 'prevent', 'spread', 'covid19', 'state', 'httpstcob7hf02q5lm', 'httpstcozcwpcdsolz', 'live', 'daili', 'media', 'brief', 'covid19', 'httpstcoga1wy1cwg', '44', 'ar', 'dept', 'health', 'uamshealth', 'resourc', 'inform', 'regard', 'covid19', 'question', 'show', 'symptom', 'call', '247', 'hotlin', '18008037847', 'visit', 'httpstcoauhqi6j1b7', 'httpstcov7ipofjpoq', 'speak', 'health', 'care', 'profession', '14', 'yesterday', 'announc', 'first', 'presumpt', 'posit', 'case', 'covid19', 'arkansa', 'learn', 'today', 'five', 'addit', 'presumpt', 'posit', 'case', 'state', 'uncommon', 'seen', 'spread', 'progress', 'similar', 'fashion', 'state', '22', 'month', 'ar', 'prepar', 'respond', 'covid19', 'take', 'measur', 'mitig', 'spread', 'viru', 'practic', 'healthi', 'habit', 'wash', 'hand', 'frequent', 'stay', 'home', 'your', 'feel', 'well', 'beyond', 'continu', 'conduct', 'busi', 'normal', 'activ', '55', 'take', 'covid19', 'outbreak', 
'serious', 'take', 'precaut', 'im', 'keep', 'normal', 'schedul', 'continu', 'busi', 'go', 'school', 'enjoy', 'beauti', 'spring', '25', 'today', 'confirm', 'case', 'covid19', 'state', 'current', 'monitor', '100', 'travel', 'daili', 'adhpio', 'checkin', 'guidanc', '12', 'neg', 'test', 'result', 'state', 'lab', 'equip', 'test', 'hous']
Create the dictionary and document-term matrix in Python.
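# Map every unique token in processed_text to an integer id
dictionary = corpora.Dictionary(processed_text)
# Convert each tokenized document into a bag-of-words list of (token_id, count) pairs
doc_term_matrix = [dictionary.doc2bow(doc) for doc in processed_text]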
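To make the output of `doc2bow` concrete, here is a minimal plain-Python sketch of the same idea on two made-up token lists. The helpers `build_vocab` and `to_bow` are hypothetical stand-ins for `corpora.Dictionary` and `Dictionary.doc2bow`, not gensim's own code, and the integer ids gensim assigns may differ from the ones produced here.

```python
from collections import Counter


def build_vocab(docs):
    """Hypothetical stand-in for corpora.Dictionary: give each distinct token an
    integer id. New tokens get ids in alphabetical order within each document;
    gensim's real ids may differ."""
    vocab = {}
    for doc in docs:
        for token in sorted(set(doc)):
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Hypothetical stand-in for Dictionary.doc2bow: count known tokens and
    return (token_id, count) pairs sorted by id; unknown tokens are dropped."""
    counts = Counter(vocab[token] for token in doc if token in vocab)
    return sorted(counts.items())


# Two made-up tokenized "tweets"
docs = [["covid19", "case", "case", "test"],
        ["stay", "home", "covid19"]]
vocab = build_vocab(docs)
corpus = [to_bow(doc, vocab) for doc in docs]
print(vocab)   # {'case': 0, 'covid19': 1, 'test': 2, 'home': 3, 'stay': 4}
print(corpus)  # [[(0, 2), (1, 1), (2, 1)], [(1, 1), (3, 1), (4, 1)]]
```

Each document becomes a sparse list of counts rather than a full row of zeros, which is why gensim can handle a large vocabulary of tweet tokens cheaply.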
Create the LDA topic model in Python, using the same number of topics (10) as in the Factor Analysis assignment.
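lda_model = gensim.models.ldamodel.LdaModel(
    corpus=doc_term_matrix,   # bag-of-words corpus (document-term counts)
    id2word=dictionary,       # maps token ids back to words
    num_topics=10,            # same number of topics as the Factor Analysis assignment
    random_state=100,         # fixed seed so the topics are reproducible
    update_every=1,           # update the model after every chunk (online LDA)
    chunksize=100,            # number of documents per training chunk
    passes=10,                # number of full passes over the corpus
    alpha='auto',             # learn an asymmetric document-topic prior from the data
    per_word_topics=True)     # also compute the most likely topics for each word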
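Once the model is trained, each topic is a probability distribution over the vocabulary; gensim exposes these as a topics × vocabulary matrix through `lda_model.get_topics()`. The sketch below assumes such a matrix plus an id-to-word mapping and lists the highest-probability words per topic. The `top_terms` helper and the toy probabilities are illustrative only; gensim's own `show_topics` produces a similar summary.

```python
def top_terms(topic_term, id2token, n=3):
    """Return the n highest-probability tokens for each topic.

    topic_term: one row per topic, one column per token id (the shape that
    gensim's lda_model.get_topics() returns).
    id2token: maps a token id to its word (a gensim Dictionary supports this).
    """
    result = []
    for row in topic_term:
        # Sort token ids by their probability in this topic, highest first
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        result.append([id2token[j] for j in ranked[:n]])
    return result


# Made-up two-topic, five-word example (not output from the fitted model)
id2token = {0: "covid19", 1: "case", 2: "test", 3: "home", 4: "stay"}
topic_term = [
    [0.40, 0.30, 0.20, 0.05, 0.05],  # a "case counts" style topic
    [0.20, 0.05, 0.05, 0.40, 0.30],  # a "stay home" style topic
]
print(top_terms(topic_term, id2token))
# [['covid19', 'case', 'test'], ['home', 'stay', 'covid19']]
```

With the fitted model, a call such as `top_terms(lda_model.get_topics(), dictionary, n=10)` should give the ten leading stemmed terms per topic, which is a convenient starting point for naming topics and comparing them with the PCA themes.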
Create the interactive graphics HTML file. Note that this file is saved in the same folder as your markdown document; upload both the knitted file and the LDA visualization HTML file.
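# Build the pyLDAvis data for the fitted model (pyLDAvis 3.3+ renames this module to pyLDAvis.gensim_models)
vis = pyLDAvis.gensim.prepare(lda_model, doc_term_matrix, dictionary, n_jobs=1)
# Write the interactive visualization to an HTML file in the working directory
pyLDAvis.save_html(vis, 'LDA_Visualization_Anvesh.html')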
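In the saved visualization, pyLDAvis ranks the terms for each topic by relevance, controlled by the λ slider. Relevance blends a term's probability within the topic, φ_tw, with its lift, φ_tw / p_w, where p_w is the term's overall probability in the corpus: relevance = λ·log(φ_tw) + (1 - λ)·log(φ_tw / p_w). The sketch below reproduces that formula on made-up numbers to show how moving λ reorders the bar chart; `relevance` and `rank_terms` are hypothetical helpers, not part of pyLDAvis.

```python
import math


def relevance(phi_tw, p_w, lam):
    """LDAvis relevance of term w in topic t:
    lam * log(phi_tw) + (1 - lam) * log(phi_tw / p_w)."""
    return lam * math.log(phi_tw) + (1 - lam) * math.log(phi_tw / p_w)


def rank_terms(terms, lam):
    """Sort terms by relevance, highest first; terms maps word -> (phi_tw, p_w)."""
    return sorted(terms, key=lambda w: relevance(*terms[w], lam), reverse=True)


# Made-up probabilities for three terms in one topic
terms = {
    "covid19": (0.30, 0.20),  # frequent in this topic, but common everywhere
    "test": (0.15, 0.05),
    "curv": (0.10, 0.01),     # rarer, but concentrated in this topic
}
print(rank_terms(terms, lam=1.0))  # ['covid19', 'test', 'curv'] (topic probability only)
print(rank_terms(terms, lam=0.0))  # ['curv', 'test', 'covid19'] (lift only)
print(rank_terms(terms, lam=0.6))  # ['curv', 'covid19', 'test']
```

At λ = 1 a corpus-wide term such as covid19 tops every topic, while lowering λ surfaces the terms that distinguish one topic from the others.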
Interpret your topics and compare to MEM themes with PCA. Explain the results from your analysis (at least 5 sentences).
ANSWER: The estimated alpha for the LDA model is low (0.01117366), which indicates that each document tends to be dominated by a single topic. The higher alpha values for LDA_fixed and LDA_gibbs show that those models spread each document's probability across more topics. Likewise, the low entropy value for LDA_fit (0.08919543) implies that one topic carries most of the weight in each document, while the higher entropy values of the other three models imply that topic influence is spread across several topics. The analysis and graphs (in both R and Python) are consistent with LDA's assumption that every document is a mixture of topics. Among the most dominant terms, covid and live are predominant in every document.
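The entropy figures above measure how evenly a document's topic probabilities are spread: H = -Σ_k θ_k log θ_k, averaged over documents. A small alpha in the Dirichlet prior favors concentrated θ, which gives low entropy. The minimal Python sketch below assumes each document's topic distribution is available as a list of probabilities (for the gensim model, `lda_model.get_document_topics(bow, minimum_probability=0.0)` returns (topic, probability) pairs that could be converted to such a list). The helpers and example distributions are hypothetical.

```python
import math


def doc_entropy(dist):
    """Shannon entropy (natural log) of one document's topic distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mean_topic_entropy(doc_topics):
    """Average entropy over documents; lower means topics are more concentrated."""
    return sum(doc_entropy(d) for d in doc_topics) / len(doc_topics)


# Made-up 4-topic distributions
concentrated = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01]]
spread = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.20, 0.20, 0.20]]

print(round(mean_topic_entropy(concentrated), 3))  # 0.168
print(round(mean_topic_entropy(spread), 3))        # 1.359
```

The maximum possible entropy for four topics is log(4) ≈ 1.386, reached when every topic gets equal weight, so values near zero (like LDA_fit's) indicate documents dominated by one topic.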