For this assignment, you will use the same data as in the Factor Analysis assignment to discover the important topics in U.S. governors' tweets about the pandemic. The dataframe has four columns: the governor's State, Name, and Party, plus the Text of their tweets. The Text column is the one you should process and analyze.

Load the libraries + functions

Load all the libraries and functions that you will use for the rest of the assignment. Defining your libraries and functions at the top of a report lets others know what they need for the report to compile correctly.

Likewise, load the Python libraries and functions that you will use in the Python sections.

## R chunk
library(readr)
library(reticulate)
library(tidyr)
library(tm)
library(topicmodels)
library(tidyverse)
library(tidytext)
library(slam)

## Python chunk (run through reticulate)
import string
import pyLDAvis
import pyLDAvis.gensim  # this module is named pyLDAvis.gensim_models in pyLDAvis >= 3.3
import matplotlib.pyplot as plt
import gensim
import gensim.corpora as corpora
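import nltk
nltk.download('stopwords')  # stopword lists used by stopwords.words('english')
nltk.download('punkt')      # tokenizer models for nltk.word_tokenize (recent NLTK releases use 'punkt_tab')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()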

The Data

Gov_Tweets <-
  read_csv(
    "C:/Users/raavi/Dropbox/Harrisburg University/Semester 3/ANLY 540 Language Modeling/Week 10/Gov Tweets Clean.csv"
  )
import_corpus = Corpus(VectorSource(Gov_Tweets$Text))
import_matrix = DocumentTermMatrix(
  import_corpus,
  control = list(
    stemming = TRUE,
    stopwords = TRUE,
    wordLengths = c(3, Inf),  # minimum word length of 3 characters
    removeNumbers = TRUE,
    removePunctuation = TRUE
  )
)
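To make the DocumentTermMatrix step concrete, here is a minimal stdlib Python sketch of what it builds: one row per tweet, one column per term, with counts after lowercasing, removing punctuation and numbers, dropping stopwords, and keeping words of at least 3 characters. The `STOPWORDS` set is a tiny hypothetical stand-in for tm's full English list, and the sketch skips stemming, so it is an illustration rather than a reproduction of tm's output.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword subset for illustration only;
# tm's stopwords = TRUE uses a full English list instead.
STOPWORDS = {"the", "and", "to", "of", "a", "in", "is", "for", "our", "we"}

def document_term_matrix(docs, min_len=3):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in docs[i]."""
    counts = []
    for doc in docs:
        text = doc.lower()
        text = re.sub(r"[^\w\s]", "", text)  # remove punctuation
        text = re.sub(r"\d+", "", text)      # remove numbers
        tokens = [t for t in text.split()
                  if t not in STOPWORDS and len(t) >= min_len]
        counts.append(Counter(tokens))
    # Columns are the sorted set of all surviving terms
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows
```

For example, `document_term_matrix(["Stay home, stay safe!", "Testing sites open in 3 counties"])` gives a 2 x 7 matrix in which "stay" has a count of 2 in the first row.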
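# Mean tf-idf per term: the average normalized term frequency over the
# documents that contain the term, times log2(N / document frequency).
# (Computed for inspection; the matrix is not filtered by it below.)
import_weight = tapply(import_matrix$v / row_sums(import_matrix)[import_matrix$i],
                       import_matrix$j,
                       mean) * log2(nDocs(import_matrix) / col_sums(import_matrix > 0))
# Drop documents left empty after preprocessing; LDA() requires every row to have a non-zero entry
import_matrix = import_matrix[row_sums(import_matrix) > 0,]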
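The `import_weight` line packs a whole formula into one expression. The sketch below spells out the same computation in plain Python, assuming a dense documents-by-terms count matrix in which every term appears in at least one document (true for a DocumentTermMatrix). For each term it averages count / document length over the documents containing the term, then multiplies by log2(number of documents / number of documents containing the term).

```python
import math

def mean_tfidf(rows):
    """Mean tf-idf per term for a dense documents-x-terms count matrix."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    row_sums = [sum(r) for r in rows]
    scores = []
    for j in range(n_terms):
        # Normalized term frequency in each document that contains term j
        tfs = [rows[i][j] / row_sums[i] for i in range(n_docs) if rows[i][j] > 0]
        doc_freq = len(tfs)  # assumes doc_freq > 0 for every term
        scores.append(sum(tfs) / doc_freq * math.log2(n_docs / doc_freq))
    return scores
```

A term that appears in every document gets a score of 0 (log2(1) = 0), so terms such as "covid" that occur in nearly every tweet receive low weights.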
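K = 10     # number of topics
SEED = 42  # random seed for reproducible fits
# VEM estimation with alpha estimated from the data (the default)
LDA_fit = LDA(import_matrix, k = K, control = list(seed = SEED))
# VEM estimation with alpha held at its starting value
LDA_fixed = LDA(import_matrix,
                k = K,
                control = list(estimate.alpha = FALSE, seed = SEED))
# Gibbs sampling: 1000 burn-in iterations, 1000 further iterations, thinning interval of 100
LDA_gibbs = LDA(
  import_matrix,
  k = K,
  method = "Gibbs",
  control = list(
    seed = SEED,
    burnin = 1000,
    thin = 100,
    iter = 1000
  )
)
# Correlated topic model with tightened convergence tolerances
CTM_fit = CTM(import_matrix, k = K, 
              control = list(seed = SEED, 
                             var = list(tol = 10^-4), 
                             em = list(tol = 10^-3)))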
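For intuition about what `method = "Gibbs"` does, here is a minimal textbook collapsed Gibbs sampler for LDA in plain Python. It is a sketch under simplifying assumptions, not topicmodels' implementation: documents are lists of integer word ids, the `alpha` and `beta` defaults are arbitrary illustration values, and it returns estimates from the final sweep instead of using burn-in, thinning, and best-sample selection.

```python
import random

def lda_gibbs(docs, n_topics, n_vocab, alpha=0.1, beta=0.01, n_iter=200, seed=42):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word ids in range(n_vocab).
    Returns (theta, phi): document-topic and topic-word probabilities.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                             # total tokens per topic
    z = []                                          # topic assignment per token

    # Random initial assignment of every token to a topic
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + n_vocab * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + n_vocab * beta)
            for w in range(n_vocab)] for k in range(n_topics)]
    return theta, phi
```

Each row of `theta` plays the role of `posterior(x)$topics` for one document, and each row of `phi` plays the role of one topic's term probabilities.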
LDA_fit@alpha
## [1] 0.01117366
LDA_fixed@alpha
## [1] 5
LDA_gibbs@alpha
## [1] 5
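The two values of 5 are not estimates. topicmodels starts α at 50/k by default, and both the `estimate.alpha = FALSE` fit and the Gibbs fit keep that value, while the default VEM fit re-estimates it (here to about 0.011). With K = 10:

```latex
\alpha = \frac{50}{K} = \frac{50}{10} = 5
```

A much smaller α, such as the estimated 0.011, pushes each document toward a single dominant topic.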
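# Mean entropy of each document's topic distribution, per model
# (lower values mean documents are concentrated in fewer topics)
sapply(list(LDA_fit, LDA_fixed, LDA_gibbs, CTM_fit),
       function(x)
         mean(apply(posterior(x)$topics, 1, function(z) - sum(z * log(z)))))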
## [1] 0.08919543 0.77702312 1.10194591 0.55809578
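The entropy comparison above can be read with a small Python sketch of the same quantity: the Shannon entropy (natural log) of each document's topic distribution, averaged over documents. The input is assumed to be a list of per-document topic probability rows, like `posterior(x)$topics`. With K = 10 the maximum is log(10) ≈ 2.303 (every topic equally likely), so the estimated-α VEM fit (0.089) assigns documents far more decisively than the Gibbs fit (1.10).

```python
import math

def mean_topic_entropy(doc_topic):
    """Mean Shannon entropy (natural log) of per-document topic distributions."""
    entropies = []
    for probs in doc_topic:
        # Skip p = 0 terms (p * log p -> 0); R's z * log(z) would return NaN there
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return sum(entropies) / len(entropies)
```

For instance, one perfectly uniform document over 10 topics and one document assigned entirely to a single topic average to log(10) / 2 ≈ 1.151.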
terms(LDA_fit, 20)
##       Topic 1    Topic 2   Topic 3    Topic 4       Topic 5   Topic 6   
##  [1,] "covid"    "covid"   "covid"    "covid"       "covid"   "covid"   
##  [2,] "will"     "will"    "live"     "updat"       "new"     "state"   
##  [3,] "help"     "health"  "updat"    "today"       "state"   "watch"   
##  [4,] "stay"     "state"   "today"    "connecticut" "test"    "updat"   
##  [5,] "spread"   "live"    "provid"   "test"        "health"  "today"   
##  [6,] "health"   "watch"   "state"    "live"        "today"   "provid"  
##  [7,] "home"     "today"   "hold"     "brief"       "will"    "live"    
##  [8,] "can"      "updat"   "watch"    "watch"       "spread"  "will"    
##  [9,] "state"    "help"    "test"     "will"        "mexico"  "respons" 
## [10,] "work"     "respons" "respons"  "covidma"     "stay"    "health"  
## [11,] "live"     "brief"   "press"    "news"        "home"    "help"    
## [12,] "need"     "alaska"  "confer"   "respons"     "can"     "new"     
## [13,] "safe"     "can"     "can"      "posit"       "announc" "spread"  
## [14,] "virginia" "provid"  "will"     "latest"      "help"    "can"     
## [15,] "test"     "spread"  "fight"    "state"       "work"    "nebraska"
## [16,] "keep"     "continu" "brief"    "new"         "case"    "order"   
## [17,] "today"    "care"    "facebook" "peopl"       "continu" "work"    
## [18,] "protect"  "akgov"   "spread"   "discuss"     "get"     "thank"   
## [19,] "new"      "public"  "read"     "health"      "posit"   "test"    
## [20,] "care"     "test"    "health"   "effort"      "live"    "now"     
##       Topic 7      Topic 8     Topic 9    Topic 10            
##  [1,] "covid"      "covid"     "covid"    "covid"             
##  [2,] "new"        "will"      "arizona"  "covidohioreadi"    
##  [3,] "posit"      "today"     "health"   "ohio"              
##  [4,] "test"       "state"     "aztogeth" "will"              
##  [5,] "case"       "lagov"     "thank"    "inthistogetherohio"
##  [6,] "weve"       "respons"   "work"     "updat"             
##  [7,] "total"      "laleg"     "will"     "stayhomeohio"      
##  [8,] "updat"      "missouri"  "arizonan" "can"               
##  [9,] "jersey"     "louisiana" "azdh"     "teamkentucki"      
## [10,] "jerseyan"   "maryland"  "help"     "live"              
## [11,] "now"        "updat"     "provid"   "togetherki"        
## [12,] "lost"       "live"      "can"      "case"              
## [13,] "will"       "spread"    "protect"  "today"             
## [14,] "bergen"     "case"      "public"   "inform"            
## [15,] "may"        "health"    "need"     "governor"          
## [16,] "hospit"     "provid"    "state"    "beshear"           
## [17,] "bring"      "test"      "continu"  "help"              
## [18,] "burlington" "continu"   "spread"   "test"              
## [19,] "camden"     "work"      "discuss"  "share"             
## [20,] "essex"      "posit"     "busi"     "kentuckian"
terms(LDA_gibbs, 20)
##       Topic 1            Topic 2   Topic 3        Topic 4        Topic 5      
##  [1,] "live"             "covid"   "arizona"      "covidma"      "maryland"   
##  [2,] "updat"            "will"    "aztogeth"     "test"         "case"       
##  [3,] "watch"            "today"   "thank"        "watch"        "nevada"     
##  [4,] "covid"            "state"   "arizonan"     "new"          "action"     
##  [5,] "governor"         "health"  "azdh"         "tennesse"     "coronavirus"
##  [6,] "teamkentucki"     "can"     "health"       "today"        "colorado"   
##  [7,] "togetherki"       "help"    "work"         "site"         "discuss"    
##  [8,] "facebook"         "spread"  "oklahoma"     "tennessean"   "oregon"     
##  [9,] "fight"            "work"    "discuss"      "read"         "montanan"   
## [10,] "beshear"          "updat"   "protect"      "care"         "live"       
## [11,] "confer"           "test"    "public"       "tune"         "idaho"      
## [12,] "press"            "provid"  "drcarachrist" "commonwealth" "montana"    
## [13,] "share"            "continu" "donat"        "updat"        "main"       
## [14,] "kentuckian"       "live"    "partnership"  "texa"         "nevadan"    
## [15,] "inform"           "care"    "resourc"      "expand"       "hawaii"     
## [16,] "tune"             "respons" "latest"       "brief"        "oregonian"  
## [17,] "livestream"       "need"    "minnesota"    "bulletin"     "idahocovid" 
## [18,] "httpstconhomytsv" "order"   "minnesotan"   "delawar"      "march"      
## [19,] "healthyathom"     "take"    "access"       "capac"        "youtub"     
## [20,] "ill"              "home"    "small"        "support"      "announc"    
##       Topic 6      Topic 7     Topic 8       Topic 9              Topic 10  
##  [1,] "new"        "lagov"     "updat"       "covidohioreadi"     "new"     
##  [2,] "covid"      "laleg"     "brief"       "ohio"               "case"    
##  [3,] "posit"      "watch"     "covid"       "inthistogetherohio" "stay"    
##  [4,] "test"       "louisiana" "connecticut" "stayhomeohio"       "mexico"  
##  [5,] "total"      "respons"   "live"        "missouri"           "home"    
##  [6,] "weve"       "live"      "news"        "case"               "announc" 
##  [7,] "case"       "gov"       "respons"     "data"               "posit"   
##  [8,] "lost"       "state"     "hold"        "confirm"            "test"    
##  [9,] "jersey"     "updat"     "today"       "covid"              "total"   
## [10,] "jerseyan"   "provid"    "watch"       "httpstcolwxirscb"   "health"  
## [11,] "bring"      "press"     "test"        "will"               "addit"   
## [12,] "may"        "brief"     "virginia"    "director"           "spread"  
## [13,] "bergen"     "hold"      "press"       "hospit"             "today"   
## [14,] "essex"      "nebraska"  "discuss"     "counti"             "nmdoh"   
## [15,] "burlington" "alaska"    "latest"      "peopl"              "death"   
## [16,] "camden"     "akgov"     "posit"       "dramyacton"         "north"   
## [17,] "hudson"     "will"      "peopl"       "age"                "statewid"
## [18,] "gloucest"   "outbreak"  "confer"      "can"                "confirm" 
## [19,] "cumberland" "rickett"   "state"       "alpolit"            "current" 
## [20,] "atlant"     "hampshir"  "provid"      "number"             "offici"
LDA_fit_topics = tidy(LDA_fit, matrix = "beta")
top_terms = LDA_fit_topics %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)
cleanup = theme(
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  axis.line.x = element_line(color = "black"),
  axis.line.y = element_line(color = "black"),
  legend.key = element_rect(fill = "white"),
  text = element_text(size = 10)
)
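top_terms %>%
  # Order terms within each topic; a plain reorder() sorts "covid" and other
  # shared terms globally, so bars in some facets come out of order
  mutate(term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  scale_x_reordered() +
  cleanup +
  coord_flip()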
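The `group_by(topic) %>% top_n(10, beta)` step, like `terms()`, keeps the highest-probability terms for each topic. A plain Python sketch of the same idea, assuming a topic-by-term probability table and a matching vocabulary list, is below. One difference: `top_n()` keeps ties at the cutoff and can return more than n rows, while this sketch always truncates to exactly n.

```python
def top_terms(beta, vocabulary, n=10):
    """Return the n highest-probability (term, beta) pairs for each topic.

    beta: list of topic rows, each holding one probability per vocabulary term.
    """
    result = []
    for row in beta:
        # Rank term indices by probability, largest first (stable for ties)
        ranked = sorted(range(len(vocabulary)), key=lambda j: row[j], reverse=True)
        result.append([(vocabulary[j], row[j]) for j in ranked[:n]])
    return result
```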

Gensim Modeling in Python

Transfer the Text column of Gov_Tweets to Python (reticulate exposes R objects through r.<name>) and convert it to a list for processing.

tweets = list(r.Gov_Tweets["Text"])

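Process the text in Python: lowercase each tweet, strip punctuation, tokenize, remove English stopwords, and stem the remaining tokens.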

stop_words = set(stopwords.words('english'))  # build the list once; a set gives fast lookups
processed_text = []
for tweet in tweets:
  tweet = tweet.lower()
  tweet = tweet.translate(str.maketrans('', '', string.punctuation))  # strip ASCII punctuation
  tweet = nltk.word_tokenize(tweet)
  tweet = [word for word in tweet if word not in stop_words]
  tweet = [ps.stem(word) for word in tweet]  # Porter stemming
  processed_text.append(tweet)
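gensim's LDA does not take token lists directly; it expects each document as a bag of words, typically built with `corpora.Dictionary(processed_text)` and then `dictionary.doc2bow(tokens)` for each document. The stdlib sketch below shows that (token_id, count) representation on already-processed token lists. It is an illustration only: the hypothetical `build_bow` helper assigns ids in order of first appearance, which need not match the ids gensim assigns.

```python
from collections import Counter

def build_bow(texts):
    """Map tokens to integer ids and convert each document to sorted (id, count) pairs."""
    token2id = {}
    for tokens in texts:
        for token in tokens:
            if token not in token2id:
                token2id[token] = len(token2id)  # next unused id
    corpus = []
    for tokens in texts:
        counts = Counter(token2id[t] for t in tokens)
        corpus.append(sorted(counts.items()))
    return token2id, corpus
```

For example, `build_bow([["covid", "test", "covid"], ["test", "home"]])` represents the first document as `[(0, 2), (1, 1)]`: token 0 ("covid") twice and token 1 ("test") once.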

processed_text[0]
## ['ive', 'extend', 'covid', 'public', 'health', 'disast', 'emerg', 'anoth', '45day', 'everi', 'industri', 'sector', 'ar', 'affect', 'crisi', 'import', 'continu', 'support', 'protect', 'industri', 'peopl', 'threat', 'longer', 'immin', 'httpstco2an22n0gwr', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050520', 'httpstcocmlypdifq2', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstco1cws1vdebg', 'faith', 'commun', 'support', 'effort', 'fight', 'covid', 'miss', 'inperson', 'fellowship', 'mani', 'church', 'continu', 'meet', 'remot', 'present', 'guidanc', 'give', 'hous', 'worship', 'option', 'minist', 'congreg', 'httpstco9rvgmtf6zr', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050420', 'httpstcoh3uasas2y', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcouphoummna4', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050220', 'httpstcobizdnvigiz', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcobizdnvigiz', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '050120', 'httpstcojkdxbuh5qn', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '043020', 'httpstcobg9lhjfbay', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcotlb60qwcgv', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042920', 'httpstcontbjk8vtw4', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcontbjk8vtw4', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042820', 'httpstcojo5hvzntai', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcojo5hvzntai', 'want', 'thank', 'walmart', 
'quest', 'diagnost', 'open', 'drivethru', 'covid', 'test', 'site', 'central', 'ar', 'symptomat', 'arkansan', 'health', 'care', 'worker', 'first', 'respond', 'increas', 'test', 'capac', 'enhanc', 'gather', 'data', 'look', 'lift', 'restrict', 'ar', 'last', 'week', 'encourag', 'symptomat', 'arkansan', 'get', 'test', 'weekend', 'thank', 'respons', 'partnership', 'hospit', 'test', 'site', 'exceed', 'goal', 'conduct', 'gt1500', 'test', 'day', 'give', 'us', 'accur', 'sampl', 'covid', 'number', 'ar', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042720', 'httpstcomxdnv1jf9n', 'im', 'hold', 'news', 'confer', 'noon', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcomxdnv1jf9n', 'arkansa', 'surg', 'campaign', 'continu', 'today', 'think', 'symptom', 'covid19', 'dont', 'wait', 'get', 'test', 'httpstcosgcmw5rcac', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042520', 'httpstcoocrlbfa0u', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoocrlbfa0u', 'wwii', 'veteran', 'loui', 'strickland', '100', 'year', 'old', 'today', 'fought', 'normandi', 'daughter', 'fellow', 'vet', 'arkansa', 'state', 'veteran', 'home', 'threw', 'parti', 'today', 'daughter', 'famili', 'couldnt', 'attend', 'covid', 'restrict', 'happi', 'birthday', 'loui', 'thank', 'serv', 'httpstco1iefo9qnna', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042420', 'httpstcod4s9w018fa', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcod4s9w018fa', 'encourag', 'symptom', 'fever', 'cough', 'short', 'breath', 'get', 'test', 'covid19', 'within', 'next', 'two', 'day', 'think', 'symptom', 'dont', 'wait', 'get', 'test', 'httpstcoekhxdj9e6f', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042320', 'httpstcooiqojakjpn', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 
'covid19', 'respons', 'watch', 'httpstco497r9iiwqa', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042220', 'httpstcomiq8tuvsmd', 'outbreak', 'covid19', 'import', 'ever', 'arkansan', 'particip', '2020censu', 'respond', 'censu', 'socialdistanc', 'friendli', 'submit', 'respons', 'phone', 'mail', 'onlin', 'httpstcoj9doqyuu0q', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcomiq8tuvsmd', 'today', 'announc', 'creation', 'covid19', 'test', 'work', 'group', 'ensur', 'arkansa', 'adequ', 'test', 'process', 'place', 'pursu', 'publichealth', 'econom', 'recoveri', 'strategi', 'join', 'work', 'group', 'first', 'meet', 'afternoon', 'httpstcoaetnuswqbp', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '042120', 'httpstcorpg0ncb3si', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcorpg0ncb3si', 'proof', 'arkansan', 'care', '2', 'peopl', 'jonesboro', 'organ', 'freepizza', 'night', 'encourag', 'town', 'covid', 'plu', 'tornado', 'word', 'spread', 'donat', 'arriv', 'becam', 'oper', 'full', 'belli', '2', 'week', '12', 'restaur', '2500', 'free', 'supper', 'that', 'spirit', 'arkansa', 'httpstcoe7xiqq9vyy', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoziardpdz8d', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041920', 'httpstcorztbs9ljlo', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcorztbs9ljlo', '23', 'task', 'forc', 'includ', '27', 'leader', 'privat', 'sector', 'public', 'agenc', 'examin', 'impact', 'covid19', 'busi', 'industri', 'state', 'select', 'steuart', 'walton', 'chairman', '13', 'today', 'creat', 'governor', 'econom', 'recoveri', 'task', 'forc', 'develop', 'industryspecif', 'strategi', 'make', 'recommend', 'arkansass', 'econom', 'recoveri', 'effect', 
'covid19', 'httpstco4brjy6dhb7', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041820', 'httpstcod5rdrkf4gc', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcod5rdrkf4gc', 'offer', 'condol', 'famili', 'chief', 'petti', 'offic', 'charl', 'robert', 'thacker', 'jr', 'fort', 'smith', 'nativ', 'lost', 'life', 'covid19', '42', 'year', 'old', 'grate', 'servic', 'countri', 'httpstcooi8ouren6q', '1st', 'report', 'governor', 'medic', 'advisori', 'committe', 'postpeak', 'covid19', 'respons', 'priorit', 'restor', 'arkansass', 'economi', 'time', 'fashion', 'protect', 'vulner', 'maintain', 'adequ', 'health', 'care', 'public', 'health', 'capac', 'prevent', 'resurg', 'covid19', '12', 'care', 'review', 'realdonaldtrump', 'model', 'reopen', 'economi', 'first', 'report', 'governor', 'medic', 'advisori', 'committe', 'postpeak', 'covid19', 'respons', 'base', 'arkansass', 'current', 'public', 'health', 'data', 'hope', 'begin', 'lift', 'restrict', 'may', '4', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041720', 'httpstcoqgnx0gucqj', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoqgnx0gucqj', 'arkansasdw', 'launch', 'new', 'websit', 'provid', 'regularli', 'updat', 'inform', 'regard', 'covid19rel', 'unemploy', 'benefit', 'httpstcoa3obxp2vot', 'onestop', 'shop', 'answer', 'frequentlyask', 'question', 'portal', 'file', 'claim', 'recent', 'news', 'articl', 'meet', 'newlyform', 'governor', 'medic', 'advisori', 'committe', 'postpeak', 'covid19', 'respons', 'first', 'time', 'tomorrow', 'discuss', 'public', 'health', 'strategi', 'futur', 'finish', 'confer', 'call', 'presid', 'covid19', 'task', 'forc', 'brief', 'governor', 'open', 'america', 'talk', 'mean', 'arkansa', 'tomorrow', 'daili', 'updat', 'httpstcotocnxbhnqd', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041620', 'httpstcocd49nd7us', 
'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcocd49nd7us', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041520', 'httpstcobuqqnb7rk', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcobuqqnb7rk', 'look', 'forward', 'join', 'shannonbream', 'foxnewsnight', 'discuss', 'arkansass', 'covid19', 'respons', 'httpstcoug13ii4zv7', 'issu', '2', 'execut', 'order', 'today', '1st', 'allow', 'first', 'respond', 'frontlin', 'health', 'care', 'worker', 'qualifi', 'worker', 'comp', 'work', 'respons', 'caus', 'contract', 'covid19', '2nd', 'provid', 'liabil', 'immun', 'medic', 'emerg', 'respond', 'crisi', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041420', 'httpstcog7ot43zqbv', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcog7ot43zqbv', 'ive', 'form', 'medic', 'advisori', 'committe', 'help', 'guid', 'public', 'health', 'strategi', 'arkansa', 'reach', 'peak', 'number', 'covid19', 'case', 'committe', 'examin', 'protocol', 'make', 'recommend', 'necessari', 'avoid', 'resurg', 'covid19', 'peak', 'httpstcoaggw9p05q7', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041320', 'httpstcor6rwpmrudr', 'someth', 'bear', 'mind', 'face', 'covid19', 'arkansass', 'popul', 'isnt', 'fulli', 'repres', '2020censu', 'arkansa', 'hospit', 'servic', 'commun', 'health', 'center', 'could', 'impact', 'neg', 'next', 'decad', 'make', 'sure', 'count', 'httpstcoj9doqyuu0q', 'thank', 'dr', 'fauci', 'recogn', 'effort', 'fight', 'covid19', 'arkansa', 'let', 'stay', 'commit', 'win', 'fight', 'httpstcorrunjg0ovz', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '041020', 'httpstcoxphibyj9d7', 'live', 'kark4newsfox16new', 'even', 'answer', 'question', 'covid19', 'public', 'health', 'emerg', 'arkansa', 'answer', 'mani', 'question', 
'tune', 'virtual', 'town', 'hall', '7pm', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040920', 'httpstcon8jincdhqd', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040820', 'httpstcoepxqn3fsfg', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040720', 'httpstcovsknrrbty3', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcovsfmoen6n3', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040620', 'httpstcom3ymchnmx3', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcom3ymchnmx3', '23', 'appreci', 'entergyark', 'support', 'new', 'covid19', 'relief', 'fund', 'announc', 'today', 'news', 'brief', 'temporarili', 'suspend', 'disconnect', 'servic', 'custom', 'cant', 'pay', '13', 'crisi', 'place', 'hardship', 'mani', 'arkansan', 'incred', 'respons', 'need', 'other', 'im', 'pleas', 'state', 'partner', 'w', 'smartgivingar', 'support', 'covid19', 'relief', 'fund', 'fund', 'arkansan', 'donat', 'help', 'neighbor', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040520', 'httpstcoh42a1l4w92', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcohaewl5zjmo', '22', 'grate', 'arkansan', 'taken', 'extraordinari', 'effort', 'practic', 'social', 'distanc', 'flatten', 'curv', 'day', 'evalu', 'new', 'step', 'take', 'measur', 'provid', 'protect', 'arkansan', 'prepar', 'peak', 'number', 'covid19', 'case', '12', 'issu', 'execut', 'order', 'mandat', 'new', 'safeti', 'measur', 'commerci', 'lodg', 'shortterm', 'rental', 'institut', 'oper', 'order', 'protect', 'public', 'health', 'covid19', 'crisi', 'httpstcorplzmo2o7p', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040420', 'httpstcohg5h89y2tt', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 
'covid19', 'respons', 'watch', 'httpstcohg5h89y2tt', 'arkansass', 'request', 'feder', 'disast', 'assist', 'result', 'covid19', 'approv', 'thank', 'feder', 'deleg', 'fema', 'potu', 'approv', 'assist', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040320', 'httpstcoja0ry2vrpi', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcohplzst7npc', 'linglin42108652', 'great', 'suggest', 'linglin42108652', 'test', 'posit', 'covid19', 'selfquarantin', 'leav', 'home', 'design', 'friend', 'famili', 'member', 'care', 'includ', 'get', 'groceri', 'would', 'help', 'mitig', 'spread', 'viru', 'askgovhutchinson', 'chris70909106', 'start', 'tomorrow', 'arkansa', 'state', 'park', 'implement', 'new', 'safeti', 'measur', 'like', 'dayus', 'oper', 'reduc', 'risk', 'overcrowd', 'park', 'discourag', 'visitor', 'outofst', 'threat', 'covid19', 'pass', 'askgovhutchinson', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040220', 'httpstcoyumrweehi6', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoduh0dg5udr', 'covid19', 'relat', 'question', 'ill', 'particip', 'nationwid', 'askthegovernor', 'twitter', 'q', 'today', '500', 'ill', 'answer', 'mani', 'question', 'repli', 'tweet', 'question', 'askgovhutchinson', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '040120', 'httpstcopugz6bjjdz', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcopugz6bau57', '13', 'current', 'number', 'covid19', 'case', 'arkansa', 'lower', 'project', 'number', 'case', 'provid', 'adhpio', 'last', 'week', 'httpstcocthvdfxd13', 'suspect', 'covid19', 'symptom', 'question', 'regard', 'children', 'covid19', 'call', 'adhpio', '18008037847', 'archildren', '18007433616', 'httpstco4bksjpaizf', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '033120', 
'httpstco8ka5d0nv2j', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstco8ka5d0wje9', 'need', 'covid19', 'data', 'ar', 'test', 'give', 'us', 'data', 'increas', 'test', 'immedi', 'need', 'right', 'well', 'protect', 'health', 'care', 'worker', 'whole', 'team', 'includ', 'aremerg', 'adhpio', 'work', 'procur', 'addl', 'test', 'nationaldoctorsday', 'commend', 'extraordinari', 'physician', 'protect', 'heal', 'daili', 'especi', 'grate', 'sacrific', 'covid19', 'outbreak', 'year', 'support', 'doctor', 'social', 'distanc', 'save', 'live', 'crisi', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '033020', 'httpstcoenrkzchmdv', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoenrkzchmdv', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032920', 'httpstcokh8irkzaen', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcokh8irkhypn', 'great', 'guidelin', 'arkansa', 'depart', 'park', 'heritag', 'tourism', 'must', 'part', 'flatten', 'curv', 'covid19ark', 'httpstcorf66jnabev', 'uplift', 'arkansa', 'provid', 'free', 'covid19', 'resourc', 'busi', 'organ', 'individu', 'thank', 'littlerockcvb', 'manganholcomb', 'weareteamsi', 'creat', 'commun', 'websit', 'httpstcon3rxnyajlt', 'httpstco8foo9cgtcf', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032820', 'httpstcoyvhcciet8a', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoyvhcciet8a', 'live', 'governor', 'hutchinson', 'sign', 'legisl', 'establish', 'covid19', 'raini', 'day', 'fund', 'httpstcoyghs1rkazn', 'sign', 'legisl', 'establish', 'covid19', 'raini', 'day', 'fund', 'immedi', 'upon', 'passag', 'around', 'midnight', 'tonight', 'watch', 'bill', 'sign', 'httpstcoyghs1rkazn', 'week', 'radio', 'address', 'share', 'initi', 'launch', 
'assist', 'rural', 'hospit', 'provid', 'support', 'front', 'line', 'treat', 'covid19', 'patient', 'learn', 'httpstcoqiwyd8yuxd', 'httpstcon56acyckeg', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032720', 'httpstcotnebva6c', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcotnebva6c', 'import', 'arkansan', 'stay', 'home', 'possibl', 'work', 'mitig', 'spread', 'covid19', 'necessari', 'home', 'mind', 'public', 'space', 'keep', 'safe', 'distanc', 'least', 'six', 'feet', 'other', 'need', 'crowd', 'order', 'us', 'success', 'arkansa', 'slow', 'upward', 'trend', 'line', 'covid19', 'public', 'need', 'abid', 'guidanc', 'arkansa', 'depart', 'health', 'adhpio', 'guidelin', 'direct', 'found', 'httpstcotfzzurkedi', 'grate', 'feder', 'deleg', 'commun', 'support', 'theyv', 'given', 'state', 'especi', 'passag', 'covid19', 'relief', 'bill', 'senat', 'bill', 'provid', 'confid', 'arkansan', 'whose', 'employ', 'small', 'busi', 'affect', 'outbreak', '44', 'plan', 'also', 'propos', 'addit', 'payment', '250', 'week', 'nonphysician', 'direct', 'care', 'worker', '500', 'week', 'nonphysician', 'direct', 'care', 'worker', 'work', 'facil', 'covid19', 'present', 'read', 'initi', 'httpstcorolc0f71zr', '14', 'today', 'announc', '116m', 'initi', 'directli', 'address', 'covid19', 'crisi', 'burden', 'rural', 'hospit', 'health', 'care', 'provid', 'propos', 'provid', 'improv', 'access', 'care', 'citizen', 'keep', 'provid', 'open', 'workforc', 'employ', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032620', 'httpstcofhvh593kkm', 'im', 'hold', 'news', 'confer', '230pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcofhvh593kkm', 'grate', 'member', 'arkansasguard', 'assist', 'uamshealth', 'screen', 'patient', 'administ', 'covid19', 'test', 'drivethrough', 'test', 'site', 'respond', 'faith', 'urgent', 'need', 'state', 'httpstcoyrkk6nalx1', '14', 'grate', 'doctor', 
'nurs', 'lab', 'technician', 'health', 'care', 'profession', 'front', 'line', 'covid19', 'look', 'futur', 'fatigu', 'among', 'individu', 'increas', 'demand', 'health', 'care', 'worker', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032520', 'httpstco49lubn4lki', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstco49lubn4lki', 'arkansasedc', 'work', 'hard', 'behalf', 'arkansa', 'busi', 'covid19', 'outbreak', 'covid19', 'busi', 'employ', 'resourc', 'visit', 'httpstcoqly8bu2tcm', 'issu', 'execut', 'order', '2005', 'leverag', 'telehealth', 'ar', 'covid19', 'outbreak', 'doctor', 'establish', 'new', 'patient', 'phone', 'minim', 'number', 'sick', 'patient', 'wait', 'room', 'mitig', 'spread', 'viru', 'httpstcolrccgwf3d', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032420', 'httpstcojfq4za5g9b', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcojfq4za5g9b', 'arkansa', 'blood', 'institut', 'abi', 'blood', 'collect', 'organ', 'central', 'arkansa', 'donat', 'center', 'littl', 'rock', 'north', 'littl', 'rock', 'hot', 'spring', 'abl', 'pleas', 'consid', 'donat', 'blood', 'face', 'covid19', 'public', 'health', 'emerg', 'abiblood', '23', 'due', 'covid19', 'outbreak', 'chang', 'individu', 'tax', 'file', 'deadlin', 'offici', 'state', 'revenu', 'forecast', 'lower', '3531m', 'necessit', 'special', 'session', 'gener', 'assembl', 'address', 'shortfal', 'war', '173m', 'unalloc', 'surplu', 'last', 'year', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032320', 'httpstcoabl6dguzlh', 'im', 'hold', 'news', 'confer', '230pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoabl6dguzlh', 'busi', 'owner', 'encourag', 'employe', 'protect', 'person', 'health', 'safeti', 'other', 'clean', 'surfac', 'frequent', 'touch', 'avoid', 'meet', 'requir', 'close', 'proxim', 'post', 
'symptom', 'covid19', 'fever', 'cough', 'short', 'breath', '14', 'today', 'total', '165', 'posit', 'covid19', 'case', 'ar', 'largest', 'increas', 'case', '24hour', 'period', 'weve', 'seen', 'far', 'number', 'reflect', 'increas', 'adhpio', 'test', 'capac', 'covid19', 'spread', 'import', 'arkansan', 'follow', 'cdc', 'guidelin', 'keep', 'famili', 'safe', 'encourag', 'peopl', 'state', 'practic', 'cdc', 'current', 'recommend', 'avoid', 'social', 'gather', '10', 'help', 'mitig', 'viru', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032220', 'httpstcowryqkexpeq', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcowryqkexpeq', '13', 'today', 'announc', 'total', '118', 'posit', 'covid19', 'case', 'arkansa', 'base', 'seen', 'state', 'arkansa', 'like', 'reach', 'peak', 'covid19', 'case', '6', '8', 'week', 'project', 'peak', '1000', 'patient', 'hospit', 'social', 'distanc', 'mean', 'stay', 'home', 'although', 'effect', 'way', 'mitig', 'spread', 'covid19', 'walk', 'hike', 'fish', 'outdoor', 'activ', 'consist', 'social', 'distanc', 'practic', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032120', 'httpstcoag7kynpyss', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcoag7kynpyss', 'protect', 'arkansass', 'workforc', 'busi', 'covid19', 'outbreak', 'top', 'prioriti', 'arkansa', 'busi', 'follow', 'guidelin', 'ensur', 'safeti', 'employe', 'continu', 'oper', 'httpstcobmiudr9f8v', 'team', 'top', 'health', 'care', 'profession', 'adhpio', 'uamshealth', 'agenc', 'hospit', 'across', 'state', 'work', 'hard', 'keep', 'inform', 'covid19', 'outbreak', 'develop', 'guidelin', 'arkansan', 'follow', 'stay', 'safe', 'healthi', 'httpstco245ghogm2h', 'accord', 'cdc', 'best', 'way', 'prevent', 'spread', 'covid19', 'avoid', 'expos', 'viru', 'covid19', 'thought', 'spread', 'peopl', 'within', '6', 'feet', 'practic', 'social', 
'distanc', 'help', 'mitig', 'outbreak', 'state', 'rise', 'number', 'posit', 'covid19', 'case', 'arkansa', 'reflect', 'addit', 'test', 'capac', 'want', 'number', 'rise', 'mean', 'locat', 'isol', 'case', 'across', 'state', 'otherwis', 'would', 'go', 'undetect', 'slow', 'spread', 'viru', 'week', 'radio', 'address', 'discuss', 'execut', 'order', 'issu', 'last', 'week', 'expand', 'telemedicin', 'covid19', 'outbreak', 'learn', 'httpstcosybmcinpsx', 'httpstcosv6tqbg2tq', '15', 'today', 'announc', 'current', '96', 'posit', 'case', 'covid19', 'ar', 'affect', '3', 'longterm', 'care', 'facil', 'appl', 'creek', 'nurs', 'rehab', 'centerton', 'villag', 'gener', 'baptist', 'west', 'pine', 'bluff', 'briarwood', 'nurs', 'home', 'rehab', 'littl', 'rock', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '032020', 'httpstcohxq5ei1rfp', 'im', 'hold', 'news', 'confer', '130pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcohxq5ei1rfp', 'last', 'week', 'activ', 'arkansasguard', 'assist', 'covid19', 'respons', 'grate', 'work', 'nation', 'guard', 'medic', '39th', 'infantri', 'brigad', 'combat', 'team', 'support', 'adhpio', 'emerg', 'oper', 'center', 'respond', 'question', 'arkansan', 'httpstcoa10wheke6f', 'thank', 'team', 'member', 'adhpio', 'emerg', 'oper', 'center', 'hard', 'work', 'dedic', 'covid19', 'outbreak', 'commend', 'servic', 'peopl', 'arkansa', 'face', 'public', 'health', 'emerg', 'httpstcotcvxd3ddmd', 'join', 'call', 'potu', 'vp', 'governor', 'afternoon', 'covid19', 'pandem', 'take', 'everi', 'step', 'possibl', 'make', 'nation', 'emerg', 'shortliv', 'possibl', 'httpstcow4r7vpt1bu', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', 'today', '031920', 'httpstcowkarmnbjiy', 'im', 'hold', 'news', 'confer', '230pm', 'today', 'provid', 'updat', 'covid19', 'respons', 'watch', 'httpstcodq4tv1lknb', 'today', 'announc', 'relief', 'arkansa', 'busi', 'childcar', 'provid', 'eas', 'covid19', 'impact', 'read', 
'httpstcoithklggfa0', 'live', 'governor', 'hutchinson', 'provid', 'covid19', 'updat', 'media', '031820', 'httpstco2bwhflgv66', '33', 'discourag', 'unnecessari', 'outofst', 'travel', 'time', 'mitig', 'slow', 'spread', 'covid19', 'student', 'educ', 'return', 'school', 'spring', 'break', 'circumst', 'allow', '13', 'today', 'announc', 'six', 'new', 'posit', 'covid19', 'case', 'arkansa', 'strategi', 'ahead', 'curv', 'mitig', 'includ', 'close', 'school', 'two', 'week', 'prevent', 'larg', 'gather', 'dont', 'signific', 'commun', 'spread', 'state', 'im', 'hold', 'news', 'confer', 'fayettevil', '1115', 'provid', 'updat', 'ongo', 'covid19', 'respons', 'watch', 'httpstcolis0p97i2v', '13', 'today', 'arkansa', '16', 'confirm', 'case', 'covid19', 'individu', 'isol', 'situat', 'monitor', 'close', 'announc', 'yesterday', 'activ', 'arkansa', 'nation', 'guard', 'assist', 'covid19', 'respons', 'live', 'daili', 'media', 'brief', 'covid19', '031420', 'httpstco6m2t2zqq60', 'week', 'radio', 'address', 'commend', 'respons', 'respons', 'leader', 'arkansa', 'work', 'togeth', 'prevent', 'spread', 'covid19', 'state', 'httpstcob7hf02q5lm', 'httpstcozcwpcdsolz', 'live', 'daili', 'media', 'brief', 'covid19', 'httpstcoga1wy1cwg', '44', 'ar', 'dept', 'health', 'uamshealth', 'resourc', 'inform', 'regard', 'covid19', 'question', 'show', 'symptom', 'call', '247', 'hotlin', '18008037847', 'visit', 'httpstcoauhqi6j1b7', 'httpstcov7ipofjpoq', 'speak', 'health', 'care', 'profession', '14', 'yesterday', 'announc', 'first', 'presumpt', 'posit', 'case', 'covid19', 'arkansa', 'learn', 'today', 'five', 'addit', 'presumpt', 'posit', 'case', 'state', 'uncommon', 'seen', 'spread', 'progress', 'similar', 'fashion', 'state', '22', 'month', 'ar', 'prepar', 'respond', 'covid19', 'take', 'measur', 'mitig', 'spread', 'viru', 'practic', 'healthi', 'habit', 'wash', 'hand', 'frequent', 'stay', 'home', 'your', 'feel', 'well', 'beyond', 'continu', 'conduct', 'busi', 'normal', 'activ', '55', 'take', 'covid19', 'outbreak', 
'serious', 'take', 'precaut', 'im', 'keep', 'normal', 'schedul', 'continu', 'busi', 'go', 'school', 'enjoy', 'beauti', 'spring', '25', 'today', 'confirm', 'case', 'covid19', 'state', 'current', 'monitor', '100', 'travel', 'daili', 'adhpio', 'checkin', 'guidanc', '12', 'neg', 'test', 'result', 'state', 'lab', 'equip', 'test', 'hous']

Create the dictionary and the bag-of-words document-term matrix in Python.

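# Map each unique token to an integer id
dictionary = corpora.Dictionary(processed_text)
# Convert each tweet document into a bag-of-words list of (token_id, count) pairs
doc_term_matrix = [dictionary.doc2bow(doc) for doc in processed_text]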
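The following pure-Python sketch shows what these two structures contain. Each unique token gets an integer id, and each document becomes a sorted list of (token_id, count) pairs. The helper names `build_vocab` and `to_bow` are hypothetical, and the id numbering is simplified, so it will not necessarily match the ids that gensim's `Dictionary` assigns.

```python
from collections import Counter

def build_vocab(docs):
    """Hypothetical helper: map each unique token to an integer id in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def to_bow(doc, vocab):
    """Hypothetical helper: return sorted (token_id, count) pairs, skipping tokens not in vocab."""
    counts = Counter(vocab[token] for token in doc if token in vocab)
    return sorted(counts.items())

# Toy documents made of stemmed tokens like those printed above
docs = [["covid19", "updat", "covid19", "media"],
        ["covid19", "test", "case"]]
vocab = build_vocab(docs)
bow_corpus = [to_bow(doc, vocab) for doc in docs]
print(vocab)       # {'covid19': 0, 'updat': 1, 'media': 2, 'test': 3, 'case': 4}
print(bow_corpus)  # [[(0, 2), (1, 1), (2, 1)], [(0, 1), (3, 1), (4, 1)]]
```

Repeated tokens are collapsed into a single pair with a count. That is why "covid19" appears as (0, 2) in the first document.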

Create the LDA topic model in Python, using the same number of topics as in the Factor Analysis assignment.

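lda_model = gensim.models.ldamodel.LdaModel(corpus = doc_term_matrix,  # bag-of-words document-term matrix
                                           id2word = dictionary,       # token id -> word mapping
                                           num_topics = 10,            # same number of topics as the Factor Analysis assignment
                                           random_state = 100,         # seed for reproducible results
                                           update_every = 1,           # update the model after each chunk (online learning)
                                           chunksize = 100,            # documents per training chunk
                                           passes = 10,                # full passes over the corpus
                                           alpha = 'auto',             # learn an asymmetric document-topic prior from the data
                                           per_word_topics = True)     # also compute the most likely topics for each word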
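The `alpha` argument is the Dirichlet prior on each document's topic proportions. With `'auto'`, gensim learns an asymmetric prior from the corpus. The standard-library sketch below is not part of the original analysis. It samples topic mixtures from a symmetric Dirichlet to show why a small alpha concentrates a document on one topic and a large alpha spreads it over many. The function names and the alpha values 0.1 and 10 are illustrative assumptions. The 10 topics and seed 100 simply mirror `num_topics` and `random_state` above.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Hypothetical helper: draw one k-dimensional symmetric Dirichlet(alpha) sample via normalized gamma draws."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0:  # guard against underflow when alpha is very small
            return [d / total for d in draws]

def mean_top_share(alpha, k=10, n_docs=2000, seed=100):
    """Hypothetical helper: average share of each simulated document's most probable topic."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n_docs)) / n_docs

low = mean_top_share(0.1)    # small alpha: documents dominated by one topic
high = mean_top_share(10.0)  # large alpha: topics spread more evenly
print(f"alpha=0.1 -> {low:.2f}, alpha=10 -> {high:.2f}")
```

The dominant topic's average share is much larger under the small prior. The same reasoning underlies reading a fitted alpha as a sign of how concentrated documents are.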

Create the interactive HTML visualization. Note that the file is saved in the same folder as your markdown document. Upload both the knitted file and the LDA visualization HTML file.

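# Prepare the interactive topic visualization from the fitted model, corpus, and dictionary
# (this module is named pyLDAvis.gensim_models in newer pyLDAvis releases)
vis = pyLDAvis.gensim.prepare(lda_model, doc_term_matrix, dictionary, n_jobs = 1)
# Write the visualization to an HTML file in the working directory
pyLDAvis.save_html(vis, 'LDA_Visualization_Anvesh.html')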
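In the saved visualization, the terms shown for a selected topic are ranked by relevance, which the λ slider controls. Relevance is defined as relevance(w, t) = λ·log p(w | t) + (1 - λ)·log(p(w | t) / p(w)). The sketch below computes that ranking by hand for one hypothetical topic. The probabilities are invented for illustration and do not come from the governors' tweets.

```python
import math

def relevance(p_w_given_t, p_w, lam):
    """Relevance of term w to topic t (Sievert & Shirley, 2014), the ranking used by LDAvis/pyLDAvis."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)

def rank_terms(topic_probs, corpus_probs, lam):
    """Hypothetical helper: return the topic's terms sorted from most to least relevant."""
    scores = {w: relevance(p, corpus_probs[w], lam) for w, p in topic_probs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical numbers: p(w | topic) for one topic and overall p(w) in the corpus
topic_probs = {"covid19": 0.20, "test": 0.10, "nurs": 0.05}
corpus_probs = {"covid19": 0.25, "test": 0.02, "nurs": 0.005}

print(rank_terms(topic_probs, corpus_probs, lam=1.0))  # ['covid19', 'test', 'nurs']
print(rank_terms(topic_probs, corpus_probs, lam=0.0))  # ['nurs', 'test', 'covid19']
```

At λ = 1, terms are ranked purely by their probability within the topic, so a corpus-wide term like "covid19" comes first. Moving λ toward 0 favors terms that are distinctive to the topic.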

Interpretation

Interpret your topics and compare to MEM themes with PCA. Explain the results from your analysis (at least 5 sentences).
ANSWER: The alpha value for LDA is low (0.01117366), which indicates that a high percentage of documents is assigned predominantly to a single topic. The higher alpha values for LDA_fixed and LDA_gibbs show that topics are spread more evenly within documents. Likewise, the low entropy value for LDA_fit (0.08919543) implies that a single topic has the most influence in each document. The remaining three entropy values are higher, implying that the influence is spread across several topics. The analysis and graphs (in both R and Python) reflect LDA's underlying assumption that every document contains a mix of topics. Among the most dominant terms, covid and live are predominant in every document.
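The entropy values cited above summarize how concentrated each document's topic distribution is. The minimal sketch below computes the Shannon entropy of one document's topic proportions. It assumes the reported R values are means of this quantity over documents, as in common topicmodels tutorials; this report does not show that computation directly. The two distributions are made up for illustration.

```python
import math

def topic_entropy(proportions):
    """Shannon entropy (natural log) of one document's topic proportions; 0 * log 0 is treated as 0."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

# Hypothetical document-topic proportions for a 10-topic model
concentrated = [0.91] + [0.01] * 9  # one topic dominates the document
uniform = [0.1] * 10                # influence spread evenly across topics

print(round(topic_entropy(concentrated), 3))
print(round(topic_entropy(uniform), 3))  # log(10) ≈ 2.303
```

Entropy is 0 when a single topic has all the weight and reaches its maximum, log(K), when all K topics are equally likely. A lower average therefore means documents are dominated by one topic.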