What are the most frequent keywords in the titles of economics and political science articles, besides “economic” and “political”? In this homework, I first use the rplos package to retrieve the titles of online articles matching the queries 'economics' and 'politics'. Second, I convert the titles into a VCorpus and clean the text with the stringr and tm packages. Finally, I use the wordcloud package to visualize the most frequent terms.
The data for this homework are article titles retrieved with the rplos package, which queries the search API of the PLOS journals.
library(tidytext)
library(stringr)
library(dplyr)
library(qdapRegex)
library(Zelig)
library(rplos)
library(tm)
library(wordcloud)
library(qdap)
options(dplyr.show_progress = FALSE)
# Retrieve titles of online articles matching 'economics' and 'politics' with the rplos package, setting limit = 200
hw11_text <- plostitle(q = 'economics', fl = 'title,journal', limit = 200)
hw11_text2 <- plostitle(q = 'politics', fl = 'title,journal', limit = 200)
hw11_df <- hw11_text$data
hw11_df2 <- hw11_text2$data
# Join all titles from each query into a single string, using ' testing $%! testing ' as the separator,
# and convert the result to a VCorpus
hw11_corpus <- VCorpus(VectorSource(paste(hw11_df$title, collapse = " testing $%! testing ")))
hw11_corpus2 <- VCorpus(VectorSource(paste(hw11_df2$title, collapse = " testing $%! testing ")))
##### Using functions in tm and stringr to clean the data
# replace 'testing' with empty string
hw11_corpus[[1]][1] <- str_replace_all(hw11_corpus[[1]][1], "testing", "")
hw11_corpus <- hw11_corpus %>%
  tm_map(stripWhitespace) %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removeWords, stopwords("english"))
strwrap(as.character(hw11_corpus[[1]]))
## [1] "impact services economic complexity service sophistication route"
## [2] "economic growth forecasted economic change selffulfilling prophecy"
## [3] "economic decisionmaking economic evidence mhealth systematic"
## [4] "review economic evaluations mhealth solutions economics"
## [5] "reproducibility preclinical research heterogeneous dynamics"
## [6] "economic complexity morals matter economic games economic"
## [7] "framework microbial trade economics epidemic diseases policy"
## [8] "adjustment dynamic economic game economic lives people"
## [9] "disabilities vietnam health economic burden obesity brazil"
## [10] "economic burden cancers indian households correction economics"
## [11] "reproducibility preclinical research economic impact cystic"
## [12] "echinococcosis peru measuring complexity brazilian economic crises"
## [13] "mapping economic costs benefits conservation revising economic"
## [14] "imperative us stem education global economic burden norovirus"
## [15] "gastroenteritis economic inequality predicts biodiversity loss"
## [16] "economic games internet effect stakes economic burden oral cancer"
## [17] "iran complexity centralization fragility economic networks"
## [18] "economic disease burden dengue mexico conspiratorial style lay"
## [19] "economic thinking correction effect perceived regional accents"
## [20] "individual economic behavior lab experiment linguistic performance"
## [21] "cognitive ratings economic decisions hydroeconomic model water"
## [22] "level fluctuations combining limnology economics sustainable"
## [23] "development hydropower effect perceived regional accents"
## [24] "individual economic behavior lab experiment linguistic performance"
## [25] "cognitive ratings economic decisions correction books average"
## [26] "previous decade economic misery cooperative dynamics neighborhood"
## [27] "economic status cities osteoporosis associated vertebral"
## [28] "fractureshealth economic implications knowledge social"
## [29] "affiliations biases economic decisions correction humanistic"
## [30] "economic burden restless legs syndrome global economic impact"
## [31] "manta ray watching tourism correction economic analysis"
## [32] "vaccination strategies prrs control measuring intangibles metrics"
## [33] "economic complexity countries products correction books average"
## [34] "previous decade economic misery socioeconomic predictors"
## [35] "stillbirths nepal credibility crisis research can economics tools"
## [36] "help revisiting effect colonial institutions comparative economic"
## [37] "development drug trafficking organizations local economic activity"
## [38] "mexico research focus nations economic vs altruistic motivations"
## [39] "accessibility socioeconomic development human settlements economic"
## [40] "burden human papillomavirusrelated diseases italy tv viewing bmi"
## [41] "raceethnicity socioeconomic status potential economic burden zika"
## [42] "continental united states approaches refining estimates global"
## [43] "burden economics dengue mycetoma global medical socioeconomic"
## [44] "dilemma efficiency cost economical brain functional networks"
## [45] "information communication technology use economic growth financing"
## [46] "essential hiv services new economic agenda economic costs alcohol"
## [47] "use sri lanka asymmetric power boosts extortion economic"
## [48] "experiment economic disease burden dengue southeast asia"
## [49] "globalization economic growth empirical evidence role"
## [50] "complementarities agricultural trade networks patterns economic"
## [51] "development economic benefits investing womens health systematic"
## [52] "review future tense economic decisions controlling cultural"
## [53] "evolution humanistic economic burden restless legs syndrome real"
## [54] "bogus predicting susceptibility phishing economic experiments"
## [55] "economic impact malignant catarrhal fever pastoralist livelihoods"
## [56] "economic evaluations ehealth technologies systematic review"
## [57] "modeling simulation economics mining bitcoin market effects"
## [58] "intrapersonal anger regulation economic bargaining hiv treatment"
## [59] "prevention issues economic evaluation metastable features economic"
## [60] "networks responses exogenous shocks taxonomy products drives"
## [61] "economic development countries analysis world economic variables"
## [62] "using multidimensional scaling living lions economics coexistence"
## [63] "gir forests india economic conditions predict prevalence west nile"
## [64] "virus global economic health burden human hookworm infection fast"
## [65] "economic development accelerates biological invasions china"
## [66] "evolving landscape economics hiv treatment prevention economic"
## [67] "analysis pandemic influenza vaccination strategies singapore"
## [68] "systematic review health resilience economic crises modeling"
## [69] "health economic burden hepatitis c virus switzerland economic"
## [70] "growth reduce childhood undernutrition ethiopia economic"
## [71] "evaluation vector control age dengue vaccine roles preventive"
## [72] "curative health care economic development economic analysis"
## [73] "vaccination strategies prrs control books average previous decade"
## [74] "economic misery impact immigrants multiagent economical system"
## [75] "cultural diversity economic development societal instability"
## [76] "economic analysis dengue prevention case management maldives item"
## [77] "social economic conservatism scale secs predicting carer health"
## [78] "effects use economic evaluation socioeconomic determinants need"
## [79] "dental care adults equity must accompany economic growth good"
## [80] "health socioeconomic inequalities use postnatal care india"
## [81] "understanding disease control influence epidemiological economic"
## [82] "factors economic growth associated reduction child undernutrition"
## [83] "india economic development wage inequality complex system analysis"
## [84] "socioeconomic burden snakebite sri lanka economic geography united"
## [85] "states commutes megaregions diversity indoor activities economic"
## [86] "development neighborhoods economic decisions others exception loss"
## [87] "aversion law impact menstrual cycle phase economic choice"
## [88] "rationality race neighborhood economic status income inequality"
## [89] "mortality economic burden diseaseassociated malnutrition state"
## [90] "level effects tillage practice soil structure emissions economics"
## [91] "cereal production current socioeconomic conditions central bosnia"
## [92] "herzegovina socioeconomic instability scaling energy use"
## [93] "population size can economic analysis contribute disease"
## [94] "elimination eradication systematic review equivalences biological"
## [95] "economical systems peloton dynamics rebound effect assessment"
## [96] "methodological quality economic evaluations belgian drug"
## [97] "reimbursement applications screening primarycare patients forgoing"
## [98] "health care economic reasons quantifying distribution flow"
## [99] "cytometric tcrvß usage economic statistics correction economic"
## [100] "burden human papillomavirusrelated diseases italy correction"
## [101] "colorectal cancer screening averagerisk north americans economic"
## [102] "evaluation role cognitive emotional perspective taking economic"
## [103] "decision making ultimatum game multiple chronic health conditions"
## [104] "link labour force participation economic status dissociable"
## [105] "influences skewness valence economic choice neural activity"
## [106] "correction noblesse oblige social status economic inequality"
## [107] "maintenance among politicians correction living lions economics"
## [108] "coexistence gir forests india economic impacts nonnative forest"
## [109] "insects continental united states economic burden attributable"
## [110] "childs inpatient admission diarrheal disease rwanda immediate"
## [111] "economic impact maternal deaths rural chinese households global"
## [112] "establishment risk economically important fruit fly species"
## [113] "tephritidae impact different economic factors biological invasions"
## [114] "global scale developing social cultural economic report card"
## [115] "regional industrial harbour framework economic analysis data"
## [116] "collection methods vital statistics quantifying economic cultural"
## [117] "biases social media trending topics comparative economic"
## [118] "evaluation haemophilus influenzae type b vaccination belarus"
## [119] "uzbekistan correction cognitive fatigue destabilizes economic"
## [120] "decision making preferences strategies maternal neonatal mortality"
## [121] "southwest ethiopia estimates socioeconomic inequality mobile phone"
## [122] "call data regional socioeconomic proxy indicator recent trends"
## [123] "economic burden acute myocardial infarction south korea organizing"
## [124] "effects testosterone economic behavior just risk taking separating"
## [125] "macroecological pattern process comparing ecological economic"
## [126] "geological systems economic environmental impacts harmful"
## [127] "nonindigenous species southeast asia health economic benefits"
## [128] "improved injury prevention trauma care worldwide socialeconomic"
## [129] "status cognitive performance among chinese aged years older"
## [130] "economic impact triptan rxtootc switch six eu countries cohort"
## [131] "study lymphatic filariasis socio economic conditions andhra"
## [132] "pradesh india economic benefits sharing redistributing influenza"
## [133] "vaccines shortages occurred economic impact eradicating peste des"
## [134] "petits ruminants benefitcost analysis health disparity still"
## [135] "exists economically welldeveloped society asia impact topology"
## [136] "global macroeconomic network spreading economic crises incidence"
## [137] "hiv windhoek namibia demographic socioeconomic associations"
## [138] "economic burden hypoglycemia patients type diabetes mellitus korea"
## [139] "economic evaluation interventions prevention hospital acquired"
## [140] "infections systematic review smaller cigarette pack commitment"
## [141] "smoke less insights behavioral economics clinical economic impact"
## [142] "probiotics consumption respiratory tract infections projections"
## [143] "canada economic evaluation alongside multinational studies"
## [144] "systematic review empirical studies economic analysis pine"
## [145] "plantation receiving repeated applications biosolids economic"
## [146] "burden selfreported undiagnosed cardiovascular diseases diabetes"
## [147] "indonesian households economic value environmental services"
## [148] "indigenousheld lands australia economic benefits reducing"
## [149] "cardiovascular disease mortality quebec canada lost forgotten"
## [150] "economics improving patient retention aids treatment programs"
## [151] "confidence sharing economic strategy efficient information flows"
## [152] "animal groups economic appraisal ontarios universal influenza"
## [153] "immunization program costutility analysis ethics economics use"
## [154] "primaquine reduce falciparum malaria transmission asymptomatic"
## [155] "populations health economic evaluations visceral leishmaniasis"
## [156] "treatments systematic review economic assessment model rural"
## [157] "remote satellite hemodialysis units socioeconomic drivers bushmeat"
## [158] "consumption west african ebola crisis oral ondansetron"
## [159] "administration emergency departments children gastroenteritis"
## [160] "economic analysis epidemiological economic impact pandemic"
## [161] "influenza chicago priorities vaccine interventions socioeconomic"
## [162] "cultural determinants human african trypanosomiasis kenya uganda"
## [163] "transboundary predictive power air travel socioeconomic data early"
## [164] "pandemic spread impact determinants energy paradigm economic"
## [165] "growth european union global economic impacts climate variability"
## [166] "change th century economic assessment fmdv releases national bio"
## [167] "agro defense facility economic performance sustainability novel"
## [168] "intercropping system north china plain longrun socioeconomic"
## [169] "consequences large disaster earthquake kobe economic impact dengue"
## [170] "illness costeffectiveness future vaccination programs singapore"
## [171] "economic impact hiv antiretroviral therapy education supply high"
## [172] "prevalence regions prosocial behavior increases age across five"
## [173] "economic games investigation law economics based complex network"
## [174] "time series analysis maternal serologic screening prevent"
## [175] "congenital toxoplasmosis decisionanalytic economic model global"
## [176] "economic tradeoffs wild nature tropical agriculture sociocultural"
## [177] "economic valuation ecosystem services provided mediterranean"
## [178] "mountain agroecosystems correction economic burden human"
## [179] "papillomavirusrelated precancers cancers sweden economic"
## [180] "evaluation brief psychodynamic interpersonal therapy patients"
## [181] "multisomatoform disorder estimating demand industrial commercial"
## [182] "land use given economic forecasts nighttime light data good proxy"
## [183] "measure economic activity economic burden human"
## [184] "papillomavirusrelated precancers cancers sweden economic burden"
## [185] "meningitis households kassenanankana district northern ghana"
## [186] "relationship economic status knowledge dengue risk perceptions"
## [187] "practices biologicals small molecules psoriasis systematic review"
## [188] "economic evaluations examining relationship socioeconomic status"
## [189] "wash practices wasting evaluating roles nodes optimal allocation"
## [190] "vaccines economic considerations psychological traces chinas"
## [191] "socioeconomic reforms ultimatum dictator games molecular"
## [192] "epidemiology drug susceptibility economic aspects tuberculosis"
## [193] "mubende district uganda accurate economical detection alk positive"
## [194] "lung adenocarcinoma semiquantitative immunohistochemical screening"
## [195] "health economic recession evidence united kingdom phylogenomic"
## [196] "analysis reveals deep divergence recombination economically"
## [197] "important grapevine virus socioeconomic determinants anemia"
## [198] "pregnancy north shoa zone ethiopia impact economic crises"
## [199] "communicable disease transmission control systematic review"
## [200] "evidence systematic review economic evaluations treatments"
## [201] "borderline personality disorder understanding reduced rotavirus"
## [202] "vaccine efficacy low socioeconomic settings economic cost"
## [203] "campylobacter norovirus rotavirus disease united kingdom impact"
## [204] "economic problems depression single mothers comparative study"
## [205] "married women potential economic value trypanosoma cruzi chagas"
## [206] "disease vaccine latin america misconduct marginality editorial"
## [207] "practices management business economics journals costs rabies"
## [208] "control economic calculation method applied flores island"
## [209] "constructing consumption model fine dining perspective behavioral"
## [210] "economics financial economic costs elimination eradication"
## [211] "onchocerciasis river blindness africa prophylactic antibiotics"
## [212] "prevent cellulitis leg economic analysis patch ii trials economic"
## [213] "impact dengue multicenter study across four brazilian regions"
## [214] "economics dementiacare mapping nursing homes clusterrandomised"
## [215] "controlled trial cognitive fatigue destabilizes economic decision"
## [216] "making preferences strategies exploring effects working endowments"
## [217] "behaviour standard economic games"
hw11_corpus2[[1]][1] <- str_replace_all(hw11_corpus2[[1]][1], "testing", "")
hw11_corpus2 <- hw11_corpus2 %>%
  tm_map(stripWhitespace) %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removeWords, stopwords("english"))
strwrap(as.character(hw11_corpus2[[1]]))
## [1] "political regimes political ideology selfrated health europe"
## [2] "multilevel analysis care political neutrality hypocritical nature"
## [3] "reaction political bias political reference point geography shapes"
## [4] "political identity perceptions others political affiliation"
## [5] "moderated individual perceivers political attitudes political"
## [6] "institutions historical dynamics schadenfreude spread political"
## [7] "misfortune mobile app new political discussion platform empirical"
## [8] "study effect wechat use college students political discussion"
## [9] "political efficacy opening politics knowledge power bioscience"
## [10] "science must responsible society politics science politics"
## [11] "colorectal cancer screening political cycles beyond rational"
## [12] "expectations political differences past present future life"
## [13] "satisfaction republicans sensitive democrats political climate"
## [14] "engaging extreme activism support others political struggles role"
## [15] "politically motivated fusion outgroups disgust politics sex"
## [16] "exposure disgusting odorant increases politically conservative"
## [17] "views sex decreases support gay marriage cliophysics"
## [18] "sociopolitical reliability theory polity duration african"
## [19] "political instabilities teasing taunting politics politeness high"
## [20] "sociometric status associated expectationconsistent behavior"
## [21] "preliminary support generalized arousal model political"
## [22] "conservatism correction political attitudes develop independently"
## [23] "personality traits converging modalities ground abstract"
## [24] "categories case politics influence political ideology trust"
## [25] "willingness vaccinate corporate philanthropy political influence"
## [26] "health policy menstrual cycle phase predict political conservatism"
## [27] "editorial bias crowdsourced political information disgust"
## [28] "sensitivity neurophysiology leftright political orientations"
## [29] "political attitudes develop independently personality traits"
## [30] "implicit explicit illusory correlation function political ideology"
## [31] "social media analysis political turbulence correction many"
## [32] "political parties brazil datadriven method assess reduce"
## [33] "fragmentation multiparty political systems many political parties"
## [34] "brazil datadriven method assess reduce fragmentation multiparty"
## [35] "political systems wage gap private public sectors encourage"
## [36] "political corruption understanding dynamics violent political"
## [37] "revolutions agentbased framework cultural evolution democracy"
## [38] "saltational changes political regime landscape jointly edit"
## [39] "examining impact community identification political interaction"
## [40] "wikipedia health community respond violent political conflict"
## [41] "political gender gap gender bias facial inferences predict voting"
## [42] "behavior follow money politics embryonic stem cell research"
## [43] "influence climate change efficacy messages efficacy beliefs"
## [44] "intended political participation faces god america revealing"
## [45] "religious diversity across people politics twitterbased analysis"
## [46] "dynamics collective attention political parties tea china"
## [47] "political ideology avoidance dissonancearousing situations digital"
## [48] "design shapes political participation natural experiment social"
## [49] "information competing value segments insight volatile dutch"
## [50] "political landscape size skills suffrage motivated distortions"
## [51] "perceived formidability political leaders effects temperature"
## [52] "political violence global evidence subnational level mental"
## [53] "suffering protracted political conflict feeling broken destroyed"
## [54] "multilevel geographical study italian political elections twitter"
## [55] "data nation binding public service broadcasting mitigates"
## [56] "political selective exposure genetic environmental sources social"
## [57] "political participation adolescence early adulthood shared"
## [58] "cultural history predictor political economic changes among nation"
## [59] "states tweets votes social media quantitative indicator political"
## [60] "behavior political institutional influences use evidence public"
## [61] "health policy systematic review equitable society politics"
## [62] "global fairness paralympic sport pathogens politics evidence"
## [63] "parasite prevalence predicts authoritarianism deadly alliances"
## [64] "death disease global politics public health policy dystopia model"
## [65] "interpretive analysis tobacco industry political activity"
## [66] "collaboration patterns german political science coauthorship"
## [67] "network moral house divided idealized family models impact"
## [68] "political cognition pvsr open source interface big data american"
## [69] "political sphere moral stereotypes liberals conservatives"
## [70] "exaggeration differences across political spectrum votes votes"
## [71] "female male discursive strategies twitter political hashtags"
## [72] "social justice social order binding moralities across political"
## [73] "spectrum emotion regulation foundation political attitudes"
## [74] "reappraisal decrease support conservative policies simplification"
## [75] "shift cognition political difference applying geometric modeling"
## [76] "analysis semantic similarity judgment framing political messages"
## [77] "fit audiences regulatory orientation improve efficacy message"
## [78] "content effects name religious priming ratings wellknown political"
## [79] "figure president barack obama massive experiment choice blindness"
## [80] "political decisions confidence confabulation unconscious detection"
## [81] "selfdeception dominance politics physiology voters testosterone"
## [82] "changes night united states presidential election political"
## [83] "systems affect mobile sessile species diversity legacy postwwii"
## [84] "period beliefs childhood vaccination united states political"
## [85] "ideology false consensus illusion uniqueness climate variability"
## [86] "perceptions political ecology factors influencing changes"
## [87] "pesticide use years zimbabwean smallholder cotton producers polls"
## [88] "can spot dead wrong using choice blindness shift political"
## [89] "attitudes voter intentions understanding public opinion debates"
## [90] "biomedical research looking beyond political partisanship focus"
## [91] "beliefs science society social influence political mobilization"
## [92] "evidence randomized experiment us presidential election health"
## [93] "human rights eastern myanmar political transition populationbased"
## [94] "assessment using multistaged household cluster sampling applying"
## [95] "corporate political activity cpa analysis australian gambling"
## [96] "industry submissions regulation television sports betting"
## [97] "advertising seeing beyond political affiliations mediating role"
## [98] "perceived moral foundations partisan similarityliking effect"
## [99] "postconflict affiliation chimpanzees aggressors otheroriented"
## [100] "versus selfish political strategy advocacy pedestrian safety study"
## [101] "cluster randomised trial evaluating political advocacy approach"
## [102] "reduce pedestrian injuries deprived communities blessing curse"
## [103] "political institutions growth decay generalized trust"
## [104] "crossnational panel analysis field evidence social influence"
## [105] "expression political preferences case secessionists flags"
## [106] "barcelona reconstruction sociosemantic dynamics political activist"
## [107] "twitter networksmethod application french presidential election"
## [108] "geographic evolution political cleavages switzerland network"
## [109] "approach assessing levels dynamics polarization local populations"
## [110] "ingroups arent perceived political belief similarity moderates"
## [111] "religious ingroup favoritism assessing process content politics"
## [112] "developing global health sector strategy sexually transmitted"
## [113] "infections implementation opportunities policymakers influence"
## [114] "urbanism information consumption political dimensions social"
## [115] "capital exploratory study localities adjacent core city brașov"
## [116] "metropolitan area romania raising political profile neglected"
## [117] "zoonotic diseases three complementary european commissionfunded"
## [118] "projects streamline research build capacity advocate control"
## [119] "political instability supplyside barriers undermine potential high"
## [120] "participation hiv prevention mothertochild transmission"
## [121] "guineabissau retrospective crosssectional study"
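The effect of each cleaning step is easier to see on a single title. The following is a minimal base R sketch, not the tm pipeline itself: the function `clean_title`, the example title, and the short `toy_stopwords` list (a stand-in for `stopwords("english")`) are all made up for illustration.

```r
# Base R sketch of the cleaning steps, applied to one made-up title.
# `toy_stopwords` is a hypothetical stand-in for tm::stopwords("english").
clean_title <- function(x, stop_words) {
  x <- gsub("[0-9]+", "", x)        # drop digits (compare removeNumbers)
  x <- gsub("[[:punct:]]+", "", x)  # drop punctuation (compare removePunctuation)
  x <- tolower(x)                   # lower-case (compare content_transformer(tolower))
  words <- strsplit(x, "[[:space:]]+")[[1]]
  # drop empty strings and stop words (compare removeWords)
  words <- words[nzchar(words) & !(words %in% stop_words)]
  paste(words, collapse = " ")
}

toy_stopwords <- c("the", "of", "in", "a", "and")
cleaned <- clean_title("The Economic Burden of Dengue in 2 Cities: A Review",
                       toy_stopwords)
print(cleaned)
```

Because punctuation is deleted rather than replaced with a space, hyphenated words are fused together, which is why terms such as "selffulfilling" and "decisionmaking" appear in the cleaned output above.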
#### Creating Wordcloud
#### wordcloud for economic journals
wordcloud(hw11_corpus, max.words = 30, scale = c(8, 1),
          colors = topo.colors(n = 30), random.color = TRUE)
The graph above shows a word cloud for the economics article titles. The word “economic” dominates it. To answer the initial question, we want to remove that obvious keyword and regenerate the word cloud.
#### wordcloud for political science journals
wordcloud(hw11_corpus2, max.words = 30, scale = c(8, 1),
          colors = topo.colors(n = 30), random.color = TRUE)
The graph above shows a word cloud for the political science article titles. As in the “economic” word cloud, a single word dominates, here “political”. In the same way, we will remove that obvious keyword and regenerate the word cloud.
## The word "economic" is far larger than the others, which suggests it appears too often in the title text.
title_text <- unlist(hw11_corpus[[1]][1])
title_text2 <- unlist(hw11_corpus2[[1]][1])
##### Find the word frequency using the wfm function in the qdap package
title_wfm <- wfm(title_text)
title_df <- data.frame(title_wfm)
title_df <- title_df %>% mutate(term = rownames(title_df)) %>% select(term, all)
title_df %>% arrange(desc(all)) %>% head()
## term all
## 1 economic 160
## 2 burden 23
## 3 socioeconomic 20
## 4 impact 19
## 5 economics 18
## 6 analysis 15
title_wfm2 <- wfm(title_text2)
title_df2 <- data.frame(title_wfm2)
title_df2 <- title_df2 %>% mutate(term = rownames(title_df2)) %>% select(term, all)
title_df2 %>% arrange(desc(all)) %>% head()
## term all
## 1 political 83
## 2 politics 13
## 3 social 9
## 4 analysis 7
## 5 health 7
## 6 influence 6
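For readers without qdap installed, the same kind of frequency table can be approximated in base R. This is a minimal sketch under stated assumptions: `toy_text` is a made-up string rather than the real title text, and the column names `term` and `all` simply mirror the tables above.

```r
# Base R sketch of a word-frequency table; `toy_text` is made up for illustration.
toy_text <- "economic burden dengue economic impact burden economic"

words <- strsplit(toy_text, " ", fixed = TRUE)[[1]]  # split into single words
freq <- sort(table(words), decreasing = TRUE)         # count and sort by frequency

# Same layout as the tables above: one row per term, counts in `all`
freq_df <- data.frame(term = names(freq), all = as.integer(freq),
                      stringsAsFactors = FALSE)
print(head(freq_df))
```

Applied to the cleaned title text, this should give counts comparable to those from `wfm`, although the two approaches may tokenize edge cases differently.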
# Add the obvious keywords to the words to be removed from each corpus
hw11_corpus <- hw11_corpus %>% tm_map(removeWords, c(stopwords("english"), "economic", "economics"))
hw11_corpus2 <- hw11_corpus2 %>% tm_map(removeWords, c(stopwords("english"), "political", "politics"))
wordcloud(hw11_corpus, max.words = 30, scale = c(8, 1),
          colors = topo.colors(n = 30), random.color = TRUE)
wordcloud(hw11_corpus2, max.words = 30, scale = c(8, 1),
          colors = topo.colors(n = 30), random.color = TRUE)
After removing the obvious keywords, the two sets of titles look quite different. According to the frequency tables above, the economics titles are led by “burden” (23), “socioeconomic” (20), and “impact” (19), while the political science titles are led by “social” (9), “analysis” (7), “health” (7), and “influence” (6). Of the six most frequent terms in each table, only “analysis” appears in both. In this assignment I used the rplos package to retrieve text data from the internet, the tm and stringr packages to clean the data, and the wordcloud package to visualize it. To explore the implications of these findings, a natural next step would be a sentiment analysis.
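The overlap between the two vocabularies can also be checked directly rather than by eye from the word clouds. The following is a minimal sketch, assuming the six most frequent terms from each frequency table printed earlier are typed in by hand. The vector names are hypothetical.

```r
# Top six terms copied by hand from the two frequency tables printed earlier.
econ_top <- c("economic", "burden", "socioeconomic", "impact", "economics", "analysis")
poli_top <- c("political", "politics", "social", "analysis", "health", "influence")

# Drop the obvious keywords, then compare what is left.
obvious <- c("economic", "economics", "political", "politics")
econ_rest <- setdiff(econ_top, obvious)
poli_rest <- setdiff(poli_top, obvious)

shared <- intersect(econ_rest, poli_rest)  # terms present in both lists
print(shared)
```

The same comparison could be run on the full `title_df` and `title_df2` tables instead of only the top six rows.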