The goal of this chapter is to introduce some basic notions of text mining.
A text mining process involves two main stages: an analysis step, which identifies the information contained in the texts, and an interpretation step, which selects the knowledge judged relevant.
The selection criterion can be of at least two kinds: novelty and similarity. The novelty criterion consists in discovering relations, in particular implications, that were not explicit because they are indirect or follow from two elements far apart in the text. The similarity criterion (or contradiction with respect to another text, or the answer to a specific question) consists in finding the texts that best match a set of descriptors given in the initial query. Descriptors are, for example, the most frequent nouns and verbs of a text.
In this chapter we will analyze a few political speeches from the United States, and at the end we will move on to analyzing tweets and extracting information from Facebook pages.
Reading the text into R, creating the corpus (with the tm package) and drawing a word cloud.

> library(tm)
Loading required package: NLP
> library(wordcloud)
Loading required package: RColorBrewer

> txt=readLines("/Users/dhafermalouche/Documents/Teaching/CoursDataMining_1516/WordCloud/ObamaSpeech2011.txt")
> txt[1:4]
[1] "Mr. Speaker, Mr. Vice President, members of Congress, distinguished guests, and fellow Americans:"
[2] ""
[3] " Tonight I want to begin by congratulating the men and women of the 112th Congress, as well as your new Speaker, John Boehner. (Applause.) And as we mark this occasion, we’re also mindful of the empty chair in this chamber, and we pray for the health of our colleague -- and our friend -– Gabby Giffords. (Applause.)"
[4] "" En effet, ces données contiennent un discours de Obama en 2011 sur l’état de l’Union. Ces discours (State of the Union address) est un évènement annuel où le président des États-Unis présente son programme pour l’année en cours. Ce discours est prononcé à Washington au Capitole, où les deux chambres (la Chambre des représentants et le Sénat) sont réunies.
For the record, George Washington delivered the first State of the Union address on January 8, 1790, in New York City, which was then the capital.

We start cleaning the text by removing the punctuation:
> txt <- removePunctuation(txt)
> txt[1:5]
[1] "Mr Speaker Mr Vice President members of Congress distinguished guests and fellow Americans"
[2] ""
[3] " Tonight I want to begin by congratulating the men and women of the 112th Congress as well as your new Speaker John Boehner Applause And as we mark this occasion were also mindful of the empty chair in this chamber and we pray for the health of our colleague and our friend Gabby Giffords Applause"
[4] ""
[5] " Its no secret that those of us here tonight have had our differences over the last two years The debates have been contentious we have fought fiercely for our beliefs And thats a good thing Thats what a robust democracy demands Thats what helps set us apart as a nation" > txt <- removeNumbers(txt)
> txt[1:10]
[1] "Mr Speaker Mr Vice President members of Congress distinguished guests and fellow Americans"
[2] ""
[3] " Tonight I want to begin by congratulating the men and women of the th Congress as well as your new Speaker John Boehner Applause And as we mark this occasion were also mindful of the empty chair in this chamber and we pray for the health of our colleague and our friend Gabby Giffords Applause"
[4] ""
[5] " Its no secret that those of us here tonight have had our differences over the last two years The debates have been contentious we have fought fiercely for our beliefs And thats a good thing Thats what a robust democracy demands Thats what helps set us apart as a nation"
[6] ""
[7] " But theres a reason the tragedy in Tucson gave us pause Amid all the noise and passion and rancor of our public debate Tucson reminded us that no matter who we are or where we come from each of us is a part of something greater something more consequential than party or political preference"
[8] ""
[9] " We are part of the American family We believe that in a country where every race and faith and point of view can be found we are still bound together as one people that we share common hopes and a common creed that the dreams of a little girl in Tucson are not so different than those of our own children and that they all deserve the chance to be fulfilled"
[10] "" > txt <- txt[-which(txt=="")]
> txt[1:10]
[1] "Mr Speaker Mr Vice President members of Congress distinguished guests and fellow Americans"
[2] " Tonight I want to begin by congratulating the men and women of the th Congress as well as your new Speaker John Boehner Applause And as we mark this occasion were also mindful of the empty chair in this chamber and we pray for the health of our colleague and our friend Gabby Giffords Applause"
[3] " Its no secret that those of us here tonight have had our differences over the last two years The debates have been contentious we have fought fiercely for our beliefs And thats a good thing Thats what a robust democracy demands Thats what helps set us apart as a nation"
[4] " But theres a reason the tragedy in Tucson gave us pause Amid all the noise and passion and rancor of our public debate Tucson reminded us that no matter who we are or where we come from each of us is a part of something greater something more consequential than party or political preference"
[5] " We are part of the American family We believe that in a country where every race and faith and point of view can be found we are still bound together as one people that we share common hopes and a common creed that the dreams of a little girl in Tucson are not so different than those of our own children and that they all deserve the chance to be fulfilled"
[6] " That too is what sets us apart as a nation Applause"
[7] " Now by itself this simple recognition wont usher in a new era of cooperation What comes of this moment is up to us What comes of this moment will be determined not by whether we can sit together tonight but whether we can work together tomorrow Applause"
[8] " I believe we can And I believe we must Thats what the people who sent us here expect of us With their votes theyve determined that governing will now be a shared responsibility between parties New laws will only pass with support from Democrats and Republicans We will move forward together or not at all for the challenges we face are bigger than party and bigger than politics"
[9] " At stake right now is not who wins the next election after all we just had an election At stake is whether new jobs and industries take root in this country or somewhere else Its whether the hard work and industry of our people is rewarded Its whether we sustain the leadership that has made America not just a place on a map but the light to the world"
[10] " We are poised for progress Two years after the worst recession most of us have ever known the stock market has come roaring back Corporate profits are up The economy is growing again" > for(i in 1:length(txt))
+ txt[i]=tolower(txt[i])
> txt[1:10]
[1] "mr speaker mr vice president members of congress distinguished guests and fellow americans"
[2] " tonight i want to begin by congratulating the men and women of the th congress as well as your new speaker john boehner applause and as we mark this occasion were also mindful of the empty chair in this chamber and we pray for the health of our colleague and our friend gabby giffords applause"
[3] " its no secret that those of us here tonight have had our differences over the last two years the debates have been contentious we have fought fiercely for our beliefs and thats a good thing thats what a robust democracy demands thats what helps set us apart as a nation"
[4] " but theres a reason the tragedy in tucson gave us pause amid all the noise and passion and rancor of our public debate tucson reminded us that no matter who we are or where we come from each of us is a part of something greater something more consequential than party or political preference"
[5] " we are part of the american family we believe that in a country where every race and faith and point of view can be found we are still bound together as one people that we share common hopes and a common creed that the dreams of a little girl in tucson are not so different than those of our own children and that they all deserve the chance to be fulfilled"
[6] " that too is what sets us apart as a nation applause"
[7] " now by itself this simple recognition wont usher in a new era of cooperation what comes of this moment is up to us what comes of this moment will be determined not by whether we can sit together tonight but whether we can work together tomorrow applause"
[8] " i believe we can and i believe we must thats what the people who sent us here expect of us with their votes theyve determined that governing will now be a shared responsibility between parties new laws will only pass with support from democrats and republicans we will move forward together or not at all for the challenges we face are bigger than party and bigger than politics"
[9] " at stake right now is not who wins the next election after all we just had an election at stake is whether new jobs and industries take root in this country or somewhere else its whether the hard work and industry of our people is rewarded its whether we sustain the leadership that has made america not just a place on a map but the light to the world"
[10] " we are poised for progress two years after the worst recession most of us have ever known the stock market has come roaring back corporate profits are up the economy is growing again" > txt <- removeWords(txt,stopwords("en"))
> txt[1:10]
[1] "mr speaker mr vice president members congress distinguished guests fellow americans"
[2] " tonight want begin congratulating men women th congress well new speaker john boehner applause mark occasion also mindful empty chair chamber pray health colleague friend gabby giffords applause"
[3] " secret us tonight differences last two years debates contentious fought fiercely beliefs thats good thing thats robust democracy demands thats helps set us apart nation"
[4] " theres reason tragedy tucson gave us pause amid noise passion rancor public debate tucson reminded us matter come us part something greater something consequential party political preference"
[5] " part american family believe country every race faith point view can found still bound together one people share common hopes common creed dreams little girl tucson different children deserve chance fulfilled"
[6] " sets us apart nation applause"
[7] " now simple recognition wont usher new era cooperation comes moment us comes moment will determined whether can sit together tonight whether can work together tomorrow applause"
[8] " believe can believe must thats people sent us expect us votes theyve determined governing will now shared responsibility parties new laws will pass support democrats republicans will move forward together challenges face bigger party bigger politics"
[9] " stake right now wins next election just election stake whether new jobs industries take root country somewhere else whether hard work industry people rewarded whether sustain leadership made america just place map light world"
[10] " poised progress two years worst recession us ever known stock market come roaring back corporate profits economy growing " mr, us.> txt <- removeWords(txt,c("mr","us","applause"))
> txt[1:10]
[1] " speaker vice president members congress distinguished guests fellow americans"
[2] " tonight want begin congratulating men women th congress well new speaker john boehner mark occasion also mindful empty chair chamber pray health colleague friend gabby giffords "
[3] " secret tonight differences last two years debates contentious fought fiercely beliefs thats good thing thats robust democracy demands thats helps set apart nation"
[4] " theres reason tragedy tucson gave pause amid noise passion rancor public debate tucson reminded matter come part something greater something consequential party political preference"
[5] " part american family believe country every race faith point view can found still bound together one people share common hopes common creed dreams little girl tucson different children deserve chance fulfilled"
[6] " sets apart nation "
[7] " now simple recognition wont usher new era cooperation comes moment comes moment will determined whether can sit together tonight whether can work together tomorrow "
[8] " believe can believe must thats people sent expect votes theyve determined governing will now shared responsibility parties new laws will pass support democrats republicans will move forward together challenges face bigger party bigger politics"
[9] " stake right now wins next election just election stake whether new jobs industries take root country somewhere else whether hard work industry people rewarded whether sustain leadership made america just place map light world"
[10] " poised progress two years worst recession ever known stock market come roaring back corporate profits economy growing " txt dans un format Corpus puisqu’il puisse être analysé> corpus <- Corpus(VectorSource(txt))
> corpus
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 113

We build the term-document matrix, keeping only words of at least three letters (in recent versions of tm this option is written control = list(wordLengths = c(3, Inf))):

> tdm <- TermDocumentMatrix(corpus,control = list(minWordLength=3))
> tdm
<<TermDocumentMatrix (terms: 1558, documents: 113)>>
Non-/sparse entries: 3314/172740
Sparsity : 98%
Maximal term length: 16
Weighting : term frequency (tf)
> dim(tdm)
[1] 1558 113

The text thus contains 1558 distinct terms and 113 paragraphs.
Each row of the matrix tdm corresponds to a term and each column to a paragraph.
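To get a feel for this matrix, one can inspect a small sub-matrix (a minimal sketch; inspect() prints the frequency of each term in each document, output not shown here):

> inspect(tdm[1:5, 1:5]) ## frequencies of the first 5 terms in the first 5 paragraphs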
> sum((tdm==0))
[1] 172740
> sum((tdm!=0))
[1] 3314

The most frequent words in the text:
> m <- as.matrix(tdm)
> freqWords=rowSums(m)
> freqWords=sort(freqWords,d=T)
> t(freqWords[1:10])
will can new people jobs now years thats make just
[1,] 58 37 36 31 25 25 25 24 23 21

We decide to remove the word thats from the matrix (grep matches substrings; grep('^thats$', rownames(m)) would match the exact term only):
> i=grep('thats',rownames(m))
> m=m[-i,]

Let us look for the word economy in the text and its frequency of occurrence:
> i=grep('economy',rownames(m))
> sum(m[i,])
[1] 7

And the word security?
> i=grep('security',rownames(m))
> sum(m[i,])
[1] 3

We recompute the word frequencies and gather them in a data.frame:

> freqWords=rowSums(m)
> v=sort(freqWords,d=T)
> dt=data.frame(word=names(v),freq=v)
> head(dt)
word freq
will will 58
can can 37
new new 36
people people 31
jobs jobs 25
now now 25
> par(bg="gray")
> wordcloud(dt$word,dt$freq,min.freq = 5,stack=T,random.order = F)

The function findFreqTerms lists the terms appearing at least 20 times in the speech:

> freq.terms <- findFreqTerms(tdm, lowfreq = 20)
> freq.terms
[1] "can" "jobs" "just" "make" "new" "now" "people"
[8] "thats" "will" "work" "years"
> term.freq <- rowSums(as.matrix(tdm))
> term.freq <- subset(term.freq, term.freq >= 20)
> df <- data.frame(term = names(term.freq), freq = term.freq)
> library(ggplot2)
Attaching package: 'ggplot2'
The following object is masked from 'package:NLP':
annotate
> ggplot(df, aes(x=term, y=freq)) + geom_bar(stat="identity") +
+ xlab("Terms") + ylab("Count") + coord_flip() +
+ theme(axis.text=element_text(size=7))

findAssocs returns the terms whose correlation with a given word exceeds a threshold; we look for the terms associated with people, job and jobs:

> findAssocs(tdm, "people", 0.2)
$people
aspirations desire dictator powerful stands
0.50 0.50 0.50 0.50 0.50
supports tunisia writ free saw
0.50 0.50 0.50 0.44 0.34
democratic finally proved purpose america
0.32 0.32 0.32 0.32 0.31
assistance danced dawn events independence
0.31 0.31 0.31 0.31 0.31
lined lost recent scene shown
0.31 0.31 0.31 0.31 0.31
sudan summed war able clear
0.31 0.31 0.29 0.27 0.27
degree must around dreams laws
0.27 0.27 0.23 0.23 0.23
man nearly protect security industry
0.23 0.23 0.23 0.23 0.22
also will
0.21 0.21
> findAssocs(tdm, "job", 0.2)
$job
chances decent downtown finding limited
0.71 0.71 0.71 0.71 0.71
maybe much nearby neighbors occasional
0.71 0.71 0.71 0.71 0.71
paycheck pretty pride probably promotion
0.71 0.71 0.71 0.71 0.71
watching youd factory meant seeing
0.71 0.71 0.49 0.49 0.49
showing hard benefits competition forge
0.49 0.45 0.39 0.39 0.39
worked good act born brave
0.39 0.36 0.34 0.34 0.34
bringing choices combat compromise deficits
0.34 0.34 0.34 0.34 0.34
expressed finish formed heads held
0.34 0.34 0.34 0.34 0.34
houses individual interest iraq iraqi
0.34 0.34 0.34 0.34 0.34
kept lasting patrols principled sides
0.34 0.34 0.34 0.34 0.34
always degree end even kids
0.33 0.33 0.33 0.33 0.33
remember time didnt civilians code
0.33 0.29 0.24 0.23 0.23
commitment ended expectations graduates highest
0.23 0.23 0.23 0.23 0.23
join look members partnership prepared
0.23 0.23 0.23 0.23 0.23
proportion raise rein simplify taxes
0.23 0.23 0.23 0.23 0.23
tough violence company
0.23 0.23 0.21
> findAssocs(tdm, "jobs", 0.2)
$jobs
month agreement doubling export finalized
0.60 0.51 0.51 0.51 0.51
signed unprecedented pass support exports
0.51 0.51 0.47 0.47 0.44
create agreements dreams enterprise factories
0.39 0.36 0.36 0.34 0.34
labor least pursue recently soon
0.34 0.34 0.34 0.34 0.34
train biotechnology careers carolina earning
0.34 0.33 0.33 0.33 0.33
fastchanging furniture hope industry kathy
0.33 0.33 0.33 0.33 0.33
measure measured mother offer old
0.33 0.33 0.33 0.33 0.33
opportunities proctor prospects revitalizing shes
0.33 0.33 0.33 0.33 0.33
surrounding tells thriving todays town
0.33 0.33 0.33 0.33 0.33
turning yardsticks trade business better
0.33 0.33 0.31 0.30 0.29
businesses since abroad colleges goal
0.29 0.28 0.25 0.25 0.25
india products quality sell never
0.25 0.25 0.25 0.25 0.24
innovation america ago home alone
0.23 0.22 0.21 0.21 0.20
breakthroughs electricity else forsyth gone
0.20 0.20 0.20 0.20 0.20
inspire inventors keep korea named
0.20 0.20 0.20 0.20 0.20
north owner paychecks progress sputnik
0.20 0.20 0.20 0.20 0.20
tech told wants woman
0.20 0.20 0.20 0.20

We now redo the analysis after stemming, i.e. after reducing each word to its root. We load RWeka and SnowballC (SnowballC provides the stemmer used by tm):

> require(RWeka)
Loading required package: RWeka
> require(SnowballC)
Loading required package: SnowballC
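RWeka is not used further in this transcript; it provides NGramTokenizer, with which one could also build a term-document matrix of bigrams from the corpus created earlier (a sketch; BigramTokenizer and tdm_bi are names introduced here for illustration):

> BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
> tdm_bi <- TermDocumentMatrix(corpus, control = list(tokenize = BigramTokenizer))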
> corpus1 <- Corpus(VectorSource(txt))
> tdm1 <- TermDocumentMatrix(corpus1, control=list(stemming=TRUE))
> tdm1 ## Compare with the previous tdm
<<TermDocumentMatrix (terms: 1266, documents: 113)>>
Non-/sparse entries: 3260/139798
Sparsity : 98%
Maximal term length: 14
Weighting : term frequency (tf)
> tdm
<<TermDocumentMatrix (terms: 1558, documents: 113)>>
Non-/sparse entries: 3314/172740
Sparsity : 98%
Maximal term length: 16
Weighting : term frequency (tf)
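Stemming is why tdm1 has fewer terms than tdm (1266 against 1558): inflected forms are merged into a common stem. What the SnowballC stemmer does can be checked directly (a minimal sketch):

> wordStem(c("jobs", "companies", "education", "people"), language = "english") ## expected stems: job, compani, educ, peopl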
> freq.terms1 <- findFreqTerms(tdm1, lowfreq = 20)
> freq.terms1 ## Compare with the previous freq.terms
[1] "america" "american" "busi" "can" "come" "govern"
[7] "job" "just" "make" "nation" "need" "new"
[13] "now" "peopl" "that" "will" "work" "year"
> freq.terms
[1] "can" "jobs" "just" "make" "new" "now" "people"
[8] "thats" "will" "work" "years"
> term.freq1 <- rowSums(as.matrix(tdm1))
> term.freq1 <- subset(term.freq1, term.freq1 >= 20)
> df1 <- data.frame(term = names(term.freq1), freq = term.freq1)
> library(ggplot2)
> ggplot(df1, aes(x=term, y=freq)) + geom_bar(stat="identity") +
+ xlab("Terms") + ylab("Count") + coord_flip() +
+ theme(axis.text=element_text(size=7))

We can also plot a network of the frequent terms, linking two terms when their correlation exceeds a threshold. This uses the Bioconductor packages graph and Rgraphviz:

> library(graph)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
anyDuplicated, append, as.data.frame, cbind, colnames,
do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, lengths, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff,
sort, table, tapply, union, unique, unsplit
> library(Rgraphviz)
Loading required package: grid
> plot(tdm1, term = freq.terms1, corThreshold = 0.1, weighting = T)

We now turn to topic modeling, using the LDA function of the topicmodels package.
LDA estimates a three-level hierarchical Bayesian model (latent Dirichlet allocation), in which each document is represented as a mixture of topics and each topic as a distribution over words.
For further reading: Blei D.M., Ng A.Y., Jordan M.I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
> dtm1 <- as.DocumentTermMatrix(tdm1)
> dim(dtm1)
[1] 113 1266
> dtm1[1,]
<<DocumentTermMatrix (documents: 1, terms: 1266)>>
Non-/sparse entries: 9/1257
Sparsity : 99%
Maximal term length: 14
Weighting : term frequency (tf)
> sum(dtm1[1,])
[1] 9
> dtm1[10,]
<<DocumentTermMatrix (documents: 1, terms: 1266)>>
Non-/sparse entries: 17/1249
Sparsity : 99%
Maximal term length: 14
Weighting : term frequency (tf)
> sum(dtm1[10,])
[1] 17
> raw.sum=apply(dtm1,1,sum)
> dtm1=dtm1[raw.sum!=0,] ## Remove the documents (rows) that contain only zeros
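One can check how many empty documents were dropped this way (a one-line sketch):

> sum(raw.sum == 0) ## number of documents containing no retained term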
> library(topicmodels)
> lda1 <- LDA(dtm1, k = 4) ## Look for 4 topics that can be extracted from the speech
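The fit is stochastic, so the topics change from one run to the next (passing control = list(seed = 2016) to LDA would fix them; the seed value is arbitrary). The fitted model can also assign each paragraph to its most likely topic (a sketch; output not shown):

> topics(lda1)[1:10] ## most likely topic of the first 10 paragraphs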
> term1 <- terms(lda1, 7) ## The 7 most likely terms in each topic
> term1
Topic 1 Topic 2 Topic 3 Topic 4
[1,] "will" "can" "peopl" "will"
[2,] "year" "will" "new" "that"
[3,] "can" "make" "america" "new"
[4,] "american" "now" "want" "job"
[5,] "last" "know" "job" "educ"
[6,] "take" "get" "will" "work"
[7,] "school" "need" "year" "compani"L’analyse de sentiment (parfois appelée opinion mining) est la partie du text mining qui essaye de définir les opinions, sentiments et attitudes présente dans un texte ou un ensemble de texte.
It is widely used in marketing, for instance to analyze user comments, bloggers' reviews and comparisons, or social networks: a large part of the literature on the subject deals with tweets. It can also be used to poll public opinion on a topic, to characterize social relationships in forums, or to check whether Wikipedia really is a neutral medium.
Installing the packages
> require(devtools)
> install_github("sentiment140", "okugami79")> library(sentiment)
Loading required package: RCurl
Loading required package: bitops
Loading required package: rjson
Loading required package: plyr
Attaching package: 'plyr'
The following object is masked from 'package:graph':
join
> sentiments <- sentiment(txt)
> table(sentiments$polarity)
negative neutral positive
12 80 21

We now consider a second State of the Union speech given by Obama in 2012. The text was downloaded from the following link: http://www.foxnews.com/politics/2012/01/24/transcript-obamas-2012-state-union/.
We repeat the same work on this second speech, after gathering the two speeches in a single data.frame.
> source('/Users/dhafermalouche/Documents/Teaching/CoursDataMining_1516/WordCloud/ObamaSpeechs.R')
>
> ds <- DataframeSource(tmpText)
> head(ds)
$encoding
[1] ""
$length
[1] 2
> inspect(VCorpus(ds))
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 2
[[1]]
<<PlainTextDocument>>
Metadata: 7
Content: chars: 46076
[[2]]
<<PlainTextDocument>>
Metadata: 7
Content: chars: 42559
> corp = Corpus(ds)
> corp = tm_map(corp,removePunctuation)
> corp = tm_map(corp, content_transformer(tolower))
> corp = tm_map(corp,removeNumbers)
> corp = tm_map(corp, function(x){removeWords(x,stopwords())})
> corp = tm_map(corp,function(x){removeWords(x,"applause")})
> term.matrix <- TermDocumentMatrix(corp)
> term.matrix <- as.matrix(term.matrix)
> colnames(term.matrix) <- c("SOTU 2011","SOTU 2012")
>
> comparison.cloud(term.matrix,max.words=300,random.order=FALSE,colors=c("#1F497D","#C0504D"),main="Différences entre 2011 et 2012")

Extracting data from Facebook. To extract data into R from Facebook, one must first create a Facebook application. To do so, go to https://developers.facebook.com. After clicking on "Create a New App ID", choose a category for your app in the new window.
You can click on "Skip Quick Start" and go directly to your application's settings.
> install.packages("devtools")
> library(devtools)
> install_github("Rfacebook", "pablobarbera", subdir="Rfacebook")fb_oauth> library("Rfacebook")
> library(Rook)
> fb_oauth <- fbOAuth(app_id="#########", app_secret="################",extended_permissions = TRUE) ### These values are hidden here because they are personal credentials.
After pressing return, the Facebook authorization page is then displayed.

An R function to extract the number of likes on a page from one date to another:

> ExtractData=function(page,dates){
+ n <- length(dates)-1
+ df <- list()
+ for (i in 1:n){
+ cat(as.character(dates[i]), " ")
+ try(df[[i]] <- getPage(page, token=fb_oauth,n=100, since=dates[i], until=dates[i+1])) ## try() keeps the loop going if the request fails for one day
+ cat("\n")
+ }
+ df <- do.call(rbind, df)
+ return(df)
+ }> page="DonaldTrump"
> dates <- seq(as.Date("2015/09/01"), as.Date("2016/8/15"), by="days")
> trump=ExtractData(page,dates)
>
> head(trump)

> page="hillaryclinton"
> hillary=ExtractData(page,dates)
> head(hillary)
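The TransData function below assumes that columns 4, 8, 9 and 10 of the getPage() output are created_time, likes_count, comments_count and shares_count; this can be checked first (a one-line sketch):

> names(trump)[c(4,8,9,10)] ## expected: created_time, likes_count, comments_count, shares_count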
A function that reshapes the extracted data and computes the cumulative numbers of likes, comments and shares over time:

> TransData=function(data){
+ x=melt(data[,c(4,8,9,10)],id.vars = "created_time") ## created_time and the likes, comments and shares counts
+ library(plyr)
+ x$variable=mapvalues(x$variable,from = unique(x$variable),to=c("likes","comments","shares"))
+ df=x
+ x=strptime(df$created_time, "%Y-%m-%dT%H:%M:%S")
+ df$created_time=x
+ df$created_time<- as.Date(df$created_time)
+ Csums=unlist(tapply(df$value,df$variable,cumsum))
+ df$Cumsum=Csums
+ return(df)
+ }

The melt function used above comes from the reshape2 package:

> library(reshape2)
> df_trump=TransData(trump)
> df_hillary=TransData(hillary)
>
> df=rbind.data.frame(df_trump[df_trump$variable=="likes",],df_hillary[df_hillary$variable=="likes",])
> df$candidate=c(rep("Trump",sum((df_trump$variable=="likes"))),
+ rep("Hillary",sum((df_hillary$variable=="likes"))))> library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:Rgraphviz':
style
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:graphics':
layout
>
> f <- list(
+ family = "Courier New, monospace",
+ size = 14,
+ color = "#7f7f7f"
+ )
> x <- list(
+ title = "",
+ titlefont = f
+ )
> y <- list(
+ title = "Cumulative number of likes",
+ titlefont = f
+ )
>
> plot_ly(data = df,color = candidate,x=created_time,y=Cumsum)%>%
+ layout(xaxis = x, yaxis = y)

(With recent versions of plotly, the aesthetics must be passed as formulas: plot_ly(df, x = ~created_time, y = ~Cumsum, color = ~candidate).)

We now compute the sentiment of each candidate's posts:

> sentiments_hl <- sentiment(hillary$message)
> sentiments_tr <- sentiment(trump$message)
> table(sentiments_hl$polarity)
negative neutral positive
196 2140 321
> table(sentiments_tr$polarity)
negative neutral positive
195 1595 547

We attach the posting date of each message and extract its month:

> sentiments_hl$date=hillary$created_time
> sentiments_tr$date=trump$created_time
>
> x=strptime(sentiments_hl$date, "%Y-%m-%d")
> library(chron)
> y=months(x)
> sentiments_hl$Month=y
>
> x=strptime(sentiments_tr$date, "%Y-%m-%d")
> y=months(x)
> sentiments_tr$Month=y
>
> ## Build a score: +1 for a positive message, -1 for a negative one, 0 for a neutral one
> sentiments_hl$score <- 0
> sentiments_hl$score[sentiments_hl$polarity == "positive"] <- 1
> sentiments_hl$score[sentiments_hl$polarity == "negative"] <- -1
>
> r_hl<-aggregate(score ~ Month, data = sentiments_hl, sum) ## The score per month
>
> sentiments_tr$score <- 0
> sentiments_tr$score[sentiments_tr$polarity == "positive"] <- 1
> sentiments_tr$score[sentiments_tr$polarity == "negative"] <- -1
>
> r_tr<-aggregate(score ~ Month, data = sentiments_tr, sum) ## The score per month
>
> rr=rbind.data.frame(r_hl,r_tr)
> rr$candidate=c(rep("hillary",12),rep("trump",12))
> rr$Month=factor(rr$Month,levels=unique(rr$Month)[c(5,4,8,1,9,7,6,2,12,11,10,3)]) ## Put the month levels in chronological order
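The hand-written index vector depends on the order in which the month names happen to appear in rr. If the labels are English month names, an equivalent and more readable version would be (a sketch, assuming an English locale and the September-to-August span of the data):

> rr$Month=factor(rr$Month,levels=c(month.name[9:12],month.name[1:8])) ## Sep-Dec 2015 then Jan-Aug 2016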
> p<-ggplot(rr,aes(x=Month,y=score,col=candidate,fill=candidate))+geom_bar(stat="identity",position="dodge")
> p+coord_flip()