Text analysis of Donald Trump's tweets

Jonathan DAHAN

Introduction


Extracting the tweets


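# Load the packages used for the extraction
library(twitteR)
library(dplyr)
library(purrr)

# Cache the OAuth token locally so the script does not prompt for it
options(httr_oauth_cache = TRUE)
setup_twitter_oauth(
  consumer_key = "some_special_key",
  consumer_secret = "what_a_secret",
  access_token = "access_here",
  access_secret = "the_secret"
)
# Request up to 3200 tweets from the account's timeline
tweets <- userTimeline("realDonaldTrump", n = 3200)
# Turn each status object into a one-row data frame and bind the rows
tweets_df <- as_tibble(map_df(tweets, as.data.frame))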

What do we get?


text           FAKE NEWS - A TOTAL POLITICAL WITCH HUNT!
created        2017-01-11 01:19:23
statusSource   Twitter for Android
favoriteCount  96302
retweetCount   30080
longitude      NA
latitude       NA
id             818990655418617856

Activity

At what time of day?
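One way to answer this question is to count tweets by hour of day. The sketch below uses base R only. `tweets_per_hour` is a hypothetical helper, the timestamps are made up, and `created` is assumed to be a POSIXct column stored in UTC.

```r
# Hypothetical helper: count tweets per hour of day (00-23).
# `created` is assumed to be a POSIXct vector; `tz` is the time zone
# in which the hour is read (timestamps are assumed to be stored in UTC).
tweets_per_hour <- function(created, tz = "UTC") {
  hours <- format(created, "%H", tz = tz)
  # A factor with all 24 levels makes empty hours show up as 0
  table(factor(hours, levels = sprintf("%02d", 0:23)))
}

# Made-up timestamps, for illustration only
created <- as.POSIXct(
  c("2017-01-11 01:19:23", "2017-01-11 01:45:00", "2017-01-11 13:05:10"),
  tz = "UTC"
)
hourly <- tweets_per_hour(created)
hourly[["01"]]  # 2
```

Passing a different `tz` (for example the account holder's local time zone) shifts the counts accordingly, which matters when looking for late-night activity.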

Late-night tweets?


“Just tried watching Saturday Night Live - unwatchable! Totally biased, not funny and the Baldwin impersonation just can’t get any worse”

Popularity


  • average number of retweets per tweet?
  • average number of favorites per tweet?

Popularity


  • retweets per tweet ≈ 19,400
  • favorites per tweet ≈ 73,300
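Both averages can be read straight off the tweet data frame. A minimal sketch in base R, assuming the `retweetCount` and `favoriteCount` columns of the extracted data; the toy values are made up and do not reproduce the figures above.

```r
# Toy data frame with the same column names as the extracted tweets;
# the counts are made up for illustration.
tweets_df <- data.frame(
  retweetCount  = c(100, 250, 400),
  favoriteCount = c(300, 900, 1200)
)

# Average number of retweets and of favorites per tweet
mean_rt <- mean(tweets_df$retweetCount)
mean_fv <- mean(tweets_df$favoriteCount)
round(c(retweets = mean_rt, favorites = mean_fv))
```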

Most retweeted tweet


text           TODAY WE MAKE AMERICA GREAT AGAIN!
favoriteCount  573876
created        2016-11-08 06:43:14
statusSource   Twitter for Android
retweetCount   345630

Most favorited tweet


text           Such a beautiful and important evening! The forgotten man and woman will never be forgotten again. We will all come together as never before
favoriteCount  634231
created        2016-11-09 06:36:58
statusSource   Twitter for Android
retweetCount   221418

Stop words


  • English: “the”, “she”, “from”, “about”
  • French: “le”, “il”, “dans”
  • they carry no meaning of their own, so there is no point in indexing them
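Filtering stop words can be sketched in a few lines of base R. `remove_stop_words` is a hypothetical helper, and the stop list is limited to the English examples given above; a real analysis would use a much fuller list.

```r
# Illustrative stop-word list taken from the examples above
stop_words <- c("the", "she", "from", "about")

# Hypothetical helper: lowercase the text, split on anything that is not
# a letter or an apostrophe, then drop empty tokens and stop words.
remove_stop_words <- function(text, stop_words) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens[nzchar(tokens) & !(tokens %in% stop_words)]
}

kept <- remove_stop_words("She heard about the rally from the news", stop_words)
kept  # "heard" "rally" "news"
```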

What does Trump post?

Hashtags?
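Hashtags can be pulled out of the tweet text with a regular expression. The sketch below is one possible approach in base R; `extract_hashtags` is a hypothetical helper and the tweets are made up.

```r
# Pattern for a hashtag: "#" followed by letters, digits or underscores
hashtag_pattern <- "#[[:alnum:]_]+"

# Hypothetical helper: return every hashtag found in a vector of tweets
extract_hashtags <- function(text) {
  unlist(regmatches(text, gregexpr(hashtag_pattern, text)))
}

# Made-up tweets, for illustration only
text <- c("Join us tonight! #MAGA", "No hashtag here", "#MAGA #AmericaFirst")
tags <- extract_hashtags(text)

# Number of hashtags in each tweet, then the most frequent hashtags
tags_per_tweet <- lengths(regmatches(text, gregexpr(hashtag_pattern, text)))
sort(table(tags), decreasing = TRUE)
```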

How many characters?

Where are the tweets posted from?


Source %
Android 50.9
iPhone 42.8
Web Client 5.8
iPad 0.3
Periscope 0.1
Twitter 0.1
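A breakdown of this kind can be derived from the `statusSource` column. The sketch below assumes that the raw value is an HTML link whose label is the client name; the sample values are made up and the resulting percentages are not those of the table.

```r
# Made-up statusSource values, in the HTML form assumed for the raw data
status_source <- c(
  '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
  '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'
)

# Strip the HTML tags, then the leading "Twitter for " or "Twitter " prefix
device <- gsub("<[^>]+>", "", status_source)
device <- sub("^Twitter (for )?", "", device)

# Share of tweets per source, in percent
share <- round(100 * prop.table(table(device)), 1)
sort(share, decreasing = TRUE)
```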

Daily timeline

Hypothesis


Use of quotation marks


“"It wasn’t Donald Trump that divided this country, this country has been divided for a long time!" Stated today by Reverend Franklin Graham.”
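One way to measure this habit is to flag tweets that open with a quotation mark and compare the proportion per source. A minimal sketch in base R; the data frame, its `source` column and the `@someone` handle are made up for illustration.

```r
# Toy data, made up for illustration
tweets_df <- data.frame(
  source = c("Android", "Android", "iPhone", "iPhone"),
  text = c(
    '"@someone: Great speech!" Thank you!',
    "Big crowd tonight.",
    "Join me in Florida this Saturday!",
    "Tickets available now."
  ),
  stringsAsFactors = FALSE
)

# TRUE when the tweet starts with a straight or a curly double quote
tweets_df$quoted <- startsWith(tweets_df$text, "\"") |
  startsWith(tweets_df$text, "\u201c")

# Proportion of quoted tweets for each source
quoted_share <- tapply(tweets_df$quoted, tweets_df$source, mean)
quoted_share
```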

Use of quotation marks

Sharing pictures/links

iPhone vs. Android: content


  • Using the log-odds ratio
  • “fake”: more likely to be posted from the iPhone or from Android?
  • Comparing proportions

Log-odds ratio


\[\log_2\left(\frac{\frac{\text{count}_{\text{Android}} + 1}{\text{total}_{\text{Android}} + 1}} {\frac{\text{count}_{\text{iPhone}} + 1}{\text{total}_{\text{iPhone}} + 1}}\right)\]
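The formula translates directly into a vectorised R function. This is a sketch under assumptions: `log_odds_ratio` is a hypothetical helper, and the word counts and totals are made up. A positive value means the word is relatively more frequent on Android, a negative value on the iPhone.

```r
# Hypothetical helper implementing the smoothed log-odds ratio:
# log2( ((n_android + 1) / (total_android + 1)) /
#       ((n_iphone  + 1) / (total_iphone  + 1)) )
log_odds_ratio <- function(n_android, total_android, n_iphone, total_iphone) {
  log2(((n_android + 1) / (total_android + 1)) /
       ((n_iphone + 1) / (total_iphone + 1)))
}

# Made-up counts for two words, with a made-up total of 999 per device
counts <- data.frame(
  word    = c("fake", "join"),
  android = c(7, 0),
  iphone  = c(1, 3)
)
counts$logratio <- log_odds_ratio(counts$android, 999, counts$iphone, 999)
counts
```

The `+ 1` terms keep the ratio finite for words that never appear on one of the two devices, as with the zero count above.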

Log-odds: example


Android: 3 occurrences out of 10,000
iPhone: 1 occurrence out of 10,000
\[\text{log-odds}=\log_2\left(\frac{\frac{3 + 1}{10000 + 1}} {\frac{1 + 1}{10000 + 1}}\right)=\log_2\left(\frac{\frac{4}{10001}} {\frac{2}{10001}}\right)=\log_2(2^{1})=1\]

Log-odds: iPhone vs. Android

Content


  • iPhone: more hashtags
  • iPhone: more announcement words (“7pm”, “3pm”, “join”, “tickets”)
  • Android: more emotionally negative words (“fake”, “illegal”, “badly”, “failing”)

Tweets / iPhone


  • Announcements, events…
  • “Join me in Florida this Saturday at 5pm for a rally at the Orlando-Melbourne International Airport!”

Tweets / Android


  • Of a different kind…
  • “The FAKE NEWS media (failing @nytimes, @NBCNews, @ABC, @CBS, @CNN) is not my enemy, it is the enemy of the American People!”

Sentiments


  • NRC Word-Emotion Association Lexicon
  • each word is associated with the sentiment(s) it evokes
  • “trust”, “fear”, “negative”, “sadness”, “anger”, “surprise”, “positive”, “disgust”, “joy”, “anticipation”

What does it look like?


word sentiment
fake negative
illegal anger
illegal disgust
illegal fear
failing fear
failing anger
failing sadness
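Tagging tweets with such a lexicon amounts to joining the tweet words to the lexicon on the `word` column. A minimal sketch in base R: the lexicon rows are copied from the table above, while the `tokens` data frame and its `source` column are made up for illustration.

```r
# A few rows of the lexicon, copied from the table above
lexicon <- data.frame(
  word = c("fake", "illegal", "illegal", "illegal",
           "failing", "failing", "failing"),
  sentiment = c("negative", "anger", "disgust", "fear",
                "fear", "anger", "sadness"),
  stringsAsFactors = FALSE
)

# Made-up tokens: one row per word occurrence, with the posting device
tokens <- data.frame(
  source = c("Android", "Android", "Android", "iPhone", "iPhone"),
  word = c("fake", "failing", "media", "join", "fake"),
  stringsAsFactors = FALSE
)

# Inner join: a word with several sentiments yields several rows, and
# words missing from the lexicon ("media", "join") are dropped
tagged <- merge(tokens, lexicon, by = "word")

# Number of (word, sentiment) matches per source and sentiment
sentiment_counts <- table(tagged$source, tagged$sentiment)
sentiment_counts
```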

Taking stock


Source Sentiments %
Android negative, anger, disgust, fear, sadness 33.2
iPhone negative, anger, disgust, fear, sadness 15.9

Sentiments: what about Mélenchon?


http://www.lirmm.fr/~abdaoui/publications/FEEL.pdf


Negative (-)  Positive (+)
62.9 %        10.1 %