Politician analysis

1- Download tweets

I build the list of accounts whose tweets I am going to download, together with a mapping that records which political party each account belongs to.

The original idea is to have 4 members from each of the two main parties, the other 4 presidential candidates from the 2019 elections, and the 4 party accounts on Twitter.

print("Muestra")
[1] "Muestra"
print(Tweets_DF %>% 
  select(text)%>% 
    head(5))

2- ETL process

We clean up elements that can complicate the text analysis, such as links, numbers, accents and non-printable characters, and convert everything to lowercase.

In addition, we split the date field into several separate fields that will be used later, and convert the timestamp to the time zone where the tweets were posted (Argentina).

Finally, we filter by date: the idea is to keep only tweets from 1 March 2020 (1/3/2020) until the publication date (September 2020).

Tweets_DF <-
  Tweets_DF %>%
  ## Convert all text to lowercase ##
  mutate(text = tolower(text)) %>% 
  ## Replace whitespace and non-printable characters with a space ##
  mutate(text = gsub("[^[:graph:]]", " ", text)) %>% 
  ## Remove links ##
  mutate(text = gsub("http\\S+", " ", text)) %>% 
  ## Remove digits ##
  mutate(text = gsub("[[:digit:]]", " ", text)) %>% 
  ## Remove accents ##
  mutate(text = chartr('áéíóúñ','aeioun',text)) %>%
  ## Convert to the corresponding time zone ##
  mutate(created_at = with_tz(created_at, "America/Argentina/Buenos_Aires")) %>% 
  ## Split created_at into date and time ##
  separate(created_at, into = c("date", "hour"), sep = " ") %>% 
  ## Split the time into hours, minutes and seconds ##
  separate(hour, into = c("hour", "minutes", "seconds"), sep = ":") %>% 
  ## Rename the column that holds the politician's account ##
  rename(Politico = screen_name) %>% 
  ## Create columns for year, month, day, weekday name and day of year ##
  mutate(periodo = year(date), 
         mes = month(date, label = F, abbr = F),
         dia = as.numeric(day(date)),
         dia_sem = wday(date, label = T, abbr = F, week_start = 1),
         dia_per = yday(date),
         date = as.Date(date) 
  ) %>%
  ## Keep only tweets from March 2020 onwards ##
  filter(periodo == 2020 & mes > 2) 

print("Muestra")
[1] "Muestra"
print(Tweets_DF %>% 
  select(Politico, status_id, periodo, mes, dia)%>% 
    head(10))
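
The cleaning steps can be tried on a single string before running them over the whole data frame. The following is a minimal sketch in base R; `limpiar_texto` and the sample tweet are hypothetical names invented for illustration, and the final whitespace squeeze is an extra step that is not part of the pipeline above.

```r
# Hypothetical helper that mirrors the text-cleaning steps on a character vector.
limpiar_texto <- function(x) {
  x <- tolower(x)                      # everything to lowercase
  x <- gsub("http\\S+", " ", x)        # drop links
  x <- gsub("[[:digit:]]", " ", x)     # drop digits
  # Accented vowels and "n with tilde", written as Unicode escapes
  x <- chartr("\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1", "aeioun", x)
  x <- gsub("[[:space:]]+", " ", x)    # extra step: collapse repeated spaces
  trimws(x)
}

# Invented sample tweet, not taken from the data set
ejemplo <- "Hoy 25 casos: https://t.co/abc123 cuid\u00e9monos"
limpio <- limpiar_texto(ejemplo)
print(limpio)
```

Checking a few strings this way makes it easier to notice a pattern that silently matches nothing.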

3- Normalization

We normalize some fields to help both the analysis and the visualization.

We add the party of each analyzed account, and also replace the @ handle we see on Twitter with a name that is easy for everyone to recognize.

Tweets_DF <-
  Tweets_DF %>%
  mutate (Partido = ifelse (Politico == "alferdez", "Peronismo",
                    ifelse (Politico == "CFKArgentina", "Peronismo",
                    ifelse (Politico == "ginesggarcia", "Peronismo", 
                    ifelse (Politico == "Kicillofok", "Peronismo",
                    ifelse (Politico == "UCRNacional", "cuenta partidaria",        
                    ifelse (Politico == "FrenteDeTodos", "cuenta partidaria", 
                    ifelse (Politico == "proargentina", "cuenta partidaria",
                    ifelse (Politico == "PartidoGEN", "cuenta partidaria",
                    ifelse (Politico == "mauriciomacri", "PRO",
                    ifelse (Politico == "PatoBullrich", "PRO",
                    ifelse (Politico == "horaciorlarreta", "PRO",
                    ifelse (Politico == "FernanQuirosBA", "PRO", 
                            "otros candidatos")))))))))))))

Tweets_DF <-
  Tweets_DF %>%
  mutate (Politico = ifelse (Politico == "alferdez", "A.Fernandez",
                    ifelse (Politico == "CFKArgentina", "C.Kirchner",
                    ifelse (Politico == "ginesggarcia", "Gines.GG", 
                    ifelse (Politico == "Kicillofok", "A.Kicillof",
                    ifelse (Politico == "UCRNacional", "UCR",        
                    ifelse (Politico == "FrenteDeTodos", "TODOS", 
                    ifelse (Politico == "proargentina", "PRO",
                    ifelse (Politico == "PartidoGEN", "GEN",
                    ifelse (Politico == "mauriciomacri", "M.Macri",
                    ifelse (Politico == "PatoBullrich", "P.Bullrich",
                    ifelse (Politico == "horaciorlarreta", "H.Larreta",
                    ifelse (Politico == "FernanQuirosBA", "F.Quiros", 
                    ifelse (Politico == "NicolasdelCano", "N.DelCaño",
                    ifelse (Politico == "jlespert", "J.Espert",
                    ifelse (Politico == "RLavagna", "R.Lavagna",
                            "GomezCenturion"
                            ))))))))))))))))

print("Muestra")
[1] "Muestra"
print(Tweets_DF %>% 
  select(Politico, Partido, source)%>% 
    tail(10))
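
As an alternative to the long chains of nested ifelse, the same mapping can be written as a lookup table. This is only a sketch in base R: `partido_por_cuenta` and `asignar_partido` are hypothetical names, and the table lists just three of the accounts for brevity.

```r
# Hypothetical lookup table: account handle -> party (only a few accounts shown)
partido_por_cuenta <- c(
  alferdez      = "Peronismo",
  mauriciomacri = "PRO",
  UCRNacional   = "cuenta partidaria"
)

# Look up each handle; handles missing from the table get the default label
asignar_partido <- function(cuentas, tabla, otro = "otros candidatos") {
  partido <- unname(tabla[cuentas])
  partido[is.na(partido)] <- otro
  partido
}

partidos <- asignar_partido(c("alferdez", "jlespert", "mauriciomacri"),
                            partido_por_cuenta)
print(partidos)
```

With a table like this, adding or correcting an account means editing one line instead of rebalancing parentheses.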

4- Number of tweets

The first approximation is the number of times each account tweeted from March 2020 to the publication date of the report.

The differences between accounts are considerable, so more than once we will need to normalize or use proportions.

Cantidad_tweets = Tweets_DF %>%
  group_by(Politico, Partido) %>%
  count(Politico)
  
Cantidad_tweets%>%  
  ggplot()+
  aes(x=reorder(Politico, n), y= n, fill= Politico) +
  geom_col() +
  facet_wrap("Partido", scales = "free_y") +
  coord_flip() +
  labs(title = "Cantidad total de tweets", x = "tweets", y = "Cantidad") +
    tema1

5- Date of tweets

We look at when each of the analyzed accounts published its tweets, to show when they were more or less active.

Tweets_DF %>%
  filter(Partido == "PRO") %>%
  ggplot(aes(x = as.Date(date), fill = Politico)) +
      geom_histogram(position = "identity", bins = 20, show.legend = FALSE) +
      scale_x_date(date_labels = "%d-%m", date_breaks = "1 month") +
      labs(x = "fecha de publicación", y = "número de tweets") +
      facet_wrap(~ Politico, ncol = 1) +
      tema2 +
      theme(axis.text.x = element_text(angle = 90))


Tweets_DF %>%
  filter(Partido == "Peronismo") %>%
  ggplot(aes(x = as.Date(date), fill = Politico)) +
      geom_histogram(position = "identity", bins = 20, show.legend = FALSE) +
      scale_x_date(date_labels = "%d-%m", date_breaks = "1 month") +
      labs(x = "fecha de publicación", y = "número de tweets") +
      facet_wrap(~ Politico, ncol = 1) +
      tema1 +
      theme(axis.text.x = element_text(angle = 90))


Tweets_DF %>%
  filter(Partido == "otros candidatos") %>%
  ggplot(aes(x = as.Date(date), fill = Politico)) +
      geom_histogram(position = "identity", bins = 20, show.legend = FALSE) +
      scale_x_date(date_labels = "%d-%m", date_breaks = "1 month") +
      labs(x = "fecha de publicación", y = "número de tweets") +
      facet_wrap(~ Politico, ncol = 1) +
      tema1 +
      theme(axis.text.x = element_text(angle = 90))


Tweets_DF %>%
  filter(Partido == "cuenta partidaria") %>%
  ggplot(aes(x = as.Date(date), fill = Politico)) +
      geom_histogram(position = "identity", bins = 20, show.legend = FALSE) +
      scale_x_date(date_labels = "%d-%m", date_breaks = "1 month") +
      labs(x = "fecha de publicación", y = "número de tweets") +
      facet_wrap(~ Politico, ncol = 1) +
      tema2  +
      theme(axis.text.x = element_text(angle = 90))

6- Number of tweets about COVID

The most important topic of the year is the coronavirus. The idea is to see what percentage of the tweets posted during these months deal with it; to do that, we look for keywords that indicate the tweet is about the pandemic.

#We flag tweets that contain any COVID-related keyword (unaccented, because the text was cleaned)
Palabras_covid <- "covid|covid-19|covid19|coronavirus|#covid|#covid-19|#covid19|#coronavirus|test|testeo|testeos|pcr|serologico|hisopado|antibioticos|aplanar|curva|cuarentena|contagio|enfermedad|epidemia|pandemia|alarma|gel|cuidados|incubacion|jabon|barbijo|barbijos|mascarilla|mascarillas|mers|sars|vacuna|wuhan|oxford|astra|zeneca|transmision|exponencial|casos|duplicacion|distanciamiento|colapso|salud|letalidad|mortalidad|ventilador|icu|uci|uti|inmunidad|serologica|distanciamiento|virus|asintomatico|caso sospechoso|olfato|gusto|terapia|saturacion|clinica|positividad|positivos|rebano|inmunidad|hospital|hospitales|aspo|aislamiento"
Tweets_DF$Covid <- grepl(Palabras_covid, Tweets_DF$text, ignore.case = TRUE)

Tweets_DF %>% 
count(Politico, Partido,Covid) %>%
  group_by(Politico) %>%
  mutate(Proporcion = n / sum(n)) %>%
  mutate(Covid = ifelse(Covid == T, "Sobre COVID", "Otro tema"))%>%
ggplot() +
  aes(Politico, Proporcion, fill = Covid) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
      facet_wrap("Partido", scales = "free") +
  theme(legend.position = "top")
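
The keyword flag and the per-account proportion can be reproduced on a toy example. The sketch below uses base R only; the data frame, the shortened pattern and the variable names are invented for illustration. It also shows one caveat of plain alternation: short keywords match inside longer, unrelated words unless word boundaries are added.

```r
# Toy data, invented for illustration
tweets <- data.frame(
  Politico = c("A", "A", "A", "B", "B"),
  text = c("nueva etapa de la cuarentena",
           "reunion con gobernadores",
           "llegan mas vacunas",
           "acto por el dia de la bandera",
           "casos de coronavirus hoy"),
  stringsAsFactors = FALSE
)

# Shortened keyword pattern (the real list is much longer)
patron <- "coronavirus|cuarentena|vacuna"
tweets$Covid <- grepl(patron, tweets$text, ignore.case = TRUE)

# Share of flagged tweets per account: the mean of a logical vector
proporcion_covid <- tapply(tweets$Covid, tweets$Politico, mean)
print(proporcion_covid)

# Caveat: "test" matches inside "protesta" and "gel" inside "angeles"
sin_limite <- grepl("test|gel", "protesta en los angeles")
con_limite <- grepl("\\b(test|gel)\\b", "protesta en los angeles", perl = TRUE)
print(c(sin_limite = sin_limite, con_limite = con_limite))
```

Whether to add word boundaries is a judgment call: they reduce false positives but also stop "vacuna" from matching "vacunas" unless the plural is listed explicitly.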

7- Wordcloud

The idea of the word cloud is to show the 200 words most used by the analyzed accounts during these months. As expected, “coronavirus”, “covid”, “pandemia” and “cuarentena” stand out.

tuits_tokens <-
  Tweets_DF %>%
  unnest_tokens(input = text, output = Palabra, token = "words") %>%
  select(Politico, Palabra, status_id, periodo, mes, hour, Partido) %>%
  mutate(status_id = gsub("<(.*)>+?", "", status_id)) %>%
  filter(!Palabra %in% stopwords("es")) %>%
  filter(!Palabra %in% c("t.co", "https", "via", "youtube", "amp"))

Palabras_sinhoymas = tuits_tokens  %>%
  filter(Palabra != "mas" & Palabra != "hoy") 

wordcloud(words = Palabras_sinhoymas$Palabra, 
          scale=c(2,.2), 
          max.words=200, random.order=FALSE, rot.per=0.35, 
          colors=brewer.pal(8, "Dark2"))

8- Download a dictionary

We download a lexicon of Spanish words that assigns each word a score between -5 and 5, indicating how positive or negative it is.

We set to zero the word “no”, which the lexicon treats as negative even though in Spanish it often works as a mere connector, and the word “negro” (“black”), which the lexicon scores with the maximum negative value.

download.file("https://raw.githubusercontent.com/jboscomendoza/rpubs/master/sentimientos_afinn/lexico_afinn.en.es.csv",
              "lexico_afinn.en.es.csv")
afinn <- read.csv("lexico_afinn.en.es.csv", stringsAsFactors = F, fileEncoding = "latin1") %>% 
  as_tibble()
afinn$Puntuacion <- ifelse(afinn$Palabra == "no", 0, afinn$Puntuacion)
afinn$Puntuacion <- ifelse(afinn$Palabra == "negro", 0, afinn$Puntuacion)

print("Muestra")
[1] "Muestra"
# The 10 most negative words
afinn %>%
  select(Palabra, Puntuacion) %>%
  arrange(Puntuacion) %>%
  head(10) %>%
  print()

# The 10 most positive words
afinn %>%
  select(Palabra, Puntuacion) %>%
  arrange(-Puntuacion) %>%
  head(10) %>%
  print()

9- Word separation

We separate the words that each politician used in their tweets, and we remove some Twitter-specific tokens and the so-called stopwords, which are the most frequent words in the Spanish language.

print ("Muestra")
[1] "Muestra"
print (tuits_tokens %>%
    select(Politico, Palabra) %>%    
         head (10))

10- Scoring the words

We join the scoring dictionary with the words used by each analyzed account, so that each word gets a value that helps us analyze what each politician wrote.

The scored words look like this:

tuits_tokens_emociones <-   
 tuits_tokens %>%
inner_join(afinn, ., by = "Palabra") %>%
  mutate(Calificacion = ifelse(Puntuacion > 0, "Positiva", 
                              ifelse(Puntuacion == 0, "Neutral",
                              "Negativa")
                            )
  )      

print ("Muestra")
[1] "Muestra"
print (tuits_tokens_emociones %>%
    select(Politico, Palabra, Puntuacion, Calificacion) %>%    
         tail (10))

11- Who uses most characters?

The idea is to find the typical length (number of characters) of the tweets posted by each analyzed account.

The further to the right the box is, the longer the tweets that account writes:

Tweets_DF %>% 
ggplot()+
  aes(x= Politico, y= display_text_width, color= Politico) +
  geom_boxplot () +
    labs(title = "Largo promedio del tweet", x = "Politico", y = "Cantidad caracteres") +
  coord_flip() +
    tema1

12- Who uses most words?

The idea is to see who used the most words per tweet on average during this period: we divide each account's word count by the number of tweets it posted, so the figure is normalized across all the accounts.

Common connectors such as “on”, “to”, “from”, etc. are not counted.

Cantidad_palabras= tuits_tokens%>%
  group_by(Politico, Partido)%>%
  count(Politico)%>%
inner_join(Cantidad_tweets, ., by = "Politico")%>%
  mutate(cantidad_promedio = n.y / n.x)


Cantidad_palabras%>% ggplot()+
  aes(x=reorder(Politico, -cantidad_promedio), y= cantidad_promedio, fill= Politico) +
  geom_col() +
  facet_wrap("Partido.x", scales = "free_y") +
  labs(title = "Uso de palabras", x = "tweets", y = "Cantidad") +
  coord_flip() +
    tema1


13- Who uses the most distinct words?

We look at the distinctive vocabulary of each account by counting its unique words, again divided by the number of tweets:

tuits_tokens%>%
  group_by(Politico, Partido)%>%
  distinct(Palabra)%>%
  count(Politico)%>%
inner_join(Cantidad_tweets, ., by = "Politico")  %>%
  mutate(cantidad_promedio = n.y / n.x) %>% 
  ggplot()+
  aes(x=reorder(Politico, cantidad_promedio), y= cantidad_promedio, fill= Politico) +
  geom_col() +
  facet_wrap("Partido.x", scales = "free_y") +
  labs(title = "Palabras distintas", x = "tweets", y = "Cantidad") +
  coord_flip() +
    tema1
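
The "distinct words per tweet" measure can be checked by hand on a small example. This is a sketch in base R with invented tokens and tweet counts; none of the names or numbers come from the data set.

```r
# Invented tokens: one row per word used by each account
tokens <- data.frame(
  Politico = c("A", "A", "A", "A", "B", "B", "B"),
  Palabra  = c("salud", "salud", "trabajo", "pais", "salud", "salud", "salud"),
  stringsAsFactors = FALSE
)

# Invented number of tweets per account
n_tweets <- c(A = 2, B = 1)

# Unique words per account, then divided by that account's tweet count
distintas <- tapply(tokens$Palabra, tokens$Politico,
                    function(p) length(unique(p)))
por_tweet <- distintas / n_tweets[names(distintas)]
print(por_tweet)
```

Account A has three unique words over two tweets and account B one unique word over one tweet, which is the same ratio the chart above plots.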


14- Most used words

Now that we know how varied each account's vocabulary is, we can analyze which words they used the most.

In this case, each graph has its own scale, so that accounts with fewer tweets are not lost.

The most used words were:

tuits_tokens_emociones %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "PRO") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras más usadas") +
     tema1


 tuits_tokens_emociones %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "Peronismo") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras más usadas") +
     tema1

 
  tuits_tokens_emociones %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "otros candidatos") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras más usadas") +
     tema1

  
   tuits_tokens_emociones %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "cuenta partidaria") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras más usadas") +
     tema1

15- Most used positive words

Since the lexicon gives each word a score, we can also look for the positive words that each politician uses the most.

In this case:

 tuits_tokens_emociones %>%
    filter(Calificacion ==  "Positiva") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "PRO") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Positivas más usadas") 


 tuits_tokens_emociones %>%
    filter(Calificacion ==  "Positiva") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "Peronismo") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Positivas más usadas") 

 
  tuits_tokens_emociones %>%
    filter(Calificacion ==  "Positiva") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "otros candidatos") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Positivas más usadas") 

  
   tuits_tokens_emociones %>%
    filter(Calificacion ==  "Positiva") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "cuenta partidaria") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Positivas más usadas") 

16- Most used negative words

Since the lexicon gives each word a score, we can also look for the negative words that each politician uses the most.

 tuits_tokens_emociones %>%
    filter(Calificacion ==  "Negativa") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "PRO") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Negativas más usadas") 


 tuits_tokens_emociones %>%
    filter(Calificacion ==  "Negativa") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "Peronismo") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Negativas más usadas") 

 
  tuits_tokens_emociones %>%
    filter(Calificacion ==  "Negativa") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "otros candidatos") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Negativas más usadas") 

  
   tuits_tokens_emociones %>%
    filter(Calificacion ==  "Negativa") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "cuenta partidaria") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Negativas más usadas") 

17- Feelings in the tweet

Using the afinn dictionary scores again, we join the words back to the tweets and average the scores of all the words in each tweet, moving from a value per word to a single value per posted tweet.

Tweets_DF <-
  tuits_tokens_emociones %>%
  group_by(status_id) %>%
  summarise(Puntuacion_tweet.x = mean(Puntuacion)) %>%
  left_join(Tweets_DF, ., by = "status_id")
Tweets_DF <-  Tweets_DF %>%
  mutate(Puntuacion_tweet.x_letra = ifelse(is.na(Puntuacion_tweet.x), "Neutral",
                                   ifelse(Puntuacion_tweet.x > 0, "Positiva", 
                                    ifelse(Puntuacion_tweet.x == 0, "Neutral",
                              "Negativa")
                            )
  )      
)

Tweets_DF %>%
  count(Politico, Partido, Puntuacion_tweet.x_letra) %>%
  group_by(Politico) %>%
  mutate(Proporcion = n / sum(n)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = Puntuacion_tweet.x_letra) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
      facet_wrap("Partido", scales = "free") +
  theme(legend.position = "top")
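
The per-tweet scoring can be followed step by step on a tiny example. The sketch below uses base R (`merge` and `aggregate`) instead of dplyr; the lexicon entries, scores and tweet ids are invented for illustration and are not the real afinn values.

```r
# Invented mini-lexicon (scores are illustrative, not the real afinn values)
lexico <- data.frame(
  Palabra    = c("bueno", "feliz", "crisis", "muerte"),
  Puntuacion = c(3, 3, -3, -2),
  stringsAsFactors = FALSE
)

# Invented tokens: tweet "3" contains no word from the lexicon
tokens <- data.frame(
  status_id = c("1", "1", "1", "2", "2", "3"),
  Palabra   = c("bueno", "feliz", "crisis", "muerte", "crisis", "reunion"),
  stringsAsFactors = FALSE
)

# Inner join: only words present in the lexicon keep a score
puntuados <- merge(tokens, lexico, by = "Palabra")

# Average score per tweet
por_tweet <- aggregate(Puntuacion ~ status_id, data = puntuados, FUN = mean)

# Left join back to all tweets; tweets without scored words get NA
todos <- merge(data.frame(status_id = c("1", "2", "3"), stringsAsFactors = FALSE),
               por_tweet, by = "status_id", all.x = TRUE)

# Same labelling rule as above: NA and 0 are treated as neutral
todos$Calificacion <- ifelse(is.na(todos$Puntuacion), "Neutral",
                      ifelse(todos$Puntuacion > 0, "Positiva",
                      ifelse(todos$Puntuacion == 0, "Neutral", "Negativa")))
print(todos)
```

Note that a tweet with no lexicon words is labelled neutral, so accounts whose vocabulary is poorly covered by the lexicon will look more neutral than they may actually be.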


18- Feelings in the PRO vs Peronismo tweet

We take the 4 members already analyzed from each of the two parties (PRO and Peronismo) and combine them into a single bar per party. The distributions are fairly similar: Peronismo had slightly more positive tweets and slightly fewer negative ones, but the difference is small.

Tweets_DF %>%
  count(Partido, Puntuacion_tweet.x_letra) %>%
  group_by(Partido) %>%
  filter(Partido == "PRO" |Partido == "Peronismo")%>%
  mutate(Proporcion = n / sum(n)) %>%
ggplot() +
  aes(Partido, Proporcion, fill = Puntuacion_tweet.x_letra) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
  theme(legend.position = "top")

19- Feeling month by month

The idea is to analyze whether the sentiment of what they tweet fluctuates over time.

Tweets_DF$Puntuacion_tweet.x = ifelse(is.na(Tweets_DF$Puntuacion_tweet.x), 0, Tweets_DF$Puntuacion_tweet.x)

Tweets_DF %>%
group_by(Politico, Partido, mes) %>%
  filter(Partido == "PRO")%>%
  summarise(sentimiento = mean(Puntuacion_tweet.x)) %>%
ggplot() +
  aes(mes, sentimiento, color = Politico) +
  geom_hline(yintercept = 0, alpha = .35) +
  geom_line() +
  facet_grid(Politico~.) +
  tema1 +
  theme(legend.position = "none")

Tweets_DF %>%
group_by(Politico, Partido, mes) %>%
  filter(Partido == "otros candidatos")%>%
  summarise(sentimiento = mean(Puntuacion_tweet.x)) %>%
ggplot() +
  aes(mes, sentimiento, color = Politico) +
  geom_hline(yintercept = 0, alpha = .35) +
  geom_line() +
  facet_grid(Politico~.) +
  tema1 +
  theme(legend.position = "none")

Tweets_DF %>%
group_by(Politico, Partido, mes) %>%
  filter(Partido == "Peronismo")%>%
  summarise(sentimiento = mean(Puntuacion_tweet.x)) %>%
ggplot() +
  aes(mes, sentimiento, color = Politico) +
  geom_hline(yintercept = 0, alpha = .35) +
  geom_line() +
  facet_grid(Politico~.) +
  tema1 +
  theme(legend.position = "none")

Tweets_DF %>%
group_by(Politico, Partido, mes) %>%
  filter(Partido == "cuenta partidaria")%>%
  summarise(sentimiento = mean(Puntuacion_tweet.x)) %>%
ggplot() +
  aes(mes, sentimiento, color = Politico) +
  geom_hline(yintercept = 0, alpha = .35) +
  geom_line() +
  facet_grid(Politico~.) +
  tema1 +
  theme(legend.position = "none")

20- Feeling Boxplot

This shows the distribution of sentiment across all the tweets: the tweets inside the boxes are the typical ones, while the loose points are outliers, tweets that depart from what the account usually writes.

Tweets_DF %>%
  ggplot() +
  aes(Politico, Puntuacion_tweet.x, fill = Politico) +
  geom_boxplot() +
  coord_flip() + 
  labs(y= "Sentimiento") +
  tema1

21- Correlation between what PRO and Peronismo tweeted

We use the words that each politician used to compute the correlation between their tweets, and several observations can be made:

tweets_spread2 <- tuits_tokens %>% 
 filter(Partido ==  "PRO" | Partido == "Peronismo")%>% 
  group_by(Politico, Palabra) %>% 
  count(Palabra) %>%
      spread(key = Politico, value = n, fill = NA, drop = TRUE)
tweets_spread2[is.na(tweets_spread2)] <- 0

names(tweets_spread2) <- c("Palabra", "A.Fernandez", "A.Kicillof", 
                          "C.Kirchner", "F.Quiros", "Gines.GG","H.Larreta", "M.Macri", "P.Bullrich" )

method <- "pearson"
m_cor <- matrix(nrow = 8, ncol = 8)
for (i in 1:dim(m_cor)[1]) {
      for (j in 1:dim(m_cor)[2]) {
            form <- as.formula(paste("~", names(tweets_spread2)[i+1], 
                                      "+", names(tweets_spread2)[j+1]))
            if(i!=j){
                  m_cor[i,j] <- cor.test(form, method = method, 
                                   data = tweets_spread2)$estimate
            }
            if(i==j){m_cor[i,j] <- 1}
      }
}
colnames(m_cor) <- names(tweets_spread2)[2:9]
rownames(m_cor) <- names(tweets_spread2)[2:9]
corrplot(m_cor, method="color", type="upper", order="hclust", 
         addCoef.col = "black", tl.col="black", tl.srt=45,
         sig.level = 0.01, insig = "blank", diag=FALSE)
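
The double loop fills the matrix one pair at a time with the Pearson estimate from `cor.test`. As a cross-check, base R's `cor()` applied to the count columns returns the same matrix in one call. The sketch below uses an invented word-count table; the column names `A`, `B` and `C` stand in for the politicians.

```r
# Invented word counts per author (one row per word)
conteos <- data.frame(
  Palabra = c("salud", "trabajo", "pais", "gobierno"),
  A = c(4, 1, 2, 0),
  B = c(3, 0, 2, 1),
  C = c(0, 5, 1, 4),
  stringsAsFactors = FALSE
)

# Pearson correlation between every pair of count columns
m_cor_directo <- cor(conteos[, -1], method = "pearson")
print(round(m_cor_directo, 2))

# The same value for one pair, computed as in the loop
estimado_AB <- unname(cor.test(~ A + B, data = conteos, method = "pearson")$estimate)
```

Because `cor()` keeps the column names, this approach also avoids typing the names by hand, which is where column order mistakes tend to slip in.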

22- Correlation between what the presidential candidates tweeted

tweets_spread2 <- tuits_tokens %>% 
  filter(Partido ==  "otros candidatos" | Politico == "A.Fernandez"| Politico == "M.Macri")%>% 
  group_by(Politico, Palabra) %>% 
  count(Palabra) %>%
      spread(key = Politico, value = n, fill = NA, drop = TRUE)
tweets_spread2[is.na(tweets_spread2)] <- 0

# spread() orders the columns alphabetically, so the names must follow that order
names(tweets_spread2) <- c("Palabra", "A.Fernandez", "GomezCenturion", 
                          "J.Espert", "M.Macri", "N.DelCaño","R.Lavagna")

method <- "pearson"
m_cor <- matrix(nrow = 6, ncol = 6)
for (i in 1:dim(m_cor)[1]) {
      for (j in 1:dim(m_cor)[2]) {
            form <- as.formula(paste("~", names(tweets_spread2)[i+1], 
                                      "+", names(tweets_spread2)[j+1]))
            if(i!=j){
                  m_cor[i,j] <- cor.test(form, method = method, 
                                   data = tweets_spread2)$estimate
            }
            if(i==j){m_cor[i,j] <- 1}
      }
}
colnames(m_cor) <- names(tweets_spread2)[2:7]
rownames(m_cor) <- names(tweets_spread2)[2:7]
corrplot(m_cor, method="color", type="upper", order="hclust", 
         addCoef.col = "black", tl.col="black", tl.srt=45,
         sig.level = 0.01, insig = "blank", diag=FALSE)

23- Correlation between what the party accounts tweeted

This can be an interesting analysis, since every one of these accounts has a large number of tweets.

tweets_spread2 <- tuits_tokens %>% 
  filter(Partido ==  "cuenta partidaria")%>% 
  group_by(Politico, Palabra) %>% 
  count(Palabra) %>%
      spread(key = Politico, value = n, fill = NA, drop = TRUE)
tweets_spread2[is.na(tweets_spread2)] <- 0

names(tweets_spread2) <- c("Palabra", "GEN", "PRO", 
                          "TODOS", "UCR")

method <- "pearson"
m_cor <- matrix(nrow = 4, ncol = 4)
for (i in 1:dim(m_cor)[1]) {
      for (j in 1:dim(m_cor)[2]) {
            form <- as.formula(paste("~", names(tweets_spread2)[i+1], 
                                      "+", names(tweets_spread2)[j+1]))
            if(i!=j){
                  m_cor[i,j] <- cor.test(form, method = method, 
                                   data = tweets_spread2)$estimate
            }
            if(i==j){m_cor[i,j] <- 1}
      }
}
colnames(m_cor) <- names(tweets_spread2)[2:5]
rownames(m_cor) <- names(tweets_spread2)[2:5]
corrplot(m_cor, method="color", type="upper", order="hclust", 
         addCoef.col = "black", tl.col="black", tl.srt=45,
         sig.level = 0.01, insig = "blank", diag=FALSE)

24- Macri vs Fernandez word use comparison

The idea of this graph is to show which words differ the most in how often each author uses them, in this case between Mauricio Macri and Alberto Fernández.


# Pivot and unpivot, so that every author has a row for every word
tweets_unpivot <- tuits_tokens %>% group_by(Politico, Palabra) %>%
  count(Palabra) %>%
  spread(key = Politico, value = n, fill = 0, drop = TRUE) %>% 
  gather(key = "Politico", value = "n", -Palabra)

# Select the authors
tweets_unpivot2 <- tweets_unpivot %>% 
  filter(Politico %in% c("M.Macri", "A.Fernandez"))
# Add each author's total word count (counted on the tokens, not on the tweets)
tweets_unpivot2 <- tweets_unpivot2 %>%
  left_join(tuits_tokens %>% group_by(Politico) %>%
              summarise(N = n()), by = "Politico")
# Compute the odds and log odds of each word
tweets_logOdds <- tweets_unpivot2 %>% 
  mutate(odds = (n + 1) / (N + 1)) %>%
  select(Politico, Palabra, odds) %>% 
  spread(key = Politico, value = odds)
tweets_logOdds[,4] <- log(tweets_logOdds[,2]/tweets_logOdds[,3])
names(tweets_logOdds)[4] <- "log_odds"
tweets_logOdds[,5] <- abs(tweets_logOdds$log_odds)
names(tweets_logOdds)[5] <- "abs_log_odds"
# Label each word with the author who uses it relatively more often
tweets_logOdds <- tweets_logOdds %>%
  mutate(autor_frecuente = if_else(log_odds > 0,
                                   names(tweets_logOdds)[2],
                                   names(tweets_logOdds)[3]))

Diferencia_AF <- tweets_logOdds %>% 
  arrange(-abs_log_odds) %>% 
  filter(autor_frecuente == "A.Fernandez") %>% 
  head(15)

Diferencia_MM <- tweets_logOdds %>% 
  arrange(log_odds) %>% 
  filter(autor_frecuente == "M.Macri") %>% 
  head(15)

Diferencia_AF_MM <- rbind(Diferencia_AF,Diferencia_MM)

Diferencia_AF_MM%>% 
    ggplot(aes(x = reorder(Palabra, log_odds), y= log_odds, fill = autor_frecuente)) +
    geom_col() +
    labs(x = "-palabra", y = "Uso", title = "Fernandez vs Macri") +
  coord_flip() +
  tema2
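
The quantity plotted here is the log of the ratio between two smoothed relative frequencies, log(((n_a + 1) / (N_a + 1)) / ((n_b + 1) / (N_b + 1))), where n is how often an author uses the word and N is that author's total. The sketch below is a self-contained illustration in base R; the function name, the words and all the counts are invented.

```r
# Log odds ratio with add-one smoothing, so that a count of 0 never yields log(0)
log_odds <- function(n_a, N_a, n_b, N_b) {
  odds_a <- (n_a + 1) / (N_a + 1)
  odds_b <- (n_b + 1) / (N_b + 1)
  log(odds_a / odds_b)
}

# Invented counts: both authors have the same total (99)
palabras <- c("cuarentena", "economia", "salud")
n_a <- c(9, 0, 4)   # uses by author A
n_b <- c(0, 9, 4)   # uses by author B

resultado <- data.frame(
  Palabra  = palabras,
  log_odds = log_odds(n_a, 99, n_b, 99),
  stringsAsFactors = FALSE
)
print(resultado)
```

A positive value means the first author uses the word relatively more, a negative value means the second author does, and a word used equally often by both sits at zero.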

25- Larreta vs Kicillof word use comparison

The idea of this graph is to show which words differ the most in how often each author uses them, in this case between Horacio Larreta and Axel Kicillof.



# Pivot and unpivot, so that every author has a row for every word
tweets_unpivot <- tuits_tokens %>% group_by(Politico, Palabra) %>%
  count(Palabra) %>%
  spread(key = Politico, value = n, fill = 0, drop = TRUE) %>% 
  gather(key = "Politico", value = "n", -Palabra)

# Select the authors
tweets_unpivot2 <- tweets_unpivot %>% 
  filter(Politico %in% c("H.Larreta", "A.Kicillof"))
# Add each author's total word count (counted on the tokens, not on the tweets)
tweets_unpivot2 <- tweets_unpivot2 %>%
  left_join(tuits_tokens %>% group_by(Politico) %>%
              summarise(N = n()), by = "Politico")
# Compute the odds and log odds of each word
tweets_logOdds <- tweets_unpivot2 %>% 
  mutate(odds = (n + 1) / (N + 1)) %>%
  select(Politico, Palabra, odds) %>% 
  spread(key = Politico, value = odds) 
tweets_logOdds[,4] <- log(tweets_logOdds[,2]/tweets_logOdds[,3])
names(tweets_logOdds)[4] <- "log_odds"
tweets_logOdds[,5] <- abs(tweets_logOdds$log_odds)
names(tweets_logOdds)[5] <- "abs_log_odds"
# Label each word with the author who uses it relatively more often
tweets_logOdds <- tweets_logOdds %>%
  mutate(autor_frecuente = if_else(log_odds > 0,
                                   names(tweets_logOdds)[2],
                                   names(tweets_logOdds)[3]))

Diferencia_AK <- tweets_logOdds %>% 
  arrange(-log_odds, bygroup = FALSE)%>% 
  filter(autor_frecuente == "A.Kicillof")%>% 
  head(15)

Diferencia_HL <- tweets_logOdds %>% 
  arrange(abs_log_odds) %>% 
  filter(autor_frecuente == "H.Larreta")%>% 
  tail(15)

Diferencia_AK_HL <- rbind(Diferencia_AK,Diferencia_HL)

Diferencia_AK_HL%>% 
    ggplot(aes(x = reorder(Palabra, log_odds), y= log_odds, fill = autor_frecuente)) +
    geom_col() +
    labs(x = "-palabra", y = "Uso", title = "Kicillof vs Larreta") +
  coord_flip() +
  tema2
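
As a minimal sketch of the calculation above, the following base R snippet computes the same smoothed log odds ratio on a tiny table. The words, counts and totals are invented for illustration and are not taken from the tweets.

```r
# Toy word counts for two authors (invented numbers, for illustration only)
conteo <- data.frame(
  Palabra    = c("provincia", "ciudad", "salud"),
  A.Kicillof = c(30, 2, 10),
  H.Larreta  = c(1, 25, 10),
  stringsAsFactors = FALSE
)

# Totals used as the denominator for each author (also invented)
N_AK <- 1000
N_HL <- 1000

# Add-one smoothing keeps the ratio finite when a word was never used
odds_AK <- (conteo$A.Kicillof + 1) / (N_AK + 1)
odds_HL <- (conteo$H.Larreta + 1) / (N_HL + 1)

# Positive values lean towards Kicillof, negative values towards Larreta
conteo$log_odds <- log(odds_AK / odds_HL)
conteo$autor_frecuente <- ifelse(conteo$log_odds > 0, "A.Kicillof", "H.Larreta")

# Most Kicillof-leaning words first
print(conteo[order(-conteo$log_odds), ])
```

A word used equally often by both authors gets a log odds of zero, and the `log_odds > 0` rule then assigns it to the second author. The choice of denominator also matters: dividing by the number of tweets or by the number of words of each author gives different ratios when the authors write tweets of different length.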

26- Gines vs Quirós word use comparison

This graph shows the words whose usage differs most between the two health officials, Gines González García and Fernán Quirós.

# Pivot and unpivot so that every word has a count for every politician (0 if never used)
tweets_unpivot <- tuits_tokens %>%
  group_by(Politico, Palabra) %>%
  count(Palabra) %>%
  spread(key = Politico, value = n, fill = 0, drop = TRUE) %>%
  gather(key = "Politico", value = "n", -Palabra)

# Select the two authors to compare
tweets_unpivot2 <- tweets_unpivot %>%
  filter(Politico %in% c("Gines.GG", "F.Quiros"))

# Add each author's total N (as written, the number of rows of Tweets_DF, i.e. tweets per author)
tweets_unpivot2 <- tweets_unpivot2 %>%
  left_join(Tweets_DF %>% group_by(Politico) %>%
              summarise(N = n()), by = "Politico")
`summarise()` ungrouping output (override with `.groups` argument)
# Odds and log odds of each word, with add-one smoothing
tweets_logOdds <- tweets_unpivot2 %>%
  mutate(odds = (n + 1) / (N + 1)) %>%
  select(Politico, Palabra, odds) %>%
  spread(key = Politico, value = odds)
tweets_logOdds[,4] <- log(tweets_logOdds[,2]/tweets_logOdds[,3])
names(tweets_logOdds)[4] <- "log_odds"
tweets_logOdds[,5] <- abs(tweets_logOdds$log_odds)
names(tweets_logOdds)[5] <- "abs_log_odds"
# Label each word with the author who uses it relatively more often
tweets_logOdds <- tweets_logOdds %>%
  mutate(autor_frecuente = if_else(log_odds > 0,
                                   names(tweets_logOdds)[2],
                                   names(tweets_logOdds)[3]))

Diferencia_GG <- tweets_logOdds %>% 
  arrange(desc(log_odds)) %>% 
  filter(autor_frecuente == "Gines.GG")%>% 
  tail(15)

Diferencia_FQ <- tweets_logOdds %>% 
  arrange(desc(abs_log_odds)) %>% 
  filter(autor_frecuente == "F.Quiros")%>% 
  head(15)

Diferencia_GG_FQ <- rbind(Diferencia_GG,Diferencia_FQ)

Diferencia_GG_FQ%>% 
    ggplot(aes(x = reorder(Palabra, log_odds), y= log_odds, fill = autor_frecuente)) +
    geom_col() +
    labs(x = "-palabra", y = "Uso", title = "Quirós vs Gines") +
  coord_flip() +
  tema2

27- Bullrich vs Cristina word use comparison

This graph shows the words whose usage differs most between Cristina Kirchner and Patricia Bullrich.

# Pivot and unpivot so that every word has a count for every politician (0 if never used)
tweets_unpivot <- tuits_tokens %>%
  group_by(Politico, Palabra) %>%
  count(Palabra) %>%
  spread(key = Politico, value = n, fill = 0, drop = TRUE) %>%
  gather(key = "Politico", value = "n", -Palabra)

# Select the two authors to compare
tweets_unpivot2 <- tweets_unpivot %>%
  filter(Politico %in% c("P.Bullrich", "C.Kirchner"))

# Add each author's total N (as written, the number of rows of Tweets_DF, i.e. tweets per author)
tweets_unpivot2 <- tweets_unpivot2 %>%
  left_join(Tweets_DF %>% group_by(Politico) %>%
              summarise(N = n()), by = "Politico")
`summarise()` ungrouping output (override with `.groups` argument)
# Odds and log odds of each word, with add-one smoothing
tweets_logOdds <- tweets_unpivot2 %>%
  mutate(odds = (n + 1) / (N + 1)) %>%
  select(Politico, Palabra, odds) %>%
  spread(key = Politico, value = odds)
tweets_logOdds[,4] <- log(tweets_logOdds[,2]/tweets_logOdds[,3])
names(tweets_logOdds)[4] <- "log_odds"
tweets_logOdds[,5] <- abs(tweets_logOdds$log_odds)
names(tweets_logOdds)[5] <- "abs_log_odds"
# Label each word with the author who uses it relatively more often
tweets_logOdds <- tweets_logOdds %>%
  mutate(autor_frecuente = if_else(log_odds > 0,
                                   names(tweets_logOdds)[2],
                                   names(tweets_logOdds)[3]))

Diferencia_PB <- tweets_logOdds %>% 
  arrange(desc(log_odds)) %>% 
  filter(autor_frecuente == "P.Bullrich")%>% 
  tail(15)

Diferencia_CFK <- tweets_logOdds %>% 
  arrange(desc(abs_log_odds)) %>% 
  filter(autor_frecuente == "C.Kirchner")%>% 
  head(15)

Diferencia_PB_CFK <- rbind(Diferencia_PB,Diferencia_CFK)

Diferencia_PB_CFK%>% 
    ggplot(aes(x = reorder(Palabra, log_odds), y= log_odds, fill = autor_frecuente)) +
    geom_col() +
    labs(x = "-palabra", y = "Uso", title = "Cristina vs Bullrich") +
  coord_flip() +
  tema2

28- Emotions in tweets

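We now look at a broader range of emotions, beyond positive and negative, expressed by each politician across all the tweets published in this period. Each bar shows how the emotion words of one account are distributed among the different emotions.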

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(Partido ==  "cuenta partidaria")%>%
  filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = sentiment) +
  geom_col(position = "stack", color = "black") +
  coord_flip()  +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Palabras") +
  theme_minimal()
`summarise()` regrouping output by 'Politico', 'Partido' (override with `.groups` argument)


gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(Partido ==  "PRO")%>%
    filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = sentiment) +
  geom_col(position = "stack", color = "black") +
  coord_flip()  +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Palabras") +
  theme_minimal()
`summarise()` regrouping output by 'Politico', 'Partido' (override with `.groups` argument)

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(Partido ==  "Peronismo")%>%
    filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = sentiment) +
  geom_col(position = "stack", color = "black") +
  coord_flip()  +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Palabras") +
  theme_minimal()
`summarise()` regrouping output by 'Politico', 'Partido' (override with `.groups` argument)

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(Partido ==  "otros candidatos")%>%
   filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = sentiment) +
  geom_col(position = "stack", color = "black") +
  coord_flip()  +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Palabras") +
  theme_minimal()
`summarise()` regrouping output by 'Politico', 'Partido' (override with `.groups` argument)
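
The proportion step used in these charts can be sketched on a small table. The snippet below uses base R and selects the emotion columns by name rather than by position; the accounts, the three column names and all values are invented for illustration.

```r
# Toy data: one row per tweet, one column per emotion (invented values)
df <- data.frame(
  Politico = c("A", "A", "B"),
  anger    = c(1, 0, 2),
  joy      = c(0, 3, 2),
  surprise = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# Emotion columns selected by name, which survives changes in column order
emociones <- c("anger", "joy", "surprise")

# Sum each emotion per account
totales <- aggregate(df[emociones], by = list(Politico = df$Politico), FUN = sum)

# Divide each row by its own total so the emotions of an account add up to 1
proporciones <- totales
proporciones[emociones] <- totales[emociones] / rowSums(totales[emociones])

print(proporciones)
```

Selecting by name avoids depending on the exact position of the emotion columns (103:112 in the code above), which can shift whenever a column is added to or removed from the data frame.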

29- Emotions month by month

The sentiment analysis shows no major changes from month to month, although some details stand out:

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, mes, sentiment) %>%
  filter(Partido ==  "cuenta partidaria")%>%
  filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
aes(x = mes, y =Proporcion, color = sentiment) +
  geom_point() +
  geom_line(aes(group = sentiment)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .4), 
        text = element_text(family = "serif")) +
  tema2 +
  facet_wrap(~ Politico) +
  labs(title = "Cambio de los sentimientos en el tiempo", 
       x = "Mes", y = "Proporción", color = "Sentimiento") 
`summarise()` regrouping output by 'Politico', 'Partido', 'mes' (override with `.groups` argument)

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, mes, sentiment) %>%
  filter(Partido ==  "otros candidatos")%>%
   filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
aes(x = mes, y =Proporcion, color = sentiment) +
  geom_point() +
  geom_line(aes(group = sentiment)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .4), 
        text = element_text(family = "serif")) +
  tema2 +
  facet_wrap(~ Politico) +
  labs(title = "Cambio de los sentimientos en el tiempo", 
       x = "Mes", y = "Proporción", color = "Sentimiento") 
`summarise()` regrouping output by 'Politico', 'Partido', 'mes' (override with `.groups` argument)

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, mes, sentiment) %>%
  filter(Partido ==  "Peronismo")%>%
  filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
aes(x = mes, y =Proporcion, color = sentiment) +
  geom_point() +
  geom_line(aes(group = sentiment)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .4), 
        text = element_text(family = "serif")) +
  tema2 +
  facet_wrap(~ Politico) +
  labs(title = "Cambio de los sentimientos en el tiempo", 
       x = "Mes", y = "Proporción", color = "Sentimiento") 
`summarise()` regrouping output by 'Politico', 'Partido', 'mes' (override with `.groups` argument)

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, mes, sentiment) %>%
  filter(Partido ==  "PRO")%>%
  filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
aes(x = mes, y =Proporcion, color = sentiment) +
  geom_point() +
  geom_line(aes(group = sentiment)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .4), 
        text = element_text(family = "serif")) +
  tema2 +
  facet_wrap(~ Politico) +
  labs(title = "Cambio de los sentimientos en el tiempo", 
       x = "Mes", y = "Proporción", color = "Sentimiento") 
`summarise()` regrouping output by 'Politico', 'Partido', 'mes' (override with `.groups` argument)

30- Comparison of emotions

Comparing everyone's emotions on a single graph gives a general overview (party accounts are excluded):

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(sentiment != "negative" & sentiment !="positive" & Partido != "cuenta partidaria")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
    ggplot() +
  aes(Politico, Proporcion, color = sentiment, alpha = Proporcion) +
  geom_point(fill = "white", stroke = 1, shape = 21) +
  geom_text(aes(label = sentiment), vjust = -.9, family = "serif") +
  scale_y_continuous(labels = percent_format ()) +
  tema1 +
  theme(legend.position = "none",
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        text =  element_text(family = "serif")) +
  coord_flip() +
  labs(title = "Sentimientos totales comparativo",
       x = "Politico",
       y = "Proporción del sentimiento")
`summarise()` regrouping output by 'Politico', 'Partido' (override with `.groups` argument)

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(sentiment != "positive" & sentiment !="negative" & sentiment !="joy" & sentiment !="surprise" & Partido != "cuenta partidaria")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
  ggplot() +
  aes(sentiment, Proporcion, color = sentiment) +
  geom_point() +
  geom_text(aes(label = Politico) ,vjust = -.3, size = 3) +
  scale_y_continuous(limits = c(0.15, 0.47)) +
   labs(title = "Sentimientos totales comparativo",
       x = "Sentimiento",
       y = "Proporción del sentimiento") +
  theme_minimal() +
  theme(legend.position = "none")
`summarise()` regrouping output by 'Politico', 'Partido' (override with `.groups` argument)

---
title: "Analisis de twitter: Políticos en cuarentena"
output: html_notebook
abstract: This document looks at how some politicians interacted on Twitter
  from the beginning of the pandemic until September 10. The tweets were
  downloaded through the Twitter API with the rtweet library.
---
<center> <span style="color: #02182B;"> <h2> Politician analysis </h2> </span> </center>

<center> <span style="color: #282F44;"> <h3> 1- Download tweets </h3> </span> </center>

I create a list of the accounts whose tweets I am going to download, together with an assignment that records which political party each account belongs to.

The original idea is to have 4 members from each of the two main parties, the last 4 candidates in the 2019 elections and also the 4 party accounts on Twitter.

  -  **Frente de Todos**: *Alberto Fernández, Cristina Kirchner, Gines González García and Axel Kicillof*
  -  **PRO**: *Mauricio Macri, Horacio Rodríguez Larreta, Fernán Quirós and Patricia Bullrich*
  -  **Other candidates**: *José Luis Espert, Roberto Lavagna, Juan José Gómez Centurión and Nicolás del Caño*
  -  **Party accounts**: *Frente de Todos, PRO, GEN and Unión Cívica Radical*

```{r}
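# Packages used by the chunks of this notebook
library(tidyverse)  # dplyr, tidyr, purrr, ggplot2
library(rtweet)     # get_timeline(), do_call_rbind()
library(lubridate)  # with_tz(), year(), month(), wday()
library(tidytext)   # unnest_tokens()
library(scales)     # percent_format()
library(wordcloud)  # wordcloud(), also loads RColorBrewer

candidatos <- list("alferdez", 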
                   "CFKArgentina", 
                   "ginesggarcia",
                   "Kicillofok",
                   "PartidoGEN", 
                   "FrenteDeTodos",
                    "mauriciomacri",
                   "PatoBullrich",
                   "horaciorlarreta",
                   "FernanQuirosBA",
                   "proargentina",
                   "UCRNacional",
                   "jlespert",
                   "NicolasdelCano",
                   "RLavagna",
                   "juanjomalvinas")

tipocuenta <- list ("Peronismo","cuenta partidaria", "PRO","otros candidatos" )

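# Download up to 3,200 tweets per account, without retweets and keeping replies.
# rtweet forwards include_rts and exclude_replies to the timeline endpoint.
Tweets <- map(candidatos, function(x) {
  get_timeline(user = x, n = 3200, include_rts = FALSE, exclude_replies = FALSE)
})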
 
Tweets_DF<- do_call_rbind(Tweets)

print("Muestra")
print(Tweets_DF %>% 
  select(text)%>% 
    head(5))

```


<center> <span style="color: #282F44;"> <h3> 2-ETL process </h3> </span> </center>


We clean up elements that can complicate the text analysis: we remove links and numbers, replace non-printable characters and accented letters, and turn everything to lowercase.

In addition, we split the date field into several separate fields that will be used later, after converting the timestamp to the time zone where the tweets were written *(Argentina)*.

Finally, we filter by date so that only tweets from March 1, 2020 until the publication date *(September 2020)* remain.

```{r}
Tweets_DF <-
  Tweets_DF %>%
  ## All text to lowercase ##
  mutate(text = tolower(text)) %>% 
  ## Replace non-printable characters (anything outside [:graph:]) with a space ##
  mutate(text = gsub("[^[:graph:]]", " ", text)) %>% 
  ## Remove links: "http" followed by any run of non-space characters ##
  mutate(text = gsub("http\\S+", " ", text)) %>% 
  ## Remove numbers ##
  mutate(text = gsub("[[:digit:]]", " ", text)) %>% 
  ## Remove accent marks ##
  mutate(text = chartr('áéíóúñ','aeioun',text)) %>%
  ## Convert to the corresponding time zone ##
  mutate(created_at = with_tz(created_at, "America/Argentina/Buenos_Aires"))%>% 
  ## Split the created_at field into date and time ##
  separate(created_at, into = c("date", "hour"), sep = " ")%>% 
  ## Split the time into hours, minutes and seconds ##
  separate(hour, into = c("hour", "minutes","seconds"), sep = ":")%>% 
  ## Rename the column that holds the account name ##
  rename(Politico = screen_name) %>% 
  ## Add year, month, day of month, weekday name and day of year ##
  mutate(periodo = year(date), 
         mes = month(date, label = F, abbr = F),
         dia = as.numeric(day(date)),
         dia_sem = wday(date, label = T, abbr = F, week_start = 1),
         dia_per = yday(date),
         date = as.Date(date) 
  ) %>%
  ## Keep only tweets from March 2020 onwards ##
  filter(periodo == 2020 & mes > 2) 

print("Muestra")
print(Tweets_DF %>% 
  select(Politico, status_id, periodo, mes, dia)%>% 
    head(10))

```
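
To make the cleaning steps concrete, here is a minimal base R sketch that applies the lowercase, link and digit rules to a single string. The sample tweet is invented, and the final whitespace collapse is an extra step added only to make the result easier to read.

```r
# Invented sample tweet, for illustration only
texto <- "Hoy 1500 TESTEOS en la Ciudad https://t.co/abc123"

# Lowercase everything
texto <- tolower(texto)
# Drop links: "http" followed by any run of non-space characters
texto <- gsub("http\\S+", " ", texto)
# Drop digits
texto <- gsub("[[:digit:]]", " ", texto)
# Collapse repeated spaces and trim both ends (extra step, not in the pipeline)
texto <- trimws(gsub("\\s+", " ", texto))

print(texto)
```

The order matters: removing links before digits ensures that a URL containing numbers is dropped as a whole instead of being split into fragments.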

<center> <span style="color: #282F44;"> <h3> 3-Normalization </h3> </span> </center>


We normalize some fields to help both the analysis and the visualization.

We add the party of each account analyzed, and replace the Twitter handle (@) with a name that is easy for everyone to recognize.


```{r}
Tweets_DF <-
  Tweets_DF %>%
  mutate (Partido = ifelse (Politico == "alferdez", "Peronismo",
                    ifelse (Politico == "CFKArgentina", "Peronismo",
                    ifelse (Politico == "ginesggarcia", "Peronismo", 
                    ifelse (Politico == "Kicillofok", "Peronismo",
                    ifelse (Politico == "UCRNacional", "cuenta partidaria",        
                    ifelse (Politico == "FrenteDeTodos", "cuenta partidaria", 
                    ifelse (Politico == "proargentina", "cuenta partidaria",
                    ifelse (Politico == "PartidoGEN", "cuenta partidaria",
                    ifelse (Politico == "mauriciomacri", "PRO",
                    ifelse (Politico == "PatoBullrich", "PRO",
                    ifelse (Politico == "horaciorlarreta", "PRO",
                    ifelse (Politico == "FernanQuirosBA", "PRO", 
                            "otros candidatos")))))))))))))

Tweets_DF <-
  Tweets_DF %>%
  mutate (Politico = ifelse (Politico == "alferdez", "A.Fernandez",
                    ifelse (Politico == "CFKArgentina", "C.Kirchner",
                    ifelse (Politico == "ginesggarcia", "Gines.GG", 
                    ifelse (Politico == "Kicillofok", "A.Kicillof",
                    ifelse (Politico == "UCRNacional", "UCR",        
                    ifelse (Politico == "FrenteDeTodos", "TODOS", 
                    ifelse (Politico == "proargentina", "PRO",
                    ifelse (Politico == "PartidoGEN", "GEN",
                    ifelse (Politico == "mauriciomacri", "M.Macri",
                    ifelse (Politico == "PatoBullrich", "P.Bullrich",
                    ifelse (Politico == "horaciorlarreta", "H.Larreta",
                    ifelse (Politico == "FernanQuirosBA", "F.Quiros", 
                    ifelse (Politico == "NicolasdelCano", "N.DelCaño",
                    ifelse (Politico == "jlespert", "J.Espert",
                    ifelse (Politico == "RLavagna", "R.Lavagna",
                            "GomezCenturion"
                            ))))))))))))))))

print("Muestra")
print(Tweets_DF %>% 
  select(Politico, Partido, source)%>% 
    tail(10))
```
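
The nested `ifelse()` calls can also be expressed as a lookup table, which keeps the handle, the display name and the group on a single row. This is only a sketch of an alternative, written in base R with a subset of the accounts; the table name `cuentas` and the toy `tweets` data frame are hypothetical.

```r
# Lookup table: Twitter handle -> display name and group (subset of the accounts)
cuentas <- data.frame(
  handle   = c("alferdez", "mauriciomacri", "UCRNacional", "jlespert"),
  Politico = c("A.Fernandez", "M.Macri", "UCR", "J.Espert"),
  Partido  = c("Peronismo", "PRO", "cuenta partidaria", "otros candidatos"),
  stringsAsFactors = FALSE
)

# Toy tweets table keyed by handle
tweets <- data.frame(handle = c("jlespert", "alferdez", "alferdez"),
                     stringsAsFactors = FALSE)

# match() gives, for each tweet, the row of the lookup table to read from
idx <- match(tweets$handle, cuentas$handle)
tweets$Politico <- cuentas$Politico[idx]
tweets$Partido  <- cuentas$Partido[idx]

# A handle missing from the table yields NA instead of falling into a default
stopifnot(!anyNA(tweets$Politico))
print(tweets)
```

With a lookup table an unknown handle shows up as `NA`, whereas the final `else` branch of a nested `ifelse()` silently labels it as the last option.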


<center> <span style="color: #282F44;"> <h3> 4-Number of tweets </h3> </span> </center>


As a first approximation, we count how many times each account tweeted from March 2020 to the publication date of the report.

The differences between accounts are considerable, so later comparisons have to normalize the counts or use proportions.

  - *Macri and Lavagna have fewer tweets than the rest*
  - *Cristina Kirchner also tweets little*
  - *The accounts of Frente de Todos, Espert, Del Caño and the UCR are the most active*


  
```{r}
Cantidad_tweets = Tweets_DF %>%
  group_by(Politico, Partido) %>%
  count(Politico)
  
Cantidad_tweets%>%  
  ggplot()+
  aes(x=reorder(Politico, n), y= n, fill= Politico) +
  geom_col() +
  facet_wrap("Partido", scales = "free_y") +
  coord_flip() +
  labs(title = "Cantidad total de tweets", x = "tweets", y = "Cantidad") +
    tema1
```


<center> <span style="color: #282F44;"> <h3> 5-Date of tweets </h3> </span> </center>


We look at when each of the analyzed tweets was published, to show the periods in which each account was more or less active.

  - *The Frente de Todos account published more than 3,200 tweets in the period, which is the maximum requested per account in the download step, so its analysis starts in the first days of April*

```{r}
Tweets_DF %>%
  filter(Partido == "PRO") %>%
  ggplot(aes(x = as.Date(date), fill = Politico)) +
      geom_histogram(position = "identity", bins = 20, show.legend = FALSE) +
      scale_x_date(date_labels = "%d-%m", date_breaks = "1 month") +
      labs(x = "fecha de publicación", y = "número de tweets") +
      facet_wrap(~ Politico, ncol = 1) +
      tema2 +
      theme(axis.text.x = element_text(angle = 90))

Tweets_DF %>%
  filter(Partido == "Peronismo") %>%
  ggplot(aes(x = as.Date(date), fill = Politico)) +
      geom_histogram(position = "identity", bins = 20, show.legend = FALSE) +
      scale_x_date(date_labels = "%d-%m", date_breaks = "1 month") +
      labs(x = "fecha de publicación", y = "número de tweets") +
      facet_wrap(~ Politico, ncol = 1) +
      tema1 +
      theme(axis.text.x = element_text(angle = 90))

Tweets_DF %>%
  filter(Partido == "otros candidatos") %>%
  ggplot(aes(x = as.Date(date), fill = Politico)) +
      geom_histogram(position = "identity", bins = 20, show.legend = FALSE) +
      scale_x_date(date_labels = "%d-%m", date_breaks = "1 month") +
      labs(x = "fecha de publicación", y = "número de tweets") +
      facet_wrap(~ Politico, ncol = 1) +
      tema1 +
      theme(axis.text.x = element_text(angle = 90))

Tweets_DF %>%
  filter(Partido == "cuenta partidaria") %>%
  ggplot(aes(x = as.Date(date), fill = Politico)) +
      geom_histogram(position = "identity", bins = 20, show.legend = FALSE) +
      scale_x_date(date_labels = "%d-%m", date_breaks = "1 month") +
      labs(x = "fecha de publicación", y = "número de tweets") +
      facet_wrap(~ Politico, ncol = 1) +
      tema2  +
      theme(axis.text.x = element_text(angle = 90))
```
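
One way to check whether the download limit cut an account short is to compare its number of tweets with the date of its earliest tweet. The sketch below does this in base R on invented data; the `tweets` table and its values are hypothetical stand-ins for `Tweets_DF`.

```r
# Toy table with one row per tweet (invented accounts and dates)
tweets <- data.frame(
  Politico = c("TODOS", "TODOS", "M.Macri", "M.Macri", "M.Macri"),
  date = as.Date(c("2020-04-03", "2020-09-01",
                   "2020-03-01", "2020-06-20", "2020-08-15")),
  stringsAsFactors = FALSE
)

# For each account: number of tweets and date of the earliest one
cobertura <- do.call(rbind, lapply(split(tweets, tweets$Politico), function(d) {
  data.frame(Politico = d$Politico[1],
             n_tweets = nrow(d),
             primer_tweet = min(d$date),
             stringsAsFactors = FALSE)
}))

print(cobertura)
```

An account whose earliest tweet falls well after the start of the period is a candidate for truncation, and its histogram should be read with that in mind.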



<center> <span style="color: #282F44;"> <h3> 6-Number of tweets about COVID </h3> </span> </center>


The most important topic of the year is the coronavirus. The idea is to see what percentage of the tweets published in these months deal with it, so we search for keywords that indicate that a tweet is about the pandemic.

  - *The health ministers are clearly the ones who talked the most about the coronavirus, in 75% of their tweets.*
  
  - *Governors Kicillof and Larreta come next, discussing the evolution of the pandemic in their districts.*

  - *The leaders of the two main parties talked about it to a lesser extent.*

  - *Among the other candidates, Del Caño talked about it the most, followed by Espert, while Lavagna and Gómez Centurión said very little.*

  - *Party accounts, except that of the UCR, were not widely used to talk about the coronavirus.*



```{r}
# Flag the tweets that contain any of the COVID-related keywords
Palabras_covid <- "covid|covid-19|covid19|coronavirus|#covid|#covid-19|#covid19|#coronavirus|test|testeo|testeos|pcr|serologico|hisopado|antibioticos|aplanar|curva|cuarentena|contagio|enfermedad|epidemia|pandemia|alarma|gel|cuidados|incubacion|jabon|barbijo|barbijos|mascarilla|mascarillas|mers|sars|vacuna|wuhan|oxford|astra|zeneca|transmision|exponencial|casos|duplicacion|distanciamiento|colapso|salud|letalidad|mortalidad|ventilador|icu|uci|uti|inmunidad|serologica|distanciamiento|virus|asintomatico|caso sospechoso|olfato|gusto|terapia|saturacion|clinica|positividad|positivios|rebaño|inmunidad|hospital|hospitales|aspo|aislamiento"
Tweets_DF$Covid <- grepl(Palabras_covid, Tweets_DF$text, ignore.case = TRUE)

Tweets_DF %>% 
count(Politico, Partido,Covid) %>%
  group_by(Politico) %>%
  mutate(Proporcion = n / sum(n)) %>%
  mutate(Covid = ifelse(Covid == T, "Sobre COVID", "Otro tema"))%>%
ggplot() +
  aes(Politico, Proporcion, fill = Covid) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
      facet_wrap("Partido", scales = "free") +
  theme(legend.position = "top")
```
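
Short keywords such as "test", "gel" or "uti" also match inside longer, unrelated words when the pattern is searched as a plain substring. The following base R sketch, with a reduced keyword list and invented sentences, compares substring matching with matching on whole words using `\\b` boundaries.

```r
# Reduced keyword list, chosen for illustration
palabras <- c("test", "gel", "uti", "covid")

# Invented example sentences
textos <- c("la protesta de hoy",
            "nuevo test de covid",
            "vamos a utilizar recursos")

# Substring matching: "test" is found inside "protesta", "uti" inside "utilizar"
sin_limites <- grepl(paste(palabras, collapse = "|"), textos)

# Whole-word matching: \\b requires a word boundary on both sides of the keyword
patron <- paste0("\\b(", paste(palabras, collapse = "|"), ")\\b")
con_limites <- grepl(patron, textos, perl = TRUE)

print(data.frame(textos, sin_limites, con_limites))
```

The trade-off is that whole-word matching no longer catches inflected forms, so variants such as "testeo" or "testeos" have to be listed explicitly, as the keyword list above already does for several terms.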



<center> <span style="color: #282F44;"> <h3>  7- Wordcloud </h3> </span> </center>


The word cloud shows the 200 words most used by the analyzed accounts during these months. As expected, **"coronavirus"**, **"covid"**, **"pandemia"** and **"cuarentena"** stand out.


```{r}
tuits_tokens <-
  Tweets_DF %>%
  unnest_tokens(input = text, output = Palabra, token = "words") %>%
  select(Politico, Palabra, status_id, periodo, mes, hour, Partido) %>%
  mutate(status_id = gsub("<(.*)>+?", "", status_id)) %>%
  filter(!Palabra %in% stopwords("es")) %>%
  # "via" is listed too because accents were already stripped from the text
  filter(!Palabra %in% c("t.co", "https", "vía", "via", "youtube", "amp"))

Palabras_sinhoymas = tuits_tokens  %>%
  filter(Palabra != "mas" & Palabra != "hoy") 

wordcloud(words = Palabras_sinhoymas$Palabra, 
          scale=c(2,.2), 
          max.words=200, random.order=FALSE, rot.per=0.35, 
          colors=brewer.pal(8, "Dark2"),
          )


```



<center> <span style="color: #282F44;"> <h3> 8-Download a dictionary </h3> </span> </center>


We download a Spanish lexicon that assigns each word a value between -5 and 5, indicating how negative or positive the word is.

We set to zero the score of the word *"no"*, which the lexicon treats as negative even though in Spanish it often works as a plain connector, and of the word *"negro"* ("black"), which the lexicon scores with the maximum negative value.


```{r}
download.file("https://raw.githubusercontent.com/jboscomendoza/rpubs/master/sentimientos_afinn/lexico_afinn.en.es.csv",
              "lexico_afinn.en.es.csv")

afinn <- read.csv("lexico_afinn.en.es.csv", stringsAsFactors = F, fileEncoding = "latin1") %>% 
  as_tibble()

afinn$Puntuacion <- ifelse(afinn$Palabra == "no", 0, afinn$Puntuacion)
afinn$Puntuacion <- ifelse(afinn$Palabra == "negro", 0, afinn$Puntuacion)

print("Muestra")
# Ten most negative words
afinn %>%
  select(Palabra, Puntuacion) %>%
  arrange(Puntuacion) %>%
  head(10) %>%
  print()

# Ten most positive words
afinn %>%
  select(Palabra, Puntuacion) %>%
  arrange(-Puntuacion) %>%
  head(10) %>%
  print()
```




<center> <span style="color: #282F44;"> <h3> 9- Word separation </h3> </span> </center>


We split the tweets of each politician into individual words, and remove some Twitter-specific tokens and the so-called stopwords, which are the most frequent words in the Spanish language.


```{r}
print ("Muestra")
print (tuits_tokens %>%
    select(Politico, Palabra) %>%    
         head (10))
```
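
The tokenization and stopword removal can be illustrated without any package. The sketch below uses base R on two invented sentences, and the stopword vector `vacias` is a tiny hypothetical stand-in for the full Spanish list used in the analysis.

```r
# Invented, already cleaned texts
textos <- c("la salud es lo primero", "mas salud y mas trabajo")

# Tiny stand-in stopword list (the analysis uses a full Spanish list)
vacias <- c("la", "es", "lo", "y", "mas")

# Split on whitespace: one element per word
palabras <- unlist(strsplit(textos, "\\s+"))

# Drop the stopwords
palabras <- palabras[!palabras %in% vacias]

# Count how often each remaining word appears, most frequent first
frecuencia <- sort(table(palabras), decreasing = TRUE)
print(frecuencia)
```

Only "salud", "primero" and "trabajo" survive the filter here, which is why the content of the stopword list has a direct effect on every word count that follows.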

<center> <span style="color: #282F44;"> <h3> 10- Scoring the words </h3> </span> </center>


We join the scoring lexicon with the words used by each account, so that each word gets a value that helps us analyze what each politician wrote.

Words are classified as:

  - **Positive** if their value is greater than 0
  
  - **Negative** if their value is less than 0
  
  - **Neutral** if the word carries no sentiment (value equal to 0)


```{r}
tuits_tokens_emociones <-   
 tuits_tokens %>%
inner_join(afinn, ., by = "Palabra") %>%
  mutate(Calificacion = ifelse(Puntuacion > 0, "Positiva", 
                              ifelse(Puntuacion == 0, "Neutral",
                              "Negativa")
                            )
  )      

print ("Muestra")
print (tuits_tokens_emociones %>%
    select(Politico, Palabra, Puntuacion, Calificacion) %>%    
         tail (10))
```
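
The effect of the join can be seen on a very small case. In this base R sketch the lexicon entries, their scores and the tokens are all invented; `merge()` plays the role of the inner join.

```r
# Invented mini lexicon (scores are hypothetical)
lexico <- data.frame(Palabra = c("gracias", "crisis", "no"),
                     Puntuacion = c(2, -3, 0),
                     stringsAsFactors = FALSE)

# Invented tokens, one row per word used
tokens <- data.frame(Politico = c("X", "X", "Y", "Y"),
                     Palabra = c("gracias", "crisis", "no", "casa"),
                     stringsAsFactors = FALSE)

# Inner join: tokens that are not in the lexicon ("casa") are dropped
puntuados <- merge(tokens, lexico, by = "Palabra")

# Classify by the sign of the score
puntuados$Calificacion <- ifelse(puntuados$Puntuacion > 0, "Positiva",
                          ifelse(puntuados$Puntuacion < 0, "Negativa", "Neutral"))

print(puntuados)
```

Because the join is an inner join, words that do not appear in the lexicon are removed rather than labelled neutral, so the neutral class only contains lexicon words whose score is exactly 0.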



<center> <span style="color: #282F44;"> <h3> 11- Who uses most characters? </h3> </span> </center>


The idea is to compare the length (number of characters) of the tweets written by each account.

The further to the right the box is, the longer the tweets. In that respect:

  - *Lavagna, Patricia Bullrich, Larreta and Fernán Quirós write the longest tweets.*
  
  - *Accounts linked to PRO tend to write longer tweets.*


```{r}
Tweets_DF %>% 
ggplot()+
  aes(x= Politico, y= display_text_width, color= Politico) +
  geom_boxplot () +
    labs(title = "Largo promedio del tweet", x = "Politico", y = "Cantidad caracteres") +
  coord_flip() +
    tema1
```




<center> <span style="color: #282F44;"> <h3> 12- Who uses most words? </h3> </span> </center>


The idea is to see who used the most words per tweet on average during this period. We divide each account's word count by its number of tweets, so the figure is normalized across all the accounts.

*Common connectors such as "on", "to", "from", etc. are not counted*

  - *Espert averages fewer than 15 words per tweet, the lowest figure.*
  - *The rest use a similar amount, between 15 and 20 words.*



```{r}
Cantidad_palabras= tuits_tokens%>%
  group_by(Politico, Partido)%>%
  count(Politico)%>%
inner_join(Cantidad_tweets, ., by = "Politico")%>%
  mutate(cantidad_promedio = n.y / n.x)


Cantidad_palabras%>% ggplot()+
  aes(x=reorder(Politico, -cantidad_promedio), y= cantidad_promedio, fill= Politico) +
  geom_col() +
  facet_wrap("Partido.x", scales = "free_y") +
  labs(title = "Uso de palabras", x = "tweets", y = "Cantidad") +
  coord_flip() +
    tema1


```




<center> <span style="color: #282F44;"> <h3>  13- Who uses the most distinct words? </h3> </span> </center>

We look at the distinctive lexicon of each account by counting its unique words. The chart shows:

- *A gap between Lavagna, Macri and Cristina Kirchner and the rest of the accounts*

- *Espert has the least varied lexicon*


```{r}
tuits_tokens%>%
  group_by(Politico, Partido)%>%
  distinct(Palabra)%>%
  count(Politico)%>%
inner_join(Cantidad_tweets, ., by = "Politico")  %>%
  mutate(cantidad_promedio = n.y / n.x) %>% 
  ggplot()+
  aes(x=reorder(Politico, cantidad_promedio), y= cantidad_promedio, fill= Politico) +
  geom_col() +
  facet_wrap("Partido.x", scales = "free_y") +
  labs(title = "Palabras distintas", x = "tweets", y = "Cantidad") +
  coord_flip() +
    tema1
  
```
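
The two ratios used in sections 12 and 13, words per tweet and distinct words per tweet, can be sketched in base R as follows. The token table is invented, and here the number of tweets is derived from the tokens themselves, which assumes that every tweet keeps at least one word after cleaning.

```r
# Invented token table: one row per word, with the tweet it came from
tokens <- data.frame(
  Politico  = c("A", "A", "A", "A", "B", "B", "B"),
  status_id = c("1", "1", "2", "2", "3", "3", "3"),
  Palabra   = c("salud", "trabajo", "salud", "pais", "deuda", "deuda", "justicia"),
  stringsAsFactors = FALSE
)

# Totals per account
n_palabras  <- tapply(tokens$Palabra, tokens$Politico, length)
n_distintas <- tapply(tokens$Palabra, tokens$Politico, function(x) length(unique(x)))
n_tweets    <- tapply(tokens$status_id, tokens$Politico, function(x) length(unique(x)))

# Normalize both counts by the number of tweets of each account
resumen <- data.frame(
  Politico = names(n_palabras),
  palabras_por_tweet  = as.vector(n_palabras / n_tweets),
  distintas_por_tweet = as.vector(n_distintas / n_tweets)
)

print(resumen)
```

Distinct words are counted once per account, not once per tweet, so the second ratio tends to fall as an account tweets more. That is worth keeping in mind when comparing very active accounts with quiet ones.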




<center> <span style="color: #282F44;"> <h3> 14- Most used words </h3> </span> </center>

Now that we know how varied each account's vocabulary is, we can look at which words they used the most.

Each chart has its own scale, so that accounts with fewer tweets are not lost.

The most used words were:

- "*avoid*"
- "*debt*"
- "*Justice*"
- "*freedom*" mainly used by Espert.


```{r}
tuits_tokens_emociones %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "PRO") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras más usadas") +
     tema1

 tuits_tokens_emociones %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "Peronismo") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras más usadas") +
     tema1
 
  tuits_tokens_emociones %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "otros candidatos") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras más usadas") +
     tema1
  
   tuits_tokens_emociones %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "cuenta partidaria") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras más usadas") +
     tema1
```
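The four pipelines above differ only in the party being filtered. A minimal sketch of how the shared step could be wrapped in a function follows; `top_palabras`, `n_top` and the toy data are hypothetical names introduced here, not part of the original analysis. Unlike the chunk above, it breaks ties with `with_ties = FALSE`, so each account returns exactly `n_top` rows.

```r
library(dplyr)

# Hypothetical helper: the `n_top` most frequent words of each account
# belonging to one party, ready to be piped into ggplot().
top_palabras <- function(tokens, partido, n_top = 10) {
  tokens %>%
    filter(Partido == partido) %>%
    count(Politico, Palabra, name = "n_usos") %>%
    group_by(Politico) %>%
    slice_max(order_by = n_usos, n = n_top, with_ties = FALSE) %>%
    ungroup()
}

# Illustrative data only
toy_tokens <- tibble(
  Partido  = c(rep("PRO", 5), rep("Peronismo", 3)),
  Politico = c("M", "M", "M", "H", "H", "C", "C", "C"),
  Palabra  = c("deuda", "deuda", "salud", "salud", "salud",
               "justicia", "justicia", "deuda")
)

toy_top <- top_palabras(toy_tokens, "PRO", n_top = 1)
toy_top
```

Filtering by party before counting gives the same words as filtering afterwards, because every account belongs to a single party.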




<center> <span style="color: #282F44;"> <h3> 15- Most used positive words </h3> </span> </center>

Because the lexicon assigns a score to each word, we can also look for the positive words that each politician uses the most.

In this case:

- Note the use of the words **"freedom"** and **"justice"**, which the lexicon considers positive.

- The word **"thank you"** appears quite often, mainly in the health-related accounts **(Ginés González García and Fernán Quirós)**.


```{r}
 tuits_tokens_emociones %>%
    filter(Calificacion ==  "Positiva") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "PRO") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Positivas más usadas") 

 tuits_tokens_emociones %>%
    filter(Calificacion ==  "Positiva") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "Peronismo") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Positivas más usadas") 
 
  tuits_tokens_emociones %>%
    filter(Calificacion ==  "Positiva") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "otros candidatos") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Positivas más usadas") 
  
   tuits_tokens_emociones %>%
    filter(Calificacion ==  "Positiva") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "cuenta partidaria") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Positivas más usadas") 
```




<center> <span style="color: #282F44;"> <h3> 16- Most used negative words </h3> </span> </center>


Using the same lexicon scores, we can also look for the negative words that each politician uses the most.

  - *Avoid*, *emergency* and *problem* are the words that appear the most, in line with the main topic of the period, the pandemic.

  - *Debt* is another widely used word, a difficult subject to avoid this year.



```{r}
 tuits_tokens_emociones %>%
    filter(Calificacion ==  "Negativa") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "PRO") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Negativas más usadas") 

 tuits_tokens_emociones %>%
    filter(Calificacion ==  "Negativa") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "Peronismo") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Negativas más usadas") 
 
  tuits_tokens_emociones %>%
    filter(Calificacion ==  "Negativa") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "otros candidatos") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Negativas más usadas") 
  
   tuits_tokens_emociones %>%
    filter(Calificacion ==  "Negativa") %>%
    group_by(Partido, Politico) %>%
    count(Palabra, sort = T) %>%
     slice_max(order_by = n, n= 10) %>%
      filter(Partido ==  "cuenta partidaria") %>%
    ggplot() +
    aes(Palabra, n, fill = Politico) +
    geom_col() +
    facet_wrap("Politico", scales = "free") +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +
    labs(title = "Palabras Negativas más usadas") 
```




<center> <span style="color: #282F44;"> <h3>  17- Feelings in the tweet </h3> </span> </center>


Using the AFINN dictionary scores again, we rejoin the words to their tweets and average the scores of all the words in each tweet, moving from a score per word to a single score per published tweet.

  - *Most tweets were neutral.*
  
  - *The most positive in this period were M. Macri, A. Fernandez and R. Lavagna.*
  
  - *The most negative were N. Del Caño and the official PRO account.*


```{r}
# Average the word scores of each tweet and attach the result to the tweets
Tweets_DF <-
  tuits_tokens_emociones %>%
  group_by(status_id) %>%
  summarise(Puntuacion_tweet.x = mean(Puntuacion)) %>%
  left_join(Tweets_DF, ., by = "status_id")

# Label each tweet; tweets without any scored word (NA) count as neutral
Tweets_DF <- Tweets_DF %>%
  mutate(Puntuacion_tweet.x_letra = case_when(
    is.na(Puntuacion_tweet.x) ~ "Neutral",
    Puntuacion_tweet.x > 0    ~ "Positiva",
    Puntuacion_tweet.x < 0    ~ "Negativa",
    TRUE                      ~ "Neutral"
  ))

Tweets_DF %>%
  count(Politico, Partido, Puntuacion_tweet.x_letra) %>%
  group_by(Politico) %>%
  mutate(Proporcion = n / sum(n)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = Puntuacion_tweet.x_letra) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
      facet_wrap("Partido", scales = "free") +
  theme(legend.position = "top")


```
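To make the step from word scores to tweet labels concrete, here is a small sketch on invented data. The `toy_` objects and the column names `Puntuacion_tweet` and `Calificacion_tweet` are assumptions for illustration; only `status_id` and `Puntuacion` mirror the columns used above.

```r
library(dplyr)

# Hypothetical word-level scores: one row per scored word
toy_palabras <- tibble(
  status_id  = c("t1", "t1", "t2", "t2", "t3"),
  Puntuacion = c(3, -1, -2, -2, 0)
)

# Hypothetical tweets; "t4" has no scored words at all
toy_tweets <- tibble(status_id = c("t1", "t2", "t3", "t4"))

toy_scored <- toy_palabras %>%
  group_by(status_id) %>%
  summarise(Puntuacion_tweet = mean(Puntuacion)) %>%
  left_join(toy_tweets, ., by = "status_id") %>%   # keeps tweets with no score
  mutate(Calificacion_tweet = case_when(
    is.na(Puntuacion_tweet) ~ "Neutral",
    Puntuacion_tweet > 0    ~ "Positiva",
    Puntuacion_tweet < 0    ~ "Negativa",
    TRUE                    ~ "Neutral"
  ))

toy_scored
```

Here `t1` averages to 1 (positive), `t2` to -2 (negative), `t3` to 0 and `t4` has no score, so the last two are both labelled neutral.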




<center> <span style="color: #282F44;"> <h3> 18- Feelings in PRO vs Peronismo tweets </h3> </span> </center>


We take the 4 members already analyzed from each of the parties *(PRO and TODOS)* and combine them into a single bar per party. The distributions are similar: Peronism had slightly more positive tweets and slightly fewer negative ones, but the difference is small.



```{r}
Tweets_DF %>%
  count(Partido, Puntuacion_tweet.x_letra) %>%
  group_by(Partido) %>%
  filter(Partido == "PRO" |Partido == "Peronismo")%>%
  mutate(Proporcion = n / sum(n)) %>%
ggplot() +
  aes(Partido, Proporcion, fill = Puntuacion_tweet.x_letra) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
  theme(legend.position = "top")
```




<center> <span style="color: #282F44;"> <h3>  19- Feeling month by month </h3> </span> </center>


The idea is to analyze whether the sentiment of what they tweet fluctuates over time.

  - In the **PRO** there were no big changes; Bullrich's latest tweets are more positive.
  
  - Among the other candidates, **Nicolás Del Caño** is always negative, but has been less so lately.
  
  - In the **Frente de Todos** there is a lot of fluctuation; the President seems to be trending towards negativity.
  

```{r}
Tweets_DF$Puntuacion_tweet.x = ifelse(is.na(Tweets_DF$Puntuacion_tweet.x), 0, Tweets_DF$Puntuacion_tweet.x)

Tweets_DF %>%
group_by(Politico, Partido, mes) %>%
  filter(Partido == "PRO")%>%
  summarise(sentimiento = mean(Puntuacion_tweet.x)) %>%
ggplot() +
  aes(mes, sentimiento, color = Politico) +
  geom_hline(yintercept = 0, alpha = .35) +
  geom_line() +
  facet_grid(Politico~.) +
  tema1 +
  theme(legend.position = "none")

Tweets_DF %>%
group_by(Politico, Partido, mes) %>%
  filter(Partido == "otros candidatos")%>%
  summarise(sentimiento = mean(Puntuacion_tweet.x)) %>%
ggplot() +
  aes(mes, sentimiento, color = Politico) +
  geom_hline(yintercept = 0, alpha = .35) +
  geom_line() +
  facet_grid(Politico~.) +
  tema1 +
  theme(legend.position = "none")

Tweets_DF %>%
group_by(Politico, Partido, mes) %>%
  filter(Partido == "Peronismo")%>%
  summarise(sentimiento = mean(Puntuacion_tweet.x)) %>%
ggplot() +
  aes(mes, sentimiento, color = Politico) +
  geom_hline(yintercept = 0, alpha = .35) +
  geom_line() +
  facet_grid(Politico~.) +
  tema1 +
  theme(legend.position = "none")

Tweets_DF %>%
group_by(Politico, Partido, mes) %>%
  filter(Partido == "cuenta partidaria")%>%
  summarise(sentimiento = mean(Puntuacion_tweet.x)) %>%
ggplot() +
  aes(mes, sentimiento, color = Politico) +
  geom_hline(yintercept = 0, alpha = .35) +
  geom_line() +
  facet_grid(Politico~.) +
  tema1 +
  theme(legend.position = "none")
```
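The first line of the chunk above replaces missing scores with 0 before averaging, which treats tweets without scored words as neutral. A tiny sketch with invented values shows how that choice differs from simply ignoring those tweets:

```r
# Hypothetical monthly scores for one account; NA = tweet with no scored words
toy_puntuacion <- c(2, NA, -1, NA)

# Approach used in this section: unscored tweets count as neutral (0)
mean(ifelse(is.na(toy_puntuacion), 0, toy_puntuacion))  # 0.25

# Alternative: drop unscored tweets from the average
mean(toy_puntuacion, na.rm = TRUE)                      # 0.5
```

Counting unscored tweets as 0 pulls every monthly average towards neutral, more strongly for accounts with many unscored tweets.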




<center> <span style="color: #282F44;"> <h3>  20- Feeling Boxplot </h3> </span> </center>



The boxplot shows the distribution of sentiment across all of each account's tweets. Scores inside the boxes are the typical ones for that account (the middle half of its tweets), while the loose points are outliers: isolated tweets that depart from what the account usually writes.

  - *Espert's tweets do not show any clear sentiment pattern*


```{r}
Tweets_DF %>%
  ggplot() +
  aes(Politico, Puntuacion_tweet.x, fill = Politico) +
  geom_boxplot() +
  coord_flip() + 
  labs(y= "Sentimiento") +
  tema1
```
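As a sketch of what decides whether a tweet is drawn as a loose point, base R's `boxplot.stats()` applies the same default rule as `geom_boxplot()`: values more than 1.5 times the box height beyond the box edges are flagged. The scores below are invented for illustration.

```r
# Hypothetical tweet scores for one account (illustrative values only)
toy_scores <- c(-0.5, 0, 0, 0.2, 0.3, 0.5, 0.5, 1, 4)

toy_stats <- boxplot.stats(toy_scores)

# Lower whisker, lower box edge, median, upper box edge, upper whisker
toy_stats$stats

# Values that would be drawn as loose points
toy_stats$out
```

In this example the box runs from 0 to 0.5, so anything above 1.25 or below -0.75 is an outlier and only the score of 4 is flagged.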




<center> <span style="color: #282F44;"> <h3> 21- Correlation between what PRO and Peronismo tweeted</h3> </span> </center>

Using the words each account used, we measure how correlated the different politicians' tweets are. Several observations can be made:

  - *Quirós and Larreta have the highest correlation, which suggests well-coordinated communication in the Buenos Aires City government, with both pointing in the same direction.*

  - *Macri has the lowest correlation with the rest of the participants, but it makes sense that his closest match is Bullrich.*

  - *Kicillof and Ginés are strongly related to each other, and both are also related to their Buenos Aires City counterparts (Quirós and Larreta).*

  - *Cristina Kirchner is another whose tweets are not strongly related to those of other politicians.*

  - *Surprisingly, one of the highest levels of similarity in what is communicated is between Alberto Fernández and Patricia Bullrich.*



```{r}
tweets_spread2 <- tuits_tokens %>% 
 filter(Partido ==  "PRO" | Partido == "Peronismo")%>% 
  group_by(Politico, Palabra) %>% 
  count(Palabra) %>%
      spread(key = Politico, value = n, fill = NA, drop = TRUE)
tweets_spread2[is.na(tweets_spread2)] <- 0

names(tweets_spread2) <- c("Palabra", "A.Fernandez", "A.Kicillof", 
                          "C.Kirchner", "F.Quiros", "Gines.GG","H.Larreta", "M.Macri", "P.Bullrich" )

method <- "pearson"
m_cor <- matrix(nrow = 8, ncol = 8)
for (i in 1:dim(m_cor)[1]) {
      for (j in 1:dim(m_cor)[2]) {
            form <- as.formula(paste("~", names(tweets_spread2)[i+1], 
                                      "+", names(tweets_spread2)[j+1]))
            if(i!=j){
                  m_cor[i,j] <- cor.test(form, method = method, 
                                   data = tweets_spread2)$estimate
            }
            if(i==j){m_cor[i,j] <- 1}
      }
}
colnames(m_cor) <- names(tweets_spread2)[2:9]
rownames(m_cor) <- names(tweets_spread2)[2:9]
corrplot(m_cor, method="color", type="upper", order="hclust", 
         addCoef.col = "black", tl.col="black", tl.srt=45,
         sig.level = 0.01, insig = "blank", diag=FALSE)

```
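The nested loop above fills the matrix one pair at a time. As a sketch on invented word counts (the `toy_` objects are illustrative only), base R's `cor()` returns the same Pearson coefficients for every pair in a single call:

```r
# Hypothetical word counts: rows are words, columns are accounts
toy_counts <- data.frame(
  Palabra = c("deuda", "salud", "justicia", "libertad", "pandemia"),
  A = c(5, 2, 0, 1, 7),
  B = c(4, 3, 1, 0, 6),
  C = c(0, 1, 6, 5, 0)
)

# All pairwise Pearson correlations at once (word column dropped)
toy_cor <- cor(toy_counts[, -1], method = "pearson")

# One cell computed the same way as inside the loop
toy_pair <- cor.test(~ A + B, data = toy_counts, method = "pearson")$estimate

toy_cor
toy_pair
```

The resulting matrix is symmetric with ones on the diagonal, so it can be passed to `corrplot()` in the same way as `m_cor`.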




<center> <span style="color: #282F44;"> <h3> 22- Correlation between what the presidential candidates tweeted </h3> </span> </center>


- One cluster stands out above the rest in what the candidates write: Espert, Gómez Centurión and Alberto Fernández.

- Nicolas Del Caño has a high level of relationship with Gómez Centurión, and to a medium extent with Alberto Fernández and Espert.

- Lavagna and Macri do not present a great correlation with the rest of the politicians.


```{r}
tweets_spread2 <- tuits_tokens %>% 
  filter(Partido ==  "otros candidatos" | Politico == "A.Fernandez"| Politico == "M.Macri")%>% 
  group_by(Politico, Palabra) %>% 
  count(Palabra) %>%
      spread(key = Politico, value = n, fill = NA, drop = TRUE)
tweets_spread2[is.na(tweets_spread2)] <- 0

names(tweets_spread2) <- c("Palabra", "A.Fernandez", "J.Espert", 
                          "GomezCenturion", "M.Macri", "N.DelCaño","R.Lavagna")

method <- "pearson"
m_cor <- matrix(nrow = 6, ncol = 6)
for (i in 1:dim(m_cor)[1]) {
      for (j in 1:dim(m_cor)[2]) {
            form <- as.formula(paste("~", names(tweets_spread2)[i+1], 
                                      "+", names(tweets_spread2)[j+1]))
            if(i!=j){
                  m_cor[i,j] <- cor.test(form, method = method, 
                                   data = tweets_spread2)$estimate
            }
            if(i==j){m_cor[i,j] <- 1}
      }
}
colnames(m_cor) <- names(tweets_spread2)[2:7]
rownames(m_cor) <- names(tweets_spread2)[2:7]
corrplot(m_cor, method="color", type="upper", order="hclust", 
         addCoef.col = "black", tl.col="black", tl.srt=45,
         sig.level = 0.01, insig = "blank", diag=FALSE)
```




<center> <span style="color: #282F44;"> <h3> 23- Correlation between what the party accounts tweeted</h3> </span> </center>


This analysis is of particular interest because every party account has a substantial number of tweets.

- Surprisingly, the highest level of relationship occurs between the UCR and TODOS, two parties that are presented today as opposites.

- The high relationship between the PRO and the UCR makes more sense.

- The GEN seems to be the party that writes most differently from the rest.

- However, unlike the individual accounts, the party accounts are more related to each other, due to their more neutral language and institutional communication.


```{r}
tweets_spread2 <- tuits_tokens %>% 
  filter(Partido ==  "cuenta partidaria")%>% 
  group_by(Politico, Palabra) %>% 
  count(Palabra) %>%
      spread(key = Politico, value = n, fill = NA, drop = TRUE)
tweets_spread2[is.na(tweets_spread2)] <- 0

names(tweets_spread2) <- c("Palabra", "GEN", "PRO", 
                          "TODOS", "UCR")

method <- "pearson"
m_cor <- matrix(nrow = 4, ncol = 4)
for (i in 1:dim(m_cor)[1]) {
      for (j in 1:dim(m_cor)[2]) {
            form <- as.formula(paste("~", names(tweets_spread2)[i+1], 
                                      "+", names(tweets_spread2)[j+1]))
            if(i!=j){
                  m_cor[i,j] <- cor.test(form, method = method, 
                                   data = tweets_spread2)$estimate
            }
            if(i==j){m_cor[i,j] <- 1}
      }
}
colnames(m_cor) <- names(tweets_spread2)[2:5]
rownames(m_cor) <- names(tweets_spread2)[2:5]
corrplot(m_cor, method="color", type="upper", order="hclust", 
         addCoef.col = "black", tl.col="black", tl.srt=45,
         sig.level = 0.01, insig = "blank", diag=FALSE)
```





<center> <span style="color: #282F44;"> <h3> 24- Macri vs Fernandez word use comparison </h3> </span> </center>


This chart shows the words whose use differs the most, in this case between *Mauricio Macri* and *Alberto Fernández*.



```{r}

# Pivot wide and back to long so every word has a row for every account (0 if unused)
tweets_unpivot <- tuits_tokens %>%
  group_by(Politico, Palabra) %>%
  count(Palabra) %>%
  spread(key = Politico, value = n, fill = 0, drop = TRUE) %>%
  gather(key = "Politico", value = "n", -Palabra)

# Select the two authors
tweets_unpivot2 <- tweets_unpivot %>%
  filter(Politico %in% c("M.Macri", "A.Fernandez"))

# Add each author's total number of words (one row per word in tuits_tokens)
tweets_unpivot2 <- tweets_unpivot2 %>%
  left_join(tuits_tokens %>% group_by(Politico) %>% summarise(N = n()),
            by = "Politico")

# Odds and log odds of each word; columns 2 and 3 hold the two authors alphabetically
tweets_logOdds <- tweets_unpivot2 %>%
  mutate(odds = (n + 1) / (N + 1)) %>%
  select(Politico, Palabra, odds) %>%
  spread(key = Politico, value = odds)
tweets_logOdds$log_odds <- log(tweets_logOdds[[2]] / tweets_logOdds[[3]])
tweets_logOdds$abs_log_odds <- abs(tweets_logOdds$log_odds)
tweets_logOdds <- tweets_logOdds %>%
  mutate(autor_frecuente = if_else(log_odds > 0,
                                   names(tweets_logOdds)[2],
                                   names(tweets_logOdds)[3]))

# 15 most distinctive words of each author
Diferencia_AF <- tweets_logOdds %>%
  filter(autor_frecuente == "A.Fernandez") %>%
  arrange(desc(abs_log_odds)) %>%
  head(15)

Diferencia_MM <- tweets_logOdds %>%
  filter(autor_frecuente == "M.Macri") %>%
  arrange(desc(abs_log_odds)) %>%
  head(15)

Diferencia_AF_MM <- rbind(Diferencia_AF, Diferencia_MM)

Diferencia_AF_MM %>%
  ggplot(aes(x = reorder(Palabra, log_odds), y = log_odds, fill = autor_frecuente)) +
  geom_col() +
  labs(x = "Palabra", y = "Uso", title = "Fernandez vs Macri") +
  coord_flip() +
  tema2

```
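A worked example of the log odds for a single word may help when reading these charts. The counts below are invented: `toy_n` stands for the times each account used the word and `toy_N` for the per-account total used as the denominator.

```r
# Hypothetical counts for one word (illustrative values only)
toy_n <- c(A.Fernandez = 9, M.Macri = 1)      # uses of the word
toy_N <- c(A.Fernandez = 999, M.Macri = 499)  # per-account totals

# Smoothed odds: the +1 keeps a word one author never used from giving log(0)
toy_odds <- (toy_n + 1) / (toy_N + 1)

# Log of the ratio: > 0 is more typical of the first author, < 0 of the second
toy_log_odds <- log(toy_odds[["A.Fernandez"]] / toy_odds[["M.Macri"]])
toy_log_odds
```

Here the odds are 0.01 and 0.004, so the ratio is 2.5 and the log odds are positive: the word is more characteristic of the first account. A value near 0 means both accounts use the word at a similar rate.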




<center> <span style="color: #282F44;"> <h3> 25- Larreta vs Kicillof word use comparison</h3> </span> </center>


This chart shows the words whose use differs the most, in this case between *Horacio Larreta* and *Axel Kicillof*.



```{r}


# Pivot wide and back to long so every word has a row for every account (0 if unused)
tweets_unpivot <- tuits_tokens %>%
  group_by(Politico, Palabra) %>%
  count(Palabra) %>%
  spread(key = Politico, value = n, fill = 0, drop = TRUE) %>%
  gather(key = "Politico", value = "n", -Palabra)

# Select the two authors
tweets_unpivot2 <- tweets_unpivot %>%
  filter(Politico %in% c("H.Larreta", "A.Kicillof"))

# Add each author's total number of words (one row per word in tuits_tokens)
tweets_unpivot2 <- tweets_unpivot2 %>%
  left_join(tuits_tokens %>% group_by(Politico) %>% summarise(N = n()),
            by = "Politico")

# Odds and log odds of each word; columns 2 and 3 hold the two authors alphabetically
tweets_logOdds <- tweets_unpivot2 %>%
  mutate(odds = (n + 1) / (N + 1)) %>%
  select(Politico, Palabra, odds) %>%
  spread(key = Politico, value = odds)
tweets_logOdds$log_odds <- log(tweets_logOdds[[2]] / tweets_logOdds[[3]])
tweets_logOdds$abs_log_odds <- abs(tweets_logOdds$log_odds)
tweets_logOdds <- tweets_logOdds %>%
  mutate(autor_frecuente = if_else(log_odds > 0,
                                   names(tweets_logOdds)[2],
                                   names(tweets_logOdds)[3]))

# 15 most distinctive words of each author
Diferencia_AK <- tweets_logOdds %>%
  filter(autor_frecuente == "A.Kicillof") %>%
  arrange(desc(abs_log_odds)) %>%
  head(15)

Diferencia_HL <- tweets_logOdds %>%
  filter(autor_frecuente == "H.Larreta") %>%
  arrange(desc(abs_log_odds)) %>%
  head(15)

Diferencia_AK_HL <- rbind(Diferencia_AK, Diferencia_HL)

Diferencia_AK_HL %>%
  ggplot(aes(x = reorder(Palabra, log_odds), y = log_odds, fill = autor_frecuente)) +
  geom_col() +
  labs(x = "Palabra", y = "Uso", title = "Kicillof vs Larreta") +
  coord_flip() +
  tema2

```




<center> <span style="color: #282F44;"> <h3> 26- Gines vs Quirós word use comparison </h3> </span> </center>


This chart shows the words whose use differs the most, in this case between *Ginés González García* and *Fernán Quirós*.


```{r}
# Pivot wide and back to long so every word has a row for every account (0 if unused)
tweets_unpivot <- tuits_tokens %>%
  group_by(Politico, Palabra) %>%
  count(Palabra) %>%
  spread(key = Politico, value = n, fill = 0, drop = TRUE) %>%
  gather(key = "Politico", value = "n", -Palabra)

# Select the two authors
tweets_unpivot2 <- tweets_unpivot %>%
  filter(Politico %in% c("Gines.GG", "F.Quiros"))

# Add each author's total number of words (one row per word in tuits_tokens)
tweets_unpivot2 <- tweets_unpivot2 %>%
  left_join(tuits_tokens %>% group_by(Politico) %>% summarise(N = n()),
            by = "Politico")

# Odds and log odds of each word; columns 2 and 3 hold the two authors alphabetically
tweets_logOdds <- tweets_unpivot2 %>%
  mutate(odds = (n + 1) / (N + 1)) %>%
  select(Politico, Palabra, odds) %>%
  spread(key = Politico, value = odds)
tweets_logOdds$log_odds <- log(tweets_logOdds[[2]] / tweets_logOdds[[3]])
tweets_logOdds$abs_log_odds <- abs(tweets_logOdds$log_odds)
tweets_logOdds <- tweets_logOdds %>%
  mutate(autor_frecuente = if_else(log_odds > 0,
                                   names(tweets_logOdds)[2],
                                   names(tweets_logOdds)[3]))

# 15 most distinctive words of each author
Diferencia_GG <- tweets_logOdds %>%
  filter(autor_frecuente == "Gines.GG") %>%
  arrange(desc(abs_log_odds)) %>%
  head(15)

Diferencia_FQ <- tweets_logOdds %>%
  filter(autor_frecuente == "F.Quiros") %>%
  arrange(desc(abs_log_odds)) %>%
  head(15)

Diferencia_GG_FQ <- rbind(Diferencia_GG, Diferencia_FQ)

Diferencia_GG_FQ %>%
  ggplot(aes(x = reorder(Palabra, log_odds), y = log_odds, fill = autor_frecuente)) +
  geom_col() +
  labs(x = "Palabra", y = "Uso", title = "Quirós vs Gines") +
  coord_flip() +
  tema2
```




<center> <span style="color: #282F44;"> <h3> 27- Bullrich vs Cristina word use comparison </h3> </span> </center>


This chart shows the words whose use differs the most, in this case between *Cristina Kirchner* and *Patricia Bullrich*.


```{r}
# Pivot wide and back to long so every word has a row for every account (0 if unused)
tweets_unpivot <- tuits_tokens %>%
  group_by(Politico, Palabra) %>%
  count(Palabra) %>%
  spread(key = Politico, value = n, fill = 0, drop = TRUE) %>%
  gather(key = "Politico", value = "n", -Palabra)

# Select the two authors
tweets_unpivot2 <- tweets_unpivot %>%
  filter(Politico %in% c("P.Bullrich", "C.Kirchner"))

# Add each author's total number of words (one row per word in tuits_tokens)
tweets_unpivot2 <- tweets_unpivot2 %>%
  left_join(tuits_tokens %>% group_by(Politico) %>% summarise(N = n()),
            by = "Politico")

# Odds and log odds of each word; columns 2 and 3 hold the two authors alphabetically
tweets_logOdds <- tweets_unpivot2 %>%
  mutate(odds = (n + 1) / (N + 1)) %>%
  select(Politico, Palabra, odds) %>%
  spread(key = Politico, value = odds)
tweets_logOdds$log_odds <- log(tweets_logOdds[[2]] / tweets_logOdds[[3]])
tweets_logOdds$abs_log_odds <- abs(tweets_logOdds$log_odds)
tweets_logOdds <- tweets_logOdds %>%
  mutate(autor_frecuente = if_else(log_odds > 0,
                                   names(tweets_logOdds)[2],
                                   names(tweets_logOdds)[3]))

# 15 most distinctive words of each author
Diferencia_PB <- tweets_logOdds %>%
  filter(autor_frecuente == "P.Bullrich") %>%
  arrange(desc(abs_log_odds)) %>%
  head(15)

Diferencia_CFK <- tweets_logOdds %>%
  filter(autor_frecuente == "C.Kirchner") %>%
  arrange(desc(abs_log_odds)) %>%
  head(15)

Diferencia_PB_CFK <- rbind(Diferencia_PB, Diferencia_CFK)

Diferencia_PB_CFK %>%
  ggplot(aes(x = reorder(Palabra, log_odds), y = log_odds, fill = autor_frecuente)) +
  geom_col() +
  labs(x = "Palabra", y = "Uso", title = "Cristina vs Bullrich") +
  coord_flip() +
  tema2
```




<center> <span style="color: #282F44;"> <h3>  28- Emotions in tweets </h3> </span> </center>

Now we analyze a broader range of emotions expressed by the different politicians across all the tweets published in this period.

  - Among the members of **PRO** and **Peronism**, *trust* stands out as the main emotion expressed.

  - In accounts like **Espert** or **Del Caño**, the emotion their tweets show the most is *fear*.

  - The **PRO** and **Peronism** accounts, mainly those of ministers and heads of government, convey a feeling of *anticipation*.

  - Other accounts show a rising tone of *anguish* over time.

  - *Sadness* is a general feeling present across all accounts, which is understandable given the complex situation that a pandemic means.




```{r}
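# NRC emotion scores per tweet. Note: get_nrc_sentiment() defaults to
# language = "english"; since the tweets are in Spanish, language = "spanish"
# may be worth evaluating.
TextSentiment <- get_nrc_sentiment(Tweets_DF$text)

# Append the 10 NRC columns (8 emotions plus negative/positive) to the tweets
Tweets_DF_sentimiento <- cbind(Tweets_DF, TextSentiment)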


gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(Partido ==  "cuenta partidaria")%>%
  filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = sentiment) +
  geom_col(position = "stack", color = "black") +
  coord_flip()  +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Palabras") +
  theme_minimal()

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(Partido ==  "PRO")%>%
    filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = sentiment) +
  geom_col(position = "stack", color = "black") +
  coord_flip()  +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Palabras") +
  theme_minimal()

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(Partido ==  "Peronismo")%>%
    filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = sentiment) +
  geom_col(position = "stack", color = "black") +
  coord_flip()  +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Palabras") +
  theme_minimal()

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(Partido ==  "otros candidatos")%>%
   filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
  aes(Politico, Proporcion, fill = sentiment) +
  geom_col(position = "stack", color = "black") +
  coord_flip()  +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Palabras") +
  theme_minimal()
```
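The proportions in these charts are shares within each account: `summarise()` drops the last grouping level (`sentiment`), so the following `mutate()` divides by the account's own total. A minimal sketch on invented data shows that step. The `toy_` names are hypothetical, and the emotion columns are selected by name rather than by position, an alternative to the `103:112` index that assumes the NRC column names.

```r
library(dplyr)
library(tidyr)

# Hypothetical NRC-style scores per tweet (illustrative values only)
toy_sent <- tibble(
  Politico = c("A", "A", "B"),
  trust    = c(2, 1, 0),
  fear     = c(0, 1, 3),
  positive = c(1, 1, 0),
  negative = c(0, 1, 2)
)

toy_prop <- toy_sent %>%
  pivot_longer(c(trust, fear, positive, negative),
               names_to = "sentiment", values_to = "values") %>%
  filter(!sentiment %in% c("negative", "positive")) %>%
  group_by(Politico, sentiment) %>%
  summarise(Total = sum(values), .groups = "drop_last") %>%
  mutate(Proporcion = Total / sum(Total))   # share within each Politico

toy_prop
```

Account `A` ends up with 75% trust and 25% fear, and account `B` with 100% fear, so each account's proportions add up to 1.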




<center> <span style="color: #282F44;"> <h3>  29- Emotions month by month </h3> </span> </center>

Although the emotions do not change much from month to month, some details stand out:

- Health ministers convey more and more *trust*.

- Among the so-called other candidates, the *trust* conveyed is declining.

- **Larreta**, although lower in *trust*, continues to convey *anticipation*.

- **Kicillof** has an increasing message of *anguish*.



```{r}
gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, mes, sentiment) %>%
  filter(Partido ==  "cuenta partidaria")%>%
  filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
aes(x = mes, y =Proporcion, color = sentiment) +
  geom_point() +
  geom_line(aes(group = sentiment)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .4), 
        text = element_text(family = "serif")) +
  tema2 +
  facet_wrap(~ Politico) +
  labs(title = "Cambio de los sentimientos en el tiempo", 
       x = "Mes", y = "Proporción", color = "Sentimiento") 

gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, mes, sentiment) %>%
  filter(Partido ==  "otros candidatos")%>%
   filter(sentiment != "negative" & sentiment !="positive")%>%
    summarise(Total = sum(values)) %>%
      mutate(Proporcion = Total / sum(Total)) %>%
ggplot() +
aes(x = mes, y =Proporcion, color = sentiment) +
  geom_point() +
  geom_line(aes(group = sentiment)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .4), 
        text = element_text(family = "serif")) +
  tema2 +
  facet_wrap(~ Politico) +
  labs(title = "Cambio de los sentimientos en el tiempo", 
       x = "Mes", y = "Proporción", color = "Sentimiento") 


# Monthly share of each emotion for the Peronismo accounts (polarity columns excluded)
gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, mes, sentiment) %>%
  filter(Partido == "Peronismo") %>%
  filter(sentiment != "negative" & sentiment != "positive") %>%
  summarise(Total = sum(values)) %>%
  mutate(Proporcion = Total / sum(Total)) %>%
  ggplot() +
  aes(x = mes, y = Proporcion, color = sentiment) +
  geom_point() +
  geom_line(aes(group = sentiment)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .4),
        text = element_text(family = "serif")) +
  tema2 +
  facet_wrap(~ Politico) +
  labs(title = "Cambio de los sentimientos en el tiempo",
       x = "Mes", y = "Proporción", color = "Sentimiento")

# Monthly share of each emotion for the PRO accounts (polarity columns excluded)
gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, mes, sentiment) %>%
  filter(Partido == "PRO") %>%
  filter(sentiment != "negative" & sentiment != "positive") %>%
  summarise(Total = sum(values)) %>%
  mutate(Proporcion = Total / sum(Total)) %>%
  ggplot() +
  aes(x = mes, y = Proporcion, color = sentiment) +
  geom_point() +
  geom_line(aes(group = sentiment)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .4),
        text = element_text(family = "serif")) +
  tema2 +
  facet_wrap(~ Politico) +
  labs(title = "Cambio de los sentimientos en el tiempo",
       x = "Mes", y = "Proporción", color = "Sentimiento")
```
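
The pipelines in the chunk above differ only in the value compared against `Partido`. As a minimal sketch, they could be wrapped in one function and called once per group. The names `proporcion_mensual()` and `grafico_mensual()`, the toy data, and the use of `pivot_longer()` with column names (instead of `gather()` with positions `103:112`) are assumptions made for illustration and are not part of the original analysis. The custom theme `tema2` is left out so the sketch runs on its own.

```{r, eval=FALSE}
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data standing in for Tweets_DF_sentimiento
toy <- tibble(
  Politico = c("a", "a", "b", "b"),
  Partido  = c("PRO", "PRO", "Peronismo", "Peronismo"),
  mes      = c(3, 4, 3, 4),
  trust    = c(3, 1, 2, 2),
  fear     = c(1, 1, 2, 6),
  positive = c(4, 2, 1, 1),
  negative = c(1, 0, 3, 2)
)
columnas <- c("trust", "fear", "positive", "negative")

# Monthly share of each emotion for one group of accounts
proporcion_mensual <- function(df, grupo, columnas) {
  df %>%
    filter(Partido == grupo) %>%
    pivot_longer(all_of(columnas), names_to = "sentiment", values_to = "values") %>%
    # Keep emotions only, drop the two polarity scores
    filter(!sentiment %in% c("negative", "positive")) %>%
    group_by(Politico, Partido, mes, sentiment) %>%
    # "drop_last" (dplyr >= 1.0.0) leaves the data grouped by Politico, Partido, mes
    summarise(Total = sum(values), .groups = "drop_last") %>%
    # so sum(Total) is the total of all emotions for that politician and month
    mutate(Proporcion = Total / sum(Total)) %>%
    ungroup()
}

# Same plot layers as the chunk above, minus the custom theme
grafico_mensual <- function(df, grupo, columnas) {
  proporcion_mensual(df, grupo, columnas) %>%
    ggplot() +
    aes(x = mes, y = Proporcion, color = sentiment) +
    geom_point() +
    geom_line(aes(group = sentiment)) +
    theme_minimal() +
    facet_wrap(~ Politico) +
    labs(title = "Cambio de los sentimientos en el tiempo",
         x = "Mes", y = "Proporción", color = "Sentimiento")
}

res_pro <- proporcion_mensual(toy, "PRO", columnas)
grafico_mensual(toy, "PRO", columnas)
```

With the real data, and assuming the sentiment columns still sit at positions 103 to 112, a call could look like `grafico_mensual(Tweets_DF_sentimiento, "PRO", names(Tweets_DF_sentimiento)[103:112])`.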




<center> <span style="color: #282F44;"> <h3> 30- Comparison of emotions </h3> </span> </center>


Comparing everyone's emotions on a single graph gives a general overview (party accounts are excluded):

- Although **Macri** has few tweets, he is the one who conveys the most *trust*.

- Those who hold elected office are the ones who most seek to convey *trust* in their tweets.

- Candidates who lost show mixed feelings, with high levels of *fear* and *anguish* in their writing.

- *Emotions that represent less than 10% of tweets are discarded.*
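
The proportions behind these observations, and the 10% cut-off from the last point, can be sketched on a small example. This is only an illustration: the toy scores, the `umbral` variable and the explicit `filter()` step are assumptions, and the threshold is assumed to apply to each politician's share of emotion scores.

```{r, eval=FALSE}
library(dplyr)
library(tidyr)

# Hypothetical toy scores; the real data lives in Tweets_DF_sentimiento
toy <- tibble(
  Politico = c("a", "b"),
  trust    = c(12, 5),
  fear     = c(7, 4),
  joy      = c(1, 2)
)

umbral <- 0.10  # assumed cut-off: emotions below 10% are discarded

proporciones <- toy %>%
  pivot_longer(c(trust, fear, joy), names_to = "sentiment", values_to = "values") %>%
  group_by(Politico, sentiment) %>%
  # "drop_last" (dplyr >= 1.0.0) leaves the data grouped by Politico only
  summarise(Total = sum(values), .groups = "drop_last") %>%
  # share of each emotion within the politician's total
  mutate(Proporcion = Total / sum(Total)) %>%
  ungroup()

# Keep only the emotions that reach the threshold
filtradas <- proporciones %>%
  filter(Proporcion >= umbral)
```

In this toy data, politician `a` has a *joy* share of 1/20 = 5%, so that row is dropped, while every other row stays.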





```{r}
# Overall share of each emotion per politician
# (party accounts and the two polarity columns are excluded)
gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(sentiment != "negative" & sentiment != "positive" & Partido != "cuenta partidaria") %>%
  summarise(Total = sum(values)) %>%
  mutate(Proporcion = Total / sum(Total)) %>%
  ggplot() +
  aes(Politico, Proporcion, color = sentiment, alpha = Proporcion) +
  geom_point(fill = "white", stroke = 1, shape = 21) +
  geom_text(aes(label = sentiment), vjust = -.9, family = "serif") +
  scale_y_continuous(labels = percent_format()) +
  tema1 +
  theme(legend.position = "none",
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        text = element_text(family = "serif")) +
  coord_flip() +
  labs(title = "Sentimientos totales comparativo",
       x = "Politico",
       y = "Proporción del sentimiento")

# Same comparison by emotion, also dropping joy and surprise;
# points outside the y limits (0.15 to 0.47) are not drawn
gather(Tweets_DF_sentimiento, "sentiment", "values", 103:112) %>%
  group_by(Politico, Partido, sentiment) %>%
  filter(sentiment != "positive" & sentiment != "negative" & sentiment != "joy" & sentiment != "surprise" & Partido != "cuenta partidaria") %>%
  summarise(Total = sum(values)) %>%
  mutate(Proporcion = Total / sum(Total)) %>%
  ggplot() +
  aes(sentiment, Proporcion, color = sentiment) +
  geom_point() +
  geom_text(aes(label = Politico), vjust = -.3, size = 3) +
  scale_y_continuous(limits = c(0.15, 0.47)) +
  labs(title = "Sentimientos totales comparativo",
       x = "Sentimiento",
       y = "Proporción del sentimiento") +
  theme_minimal() +
  theme(legend.position = "none")


```
