Assignment 4

This exercise explores Twitter data. First, the required libraries are loaded.

library(sf)

## Warning: package 'sf' was built under R version 4.0.5

## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1

library(ggmap)

## Warning: package 'ggmap' was built under R version 4.0.5

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 4.0.5

## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.

## Please cite ggmap if you use it! See citation("ggmap") for details.

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.0.5

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v tibble  3.1.0     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## v purrr   0.3.4

## Warning: package 'tidyr' was built under R version 4.0.5

## Warning: package 'dplyr' was built under R version 4.0.5

## Warning: package 'stringr' was built under R version 4.0.5

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(rtweet)

## Warning: package 'rtweet' was built under R version 4.0.5

## 
## Attaching package: 'rtweet'

## The following object is masked from 'package:purrr':
## 
##     flatten

library(leaflet)

## Warning: package 'leaflet' was built under R version 4.0.5

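The access token is created. The values appname, consumer_key, consumer_secret, access_token and access_secret must already be defined in the session.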

twitter_token <- create_token(app = appname,
                              consumer_key = consumer_key,
                              consumer_secret = consumer_secret,
                              access_token = access_token,
                              access_secret = access_secret)
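The call above does not show where the five credential values come from. A minimal sketch of one way to supply them without writing secrets into the script, assuming they are stored in environment variables. The helper read_twitter_credentials and the TWITTER_* variable names are hypothetical and not part of rtweet:

```r
# Hypothetical helper: reads the five credentials from environment variables.
# The TWITTER_* names are assumptions made for this sketch.
read_twitter_credentials <- function() {
  vars <- c(appname         = "TWITTER_APP_NAME",
            consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  values <- Sys.getenv(vars, unset = NA)
  # Treat both unset and empty variables as missing.
  missing <- vars[is.na(values) | values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  # Return a list named after the arguments used in the call above.
  as.list(setNames(values, names(vars)))
}
```

With this in place, creds <- read_twitter_credentials() would give a list whose elements (creds$appname, creds$consumer_key and so on) can be passed to create_token.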

Elections for borough mayors (alcaldías) and for Congress were held in Mexico City on June 6, so there should be thousands of tweets on the subject.

The search_tweets function is used to request 18,000 tweets containing the word "elecciones", restricted to a 20-mile radius around the geographic coordinates of the Zócalo in Mexico City.

tw_elecciones <- search_tweets(q = "elecciones",
                               geocode = "19.432733,-99.133327,20mi",
                               n = 18000, 
                               lang = "es", 
                               include_rts = FALSE)
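The geocode argument above is a single string of the form "latitude,longitude,radius". As a small sketch, it can be assembled from its parts so that the centre or the radius is easy to change. The helper make_geocode is hypothetical and not part of rtweet:

```r
# Hypothetical helper: builds the "lat,lng,radius" string used by `geocode`.
make_geocode <- function(lat, lng, radius, unit = c("mi", "km")) {
  unit <- match.arg(unit)
  # Basic sanity checks on the inputs.
  stopifnot(lat >= -90, lat <= 90, lng >= -180, lng <= 180, radius > 0)
  paste0(lat, ",", lng, ",", radius, unit)
}

# Coordinates of the Zócalo and the 20-mile radius taken from the text.
zocalo <- make_geocode(19.432733, -99.133327, 20)
```

The resulting string could then be passed as geocode = zocalo in the search.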

The resulting table is inspected. Although 18,000 records were requested from Twitter, only a little over 11,000 were returned.

head(tw_elecciones)

## # A tibble: 6 x 90
##   user_id  status_id   created_at          screen_name text             source  
##   <chr>    <chr>       <dttm>              <chr>       <chr>            <chr>   
## 1 86421671 1403458821~ 2021-06-11 21:07:12 luislex     "Pues no sé ust~ Twitter~
## 2 86421671 1401655264~ 2021-06-06 21:40:31 luislex     "Si las eleccio~ Twitter~
## 3 2988349~ 1403458477~ 2021-06-11 21:05:50 jorgesalva~ "@JacoboGonzale~ Twitter~
## 4 83932185 1403458303~ 2021-06-11 21:05:09 ciudadanos~ "#MarioMoreno, ~ Twitter~
## 5 83932185 1400642444~ 2021-06-04 02:35:56 ciudadanos~ "\U0001f5f3<U+FE0F>Por~  Twitter~
## 6 83932185 1402275686~ 2021-06-08 14:45:51 ciudadanos~ "\U0001f9d0Segú~ Twitter~
## # ... with 84 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>,
## #   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## #   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## #   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## #   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## #   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## #   quoted_name <chr>, quoted_followers_count <int>,
## #   quoted_friends_count <int>, quoted_statuses_count <int>,
## #   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## #   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## #   retweet_source <chr>, retweet_favorite_count <int>,
## #   retweet_retweet_count <int>, retweet_user_id <chr>,
## #   retweet_screen_name <chr>, retweet_name <chr>,
## #   retweet_followers_count <int>, retweet_friends_count <int>,
## #   retweet_statuses_count <int>, retweet_location <chr>,
## #   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## #   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## #   country_code <chr>, geo_coords <list>, coords_coords <list>,
## #   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## #   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## #   friends_count <int>, listed_count <int>, statuses_count <int>,
## #   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## #   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## #   profile_banner_url <chr>, profile_background_url <chr>,
## #   profile_image_url <chr>

First, the tweets with the greatest impact are identified.

tw_elecciones %>%
  group_by(screen_name, retweet_count, text) %>%
  summarise() %>%
  arrange(desc(retweet_count)) %>%
  head(10)

## `summarise()` has grouped output by 'screen_name', 'retweet_count'. You can override using the `.groups` argument.

## # A tibble: 10 x 3
## # Groups:   screen_name, retweet_count [10]
##    screen_name   retweet_count text                                             
##    <chr>                 <int> <chr>                                            
##  1 Garcimonero            3533 "En mi. Casilla no llego un funcionario y para q~
##  2 beltrandelrio          3449 "El Presidente reveló hoy lo que más le molesta ~
##  3 fmartinmoreno          2596 "Es muy raro que a un paso de las elecciones, AM~
##  4 beltrandelrio          2400 "El Presidente lleva tres días tratando de conve~
##  5 SNietoCastil~          2354 "#UIF.  triunfo de Morena en Tamaulipas complica~
##  6 lopezdoriga            1736 "#IMPORTANTE \n\n¡Tómala! Así las cosas al inter~
##  7 lopezdoriga            1716 "Lo que les cuento en radio. Este viernes amanec~
##  8 CarlosLoret            1708 "El Presidente está como los que dicen “no me do~
##  9 beltrandelrio          1569 "Los resultados de las elecciones de hoy no son ~
## 10 abrahamendie~          1495 "<U+26A0><U+FE0F> La Policía de Ixtapaluca, al servicio de Ant~

The user Garcimonero posted the most retweeted tweet, with 3,533 retweets.
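The ranking above relies on group_by() followed by an empty summarise() to drop repeated rows. As an alternative sketch, distinct() and slice_max() from dplyr express the same idea more directly. The data below are made up for illustration and are not the tweets collected here:

```r
library(dplyr)

# Made-up data for illustration only.
toy_tweets <- tibble(
  screen_name   = c("user_a", "user_b", "user_b", "user_c"),
  retweet_count = c(10L, 250L, 250L, 40L),
  text          = c("first", "second", "second", "third")
)

top_tweets <- toy_tweets %>%
  distinct(screen_name, retweet_count, text) %>%  # drop repeated rows
  slice_max(retweet_count, n = 2)                 # keep the 2 most retweeted
```

This version also returns an ungrouped tibble, so no message about grouped output is printed.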

Next, the users with the most followers are identified and plotted. The result is assigned to a variable called popular.

popular <- tw_elecciones %>%
  group_by(screen_name, followers_count) %>%
  summarise() %>%
  arrange(desc(followers_count))

## `summarise()` has grouped output by 'screen_name'. You can override using the `.groups` argument.

head(popular)

## # A tibble: 6 x 2
## # Groups:   screen_name [6]
##   screen_name     followers_count
##   <chr>                     <int>
## 1 AristeguiOnline         8842206
## 2 CarlosLoret             8718759
## 3 werevertumorro          8581694
## 4 lopezdoriga             7838690
## 5 DeniseDresserG          4338003
## 6 ChilangoCom             3842539

The news outlet AristeguiOnline is the account with the most followers (more than 8.8 million), followed very closely by the journalist Carlos Loret de Mola.

The users with the most followers are plotted, applying a filter that keeps only accounts with more than 1 million followers.

ggplot(popular %>%
         filter(followers_count > 1000000))+
  geom_bar(aes(x=reorder(screen_name, followers_count), weight=followers_count))+
  labs(title = "Most popular Twitter users",
       subtitle = "Posting about the June 6 elections in Mexico City",
       caption = "Source: Twitter API",
       x = "@ User",
       y = "Number of followers") +
  theme_bw() +
  coord_flip() +
  theme (plot.title = element_text(family = "sans",
                                   size = rel(1), 
                                   vjust = 2, 
                                   face = "bold.italic", 
                                   color = "black", 
                                   lineheight = 1.5), 
          plot.subtitle = element_text(family = "sans",
                                      size = rel(0.8),
                                      vjust = 2, 
                                      face = "italic", 
                                      color = "gray40", 
                                      lineheight = 1.5),
          plot.caption = element_text(family = "sans",
                                     size = rel(0.7),
                                     vjust = 2, 
                                     face = "italic", 
                                     color = "gray30", 
                                     lineheight = 1.5)) + 
  theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)), 
        axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", size=rel(0.75)),
        axis.text.x = element_text(face="italic", colour="gray60", size=rel(0.65)),
        axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
        legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
        legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))
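The plots in this report repeat the same long theme() block. A minimal sketch of how that styling could be collected in one reusable function. The name theme_elecciones is hypothetical, and the sketch simplifies the original slightly by styling both axes together and leaving out the per-axis vjust adjustments:

```r
library(ggplot2)

# Hypothetical helper (the name is an assumption of this sketch): bundles the
# title, axis and legend text styling in one place.
theme_elecciones <- function() {
  grey_bold   <- element_text(face = "bold",   colour = "gray60", size = rel(0.75))
  grey_italic <- element_text(face = "italic", colour = "gray60", size = rel(0.65))
  theme(
    plot.title    = element_text(family = "sans", size = rel(1), vjust = 2,
                                 face = "bold.italic", colour = "black",
                                 lineheight = 1.5),
    plot.subtitle = element_text(family = "sans", size = rel(0.8), vjust = 2,
                                 face = "italic", colour = "gray40",
                                 lineheight = 1.5),
    plot.caption  = element_text(family = "sans", size = rel(0.7), vjust = 2,
                                 face = "italic", colour = "gray30",
                                 lineheight = 1.5),
    axis.title    = grey_bold,
    axis.text     = grey_italic,
    legend.title  = grey_bold,
    legend.text   = element_text(face = "italic", colour = "gray60", size = rel(0.6))
  )
}

# Usage on a built-in data set, only to show how the helper is attached.
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme_bw() +
  theme_elecciones()
```

Each plot could then end with theme_bw() + theme_elecciones() instead of repeating the full block.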

Next, the number of tweets about the elections is plotted by day to see which day had the most.

ts_plot(tw_elecciones, by="day")+
  labs(title = "Day with the most tweets",
       subtitle = "About the June 6 elections in Mexico City",
       caption = "Source: Twitter API",
       x = "Date",
       y = "Number of tweets") +
  theme_bw() +
  theme (plot.title = element_text(family = "sans",
                                   size = rel(1), 
                                   vjust = 2, 
                                   face = "bold.italic", 
                                   color = "black", 
                                   lineheight = 1.5), 
          plot.subtitle = element_text(family = "sans",
                                      size = rel(0.8),
                                      vjust = 2, 
                                      face = "italic", 
                                      color = "gray40", 
                                      lineheight = 1.5),
          plot.caption = element_text(family = "sans",
                                     size = rel(0.7),
                                     vjust = 2, 
                                     face = "italic", 
                                     color = "gray30", 
                                     lineheight = 1.5)) + 
  theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)), 
        axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", size=rel(0.75)),
        axis.text.x = element_text(face="italic", colour="gray60", size=rel(0.65)),
        axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
        legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
        legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))

Although the elections took place on June 6, the day with the most tweets on the subject was the following day, Monday, June 7. Since then the number of tweets has declined.
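The peak day can also be read from a table rather than from the plot. A minimal sketch that counts tweets per calendar day with dplyr, using made-up timestamps and treating them as UTC (an assumption of this sketch):

```r
library(dplyr)

# Made-up timestamps for illustration only.
toy_tweets <- tibble(
  created_at = as.POSIXct(c("2021-06-06 21:40:31", "2021-06-07 09:15:00",
                            "2021-06-07 18:02:10", "2021-06-08 14:45:51"),
                          tz = "UTC")
)

daily_counts <- toy_tweets %>%
  count(day = as.Date(created_at, tz = "UTC"), name = "tweets") %>%  # tweets per day
  arrange(desc(tweets))                                              # busiest day first
```

The first row of daily_counts is then the day with the most tweets.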

Finally, the tweets with geographic coordinates are isolated to make a map. First, the latitude and longitude columns are created with the lat_lng function.

tw_elecciones <- lat_lng(tw_elecciones, coords = c("coords_coords", "bbox_coords", "geo_coords"))

Then the rows without coordinates are filtered out.

tw_elecciones_geo <- tw_elecciones %>% 
    filter(!is.na(lat), !is.na(lng))

Only 727 tweets, out of the more than 11,000 collected at the start, have geographic coordinates.
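With 727 of a little over 11,000 tweets, roughly 6 to 7 percent of the records carry coordinates. A small sketch of the same filter and of how that share could be computed, on made-up rows:

```r
library(dplyr)

# Made-up rows for illustration only; NA marks a missing coordinate.
toy <- tibble(
  status_id = c("1", "2", "3", "4"),
  lat       = c(19.43, NA, 19.30, NA),
  lng       = c(-99.13, NA, NA, -99.20)
)

# Keep only rows where both coordinates are present.
toy_geo <- toy %>%
  filter(!is.na(lat), !is.na(lng))

# Share of rows that survive the filter.
share_geo <- nrow(toy_geo) / nrow(toy)
```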

head(tw_elecciones_geo)

## # A tibble: 6 x 92
##   user_id  status_id  created_at          screen_name text              source  
##   <chr>    <chr>      <dttm>              <chr>       <chr>             <chr>   
## 1 81998832 140267043~ 2021-06-09 16:54:26 el_Barto__  "@carmelodifazio~ Twitter~
## 2 81998832 140227335~ 2021-06-08 14:36:34 el_Barto__  "@ManuelVegaMX !~ Twitter~
## 3 81998832 140227369~ 2021-06-08 14:37:56 el_Barto__  "@lucky_894 @Man~ Twitter~
## 4 81998832 140205286~ 2021-06-08 00:00:27 el_Barto__  "@Isaorfebre @Va~ Twitter~
## 5 81998832 140209774~ 2021-06-08 02:58:47 el_Barto__  "@Jesus_Zambrano~ Twitter~
## 6 81998832 140345558~ 2021-06-11 20:54:21 el_Barto__  "@adnware @MexSi~ Twitter~
## # ... with 86 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>,
## #   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## #   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## #   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## #   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## #   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## #   quoted_name <chr>, quoted_followers_count <int>,
## #   quoted_friends_count <int>, quoted_statuses_count <int>,
## #   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## #   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## #   retweet_source <chr>, retweet_favorite_count <int>,
## #   retweet_retweet_count <int>, retweet_user_id <chr>,
## #   retweet_screen_name <chr>, retweet_name <chr>,
## #   retweet_followers_count <int>, retweet_friends_count <int>,
## #   retweet_statuses_count <int>, retweet_location <chr>,
## #   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## #   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## #   country_code <chr>, geo_coords <list>, coords_coords <list>,
## #   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## #   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## #   friends_count <int>, listed_count <int>, statuses_count <int>,
## #   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## #   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## #   profile_banner_url <chr>, profile_background_url <chr>,
## #   profile_image_url <chr>, lat <dbl>, lng <dbl>

Now the map is created. First, a bounding box with the extreme coordinates is needed.

bbox <- make_bbox(lon = tw_elecciones_geo$lng, lat = tw_elecciones_geo$lat)
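To make the idea of a bounding box concrete, here is a base R sketch that takes the range of each coordinate and pads it by a small fraction. The function simple_bbox is a hypothetical stand-in written for illustration, not ggmap's own code, and the 5% padding is an assumption:

```r
# Hypothetical stand-in for illustration; not ggmap's own implementation.
simple_bbox <- function(lon, lat, f = 0.05) {
  # Range of each coordinate, extended by the fraction f on both sides.
  lon_range <- grDevices::extendrange(range(lon, na.rm = TRUE), f = f)
  lat_range <- grDevices::extendrange(range(lat, na.rm = TRUE), f = f)
  c(left = lon_range[1], bottom = lat_range[1],
    right = lon_range[2], top = lat_range[2])
}

# Made-up coordinates around Mexico City.
toy_box <- simple_bbox(lon = c(-99.3, -99.0), lat = c(19.2, 19.6))
```

The result is a named vector with the left, bottom, right and top edges of the box.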

Then the base map tiles are fetched from Stamen Maps.

basemapa <- get_stamenmap(bbox,
                      maptype = "terrain-background",
                      zoom = 10)

## Source : http://tile.stamen.com/terrain-background/10/228/454.png

## Source : http://tile.stamen.com/terrain-background/10/229/454.png

## Source : http://tile.stamen.com/terrain-background/10/230/454.png

## Source : http://tile.stamen.com/terrain-background/10/231/454.png

## Source : http://tile.stamen.com/terrain-background/10/228/455.png

## Source : http://tile.stamen.com/terrain-background/10/229/455.png

## Source : http://tile.stamen.com/terrain-background/10/230/455.png

## Source : http://tile.stamen.com/terrain-background/10/231/455.png

## Source : http://tile.stamen.com/terrain-background/10/228/456.png

## Source : http://tile.stamen.com/terrain-background/10/229/456.png

## Source : http://tile.stamen.com/terrain-background/10/230/456.png

## Source : http://tile.stamen.com/terrain-background/10/231/456.png

ggmap(basemapa)

The following code makes the background map semi-transparent so that it does not draw attention away from the data of interest.

basemap_attributes <- attributes(basemapa)

transparent_map <- matrix(adjustcolor(basemapa, 
                                      alpha.f = 0.3),
                          nrow = nrow(basemapa))

attributes(transparent_map) <- basemap_attributes

ggmap(transparent_map)
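The reason for saving and restoring the attributes is that adjustcolor() returns a plain character vector, so the matrix shape and the map metadata are lost along the way. A small sketch on a made-up 2 x 2 matrix of hex colours standing in for the base map:

```r
# A tiny 2 x 2 "tile" of made-up hex colours standing in for the base map.
tile <- matrix(c("#336699", "#99CC33", "#CC3333", "#FFFFFF"), nrow = 2)

# adjustcolor() scales the alpha channel and returns a flat character vector.
flat <- adjustcolor(tile, alpha.f = 0.3)

# Rebuild the matrix shape, as done for the base map above.
faded <- matrix(flat, nrow = nrow(tile))
```

Each colour in faded keeps its red, green and blue components and gains an alpha component, which is what lets the points stand out over the background.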

Now the tweet locations are added as points.

ggmap(transparent_map) +
  geom_point(data = tw_elecciones_geo, aes(x=lng, y=lat))

The map is refined by mapping point size to the number of retweets and point colour to the user's number of followers.

ggmap(transparent_map) +
  geom_point(data = tw_elecciones_geo, aes(x=lng, y=lat, size = retweet_count, color = followers_count))+
  scale_color_distiller(palette = "Spectral")+
  labs(title = "Location of Twitter users",
       subtitle = "Posts about the elections in Mexico City and surroundings",
       caption = "Source: Twitter API",
       size = "Number of retweets",
       color = "Number of followers",
       x = "Longitude", 
       y = "Latitude") +
  theme (plot.title = element_text(family = "sans",
                                   size = rel(1), 
                                   vjust = 2, 
                                   face = "bold.italic", 
                                   color = "black", 
                                   lineheight = 1.5), 
          plot.subtitle = element_text(family = "sans",
                                      size = rel(0.8),
                                      vjust = 2, 
                                      face = "italic", 
                                      color = "gray40", 
                                      lineheight = 1.5),
          plot.caption = element_text(family = "sans",
                                     size = rel(0.7),
                                     vjust = 2, 
                                     face = "italic", 
                                     color = "gray30", 
                                     lineheight = 1.5)) + 
  theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)), 
        axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", angle= 90, size=rel(0.75)),
        axis.text.x = element_text(face="italic", colour="gray60",  size=rel(0.65)),
        axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
        legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
        legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))

Finally, an interactive map is built with the leaflet library. First, a 15% sample of the tweets with coordinates is taken so that the machine does not freeze.

sample <- tw_elecciones_geo %>%
  sample_frac(0.15)

head(sample)

## # A tibble: 6 x 92
##   user_id  status_id   created_at          screen_name text             source  
##   <chr>    <chr>       <dttm>              <chr>       <chr>            <chr>   
## 1 74093436 1402988478~ 2021-06-10 13:58:14 rutgiverin  "Noruega está m~ Twitter~
## 2 14378372 1401579867~ 2021-06-06 16:40:55 OphCourse   "Me pregunto si~ Twitter~
## 3 1242490~ 1400923994~ 2021-06-04 21:14:42 AlexAlonss~ "@laishawilkins~ Twitter~
## 4 1223754~ 1400805893~ 2021-06-04 13:25:25 TiaLolo80s  "@jgnaredo @Tro~ Twitter~
## 5 68790334 1401571425~ 2021-06-06 16:07:22 Pablo_Hdez  "No politicen l~ Twitter~
## 6 89779162 1401678089~ 2021-06-06 23:11:13 pachame     "La gente muy r~ Twitter~
## # ... with 86 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>,
## #   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## #   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## #   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## #   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## #   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## #   quoted_name <chr>, quoted_followers_count <int>,
## #   quoted_friends_count <int>, quoted_statuses_count <int>,
## #   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## #   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## #   retweet_source <chr>, retweet_favorite_count <int>,
## #   retweet_retweet_count <int>, retweet_user_id <chr>,
## #   retweet_screen_name <chr>, retweet_name <chr>,
## #   retweet_followers_count <int>, retweet_friends_count <int>,
## #   retweet_statuses_count <int>, retweet_location <chr>,
## #   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## #   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## #   country_code <chr>, geo_coords <list>, coords_coords <list>,
## #   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## #   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## #   friends_count <int>, listed_count <int>, statuses_count <int>,
## #   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## #   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## #   profile_banner_url <chr>, profile_background_url <chr>,
## #   profile_image_url <chr>, lat <dbl>, lng <dbl>
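Because sample_frac() draws rows at random, each run of the report selects different tweets. A minimal sketch of how the draw could be made reproducible with set.seed(), on a made-up table of 100 rows. The seed value is arbitrary:

```r
library(dplyr)

# Made-up table for illustration only.
toy_geo <- tibble(id = 1:100)

set.seed(2021)  # arbitrary seed chosen for this sketch
toy_sample <- toy_geo %>%
  sample_frac(0.15)  # 15% of 100 rows

# Resetting the seed and repeating the call reproduces the same rows.
set.seed(2021)
toy_sample_again <- toy_geo %>%
  sample_frac(0.15)
```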

The sample now holds a little over 100 tweets with coordinates. They are placed on an interactive leaflet map in which a pop-up identifies each tweet and its user name.

leaflet(sample) %>% 
    addTiles() %>%
    addProviderTiles(providers$CartoDB.Voyager) %>%
    addAwesomeMarkers(popup = paste("User:", sample$screen_name, "<br>",
                                    "Tweet:", sample$text),
                      icon=awesomeIcons(icon = "twitter", library = "fa", iconColor = "black", markerColor = "blue"))

## Assuming "lng" and "lat" are longitude and latitude, respectively
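Pop-up content is treated as HTML, so tweet text containing characters such as < or & could be misread as markup. A small sketch of escaping the text with htmlEscape() from the htmltools package before building the pop-up strings, on made-up rows:

```r
library(htmltools)

# Made-up rows for illustration only.
toy <- data.frame(
  screen_name = c("user_a", "user_b"),
  text        = c("Plain tweet", "Tweet with <b>markup</b> & symbols"),
  stringsAsFactors = FALSE
)

# Escape the user-supplied parts; the <br> separator stays as real HTML.
popups <- paste0("User: ", htmlEscape(toy$screen_name), "<br>",
                 "Tweet: ", htmlEscape(toy$text))
```

The popups vector could then be passed to the popup argument of the marker layer.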

And that concludes this interesting exercise.

Roberto Aguilar Celis

11/6/2021