This exercise explores Twitter data. First, the required libraries are loaded.
library(sf)
## Warning: package 'sf' was built under R version 4.0.5
## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(ggmap)
## Warning: package 'ggmap' was built under R version 4.0.5
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.0.5
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.5
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## v purrr 0.3.4
## Warning: package 'tidyr' was built under R version 4.0.5
## Warning: package 'dplyr' was built under R version 4.0.5
## Warning: package 'stringr' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(rtweet)
## Warning: package 'rtweet' was built under R version 4.0.5
##
## Attaching package: 'rtweet'
## The following object is masked from 'package:purrr':
##
## flatten
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.0.5
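Next, the API token is created.
# appname, consumer_key, consumer_secret, access_token and access_secret
# must already hold the credentials of the Twitter app
twitter_token <- create_token(app = appname,
                              consumer_key = consumer_key,
                              consumer_secret = consumer_secret,
                              access_token = access_token,
                              access_secret = access_secret)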
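The text does not show where the credential objects come from. One common option is to read them from environment variables so that no secret is written into the script. The sketch below assumes that approach: the helper name read_twitter_credentials and the TWITTER_* variable names are hypothetical and are not required by rtweet.

```r
# Hypothetical helper: collect the five credentials from environment variables.
# The variable names (TWITTER_APP, ...) are assumptions made for this sketch.
read_twitter_credentials <- function() {
  vars <- c(app             = "TWITTER_APP",
            consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  values <- Sys.getenv(vars, unset = NA)
  missing <- vars[is.na(values)]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  # Named list whose names match the arguments passed to create_token() above
  stats::setNames(as.list(unname(values)), names(vars))
}

# Possible usage (not run here because it needs real credentials):
# creds <- read_twitter_credentials()
# twitter_token <- do.call(create_token, creds)
```

Failing early with the list of missing variables tends to be easier to debug than an authentication error raised later by the API.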
On June 6, 2021, elections for borough mayors (alcaldías) and for Congress were held in Mexico City, so there are bound to be thousands of tweets on the subject.
The search_tweets function is used to request 18,000 tweets that contain the word "elecciones" and fall within a 20-mile radius of the geographic coordinates of the Zócalo in Mexico City.
tw_elecciones <- search_tweets(q = "elecciones",
                               geocode = "19.432733,-99.133327,20mi",
                               n = 18000,
                               lang = "es",
                               include_rts = FALSE)
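The geocode argument above is a single string of the form "latitude,longitude,radius". As a small sketch, a helper can assemble that string from numeric values and check them first; make_geocode is a hypothetical name introduced here, not an rtweet function.

```r
# Hypothetical helper: build the "lat,lng,radius" string used by the geocode argument
make_geocode <- function(lat, lng, radius, unit = c("mi", "km")) {
  unit <- match.arg(unit)
  stopifnot(lat >= -90, lat <= 90, lng >= -180, lng <= 180, radius > 0)
  paste0(lat, ",", lng, ",", radius, unit)
}

# Coordinates of the Zócalo and the 20-mile radius taken from the search above
zocalo <- make_geocode(19.432733, -99.133327, 20)

# The same radius expressed in kilometres (1 mile = 1.609344 km)
radius_km <- 20 * 1.609344
```

The resulting string could then be passed as geocode = zocalo in the call to search_tweets.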
Inspecting the resulting table shows that, although 18,000 records were requested from Twitter, it returned only slightly more than 11,000.
head(tw_elecciones)
## # A tibble: 6 x 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 86421671 1403458821~ 2021-06-11 21:07:12 luislex "Pues no sé ust~ Twitter~
## 2 86421671 1401655264~ 2021-06-06 21:40:31 luislex "Si las eleccio~ Twitter~
## 3 2988349~ 1403458477~ 2021-06-11 21:05:50 jorgesalva~ "@JacoboGonzale~ Twitter~
## 4 83932185 1403458303~ 2021-06-11 21:05:09 ciudadanos~ "#MarioMoreno, ~ Twitter~
## 5 83932185 1400642444~ 2021-06-04 02:35:56 ciudadanos~ "\U0001f5f3<U+FE0F>Por~ Twitter~
## 6 83932185 1402275686~ 2021-06-08 14:45:51 ciudadanos~ "\U0001f9d0Segú~ Twitter~
## # ... with 84 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>,
## # ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## # lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## # quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## # quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## # quoted_name <chr>, quoted_followers_count <int>,
## # quoted_friends_count <int>, quoted_statuses_count <int>,
## # quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## # retweet_source <chr>, retweet_favorite_count <int>,
## # retweet_retweet_count <int>, retweet_user_id <chr>,
## # retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## # country_code <chr>, geo_coords <list>, coords_coords <list>,
## # bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## # description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## # friends_count <int>, listed_count <int>, statuses_count <int>,
## # favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## # profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>
First, let's identify the tweets with the greatest reach.
tw_elecciones %>%
  group_by(screen_name, retweet_count, text) %>%
  summarise() %>%
  arrange(desc(retweet_count)) %>%
  head(10)
## `summarise()` has grouped output by 'screen_name', 'retweet_count'. You can override using the `.groups` argument.
## # A tibble: 10 x 3
## # Groups: screen_name, retweet_count [10]
## screen_name retweet_count text
## <chr> <int> <chr>
## 1 Garcimonero 3533 "En mi. Casilla no llego un funcionario y para q~
## 2 beltrandelrio 3449 "El Presidente reveló hoy lo que más le molesta ~
## 3 fmartinmoreno 2596 "Es muy raro que a un paso de las elecciones, AM~
## 4 beltrandelrio 2400 "El Presidente lleva tres días tratando de conve~
## 5 SNietoCastil~ 2354 "#UIF. triunfo de Morena en Tamaulipas complica~
## 6 lopezdoriga 1736 "#IMPORTANTE \n\n¡Tómala! Así las cosas al inter~
## 7 lopezdoriga 1716 "Lo que les cuento en radio. Este viernes amanec~
## 8 CarlosLoret 1708 "El Presidente está como los que dicen “no me do~
## 9 beltrandelrio 1569 "Los resultados de las elecciones de hoy no son ~
## 10 abrahamendie~ 1495 "<U+26A0><U+FE0F> La Policía de Ixtapaluca, al servicio de Ant~
The user Garcimonero posted the most retweeted tweet, with 3,533 retweets.
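The same kind of ranking can be sketched with distinct() and slice_max(), which avoids the grouped output and its message. This is only an alternative under stated assumptions: the toy table below stands in for tw_elecciones, and its values are invented for illustration.

```r
library(dplyr)

# Toy data standing in for tw_elecciones; the values are invented for illustration
tweets <- tibble(
  screen_name   = c("user_a", "user_b", "user_a", "user_c"),
  retweet_count = c(120L, 3500L, 45L, 900L),
  text          = c("tweet 1", "tweet 2", "tweet 3", "tweet 4")
)

top_tweets <- tweets %>%
  distinct(screen_name, retweet_count, text) %>%   # drop exact duplicates
  slice_max(retweet_count, n = 2)                  # keep the 2 most retweeted rows
```

slice_max() returns the rows already sorted from highest to lowest, so no separate arrange() step is needed.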
Next, we identify the users with the most followers and plot them. The result is assigned to a variable called popular.
popular <- tw_elecciones %>%
  group_by(screen_name, followers_count) %>%
  summarise() %>%
  arrange(desc(followers_count))
## `summarise()` has grouped output by 'screen_name'. You can override using the `.groups` argument.
head(popular)
## # A tibble: 6 x 2
## # Groups: screen_name [6]
## screen_name followers_count
## <chr> <int>
## 1 AristeguiOnline 8842206
## 2 CarlosLoret 8718759
## 3 werevertumorro 8581694
## 4 lopezdoriga 7838690
## 5 DeniseDresserG 4338003
## 6 ChilangoCom 3842539
The news outlet AristeguiOnline is the account with the most followers (more than 8.8 million), followed closely by the journalist Carlos Loret de Mola (about 8.7 million).
We now plot the users with the most followers, filtering to keep only those with more than 1 million followers.
ggplot(popular %>%
         filter(followers_count > 1000000)) +
  geom_bar(aes(x = reorder(screen_name, followers_count), weight = followers_count)) +
  labs(title = "Most popular Twitter users",
       subtitle = "Posting about the June 6 elections in Mexico City",
       caption = "Source: Twitter API",
       x = "@ User",
       y = "Number of followers") +
  theme_bw() +
  coord_flip() +
  theme(plot.title = element_text(family = "sans",
                                  size = rel(1),
                                  vjust = 2,
                                  face = "bold.italic",
                                  color = "black",
                                  lineheight = 1.5),
        plot.subtitle = element_text(family = "sans",
                                     size = rel(0.8),
                                     vjust = 2,
                                     face = "italic",
                                     color = "gray40",
                                     lineheight = 1.5),
        plot.caption = element_text(family = "sans",
                                    size = rel(0.7),
                                    vjust = 2,
                                    face = "italic",
                                    color = "gray30",
                                    lineheight = 1.5)) +
  theme(axis.title.x = element_text(face = "bold", vjust = -0.5, colour = "gray60", size = rel(0.75)),
        axis.title.y = element_text(face = "bold", vjust = 1.5, colour = "gray60", size = rel(0.75)),
        axis.text.x = element_text(face = "italic", colour = "gray60", size = rel(0.65)),
        axis.text.y = element_text(face = "italic", colour = "gray60", size = rel(0.65)),
        legend.title = element_text(face = "bold", colour = "gray60", size = rel(0.75)),
        legend.text = element_text(face = "italic", colour = "gray60", size = rel(0.6)))
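The plots in this exercise repeat the same long theme() call for titles, captions, and axes. One way to avoid the repetition is to wrap the styling in a function that returns a theme object. The sketch below is an assumption about how that could look; theme_tweets is a hypothetical name, and only part of the styling is reproduced to keep it short.

```r
library(ggplot2)

# Hypothetical helper: bundle the shared text styling into one reusable theme
theme_tweets <- function() {
  theme_bw() +
    theme(
      plot.title    = element_text(family = "sans", size = rel(1), vjust = 2,
                                   face = "bold.italic", color = "black", lineheight = 1.5),
      plot.subtitle = element_text(family = "sans", size = rel(0.8), vjust = 2,
                                   face = "italic", color = "gray40", lineheight = 1.5),
      plot.caption  = element_text(family = "sans", size = rel(0.7), vjust = 2,
                                   face = "italic", color = "gray30", lineheight = 1.5),
      axis.title.x  = element_text(face = "bold", vjust = -0.5, colour = "gray60", size = rel(0.75)),
      axis.title.y  = element_text(face = "bold", vjust = 1.5, colour = "gray60", size = rel(0.75)),
      axis.text.x   = element_text(face = "italic", colour = "gray60", size = rel(0.65)),
      axis.text.y   = element_text(face = "italic", colour = "gray60", size = rel(0.65))
    )
}

# Toy plot, invented for illustration, that applies the shared theme
toy <- data.frame(user = c("a", "b", "c"), followers = c(3, 5, 2))
p <- ggplot(toy) +
  geom_bar(aes(x = reorder(user, followers), weight = followers)) +
  coord_flip() +
  theme_tweets()
```

With a helper like this, each plot would only need its own labs() call plus theme_tweets(), and a styling change would be made in one place.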
Next, we plot which day had the most tweets about the elections.
ts_plot(tw_elecciones, by = "day") +
  labs(title = "Day with the most tweets",
       subtitle = "About the June 6 elections in Mexico City",
       caption = "Source: Twitter API",
       x = "Date",
       y = "Number of tweets") +
  theme_bw() +
  theme(plot.title = element_text(family = "sans",
                                  size = rel(1),
                                  vjust = 2,
                                  face = "bold.italic",
                                  color = "black",
                                  lineheight = 1.5),
        plot.subtitle = element_text(family = "sans",
                                     size = rel(0.8),
                                     vjust = 2,
                                     face = "italic",
                                     color = "gray40",
                                     lineheight = 1.5),
        plot.caption = element_text(family = "sans",
                                    size = rel(0.7),
                                    vjust = 2,
                                    face = "italic",
                                    color = "gray30",
                                    lineheight = 1.5)) +
  theme(axis.title.x = element_text(face = "bold", vjust = -0.5, colour = "gray60", size = rel(0.75)),
        axis.title.y = element_text(face = "bold", vjust = 1.5, colour = "gray60", size = rel(0.75)),
        axis.text.x = element_text(face = "italic", colour = "gray60", size = rel(0.65)),
        axis.text.y = element_text(face = "italic", colour = "gray60", size = rel(0.65)),
        legend.title = element_text(face = "bold", colour = "gray60", size = rel(0.75)),
        legend.text = element_text(face = "italic", colour = "gray60", size = rel(0.6)))
Although the elections were held on June 6, the day with the most tweets on the subject was the following day, Monday, June 7; since then the number of tweets has declined.
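The peak day can also be read from a table instead of a plot. The sketch below counts tweets per calendar day with dplyr. It uses an invented toy table in place of tw_elecciones and assumes that days are defined in UTC, which may differ from local time in Mexico City.

```r
library(dplyr)

# Toy timestamps standing in for the created_at column; invented for illustration
tweets <- tibble(
  created_at = as.POSIXct(c("2021-06-06 21:40:31", "2021-06-07 09:15:00",
                            "2021-06-07 18:02:10", "2021-06-08 14:45:51"),
                          tz = "UTC")
)

tweets_per_day <- tweets %>%
  mutate(day = as.Date(created_at, tz = "UTC")) %>%   # truncate to the calendar day
  count(day, name = "n_tweets") %>%                   # one row per day
  arrange(desc(n_tweets))                             # busiest day first
```

The first row of tweets_per_day then holds the day with the most tweets.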
Finally, the tweets with geographic coordinates are isolated to make a map. First, the latitude and longitude columns are created with the lat_lng function.
tw_elecciones <- lat_lng(tw_elecciones, coords = c("coords_coords", "bbox_coords", "geo_coords"))
Then the rows without coordinates are filtered out.
tw_elecciones_geo <- tw_elecciones %>%
  filter(!is.na(lat), !is.na(lng))
Only 727 of the more than 11,000 original tweets have geographic coordinates.
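With 727 geotagged tweets out of a little over 11,000, the geotagged share is roughly 6 to 7 percent. A quick way to compute that kind of share is sketched below in base R, using an invented toy table rather than the real data.

```r
# Toy data standing in for the table after lat_lng(); values invented for illustration
tweets <- data.frame(
  status_id = c("1", "2", "3", "4", "5"),
  lat = c(19.43, NA, NA, 19.36, NA),
  lng = c(-99.13, NA, -99.20, -99.18, NA)
)

# A tweet counts as geotagged only if both coordinates are present
has_coords <- !is.na(tweets$lat) & !is.na(tweets$lng)

n_geo     <- sum(has_coords)    # number of geotagged tweets
share_geo <- mean(has_coords)   # proportion of geotagged tweets
```

Checking both columns matters because a row with only one coordinate (the third row here) cannot be placed on a map.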
head(tw_elecciones_geo)
## # A tibble: 6 x 92
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 81998832 140267043~ 2021-06-09 16:54:26 el_Barto__ "@carmelodifazio~ Twitter~
## 2 81998832 140227335~ 2021-06-08 14:36:34 el_Barto__ "@ManuelVegaMX !~ Twitter~
## 3 81998832 140227369~ 2021-06-08 14:37:56 el_Barto__ "@lucky_894 @Man~ Twitter~
## 4 81998832 140205286~ 2021-06-08 00:00:27 el_Barto__ "@Isaorfebre @Va~ Twitter~
## 5 81998832 140209774~ 2021-06-08 02:58:47 el_Barto__ "@Jesus_Zambrano~ Twitter~
## 6 81998832 140345558~ 2021-06-11 20:54:21 el_Barto__ "@adnware @MexSi~ Twitter~
## # ... with 86 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>,
## # ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## # lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## # quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## # quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## # quoted_name <chr>, quoted_followers_count <int>,
## # quoted_friends_count <int>, quoted_statuses_count <int>,
## # quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## # retweet_source <chr>, retweet_favorite_count <int>,
## # retweet_retweet_count <int>, retweet_user_id <chr>,
## # retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## # country_code <chr>, geo_coords <list>, coords_coords <list>,
## # bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## # description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## # friends_count <int>, listed_count <int>, statuses_count <int>,
## # favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## # profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>, lat <dbl>, lng <dbl>
Now the map is created, but first we need the bounding box that holds the extreme coordinates.
bbox <- make_bbox(lon = tw_elecciones_geo$lng, lat = tw_elecciones_geo$lat)
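To make the idea of a bounding box concrete, the sketch below computes one by hand: the minimum and maximum of each coordinate, widened by a small margin. It illustrates the concept only and is not ggmap's implementation; padded_bbox and its margin argument f are hypothetical names, and the coordinates are invented.

```r
# Hypothetical helper: bounding box of a set of points, widened by a fraction f
padded_bbox <- function(lon, lat, f = 0.05) {
  lon_range <- range(lon, na.rm = TRUE)
  lat_range <- range(lat, na.rm = TRUE)
  lon_pad <- f * diff(lon_range)   # margin added on the left and right
  lat_pad <- f * diff(lat_range)   # margin added at the bottom and top
  c(left   = lon_range[1] - lon_pad,
    bottom = lat_range[1] - lat_pad,
    right  = lon_range[2] + lon_pad,
    top    = lat_range[2] + lat_pad)
}

# Three invented points around Mexico City
box <- padded_bbox(lon = c(-99.30, -99.10, -99.00),
                   lat = c(19.20, 19.40, 19.60))
```

The margin keeps the outermost points from sitting exactly on the edge of the map.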
Then the map tiles are requested from the Stamen Maps service.
basemapa <- get_stamenmap(bbox,
                          maptype = "terrain-background",
                          zoom = 10)
## Source : http://tile.stamen.com/terrain-background/10/228/454.png
## Source : http://tile.stamen.com/terrain-background/10/229/454.png
## Source : http://tile.stamen.com/terrain-background/10/230/454.png
## Source : http://tile.stamen.com/terrain-background/10/231/454.png
## Source : http://tile.stamen.com/terrain-background/10/228/455.png
## Source : http://tile.stamen.com/terrain-background/10/229/455.png
## Source : http://tile.stamen.com/terrain-background/10/230/455.png
## Source : http://tile.stamen.com/terrain-background/10/231/455.png
## Source : http://tile.stamen.com/terrain-background/10/228/456.png
## Source : http://tile.stamen.com/terrain-background/10/229/456.png
## Source : http://tile.stamen.com/terrain-background/10/230/456.png
## Source : http://tile.stamen.com/terrain-background/10/231/456.png
ggmap(basemapa)
The following code makes the background map semi-transparent so that it does not draw attention away from the data of interest.
basemap_attributes <- attributes(basemapa)
transparent_map <- matrix(adjustcolor(basemapa,
                                      alpha.f = 0.3),
                          nrow = nrow(basemapa))
attributes(transparent_map) <- basemap_attributes
ggmap(transparent_map)
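The transparency step above works because the base map is stored as a matrix of color strings. adjustcolor() returns a plain vector, so the result has to be reshaped into a matrix before the original attributes are restored. The sketch below shows the same mechanics on a tiny invented color matrix, using only grDevices.

```r
# A 2 x 2 matrix of colors standing in for the raster of a base map
colors <- matrix(c("#FF0000", "#00FF00", "#0000FF", "#FFFFFF"), nrow = 2)

# adjustcolor() scales the alpha channel but drops the matrix dimensions
faded_vec <- grDevices::adjustcolor(colors, alpha.f = 0.3)

# Rebuild the matrix; values are filled column by column, so positions are kept
faded <- matrix(faded_vec, nrow = nrow(colors))
```

Each color keeps its red, green, and blue components and gains an alpha component, which is why the map looks washed out rather than recolored.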
Now the tweet locations are added as points.
ggmap(transparent_map) +
  geom_point(data = tw_elecciones_geo, aes(x = lng, y = lat))
The map is then improved by mapping point size to the number of retweets and point color to the user's number of followers.
ggmap(transparent_map) +
  geom_point(data = tw_elecciones_geo, aes(x = lng, y = lat, size = retweet_count, color = followers_count)) +
  scale_color_distiller(palette = "Spectral") +
  labs(title = "Location of Twitter users",
       subtitle = "Posts about the elections in Mexico City and surroundings",
       caption = "Source: Twitter API",
       size = "Number of retweets",
       color = "Number of followers",
       x = "Longitude",
       y = "Latitude") +
  theme(plot.title = element_text(family = "sans",
                                  size = rel(1),
                                  vjust = 2,
                                  face = "bold.italic",
                                  color = "black",
                                  lineheight = 1.5),
        plot.subtitle = element_text(family = "sans",
                                     size = rel(0.8),
                                     vjust = 2,
                                     face = "italic",
                                     color = "gray40",
                                     lineheight = 1.5),
        plot.caption = element_text(family = "sans",
                                    size = rel(0.7),
                                    vjust = 2,
                                    face = "italic",
                                    color = "gray30",
                                    lineheight = 1.5)) +
  theme(axis.title.x = element_text(face = "bold", vjust = -0.5, colour = "gray60", size = rel(0.75)),
        axis.title.y = element_text(face = "bold", vjust = 1.5, colour = "gray60", angle = 90, size = rel(0.75)),
        axis.text.x = element_text(face = "italic", colour = "gray60", size = rel(0.65)),
        axis.text.y = element_text(face = "italic", colour = "gray60", size = rel(0.65)),
        legend.title = element_text(face = "bold", colour = "gray60", size = rel(0.75)),
        legend.text = element_text(face = "italic", colour = "gray60", size = rel(0.6)))
Finally, an interactive map is shown with the leaflet library. First, though, a 15% sample of the tweets with coordinates is taken to keep the machine from freezing.
sample <- tw_elecciones_geo %>%
  sample_frac(0.15)
head(sample)
## # A tibble: 6 x 92
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 74093436 1402988478~ 2021-06-10 13:58:14 rutgiverin "Noruega está m~ Twitter~
## 2 14378372 1401579867~ 2021-06-06 16:40:55 OphCourse "Me pregunto si~ Twitter~
## 3 1242490~ 1400923994~ 2021-06-04 21:14:42 AlexAlonss~ "@laishawilkins~ Twitter~
## 4 1223754~ 1400805893~ 2021-06-04 13:25:25 TiaLolo80s "@jgnaredo @Tro~ Twitter~
## 5 68790334 1401571425~ 2021-06-06 16:07:22 Pablo_Hdez "No politicen l~ Twitter~
## 6 89779162 1401678089~ 2021-06-06 23:11:13 pachame "La gente muy r~ Twitter~
## # ... with 86 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>,
## # ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## # lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
## # quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## # quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## # quoted_name <chr>, quoted_followers_count <int>,
## # quoted_friends_count <int>, quoted_statuses_count <int>,
## # quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
## # retweet_source <chr>, retweet_favorite_count <int>,
## # retweet_retweet_count <int>, retweet_user_id <chr>,
## # retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## # country_code <chr>, geo_coords <list>, coords_coords <list>,
## # bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
## # description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## # friends_count <int>, listed_count <int>, statuses_count <int>,
## # favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## # profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>, lat <dbl>, lng <dbl>
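Because the 15% sample is random, each run of the report draws different tweets. A minimal sketch of a reproducible version is shown below, assuming a fixed seed is acceptable for the analysis. The toy table and the seed value are invented, and slice_sample() is used here as the newer dplyr counterpart of sample_frac().

```r
library(dplyr)

# Toy table standing in for the geotagged tweets; invented for illustration
geo_tweets <- tibble(id = 1:100)

set.seed(2021)                       # fix the random number generator
tweet_sample <- geo_tweets %>%
  slice_sample(prop = 0.15)          # 15% of the rows, without replacement

set.seed(2021)                       # same seed, same rows
tweet_sample_again <- geo_tweets %>%
  slice_sample(prop = 0.15)
```

Naming the result tweet_sample rather than sample also avoids masking the base R function of the same name.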
We now have a sample of slightly more than 100 tweets with coordinates, which are placed on an interactive leaflet map that shows the tweet and the user name in a pop-up.
leaflet(sample) %>%
  addTiles() %>%
  addProviderTiles(providers$CartoDB.Voyager) %>%
  addAwesomeMarkers(popup = paste("User:", sample$screen_name, "<br>",
                                  "Tweet:", sample$text),
                    icon = awesomeIcons(icon = "twitter", library = "fa", iconColor = "black", markerColor = "blue"))
## Assuming "lng" and "lat" are longitude and latitude, respectively
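Pop-up text is interpreted as HTML, which is why the "<br>" tag produces a line break. Tweet text can itself contain characters such as "<" or "&", so it may be worth escaping it before building the pop-up. The sketch below does that in base R; escape_html and make_popup are hypothetical helpers written for this example, and packages such as htmltools offer their own escaping function.

```r
# Hypothetical helper: escape characters that have a special meaning in HTML
escape_html <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)   # must run first to avoid double escaping
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x
}

# Hypothetical helper: one HTML pop-up string per tweet
make_popup <- function(screen_name, text) {
  paste0("<b>User:</b> @", escape_html(screen_name), "<br>",
         "<b>Tweet:</b> ", escape_html(text))
}

# Invented examples; the first tweet contains characters that need escaping
popups <- make_popup(c("user_a", "user_b"),
                     c("1 < 2 & 3 > 2", "plain text"))
```

A vector built this way could be passed to the popup argument in place of the paste() call.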
And that concludes this interesting exercise.