TP 4 CSDATOS

First, I will load the libraries.

library(rtweet)

## Warning: package 'rtweet' was built under R version 4.0.5

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.0.5

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## Warning: package 'ggplot2' was built under R version 4.0.5

## Warning: package 'tibble' was built under R version 4.0.5

## Warning: package 'dplyr' was built under R version 4.0.5

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter()  masks stats::filter()
## x purrr::flatten() masks rtweet::flatten()
## x dplyr::lag()     masks stats::lag()

library(sf)

## Warning: package 'sf' was built under R version 4.0.5

## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1

library(ggmap)

## Warning: package 'ggmap' was built under R version 4.0.5

## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.

## Please cite ggmap if you use it! See citation("ggmap") for details.

library(leaflet)

## Warning: package 'leaflet' was built under R version 4.0.5

library(lubridate)

## Warning: package 'lubridate' was built under R version 4.0.5

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Next, I will load the tweets to work with. The dataset contains tweets that mention the word "vacunación" (vaccination) in Mendoza.

tweets <- read_rds("tw_mendoza_vacunacion.rds")

Exploring the dataset

users_data(tweets)

## # A tibble: 700 x 20
##    user_id  screen_name  name    location  description         url     protected
##    <chr>    <chr>        <chr>   <chr>     <chr>               <chr>   <lgl>    
##  1 43617701 mnvadillo    Mario ~ Mendoza ~ Abogado. Diputado ~ https:~ FALSE    
##  2 43617701 mnvadillo    Mario ~ Mendoza ~ Abogado. Diputado ~ https:~ FALSE    
##  3 1723734~ cortegamahan Cristi~ Mendoza,~ Periodista.         https:~ FALSE    
##  4 4453635~ menduco_lib~ Ronnie~ Mendoza,~ Mendocino. Liberal. <NA>    FALSE    
##  5 4453635~ menduco_lib~ Ronnie~ Mendoza,~ Mendocino. Liberal. <NA>    FALSE    
##  6 1500839~ JornadaMend~ Diario~ Mendoza,~ Diario Impreso y o~ http:/~ FALSE    
##  7 1500839~ JornadaMend~ Diario~ Mendoza,~ Diario Impreso y o~ http:/~ FALSE    
##  8 1500839~ JornadaMend~ Diario~ Mendoza,~ Diario Impreso y o~ http:/~ FALSE    
##  9 1500839~ JornadaMend~ Diario~ Mendoza,~ Diario Impreso y o~ http:/~ FALSE    
## 10 1500839~ JornadaMend~ Diario~ Mendoza,~ Diario Impreso y o~ http:/~ FALSE    
## # ... with 690 more rows, and 13 more variables: followers_count <int>,
## #   friends_count <int>, listed_count <int>, statuses_count <int>,
## #   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## #   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
## #   profile_banner_url <chr>, profile_background_url <chr>,
## #   profile_image_url <chr>

Now let's plot the users' follower counts to get a sense of their popularity.

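options(scipen = 20)
# `tweets` has one row per tweet, so keep one row per account before counting accounts.
tweets %>%
  distinct(screen_name, followers_count) %>%
  ggplot() +
  geom_histogram(aes(x = followers_count)) +
  labs(title = "User popularity",
       x = "Followers",
       y = "Accounts")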

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
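
The `stat_bin()` message above is a reminder that the default of 30 bins is rarely a good fit. Follower counts here are strongly right-skewed (a few media accounts have hundreds of thousands of followers), so one option is to bin on a log10 scale. The sketch below uses base R and a hypothetical vector `followers`: the five largest values appear in the listings of this report, the small ones are made up for illustration.

```r
# Hypothetical follower counts: five large accounts plus made-up small accounts.
followers <- c(327443, 221407, 212546, 130570, 47462, 850, 420, 310, 95, 12)

# log10(x + 1) keeps accounts with zero followers defined.
log_followers <- log10(followers + 1)

# One bin per order of magnitude, from 10^0 up to 10^6.
breaks <- 0:6
h <- hist(log_followers, breaks = breaks, plot = FALSE)

# Label each bin with the approximate follower range it covers.
data.frame(
  from = 10^head(breaks, -1),
  to = 10^tail(breaks, -1),
  accounts = h$counts
)
```

In a ggplot2 pipeline, adding `scale_x_log10()` and passing an explicit `bins` value to `geom_histogram()` is a common way to get a similar effect.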

Next, I will list the accounts behind these tweets together with their follower counts.

tweets %>%
  select(screen_name, followers_count) %>%
  distinct() %>%
  arrange(desc(followers_count))

## # A tibble: 375 x 2
##    screen_name     followers_count
##    <chr>                     <int>
##  1 diariouno                327443
##  2 mdzol                    221407
##  3 LosAndesDiario           212546
##  4 mendozaopina             130570
##  5 elsolonline               47462
##  6 radionihuil               46365
##  7 Canal9Televida            44037
##  8 MendozaGobierno           37385
##  9 menduco_liberal           33346
## 10 noticieronueve            24727
## # ... with 365 more rows
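
Note that `distinct()` on `screen_name` and `followers_count` keeps one row per unique pair, so an account whose follower count changed between two of its tweets would be listed twice. A hedged alternative is to keep the highest count seen per account. The data frame below is hypothetical and only mimics the two columns used above; the value 327440 is made up to show the effect.

```r
# Hypothetical sample: one row per tweet, as in `tweets`.
sample_tweets <- data.frame(
  screen_name = c("diariouno", "diariouno", "mdzol", "mdzol", "elsolonline"),
  followers_count = c(327443, 327440, 221407, 221407, 47462)
)

# Keep the highest follower count seen for each account.
per_account <- aggregate(followers_count ~ screen_name,
                         data = sample_tweets, FUN = max)

# Sort from most to fewest followers.
per_account <- per_account[order(-per_account$followers_count), ]
per_account
```

With dplyr, the same idea can be written with `group_by(screen_name)` followed by `summarise()` and `max()`.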

This list shows the users and their follower counts. Now let's see which five users have the most followers.

tweets %>%
  select(screen_name, followers_count) %>%
  distinct() %>%
  arrange(desc(followers_count)) %>%
  head(5)

## # A tibble: 5 x 2
##   screen_name    followers_count
##   <chr>                    <int>
## 1 diariouno               327443
## 2 mdzol                   221407
## 3 LosAndesDiario          212546
## 4 mendozaopina            130570
## 5 elsolonline              47462

The five users with the most followers are:

diariouno: 327443
mdzol: 221407
LosAndesDiario: 212546
mendozaopina: 130570
elsolonline: 47462

Next, I will analyze the time of day at which the tweets were posted.

tweets1 <- tweets %>% mutate(created_at = ymd_hms(created_at))

tweets1 %>%
  count(hora = hour(created_at)) %>%
  ggplot() +
  geom_col(aes(x = hora, y = n)) +
  labs(title = "Time of day of the tweets",
       x = "hour",
       y = "count")

This chart shows that users tweet most between 12:00 (noon) and 18:00. The peak is at 15:00, the lowest non-zero count is at 7:00, and no tweets were recorded at 6:00.
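
The hours above are read directly from `created_at`. Assuming those timestamps are stored in UTC (an assumption, worth checking on the actual data), the same instants fall three hours earlier on a Mendoza clock. A base R sketch of the conversion, using made-up timestamps and the IANA zone name `America/Argentina/Mendoza`:

```r
# Made-up timestamps, assumed to be in UTC.
created_utc <- as.POSIXct(
  c("2021-06-30 15:05:00", "2021-06-30 02:30:00", "2021-07-01 18:45:00"),
  tz = "UTC"
)

# Hour of day as stored (UTC).
hour_utc <- as.POSIXlt(created_utc, tz = "UTC")$hour

# Hour of day for the same instants on a Mendoza clock (UTC-3).
hour_local <- as.POSIXlt(created_utc, tz = "America/Argentina/Mendoza")$hour

data.frame(created_utc, hour_utc, hour_local)
```

With lubridate, `hour(with_tz(created_at, "America/Argentina/Mendoza"))` expresses the same conversion inside the `count()` call.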

tweets1 %>%
  filter(location != "", !is.na(location)) %>%
  count(location) %>%
  top_n(10, n) %>%
  ggplot() +
  geom_col(aes(x = reorder(location, n), y = n)) +
  coord_flip() +
  labs(title = "Where the users are from",
       x = "location",
       y = "count")

We can conclude from the chart that the tweets come from users in the City of Mendoza and Luján de Cuyo.
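
The `location` field is free text, so the same place can appear under several spellings (the user listing earlier in this report already shows entries beginning with both `Mendoza` and `Mendoza,`). A minimal base R sketch of normalizing the text before counting, on made-up values:

```r
# Made-up profile locations with inconsistent spelling.
location <- c("Mendoza, Argentina", "mendoza, argentina ", "Mendoza", "MENDOZA",
              "Lujan de Cuyo", "  lujan de cuyo", "")

# Lower-case, trim, and collapse repeated whitespace.
clean <- tolower(trimws(location))
clean <- gsub("\\s+", " ", clean)

# Drop empty entries and count each normalized spelling.
clean <- clean[clean != ""]
counts <- sort(table(clean), decreasing = TRUE)
counts
```

This only merges spellings that differ in case or spacing. Treating "mendoza" and "mendoza, argentina" as the same place would need an explicit lookup table, which is a judgment call about the data.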

Now let's see which geographic locations the tweets correspond to.

tweets1 <- lat_lng(tweets1)

tweets1_transporte <- tweets1

Now I will isolate the tweets that have exact coordinates, keeping only the rows where both the latitude and the longitude extracted above are available.

tweets1_transporte_geo <- tweets1_transporte %>% 
  filter(!is.na(lat), !is.na(lng))

Next, I will fetch the map of Mendoza.

bbox <- make_bbox(lon = tweets1_transporte_geo$lng, lat = tweets1_transporte_geo$lat)

bbox

##      left    bottom     right       top 
## -68.88865 -33.00947 -68.77490 -32.82862
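
For reference, a bounding box is just the range of the longitudes and latitudes, optionally padded so that points do not sit on the map edge. The base R sketch below illustrates the idea with made-up coordinates. `padded_bbox` and its `pad` fraction are hypothetical names for this illustration and are not claimed to reproduce `make_bbox()` exactly.

```r
# Hypothetical helper: bounding box padded by a fraction of each range.
padded_bbox <- function(lon, lat, pad = 0.05) {
  lon_range <- range(lon, na.rm = TRUE)
  lat_range <- range(lat, na.rm = TRUE)
  c(
    left   = lon_range[1] - pad * diff(lon_range),
    bottom = lat_range[1] - pad * diff(lat_range),
    right  = lon_range[2] + pad * diff(lon_range),
    top    = lat_range[2] + pad * diff(lat_range)
  )
}

# Made-up points around Mendoza.
lon <- c(-68.88, -68.84, -68.78)
lat <- c(-33.00, -32.89, -32.84)
box <- padded_bbox(lon, lat)
box
```

The result uses the same `left`, `bottom`, `right`, `top` order as the `bbox` printed above.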

mapa_mendoza <- get_stamenmap(bbox, maptype = "toner-lite", zoom = 11)

## Source : http://tile.stamen.com/toner-lite/11/632/1221.png

## Source : http://tile.stamen.com/toner-lite/11/632/1222.png

## Source : http://tile.stamen.com/toner-lite/11/632/1223.png

ggmap(mapa_mendoza)

Now I will plot on the map the geographic point of each tweet, colored by the follower count of the account that posted it.

ggmap(mapa_mendoza) +
  geom_point(data = arrange(tweets1_transporte_geo, followers_count),
             aes(x = lng, y = lat, color = followers_count)) +
  scale_color_distiller(palette = "Spectral") +
  labs(title = "Geographic position of the tweets and follower counts",
       x = "Longitude",
       y = "Latitude")

The map shows the geographic points where the tweets were posted. The tweets are very close to one another, so the points overlap and only 6 are visible. Even so, we can tell that they correspond to different neighborhoods of Mendoza: one isolated in the south, four in the center, and one in the north.
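
Because several tweets share the same coordinates, one way to make the overlap visible is to count tweets per unique coordinate pair and use that count when plotting, for example as point size. A base R sketch on made-up coordinates:

```r
# Made-up geotagged tweets; several share exactly the same coordinates.
geo <- data.frame(
  lng = c(-68.84, -68.84, -68.84, -68.85, -68.85, -68.78),
  lat = c(-32.89, -32.89, -32.89, -32.88, -32.88, -33.00)
)

# Count how many tweets fall on each unique (lng, lat) pair.
geo$n <- 1
per_point <- aggregate(n ~ lng + lat, data = geo, FUN = sum)
per_point
```

In ggplot2, `geom_count()` performs this counting directly, and `geom_jitter()` or a lower `alpha` are other common ways to reveal overlapping points.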

Magdalena Cortiñas

9/7/2021