R Markdown and Leaflet. Twitter data

Summary

In this project I map a small number of tweets about public spaces (parks, gardens and squares) posted in New York.

Downloading and cleaning data

First, we load the packages needed to download, merge and map the data.

library(twitteR)
library(leaflet)
library(prob)

The next step is opening the connection. I obtained the consumer key, the access token and their secrets from my Twitter account (the real values are hidden, sorry :) ).

consumer_key <- "my_consumer_key"
consumer_secret <- "my_consumer_secret"
access_token <- "my_access_token"
access_secret <- "my_access_secret"
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

## [1] "Using direct authentication"
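Rather than pasting the keys into the script, they can be read from environment variables. The sketch below shows one way to do that. The variable name TWITTER_CONSUMER_KEY and the helper get_credential() are my own assumptions for illustration; twitteR does not require them.

```r
# Hypothetical helper: read one credential from an environment variable
# and stop with a clear message if it is missing.
get_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# For illustration only: normally the variable is set outside the script,
# for example in ~/.Renviron, so the real value never appears in the code.
Sys.setenv(TWITTER_CONSUMER_KEY = "my_consumer_key")

consumer_key <- get_credential("TWITTER_CONSUMER_KEY")
```

The other three values could be read the same way and then passed to setup_twitter_oauth() as before.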

I requested 1000 tweets for each category (park, garden, square) and removed the rows without coordinates. I also added the category name, which is used for the popup, and an icon URL, which is used to represent each category on the map.

# park: search, convert to a data frame, keep geotagged rows, add label and icon
my_tweets_park <- searchTwitter('Park', geocode = '40.91758,-73.70027,35km', n = 1000)
my_tweets_df_park <- do.call("rbind", lapply(my_tweets_park, as.data.frame))
my_tweets_df_park <- subset(my_tweets_df_park, !is.na(longitude))
my_tweets_df_park$space <- 'Park'
my_tweets_df_park$icon <- 'https://cdn-icons-png.flaticon.com/512/2204/2204056.png'

# garden: same steps
my_tweets_garden <- searchTwitter('Garden', geocode = '40.91758,-73.70027,35km', n = 1000)
my_tweets_df_garden <- do.call("rbind", lapply(my_tweets_garden, as.data.frame))
my_tweets_df_garden <- subset(my_tweets_df_garden, !is.na(longitude))
my_tweets_df_garden$space <- 'Garden'
my_tweets_df_garden$icon <- "https://cdn-icons-png.flaticon.com/512/1973/1973742.png"

# square: same steps
my_tweets_square <- searchTwitter('Square', geocode = '40.91758,-73.70027,35km', n = 1000)
my_tweets_df_square <- do.call("rbind", lapply(my_tweets_square, as.data.frame))
my_tweets_df_square <- subset(my_tweets_df_square, !is.na(longitude))
my_tweets_df_square$space <- 'Square'
my_tweets_df_square$icon <- "https://cdn-icons-png.flaticon.com/512/6427/6427078.png"
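The three blocks above repeat the same cleaning and tagging steps, so they could be folded into one function. The following is only a sketch: tag_tweets() is a hypothetical helper, and the toy data frame stands in for a real search result so that the example runs without Twitter access.

```r
# Hypothetical helper: keep only geotagged rows and attach the category
# label and icon URL used later for popups and markers.
tag_tweets <- function(tweets_df, space, icon) {
  has_coords <- !is.na(tweets_df$longitude) & !is.na(tweets_df$latitude)
  geotagged <- tweets_df[has_coords, , drop = FALSE]
  # rep() keeps the assignment valid even when no rows are left.
  geotagged$space <- rep(space, nrow(geotagged))
  geotagged$icon <- rep(icon, nrow(geotagged))
  geotagged
}

# Toy data standing in for a converted search result; the values are made up.
toy <- data.frame(
  id = c("1", "2", "3"),
  longitude = c("-73.97", NA, "-73.99"),
  latitude = c("40.78", NA, "40.75"),
  stringsAsFactors = FALSE
)

toy_park <- tag_tweets(toy, "Park",
                       "https://cdn-icons-png.flaticon.com/512/2204/2204056.png")
```

Here toy_park keeps the two rows that have coordinates, each tagged with the "Park" label and the park icon.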

In the next step the three datasets are combined into one, and rows with NA values are removed. Only 5 columns are kept in the final dataset:

id of the tweet
longitude
latitude
space category (Park, Garden, Square)
icon

#union
twitter_nyc <- union(my_tweets_df_garden, union(my_tweets_df_park, my_tweets_df_square))
twitter_nyc <- twitter_nyc[ ,c(8, 15:18)]
twitter_nyc$longitude <- as.numeric(twitter_nyc$longitude)
twitter_nyc$latitude <- as.numeric(twitter_nyc$latitude)
twitter_nyc <- na.omit(twitter_nyc)
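To make the effect of these steps easier to see, here is a small sketch on made-up data. It uses base R only, with unique(rbind(...)) as a stand-in for the union step, and it assumes the coordinates arrive as text, which the as.numeric() calls above suggest.

```r
# Toy per-category tables; the values are made up for illustration.
garden <- data.frame(id = c("1", "2"),
                     longitude = c("-73.98", "-73.99"),
                     latitude = c("40.76", "not available"),
                     space = "Garden",
                     stringsAsFactors = FALSE)
park <- data.frame(id = c("3", "3"),
                   longitude = c("-73.97", "-73.97"),
                   latitude = c("40.78", "40.78"),
                   space = "Park",
                   stringsAsFactors = FALSE)

# Stack the tables and drop exact duplicate rows.
combined <- unique(rbind(garden, park))

# Convert the coordinates to numbers; text that is not a number becomes NA.
combined$longitude <- suppressWarnings(as.numeric(combined$longitude))
combined$latitude <- suppressWarnings(as.numeric(combined$latitude))

# Drop rows where a coordinate is missing or could not be parsed.
combined <- na.omit(combined)
```

In this toy case the duplicated park row is kept once and the garden row with an unreadable latitude is dropped.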

All 5 variables have the correct type. In the end, we have 75 tweets about gardens, 64 tweets about parks and 84 tweets about squares, 223 in total.

summary(twitter_nyc)

##       id              longitude         latitude        space          
##  Length:223         Min.   :-74.13   Min.   :40.66   Length:223        
##  Class :character   1st Qu.:-73.99   1st Qu.:40.74   Class :character  
##  Mode  :character   Median :-73.99   Median :40.76   Mode  :character  
##                     Mean   :-73.94   Mean   :40.76                     
##                     3rd Qu.:-73.96   3rd Qu.:40.76                     
##                     Max.   :-73.41   Max.   :41.10                     
##      icon          
##  Length:223        
##  Class :character  
##  Mode  :character  
##                    
##                    
##

table(twitter_nyc$space)

## 
## Garden   Park Square 
##     75     64     84

table(twitter_nyc$icon)

## 
## https://cdn-icons-png.flaticon.com/512/1973/1973742.png 
##                                                      75 
## https://cdn-icons-png.flaticon.com/512/2204/2204056.png 
##                                                      64 
## https://cdn-icons-png.flaticon.com/512/6427/6427078.png 
##                                                      84

Mapping

First, we create a small icon set that holds the URL and size of the icon for each marker.

space_icons <- icons(
  iconUrl = twitter_nyc$icon,
  iconWidth = 30, iconHeight = 30)
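Storing the icon URL in every row works, but the same per-row vector can also be derived from the category with a lookup. The sketch below is one possible alternative, not the approach used in this project; icon_urls and row_icons are illustrative names, and the toy space vector stands in for twitter_nyc$space.

```r
# Category -> icon URL lookup (URLs taken from the download step).
icon_urls <- c(
  Park   = "https://cdn-icons-png.flaticon.com/512/2204/2204056.png",
  Garden = "https://cdn-icons-png.flaticon.com/512/1973/1973742.png",
  Square = "https://cdn-icons-png.flaticon.com/512/6427/6427078.png"
)

# Toy category column; in the project this would be twitter_nyc$space.
space <- c("Garden", "Park", "Square", "Park")

# One URL per row, in row order; unname() drops the category names.
row_icons <- unname(icon_urls[space])
```

A vector like row_icons could then be passed as iconUrl to icons() in place of twitter_nyc$icon.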

Finally, the data are mapped. All markers are grouped into clusters, the icons are taken from “space_icons” and the popup text comes from the “space” column.

set.seed(123456789)
leaflet(twitter_nyc) %>%
  addTiles() %>%
  addMarkers(lng = ~longitude, lat = ~latitude, clusterOptions = markerClusterOptions(),
             icon = space_icons, popup = ~space)

Conclusion

The majority of tweets are concentrated in Manhattan. Many tweets about parks appear near Central Park, while tweets about squares are mostly spread along 7th Avenue. Unfortunately, most of the garden tweets refer to the Winter Garden Theater or to a children’s garden rather than to public gardens, so these observations should be excluded from the analysis. The real number of tweets about public gardens is therefore probably much lower.
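The exclusion mentioned above could be done with a simple text filter. This is only a sketch on made-up tweets: it assumes the tweet text column is still available (the final dataset in this project keeps only five columns, so the filter would have to run before the column selection), and the pattern is my guess at what marks an unrelated tweet.

```r
# Toy tweets; the text is made up for illustration.
garden_tweets <- data.frame(
  id = c("1", "2", "3", "4"),
  text = c("Lunch break in the community garden",
           "Great show at the Winter Garden Theater tonight",
           "Open day at the children's garden on Saturday",
           "Tulips are out in the botanical garden"),
  stringsAsFactors = FALSE
)

# Assumed pattern for tweets that are not about public gardens.
# The "." matches either a straight or a curly apostrophe.
exclude_pattern <- "winter garden|children.s garden"

keep <- !grepl(exclude_pattern, garden_tweets$text, ignore.case = TRUE)
public_garden_tweets <- garden_tweets[keep, ]
```

With this toy input, only the community garden and botanical garden tweets remain in public_garden_tweets.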

Petrukhina Alexandra

2/7/2022
