Twitter data of @GonzagaMBA followers

The project uses rtweet, an R package developed on GitHub, to pull data from the Twitter API. We use the get_followers() function to gather a data frame of the Twitter-assigned user IDs of the accounts that follow the Gonzaga MBA account. Next, the lookup_users() function takes those IDs and compiles the available user data (the full list of columns is shown by names(gumbafollowersdata)), such as screen name, location, longitude and latitude coordinates, links to profile pictures, follower counts, following counts, etc.

Now that we have the full data set, we can filter on a few thresholds and visualize the result. In this case, we label the accounts following GUMBA that have more than 1 million followers or more than 250,000 friends. In the Twitter API, a user's friends are the accounts that the user follows, so friends_count is the number of accounts followed. The graph shows that hootsuite, PenguinUKBooks, soledadobrien, and KarenJeanHood all meet the label criteria.
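As a minimal sketch of the labelling rule, the same ifelse() condition can be tried in base R on a small made-up data frame. The account names and counts below are hypothetical and are not taken from the real follower data.

```r
# Hypothetical accounts; the counts are invented for illustration only.
toy <- data.frame(
  screen_name     = c("big_audience", "follows_many", "typical_user"),
  friends_count   = c(1200, 300000, 450),
  followers_count = c(2500000, 8000, 310),
  stringsAsFactors = FALSE
)

# Label an account when either threshold is exceeded; otherwise leave the label blank.
toy$label <- ifelse(
  toy$friends_count > 250000 | toy$followers_count > 1000000,
  toy$screen_name,
  ""
)

print(toy[, c("screen_name", "label")])
```

Blank labels are what keep the plot readable: every account still gets a point, but only the accounts over a threshold get text.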

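# https://github.com/ropensci/rtweet
library(rtweet)
library(dplyr)    # %>%, mutate(), filter(), select()
library(ggplot2)
library(ggrepel)  # geom_text_repel()

# User IDs of the accounts that follow @GonzagaMBA, then their profile data
gumbafollowers <- get_followers("GonzagaMBA")
gumbafollowersdata <- lookup_users(gumbafollowers$user_id)
# save(gumbafollowersdata, file = "gumbafollowers.Rda")
# load("gumbafollowers.Rda")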
names(gumbafollowersdata)
##  [1] "user_id"                 "status_id"              
##  [3] "created_at"              "screen_name"            
##  [5] "text"                    "source"                 
##  [7] "display_text_width"      "reply_to_status_id"     
##  [9] "reply_to_user_id"        "reply_to_screen_name"   
## [11] "is_quote"                "is_retweet"             
## [13] "favorite_count"          "retweet_count"          
## [15] "quote_count"             "reply_count"            
## [17] "hashtags"                "symbols"                
## [19] "urls_url"                "urls_t.co"              
## [21] "urls_expanded_url"       "media_url"              
## [23] "media_t.co"              "media_expanded_url"     
## [25] "media_type"              "ext_media_url"          
## [27] "ext_media_t.co"          "ext_media_expanded_url" 
## [29] "ext_media_type"          "mentions_user_id"       
## [31] "mentions_screen_name"    "lang"                   
## [33] "quoted_status_id"        "quoted_text"            
## [35] "quoted_created_at"       "quoted_source"          
## [37] "quoted_favorite_count"   "quoted_retweet_count"   
## [39] "quoted_user_id"          "quoted_screen_name"     
## [41] "quoted_name"             "quoted_followers_count" 
## [43] "quoted_friends_count"    "quoted_statuses_count"  
## [45] "quoted_location"         "quoted_description"     
## [47] "quoted_verified"         "retweet_status_id"      
## [49] "retweet_text"            "retweet_created_at"     
## [51] "retweet_source"          "retweet_favorite_count" 
## [53] "retweet_retweet_count"   "retweet_user_id"        
## [55] "retweet_screen_name"     "retweet_name"           
## [57] "retweet_followers_count" "retweet_friends_count"  
## [59] "retweet_statuses_count"  "retweet_location"       
## [61] "retweet_description"     "retweet_verified"       
## [63] "place_url"               "place_name"             
## [65] "place_full_name"         "place_type"             
## [67] "country"                 "country_code"           
## [69] "geo_coords"              "coords_coords"          
## [71] "bbox_coords"             "status_url"             
## [73] "name"                    "location"               
## [75] "description"             "url"                    
## [77] "protected"               "followers_count"        
## [79] "friends_count"           "listed_count"           
## [81] "statuses_count"          "favourites_count"       
## [83] "account_created_at"      "verified"               
## [85] "profile_url"             "profile_expanded_url"   
## [87] "account_lang"            "profile_banner_url"     
## [89] "profile_background_url"  "profile_image_url"
# The plots below use two of these columns: followers_count and friends_count.
options(scipen = 3)  # prefer fixed over scientific notation on the axes

gumbafollowersdata %>%
  mutate(label = ifelse(friends_count > 250000 | followers_count > 1000000, screen_name, "")) %>%
  ggplot(aes(friends_count, followers_count)) +
  geom_point() +
  geom_text_repel(aes(label = label), size = 10 / .pt,
                  point.padding = 0.1, box.padding = .6, force = 1,
                  min.segment.length = 0, seed = 7654) +
  theme_bw() +
  labs(x = "Count of friends", y = "Count of followers") +
  ggtitle("Friends greater than 250,000 or Followers greater than 1m")

Followers of @GonzagaMBA again

The plot above shows a few outliers, so we can look more closely at the main cluster to see who makes up the bulk of the followers. This plot filters out those outliers and graphs the followers of GonzagaMBA that have no more than 250,000 friends and no more than 1 million followers. The labels show the accounts that have more than 23,000 followers or more than 60,000 friends.
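One way to check that a filter removes exactly the accounts that were labelled as outliers is to confirm that the filter condition is the logical complement of the outlier rule. The sketch below does this in base R on hypothetical counts, chosen to sit on both sides of each threshold; none of the values come from the real data.

```r
# Hypothetical counts, including one account exactly on both thresholds.
friends_count   <- c(450, 300000, 250000, 1200)
followers_count <- c(310, 8000, 1000000, 2500000)

# Outlier rule used for the labels in the first plot.
is_outlier <- friends_count > 250000 | followers_count > 1000000

# Filter condition used for the second plot.
is_kept <- friends_count <= 250000 & followers_count <= 1000000

# By De Morgan's law the two are complements: each account falls in exactly one group.
print(data.frame(friends_count, followers_count, is_outlier, is_kept))
```

Note that the "or" in the outlier rule becomes an "and" in the filter, which is why an account must pass both limits to stay in the second plot.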

gumbafollowersdata %>%
  filter(friends_count <= 250000 & followers_count <= 1000000) %>%
  mutate(label = ifelse(friends_count > 60000 | followers_count > 23000, screen_name, "")) %>%
  ggplot(aes(friends_count, followers_count)) +
  geom_point() +
  geom_text_repel(aes(label = label), size = 10 / .pt,
                  point.padding = 0.1, box.padding = .6, force = 1,
                  min.segment.length = 0, seed = 7654) +
  theme_bw() +
  labs(x = "Count of friends", y = "Count of followers") +
  ggtitle("Friends less than 250k and Followers less than 1m")

Location of GUMBA followers

If we want to visualize the locations of the followers, we can use the location variable from our data set. Using the geocode() function from ggmap, we can call Google’s geocoding API to create a new data set called locs that contains the unique locations and their corresponding coordinates. Once we add these coordinates as longitude and latitude variables in our gumbafollowersdata data set, we can plot them. We add the points to a world map using ggmap, which calls Google’s Maps Static API; this API must be enabled in the Google Cloud console. The output is a static map of the entire world with the Twitter followers plotted on it.

gumbafollowersdata%>%select(location)%>%head()
## # A tibble: 6 x 1
##   location            
##   <chr>               
## 1 Seattle, WA         
## 2 EARTH               
## 3 Denver, CO          
## 4 Spokane, WA         
## 5 United States       
## 6 Somewhere, Out There
# An API key must be registered (for example with register_google()) before ggmap can geocode.
library(ggmap)

# Geocode each distinct location once, then match the coordinates back to every follower.
addresses <- unique(gumbafollowersdata$location)
locs <- geocode(as.character(addresses))
locs$address <- addresses
idx <- match(gumbafollowersdata$location, locs$address)
gumbafollowersdata$latitude <- locs$lat[idx]
gumbafollowersdata$longitude <- locs$lon[idx]

# save(gumbafollowersdata, file = "gumbafollowers.Rda")
# load("gumbafollowers.Rda")
ggmap(get_map("world", zoom = 1)) +
  geom_point(data = gumbafollowersdata, aes(longitude, latitude), color = "red")
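The step that copies coordinates from the unique locations back onto every follower relies on match(). The sketch below isolates that step in base R without calling any API. The locs table here is a hand-entered stand-in for the geocoder result, and its coordinates are rough illustrative values, not output from geocode().

```r
# Free-text locations as they might appear per follower (repeats are expected).
location <- c("Seattle, WA", "Denver, CO", "Seattle, WA", "Spokane, WA", "Denver, CO")

# Stand-in for the geocoder result: one row per unique location.
# Coordinates are rough, hand-entered values for illustration only.
locs <- data.frame(
  address = unique(location),
  lat     = c(47.6, 39.7, 47.7),
  lon     = c(-122.3, -105.0, -117.4),
  stringsAsFactors = FALSE
)

# match() returns, for each follower, the row of locs that holds its location.
idx <- match(location, locs$address)
followers <- data.frame(location,
                        latitude  = locs$lat[idx],
                        longitude = locs$lon[idx])
print(followers)
```

Geocoding only the unique locations keeps the number of API requests down, and match() then fills in the repeated locations at no extra cost.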

Locations again

This final map is interactive (thanks to leaflet()), with blue circles representing each follower. Clicking a circle displays the city, state, country, or region that the user has set as their location.

library(leaflet)
gumbafollowersdata %>% leaflet() %>% addTiles() %>%
  addCircleMarkers(lng = ~longitude, lat = ~latitude, popup = ~location,
                   radius = 4, fillOpacity = 0.5, stroke = FALSE)