For this assignment I looked at the top trending topics on Twitter in the United States and compared them with what was trending worldwide to see whether any trends were shared. I also pulled the trending topics for several other countries and compared those with the worldwide trends.
Though this analysis cannot prove it conclusively, I wanted to get an idea as to which countries are most “in touch” with what is going on around the world. A very rough way to do this is to figure out which countries have the most topics in common with what is trending worldwide. I am fully aware that there may be many reasons why an individual country would not share similar trending stories, but again this is a very rough analysis.
I started out by loading the packages that I thought I might need for this analysis, and connecting to my newly created Twitter account.
library(twitteR)
library(tidytext)
library(stringr)
library(ggplot2)
library(ggvis)
library(knitr)
library(jsonlite)
library(dplyr)
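# Credentials are read from environment variables rather than hard-coded in the report.
# The variable names below are illustrative; set them before running this chunk.
consumer_key <- Sys.getenv("TWITTER_CONSUMER_KEY")
consumer_secret <- Sys.getenv("TWITTER_CONSUMER_SECRET")
access_token <- Sys.getenv("TWITTER_ACCESS_TOKEN")
access_secret <- Sys.getenv("TWITTER_ACCESS_SECRET")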
setup_twitter_oauth(consumer_key, consumer_secret,
access_token, access_secret)
## [1] "Using direct authentication"
I then created tables showing the trending topics in the United States. I used the getTrends function from the twitteR package, which requires you to look up the woeid (Where on Earth ID) for whatever city or country you want information on. I also used the closestTrendLocations function out of curiosity. This function finds the closest location with a woeid to the coordinates you put in. I put in the coordinates for Portland. The closest city to us with a woeid is Boston. I’m not sure how woeids are assigned, but apparently Portland didn’t make the cut.
closestTrendLocations(43.6615,-70.2553)
## name country woeid
## 1 Boston United States 2367105
trendsUSA <- getTrends(23424977, exclude=NULL)
After bringing in the USA data from Twitter, I began bringing in Twitter data from other countries, as well as worldwide data. My hope with the country selection was to pick a reasonable sample of countries from around the world. It is definitely not perfect, but I think it is sufficient for a rough analysis.
trendsUK <- getTrends(23424975, exclude=NULL)
trendsRussia <- getTrends(23424936, exclude=NULL)
trendsMexico <- getTrends(23424900, exclude=NULL)
trendsFrance <- getTrends(23424819, exclude=NULL)
trendsAustralia <- getTrends(23424748, exclude=NULL)
trendsJapan <- getTrends(23424856, exclude=NULL)
trendsKenya <- getTrends(23424863, exclude=NULL)
trendsWorld <- getTrends(1, exclude=NULL)
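The repeated calls above could also be written as a single loop over a named vector of woeids. The following is only a minimal sketch: `fetch_all_trends` and `fake_get_trends` are hypothetical helpers invented for illustration, and the fake fetcher returns made-up topics so that the example runs without Twitter credentials.

```r
# woeids copied from the calls above; the names are labels chosen for this sketch
woeids <- c(USA = 23424977, UK = 23424975, Russia = 23424936,
            Mexico = 23424900, France = 23424819, Australia = 23424748,
            Japan = 23424856, Kenya = 23424863, World = 1)

# Hypothetical helper: applies a fetch function to every woeid and
# returns a list of data frames named after the locations
fetch_all_trends <- function(woeids, fetch) {
  lapply(woeids, fetch)
}

# Offline stand-in for getTrends(); the topics are made up
fake_get_trends <- function(woeid) {
  data.frame(name = c("#TopicA", "#TopicB"),
             woeid = woeid,
             stringsAsFactors = FALSE)
}

trends <- fetch_all_trends(woeids, fake_get_trends)
names(trends)

# With an authenticated session, the real call would presumably be:
# trends <- fetch_all_trends(woeids, function(id) getTrends(id, exclude = NULL))
```

Keeping the results in one named list means later steps can loop over the countries instead of repeating the same line eight times.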
I then created new tables that combined data from each individual country with the worldwide trends.
USAWorld <- inner_join(trendsUSA, trendsWorld, by = "name")
UKWorld <- inner_join(trendsUK, trendsWorld, by = "name")
RussiaWorld <- inner_join(trendsRussia, trendsWorld, by = "name")
MexicoWorld <- inner_join(trendsMexico, trendsWorld, by = "name")
FranceWorld <- inner_join(trendsFrance, trendsWorld, by = "name")
AustraliaWorld <- inner_join(trendsAustralia, trendsWorld, by = "name")
JapanWorld <- inner_join(trendsJapan, trendsWorld, by = "name")
KenyaWorld <- inner_join(trendsKenya, trendsWorld, by = "name")
After combining each country with the worldwide trending stories, I looked at which data frames had the most and the fewest observations. France, the USA, the UK and Mexico consistently had the largest number of trending topics that were also trending worldwide.
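One way to compare the sizes of the joined tables side by side is to count the rows of each join in a single summary table. This is a rough sketch on made-up topic names, not the data pulled for this report. It uses base R's `merge`, which keeps only matching rows when given a shared key, as a stand-in for `inner_join`. The `share` column is an addition of mine that expresses the overlap as a fraction of each country's own trend list.

```r
# Made-up trend lists; real ones would come from getTrends()
trends <- list(
  World  = data.frame(name = c("#A", "#B", "#C", "#D"), stringsAsFactors = FALSE),
  France = data.frame(name = c("#A", "#B", "#X"), stringsAsFactors = FALSE),
  Japan  = data.frame(name = c("#Y", "#Z"), stringsAsFactors = FALSE)
)

countries <- setdiff(names(trends), "World")

# Join each country's topics to the worldwide topics on the topic name
shared <- lapply(trends[countries], function(df) {
  merge(df, trends$World, by = "name")
})

# One row per country: total trends, shared trends, and the shared fraction
overlap <- data.frame(
  country  = countries,
  n_trends = vapply(trends[countries], nrow, integer(1)),
  n_shared = vapply(shared, nrow, integer(1)),
  row.names = NULL
)
overlap$share <- overlap$n_shared / overlap$n_trends

# Sort so the countries with the most shared topics come first
overlap[order(-overlap$n_shared), ]
```

In this toy example France shares two of its three topics with the worldwide list and Japan shares none, which mirrors the kind of contrast described in this report.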
FranceWorld <- subset(FranceWorld, select = c(name, woeid.x))
colnames(FranceWorld) <- c("Name-France", "woeid")
kable(FranceWorld)
| Name-France | woeid |
|---|---|
| #Babysitting | 23424819 |
| #LaCouleurDesSentiments | 23424819 |
| #WeWantEyewitnessSeason2 | 23424819 |
| #ImACeleb | 23424819 |
| #XFactor | 23424819 |
MexicoWorld <- subset(MexicoWorld, select = c(name, woeid.x))
colnames(MexicoWorld) <- c("Name-Mexico", "woeid")
kable(MexicoWorld)
| Name-Mexico | woeid |
|---|---|
| #YLoQueMásDuele | 23424900 |
| #BrazilGP | 23424900 |
| Packers | 23424900 |
| #Broncos | 23424900 |
The country with the fewest shared topics was Japan. There were zero topics trending in Japan that were also trending worldwide. Because the joined table is empty, I have not shown it in this report.
USAWorld <- subset(USAWorld, select = c(name, woeid.x))
colnames(USAWorld) <- c("Name-USA", "woeid")
kable(USAWorld)
| Name-USA | woeid |
|---|---|
| #DALvsPIT | 23424977 |
| Steve Bannon | 23424977 |
| #TheChase | 23424977 |
| #FlyEaglesFly | 23424977 |
| Packers | 23424977 |
| #WorldKindnessDay | 23424977 |
| #MIAvsSD | 23424977 |
| #HTTR | 23424977 |
| #BeforeSmartPhones | 23424977 |
| #AnimalRestaurantSmash | 23424977 |
| #WeWantEyewitnessSeason2 | 23424977 |
UKWorld <- subset(UKWorld, select = c(name, woeid.x))
colnames(UKWorld) <- c("Name-UK", "woeid")
kable(UKWorld)
| Name-UK | woeid |
|---|---|
| #ImACeleb | 23424975 |
| #planetearth2 | 23424975 |
| #XFactor | 23424975 |
| Steve Bannon | 23424975 |
| #DALvsPIT | 23424975 |
| #AnimalRestaurantSmash | 23424975 |
| #BeforeSmartPhones | 23424975 |
What I found most interesting about this analysis is that whether a country had a high number of trending stories that were also trending worldwide did not appear to track the size or development of the country. For instance, Japan had a high number of countrywide trending topics, but none of them were also trending worldwide. Likewise, Russia consistently had very few countrywide trending topics that were also trending worldwide.
My initial thought was that all of the “developed” countries would have a high number of countrywide trends that were shared with the worldwide trends. The data did not support this hypothesis.
There are a million and one reasons why the data shows what it does. The factors that might influence the results of this analysis include at least the following: the time the data was pulled from Twitter, media restrictions in a particular country, access to news outlets and technology, and the effects of globalization on the country.
It is also worth noting that the getTrends function continually brought in new results, since trending topics change constantly with so many users on Twitter. My analysis was based on the average results I saw over the course of preparing it.
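Averaging over repeated pulls could be made explicit by logging the number of shared topics each time the data is fetched and then taking the mean per country. The sketch below assumes such a log exists. The `pulls` table and every number in it are invented for illustration and are not results from this analysis.

```r
# Hypothetical log: one row per country per pull, with made-up counts
pulls <- data.frame(
  pull     = c(1, 1, 2, 2, 3, 3),
  country  = c("France", "Japan", "France", "Japan", "France", "Japan"),
  n_shared = c(5, 0, 4, 0, 6, 1)
)

# Mean number of shared topics per country across all pulls
avg_shared <- aggregate(n_shared ~ country, data = pulls, FUN = mean)
avg_shared
```

A log like this would also show how much the overlap moves from one pull to the next, which a single snapshot cannot.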
In order to present a reasonable hypothesis on the reasons for the results of this analysis, you would need many other data sources. I simply thought it would be interesting (or cool) to look into it a little bit.