Assignment 4 - Discovering Trends

For this assignment I decided to look at the top trending topics on Twitter in the United States and compare them with what is trending worldwide to see if there are any shared trends. I also looked at what was trending in several other countries and compared those topics to the worldwide trends.

Though this analysis cannot prove it conclusively, I wanted to get an idea of which countries are most “in touch” with what is going on around the world. A very rough way to do this is to figure out which countries have the most topics in common with what is trending worldwide. I am fully aware that there may be many reasons why an individual country would not share the worldwide trending stories, but again this is a very rough analysis.

I started out by loading the packages that I thought I might need for this analysis and connecting to my newly created Twitter account.

library(twitteR)
library(tidytext)
library(stringr)
library(ggplot2)
library(ggvis)
library(knitr)
library(jsonlite)
library(dplyr)

# Credentials are read from environment variables instead of being hard-coded
# in the report. The variable names below are placeholders.
consumer_key <- Sys.getenv("TWITTER_CONSUMER_KEY")
consumer_secret <- Sys.getenv("TWITTER_CONSUMER_SECRET")
access_token <- Sys.getenv("TWITTER_ACCESS_TOKEN")
access_secret <- Sys.getenv("TWITTER_ACCESS_SECRET")
setup_twitter_oauth(consumer_key, consumer_secret,
                    access_token, access_secret)

## [1] "Using direct authentication"

I then created tables showing the trending topics in the United States. I used the getTrends function in R, which requires you to look up the woeid (Where on Earth ID) for whatever city or country you want information on. I also used the closestTrendLocations function out of curiosity. This function finds the closest city with a woeid to the coordinates you put in. I put in the coordinates for Portland, and the closest city with a woeid turned out to be Boston. I’m not sure how woeids are assigned, but apparently Portland didn’t make the cut.

closestTrendLocations(43.6615,-70.2553)

##     name       country   woeid
## 1 Boston United States 2367105

trendsUSA <- getTrends(23424977, exclude=NULL)
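Instead of looking up a woeid by hand, twitteR also has an availableTrendLocations function that lists the locations with trend data. The sketch below assumes its result has the same name, country and woeid columns as the closestTrendLocations output above. The find_woeid helper and the sample rows are hypothetical stand-ins, so the example runs without a Twitter connection.

```r
# Stand-in for the data frame that availableTrendLocations() would return.
# (Hypothetical sample rows; the woeids are the ones used in this report.)
locations <- data.frame(
  name    = c("Boston", "United States", "Japan"),
  country = c("United States", "United States", "Japan"),
  woeid   = c(2367105, 23424977, 23424856),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the woeid(s) whose location name matches exactly.
find_woeid <- function(locations, place) {
  locations$woeid[locations$name == place]
}

find_woeid(locations, "Boston")
find_woeid(locations, "Portland")  # empty result: no location by that name
```

With the real table, an empty result for a city would be one way to see that it has no woeid of its own.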

After bringing in the USA data from Twitter, I began bringing in Twitter data from other countries, as well as worldwide data. My hope with the country selection was to pick a reasonable sample of countries from around the world. It is definitely not perfect, but I think it is sufficient for a rough analysis.

trendsUK <- getTrends(23424975, exclude=NULL)
trendsRussia <- getTrends(23424936, exclude=NULL)
trendsMexico <- getTrends(23424900, exclude=NULL)
trendsFrance <- getTrends(23424819, exclude=NULL)
trendsAustralia <- getTrends(23424748, exclude=NULL)
trendsJapan <- getTrends(23424856, exclude=NULL)
trendsKenya <- getTrends(23424863, exclude=NULL)
trendsWorld <- getTrends(1, exclude=NULL)
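The nine calls above could also be written as one loop over a named vector of woeids. This is only a sketch: fetch_all_trends and fake_fetch are hypothetical names, and the stub fetcher stands in for getTrends so that the example runs without a Twitter session.

```r
# Named vector of the woeids used above.
woeids <- c(USA = 23424977, UK = 23424975, Russia = 23424936,
            Mexico = 23424900, France = 23424819, Australia = 23424748,
            Japan = 23424856, Kenya = 23424863, World = 1)

# Hypothetical helper: call `fetch` once per woeid and keep the names.
# `fetch` defaults to twitteR::getTrends, which needs an authenticated session.
fetch_all_trends <- function(woeids, fetch = twitteR::getTrends) {
  lapply(woeids, function(id) fetch(id))
}

# Stub fetcher (an assumption for illustration) so the sketch runs offline.
fake_fetch <- function(id) {
  data.frame(name = c("#TopicA", "#TopicB"), woeid = id,
             stringsAsFactors = FALSE)
}

trends <- fetch_all_trends(woeids, fetch = fake_fetch)
names(trends)
```

The result is a list with one data frame per location, so trends$Japan would play the role of trendsJapan.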

I then created new tables that combined data from each individual country with the worldwide trends.

USAWorld <- inner_join(trendsUSA, trendsWorld, by = "name")
UKWorld <- inner_join(trendsUK, trendsWorld, by = "name")
RussiaWorld <- inner_join(trendsRussia, trendsWorld, by = "name")
MexicoWorld <- inner_join(trendsMexico, trendsWorld, by = "name")
FranceWorld <- inner_join(trendsFrance, trendsWorld, by = "name")
AustraliaWorld <- inner_join(trendsAustralia, trendsWorld, by = "name")
JapanWorld <- inner_join(trendsJapan, trendsWorld, by = "name")
KenyaWorld <- inner_join(trendsKenya, trendsWorld, by = "name")
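The same joins can be done in one step when the country tables are kept in a list. The sketch below uses made-up topics and base R's merge, which by default keeps only the rows whose name appears in both tables, much like inner_join. All data and object names here are hypothetical.

```r
# Made-up worldwide trends (illustrative only).
world <- data.frame(name = c("#TopicA", "#TopicB", "#TopicC"),
                    woeid = 1, stringsAsFactors = FALSE)

# Made-up country trends, one data frame per country.
country_trends <- list(
  France = data.frame(name = c("#TopicA", "#TopicD"),
                      woeid = 23424819, stringsAsFactors = FALSE),
  Mexico = data.frame(name = c("#TopicB", "#TopicC", "#TopicE"),
                      woeid = 23424900, stringsAsFactors = FALSE),
  Japan  = data.frame(name = c("#TopicF"),
                      woeid = 23424856, stringsAsFactors = FALSE)
)

# Keep only the topics that also appear in the worldwide table.
# Both inputs have a woeid column, so merge() names them woeid.x and woeid.y.
shared <- lapply(country_trends, function(df) merge(df, world, by = "name"))

sapply(shared, nrow)
```

A country with no shared topics ends up with a zero-row data frame rather than an error.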

After combining each country with the worldwide trending stories, I checked which data frames had the most and the fewest observations. France, the USA, the UK and Mexico consistently had the largest number of trending topics that were also trending worldwide.
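One way to make this comparison explicit is to count the shared topics per country and sort the counts. The sketch below uses made-up topic names rather than the real pull, and every object name in it is hypothetical.

```r
# Made-up topic names (illustrative only).
world_names <- c("#TopicA", "#TopicB", "#TopicC")
country_names <- list(
  France = c("#TopicA", "#TopicB", "#TopicD"),
  Japan  = c("#TopicE", "#TopicF"),
  Kenya  = c("#TopicC", "#TopicG")
)

# Count how many of each country's distinct topics also trend worldwide.
shared_counts <- sapply(country_names,
                        function(x) sum(unique(x) %in% world_names))

# Put the counts in a table, largest first.
summary_table <- data.frame(country = names(shared_counts),
                            shared = unname(shared_counts),
                            stringsAsFactors = FALSE)
summary_table <- summary_table[order(-summary_table$shared), ]
summary_table
```

Countries with no overlap stay in the table with a count of zero.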

FranceWorld <- subset(FranceWorld, select = c(name, woeid.x))
colnames(FranceWorld) <- c("Name-France", "woeid")
kable(FranceWorld)

Name-France	woeid
#Babysitting	23424819
#LaCouleurDesSentiments	23424819
#WeWantEyewitnessSeason2	23424819
#ImACeleb	23424819
#XFactor	23424819

MexicoWorld <- subset(MexicoWorld, select = c(name, woeid.x))
colnames(MexicoWorld) <- c("Name-Mexico", "woeid")
kable(MexicoWorld)

Name-Mexico	woeid
#YLoQueMásDuele	23424900
#BrazilGP	23424900
Packers	23424900
#Broncos	23424900

The country with the fewest shared topics was Japan. There were zero topics trending in Japan that were also trending worldwide. The resulting table is empty, so I did not include it in this report.

USAWorld <- subset(USAWorld, select = c(name, woeid.x))
colnames(USAWorld) <- c("Name-USA", "woeid")
kable(USAWorld)

Name-USA	woeid
#DALvsPIT	23424977
Steve Bannon	23424977
#TheChase	23424977
#FlyEaglesFly	23424977
Packers	23424977
#WorldKindnessDay	23424977
#MIAvsSD	23424977
#HTTR	23424977
#BeforeSmartPhones	23424977
#AnimalRestaurantSmash	23424977
#WeWantEyewitnessSeason2	23424977

UKWorld <- subset(UKWorld, select = c(name, woeid.x))
colnames(UKWorld) <- c("Name-UK", "woeid")
kable(UKWorld)

Name-UK	woeid
#ImACeleb	23424975
#planetearth2	23424975
#XFactor	23424975
Steve Bannon	23424975
#DALvsPIT	23424975
#AnimalRestaurantSmash	23424975
#BeforeSmartPhones	23424975

What I found most interesting about this analysis is that the number of a country's trending stories that were also trending worldwide did not appear to depend on the size or development of the country. For instance, Japan had a high number of countrywide trending topics, but none of them were also trending worldwide. Likewise, Russia consistently had very few countrywide trending topics that were also trending worldwide.

My initial thought was that all of the “developed” countries would have a high number of countrywide trends that were shared with the worldwide trends. This hypothesis ended up not being true.

There are a million and one reasons why the data shows what it does. The possible factors that might influence the results of this analysis include at least the following: the time the data was pulled from Twitter, media restrictions in a particular country, access to news outlets and technology, and the effects of globalization on the country.

It is also worth noting that each call to the getTrends function brought in new results, since trending topics change constantly with so many users on Twitter. My analysis was based on the average results I saw over the course of preparing the analysis.
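Averaging over several pulls could be made explicit by recording the shared-topic count for each country at each pull and taking the mean. The individual pulls were not saved for this report, so the numbers below are made up and the object names are hypothetical.

```r
# Made-up shared-topic counts from three pulls (illustrative only).
pulls <- data.frame(
  pulled_at = rep(c("pull 1", "pull 2", "pull 3"), each = 2),
  country   = rep(c("France", "Japan"), times = 3),
  shared    = c(5, 0, 4, 0, 6, 1),
  stringsAsFactors = FALSE
)

# Mean number of shared topics per country across all pulls.
avg_shared <- aggregate(shared ~ country, data = pulls, FUN = mean)
avg_shared
```

Saving each pull with its time would also make it possible to check how much the time of day affects the overlap.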

In order to present a reasonable hypothesis on the reasons for the results of this analysis, you would need many other data sources. I simply thought it would be interesting (or cool) to look into it a little bit.