Part 1. Frequencies on World Maps

Using the streamR parsing command and what you learned in assignment 1, convert the .json file into an R data frame and divide the results into five subsets, one for each of the five viable candidates in the race: Clinton, Cruz, Rubio, Sanders, Trump. Generate five maps showing the origins of tweets on each of the candidates. Do most of them come from inside the U.S.?

The following maps show the geolocation of tweets mentioning the five candidates who were still campaigning in the 2016 presidential primaries on Super Tuesday (March 1, 2016): Hillary Clinton, Bernie Sanders, Marco Rubio, Ted Cruz, and Donald Trump. Each map shows only the tweets that mentioned that candidate and none of the other four.
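The rule that each map keeps only tweets mentioning a single candidate can be sketched as follows. This is an illustration in Python, not the code used for the assignment (which relied on R and streamR). The record layout, the sample tweets, and the choice to match on last names as whole words are all assumptions made for the example. Under this matching rule a handle such as @realDonaldTrump would not count as a mention.

```python
import re

# Hypothetical candidate list; matching on last name only is an assumption.
CANDIDATES = ["clinton", "cruz", "rubio", "sanders", "trump"]

def mentioned(text):
    """Return the set of candidate last names that appear as whole words in a tweet."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {name for name in CANDIDATES if name in words}

def split_by_single_candidate(tweets):
    """Group tweets by candidate, keeping only tweets that mention exactly one candidate."""
    groups = {name: [] for name in CANDIDATES}
    for tweet in tweets:
        names = mentioned(tweet["text"])
        if len(names) == 1:
            groups[next(iter(names))].append(tweet)
    return groups

# Made-up records standing in for parsed, geolocated tweets.
sample = [
    {"text": "Trump rally tonight", "lat": 33.7, "lon": -84.4},
    {"text": "Clinton vs Sanders debate", "lat": 40.7, "lon": -74.0},
    {"text": "Voting for #Sanders today", "lat": 44.3, "lon": -72.6},
    {"text": "Nice weather", "lat": 51.5, "lon": -0.1},
]
groups = split_by_single_candidate(sample)
print({name: len(rows) for name, rows in groups.items()})
```

The tweet naming both Clinton and Sanders is dropped from every subset, which is what keeps the five maps from overlapping.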

First, comparing the two candidates for the Democratic Party, the maps look very similar, with subtle differences in the volume and geographic spread of the tweets. In both maps the tweets were concentrated in the US, with similar patterns of concentration within the country. The same was true outside of the US, where both maps show noticeable concentrations in Western Europe and India. However, the tweets mentioning Clinton showed up in more countries than those mentioning Sanders. In particular, there are blue dots scattered throughout Latin America in what looks like Mexico, Colombia, Brazil, and Chile, as well as a few in South Africa, Mauritania, and Tanzania. Based on these map comparisons, fewer tweets mentioning Sanders came from non-US countries. This could indicate that Clinton’s name has broader recognition outside of the US, which is a reasonable assumption given that she was Secretary of State for a presidential term and is the wife of former president Bill Clinton.

There are comparable distinctions among the remaining three maps, which show tweets mentioning the Republican Party candidates. Once again, the geographic spread of the tweets made from within the US does not differ noticeably across the three candidates, save for the greater volume of those mentioning Trump. The clearest difference is seen outside of the US, where there were virtually no tweets mentioning Cruz and some mentioning Rubio in Latin America, Western Europe, India, and Australia. The final map, showing tweets that mentioned Trump, clearly has the densest and most extensive spread of tweets of all five maps. Unlike the other four maps, it shows tweets throughout East and South Asia, the Near East, and in countries in Latin America and Africa that did not display tweets for the other four candidates. It seems that, unlike the other candidates, Trump was generating noticeable Twitter noise outside of the US.

Part 2. Relative Frequencies By State

Count the number of geolocated tweets from each U.S. state on each of the five candidates. Based on the counts produce five U.S. maps, one for each candidate, with states color-coded for showing the proportion of tweets coming from each state. Describe and interpret what you see in the plots. Is it true that there were more tweets in states holding primaries? Can you see any difference among those states with primaries, and those without?

While no inferential conclusions can be drawn from these maps, there are some informative patterns in the relative proportions of tweets mentioning each candidate by state. These proportions may indicate relative social media interest, positive or negative, in the five candidates.
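The relative proportions discussed here are each state's share of a candidate's geolocated US tweets. A minimal sketch of that calculation is given below in Python for illustration (the assignment itself was done in R). The state codes are made up, and the assumption is that every tweet has already been assigned to a state.

```python
from collections import Counter

def state_proportions(states):
    """Map each state code to its share of one candidate's geolocated US tweets."""
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {state: n / total for state, n in counts.items()}

# Made-up state codes, one per geolocated tweet mentioning a single candidate.
trump_states = ["GA", "GA", "TX", "CA", "CA", "CA", "MN", "VT"]
shares = state_proportions(trump_states)

# List states from the largest share to the smallest.
for state, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {share:.3f}")
```

Because the shares are computed within each candidate's own tweets, the five maps are comparable in pattern but not in absolute volume.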

Overall, there is no clear relation between whether a state held a primary election on Super Tuesday 2016 and the proportion of tweets mentioning each candidate from that state. Indeed, states like Washington, California, Illinois, and Florida had relatively more tweets for the candidates in their respective US maps, but those states did not hold primaries on Super Tuesday. Moreover, Wyoming and North Dakota held Super Tuesday primaries but fell in the lowest proportion category for all of the candidates. This suggests that these maps may largely reflect where Twitter users are concentrated, namely in more densely populated areas, rather than interest driven by the primaries.
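One simple way to check this comparison numerically is to average the per-state shares separately for states with and without a Super Tuesday contest. The sketch below is in Python for illustration only. The share values are invented, and the grouping of states follows the text of this report rather than an official list.

```python
from statistics import mean

def mean_share_by_group(shares, primary_states):
    """Average per-state tweet share for states with and without a primary.

    Assumes both groups are non-empty; statistics.mean raises otherwise.
    """
    with_primary = [v for s, v in shares.items() if s in primary_states]
    without = [v for s, v in shares.items() if s not in primary_states]
    return mean(with_primary), mean(without)

# Invented shares for a handful of states (a subset, so they need not sum to 1).
example_shares = {
    "GA": 0.06, "MN": 0.05, "VT": 0.01, "OK": 0.02,
    "CA": 0.12, "FL": 0.08, "IL": 0.06, "WA": 0.04,
}
# States described in this report as voting on Super Tuesday.
super_tuesday = {"GA", "MN", "VT", "OK"}

avg_primary, avg_other = mean_share_by_group(example_shares, super_tuesday)
print(f"Super Tuesday states: {avg_primary:.3f}")
print(f"Other states:         {avg_other:.3f}")
```

With real counts, a lower average for the Super Tuesday group would be consistent with the observation that population, not the primary calendar, drives the maps.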

There were, however, a few interesting observations from comparing the maps. Between the Democratic candidates, Clinton had more tweets in South Carolina, which held its primary in February, as well as in Pennsylvania and in Oklahoma (which did have a Super Tuesday primary). Minnesota and Georgia had relatively high numbers of tweets for all of the candidates, which may indicate that Super Tuesday increased tweet volume in these states.

Comparing the candidates by party, there were not many tweets for either Sanders or Clinton coming from Vermont, which participated in Super Tuesday and is known to be a safely blue state. Clinton, however, did have greater proportions in Oklahoma even though Sanders received more delegates than she did there on Super Tuesday. Indeed, that result may itself have driven the higher proportion of tweets mentioning her there.

On the Republican side, Cruz and Rubio both seem to have generated more noise than the other candidates in Southeastern states like Tennessee and Alabama. On the other hand, the map of tweet frequencies for Trump looks very similar to that for Clinton, with the greatest proportions in the states east of the Midwest, suggesting similar social media patterns for Clinton and Trump.