Garrett Stanford 23 January 2020
The goal of this assignment is simple: I want you to produce four different figures using ggplot2. They don’t have to be particularly complicated, identify causal relationships, or anything like that. I just want you to stretch your visualization legs and demonstrate any new (or existing) ggplot2 skills that you have acquired since our first lecture. Some additional points:
You are free to use any dataset that comes built into base R, or that is bundled together with an external R package. See here for an impressive list.
That being said, I would especially encourage you to use your own data.
If you do use your own data, save it in the data/ folder of this repo. (You may need the readr, readxl (for read_excel), and haven packages to read it in, though.) You can use the same dataset for all four of your plots, or you can use a new dataset for each plot. Regardless of what you choose, I want you to try and use different geoms for each figure. Creating an intermediate plotting object and then layering on top of it is perfectly fine, though (see the short sketch after these points).
Any other ggplot2 skills and add-ons like faceting, changing aesthetic scales or legends, using different themes (e.g. from the ggthemes package), animation, etc. are all welcome and encouraged.
Lastly, I want to see the code that produces the figures. Don’t use echo=FALSE in any of the code chunks.
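For instance, here is a minimal sketch of that layering pattern, using the built-in mtcars data:

library(ggplot2)
# Build one intermediate plot object, then layer different geoms on top of it
p <- ggplot(mtcars, aes(x = wt, y = mpg))
p + geom_point()                               # figure 1: scatter
p + geom_point() + geom_smooth(method = "lm")  # same base object, extra layer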
Here is a chunk for you to load your libraries. (You could also have done that in the “setup” chunk at the top of this document… or, for that matter, in any of the individual figure chunks below.) Feel free to load as many other packages and insert as many additional code chunks (Ctrl+Alt+I) as you need.
library("pacman")
p_load(dplyr, ggplot2, readr, rmarkdown, tidyverse, here,
lfe, ggrepel, ggthemes, ggpubr, gganimate, viridis, data.table, ggrepel, ggmap, maps, mapdata)Load/read in the data. (Delete this chunk if you don’t need it.)
# library(readr)
# cool_data <- "data/my-cool-data.csv"
trade_data <- read_csv("data/NR_depletion_3_data.csv")

## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## .default = col_double(),
## iso3c = col_character(),
## iso2c = col_character(),
## country.x = col_character(),
## country.y = col_character()
## )
## See spec(...) for full column specifications.
#clean data: keep five-year intervals and remove Gabon, which showed abnormal qualities
trade_data <- trade_data %>% filter(country.x != "Gabon")
total_five_year_df <- trade_data %>% filter(date %in% c(1995, 2000, 2005, 2010, 2015))

This data is collected from the World Bank and Polity IV. The data is being used by yours truly to look at the relationship between trade openness, industry composition, and natural resource depletion rates. The countries included are low- and lower-middle-income countries. The data run from 1990 to 2015.
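As a quick sanity check (not part of the analysis itself), one can confirm which years survive the filter:

# Count observations per retained year
total_five_year_df %>% count(date)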
# Plot code here
ggplot(data = total_five_year_df, aes(x = emp_indst, y = nr_depletion)) +
  geom_point(alpha = 0.6, aes(size = pop_density, color = country.x), show.legend = FALSE) +
  theme_clean() +
  xlab("Share of Employment in Manufacturing") + ylab("Natural Resource Depletion Rate") +
  facet_wrap(~date) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  scale_color_viridis(discrete = TRUE, option = "D")

The figure displays the level of NR depletion as the share of employment in manufacturing (my proxy for industry composition) changes. The color of the points delineates the different countries and the size of the points delineates each country's population density. What I find interesting is that up until 2010, roughly, the relationship between resource depletion and industry composition is negative. This observation may run contrary to what we may think of as the Resource Haven Hypothesis.
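To put a rough number on the sign change the facets suggest, here is a sketch in base R (column names follow the chunk above):

# Per-year OLS slope of nr_depletion on emp_indst
sapply(split(total_five_year_df, total_five_year_df$date),
       function(d) coef(lm(nr_depletion ~ emp_indst, data = d))["emp_indst"])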
The data source is the same as in the previous figure.
# Plot code here
ggplot(trade_data, aes(x = gdp_capita, y = emp_srvs, color = country.x, size = pop_density)) +
  geom_point(alpha = 0.3, show.legend = FALSE) +
  labs(title = "Year: {frame_time}", x = "GDP per Capita", y = "Level of Employment in the Service Industry") +
  transition_time(date, range = c(1990, 2015)) +
  theme_minimal() +
  ease_aes("linear")

## Warning: Removed 31 rows containing missing values (geom_point).
Colors and sizes again indicate country and population density, respectively. Now we are looking at the relationship between the percent of the formal workforce in the service industry and how rich a country is, and we watch this relationship evolve from 1990 to 2015. First, we observe that the relationship is significantly positive: the richer a country is, the higher its level of employment in the service industry. This is not a new economic fact, but it is reassuring that the data are consistent with the theory. What I find interesting is that the relationship becomes looser as time goes on. Whereas in the 1990s the relationship is quite clear, as time passes the relationship becomes more convoluted for countries exhibiting higher levels of GDP per capita.
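As a minimal sketch of how one might render and save the animation above, assuming the gganimate object were stored in a variable such as p_anim (the name, frame settings, and file name here are illustrative):

# Render the animation and save the most recent render as a GIF
animate(p_anim, nframes = 100, fps = 10)
anim_save("service_employment.gif")  # file name illustrative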
The data were collected by setting up Twitter API access with the rtweet package and then using search_tweets() to look for tweets containing “Burrow.”
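A sketch of what that collection step might have looked like (the query size is an assumption, and the code requires valid Twitter API credentials):

library(rtweet)
# Gather recent tweets mentioning "Burrow"; n is illustrative
burrow_tweets <- search_tweets("Burrow", n = 18000, include_rts = FALSE)
# write_as_csv() flattens rtweet's list-columns before writing to disk
write_as_csv(burrow_tweets, "data/lsu_twitter_save_data.csv")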
# Plot code here
#get some twitter data
p_load(rtweet, httr, tidyverse, ggpmisc)
lsu_twitter_save_data <- read_csv("data/lsu_twitter_save_data.csv")

## Parsed with column specification:
## cols(
## .default = col_character(),
## created_at = col_datetime(format = ""),
## display_text_width = col_double(),
## is_quote = col_logical(),
## is_retweet = col_logical(),
## favorite_count = col_double(),
## retweet_count = col_double(),
## quote_count = col_logical(),
## reply_count = col_logical(),
## symbols = col_logical(),
## ext_media_type = col_logical(),
## quoted_created_at = col_datetime(format = ""),
## quoted_favorite_count = col_double(),
## quoted_retweet_count = col_double(),
## quoted_followers_count = col_double(),
## quoted_friends_count = col_double(),
## quoted_statuses_count = col_double(),
## quoted_verified = col_logical(),
## retweet_status_id = col_logical(),
## retweet_text = col_logical(),
## retweet_created_at = col_logical()
## # ... with 21 more columns
## )
## See spec(...) for full column specifications.
## Warning: 1 parsing failure.
## row col expected actual file
## 16143 symbols 1/0/T/F/TRUE/FALSE grnf 'data/lsu_twitter_save_data.csv'
#LSU graph
ts_plot(lsu_twitter_save_data, "1 minute") + theme_minimal() +
  theme(axis.text.x = element_blank()) +
  labs(
    x = NULL, y = "Mention Count",
    title = "The Mention of LSU's quarterback 'Burrow' on Twitter",
    caption = "\nSource: Data collected from Twitter"
  ) +
  geom_line(color = "plum2", size = .5, show.legend = F)

The figure displays mentions of ‘Burrow’, the last name of the quarterback of the LSU Tigers, on Twitter leading up to, during, and after the championship game, which LSU won. I imagine this isn’t especially interesting to a wide audience, but I enjoyed “scraping” data off of Twitter. It could be an interesting/useful tool for future enterprises. Side note: I was having trouble with the ‘Timezone’ component of the Twitter data.
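On that side note, one possible fix, assuming the issue is that Twitter returns timestamps in UTC (the target timezone below is an assumption):

library(lubridate)
# Shift the UTC created_at timestamps to US Central for local-time plotting
lsu_twitter_save_data <- lsu_twitter_save_data %>%
  mutate(created_at = with_tz(created_at, tzone = "America/Chicago"))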
The data came from the maps and mapdata packages, plus handmade ggmap-geocoded coordinates for cities that have an NBA team.
# Plot code here
#ggmap
# Register a Google API key with ggmap (a real key should never be committed to a repo)
register_google(key = "YOUR_GOOGLE_API_KEY", write = TRUE)

## Replacing old key with new key in /Users/garrettstanford/.Renviron
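A safer pattern, as a sketch: store the key in ~/.Renviron as GGMAP_GOOGLE_API_KEY=<your key> (the variable name is my choice, not a ggmap requirement) and read it at runtime so no key ever appears in the script:

# Load the key from an environment variable instead of hard-coding it
register_google(key = Sys.getenv("GGMAP_GOOGLE_API_KEY"))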
#create city name vector for NBA cities
temp<- matrix(c("San Francisco", "Atlanta", "Chicago", "Portland", "Dallas",
"Houston", "Denver", "Manhattan", "Charlotte", "Los Angeles",
"Philadelphia", "Orlando", "Cleveland",
"Minneapolis", "New Orleans", "Boston",
"Toronto", "Miami", "San Antonio", "Oklahoma City",
"Milwaukee", "Salt Lake", "Brooklyn", "Phoenix", "Detroit",
"Sacramento", "Memphis", "Washington D.C.", "Indianapolis"
), byrow = TRUE)
#get coordinates for city vector
City_Location<- geocode(temp)## Source : https://maps.googleapis.com/maps/api/geocode/json?address=San+Francisco&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Atlanta&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Chicago&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Portland&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Dallas&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Houston&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Denver&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Manhattan&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Charlotte&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Los+Angeles&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Philadelphia&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Orlando&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Cleveland&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Minneapolis&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=New+Orleans&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Boston&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Toronto&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Miami&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=San+Antonio&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Oklahoma+City&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Milwaukee&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Salt+Lake&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Brooklyn&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Phoenix&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Detroit&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Sacramento&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Memphis&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Washington+D.C.&key=xxx-ljl00ko
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Indianapolis&key=xxx-ljl00ko
#create usa map
usa <- map_data("usa")
usa_map <- ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group), fill = "pink", color = "green") +
  coord_fixed(1.3)
#add cities to map
usa_map +
  geom_point(data = City_Location, aes(x = lon, y = lat), color = "black", size = 1.5) +
  theme_void() +
  labs(
    title = "NBA Cities Watermelon Map",
    caption = "\nSource: City coordinates geocoded with ggmap"
  )

We are looking at a map of the United States in which the cities with NBA teams are modeled as the seeds of a United States of Watermelon. I think the interesting part of this figure was how cool ggmap is, and it may be amusing that the picture looks like a watermelon.
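As a possible extension, here is a sketch of labeling the seeds with the ggrepel package loaded earlier; it assumes the rows of City_Location come back in the same order as nba_cities (geocode preserves input order):

# Label each seed with its city name via ggrepel
City_Location$city <- nba_cities
usa_map +
  geom_point(data = City_Location, aes(x = lon, y = lat), color = "black", size = 1.5) +
  geom_text_repel(data = City_Location, aes(x = lon, y = lat, label = city), size = 2) +
  theme_void()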