It’s a good idea to load your libraries at the top of the Rmd document so that everyone can see what you’re using. Similarly, it’s good practice to set cache=FALSE to ensure that the libraries are dynamically loaded each time you knit the document.
Use httr::GET() to fetch the EIA excel file for us from web. (We’ll learn more about httr, GET and other HTTP methods when we get to webscraping next week.)
# library(here) ## Already loaded
# library(httr) ## Already loaded
url <- "https://www.eia.gov/coal/archive/coal_historical_exports.xlsx"
## Only download the file if we need to
if(!file.exists(here::here("data/coal.xlsx"))) {
GET(url, write_disk(here::here("data/coal.xlsx")))
}Next, we read in the file.
# library(readxl) Already loaded
coal_df <- read_excel(here::here("/data/coal.xlsx"), skip=3, na=".")We are now ready to go.
The column (i.e. variable) names aren’t great: Spacing, uppercase letters, etc.
## [1] "Year" "Quarter"
## [3] "Type" "Customs District"
## [5] "Coal Origin Country" "Coal Destination Country"
## [7] "Steam Coal" "Steam Revenue"
## [9] "Metallurgical" "Metallurgical Revenue"
## [11] "Total" "Total Revenue"
## [13] "Coke" "Coke Revenue"
## # A tibble: 10,336 x 14
## year quarter type customs_district coal_origin_cou… coal_destinatio…
## <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 2002 1 Coal… Anchorage, AK United States South Korea (Re…
## 2 2002 1 Coal… Baltimore, MD United States Belgium
## 3 2002 1 Coal… Baltimore, MD United States Brazil
## 4 2002 1 Coal… Baltimore, MD United States Canada
## 5 2002 1 Coal… Baltimore, MD United States Germany, Federa…
## 6 2002 1 Coal… Baltimore, MD United States Ireland
## 7 2002 1 Coal… Baltimore, MD United States Israel
## 8 2002 1 Coal… Baltimore, MD United States Jamaica
## 9 2002 1 Coal… Baltimore, MD United States Netherlands
## 10 2002 1 Coal… Baltimore, MD United States Norway
## # … with 10,326 more rows, and 8 more variables: steam_coal <dbl>,
## # steam_revenue <dbl>, metallurgical <dbl>, metallurgical_revenue <dbl>,
## # total <dbl>, total_revenue <dbl>, coke <dbl>, coke_revenue <dbl>
Clean them.
Hint: Use either gsub() and regular expressions or, more simply, the janitor() package. You will need to install the latter first.
Plot the US’s total coal exports over time by year ONLY. What secular trends do you notice in the data?
## [1] "year" "quarter"
## [3] "type" "customs_district"
## [5] "coal_origin_country" "coal_destination_country"
## [7] "steam_coal" "steam_revenue"
## [9] "metallurgical" "metallurgical_revenue"
## [11] "total" "total_revenue"
## [13] "coke" "coke_revenue"
us_df<-coal_df%>%
group_by(year)%>%
summarise(us_total=sum(total, na.rm=T)/1000)
ggplot(data = us_df, aes(x=year, y=us_total))+
ggtitle(label = "Total U.S. Coal Exports", subtitle = "from 2002 until 2019 (Tron Theme)")+
scale_x_continuous( "Date",
breaks = c(2002, 2008, 2009, 2012, 2016, 2018),
minor_breaks = c(2002))+
scale_y_continuous("Tons of Coal \n (in thousands)",
minor_breaks = c(25000, 20000, 15000, 10000, 5000),labels = scales::comma)+
geom_point(color="green")+
geom_line(linetype="dashed", color="blue")+
theme(panel.grid.major = element_line(colour = "gray"),
panel.background = element_rect(fill = "black", colour = "grey50"),
axis.text.y = element_text(face="bold", color="darkblue",
size=10),
axis.text.x = element_text(face="bold", color="darkblue",
size=10),
plot.title = element_text(size=15,hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
axis.title = element_text(size=13))Hints: Use the tidyverse. If you want nicely formatted y-axis label, add + scale_y_continuous(labels = scales::comma) to your ggplot2 code.
There are certainly some trends. With the exception of the “Great Recession” (pretty ridiculous name imo), there is a steady increase in exports in 2012. According to some googled sources, the decline in coal exports was due to a drop in European demand as well as an increase in global supply. I am not sure about the incline in 2016, but Trump has done a number on exports which may be reflected by the precipitous plummet from 2018 to 2019. Generally, many of these trends will be explained by trade policy, coal production technology, and alternative energy technology.
Now do the same as the above, expect aggregate by quarter and year. Do you notice any seasonality that was masked from the yearly averages?
Hint: See ?lubridate::yq(). You will probably also want to use the paste() function (along with dplyr::mutate()), or tidyr::unite()
coal_df<- coal_df%>% mutate(quarters=quarter) #create a separate quarter variable for plotting
coal_df<-coal_df%>%unite("date", year:quarter) #combine the quarter and year column into a single variable
coal_df<-coal_df%>%mutate(date=lubridate::yq(coal_df$date)) #lubridate that date column
coal_df<-coal_df%>%select(date, quarters, everything()) #rearrange so that years and date are on the LHS
coal_df #make sure everything looks ok## # A tibble: 10,336 x 14
## date quarters type customs_district coal_origin_cou… coal_destinatio…
## <date> <dbl> <chr> <chr> <chr> <chr>
## 1 2002-01-01 1 Coal… Anchorage, AK United States South Korea (Re…
## 2 2002-01-01 1 Coal… Baltimore, MD United States Belgium
## 3 2002-01-01 1 Coal… Baltimore, MD United States Brazil
## 4 2002-01-01 1 Coal… Baltimore, MD United States Canada
## 5 2002-01-01 1 Coal… Baltimore, MD United States Germany, Federa…
## 6 2002-01-01 1 Coal… Baltimore, MD United States Ireland
## 7 2002-01-01 1 Coal… Baltimore, MD United States Israel
## 8 2002-01-01 1 Coal… Baltimore, MD United States Jamaica
## 9 2002-01-01 1 Coal… Baltimore, MD United States Netherlands
## 10 2002-01-01 1 Coal… Baltimore, MD United States Norway
## # … with 10,326 more rows, and 8 more variables: steam_coal <dbl>,
## # steam_revenue <dbl>, metallurgical <dbl>, metallurgical_revenue <dbl>,
## # total <dbl>, total_revenue <dbl>, coke <dbl>, coke_revenue <dbl>
us_seasonal_df<-coal_df%>%filter(coal_origin_country=="United States")%>%mutate()%>%
group_by(date)%>%
summarise(us_total=sum(total, na.rm = T)/1000)%>%
mutate(quarter=quarters(date))
us_seasonal_df## # A tibble: 71 x 3
## date us_total quarter
## <date> <dbl> <chr>
## 1 2002-01-01 9253. Q1
## 2 2002-04-01 11043. Q2
## 3 2002-07-01 9257. Q3
## 4 2002-10-01 10050. Q4
## 5 2003-01-01 8518. Q1
## 6 2003-04-01 11450. Q2
## 7 2003-07-01 12094. Q3
## 8 2003-10-01 10952. Q4
## 9 2004-01-01 9688. Q1
## 10 2004-04-01 15255. Q2
## # … with 61 more rows
ggplot(data = us_seasonal_df, aes(x=date, y=us_total, color=quarter, shape=quarter))+
ggtitle(label = "Total U.S. Coal Exports", subtitle = "quarterly data from 2002 until 2019")+
theme(legend.position = "right",
axis.text.x = element_text( color="darkseagreen4",
size=10, angle = 45),
axis.text.y = element_text( color="darkseagreen4",
size=10, angle = 45),
panel.grid.major = element_line(colour = "gray"),
panel.background = element_rect(fill = "white", colour = "grey50"),
plot.title = element_text(size=15,hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
axis.title = element_text(size=13))+
scale_y_continuous("Tons of Coal \n (in thousands)", labels = scales::comma, breaks = c( 15000, 20000, 25000,30000,35000))+
xlab("Date")+
geom_point()## [1] "date" "us_total" "quarter"
us_seasonal_avg_df<-us_seasonal_df%>% #create a dataframe for a histogram
group_by(quarter)%>%
summarise(avg_total=mean(us_total))%>%mutate(avg_rounded=as.integer(avg_total))
ggplot(data= us_seasonal_avg_df, aes(quarter, avg_total,label = avg_total, fill=quarter))+
scale_y_continuous("Tons of Coal \n (in thousands)", labels = scales::comma)+
coord_cartesian(ylim = c(10000, 25000))+
scale_fill_brewer(palette="Dark2")+
theme(legend.position = "right",
axis.text.x = element_text( color="darkseagreen4",
size=10, angle = 45),
axis.text.y = element_text( color="darkseagreen4",
size=10, angle = 45),
panel.grid.major = element_line(colour = "gray"),
panel.background = element_rect(fill = "white", colour = "grey50"),
plot.title = element_text(size=15,hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
axis.title = element_text(size=13))+
geom_text(aes(label = avg_rounded), vjust = -4.5, hjust=0)+
ggtitle(label = "Average U.S. Coal Exports by Quarter", subtitle = "2002 until 2019")+
geom_bar(stat = "identity")It looks like on average more coal is exported in the second quarter and fourth quarter. The second quarter, especially, indicates a higher level of exports. Looking at the graphic representation, it also seems like the seasonality effect was more pronounced before 2010–which is interesting
Create a new data frame called coal_country that aggregates total exports by destination country (and by year and quarter).
coal_country<-coal_df%>%group_by(coal_destination_country, date)%>%
mutate(destination_country_total=sum(total, na.rm = T))
coal_country## # A tibble: 10,336 x 15
## # Groups: coal_destination_country, date [4,358]
## date quarters type customs_district coal_origin_cou… coal_destinatio…
## <date> <dbl> <chr> <chr> <chr> <chr>
## 1 2002-01-01 1 Coal… Anchorage, AK United States South Korea (Re…
## 2 2002-01-01 1 Coal… Baltimore, MD United States Belgium
## 3 2002-01-01 1 Coal… Baltimore, MD United States Brazil
## 4 2002-01-01 1 Coal… Baltimore, MD United States Canada
## 5 2002-01-01 1 Coal… Baltimore, MD United States Germany, Federa…
## 6 2002-01-01 1 Coal… Baltimore, MD United States Ireland
## 7 2002-01-01 1 Coal… Baltimore, MD United States Israel
## 8 2002-01-01 1 Coal… Baltimore, MD United States Jamaica
## 9 2002-01-01 1 Coal… Baltimore, MD United States Netherlands
## 10 2002-01-01 1 Coal… Baltimore, MD United States Norway
## # … with 10,326 more rows, and 9 more variables: steam_coal <dbl>,
## # steam_revenue <dbl>, metallurgical <dbl>, metallurgical_revenue <dbl>,
## # total <dbl>, total_revenue <dbl>, coke <dbl>, coke_revenue <dbl>,
## # destination_country_total <dbl>
Hint: Make sure to type coal_country (on its own separate line) after you have created this new data frame so that I (and you) can see the preview output in the markdown document.
It looks like some countries are missing data for a number of years and periods (e.g. Albania). Confirm that this is the case. What do you think is happening here?
## # A tibble: 1 x 1
## n
## <int>
## 1 150
coal_country%>%ddply(~coal_destination_country,summarise,n=length(unique(date)))%>%
arrange(desc(n)) #count the number of unique date entries for each importer## coal_destination_country n
## 1 Belgium 71
## 2 Brazil 71
## 3 Canada 71
## 4 Italy 71
## 5 Japan 71
## 6 Mexico 71
## 7 Netherlands 71
## 8 South Korea (Republic of Korea) 71
## 9 Spain 71
## 10 Turkey 71
## 11 United Kingdom 71
## 12 Venezuela 71
## 13 Costa Rica 70
## 14 France 70
## 15 Germany, Federal Republic of 70
## 16 Sweden 70
## 17 Chile 69
## 18 Egypt 69
## 19 El Salvador 69
## 20 India 69
## 21 Singapore 69
## 22 Thailand 69
## 23 Argentina 68
## 24 China 68
## 25 Colombia 68
## 26 Guatemala 68
## 27 Norway 68
## 28 Taiwan 68
## 29 Saudi Arabia 67
## 30 Trinidad and Tobago 66
## 31 Panama 65
## 32 Ecuador 63
## 33 Dominican Republic 61
## 34 Peru 60
## 35 South Africa 60
## 36 Croatia 58
## 37 Morocco 58
## 38 Australia 57
## 39 Iceland 57
## 40 Indonesia 57
## 41 Malaysia 57
## 42 Romania 57
## 43 Honduras 56
## 44 Finland 55
## 45 Austria 53
## 46 Portugal 52
## 47 United Arab Emirates 52
## 48 Israel 51
## 49 Ukraine 50
## 50 Jamaica 49
## 51 Poland 48
## 52 Pakistan 47
## 53 Algeria 45
## 54 Russia 45
## 55 Uruguay 45
## 56 Angola 43
## 57 Gabon 39
## 58 New Zealand 37
## 59 Slovenia 37
## 60 Slovakia 35
## 61 Switzerland 32
## 62 Vietnam 32
## 63 Serbia 30
## 64 Brunei 27
## 65 Bermuda 26
## 66 Hong Kong 26
## 67 Bahamas 25
## 68 Bulgaria 25
## 69 Denmark (Except Greenland) 24
## 70 Bolivia 23
## 71 Kazakhstan 23
## 72 Ireland 21
## 73 Oman 21
## 74 Nigeria 20
## 75 Greece 18
## 76 Latvia 18
## 77 Hungary 14
## 78 Qatar 14
## 79 Togo 13
## 80 Philippines 12
## 81 Equatorial Guinea 11
## 82 Jordan 11
## 83 Libya 11
## 84 Bosnia and Herzegovina 10
## 85 Cameroon 10
## 86 Iraq 10
## 87 Netherlands Antilles 10
## 88 Nicaragua 10
## 89 Ivory Coast 9
## 90 Mozambique 9
## 91 Aruba 7
## 92 Congo (Brazzaville) 7
## 93 Tunisia 7
## 94 Azerbaijan 6
## 95 Ghana 6
## 96 Lebanon 6
## 97 Malta 6
## 98 Saint Lucia 6
## 99 Senegal 6
## 100 Suriname 6
## 101 Bangladesh 5
## 102 Barbados 5
## 103 Curacao 5
## 104 Kenya 5
## 105 Madagascar 5
## 106 Armenia 4
## 107 Belize 4
## 108 Burma (Myanmar) 4
## 109 Cayman Islands 4
## 110 Dominica 4
## 111 Kuwait 4
## 112 British Virgin Islands 3
## 113 Cambodia 3
## 114 Gibraltar 3
## 115 Guyana 3
## 116 Paraguay 3
## 117 Tanzania (United Republic of Tanzania) 3
## 118 Antigua and Barbuda 2
## 119 Central African Republic 2
## 120 Congo (Kinshasa) 2
## 121 Czech Republic 2
## 122 Faroe Islands 2
## 123 Georgia 2
## 124 Guadeloupe 2
## 125 Liberia 2
## 126 Luxembourg 2
## 127 Montenegro 2
## 128 Saint Kitts and Nevis 2
## 129 Saint Vincent and the Grenadines 2
## 130 Sint Maarten 2
## 131 Turkmenistan 2
## 132 Albania 1
## 133 Andorra 1
## 134 Anguilla 1
## 135 Bahrain 1
## 136 Benin 1
## 137 British Indian Ocean Territory 1
## 138 Eritrea 1
## 139 Estonia 1
## 140 Fedrated States of Micronesia 1
## 141 French Southern and Antarctic Lands 1
## 142 Grenada 1
## 143 Holy See (Vatican City) 1
## 144 Lithuania 1
## 145 Mongolia 1
## 146 New Caledonia 1
## 147 Sri Lanka 1
## 148 Turks and Caicos Islands 1
## 149 Uganda 1
## 150 Western Samoa 1
Hint: Use dplyr::count(). You may want to ungroup your data first (dplyr::ungroup), though.
First it may be useful to know how many importing countries there are in total. By using sqldf we can search for distinct strings. We find that there are 150 importing countries. We could then use that information to back out a lower limit on a balanced panel (18 years x 4 quarters x 150 countries). But we even if we divided total rows by 150, because of the customs_district we might still not know if the panel is balanced. The ddply function may be more helpful. It gives us the number of times a importer appears in the df with a distinct date. We know that 71 entires is a full panel (we don’t have the last quarter from 2019). We see that the max of n is 71, but there are certainly countries with n<71 so we conclude that the panel isn’t balanced
Fill in the implicit missing values, so that each country has a representative row for every time year-by-quarter period. In other words, you should modify the data frame so that there are 72 rows (18 years * 4 quarters) for each country. Arrange your data by country, year and quarter.
balanced_coal<-coal_country%>%group_by(coal_destination_country, date)%>%
mutate(total_dest=sum(total, na.rm=T))%>%
ungroup()%>%
complete(date, nesting(coal_destination_country), fill=list(total_dest=0))%>%
select(-customs_district)%>%
distinct(date, coal_destination_country, .keep_all = TRUE)%>%
select(date, coal_destination_country, total_dest, everything())%>%
arrange(date, coal_destination_country)
balanced_coal## # A tibble: 10,650 x 15
## date coal_destinatio… total_dest quarters type coal_origin_cou…
## <date> <chr> <dbl> <dbl> <chr> <chr>
## 1 2002-01-01 Albania 0 NA <NA> <NA>
## 2 2002-01-01 Algeria 129305 1 Coal… United States
## 3 2002-01-01 Andorra 0 NA <NA> <NA>
## 4 2002-01-01 Angola 5713 1 Coal… United States
## 5 2002-01-01 Anguilla 0 NA <NA> <NA>
## 6 2002-01-01 Antigua and Bar… 0 NA <NA> <NA>
## 7 2002-01-01 Argentina 40159 1 Coal… United States
## 8 2002-01-01 Armenia 0 NA <NA> <NA>
## 9 2002-01-01 Aruba 0 NA <NA> <NA>
## 10 2002-01-01 Australia 175 1 Coal… United States
## # … with 10,640 more rows, and 9 more variables: steam_coal <dbl>,
## # steam_revenue <dbl>, metallurgical <dbl>, metallurgical_revenue <dbl>,
## # total <dbl>, total_revenue <dbl>, coke <dbl>, coke_revenue <dbl>,
## # destination_country_total <dbl>
Hints: Again, you may need to ungroup your data first. Then see ?tidyr::complete(). ?tidyr::expand() also provides some useful examples. Pay attention to the “nesting” option. And don’t forget dplyr:arrange() Finally, make sure to again type coal_country on its own line of your code chunk so that I can see the resulting data frame.
** What I assume to be happening, which is reflected in the way I wrote the code (i.e. fill=list(total_dest=0), is that missing values reflect that there were no exports to that particular country for that particular quarter. It would make sense that this may end up being not recorded insted of recorded as a 0.**
If you followed my hints above, you may encounter a situation where the data frame contains a quarter — probably 2019q4 — that is missing total export numbers for all countries. Did this happen to you? Filter out the completely missing quarter if so. Also: Why do you think this might have happened? (Please answer the latter question even if it didn’t happen to you.)
I did not get the error, and I think it may because I had one column that combined year and quarter, but what I supsect would be happening is that we don’t have the final quarter for 2019. So when this complete function acts on the datatable there isn’t any entry for every single country
Produce a vector — call it coal10_culm — of the top 10 top coal destinations over the full 2002–2019 study period. What are they?
coal10_culm<-balanced_coal%>%group_by(coal_destination_country)%>%
summarise(top_importers=sum(total_dest))%>%
mutate(rank=rank(-top_importers, ties.method = "min"))%>%
arrange(rank)%>%
select(coal_destination_country)%>%
head(10)
coal10_culm## # A tibble: 10 x 1
## coal_destination_country
## <chr>
## 1 Canada
## 2 Netherlands
## 3 Brazil
## 4 South Korea (Republic of Korea)
## 5 United Kingdom
## 6 India
## 7 Japan
## 8 Italy
## 9 Germany, Federal Republic of
## 10 Mexico
Hint: Extract a vector from data frame using dplyr::pull(). You can, of course, also use base R’s $ function. (You don’t strictly need either of them, but dplyr::row_number() or base::rank() are useful functions for attaching a rank number to each country.)
Top Importers from 2002-2019 | Right | Left | |——:|:–| | Canada| 1 | | Netherlands | 2 | | Brazil | 3 | | South Korea | 4| |United Kingdom | 5| |India | 6| |Japan | 7 | | Italy | 8 | | Germany | 9 | | Mexico | 10 |
Now do the same, except for most recent period on record (i.e. final quarter in the dataset). Call this vector coal10_recent. Are there any interesting differences between the two vectors? Apart from any secular trends, what else might explain these differences?
coal10_recent<-balanced_coal%>%
filter(date=="2019-07-01")%>%
mutate(rank=rank(-total_dest, ties.method = "min"))%>%
select(coal_destination_country, total_dest, rank)%>%
arrange(rank)%>%
select(coal_destination_country)%>%
head(10)
coal10_recent## # A tibble: 10 x 1
## coal_destination_country
## <chr>
## 1 Japan
## 2 South Korea (Republic of Korea)
## 3 India
## 4 Netherlands
## 5 Canada
## 6 Brazil
## 7 Ukraine
## 8 Egypt
## 9 Morocco
## 10 Mexico
I would say a notable trend might be that the majority of the european countries are no longer in the top ten. This may be because there was a new source of coal discovered in Europe, or perhaps Europe is turning to alternative energy sources–this seems especially likely for Germany. Maybe also because it is the third quarter countries in the northern hempisphere demand less coal than southern hemisphere countries
Now plot a figure that depicts coal exports dis-aggregated by country. Highlight the top 10 (cumulative) export destinations and sum the remaining countries into a combined “Other” category.
#Don't Run
#plot_1<-balanced_coal%>%filter(coal_destination_country=="Canada"|
# coal_destination_country=="Netherlands"|
# coal_destination_country=="Brazil"|
# coal_destination_country=="South Korea (Republic of Korea)"|
# coal_destination_country=="United Kingdom"|
# coal_destination_country=="India"|
# coal_destination_country=="Japan"|
# coal_destination_country=="Italy"|
# coal_destination_country=="Germany, Federal Republic of"|
# coal_destination_country=="Mexico")%>%
#select(date, total_dest, coal_destination_country)
#arrange(coal_destination_country) #i was trying to think of a better way to do this via grepl or paste, but my success was limited
#Don't run
#plot_2<-balanced_coal%>%filter(coal_destination_country!="Canada"&
# coal_destination_country!="Netherlands"&
# coal_destination_country!="Brazil"&
# coal_destination_country!="South Korea (Republic of Korea)"&
# coal_destination_country!="United Kingdom"&
# coal_destination_country!="India"&
# coal_destination_country!="Japan"&
# coal_destination_country!="Italy"&
# coal_destination_country!="Germany, Federal Republic of"&
# coal_destination_country!="Mexico")%>%
#select(date, total_dest, coal_destination_country)
#arrange(coal_destination_country)
#strategy #2, get a names vector from coal10_culm with paste, then use ifelse to convert the not top 10 to 'other'
names_vec= c("Canada", "Netherlands", "Brazil", "South Korea (Republic of Korea)", "United Kingdom", "India", "Japan", "Italy", "Germany, Federal Republic of", "Mexico")
# temp<-balanced_coal
# temp$coal_destination_country<-countrycode(c("Canada", "Netherlands", "Brazil", "South Korea (Republic of Korea)", "United Kingdom", "India", "Japan", "Italy", "Germany, Federal Republic of", "Mexico"), origin= "country.name", destination = 'iso3c')
# tried to get country code but the formatting for some countries wasn't right so gave me NA
coal_top_others<- balanced_coal%>%
mutate(coal_dest_country=ifelse(coal_destination_country %in% names_vec, coal_destination_country, "Other Countries"))%>%
group_by(coal_dest_country, date)%>%
summarise(total_export=sum(total_dest, na.rm = T)/1000)
p<-ggplot( data=coal_top_others, aes(x=date, y=total_export, colour=coal_dest_country, label=coal_dest_country))
p+geom_line()Take your previous plot and add some swag to it. That is, try to make it as visually appealing as possible without overloading it with chart junk.
colourCount = length(unique(coal_top_others$coal_dest_country))
getPalette = colorRampPalette(brewer.pal(9, "Set1"))
p<-ggplot(data=coal_top_others, aes(x=date, y=total_export, colour = fct_reorder(coal_dest_country, total_export, .desc = T), label=coal_dest_country))
plot1<-p+geom_line()+
theme_minimal_hgrid()+
ggtitle(label = "U.S. Coal Exports by Destination Country", subtitle = "quarterly data from 2002 until 2019")+
theme(legend.position = "right",
axis.text.x = element_text( color="grey23",
size=10),
axis.text.y = element_text( color="grey23",
size=10),
panel.grid.major = element_line(colour = "gray23"),
panel.background = element_rect(fill = "gray98", colour = "grey50"),
plot.title = element_text(size=15,hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
axis.title = element_text(size=13),
axis.title.x=element_blank())+
scale_color_brewer(palette = "Paired")+
scale_y_continuous("Thousand Tons of Coal", labels = scales::comma)+
labs(colour="Export Destination Country")
plot1Hint: You’ve got loads of options here. If you haven’t already done so, consider a more besoke theme with the ggthemes, hrbrthemes, or cowplot packages. Try out scale_fill_brewer() and scale_colour_brewer() for a range of interesting colour palettes. Try some transparency effects with alpha. Give your axis labels more refined names with the labs() layer in ggplot2. While you’re at it, you might want to scale (i.e. normalise) your y-variable to get rid of all those zeros. You can shorten any country names to their ISO abbreviation; see ?countrycode::countrycode. More substantively — but more complicated — you might want to re-order your legend (and the plot itself) according to the relative importance of the destination countres. See ?forcats::fct_reorder or forcats::fct_relevel`.
Create an interactive version of your previous figure.
plot2<-
ggplot(data=coal_top_others, aes(x=coal_dest_country, y=total_export, colour = fct_reorder(coal_dest_country, total_export, .desc = T)))+
geom_point(size = 4, stroke = 2, shape = 3, alpha=.5)+
theme_minimal_hgrid()+
ggtitle(label = "U.S. Coal Exports by Destination Country")+
theme(legend.position = "right",
axis.text.x = element_blank(),
axis.text.y = element_text( color="grey23",
size=10),
panel.grid.major = element_line(colour = "gray23"),
panel.background = element_rect(fill = "gray98", colour = "grey50"),
plot.title = element_text(size=15,hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
axis.title = element_text(size=13),
axis.title.x=element_blank(),
axis.ticks.x = element_blank())+
coord_cartesian(ylim = c(0, 17000))+
scale_color_brewer(palette = "Paired")+
scale_y_continuous("Thousand Tons of Coal", labels = scales::comma)+
labs(title = "Date: {frame_time}")+
transition_time(date) +
ease_aes('linear')+
labs(colour="Export Destination Country")
animate(plot2, fps=2) Hint: Take a look at plotly::ggplotly(), or the gganimate package.
There’s a lot still to explore with this data set. Your final task is to show me something interesting. Drill down into the data and explain what’s driving the secular trends that we have observed above. Or highlight interesting seasonality within a particular country. Or go back to the original coal data frame and look at exports by customs district, or by coal type. Do we changes or trends there? Etcetera. Etcetera. My only requirement is that you show your work and tell me what you have found.
Japan_coal<-coal_country%>%
filter(coal_destination_country=="India")%>%
group_by(date)%>%
mutate(total_dest=sum(total, na.rm=T))%>%
mutate(total_metal=sum(metallurgical, na.rm=T))%>%
mutate(total_steam=sum(steam_coal, na.rm=T))%>%
distinct(date, coal_destination_country, .keep_all = TRUE)%>%
select(date, destination_country_total, steam_coal, metallurgical, total_steam, total_metal)
Japan_coal[is.na(Japan_coal)] <- 0
Japan_Long<-melt(Japan_coal, id = "date", measure = c("destination_country_total", "total_steam", "total_metal"))
ggplot(Japan_Long)+
ggtitle(label = "U.S. Coal Exports to Japan", subtitle = "2002 to 2019")+
geom_line(aes(x=date, y=value/1000, color=variable))+
geom_vline(xintercept = as.numeric(as.Date("2011-04-01")), linetype=3, color="Yellow",)+
scale_color_brewer()+
theme_dark()+
labs(colour="Type of Coal")+
scale_y_continuous("Thousand Tons of Coal", labels = scales::comma)+
theme( axis.title.x = element_blank(),
plot.title = element_text(size=15,hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5))## date variable value
## 1 2002-01-01 destination_country_total 179
## 2 2002-04-01 destination_country_total 9914
## 3 2002-10-01 destination_country_total 535
## 4 2003-01-01 destination_country_total 10880
## 5 2003-07-01 destination_country_total 175
## 6 2003-10-01 destination_country_total 8994
## 7 2004-01-01 destination_country_total 46027
## 8 2004-04-01 destination_country_total 344399
## 9 2004-07-01 destination_country_total 504364
## 10 2004-10-01 destination_country_total 196086
## 11 2005-01-01 destination_country_total 291178
## 12 2005-04-01 destination_country_total 629008
## 13 2005-07-01 destination_country_total 52522
## 14 2005-10-01 destination_country_total 454704
## 15 2006-01-01 destination_country_total 213590
## 16 2006-04-01 destination_country_total 57067
## 17 2006-07-01 destination_country_total 404930
## 18 2006-10-01 destination_country_total 383898
## 19 2007-01-01 destination_country_total 325332
## 20 2007-04-01 destination_country_total 212162
## 21 2007-07-01 destination_country_total 51546
## 22 2007-10-01 destination_country_total 294289
## 23 2008-01-01 destination_country_total 321982
## 24 2008-04-01 destination_country_total 425821
## 25 2008-07-01 destination_country_total 515946
## 26 2008-10-01 destination_country_total 402882
## 27 2009-01-01 destination_country_total 438400
## 28 2009-04-01 destination_country_total 557723
## 29 2009-07-01 destination_country_total 472808
## 30 2009-10-01 destination_country_total 593120
## 31 2010-01-01 destination_country_total 567421
## 32 2010-04-01 destination_country_total 954509
## 33 2010-07-01 destination_country_total 493828
## 34 2010-10-01 destination_country_total 706919
## 35 2011-01-01 destination_country_total 1226395
## 36 2011-04-01 destination_country_total 1527570
## 37 2011-07-01 destination_country_total 639073
## 38 2011-10-01 destination_country_total 1107067
## 39 2012-01-01 destination_country_total 1474171
## 40 2012-04-01 destination_country_total 1958639
## 41 2012-07-01 destination_country_total 1727985
## 42 2012-10-01 destination_country_total 1653138
## 43 2013-01-01 destination_country_total 859503
## 44 2013-04-01 destination_country_total 1045686
## 45 2013-07-01 destination_country_total 1054783
## 46 2013-10-01 destination_country_total 960722
## 47 2014-01-01 destination_country_total 1500787
## 48 2014-04-01 destination_country_total 1371768
## 49 2014-07-01 destination_country_total 843713
## 50 2014-10-01 destination_country_total 870405
## 51 2015-01-01 destination_country_total 2574784
## 52 2015-04-01 destination_country_total 2270539
## 53 2015-07-01 destination_country_total 597089
## 54 2015-10-01 destination_country_total 945602
## 55 2016-01-01 destination_country_total 1869932
## 56 2016-04-01 destination_country_total 1771137
## 57 2016-07-01 destination_country_total 481589
## 58 2016-10-01 destination_country_total 1405564
## 59 2017-01-01 destination_country_total 1886297
## 60 2017-04-01 destination_country_total 2335306
## 61 2017-07-01 destination_country_total 2896777
## 62 2017-10-01 destination_country_total 4346265
## 63 2018-01-01 destination_country_total 5153963
## 64 2018-04-01 destination_country_total 4832813
## 65 2018-07-01 destination_country_total 3729446
## 66 2018-10-01 destination_country_total 3471318
## 67 2019-01-01 destination_country_total 4388111
## 68 2019-04-01 destination_country_total 3646421
## 69 2019-07-01 destination_country_total 2258358
## 70 2002-01-01 total_steam 179
## 71 2002-04-01 total_steam 0
## 72 2002-10-01 total_steam 535
## 73 2003-01-01 total_steam 0
## 74 2003-07-01 total_steam 175
## 75 2003-10-01 total_steam 0
## 76 2004-01-01 total_steam 0
## 77 2004-04-01 total_steam 28370
## 78 2004-07-01 total_steam 219311
## 79 2004-10-01 total_steam 646
## 80 2005-01-01 total_steam 0
## 81 2005-04-01 total_steam 159726
## 82 2005-07-01 total_steam 263
## 83 2005-10-01 total_steam 79405
## 84 2006-01-01 total_steam 0
## 85 2006-04-01 total_steam 267
## 86 2006-07-01 total_steam 78231
## 87 2006-10-01 total_steam 66
## 88 2007-01-01 total_steam 22
## 89 2007-04-01 total_steam 773
## 90 2007-07-01 total_steam 66
## 91 2007-10-01 total_steam 707
## 92 2008-01-01 total_steam 444
## 93 2008-04-01 total_steam 79226
## 94 2008-07-01 total_steam 419
## 95 2008-10-01 total_steam 90
## 96 2009-01-01 total_steam 161
## 97 2009-04-01 total_steam 153
## 98 2009-07-01 total_steam 689
## 99 2009-10-01 total_steam 951
## 100 2010-01-01 total_steam 113
## 101 2010-04-01 total_steam 132
## 102 2010-07-01 total_steam 145
## 103 2010-10-01 total_steam 188568
## 104 2011-01-01 total_steam 231664
## 105 2011-04-01 total_steam 229458
## 106 2011-07-01 total_steam 237579
## 107 2011-10-01 total_steam 193
## 108 2012-01-01 total_steam 421855
## 109 2012-04-01 total_steam 562711
## 110 2012-07-01 total_steam 344002
## 111 2012-10-01 total_steam 687039
## 112 2013-01-01 total_steam 110975
## 113 2013-04-01 total_steam 199554
## 114 2013-07-01 total_steam 453873
## 115 2013-10-01 total_steam 113977
## 116 2014-01-01 total_steam 542108
## 117 2014-04-01 total_steam 405169
## 118 2014-07-01 total_steam 168330
## 119 2014-10-01 total_steam 293
## 120 2015-01-01 total_steam 753443
## 121 2015-04-01 total_steam 983807
## 122 2015-07-01 total_steam 238372
## 123 2015-10-01 total_steam 458943
## 124 2016-01-01 total_steam 679742
## 125 2016-04-01 total_steam 881690
## 126 2016-07-01 total_steam 140552
## 127 2016-10-01 total_steam 978861
## 128 2017-01-01 total_steam 1137416
## 129 2017-04-01 total_steam 1395132
## 130 2017-07-01 total_steam 1809947
## 131 2017-10-01 total_steam 3197290
## 132 2018-01-01 total_steam 2997699
## 133 2018-04-01 total_steam 2923202
## 134 2018-07-01 total_steam 3054197
## 135 2018-10-01 total_steam 2556630
## 136 2019-01-01 total_steam 3003943
## 137 2019-04-01 total_steam 2460187
## 138 2019-07-01 total_steam 1204539
## 139 2002-01-01 total_metal 0
## 140 2002-04-01 total_metal 9914
## 141 2002-10-01 total_metal 0
## 142 2003-01-01 total_metal 10880
## 143 2003-07-01 total_metal 0
## 144 2003-10-01 total_metal 8994
## 145 2004-01-01 total_metal 46027
## 146 2004-04-01 total_metal 316029
## 147 2004-07-01 total_metal 285053
## 148 2004-10-01 total_metal 195440
## 149 2005-01-01 total_metal 291178
## 150 2005-04-01 total_metal 469282
## 151 2005-07-01 total_metal 52259
## 152 2005-10-01 total_metal 375299
## 153 2006-01-01 total_metal 213590
## 154 2006-04-01 total_metal 56800
## 155 2006-07-01 total_metal 326699
## 156 2006-10-01 total_metal 383832
## 157 2007-01-01 total_metal 325310
## 158 2007-04-01 total_metal 211389
## 159 2007-07-01 total_metal 51480
## 160 2007-10-01 total_metal 293582
## 161 2008-01-01 total_metal 321538
## 162 2008-04-01 total_metal 346595
## 163 2008-07-01 total_metal 515527
## 164 2008-10-01 total_metal 402792
## 165 2009-01-01 total_metal 438239
## 166 2009-04-01 total_metal 557570
## 167 2009-07-01 total_metal 472119
## 168 2009-10-01 total_metal 592169
## 169 2010-01-01 total_metal 567308
## 170 2010-04-01 total_metal 954377
## 171 2010-07-01 total_metal 493683
## 172 2010-10-01 total_metal 518351
## 173 2011-01-01 total_metal 994731
## 174 2011-04-01 total_metal 1298112
## 175 2011-07-01 total_metal 401494
## 176 2011-10-01 total_metal 1106874
## 177 2012-01-01 total_metal 1052316
## 178 2012-04-01 total_metal 1395928
## 179 2012-07-01 total_metal 1383983
## 180 2012-10-01 total_metal 966099
## 181 2013-01-01 total_metal 748528
## 182 2013-04-01 total_metal 846132
## 183 2013-07-01 total_metal 600910
## 184 2013-10-01 total_metal 846745
## 185 2014-01-01 total_metal 958679
## 186 2014-04-01 total_metal 966599
## 187 2014-07-01 total_metal 675383
## 188 2014-10-01 total_metal 870112
## 189 2015-01-01 total_metal 1821341
## 190 2015-04-01 total_metal 1286732
## 191 2015-07-01 total_metal 358717
## 192 2015-10-01 total_metal 486659
## 193 2016-01-01 total_metal 1190190
## 194 2016-04-01 total_metal 889447
## 195 2016-07-01 total_metal 341037
## 196 2016-10-01 total_metal 426703
## 197 2017-01-01 total_metal 748881
## 198 2017-04-01 total_metal 940174
## 199 2017-07-01 total_metal 1086830
## 200 2017-10-01 total_metal 1148975
## 201 2018-01-01 total_metal 2156264
## 202 2018-04-01 total_metal 1909611
## 203 2018-07-01 total_metal 675249
## 204 2018-10-01 total_metal 914688
## 205 2019-01-01 total_metal 1384168
## 206 2019-04-01 total_metal 1186234
## 207 2019-07-01 total_metal 1053819
I wanted to see how Japan’s coal imports reacted to Fukushima nuclear power plant accident in 2011–delineated by the dotter yellow line. Apparently as far as coal imports from the US goes not too much changed. Of course, this could just mean that the U.S. isn’t an important supplier of coal for Japan, but this seems very unlikely given that Japan is in the United States’ top 10 export destinations. Something certainly happened around 2010–where we see the huge spike and subsequent fall, but I am not yet sure what event or policy took place then. I enjoyed using melt and adding a vertical line! I was frustrated that I couldn’t figure out how to add a label to the Fukushima line. I also want to learn how to change the names in the legend, not sure if thats possible.