US coal exports

Preliminaries:

Load libraries

It’s a good idea to load your libraries at the top of the Rmd document so that everyone can see what you’re using. Similarly, it’s good practice to set cache=FALSE to ensure that the libraries are dynamically loaded each time you knit the document.

Hint: These are only some of the libraries that you’ll need to complete this assignment. Add more here, if and when you discover that you need them.

library(pacman)
pacman::p_load( httr, here, readxl, janitor, zoo, ggplot2, grid, gridExtra, plyr, dplyr, lubridate, tidyr, sqldf, cowplot, RColorBrewer, countrycode, forcats, gganimate, reshape2)

Read in the data

Use httr::GET() to fetch the EIA excel file for us from web. (We’ll learn more about httr, GET and other HTTP methods when we get to webscraping next week.)

# library(here) ## Already loaded
# library(httr) ## Already loaded
url <- "https://www.eia.gov/coal/archive/coal_historical_exports.xlsx"
## Only download the file if we need to
if(!file.exists(here::here("data/coal.xlsx"))) {
  GET(url, write_disk(here::here("data/coal.xlsx")))
}

Next, we read in the file.

# library(readxl) Already loaded
coal_df <- read_excel(here::here("/data/coal.xlsx"), skip=3, na=".")

We are now ready to go.

1) Clean the column names

The column (i.e. variable) names aren’t great: Spacing, uppercase letters, etc.

names(coal_df)

##  [1] "Year"                     "Quarter"                 
##  [3] "Type"                     "Customs District"        
##  [5] "Coal Origin Country"      "Coal Destination Country"
##  [7] "Steam Coal"               "Steam Revenue"           
##  [9] "Metallurgical"            "Metallurgical Revenue"   
## [11] "Total"                    "Total Revenue"           
## [13] "Coke"                     "Coke Revenue"

coal_df<-coal_df%>%clean_names() #turns into lowercase and uses snake instead of spaces

coal_df

## # A tibble: 10,336 x 14
##     year quarter type  customs_district coal_origin_cou… coal_destinatio…
##    <dbl>   <dbl> <chr> <chr>            <chr>            <chr>           
##  1  2002       1 Coal… Anchorage, AK    United States    South Korea (Re…
##  2  2002       1 Coal… Baltimore, MD    United States    Belgium         
##  3  2002       1 Coal… Baltimore, MD    United States    Brazil          
##  4  2002       1 Coal… Baltimore, MD    United States    Canada          
##  5  2002       1 Coal… Baltimore, MD    United States    Germany, Federa…
##  6  2002       1 Coal… Baltimore, MD    United States    Ireland         
##  7  2002       1 Coal… Baltimore, MD    United States    Israel          
##  8  2002       1 Coal… Baltimore, MD    United States    Jamaica         
##  9  2002       1 Coal… Baltimore, MD    United States    Netherlands     
## 10  2002       1 Coal… Baltimore, MD    United States    Norway          
## # … with 10,326 more rows, and 8 more variables: steam_coal <dbl>,
## #   steam_revenue <dbl>, metallurgical <dbl>, metallurgical_revenue <dbl>,
## #   total <dbl>, total_revenue <dbl>, coke <dbl>, coke_revenue <dbl>

Clean them.

Hint: Use either gsub() and regular expressions or, more simply, the janitor() package. You will need to install the latter first.

2) Total US coal exports over time (year only)

Plot the US’s total coal exports over time by year ONLY. What secular trends do you notice in the data?

names(coal_df)

##  [1] "year"                     "quarter"                 
##  [3] "type"                     "customs_district"        
##  [5] "coal_origin_country"      "coal_destination_country"
##  [7] "steam_coal"               "steam_revenue"           
##  [9] "metallurgical"            "metallurgical_revenue"   
## [11] "total"                    "total_revenue"           
## [13] "coke"                     "coke_revenue"

us_df<-coal_df%>%
  group_by(year)%>%
  summarise(us_total=sum(total, na.rm=T)/1000)


ggplot(data = us_df, aes(x=year, y=us_total))+
  ggtitle(label = "Total U.S. Coal Exports", subtitle = "from 2002 until 2019 (Tron Theme)")+
  scale_x_continuous( "Date",
                      breaks = c(2002, 2008, 2009, 2012, 2016, 2018),
                      minor_breaks = c(2002))+
  scale_y_continuous("Tons of Coal \n (in thousands)",
                     minor_breaks = c(25000, 20000, 15000, 10000, 5000),labels = scales::comma)+
  geom_point(color="green")+
  geom_line(linetype="dashed", color="blue")+
theme(panel.grid.major = element_line(colour = "gray"), 
      panel.background = element_rect(fill = "black", colour = "grey50"),
          axis.text.y = element_text(face="bold", color="darkblue", 
                           size=10),
      axis.text.x = element_text(face="bold", color="darkblue", 
                           size=10),
      plot.title = element_text(size=15,hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5),
      axis.title = element_text(size=13))

Hints: Use the tidyverse. If you want nicely formatted y-axis label, add + scale_y_continuous(labels = scales::comma) to your ggplot2 code.

There are certainly some trends. With the exception of the “Great Recession” (pretty ridiculous name imo), there is a steady increase in exports in 2012. According to some googled sources, the decline in coal exports was due to a drop in European demand as well as an increase in global supply. I am not sure about the incline in 2016, but Trump has done a number on exports which may be reflected by the precipitous plummet from 2018 to 2019. Generally, many of these trends will be explained by trade policy, coal production technology, and alternative energy technology.

3) Total US coal exports over time (year AND quarter)

Now do the same as the above, expect aggregate by quarter and year. Do you notice any seasonality that was masked from the yearly averages?

Hint: See ?lubridate::yq(). You will probably also want to use the paste() function (along with dplyr::mutate()), or tidyr::unite()

coal_df<- coal_df%>% mutate(quarters=quarter) #create a separate quarter variable for plotting 

coal_df<-coal_df%>%unite("date", year:quarter) #combine the quarter and year column into a single variable



coal_df<-coal_df%>%mutate(date=lubridate::yq(coal_df$date)) #lubridate that date column 


coal_df<-coal_df%>%select(date, quarters, everything()) #rearrange so that years and date are on the LHS

coal_df #make sure everything looks ok

## # A tibble: 10,336 x 14
##    date       quarters type  customs_district coal_origin_cou… coal_destinatio…
##    <date>        <dbl> <chr> <chr>            <chr>            <chr>           
##  1 2002-01-01        1 Coal… Anchorage, AK    United States    South Korea (Re…
##  2 2002-01-01        1 Coal… Baltimore, MD    United States    Belgium         
##  3 2002-01-01        1 Coal… Baltimore, MD    United States    Brazil          
##  4 2002-01-01        1 Coal… Baltimore, MD    United States    Canada          
##  5 2002-01-01        1 Coal… Baltimore, MD    United States    Germany, Federa…
##  6 2002-01-01        1 Coal… Baltimore, MD    United States    Ireland         
##  7 2002-01-01        1 Coal… Baltimore, MD    United States    Israel          
##  8 2002-01-01        1 Coal… Baltimore, MD    United States    Jamaica         
##  9 2002-01-01        1 Coal… Baltimore, MD    United States    Netherlands     
## 10 2002-01-01        1 Coal… Baltimore, MD    United States    Norway          
## # … with 10,326 more rows, and 8 more variables: steam_coal <dbl>,
## #   steam_revenue <dbl>, metallurgical <dbl>, metallurgical_revenue <dbl>,
## #   total <dbl>, total_revenue <dbl>, coke <dbl>, coke_revenue <dbl>

us_seasonal_df<-coal_df%>%filter(coal_origin_country=="United States")%>%mutate()%>%
  group_by(date)%>%
  summarise(us_total=sum(total, na.rm = T)/1000)%>%
  mutate(quarter=quarters(date))
  
us_seasonal_df

## # A tibble: 71 x 3
##    date       us_total quarter
##    <date>        <dbl> <chr>  
##  1 2002-01-01    9253. Q1     
##  2 2002-04-01   11043. Q2     
##  3 2002-07-01    9257. Q3     
##  4 2002-10-01   10050. Q4     
##  5 2003-01-01    8518. Q1     
##  6 2003-04-01   11450. Q2     
##  7 2003-07-01   12094. Q3     
##  8 2003-10-01   10952. Q4     
##  9 2004-01-01    9688. Q1     
## 10 2004-04-01   15255. Q2     
## # … with 61 more rows

ggplot(data = us_seasonal_df, aes(x=date, y=us_total, color=quarter, shape=quarter))+
  ggtitle(label = "Total U.S. Coal Exports", subtitle = "quarterly data from 2002 until 2019")+
  theme(legend.position = "right",
        axis.text.x = element_text( color="darkseagreen4", 
                           size=10, angle = 45),
        axis.text.y = element_text( color="darkseagreen4", 
                           size=10, angle = 45),
        panel.grid.major = element_line(colour = "gray"), 
      panel.background = element_rect(fill = "white", colour = "grey50"),
      plot.title = element_text(size=15,hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5),
      axis.title = element_text(size=13))+
  scale_y_continuous("Tons of Coal \n (in thousands)", labels = scales::comma, breaks = c( 15000, 20000, 25000,30000,35000))+
                     xlab("Date")+
  
geom_point()

 names(us_seasonal_df)

## [1] "date"     "us_total" "quarter"

us_seasonal_avg_df<-us_seasonal_df%>% #create a dataframe for a histogram
  group_by(quarter)%>%
  summarise(avg_total=mean(us_total))%>%mutate(avg_rounded=as.integer(avg_total))

ggplot(data= us_seasonal_avg_df, aes(quarter, avg_total,label = avg_total, fill=quarter))+
  scale_y_continuous("Tons of Coal \n (in thousands)", labels = scales::comma)+
  coord_cartesian(ylim = c(10000, 25000))+
  scale_fill_brewer(palette="Dark2")+
  theme(legend.position = "right",
        axis.text.x = element_text( color="darkseagreen4", 
                           size=10, angle = 45),
        axis.text.y = element_text( color="darkseagreen4", 
                           size=10, angle = 45),
        panel.grid.major = element_line(colour = "gray"), 
      panel.background = element_rect(fill = "white", colour = "grey50"),
      plot.title = element_text(size=15,hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5),
      axis.title = element_text(size=13))+
  geom_text(aes(label = avg_rounded),  vjust = -4.5, hjust=0)+
  ggtitle(label = "Average U.S. Coal Exports by Quarter", subtitle =  "2002 until 2019")+
  geom_bar(stat = "identity")

It looks like on average more coal is exported in the second quarter and fourth quarter. The second quarter, especially, indicates a higher level of exports. Looking at the graphic representation, it also seems like the seasonality effect was more pronounced before 2010–which is interesting

4) Exports by destination country

4.1) Create a new data frame

Create a new data frame called coal_country that aggregates total exports by destination country (and by year and quarter).

coal_country<-coal_df%>%group_by(coal_destination_country, date)%>%
  mutate(destination_country_total=sum(total, na.rm = T))

coal_country

## # A tibble: 10,336 x 15
## # Groups:   coal_destination_country, date [4,358]
##    date       quarters type  customs_district coal_origin_cou… coal_destinatio…
##    <date>        <dbl> <chr> <chr>            <chr>            <chr>           
##  1 2002-01-01        1 Coal… Anchorage, AK    United States    South Korea (Re…
##  2 2002-01-01        1 Coal… Baltimore, MD    United States    Belgium         
##  3 2002-01-01        1 Coal… Baltimore, MD    United States    Brazil          
##  4 2002-01-01        1 Coal… Baltimore, MD    United States    Canada          
##  5 2002-01-01        1 Coal… Baltimore, MD    United States    Germany, Federa…
##  6 2002-01-01        1 Coal… Baltimore, MD    United States    Ireland         
##  7 2002-01-01        1 Coal… Baltimore, MD    United States    Israel          
##  8 2002-01-01        1 Coal… Baltimore, MD    United States    Jamaica         
##  9 2002-01-01        1 Coal… Baltimore, MD    United States    Netherlands     
## 10 2002-01-01        1 Coal… Baltimore, MD    United States    Norway          
## # … with 10,326 more rows, and 9 more variables: steam_coal <dbl>,
## #   steam_revenue <dbl>, metallurgical <dbl>, metallurgical_revenue <dbl>,
## #   total <dbl>, total_revenue <dbl>, coke <dbl>, coke_revenue <dbl>,
## #   destination_country_total <dbl>

Hint: Make sure to type coal_country (on its own separate line) after you have created this new data frame so that I (and you) can see the preview output in the markdown document.

4.2) Inspect the data frame

It looks like some countries are missing data for a number of years and periods (e.g. Albania). Confirm that this is the case. What do you think is happening here?

sqldf("select distinct(coal_destination_country) from coal_country")%>%count() #count the number of coal importers

## # A tibble: 1 x 1
##       n
##   <int>
## 1   150

coal_country%>%ddply(~coal_destination_country,summarise,n=length(unique(date)))%>%
  arrange(desc(n)) #count the number of unique date entries for each importer

##                   coal_destination_country  n
## 1                                  Belgium 71
## 2                                   Brazil 71
## 3                                   Canada 71
## 4                                    Italy 71
## 5                                    Japan 71
## 6                                   Mexico 71
## 7                              Netherlands 71
## 8          South Korea (Republic of Korea) 71
## 9                                    Spain 71
## 10                                  Turkey 71
## 11                          United Kingdom 71
## 12                               Venezuela 71
## 13                              Costa Rica 70
## 14                                  France 70
## 15            Germany, Federal Republic of 70
## 16                                  Sweden 70
## 17                                   Chile 69
## 18                                   Egypt 69
## 19                             El Salvador 69
## 20                                   India 69
## 21                               Singapore 69
## 22                                Thailand 69
## 23                               Argentina 68
## 24                                   China 68
## 25                                Colombia 68
## 26                               Guatemala 68
## 27                                  Norway 68
## 28                                  Taiwan 68
## 29                            Saudi Arabia 67
## 30                     Trinidad and Tobago 66
## 31                                  Panama 65
## 32                                 Ecuador 63
## 33                      Dominican Republic 61
## 34                                    Peru 60
## 35                            South Africa 60
## 36                                 Croatia 58
## 37                                 Morocco 58
## 38                               Australia 57
## 39                                 Iceland 57
## 40                               Indonesia 57
## 41                                Malaysia 57
## 42                                 Romania 57
## 43                                Honduras 56
## 44                                 Finland 55
## 45                                 Austria 53
## 46                                Portugal 52
## 47                    United Arab Emirates 52
## 48                                  Israel 51
## 49                                 Ukraine 50
## 50                                 Jamaica 49
## 51                                  Poland 48
## 52                                Pakistan 47
## 53                                 Algeria 45
## 54                                  Russia 45
## 55                                 Uruguay 45
## 56                                  Angola 43
## 57                                   Gabon 39
## 58                             New Zealand 37
## 59                                Slovenia 37
## 60                                Slovakia 35
## 61                             Switzerland 32
## 62                                 Vietnam 32
## 63                                  Serbia 30
## 64                                  Brunei 27
## 65                                 Bermuda 26
## 66                               Hong Kong 26
## 67                                 Bahamas 25
## 68                                Bulgaria 25
## 69              Denmark (Except Greenland) 24
## 70                                 Bolivia 23
## 71                              Kazakhstan 23
## 72                                 Ireland 21
## 73                                    Oman 21
## 74                                 Nigeria 20
## 75                                  Greece 18
## 76                                  Latvia 18
## 77                                 Hungary 14
## 78                                   Qatar 14
## 79                                    Togo 13
## 80                             Philippines 12
## 81                       Equatorial Guinea 11
## 82                                  Jordan 11
## 83                                   Libya 11
## 84                  Bosnia and Herzegovina 10
## 85                                Cameroon 10
## 86                                    Iraq 10
## 87                    Netherlands Antilles 10
## 88                               Nicaragua 10
## 89                             Ivory Coast  9
## 90                              Mozambique  9
## 91                                   Aruba  7
## 92                     Congo (Brazzaville)  7
## 93                                 Tunisia  7
## 94                              Azerbaijan  6
## 95                                   Ghana  6
## 96                                 Lebanon  6
## 97                                   Malta  6
## 98                             Saint Lucia  6
## 99                                 Senegal  6
## 100                               Suriname  6
## 101                             Bangladesh  5
## 102                               Barbados  5
## 103                                Curacao  5
## 104                                  Kenya  5
## 105                             Madagascar  5
## 106                                Armenia  4
## 107                                 Belize  4
## 108                        Burma (Myanmar)  4
## 109                         Cayman Islands  4
## 110                               Dominica  4
## 111                                 Kuwait  4
## 112                 British Virgin Islands  3
## 113                               Cambodia  3
## 114                              Gibraltar  3
## 115                                 Guyana  3
## 116                               Paraguay  3
## 117 Tanzania (United Republic of Tanzania)  3
## 118                    Antigua and Barbuda  2
## 119               Central African Republic  2
## 120                       Congo (Kinshasa)  2
## 121                         Czech Republic  2
## 122                          Faroe Islands  2
## 123                                Georgia  2
## 124                             Guadeloupe  2
## 125                                Liberia  2
## 126                             Luxembourg  2
## 127                             Montenegro  2
## 128                  Saint Kitts and Nevis  2
## 129       Saint Vincent and the Grenadines  2
## 130                           Sint Maarten  2
## 131                           Turkmenistan  2
## 132                                Albania  1
## 133                                Andorra  1
## 134                               Anguilla  1
## 135                                Bahrain  1
## 136                                  Benin  1
## 137         British Indian Ocean Territory  1
## 138                                Eritrea  1
## 139                                Estonia  1
## 140          Fedrated States of Micronesia  1
## 141    French Southern and Antarctic Lands  1
## 142                                Grenada  1
## 143                Holy See (Vatican City)  1
## 144                              Lithuania  1
## 145                               Mongolia  1
## 146                          New Caledonia  1
## 147                              Sri Lanka  1
## 148               Turks and Caicos Islands  1
## 149                                 Uganda  1
## 150                          Western Samoa  1

                  #the number is 71 for a full panel--18 years and four quarters with the exception of the last quarter of 2019

Hint: Use dplyr::count(). You may want to ungroup your data first (dplyr::ungroup), though.

First it may be useful to know how many importing countries there are in total. By using sqldf we can search for distinct strings. We find that there are 150 importing countries. We could then use that information to back out a lower limit on a balanced panel (18 years x 4 quarters x 150 countries). But we even if we divided total rows by 150, because of the customs_district we might still not know if the panel is balanced. The ddply function may be more helpful. It gives us the number of times a importer appears in the df with a distinct date. We know that 71 entires is a full panel (we don’t have the last quarter from 2019). We see that the max of n is 71, but there are certainly countries with n<71 so we conclude that the panel isn’t balanced

4.3) Complete the data frame

Fill in the implicit missing values, so that each country has a representative row for every time year-by-quarter period. In other words, you should modify the data frame so that there are 72 rows (18 years * 4 quarters) for each country. Arrange your data by country, year and quarter.

balanced_coal<-coal_country%>%group_by(coal_destination_country, date)%>%
  mutate(total_dest=sum(total, na.rm=T))%>%
  ungroup()%>%
  complete(date, nesting(coal_destination_country), fill=list(total_dest=0))%>%
  select(-customs_district)%>%
  distinct(date, coal_destination_country, .keep_all = TRUE)%>%
  select(date, coal_destination_country, total_dest, everything())%>%
  arrange(date, coal_destination_country)
  
  balanced_coal

## # A tibble: 10,650 x 15
##    date       coal_destinatio… total_dest quarters type  coal_origin_cou…
##    <date>     <chr>                 <dbl>    <dbl> <chr> <chr>           
##  1 2002-01-01 Albania                   0       NA <NA>  <NA>            
##  2 2002-01-01 Algeria              129305        1 Coal… United States   
##  3 2002-01-01 Andorra                   0       NA <NA>  <NA>            
##  4 2002-01-01 Angola                 5713        1 Coal… United States   
##  5 2002-01-01 Anguilla                  0       NA <NA>  <NA>            
##  6 2002-01-01 Antigua and Bar…          0       NA <NA>  <NA>            
##  7 2002-01-01 Argentina             40159        1 Coal… United States   
##  8 2002-01-01 Armenia                   0       NA <NA>  <NA>            
##  9 2002-01-01 Aruba                     0       NA <NA>  <NA>            
## 10 2002-01-01 Australia               175        1 Coal… United States   
## # … with 10,640 more rows, and 9 more variables: steam_coal <dbl>,
## #   steam_revenue <dbl>, metallurgical <dbl>, metallurgical_revenue <dbl>,
## #   total <dbl>, total_revenue <dbl>, coke <dbl>, coke_revenue <dbl>,
## #   destination_country_total <dbl>

Hints: Again, you may need to ungroup your data first. Then see ?tidyr::complete(). ?tidyr::expand() also provides some useful examples. Pay attention to the “nesting” option. And don’t forget dplyr:arrange() Finally, make sure to again type coal_country on its own line of your code chunk so that I can see the resulting data frame.

** What I assume to be happening, which is reflected in the way I wrote the code (i.e. fill=list(total_dest=0), is that missing values reflect that there were no exports to that particular country for that particular quarter. It would make sense that this may end up being not recorded insted of recorded as a 0.**

4.4 Some more tidying up

If you followed my hints above, you may encounter a situation where the data frame contains a quarter — probably 2019q4 — that is missing total export numbers for all countries. Did this happen to you? Filter out the completely missing quarter if so. Also: Why do you think this might have happened? (Please answer the latter question even if it didn’t happen to you.)

I did not get the error, and I think it may because I had one column that combined year and quarter, but what I supsect would be happening is that we don’t have the final quarter for 2019. So when this complete function acts on the datatable there isn’t any entry for every single country

4.5) Culmulative top 10 US coal export destinations

Produce a vector — call it coal10_culm — of the top 10 top coal destinations over the full 2002–2019 study period. What are they?

coal10_culm<-balanced_coal%>%group_by(coal_destination_country)%>%
  summarise(top_importers=sum(total_dest))%>%
  mutate(rank=rank(-top_importers, ties.method = "min"))%>%
  arrange(rank)%>%
  select(coal_destination_country)%>%
  head(10)

coal10_culm

## # A tibble: 10 x 1
##    coal_destination_country       
##    <chr>                          
##  1 Canada                         
##  2 Netherlands                    
##  3 Brazil                         
##  4 South Korea (Republic of Korea)
##  5 United Kingdom                 
##  6 India                          
##  7 Japan                          
##  8 Italy                          
##  9 Germany, Federal Republic of   
## 10 Mexico

Hint: Extract a vector from data frame using dplyr::pull(). You can, of course, also use base R’s $ function. (You don’t strictly need either of them, but dplyr::row_number() or base::rank() are useful functions for attaching a rank number to each country.)

And now (drumroll) for your top 10 allstar coal importers! Introducing:: Top Importers from 2002-2019 | Right | Left | |——:|:–| | Canada| 1 | | Netherlands | 2 | | Brazil | 3 | | South Korea | 4| |United Kingdom | 5| |India | 6| |Japan | 7 | | Italy | 8 | | Germany | 9 | | Mexico | 10 |

4.6) Recent top 10 US coal export destinations

Now do the same, except for most recent period on record (i.e. final quarter in the dataset). Call this vector coal10_recent. Are there any interesting differences between the two vectors? Apart from any secular trends, what else might explain these differences?

coal10_recent<-balanced_coal%>%
  filter(date=="2019-07-01")%>%
  mutate(rank=rank(-total_dest, ties.method = "min"))%>%
  select(coal_destination_country, total_dest, rank)%>%
  arrange(rank)%>%
  select(coal_destination_country)%>%
  head(10)

  coal10_recent

## # A tibble: 10 x 1
##    coal_destination_country       
##    <chr>                          
##  1 Japan                          
##  2 South Korea (Republic of Korea)
##  3 India                          
##  4 Netherlands                    
##  5 Canada                         
##  6 Brazil                         
##  7 Ukraine                        
##  8 Egypt                          
##  9 Morocco                        
## 10 Mexico

I would say a notable trend might be that the majority of the european countries are no longer in the top ten. This may be because there was a new source of coal discovered in Europe, or perhaps Europe is turning to alternative energy sources–this seems especially likely for Germany. Maybe also because it is the third quarter countries in the northern hempisphere demand less coal than southern hemisphere countries

4.7) Plot US coal export by country

Now plot a figure that depicts coal exports dis-aggregated by country. Highlight the top 10 (cumulative) export destinations and sum the remaining countries into a combined “Other” category.

#Don't Run
#plot_1<-balanced_coal%>%filter(coal_destination_country=="Canada"|
 #                        coal_destination_country=="Netherlands"|
  #                       coal_destination_country=="Brazil"|
   #                      coal_destination_country=="South Korea (Republic of Korea)"|
   #                      coal_destination_country=="United Kingdom"|
    #                     coal_destination_country=="India"|
     #                    coal_destination_country=="Japan"|
      #                   coal_destination_country=="Italy"|
       #                  coal_destination_country=="Germany, Federal Republic of"|
        #                 coal_destination_country=="Mexico")%>%
  #select(date, total_dest, coal_destination_country)
  #arrange(coal_destination_country) #i was trying to think of a better way to do this via grepl or paste, but my success was limited

  #Don't run
#plot_2<-balanced_coal%>%filter(coal_destination_country!="Canada"&
   #                              coal_destination_country!="Netherlands"&
    #                             coal_destination_country!="Brazil"&
     #                            coal_destination_country!="South Korea (Republic of Korea)"&
      #                           coal_destination_country!="United Kingdom"&
       #                          coal_destination_country!="India"&
        #                         coal_destination_country!="Japan"&
         #                        coal_destination_country!="Italy"&
          #                       coal_destination_country!="Germany, Federal Republic of"&
           #                      coal_destination_country!="Mexico")%>%
  #select(date, total_dest, coal_destination_country)
  #arrange(coal_destination_country)

#strategy #2, get a names vector from coal10_culm with paste, then use ifelse to convert the not top 10 to 'other' 
  
  names_vec= c("Canada", "Netherlands", "Brazil", "South Korea (Republic of Korea)", "United Kingdom", "India", "Japan", "Italy", "Germany, Federal Republic of", "Mexico")
  
  
# temp<-balanced_coal
# temp$coal_destination_country<-countrycode(c("Canada", "Netherlands", "Brazil", "South Korea (Republic of Korea)", "United Kingdom", "India", "Japan", "Italy", "Germany, Federal Republic of", "Mexico"), origin= "country.name", destination =  'iso3c')
# tried to get country code but the formatting for some countries wasn't right so gave me NA

 coal_top_others<-  balanced_coal%>%
  mutate(coal_dest_country=ifelse(coal_destination_country %in% names_vec, coal_destination_country, "Other Countries"))%>%
  group_by(coal_dest_country, date)%>%
  summarise(total_export=sum(total_dest, na.rm = T)/1000)
  




p<-ggplot( data=coal_top_others, aes(x=date, y=total_export, colour=coal_dest_country, label=coal_dest_country))


p+geom_line()

4.8) Make it pretty

Take your previous plot and add some swag to it. That is, try to make it as visually appealing as possible without overloading it with chart junk.

colourCount = length(unique(coal_top_others$coal_dest_country))
getPalette = colorRampPalette(brewer.pal(9, "Set1"))
 

p<-ggplot(data=coal_top_others, aes(x=date, y=total_export, colour = fct_reorder(coal_dest_country, total_export, .desc = T), label=coal_dest_country))

plot1<-p+geom_line()+
  theme_minimal_hgrid()+
  ggtitle(label = "U.S. Coal Exports by Destination Country", subtitle = "quarterly data from 2002 until 2019")+
  theme(legend.position = "right",
        axis.text.x = element_text( color="grey23", 
                           size=10),
        axis.text.y = element_text( color="grey23", 
                           size=10),
        panel.grid.major = element_line(colour = "gray23"), 
      panel.background = element_rect(fill = "gray98", colour = "grey50"),
      plot.title = element_text(size=15,hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5),
      axis.title = element_text(size=13),
      axis.title.x=element_blank())+
scale_color_brewer(palette = "Paired")+
  scale_y_continuous("Thousand Tons of Coal", labels = scales::comma)+
 labs(colour="Export Destination Country")

plot1

?forcats::fct_reorder

Hint: You’ve got loads of options here. If you haven’t already done so, consider a more besoke theme with the ggthemes, hrbrthemes, or cowplot packages. Try out scale_fill_brewer() and scale_colour_brewer() for a range of interesting colour palettes. Try some transparency effects with alpha. Give your axis labels more refined names with the labs() layer in ggplot2. While you’re at it, you might want to scale (i.e. normalise) your y-variable to get rid of all those zeros. You can shorten any country names to their ISO abbreviation; see ?countrycode::countrycode. More substantively — but more complicated — you might want to re-order your legend (and the plot itself) according to the relative importance of the destination countres. See ?forcats::fct_reorder or forcats::fct_relevel`.

4.9) Make it interactive

Create an interactive version of your previous figure.

plot2<-
  ggplot(data=coal_top_others, aes(x=coal_dest_country, y=total_export, colour = fct_reorder(coal_dest_country, total_export, .desc = T)))+
  geom_point(size = 4, stroke = 2, shape = 3, alpha=.5)+
  theme_minimal_hgrid()+
  ggtitle(label = "U.S. Coal Exports by Destination Country")+
  theme(legend.position = "right",
        axis.text.x = element_blank(),
        axis.text.y = element_text( color="grey23", 
                           size=10),
        panel.grid.major = element_line(colour = "gray23"), 
      panel.background = element_rect(fill = "gray98", colour = "grey50"),
      plot.title = element_text(size=15,hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5),
      axis.title = element_text(size=13),
      axis.title.x=element_blank(),
      axis.ticks.x = element_blank())+
  coord_cartesian(ylim = c(0, 17000))+
scale_color_brewer(palette = "Paired")+
  scale_y_continuous("Thousand Tons of Coal", labels = scales::comma)+
  labs(title = "Date: {frame_time}")+
  transition_time(date) +
  ease_aes('linear')+
  labs(colour="Export Destination Country")

animate(plot2, fps=2)

Hint: Take a look at plotly::ggplotly(), or the gganimate package.

5) Show me something interesting

There’s a lot still to explore with this data set. Your final task is to show me something interesting. Drill down into the data and explain what’s driving the secular trends that we have observed above. Or highlight interesting seasonality within a particular country. Or go back to the original coal data frame and look at exports by customs district, or by coal type. Do we changes or trends there? Etcetera. Etcetera. My only requirement is that you show your work and tell me what you have found.

Japan_coal<-coal_country%>%
  filter(coal_destination_country=="India")%>%
  group_by(date)%>%
  mutate(total_dest=sum(total, na.rm=T))%>%
  mutate(total_metal=sum(metallurgical, na.rm=T))%>%
  mutate(total_steam=sum(steam_coal, na.rm=T))%>%
  distinct(date, coal_destination_country, .keep_all = TRUE)%>%
  select(date, destination_country_total, steam_coal, metallurgical, total_steam, total_metal)

Japan_coal[is.na(Japan_coal)] <- 0

Japan_Long<-melt(Japan_coal, id = "date", measure = c("destination_country_total", "total_steam", "total_metal"))

ggplot(Japan_Long)+
   ggtitle(label = "U.S. Coal Exports to Japan", subtitle = "2002 to 2019")+
  geom_line(aes(x=date, y=value/1000, color=variable))+
geom_vline(xintercept = as.numeric(as.Date("2011-04-01")), linetype=3, color="Yellow",)+
  scale_color_brewer()+
  theme_dark()+
  labs(colour="Type of Coal")+
  scale_y_continuous("Thousand Tons of Coal", labels = scales::comma)+
  theme( axis.title.x = element_blank(),
         plot.title = element_text(size=15,hjust = 0.5),
      plot.subtitle = element_text(size = 11, hjust = 0.5))

Japan_Long

##           date                  variable   value
## 1   2002-01-01 destination_country_total     179
## 2   2002-04-01 destination_country_total    9914
## 3   2002-10-01 destination_country_total     535
## 4   2003-01-01 destination_country_total   10880
## 5   2003-07-01 destination_country_total     175
## 6   2003-10-01 destination_country_total    8994
## 7   2004-01-01 destination_country_total   46027
## 8   2004-04-01 destination_country_total  344399
## 9   2004-07-01 destination_country_total  504364
## 10  2004-10-01 destination_country_total  196086
## 11  2005-01-01 destination_country_total  291178
## 12  2005-04-01 destination_country_total  629008
## 13  2005-07-01 destination_country_total   52522
## 14  2005-10-01 destination_country_total  454704
## 15  2006-01-01 destination_country_total  213590
## 16  2006-04-01 destination_country_total   57067
## 17  2006-07-01 destination_country_total  404930
## 18  2006-10-01 destination_country_total  383898
## 19  2007-01-01 destination_country_total  325332
## 20  2007-04-01 destination_country_total  212162
## 21  2007-07-01 destination_country_total   51546
## 22  2007-10-01 destination_country_total  294289
## 23  2008-01-01 destination_country_total  321982
## 24  2008-04-01 destination_country_total  425821
## 25  2008-07-01 destination_country_total  515946
## 26  2008-10-01 destination_country_total  402882
## 27  2009-01-01 destination_country_total  438400
## 28  2009-04-01 destination_country_total  557723
## 29  2009-07-01 destination_country_total  472808
## 30  2009-10-01 destination_country_total  593120
## 31  2010-01-01 destination_country_total  567421
## 32  2010-04-01 destination_country_total  954509
## 33  2010-07-01 destination_country_total  493828
## 34  2010-10-01 destination_country_total  706919
## 35  2011-01-01 destination_country_total 1226395
## 36  2011-04-01 destination_country_total 1527570
## 37  2011-07-01 destination_country_total  639073
## 38  2011-10-01 destination_country_total 1107067
## 39  2012-01-01 destination_country_total 1474171
## 40  2012-04-01 destination_country_total 1958639
## 41  2012-07-01 destination_country_total 1727985
## 42  2012-10-01 destination_country_total 1653138
## 43  2013-01-01 destination_country_total  859503
## 44  2013-04-01 destination_country_total 1045686
## 45  2013-07-01 destination_country_total 1054783
## 46  2013-10-01 destination_country_total  960722
## 47  2014-01-01 destination_country_total 1500787
## 48  2014-04-01 destination_country_total 1371768
## 49  2014-07-01 destination_country_total  843713
## 50  2014-10-01 destination_country_total  870405
## 51  2015-01-01 destination_country_total 2574784
## 52  2015-04-01 destination_country_total 2270539
## 53  2015-07-01 destination_country_total  597089
## 54  2015-10-01 destination_country_total  945602
## 55  2016-01-01 destination_country_total 1869932
## 56  2016-04-01 destination_country_total 1771137
## 57  2016-07-01 destination_country_total  481589
## 58  2016-10-01 destination_country_total 1405564
## 59  2017-01-01 destination_country_total 1886297
## 60  2017-04-01 destination_country_total 2335306
## 61  2017-07-01 destination_country_total 2896777
## 62  2017-10-01 destination_country_total 4346265
## 63  2018-01-01 destination_country_total 5153963
## 64  2018-04-01 destination_country_total 4832813
## 65  2018-07-01 destination_country_total 3729446
## 66  2018-10-01 destination_country_total 3471318
## 67  2019-01-01 destination_country_total 4388111
## 68  2019-04-01 destination_country_total 3646421
## 69  2019-07-01 destination_country_total 2258358
## 70  2002-01-01               total_steam     179
## 71  2002-04-01               total_steam       0
## 72  2002-10-01               total_steam     535
## 73  2003-01-01               total_steam       0
## 74  2003-07-01               total_steam     175
## 75  2003-10-01               total_steam       0
## 76  2004-01-01               total_steam       0
## 77  2004-04-01               total_steam   28370
## 78  2004-07-01               total_steam  219311
## 79  2004-10-01               total_steam     646
## 80  2005-01-01               total_steam       0
## 81  2005-04-01               total_steam  159726
## 82  2005-07-01               total_steam     263
## 83  2005-10-01               total_steam   79405
## 84  2006-01-01               total_steam       0
## 85  2006-04-01               total_steam     267
## 86  2006-07-01               total_steam   78231
## 87  2006-10-01               total_steam      66
## 88  2007-01-01               total_steam      22
## 89  2007-04-01               total_steam     773
## 90  2007-07-01               total_steam      66
## 91  2007-10-01               total_steam     707
## 92  2008-01-01               total_steam     444
## 93  2008-04-01               total_steam   79226
## 94  2008-07-01               total_steam     419
## 95  2008-10-01               total_steam      90
## 96  2009-01-01               total_steam     161
## 97  2009-04-01               total_steam     153
## 98  2009-07-01               total_steam     689
## 99  2009-10-01               total_steam     951
## 100 2010-01-01               total_steam     113
## 101 2010-04-01               total_steam     132
## 102 2010-07-01               total_steam     145
## 103 2010-10-01               total_steam  188568
## 104 2011-01-01               total_steam  231664
## 105 2011-04-01               total_steam  229458
## 106 2011-07-01               total_steam  237579
## 107 2011-10-01               total_steam     193
## 108 2012-01-01               total_steam  421855
## 109 2012-04-01               total_steam  562711
## 110 2012-07-01               total_steam  344002
## 111 2012-10-01               total_steam  687039
## 112 2013-01-01               total_steam  110975
## 113 2013-04-01               total_steam  199554
## 114 2013-07-01               total_steam  453873
## 115 2013-10-01               total_steam  113977
## 116 2014-01-01               total_steam  542108
## 117 2014-04-01               total_steam  405169
## 118 2014-07-01               total_steam  168330
## 119 2014-10-01               total_steam     293
## 120 2015-01-01               total_steam  753443
## 121 2015-04-01               total_steam  983807
## 122 2015-07-01               total_steam  238372
## 123 2015-10-01               total_steam  458943
## 124 2016-01-01               total_steam  679742
## 125 2016-04-01               total_steam  881690
## 126 2016-07-01               total_steam  140552
## 127 2016-10-01               total_steam  978861
## 128 2017-01-01               total_steam 1137416
## 129 2017-04-01               total_steam 1395132
## 130 2017-07-01               total_steam 1809947
## 131 2017-10-01               total_steam 3197290
## 132 2018-01-01               total_steam 2997699
## 133 2018-04-01               total_steam 2923202
## 134 2018-07-01               total_steam 3054197
## 135 2018-10-01               total_steam 2556630
## 136 2019-01-01               total_steam 3003943
## 137 2019-04-01               total_steam 2460187
## 138 2019-07-01               total_steam 1204539
## 139 2002-01-01               total_metal       0
## 140 2002-04-01               total_metal    9914
## 141 2002-10-01               total_metal       0
## 142 2003-01-01               total_metal   10880
## 143 2003-07-01               total_metal       0
## 144 2003-10-01               total_metal    8994
## 145 2004-01-01               total_metal   46027
## 146 2004-04-01               total_metal  316029
## 147 2004-07-01               total_metal  285053
## 148 2004-10-01               total_metal  195440
## 149 2005-01-01               total_metal  291178
## 150 2005-04-01               total_metal  469282
## 151 2005-07-01               total_metal   52259
## 152 2005-10-01               total_metal  375299
## 153 2006-01-01               total_metal  213590
## 154 2006-04-01               total_metal   56800
## 155 2006-07-01               total_metal  326699
## 156 2006-10-01               total_metal  383832
## 157 2007-01-01               total_metal  325310
## 158 2007-04-01               total_metal  211389
## 159 2007-07-01               total_metal   51480
## 160 2007-10-01               total_metal  293582
## 161 2008-01-01               total_metal  321538
## 162 2008-04-01               total_metal  346595
## 163 2008-07-01               total_metal  515527
## 164 2008-10-01               total_metal  402792
## 165 2009-01-01               total_metal  438239
## 166 2009-04-01               total_metal  557570
## 167 2009-07-01               total_metal  472119
## 168 2009-10-01               total_metal  592169
## 169 2010-01-01               total_metal  567308
## 170 2010-04-01               total_metal  954377
## 171 2010-07-01               total_metal  493683
## 172 2010-10-01               total_metal  518351
## 173 2011-01-01               total_metal  994731
## 174 2011-04-01               total_metal 1298112
## 175 2011-07-01               total_metal  401494
## 176 2011-10-01               total_metal 1106874
## 177 2012-01-01               total_metal 1052316
## 178 2012-04-01               total_metal 1395928
## 179 2012-07-01               total_metal 1383983
## 180 2012-10-01               total_metal  966099
## 181 2013-01-01               total_metal  748528
## 182 2013-04-01               total_metal  846132
## 183 2013-07-01               total_metal  600910
## 184 2013-10-01               total_metal  846745
## 185 2014-01-01               total_metal  958679
## 186 2014-04-01               total_metal  966599
## 187 2014-07-01               total_metal  675383
## 188 2014-10-01               total_metal  870112
## 189 2015-01-01               total_metal 1821341
## 190 2015-04-01               total_metal 1286732
## 191 2015-07-01               total_metal  358717
## 192 2015-10-01               total_metal  486659
## 193 2016-01-01               total_metal 1190190
## 194 2016-04-01               total_metal  889447
## 195 2016-07-01               total_metal  341037
## 196 2016-10-01               total_metal  426703
## 197 2017-01-01               total_metal  748881
## 198 2017-04-01               total_metal  940174
## 199 2017-07-01               total_metal 1086830
## 200 2017-10-01               total_metal 1148975
## 201 2018-01-01               total_metal 2156264
## 202 2018-04-01               total_metal 1909611
## 203 2018-07-01               total_metal  675249
## 204 2018-10-01               total_metal  914688
## 205 2019-01-01               total_metal 1384168
## 206 2019-04-01               total_metal 1186234
## 207 2019-07-01               total_metal 1053819

I wanted to see how Japan’s coal imports reacted to Fukushima nuclear power plant accident in 2011–delineated by the dotter yellow line. Apparently as far as coal imports from the US goes not too much changed. Of course, this could just mean that the U.S. isn’t an important supplier of coal for Japan, but this seems very unlikely given that Japan is in the United States’ top 10 export destinations. Something certainly happened around 2010–where we see the huge spike and subsequent fall, but I am not yet sure what event or policy took place then. I enjoyed using melt and adding a vertical line! I was frustrated that I couldn’t figure out how to add a label to the Fukushima line. I also want to learn how to change the names in the legend, not sure if thats possible.