This document is an example of exploratory data analysis (EDA) with R, using ASEAN trade statistics.
The goal of EDA is to raise questions for further inquiry by looking at the data. Because trade is indicative of economic makeup, we will use some techniques that compare across countries. Because trade is also a result of government policy and global economic conditions, we’ll explore how trade conditions change over time for each country to get a feel for cause-and-effect type questions. And lastly, because Southeast Asian economies are tightly integrated, we’ll introduce some ways to explore the dynamics of ASEAN as a unit.
ASEAN provides access to trade statistics on its website; however, the format is not easily machine-readable (the spacing is inconsistent), and the data is not in the preferred “tidy” format.
After downloading the data for total value of intra-ASEAN imports and exports by country pairs, I first rearranged the data in spreadsheet software before loading it in R. Here is what we have so far:
asean_trade_matrix <- read.csv("~/projects/asean-trade/asean_trade_matrix.csv")
head(asean_trade_matrix)
## Year Type Source Brunei.Darussalam Indonesia Lao.PDR
## 1 2000 Export Brunei Darussalam 0 31238742 0
## 2 2000 Import Brunei Darussalam 0 26802806 37374
## 3 2001 Export Brunei Darussalam 0 33148011 0
## 4 2001 Import Brunei Darussalam 0 27160939 2
## 5 2002 Export Brunei Darussalam 0 42526903 19093
## 6 2002 Import Brunei Darussalam 0 38783220 849
## Myanmar Philippines Singapore Viet.Nam X
## 1 16297 379997 170881421 267516 NA
## 2 62866 1985446 282213018 557554 NA
## 3 134728 392895 310818251 14587 NA
## 4 24034 3524788 271484296 647006 NA
## 5 9982 22846933 116556957 16011 NA
## 6 28755449 2626251 288914990 886589 NA
You can see that the Year, Type (export or import) and Source columns are already in a tidy format. However, destinations are split across columns, when Destination should be a single variable. We can use the tidyr package to fix that:
library(tidyr)
library(dplyr) # I like the 'pipe'-style operations of dplyr + magrittr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Remove the stray column introduced by saving ODF to CSV
asean_trade_matrix <- asean_trade_matrix %>%
  select(-X)
# Reshape from wide to long: one row per Year, Type, Source and Destination
data <- asean_trade_matrix %>%
  gather(Destination, Value, -c(Year, Type, Source))
head(data)
## Year Type Source Destination Value
## 1 2000 Export Brunei Darussalam Brunei.Darussalam 0
## 2 2000 Import Brunei Darussalam Brunei.Darussalam 0
## 3 2001 Export Brunei Darussalam Brunei.Darussalam 0
## 4 2001 Import Brunei Darussalam Brunei.Darussalam 0
## 5 2002 Export Brunei Darussalam Brunei.Darussalam 0
## 6 2002 Import Brunei Darussalam Brunei.Darussalam 0
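As an aside, newer versions of tidyr (1.0.0 and later) offer `pivot_longer()` for the same kind of reshape that `gather()` performs here. The sketch below uses a made-up two-row table rather than the ASEAN file, and the names `wide` and `long` are only illustrative.

```r
library(tidyr)

# Toy wide table with the same shape as asean_trade_matrix (values are made up)
wide <- data.frame(
  Year = c(2000, 2000),
  Type = c("Export", "Import"),
  Source = c("Philippines", "Philippines"),
  Indonesia = c(10, 20),
  Singapore = c(30, 40)
)

# Every column except the three identifiers becomes a Destination/Value pair
long <- pivot_longer(
  wide,
  cols = -c(Year, Type, Source),
  names_to = "Destination",
  values_to = "Value"
)

print(long)
```

One difference to be aware of: `pivot_longer()` emits the rows of the long table in a different order than `gather()`, so `head()` output will not look identical even though the contents match.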
Now that the data is tidy, we are ready to start working with it. Note that while it makes sense for Brunei-to-Brunei trade to be 0, the data does contain instances of a country reporting non-zero trade with itself. Why that happens is the first question raised in this EDA session.
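One way to list those self-trade rows is sketched below on a made-up stand-in for `data`. It assumes that the only difference between Source and Destination names is that `read.csv()` replaced spaces with dots in the column headers; `toy` and `self_trade` are illustrative names.

```r
library(dplyr)

# Toy stand-in for the tidy `data` frame (values are made up)
toy <- data.frame(
  Year = c(2000, 2000, 2000),
  Type = c("Export", "Export", "Export"),
  Source = c("Brunei Darussalam", "Lao PDR", "Lao PDR"),
  Destination = c("Brunei.Darussalam", "Lao.PDR", "Viet.Nam"),
  Value = c(0, 125, 300),
  stringsAsFactors = FALSE
)

self_trade <- toy %>%
  # Assumption: dots in Destination stand for the spaces used in Source
  mutate(Destination = gsub(".", " ", Destination, fixed = TRUE)) %>%
  # Keep only rows where a country reports non-zero trade with itself
  filter(Source == Destination, Value != 0)

print(self_trade)
```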
A relatively simple plot to get us started visualizes the Philippines’ largest ASEAN trading partners in 2014:
library(ggvis)
data %>%
  filter(Source == 'Philippines' & Year == 2014) %>%
  group_by(Type) %>%
  ggvis(~Destination, ~Value, fill = ~Type) %>%
  layer_bars(stack = TRUE)
This chart plots one categorical variable (Destination country) against one numeric variable (trade Value), with the added dimension of trade type (Export or Import) encoded here by color.
Singapore was by far the largest ASEAN trading partner for the Philippines in 2014. Note also that the Philippines did not run a large trade deficit with Singapore (a deficit meaning that the value of exports minus the value of imports is negative), but it did with its other two main trading partners, Indonesia and Viet Nam.
Because we have focused on only one country and only one year of a data set which contains many countries across 15 years (2000 to 2014), this graph would be considered cross-sectional. It represents a slice in time, but not trends across time.
To explore time series data, we can check how the Philippines’ trade position in ASEAN has changed between 2000 and 2014 by plotting its intra-ASEAN balance of trade:
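The per-partner balances read off the bar chart can also be computed directly. The following is a minimal sketch on made-up numbers (not the ASEAN figures), using the same `spread()` approach as the rest of this document; `toy` and `bilateral` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Toy cross-section for one reporting country and one year (values are made up)
toy <- data.frame(
  Year = 2014,
  Type = rep(c("Export", "Import"), times = 3),
  Source = "Philippines",
  Destination = rep(c("Indonesia", "Singapore", "Viet.Nam"), each = 2),
  Value = c(10, 40, 90, 85, 5, 25)
)

bilateral <- toy %>%
  filter(Source == "Philippines", Year == 2014) %>%
  spread(Type, Value) %>%               # one Export and one Import column
  mutate(Balance = Export - Import) %>% # negative values are deficits
  arrange(Balance)                      # largest deficit first

print(bilateral)
```

Sorting by `Balance` puts the partners with the largest deficits at the top, which is easier to scan than comparing stacked bars by eye.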
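data %>%
  filter(Destination == 'Philippines') %>%   # rows whose Destination column is the Philippines
  spread(Type, Value) %>%                    # one Export and one Import column
  mutate(Balance = Export - Import) %>%      # exports minus imports for each Source row
  group_by(Year) %>%
  summarize(`ASEAN BoT` = sum(Balance)) %>%  # total across Source countries per year
  ggvis(~Year, ~`ASEAN BoT`) %>%
  layer_lines()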
Time series visualizations are useful for spotting trends and anomalies. For example, we can see that the Philippines’ intra-ASEAN balance of trade has remained positive (exports exceed imports) and has grown over time. The spike in 2007 to 2008 may be related to the US financial crisis. I don’t know why levels have remained high since 2011. These would all be good questions to ask: why is the Philippines exporting more to ASEAN countries than it imports from them, what caused the spike in 2008, and what caused the sudden and sustained rise that began in 2011? Additionally, keep in mind that ASEAN trade represents only a portion of the Philippines’ overall trade, so we could ask: does the intra-ASEAN trade surplus move with the overall balance of trade? If not, what is different about the Philippines’ ASEAN trading partners versus the global ones?
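Spikes like these can also be flagged numerically rather than by eye. The sketch below uses made-up yearly balances, and the threshold (twice the median absolute year-over-year change) is an arbitrary assumption chosen for illustration, not a recommended rule.

```r
library(dplyr)

# Hypothetical yearly balance figures, made up for illustration
bot <- data.frame(
  Year = 2005:2010,
  Balance = c(100, 105, 110, 200, 205, 210)
)

changes <- bot %>%
  arrange(Year) %>%
  # Year-over-year change; the first year has no predecessor, so it is NA
  mutate(Change = Balance - dplyr::lag(Balance)) %>%
  # Flag years whose change is more than twice the median absolute change
  mutate(Flag = abs(Change) > 2 * median(abs(Change), na.rm = TRUE))

print(changes)
```

A flagged year is only a prompt for a question, in the same spirit as the plot: it says where to look, not why the change happened.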
Multi-series time series plots allow for both cross-sectional (multiple subjects) and time-series analysis. Multi-series time series are useful for spotting trends across countries or observing cross-cutting anomalies among other things.
data %>%
  spread(Type, Value) %>%
  mutate(Balance = Export - Import) %>%
  group_by(Destination, Year) %>%
  summarize(`ASEAN BoT` = sum(Balance)) %>%
  ggvis(~Year, ~`ASEAN BoT`, stroke = ~Destination) %>%
  layer_lines()
Immediately, you notice that Singapore is the only country running a significant trade deficit (imports exceed exports). The intuitive explanation is that Singapore has few natural resources and little primary industry of its own. However, notice that until 2008, Singapore was running a trade surplus. What is driving Singapore’s increasing trade deficit, and is that good or bad for its economy?
Alongside Singapore, Indonesia is the other major player in intra-ASEAN trade, but it is moving in the opposite direction: quickly out of a trade deficit and into a large trade surplus. Indonesia is rich in primary (resource) and secondary (manufacturing) industries, which might explain the difference with Singapore. What is driving Indonesia’s export growth over the last 15 years? Indonesia is a member of the G20; how much of its high GDP growth is driven by export-oriented manufacturing? How much of that is a result of statist development policy, and what does that say about the competence of the Indonesian government?
We can visualize tabular data as a chord diagram. A chord diagram is a visualization of a square matrix of directed values (or undirected values, if the matrix is symmetric).
We need to use the original data source again because it was already close to a matrix format. First we need to select either exports or imports for a single year. We’ll use exports in 2013.
The data is missing entries for Thailand and Cambodia. We could reconstruct their export numbers from the imports reported by the other countries. For now we just leave them out.
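To make the directed versus symmetric distinction concrete, here is a small sketch with a made-up 3x3 export matrix; the labels A, B and C and the values are purely illustrative.

```r
# Toy export matrix (made-up values): rows are exporters, columns are importers
m_toy <- matrix(
  c(0, 5, 2,
    1, 0, 7,
    4, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

# A directed flow matrix is generally not symmetric: A -> B differs from B -> A
print(isSymmetric(m_toy))

# Net flows: entry [i, j] is what i exports to j minus what j exports to i
net <- m_toy - t(m_toy)
print(net)
```

In a chord diagram of a matrix like `m_toy`, each chord has a different width at its two ends, which is how the direction of the imbalance shows up visually.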
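The reconstruction idea can be sketched as follows. This assumes the tidy data contains rows in which other countries report imports from the missing country, which may not hold for the columns shown in the `head()` output earlier; the values and the names `toy` and `mirror_exports` are made up for illustration.

```r
library(dplyr)

# Toy tidy data (made-up values); "Thailand" never appears as a Source
toy <- data.frame(
  Year = 2013,
  Type = c("Import", "Import", "Export"),
  Source = c("Singapore", "Indonesia", "Singapore"),
  Destination = c("Thailand", "Thailand", "Indonesia"),
  Value = c(50, 30, 70),
  stringsAsFactors = FALSE
)

mirror_exports <- toy %>%
  # Imports that partners report receiving from the missing country
  filter(Type == "Import", Destination == "Thailand") %>%
  # Swap the roles via a temporary column: the partner's import becomes
  # the missing country's export
  mutate(Reporter = Source,
         Source = Destination,
         Destination = Reporter,
         Type = "Export") %>%
  select(Year, Type, Source, Destination, Value)

print(mirror_exports)
```

Mirror figures like these are only approximations: imports are commonly valued including freight and insurance while exports are not, so a partner's reported imports rarely equal the exporter's own numbers.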
# Exports for a single year; columns 4 to 10 hold the destination countries
exports_2013 <- filter(asean_trade_matrix, Type == 'Export' & Year == 2013)
m <- as.matrix(exports_2013[4:10])
row.names(m) <- exports_2013[, 3]  # column 3 is Source
library(chorddiag)
# Drop rows 2 and 8, the countries left out above
chorddiag(m[-c(2, 8), ], groupnamePadding = 50,
          palette = 'Set3',
          showTooltips = TRUE)
## Warning in chorddiag(m[-c(2, 8), ], groupnamePadding = 50, palette =
## "Set3", : row names of the 'data' matrix differ from its column names or
## the 'groupNames' argument.
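The warning most likely arises because the row names come from the Source column (with spaces) while the column names were rewritten by `read.csv()` (with dots). A minimal sketch of aligning them is shown below on a made-up matrix. It assumes the rows and columns are already in the same country order, and `m_toy` is an illustrative name.

```r
# Toy matrix whose row and column labels differ only by dots (values are made up)
m_toy <- matrix(
  1:4, nrow = 2,
  dimnames = list(c("Brunei Darussalam", "Viet Nam"),
                  c("Brunei.Darussalam", "Viet.Nam"))
)

# Replace the dots with spaces so the column labels match the row labels
colnames(m_toy) <- gsub(".", " ", colnames(m_toy), fixed = TRUE)

print(identical(rownames(m_toy), colnames(m_toy)))
```

An alternative is to pass `check.names = FALSE` to `read.csv()` so the original column headers are kept, though names containing spaces then need backticks wherever they are referenced.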
As you can see from the graph, Singapore exports a lot more to Indonesia than it imports from it. Similarly, Myanmar exports a lot more to Singapore than it imports. I was surprised to see that Myanmar has the second-highest value of exports in this comparison.
One direction for further work is adding ASEAN+1 (or ASEAN+N) data. According to popular economic theories (the flying geese model, the Asian noodle bowl economy, etc.), East and Southeast Asia are part of a hub-and-spoke network economy of distributed manufacturing and assembly. Therefore, in order to understand the international trade dynamics, you would want to see the whole picture.
Another is building an app that allows you to filter any of the above visualizations by trade category. This would allow distinctions between trade in services and trade in goods, or by type of service or good. A lot is hidden in aggregated trade numbers: service-oriented economies like Singapore tend to be higher up the value-added chain and have higher wages as a result.