Denmark is a gigantic small country in Europe (pun intended) with a population of 5.749 million (it's about 19.2 times the population of Iceland, my home country). I live here and work in logistics while I also practice my data analytics skills.
Recently I have been looking at time series analysis using Forecasting: Principles and Practice. It's a fantastic open book that goes deep into time series analysis, and one of the aspects it covers is plotting. I have always been a huge fan of visualization, which makes ggplot2 my absolute favourite package to “show” my data.
In this book, the authors go straight into visualizing time series data. This is what I want to do too, but not on the clean, ready-made data within the packages, so I downloaded a dataset from the Statistics Denmark website that has the total weight of goods transported by lorries in Denmark (measured in 1000s of tons) from 1998 to 2018. So let's do some plotting. I will also pass my ggplots through the ggplotly function from the plotly package to make them interactive, just for the extra cool and wow factor.
So let's load the data, transpose it since it is going the wrong way, and turn it into a time series object.
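library(forecast)  # autoplot() for ts objects, ggsubseriesplot()
library(ggplot2)   # xlab(), ylab(), theme(), geom_smooth()
library(plotly)    # ggplotly()
data <- read.csv("denmark_data2.csv", header = FALSE, sep = ";")  # semicolon-separated file, no header row
data2 <- t(data)  # transpose so the observations run down the rows
data2 <- ts(data2, frequency = 4, start = 1998)  # quarterly series starting in Q1 1998

This first plot is my favourite. We can see where the financial crash of 2009 happened and for how long it affected logistics in Denmark. The trend is going up but seems to be levelling off. This might be due to changes in consumer behaviour, more effective logistics methods, or whatever else might explain it.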
a <- autoplot(data2, color = "blue") +
  xlab("Quarterly measurements from 1998 to 2019") +
  ylab("Measurements in 1000s tons") +
  scale_x_continuous(breaks = seq(from = 1998, to = 2019, by = 1)) +  # one tick per year
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # tilt the year labels so they do not overlap
  theme(axis.text.y = element_text(angle = 55)) +
  geom_smooth(method = "loess", se = FALSE)  # smoothed trend line without a confidence band
ggplotly(a)  # turn the ggplot into an interactive plotly chart

Now I want to look at each quarter. To do this we use the ggsubseriesplot function, and this one is a doozy. We can see a steady rise as we close out the year. The summer quarter, Q3, has almost the same mean as Q4. It's interesting to see how the holidays during the winter months seem to generate more business in logistics than summer time.
b <- ggsubseriesplot(data2) +
  xlab("Quarterly measurements") +
  ylab("") +
  ggtitle("Transportation of goods measured in 1000s tons")
ggplotly(b)

Sometimes seeing something this simple makes you think about how much information one can extract from a single line of data (well, two lines in this case, since time is a variable, but you get the point). For good measure we can look at the strength of the seasonal pattern with an autocorrelation plot. The autocorrelation confirms the seasonality in the data, showing Q4 to always be largest with Q3 following thereafter, and tapering off as the lags increase.
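The code behind the autocorrelation plot is not shown here, so below is a minimal sketch of how such a plot could be drawn with the ggAcf function from the forecast package. The series in it is a made-up placeholder (random numbers with a quarterly pattern, not the Statistics Denmark data) so that the snippet runs on its own. The names series and acf_plot and the lag.max value are assumptions chosen only for illustration.

```r
library(forecast)  # ggAcf()
library(ggplot2)   # ggtitle()

# Placeholder data, NOT the real lorry figures: 84 quarters (1998 Q1 to 2018 Q4)
# with a repeating quarterly pattern plus noise. Swap in data2 to use the real series.
set.seed(1)
quarter_effect <- rep(c(-2000, 0, 1500, 2500), times = 21)
series <- ts(50000 + quarter_effect + rnorm(84, sd = 500),
             frequency = 4, start = 1998)

# Autocorrelation plot. With quarterly data, spikes at lags 4, 8, 12, ...
# point to a seasonal pattern.
acf_plot <- ggAcf(series, lag.max = 24) +
  ggtitle("Autocorrelation of the quarterly series")
acf_plot
```

With the real data, ggAcf(data2) should do the same job. The result is an ordinary ggplot object, so it should be possible to hand it to ggplotly like the other plots.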
There isn’t much more to this data, although more data would always be a goodie. But even in this simple illustration we can glimpse market behaviour just by looking at one plain series.
William of Ockham.
“All things being equal, the simplest solution tends to be the best one”.