This R package provides tools to access Eurostat open data as part of the rOpenGov project.
For contact information and source code, see the GitHub page.
Release version for general use:
install.packages("eurostat")
library(eurostat)
Development version (potentially unstable):
install.packages("devtools")
library(devtools)
install_github("ropengov/eurostat")
library(eurostat)
The function getEurostatTOC downloads the table of contents of Eurostat datasets. The values in the ‘code’ column are the identifiers you pass to the download functions to fetch a selected dataset.
library(eurostat)
# Get Eurostat data listing
toc <- getEurostatTOC()
toc[200:210,]
## title
## 200 Gross value added at basic pricesby NUTS 3 regions
## 201 Employment (in 1000 persons) by NUTS 3 regions
## 202 Gross fixed capital formation by NUTS 2 regions (NACE Rev. 2)
## 203 Compensation of employees by NUTS 2 regions (NACE Rev. 2)
## 204 Employment (in 1000 hours worked) by NUTS 2 regions (NACE Rev. 2)
## 205 Employment (in 1000 persons) by NUTS 3 regions (NACE Rev. 2)
## 206 Household accounts - ESA95
## 207 Allocation of primary income account of households by NUTS 2 regions
## 208 Secondary distribution of income account of households by NUTS 2 regions
## 209 Income of households by NUTS 2 regions
## 210 Regional education statistics
## code type last.update.of.data
## 200 nama_r_e3vabp95 dataset 18.06.2013
## 201 nama_r_e3empl95 dataset 11.06.2012
## 202 nama_r_e2gfcfr2 dataset 22.08.2014
## 203 nama_r_e2remr2 dataset 21.08.2014
## 204 nama_r_e2em95hr2 dataset 21.08.2014
## 205 nama_r_e3em95r2 dataset 21.08.2014
## 206 reg_ecohh folder
## 207 nama_r_ehh2p dataset 27.03.2014
## 208 nama_r_ehh2s dataset 27.03.2014
## 209 nama_r_ehh2inc dataset 28.03.2014
## 210 reg_educ folder
## last.table.structure.change data.start data.end values
## 200 26.06.2013 1995 2009 NA
## 201 04.02.2014 1995 2009 NA
## 202 08.08.2014 2000 2011 NA
## 203 30.07.2014 2000 2011 NA
## 204 14.08.2014 1995 2011 NA
## 205 14.08.2014 2000 2012 NA
## 206 NA
## 207 25.03.2014 2000 2011 NA
## 208 25.03.2014 2000 2011 NA
## 209 24.03.2014 2000 2011 NA
## 210 NA
With grepEurostatTOC you can search the table of contents for particular patterns, e.g. all datasets related to passenger transport.
# info about passengers
head(grepEurostatTOC("passenger transport", type = "dataset"))
## title
## 4945 Volume of passenger transport relative to GDP
## 4946 Modal split of passenger transport
## 4985 Railway transport - Total annual passenger transport (1 000 pass., million pkm)
## 4989 International railway passenger transport from the reporting country to the country of disembarkation (1 000 passengers)
## 4990 International railway passenger transport from the country of embarkation to the reporting country (1 000 passengers)
## 5341 Air passenger transport by reporting country
## code type last.update.of.data
## 4945 tran_hv_pstra dataset 25.06.2014
## 4946 tran_hv_psmod dataset 25.06.2014
## 4985 rail_pa_total dataset 14.08.2014
## 4989 rail_pa_intgong dataset 14.08.2014
## 4990 rail_pa_intcmng dataset 14.08.2014
## 5341 avia_paoc dataset 18.08.2014
## last.table.structure.change data.start data.end values
## 4945 25.06.2014 1995 2012 NA
## 4946 25.06.2014 1990 2012 NA
## 4985 10.07.2014 2004 2013 NA
## 4989 10.07.2014 2002 2013 NA
## 4990 10.07.2014 2002 2013 NA
## 5341 13.08.2014 1993 2014Q2 NA
head(grepEurostatTOC("passenger transport", type = "table"))
## title
## 7105 Volume of passenger transport relative to GDP
## 7106 Modal split of passenger transport
## 7613 Modal split of passenger transport
## 7737 Modal split of passenger transport
## 7740 Volume of passenger transport relative to GDP
## code type last.update.of.data last.table.structure.change
## 7105 tsdtr240 table 25.06.2014 25.06.2014
## 7106 tsdtr210 table 25.06.2014 25.06.2014
## 7613 tsdtr210 table 25.06.2014 25.06.2014
## 7737 tsdtr210 table 25.06.2014 25.06.2014
## 7740 tsdtr240 table 25.06.2014 25.06.2014
## data.start data.end values
## 7105 1995 2012 NA
## 7106 1990 2012 NA
## 7613 1990 2012 NA
## 7737 1990 2012 NA
## 7740 1995 2012 NA
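The search above can be pictured as pattern matching on the title column followed by a filter on the entry type. The following is a minimal sketch of that idea and not the package's implementation: `toc_mock` is a hand-made stand-in for the real table of contents (its titles and codes are copied from the output above), and `search_toc` is a hypothetical helper whose case-insensitive matching is an assumption.

```r
# Hand-made stand-in for the table of contents; not downloaded from Eurostat.
toc_mock <- data.frame(
  title = c("Volume of passenger transport relative to GDP",
            "Modal split of passenger transport",
            "Modal split of passenger transport",
            "Air passenger transport by reporting country",
            "Regional education statistics"),
  code  = c("tsdtr240", "tsdtr210", "tsdtr210", "avia_paoc", "reg_educ"),
  type  = c("table", "table", "table", "dataset", "folder"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose title matches the pattern,
# optionally restricted to a single entry type.
search_toc <- function(toc, pattern, type = NULL) {
  hit <- grepl(pattern, toc$title, ignore.case = TRUE)
  if (!is.null(type)) hit <- hit & toc$type == type
  toc[hit, , drop = FALSE]
}

tables <- search_toc(toc_mock, "modal split", type = "table")

# The same table can be listed in several places, so de-duplicate the codes.
ids <- unique(tables$code)
ids
```

Because one table may appear under several branches of the table of contents, taking `unique()` of the matched codes yields a single identifier to download.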
The package has two functions for downloading data. get_eurostat_raw returns the data in tabular (wide) format, whereas get_eurostat returns the dataset in molten, row-column-value (RCV) format. This document focuses on a single indicator, Modal split of passenger transport.
This indicator is defined as the percentage share of each mode of transport in total inland transport, expressed in passenger-kilometres (pkm). It is based on transport by passenger cars, buses and coaches, and trains. All data should be based on movements on national territory, regardless of the nationality of the vehicle. However, the data collection methodology is not harmonised at the EU level.
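As a worked illustration of the definition above, the share of each mode is its passenger-kilometres divided by the total over all three modes, times 100. The passenger-kilometre figures below are hypothetical numbers chosen for easy arithmetic and are not Eurostat data.

```r
# Hypothetical passenger-kilometre totals (billion pkm) for one country and year.
pkm <- c(car = 80, bus = 12, train = 8)

# Percentage share of each mode in total inland passenger transport.
share <- 100 * pkm / sum(pkm)
share

# The three shares add up to 100 by construction.
sum(share)
```

Since the three shares always sum to 100, each country-year can be described by a point in a triangle, which is what the triangle plot later in this tutorial relies on.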
# Pick ID for the table
id <- unique(grepEurostatTOC("Modal split of passenger transport",
                             type = "table")$code)
# Get table with the given ID
dat_raw <- get_eurostat_raw(id)
# Use the kable function from knitr for nicer table output
library(knitr)
kable(head(dat_raw))
| vehicle.geo.time | X1990 | X1991 | X1992 | X1993 | X1994 | X1995 | X1996 | X1997 | X1998 | X1999 | X2000 | X2001 | X2002 | X2003 | X2004 | X2005 | X2006 | X2007 | X2008 | X2009 | X2010 | X2011 | X2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BUS_TOT,AT | NA | 10.6 | 10.5 | 10.7 | 10.6 | 10.9 | 10.7 | 10.9 | 10.9 | 10.7 | 11 | 10.9 | 10.9 | 10.9 | 11 | 10.5 | 10.4 | 10.8 | 10.2 | 9.6 | 10.3 | 10.1 | 10 |
| BUS_TOT,BE | NA | 10.1 e | 10.3 e | 10.3 e | 10.4 e | 11.2 | 11.2 e | 11.1 | 11.1 | 10.7 e | 10.5 | 10.7 | 11.4 | 12.5 | 12.7 | 13 | 13.2 | 13.4 | 12.5 | 12.5 | 12.2 | 12.3 | 12.4 |
| BUS_TOT,BG | NA | NA | NA | NA | NA | 28.0 e | 26.3 e | 28.5 e | 30.3 e | 33.5 e | 31.4 b | 32 | 33.4 | 28.1 | 25 | 24.3 | 22.7 | 21.8 | 20.8 | 16.8 | 16.4 | 15.9 | 16.9 |
| BUS_TOT,CH | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5.2 | 5.2 | 5.1 | 5.2 | 5.2 | 5.3 | 5.6 | 5.5 | 5.1 | 5.1 | 5.1 | 5.1 | 5.1 |
| BUS_TOT,CY | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 22.3 e | 22.5 e | 22.6 e | 23.6 e | 21.2 e | 20.8 e | 20.4 e | 19.7 e | 18.8 e | 17.6 e | 18.1 e | 18.3 e | 18.7 e |
| BUS_TOT,CZ | NA | NA | NA | 19.1 e | 17.0 e | 15.8 e | 20.1 e | 19.0 e | 18.5 e | 18.2 e | 18.6 | 19.9 | 18.7 | 17.2 | 16 | 17.2 | 17.3 | 17 | 16.9 | 16 | 18.9 | 17 | 16.8 |
dat <- get_eurostat(id)
kable(head(dat))
| vehicle | geo | time | value |
|---|---|---|---|
| BUS_TOT | AT | 1990 | NA |
| BUS_TOT | BE | 1990 | NA |
| BUS_TOT | BG | 1990 | NA |
| BUS_TOT | CH | 1990 | NA |
| BUS_TOT | CY | 1990 | NA |
| BUS_TOT | CZ | 1990 | NA |
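The relation between the two formats can be sketched in base R. This is an illustration under stated assumptions and not how the package does the conversion: `raw` is a hand-made stand-in holding a few cells of the raw table above, and the flag handling assumes that flags are trailing letters such as `e` (estimated) or `b` (break in time series) separated from the number by a space.

```r
# Hand-made stand-in for a few cells of the raw (wide) table shown above.
raw <- data.frame(
  vehicle.geo.time = c("BUS_TOT,AT", "BUS_TOT,BE"),
  X1991 = c("10.6", "10.1 e"),
  X1992 = c("10.5", "10.3 e"),
  stringsAsFactors = FALSE
)

# Split the combined key "vehicle,geo" into its two parts.
keys <- do.call(rbind, strsplit(raw$vehicle.geo.time, ",", fixed = TRUE))

# Stack the year columns into one row per (vehicle, geo, time) combination.
year_cols <- setdiff(names(raw), "vehicle.geo.time")
long <- data.frame(
  vehicle = rep(keys[, 1], times = length(year_cols)),
  geo     = rep(keys[, 2], times = length(year_cols)),
  time    = rep(as.integer(sub("^X", "", year_cols)), each = nrow(raw)),
  value   = unlist(raw[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop trailing flags such as " e" and convert the values to numbers.
long$value <- as.numeric(sub("[[:space:]]*[[:alpha:]]+$", "", long$value))
long
```

The long format keeps one observation per row, which is convenient for subsetting, reshaping and plotting by vehicle, country or year.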
The function label_eurostat replaces the Eurostat codes with definitions from the Eurostat dictionaries in data frames created with get_eurostat.
datl <- label_eurostat(dat)
kable(head(datl))
| vehicle | geo | time | value |
|---|---|---|---|
| Motor coaches, buses and trolley buses | Austria | 1990 | NA |
| Motor coaches, buses and trolley buses | Belgium | 1990 | NA |
| Motor coaches, buses and trolley buses | Bulgaria | 1990 | NA |
| Motor coaches, buses and trolley buses | Switzerland | 1990 | NA |
| Motor coaches, buses and trolley buses | Cyprus | 1990 | NA |
| Motor coaches, buses and trolley buses | Czech Republic | 1990 | NA |
library(reshape)
tmp <- get_eurostat("tsdtr210")
bus <- cast(tmp, geo ~ time , mean, subset= vehicle=="BUS_TOT")
car <- cast(tmp, geo ~ time , mean, subset= vehicle=="CAR")
train <- cast(tmp, geo ~ time , mean, subset= vehicle=="TRN")
# select 2010 data
allTransports <- data.frame(bus = bus[,"2010"],
                            car = car[,"2010"],
                            train = train[,"2010"])
# add country names
rownames(allTransports) <- levels(bus[,1])
allTransports <- na.omit(allTransports)
# triangle plot
library("plotrix")
triax.plot(allTransports, show.grid=TRUE,
           label.points=TRUE, point.labels=rownames(allTransports),
           pch=19)
Eurostat uses ISO2 codes for countries, the OECD uses ISO3 codes in its studies, and Statistics Finland uses full country names, so data from these sources cannot be combined without first converting the country identifiers. There are (at least) two ways to solve this. The first is to apply the label_eurostat function to your dataset.
tmp <- get_eurostat("tsdtr210")
tmpl <- label_eurostat(tmp)
kable(head(tmpl))
| vehicle | geo | time | value |
|---|---|---|---|
| Motor coaches, buses and trolley buses | Austria | 1990 | NA |
| Motor coaches, buses and trolley buses | Belgium | 1990 | NA |
| Motor coaches, buses and trolley buses | Bulgaria | 1990 | NA |
| Motor coaches, buses and trolley buses | Switzerland | 1990 | NA |
| Motor coaches, buses and trolley buses | Cyprus | 1990 | NA |
| Motor coaches, buses and trolley buses | Czech Republic | 1990 | NA |
A second option is to use the countrycode package to convert between these formats.
library("countrycode")
# Use the country codes from previous examples
countries <- rownames(allTransports)
head(countries)
## [1] "AT" "BE" "BG" "CH" "CZ" "DE"
# From ISO2 (used by Eurostat) into ISO3 (used by OECD)
head(countrycode(countries, "iso2c", "iso3c"))
## [1] "AUT" "BEL" "BGR" "CHE" "CZE" "DEU"
# From ISO2 (used by Eurostat) into short country names
head(countrycode(rownames(allTransports), "iso2c", "country.name"))
## [1] "Austria" "Belgium" "Bulgaria" "Switzerland"
## [5] "Czech Republic" "Germany"
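One caveat when converting, given here as background that the examples above do not exercise: Eurostat's geo codes differ from ISO 3166-1 alpha-2 for Greece (EL instead of GR) and the United Kingdom (UK instead of GB). A minimal sketch of recoding these two cases before an ISO2-based conversion follows; `eurostat_to_iso2` is a hypothetical helper and not part of the eurostat or countrycode packages.

```r
# Hypothetical helper: map Eurostat geo codes onto ISO 3166-1 alpha-2 codes.
eurostat_to_iso2 <- function(geo) {
  # The two country codes where Eurostat deviates from ISO 3166-1 alpha-2.
  exceptions <- c(EL = "GR", UK = "GB")
  geo <- as.character(geo)
  fix <- geo %in% names(exceptions)
  geo[fix] <- unname(exceptions[geo[fix]])
  geo
}

eurostat_to_iso2(c("AT", "EL", "UK", "FI"))
## [1] "AT" "GR" "GB" "FI"
```

The recoded vector can then be passed to a converter that expects ISO2 input, such as the countrycode calls shown above.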
This R package is based on the earlier CRAN packages statfi and smarterpoland. The datamart package contains related tools for Eurostat, but at the time of writing it appears to be at an experimental stage.
Citing the data: Kindly cite Eurostat.
Citing the R tools: This work can be freely used, modified and distributed under the BSD-2-clause (modified FreeBSD) license. Kindly cite the R package as ‘Leo Lahti, Przemyslaw Biecek, Janne Huovari and Markus Kainu (C) 2014. eurostat R package. URL: http://ropengov.github.io/eurostat’.
This tutorial was created with:
sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
## [1] LC_CTYPE=fi_FI.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fi_FI.UTF-8 LC_COLLATE=fi_FI.UTF-8
## [5] LC_MONETARY=fi_FI.UTF-8 LC_MESSAGES=fi_FI.UTF-8
## [7] LC_PAPER=fi_FI.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] countrycode_0.17 plotrix_3.5-7 reshape_0.8.5 knitr_1.6
## [5] eurostat_0.9.35 tidyr_0.1 plyr_1.8.1
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.4 evaluate_0.5.5 formatR_1.0 htmltools_0.2.4
## [5] Rcpp_0.11.2 reshape2_1.4 rmarkdown_0.2.64 stringr_0.6.2
## [9] tools_3.1.1 yaml_2.1.13