Eurostat R tools

This R package provides tools to access Eurostat open data as part of the rOpenGov project.

For contact information and source code, see the GitHub page.

Installation

Release version for general use:

install.packages("eurostat")
library(eurostat)

Development version (potentially unstable):

install.packages("devtools")
library(devtools)
install_github("ropengov/eurostat")
library(eurostat)

Finding the data

The function getEurostatTOC downloads the table of contents of Eurostat datasets. Use the values in the ‘code’ column to download a selected dataset.

library(eurostat)

# Get Eurostat data listing
toc <- getEurostatTOC()
toc[200:210,]

##                                                                                            title
## 200                                           Gross value added at basic pricesby NUTS 3 regions
## 201                                               Employment (in 1000 persons) by NUTS 3 regions
## 202                                Gross fixed capital formation by NUTS 2 regions (NACE Rev. 2)
## 203                                    Compensation of employees by NUTS 2 regions (NACE Rev. 2)
## 204                            Employment (in 1000 hours worked) by NUTS 2 regions (NACE Rev. 2)
## 205                                Employment (in 1000 persons) by NUTS 3 regions (NACE Rev. 2) 
## 206                                                                   Household accounts - ESA95
## 207                         Allocation of primary income account of households by NUTS 2 regions
## 208                     Secondary distribution of income account of households by NUTS 2 regions
## 209                                                       Income of households by NUTS 2 regions
## 210                                                                Regional education statistics
##                 code    type last.update.of.data
## 200  nama_r_e3vabp95 dataset          18.06.2013
## 201  nama_r_e3empl95 dataset          11.06.2012
## 202  nama_r_e2gfcfr2 dataset          22.08.2014
## 203   nama_r_e2remr2 dataset          21.08.2014
## 204 nama_r_e2em95hr2 dataset          21.08.2014
## 205  nama_r_e3em95r2 dataset          21.08.2014
## 206        reg_ecohh  folder                    
## 207     nama_r_ehh2p dataset          27.03.2014
## 208     nama_r_ehh2s dataset          27.03.2014
## 209   nama_r_ehh2inc dataset          28.03.2014
## 210         reg_educ  folder                    
##     last.table.structure.change data.start data.end values
## 200                  26.06.2013       1995     2009     NA
## 201                  04.02.2014       1995     2009     NA
## 202                  08.08.2014       2000     2011     NA
## 203                  30.07.2014       2000     2011     NA
## 204                  14.08.2014       1995     2011     NA
## 205                  14.08.2014       2000     2012     NA
## 206                                                     NA
## 207                  25.03.2014       2000     2011     NA
## 208                  25.03.2014       2000     2011     NA
## 209                  24.03.2014       2000     2011     NA
## 210                                                     NA

With grepEurostatTOC you can search the table of contents for particular patterns, e.g. all datasets related to passenger transport.

# Search the table of contents for passenger transport datasets
head(grepEurostatTOC("passenger transport", type = "dataset"))

##                                                                                                                                         title
## 4945                                                                                            Volume of passenger transport relative to GDP
## 4946                                                                                                       Modal split of passenger transport
## 4985                                                          Railway transport - Total annual passenger transport (1 000 pass., million pkm)
## 4989                 International railway passenger transport from the reporting country to the country of disembarkation (1 000 passengers)
## 4990                    International railway passenger transport from the country of embarkation to the reporting country (1 000 passengers)
## 5341                                                                                             Air passenger transport by reporting country
##                 code    type last.update.of.data
## 4945   tran_hv_pstra dataset          25.06.2014
## 4946   tran_hv_psmod dataset          25.06.2014
## 4985   rail_pa_total dataset          14.08.2014
## 4989 rail_pa_intgong dataset          14.08.2014
## 4990 rail_pa_intcmng dataset          14.08.2014
## 5341       avia_paoc dataset          18.08.2014
##      last.table.structure.change data.start data.end values
## 4945                  25.06.2014       1995     2012     NA
## 4946                  25.06.2014       1990     2012     NA
## 4985                  10.07.2014       2004     2013     NA
## 4989                  10.07.2014       2002     2013     NA
## 4990                  10.07.2014       2002     2013     NA
## 5341                  13.08.2014       1993   2014Q2     NA

head(grepEurostatTOC("passenger transport", type = "table"))

##                                                              title
## 7105                 Volume of passenger transport relative to GDP
## 7106                            Modal split of passenger transport
## 7613                            Modal split of passenger transport
## 7737                            Modal split of passenger transport
## 7740                 Volume of passenger transport relative to GDP
##          code  type last.update.of.data last.table.structure.change
## 7105 tsdtr240 table          25.06.2014                  25.06.2014
## 7106 tsdtr210 table          25.06.2014                  25.06.2014
## 7613 tsdtr210 table          25.06.2014                  25.06.2014
## 7737 tsdtr210 table          25.06.2014                  25.06.2014
## 7740 tsdtr240 table          25.06.2014                  25.06.2014
##      data.start data.end values
## 7105       1995     2012     NA
## 7106       1990     2012     NA
## 7613       1990     2012     NA
## 7737       1990     2012     NA
## 7740       1995     2012     NA
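The search above can be thought of as a pattern match on the title column combined with a filter on the type column. The following is a minimal sketch of that idea in base R, not the package's actual implementation. The helper `search_toc` and the three-row `toc_demo` table are illustrative assumptions; the titles and codes are copied from the listings above.

```r
# Three rows copied from the listings above, standing in for the full TOC
toc_demo <- data.frame(
  title = c("Modal split of passenger transport",
            "Air passenger transport by reporting country",
            "Regional education statistics"),
  code  = c("tsdtr210", "avia_paoc", "reg_educ"),
  type  = c("table", "dataset", "folder"),
  stringsAsFactors = FALSE
)

# Illustrative helper (hypothetical, not part of the eurostat package):
# match the pattern against titles, optionally restricted to one type
search_toc <- function(toc, pattern, type = NULL) {
  hit <- grepl(pattern, toc$title, ignore.case = TRUE)
  if (!is.null(type)) hit <- hit & toc$type == type
  toc[hit, , drop = FALSE]
}

res <- search_toc(toc_demo, "passenger transport", type = "dataset")
res$code
```

Leaving `type` unset returns every matching entry regardless of whether it is a table, dataset or folder.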

Downloading the data

The package has two functions for downloading data. get_eurostat_raw returns the data in tabular format, whereas get_eurostat returns the dataset in the molten, or row-column-value (RCV), format. This document focuses on the indicator ‘Modal split of passenger transport’.

This indicator is defined as the percentage share of each mode of transport in total inland transport, expressed in passenger-kilometres (pkm). It is based on transport by passenger cars, buses and coaches, and trains. All data should be based on movements on national territory, regardless of the nationality of the vehicle. However, the data collection methodology is not harmonised at the EU level.
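As a small worked example of this definition, the share of each mode is its passenger-kilometres divided by the total over all three modes, times 100. The pkm figures below are made up for illustration and do not come from Eurostat; only the mode codes (CAR, BUS_TOT, TRN) are taken from the dataset.

```r
# Made-up passenger-kilometre totals for one country and year (illustrative only)
pkm <- c(CAR = 80, BUS_TOT = 12, TRN = 8)

# Percentage share of each mode in total inland passenger transport
share <- round(100 * pkm / sum(pkm), 1)
share
```

By construction the three shares add up to 100, which is what allows them to be drawn in a triangle plot.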

# Pick ID for the table
id <- unique(grepEurostatTOC("Modal split of passenger transport", 
                         type = "table")$code)
# Get table with the given ID
dat_raw <- get_eurostat_raw(id)
# Use the kable function from knitr for nicer table output
library(knitr)
kable(head(dat_raw))

vehicle.geo.time	X1990	X1991	X1992	X1993	X1994	X1995	X1996	X1997	X1998	X1999	X2000	X2001	X2002	X2003	X2004	X2005	X2006	X2007	X2008	X2009	X2010	X2011	X2012
BUS_TOT,AT	NA	10.6	10.5	10.7	10.6	10.9	10.7	10.9	10.9	10.7	11	10.9	10.9	10.9	11	10.5	10.4	10.8	10.2	9.6	10.3	10.1	10
BUS_TOT,BE	NA	10.1 e	10.3 e	10.3 e	10.4 e	11.2	11.2 e	11.1	11.1	10.7 e	10.5	10.7	11.4	12.5	12.7	13	13.2	13.4	12.5	12.5	12.2	12.3	12.4
BUS_TOT,BG	NA	NA	NA	NA	NA	28.0 e	26.3 e	28.5 e	30.3 e	33.5 e	31.4 b	32	33.4	28.1	25	24.3	22.7	21.8	20.8	16.8	16.4	15.9	16.9
BUS_TOT,CH	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	5.2	5.2	5.1	5.2	5.2	5.3	5.6	5.5	5.1	5.1	5.1	5.1	5.1
BUS_TOT,CY	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	22.3 e	22.5 e	22.6 e	23.6 e	21.2 e	20.8 e	20.4 e	19.7 e	18.8 e	17.6 e	18.1 e	18.3 e	18.7 e
BUS_TOT,CZ	NA	NA	NA	19.1 e	17.0 e	15.8 e	20.1 e	19.0 e	18.5 e	18.2 e	18.6	19.9	18.7	17.2	16	17.2	17.3	17	16.9	16	18.9	17	16.8

dat <- get_eurostat(id)
kable(head(dat))

vehicle	geo	time	value
BUS_TOT	AT	1990	NA
BUS_TOT	BE	1990	NA
BUS_TOT	BG	1990	NA
BUS_TOT	CH	1990	NA
BUS_TOT	CY	1990	NA
BUS_TOT	CZ	1990	NA
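To make the relationship between the two formats concrete, here is a rough base R sketch of how a raw table could be turned into the molten form: split the combined key column, stack the year columns, and strip the trailing flags (for example "e" for estimated values) before converting to numbers. This is an assumption-laden illustration, not the code get_eurostat runs internally. The two rows and two year columns are copied from the raw output above.

```r
# Two rows and two year columns copied from the raw table above
raw <- data.frame(
  vehicle.geo.time = c("BUS_TOT,AT", "BUS_TOT,BE"),
  X1991 = c("10.6", "10.1 e"),
  X1992 = c("10.5", "10.3 e"),
  stringsAsFactors = FALSE
)

# Split the combined key into its two dimensions (vehicle, geo)
keys <- do.call(rbind, strsplit(raw$vehicle.geo.time, ",", fixed = TRUE))

# Stack the year columns: one output row per (vehicle, geo, time)
year_cols <- grep("^X[0-9]{4}$", names(raw), value = TRUE)
long <- do.call(rbind, lapply(year_cols, function(col) {
  data.frame(
    vehicle = keys[, 1],
    geo     = keys[, 2],
    time    = as.integer(sub("^X", "", col)),
    # Drop a trailing flag such as " e" before converting to numeric
    value   = as.numeric(sub("[^0-9.]+$", "", raw[[col]])),
    stringsAsFactors = FALSE
  )
}))
long
```

The sketch discards the flags entirely. Whether that is acceptable depends on the analysis, since estimated values and series breaks may need separate treatment.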

Labelling the data

The function label_eurostat replaces the Eurostat codes with definitions from the Eurostat dictionaries in data frames created with get_eurostat.

datl <- label_eurostat(dat)

kable(head(datl))

vehicle	geo	time	value
Motor coaches, buses and trolley buses	Austria	1990	NA
Motor coaches, buses and trolley buses	Belgium	1990	NA
Motor coaches, buses and trolley buses	Bulgaria	1990	NA
Motor coaches, buses and trolley buses	Switzerland	1990	NA
Motor coaches, buses and trolley buses	Cyprus	1990	NA
Motor coaches, buses and trolley buses	Czech Republic	1990	NA
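Conceptually, labelling is a dictionary lookup from code to definition. A minimal sketch with a named character vector is shown below. The three-entry `geo_dic` is a stand-in for a real Eurostat dictionary, and falling back to the original code for unknown entries is a design choice of this sketch, not documented behaviour of label_eurostat.

```r
# Tiny stand-in dictionary; the real dictionaries are provided by Eurostat
geo_dic <- c(AT = "Austria", BE = "Belgium", BG = "Bulgaria")

# "XX" is a deliberately unknown code to show the fallback
codes <- c("AT", "BE", "XX")

# Look up each code by name; unknown codes come back as NA
labels <- unname(geo_dic[codes])

# Keep the original code where the dictionary has no entry
labels[is.na(labels)] <- codes[is.na(labels)]
labels
```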

Triangle plot for split of passenger transport

library(reshape)
tmp <- get_eurostat("tsdtr210")
bus   <- cast(tmp, geo ~ time, mean, subset = vehicle == "BUS_TOT")
car   <- cast(tmp, geo ~ time, mean, subset = vehicle == "CAR")
train <- cast(tmp, geo ~ time, mean, subset = vehicle == "TRN")

# select 2010 data
allTransports <- data.frame(bus = bus[,"2010"], 
                            car = car[,"2010"],
                            train = train[,"2010"])
# add country codes as row names
rownames(allTransports) <- levels(bus[,1])
allTransports <- na.omit(allTransports)

# triangle plot
library("plotrix")
triax.plot(allTransports, show.grid=TRUE, 
           label.points=TRUE, point.labels=rownames(allTransports), 
           pch=19)

[Figure: triangle plot of the 2010 modal split of passenger transport (bus, car, train) by country]

Working with country codes

Eurostat uses ISO2 country codes, the OECD uses ISO3 codes in its studies, and Statistics Finland uses full country names. There are (at least) two ways to convert between them. The first is to apply the label_eurostat function to your dataset.

tmp <- get_eurostat("tsdtr210")
tmpl <- label_eurostat(tmp)

kable(head(tmpl))

vehicle	geo	time	value
Motor coaches, buses and trolley buses	Austria	1990	NA
Motor coaches, buses and trolley buses	Belgium	1990	NA
Motor coaches, buses and trolley buses	Bulgaria	1990	NA
Motor coaches, buses and trolley buses	Switzerland	1990	NA
Motor coaches, buses and trolley buses	Cyprus	1990	NA
Motor coaches, buses and trolley buses	Czech Republic	1990	NA

The second option is to use the countrycode package to convert between these formats.

library("countrycode")

# Use the country codes from previous examples
countries <- rownames(allTransports)
head(countries)

## [1] "AT" "BE" "BG" "CH" "CZ" "DE"

# From ISO2 (used by Eurostat) into ISO3 (used by OECD)
head(countrycode(countries, "iso2c", "iso3c"))

## [1] "AUT" "BEL" "BGR" "CHE" "CZE" "DEU"

# From ISO2 (used by Eurostat) into short country names
head(countrycode(rownames(allTransports), "iso2c", "country.name"))

## [1] "Austria"        "Belgium"        "Bulgaria"       "Switzerland"   
## [5] "Czech Republic" "Germany"
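One caveat when converting: Eurostat's geo codes differ from ISO 3166-1 alpha-2 for Greece (EL instead of GR) and the United Kingdom (UK instead of GB), so a plain "iso2c" lookup may not recognise them. A possible workaround is to recode these two before calling countrycode. The helper `eurostat_to_iso2` below is a hypothetical illustration in base R and is not part of either package.

```r
# Hypothetical helper: map Eurostat-specific geo codes to ISO 3166-1 alpha-2
eurostat_to_iso2 <- function(codes) {
  fix <- c(EL = "GR", UK = "GB")
  hit <- codes %in% names(fix)
  # Replace only the codes that have a known exception
  codes[hit] <- unname(fix[codes[hit]])
  codes
}

eurostat_to_iso2(c("AT", "EL", "UK"))
```

Aggregate codes such as EU totals have no ISO equivalent and would still need to be filtered out or handled separately.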

Citing the package

This R package is based on the earlier CRAN packages statfi and smarterpoland. The datamart package contains related tools for Eurostat, but at the time of writing it appears to be at an experimental stage.

Citing the data: Kindly cite Eurostat.

Citing the R tools: This work can be freely used, modified and distributed under the BSD-2-clause (modified FreeBSD) license. Kindly cite the R package as ‘Leo Lahti, Przemyslaw Biecek, Janne Huovari and Markus Kainu (C) 2014. eurostat R package. URL: http://ropengov.github.io/eurostat’.

Session info

This tutorial was created with

sessionInfo()

## R version 3.1.1 (2014-07-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=fi_FI.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fi_FI.UTF-8        LC_COLLATE=fi_FI.UTF-8    
##  [5] LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=fi_FI.UTF-8   
##  [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] countrycode_0.17 plotrix_3.5-7    reshape_0.8.5    knitr_1.6       
## [5] eurostat_0.9.35  tidyr_0.1        plyr_1.8.1      
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.4     evaluate_0.5.5   formatR_1.0      htmltools_0.2.4 
##  [5] Rcpp_0.11.2      reshape2_1.4     rmarkdown_0.2.64 stringr_0.6.2   
##  [9] tools_3.1.1      yaml_2.1.13