The deflated house price index (or real house price index) is the ratio between the house price index (HPI) and the national accounts deflator for private final consumption expenditure (households and non-profit institutions serving households (NPISHs) ). This indicator therefore measures inflation in the house market relative to inflation in the final consumption expenditure of households and NPISHs. Eurostat HPI captures price changes of all residential properties purchased by households (flats, detached houses, terraced houses, etc.), both new and existing, independently of their final use and their previous owners. Only market prices are considered, self-build dwellings are therefore excluded. The land component is included. The data are expressed as annual index 2015=100 and as 1 year % change.
We are interested here in inspecting the evolution of this index between 2010 and 2017.
I voluntarly did not get the data from the get_eurostat() function, so that I can do some more data manipulations and transformations (otherwise, data is more tidied from the function).
# Get the data
house_idx <- read_delim(file = "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/tipsho10.tsv.gz",
delim = "\t", # This is a tsv file, tabulation separated
na = ": ") # Missing values are represented using the string ": "
# Quick view to the data
house_idx## # A tibble: 56 x 19
## `unit,geo\\time` `2000 ` `2001 ` `2002 ` `2003 ` `2004 ` `2005 ` `2006 `
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 INX_A_AVG,AT 80.48 e 79.19 e 78.88 e 77.87 e 75.07 e 76.99 e 78.45 e
## 2 INX_A_AVG,BE 65.23 e 66.96 e 70.32 e 74.00 e 78.56 e "86.20… "91.76…
## 3 INX_A_AVG,BG <NA> 57.91 e 51.02 e 56.59 e 80.63 e 103.08… 115.84…
## 4 INX_A_AVG,CY <NA> <NA> 105.20… 102.68… 111.53… "114.4… "124.2…
## 5 INX_A_AVG,CZ 64.50 e 68.16 e 75.75 e 83.35 e 81.21 e 81.58 e 86.32 e
## 6 INX_A_AVG,DE 102.24… 100.63… 97.95 e 96.52 e 94.20 e "93.81… "92.53…
## 7 INX_A_AVG,DK 73.72 e 76.27 e 77.55 e 79.13 e 85.18 e "98.49… "119.5…
## 8 INX_A_AVG,EE <NA> <NA> <NA> <NA> <NA> "85.65… "120.6…
## 9 INX_A_AVG,EL 113.46… 126.20… 140.67… 144.68… 143.21… 154.32… 169.20…
## 10 INX_A_AVG,ES 84.57 e 89.70 e 100.66… 114.35… 129.26… 141.90… "155.3…
## # ... with 46 more rows, and 11 more variables: `2007 ` <chr>, `2008
## # ` <chr>, `2009 ` <chr>, `2010 ` <chr>, `2011 ` <chr>, `2012 ` <chr>,
## # `2013 ` <chr>, `2014 ` <chr>, `2015 ` <chr>, `2016 ` <chr>, `2017
## # ` <chr>
Several transformations to perform:
# Clean variable names
house_idx_clean <- house_idx %>%
janitor::clean_names()
# Separate 1st column into statistic/country
house_idx_clean <- house_idx_clean %>%
separate(col = unit_geo_time, into = c("statistic", "country"), sep = ",")For the next transformations, it will be easier to perform them using a tidy dataframe.
# Tidy the data
house_idx_tidy <- house_idx_clean %>%
gather(key = year, value = index_value, x2000:x2017)
# View the new dataframe
glimpse(house_idx_tidy)## Observations: 1,008
## Variables: 4
## $ statistic <chr> "INX_A_AVG", "INX_A_AVG", "INX_A_AVG", "INX_A_AVG"...
## $ country <chr> "AT", "BE", "BG", "CY", "CZ", "DE", "DK", "EE", "E...
## $ year <chr> "x2000", "x2000", "x2000", "x2000", "x2000", "x200...
## $ index_value <chr> "80.48 e", "65.23 e", NA, NA, "64.50 e", "102.24 e...
Change the country codes into their name.
# 'eu_countries' from 'eurostat' package contains a table with code and country names
head(eu_countries)## code name
## 1 BE Belgium
## 2 BG Bulgaria
## 3 CZ Czech Republic
## 4 DK Denmark
## 5 DE Germany
## 6 EE Estonia
# Replace country code by country names
house_idx_tidy <- house_idx_tidy %>%
inner_join(eu_countries, by = c("country" = "code")) %>%
# Reorder columns and drop the country code column
select(statistic, name, everything(), -country) %>%
# Rename country column
rename(country = name)# Keep only numeric values in 'year' and 'index_value'
house_idx_tidy <- house_idx_tidy %>%
mutate(year = parse_number(year),
index_value = parse_number(index_value))
# Convert 'statistic' and 'country' in factors
house_idx_tidy <- house_idx_tidy %>%
mutate_if(is.character, as.factor)
# Recode the 'statistic' levels to be understandable
house_idx_tidy <- house_idx_tidy %>%
mutate(statistic = fct_recode(statistic, "index" = "INX_A_AVG",
"rate" = "RCH_A_AVG"))Have a look at our neat table.
## Observations: 1,008
## Variables: 4
## $ statistic <fct> index, index, index, index, index, index, index, i...
## $ country <fct> Austria, Belgium, Bulgaria, Cyprus, Czech Republic...
## $ year <dbl> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 20...
## $ index_value <dbl> 80.48, 65.23, NA, NA, 64.50, 102.24, 73.72, NA, 11...
## # A tibble: 6 x 4
## statistic country year index_value
## <fct> <fct> <dbl> <dbl>
## 1 index Austria 2000 80.5
## 2 index Belgium 2000 65.2
## 3 index Bulgaria 2000 NA
## 4 index Cyprus 2000 NA
## 5 index Czech Republic 2000 64.5
## 6 index Germany 2000 102.
We are interested in the percentage value between 2017 annual average index and 2010 annual average index. To obtain this value, let’s untidy the data.
# Get the final data
house_idx_final <- house_idx_tidy %>%
# only keep the 'index' statistic and both 2010 and 2017 years
filter(statistic == "index",
year %in% c(2010, 2017)) %>%
# untidy the data to have both '2010' and '2017' columns
spread(key = year, value = index_value) %>%
# create the ratio value
mutate(index_perc = `2017` / `2010`) %>%
# change it into a percentage value
mutate(index_perc = if_else(index_perc > 1,
(index_perc - 1) * 100,
-(1 - index_perc) * 100))
house_idx_final## # A tibble: 28 x 5
## statistic country `2010` `2017` index_perc
## <fct> <fct> <dbl> <dbl> <dbl>
## 1 index Austria 85.7 111. 29.1
## 2 index Belgium 98.5 103. 4.21
## 3 index Bulgaria 112. 115. 3.06
## 4 index Croatia 118. 105. -10.9
## 5 index Cyprus 116. 103. -11.4
## 6 index Czech Republic 101. 116. 15.7
## 7 index Denmark 97.0 107. 10.8
## 8 index Estonia 72.5 106. 45.7
## 9 index Finland 104. 100. -3.39
## 10 index France 104. 103. -1.31
## # ... with 18 more rows
We now have all the data we need. Let’s make a plot of the index evolution.
#
ggplot(house_idx_final, aes(x = fct_reorder(.f = country, .x = index_perc), y = index_perc, fill = index_perc)) +
# Write the country names on the plot directly on the graph, not on the axis (see https://stackoverflow.com/questions/15107672/how-to-label-a-barplot-on-the-opposite-side-of-a-bar-of-ve-and-ve-values-with?noredirect=1)
geom_text(aes(label = country,
y = 0,
hjust = if_else(index_perc > 0, 1.05, -0.05)),
# change text appearance
family = "NewCenturySchoolbook",
fontface = "italic",
size = 3) +
# create a bar plot
geom_col(width = 0.4) +
# Flip the coordinates
coord_flip(ylim = c(-50, 60)) +
# change the filling colors
scale_fill_gradientn(colours = RColorBrewer::brewer.pal(n = 6, name = "RdYlGn")) +
# add the values of index, adding the 'percent' and 'positive' signs
geom_text(aes(label = if_else(index_perc > 0,
paste0("+", scales::percent(index_perc / 100)),
scales::percent(index_perc / 100)),
hjust = if_else(index_perc > 0, -0.2, 1.1)),
# change text appearance
family = "NewCenturySchoolbook",
fontface = "bold",
size = 3) +
# change titles and source
labs(title = "Southern Europe houses are much cheaper\nin 2017 compared to 2010",
subtitle = "Annual House Price Average Index evolution\nbetween 2010 and 2017",
caption = "Source: Eurostat (2018)") +
# tweak plot
theme(legend.position = "none", # no legend
panel.grid = element_blank(), # no grid
panel.background = element_rect(fill = "grey97"), # change the background color
axis.ticks = element_blank(), # no axis
axis.text = element_blank(), # no axis text
axis.title = element_blank(), # no axis title
# change title appearance
plot.title = element_text(hjust = 0.5, family = "Times", face = "bold", size = 20),
plot.subtitle = element_text(hjust = 0.5, family = "Times", face = "italic", size = 15),
plot.caption = element_text(family = "Times", face = "italic", size = 10),
# add border around the plot
plot.background = element_rect(color = "grey60")
)Let’s import the basic world map.
## long lat group order region subregion
## 1 -69.89912 12.45200 1 1 Aruba <NA>
## 2 -69.89571 12.42300 1 2 Aruba <NA>
## 3 -69.94219 12.43853 1 3 Aruba <NA>
## 4 -70.00415 12.50049 1 4 Aruba <NA>
## 5 -70.06612 12.54697 1 5 Aruba <NA>
## 6 -70.05088 12.59707 1 6 Aruba <NA>
We have a small issue in the map dataframe, since the country ‘United Kingdom’ is not present, replaced by the value ‘UK’.
# 'UK' and not 'United Kingdom
map %>%
filter(str_detect(string = region, pattern = paste(c("UK", "United Kingdom"), collapse = "|"))) %>%
distinct(region)## region
## 1 UK
# Change 'UK' into 'United Kingdom'
map <- map %>%
mutate(region = str_replace(string = region, pattern = "UK", replacement = "United Kingdom"))We can now include the index evolution in the mapping dataframe.
# Join datasets for mapping
data_map <- map %>%
inner_join(house_idx_final, by = c("region" = "country"))
head(data_map)## long lat group order region subregion statistic 2010
## 1 20.61133 60.04068 7 884 Finland Aland Islands index 103.73
## 2 20.60342 60.01694 7 885 Finland Aland Islands index 103.73
## 3 20.52178 60.01167 7 886 Finland Aland Islands index 103.73
## 4 20.48750 60.03276 7 887 Finland Aland Islands index 103.73
## 5 20.41123 60.03013 7 888 Finland Aland Islands index 103.73
## 6 20.39795 60.04068 7 889 Finland Aland Islands index 103.73
## 2017 index_perc
## 1 100.21 -3.393425
## 2 100.21 -3.393425
## 3 100.21 -3.393425
## 4 100.21 -3.393425
## 5 100.21 -3.393425
## 6 100.21 -3.393425
Let’s plot the final map!
# above, change the dimensiosn of the plot
ggplot(data_map, aes(x = long, y = lat, group = group, fill = index_perc)) +
# plot the borders of the countries, and fill countries with the index percentage value
geom_polygon() +
# change the coordinates system (more readable)
coord_map(projection = "albers", lat0 = 30, lat1 = 40) +
# change the filling colors
scale_fill_gradientn(colours = RColorBrewer::brewer.pal(n = 6, name = "RdYlGn"),
# change legend elements (title, breaks, and make percent values)
name = "Price Index\nEvolution",
breaks = seq(-40, 50, 10),
labels = scales::percent(x = seq(-40, 50, 10), scale = 1)) +
# add title and source
labs(subtitle = "Annual House Price Average Index evolution\nbetween 2010 and 2017",
caption = "Source: Eurostat (2018)") +
# tweak plot
theme(legend.position = "left", # change legend position
legend.title.align = 0.5, # center legned title
legend.title = element_text(family = "Times", size = 12), # change legent font
legend.key.height = unit(1.5, "cm"), # change legend dimensions
legend.text = element_text(family = "Times", face = "italic", size = 10), # change legend text appearance
panel.grid = element_blank(), # remove grid
panel.background = element_blank(), # remove background color
axis.ticks = element_blank(), # remove axis ticks
axis.text = element_blank(), # remove axis text
axis.title = element_blank(), # remove axis title
# change title and caption text appearance
plot.subtitle = element_text(hjust = 0.5, family = "Times", face = "italic", size = 15),
plot.caption = element_text(family = "Times", face = "italic", size = 10),
# add border around the plot
plot.background = element_rect(color = "grey60")
)