This notebook creates an animated plot of England’s house price cycle, using data from the Land Registry’s UK House Price Index.
Broadly speaking, local house prices in England seem to follow a cyclical ‘ripple’ pattern, with the first phase showing rapid growth in the most expensive areas (generally in London or nearby), followed by a phase in which mid-priced areas grow most quickly, followed by a final phase in which prices grow fastest in the cheapest areas.
Thanks to Thomas Lin Pedersen for the amazing gganimate package, and to Hadley Wickham and other contributors for the tidyverse.
The first step is to load the packages we’ll be using.
library(tidyverse)
library(gganimate)
Now let’s download our data from the Land Registry’s UKHPI pages.
hpi <- read_csv("http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-prices-2018-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=average_price&utm_term=9.30_14_11_18")
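We only want data for English local authorities, not countries, regions or counties. Local authority codes in England start with E06 to E09, so let’s keep only the areas whose code sorts before ‘E10’ (this also drops the codes for the rest of the UK, which start with other letters). We also won’t be using the Average_Price_SA column so let’s get rid of that.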
hpif <- filter(hpi, Area_Code < "E10") %>%
  select(-ends_with("_SA"))
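The filter relies on R comparing character strings in sort order. Here is a minimal sketch of that comparison on a handful of codes picked purely for illustration (they are not read from the downloaded file):

```r
# Illustrative area codes: two local authorities, a county, a region,
# a country and a Welsh authority
codes <- c("E06000001", "E09000001", "E10000002",
           "E12000007", "E92000001", "W06000001")

# Only codes that sort before "E10" survive the comparison
kept <- codes[codes < "E10"]
kept
```

Only the two codes beginning E06 and E09 are kept; everything from E10 upwards, and anything starting with a later letter, is dropped.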
To compare recent price changes with earlier price levels, we need to create a lagged price variable. To reduce volatility, we’re going to calculate price growth over three years, so we need to lag prices by three years (36 months) too.
hpif <- hpif %>%
  arrange(Area_Code, Date) %>%
  group_by(Area_Code) %>%
  mutate(lag_price = lag(Average_Price, 36))
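To see what the grouped lag does, here is a minimal sketch on a made-up two-area table (the `toy` data and its column names are invented for illustration, and the lag is 2 rows rather than 36 to keep it short). Note that `lag()` shifts by rows rather than by calendar time, so the step above assumes each area has exactly one row per month with no gaps.

```r
library(dplyr)

# Made-up data: two areas, four months each
toy <- tibble(
  area  = rep(c("A", "B"), each = 4),
  month = rep(1:4, times = 2),
  price = c(100, 110, 120, 130, 200, 210, 220, 230)
)

# Lag within each area so prices never leak from one area into the next
toy_lagged <- toy %>%
  arrange(area, month) %>%
  group_by(area) %>%
  mutate(lag_price = lag(price, 2)) %>%
  ungroup()

toy_lagged
```

The first two rows of each area have no earlier price to look back to, so their `lag_price` is NA, which is why the first three years of the real data drop out later on.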
Now let’s calculate the average annual price change over three years. This is a simple average: the total growth over the period divided by three.
hpif <- hpif %>%
  mutate(Y3_price_change = (Average_Price / lag_price - 1) / 3)
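As a worked example with made-up numbers, suppose an area's average price rose from 200,000 to 230,000 over three years. The sketch below applies the same formula, and for comparison also computes a compound annual rate, which is an alternative measure and not the one used in this notebook.

```r
# Made-up prices for illustration
start_price <- 200000
end_price   <- 230000

# Simple average used above: 15% total growth spread evenly over three years
simple_annual <- (end_price / start_price - 1) / 3

# Compound annual growth rate, shown only for comparison
compound_annual <- (end_price / start_price)^(1 / 3) - 1

c(simple = simple_annual, compound = compound_annual)
```

The simple average comes out at 5% a year, and the compound rate is slightly lower because it allows for growth building on growth.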
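And then create a lagged price rank variable, ranking areas within each month by their price three years earlier, so that rank 1 is the area that was most expensive at the start of the period.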
hpif <- hpif %>%
  group_by(Date) %>%
  mutate(lag_price_rank = dense_rank(desc(lag_price)))
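A minimal sketch of how `dense_rank(desc(...))` behaves, using four made-up prices: the highest value gets rank 1, ties share a rank, and no ranks are skipped after a tie.

```r
library(dplyr)

# Made-up prices, including one tie
prices <- c(250000, 400000, 250000, 150000)

# Highest price ranks first; tied prices share a rank with no gap afterwards
ranks <- dense_rank(desc(prices))
ranks
```

Here the 400,000 area is rank 1, both 250,000 areas are rank 2 and the 150,000 area is rank 3.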
We can now filter out cases in the first three years (i.e. where the three year price change is NA), as well as filtering out the City of London as it’s so volatile due to small numbers of sales.
hpif <- hpif %>%
  filter(!is.na(Y3_price_change)) %>%
  filter(Region_Name != "City of London") %>%
  group_by(Date)
Let’s also join on the region that each local authority is in.
lookup <- rio::import("https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/migrationwithintheuk/datasets/userinformationenglandandwaleslocalauthoritytoregionlookup/june2017/lookuplasregionew2017.xlsx", skip=4)
hpif <- left_join(hpif, lookup, by=c("Area_Code" = "LA code"))
names(hpif) <- make.names(names(hpif))
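The `make.names()` call is what turns column names containing spaces into the dotted names used from here on. A minimal sketch, assuming the lookup's columns are called "LA code" (as in the join above) and "Region name" (inferred from the `Region.name` column used below):

```r
# Column names as they might look straight after the join
raw_names <- c("Area_Code", "LA code", "Region name")

# Spaces become dots; names that are already valid are left alone
clean_names <- make.names(raw_names)
clean_names
```

This means the region column can be referred to as `hpif$Region.name` without needing backticks.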
One of the region names is quite long so let’s change it.
hpif$Region.name[hpif$Region.name =="Yorkshire and The Humber"] <- "Yorks and Humber"
Change the region name variable into an ordered factor.
hpif$Region.name <- ordered(hpif$Region.name,
                            levels = c("North East", "North West", "Yorks and Humber",
                                       "East Midlands", "West Midlands", "East",
                                       "London", "South East", "South West"))
We are now ready to make an animated plot of lagged price rank versus percentage price change. First let’s create a base plot to work from.
p <- hpif %>%
  # sample_frac(1/2) %>%
  ggplot(aes(x = lag_price_rank, y = Y3_price_change)) +
  geom_point(aes(fill = Region.name), shape = 21, size = 3, alpha = 0.6) +
  stat_smooth(colour = "black") +
  scale_x_reverse(breaks = c(1, 100, 200, 300)) +
  scale_y_continuous(labels = scales::percent) +
  # Points are filled (shape 21) by region, so the palette goes on the fill scale
  scale_fill_brewer(palette = "Set1") +
  theme(legend.title = element_blank()) +
  labs(title = "Local house price growth in England by initial price rank: {frame_time}",
       subtitle = "Average annual percentage price growth over three years",
       caption = "Chart by @geographyjim, data from UK House Price Index",
       x = "Rank of average price at start of three-year period (1 is most expensive)",
       y = "")
Now we add animation. Because gganimate defaults to 100 frames but we have a relatively large number of time periods, we need to specify the number of frames as the number of different dates.
animate(p + transition_time(Date), nframes = nlevels(as.factor(hpif$Date)))
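The frame count above is simply the number of distinct dates in the data. A minimal sketch of the same count on a few made-up dates, using dplyr's `n_distinct()` as an equivalent and slightly more direct way to get it:

```r
library(dplyr)

# Made-up monthly dates with a repeat, standing in for hpif$Date
dates <- as.Date(c("2018-07-01", "2018-07-01", "2018-08-01", "2018-09-01"))

# Both expressions count the unique dates
frames_factor   <- nlevels(as.factor(dates))
frames_distinct <- n_distinct(dates)

c(frames_factor, frames_distinct)
```

Both give 3 here, so passing `nframes = n_distinct(hpif$Date)` to `animate()` should give one frame per month in the same way.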