Image by Anton_Sokolov

INTRODUCTION

This project analyses the data of all registered property sales in England and Wales that are sold for full market value from 2018 to 2021.

The purpose of this project is to create insights from the UK houses prices paid data sets by analyzing the trends in the residential property price information in other to identify opportunities in the UK real estate business for both buyers and sellers of properties in the UK.

R programming language was used for the data exploration, data cleaning, data transformation and data visualizations.

DESCRIPTION OF DATA SOURCE USED

For this analysis the Prices paid data for 2018, 2019, 2020 and 2021 is used, this data sets were made available by the HM Land Registry © Crown copyright and database right 2021. This data is licensed under the Open Government License v3.0.

The available fields are as follows:

  • Transaction unique identifier; a reference number which is generated automatically recording each published sale. The number is unique and will change each time a sale is recorded.

  • Price Sale; price stated on the transfer deed.

  • Date of Transfer; Date when the sale was completed, as stated on the transfer deed.

  • Property Type; where D represents Detached, S represents Semi-Detached, T represents Terraced including end-of-terrace properties, F represents Flats/Maisonettes and O for Other where the transaction relates to a property type that is not covered by existing values.

  • Old/New; indicates the age of the property and applies to all price paid transactions, residential and non-residential. Where Y represents a newly built property, N represents an established residential building.

  • Duration; this relates to the tenure where F represents Freehold and L represents Leasehold. It is worth noting that HM Land Registry does not record leases of 7 years or less in the Prices Paid Data sets.

  • Town/City.

  • District.

  • County. Note that Due to data-privacy issues for this analysis address details have been truncated down to the town/city level.

  • PPD Category Type; this indicates the type of Price Paid transaction. Where A represents the Standard Price Paid entry, which includes single residential property sold for full market value and B represents any additional Price Paid entry including transfers under a power of sale/repossessions, buy-to-lets (where they can be identified by a Mortgage) and transfers to non-private individuals. Note that category B does not separately identify the transaction types stated. HM Land Registry has been collecting information on Category A transactions from January 1995. Category B transactions were identified from October 2013.

load("~/PP.RData")
library(dplyr)
glimpse(PP_data)
## Rows: 3,894,704
## Columns: 10
## $ `Transaction unique identifier` <chr> "{7C2D0701-024F-4963-E053-6B04A8C07B97~
## $ `Price Sale`                    <dbl> 5000, 320000, 650000, 350000, 575000, ~
## $ `Date of Transfer`              <date> 2018-10-02, 2018-08-29, 2018-11-02, 2~
## $ `Property Type`                 <chr> "O", "O", "O", "T", "O", "O", "O", "T"~
## $ `Old/New`                       <chr> "N", "N", "N", "N", "N", "N", "N", "N"~
## $ Duration                        <chr> "L", "F", "F", "F", "F", "F", "F", "F"~
## $ `Town/City`                     <chr> "PADSTOW", "MARAZION", "PADSTOW", "FAL~
## $ District                        <chr> "CORNWALL", "CORNWALL", "CORNWALL", "C~
## $ County                          <chr> "CORNWALL", "CORNWALL", "CORNWALL", "C~
## $ `PPD Category Type`             <chr> "B", "B", "B", "B", "B", "B", "B", "B"~

Above is a glimpse of the Data set that is being used for this analysis.

MANIPULATION OF THE DATA

The data cleaning and transformation process carried out includes;

  • All four data sets were imported as one data set known as ‘PP_data’

  • The date format was changed from character to date format.

  • Filtering the price sale column to show only prices more than £1,000, this is because the minimum down payment for house is usually at 5% of the market value and it was assumed that the cheapest house in the UK could go for £20,000.

glimpse(PP_data1)
## Rows: 3,891,205
## Columns: 10
## $ `Transaction unique identifier` <chr> "{64342BFD-F066-422C-E053-6C04A8C0FB8A~
## $ `Price Sale`                    <dbl> 220000, 250000, 130000, 575000, 570000~
## $ `Date of Transfer`              <date> 2018-01-01, 2018-01-01, 2018-01-01, 2~
## $ `Property Type`                 <chr> "T", "D", "T", "D", "D", "S", "S", "S"~
## $ `Old/New`                       <chr> "N", "N", "N", "N", "N", "N", "N", "N"~
## $ Duration                        <chr> "F", "F", "F", "F", "F", "F", "F", "L"~
## $ `Town/City`                     <chr> "EXETER", "MALVERN", "CWMBRAN", "COVEN~
## $ District                        <chr> "EXETER", "MALVERN HILLS", "TORFAEN", ~
## $ County                          <chr> "DEVON", "WORCESTERSHIRE", "TORFAEN", ~
## $ `PPD Category Type`             <chr> "A", "A", "A", "A", "A", "A", "A", "A"~

Above is a glimpse of the Data set after carrying out the Data Cleaning process.

SUMMARY OF ANALYSIS

For the purpose of this analysis the data sets have been divided into six clusters based on the Price Sale variable, this is to reduce the magnitude of items to visualize. The new tables will show the mean of the variable based on Price Sale, Standard Deviation of the variable based on the Price Sale and a count of the number of sales per variable.

The following is a list of the newly created tables:

  • Date of Transfer
  • Location
  • Property Type
  • Old/New
  • Duration
  • PPD Category Type

Date of Transfer

This table shows the summary of data for Housing Prices Paid on a monthly basis, from the table we can see that the month with the highest number of housing sales between 2018 to 2021 in the UK is June 2021 with an average price of £440,942 per house, this could be as a result of the recovery from the pandemic.After which the number of sales has been fluctuating.

library("DT")
D_O_T %>%
  datatable(extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

Also from the line graph below, the analysis showed that the month with the lowest number of housing sales between 2018 to 2021 in the UK is April 2020 with a total number of sales of 31,109, this is the time that the pandemic is at the highest and could be seen as a major factor for that. It can be concluded that the ability of people to buy houses in the UK is correlated with the external economic factor.

library(ggplot2)
ggplot(data = D_O_T,
       mapping = aes(x = `month`, y = `Number_of_Sales`)) +
  geom_line(stat='identity', color="red") +
  labs(x = "Date", y = "Number of Sales",
       title = "Number of Sales Per Month (2018 - 2021)") +
  theme_bw()

Looking at the Bar chart below, it can be concluded that regardless of the economic factors the average price of houses in the UK remained between £310,744 and £459,354. This shows the value of the houses in the UK are valid as the external economic factor could not lead to a crash in housing prices.

ggplot(data = D_O_T,
       mapping = aes(x = `month`, y = `Mean_Price_Sale`, 
                     fill = `Mean_Price_Sale`)) +
  geom_bar(stat='identity', color="white") +
  labs(x = "Date", y = "Average Sales",
       title = "Average Sales Per Month (2018 - 2021)") +
  theme_void() 

Location

This table shows the summary of data for Housing Prices Paid based on the sale per county between 2018 to 2021, the data showed a record of sales in 115 counties in the UK. The analysis showed that the county with the smallest number of housing sales between 2018 to 2021 is ISLES OF SCILLY with a total number of sales of 111 in four years, While the county the highest number of housing sales between 2018 to 2021 is GREATER LONDON with a total number of sales of 425,286 in four years.

L %>%
  datatable(extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

Looking at the wordcloud below which shows the average sale by counties, we can see that GREATER LONDON has the highest average sale this is evident in the fact that it is the economic capital of the country, While the county with the lowest average sale per county is BLAENAU GWENT with an average sale of £112,617.

library(ggwordcloud)

ggplot(data = `L` , 
       mapping = aes(label = `County`, size = `Mean_Price_Sale`,
                     color = factor(sample.int(10, nrow(`L`), replace = TRUE)))) +
  geom_text_wordcloud(stat = "identity", area_corr_power = 1) +
  labs(title = "Average Sale by Counties (2018 - 2021)") +
  scale_size_area(max_size = 5) +
  theme_bw() +
  scale_fill_brewer(palette="Set1")

Property Type

This table shows the summary of data for Housing Prices Paid based on the Property Types, there are 5 different property types where D represents Detached, S represents Semi-Detached, T represents Terraced including end-of-terrace properties, F represents Flats/Maisonettes and O for Other where the transaction relates to a property type that is not covered by existing values.

P_T %>%
  datatable(extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

The Bar Chart below shows the Number of sales per Property type between 2018 to 2021, It can be identified that terraced properties is the most popular type of house bought in the UK followed by Semi-detached properties.

ggplot(data = P_T,
       mapping = aes(x = `Property Type`, y = `Number_of_Sales`,
                     fill = `Property Type`)) +
  geom_bar(stat = "identity", color="black")+
  labs(x = " Property Type", y = "Average Price per sale",
       title = "Number of Sales Per Property type (2018 - 2021)")+
  theme_void()+
  theme(legend.position="none")+
  geom_text(mapping = aes(y = `Number_of_Sales`, label = `Property Type`), 
            color = "white",
            size =6,
            stat = "identity",
            position = position_stack(vjust = 0.5)) +
  scale_fill_brewer(palette="Set1")

Looking at the Histogram below can see that the other property types have the highest average sale per.

ggplot(data = P_T,
       mapping = aes(x = `Property Type`, y = `Mean_Price_Sale`,
                     fill = `Property Type`)) +
  geom_histogram(stat = "identity", color = "black") +
  labs(x = " Property Type", y = "Average Price per sale",
       title = "Average Sale Per Property Type (2018 - 2021)") +
  theme_bw() +
  scale_fill_brewer(palette="Set1")

Old/New

This table shows the summary of data for Housing Prices Paid based on the age of the property. Where Y represents a newly built property, N represents an established residential building.

O_N %>%
  datatable(extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

From the table above, it can be identified that there are more sales of established residential buildings than of newly built properties.

Duration

This table shows the summary of data for Housing Prices Paid based on the tenure where F represents Freehold and L represents Leasehold within the time period in check.

D %>%
  datatable(extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

From the table above, it can be identified that there are more sales of Freehold properties than Leasehold properties.

PPD Category Type

This table shows the summary of data for Housing Prices Paid based on the type of Price Paid transaction. Where A represents the Standard Price Paid entry, which includes single residential property sold for full market value and B represents any additional Price Paid entry including transfers under a power of sale/repossessions, buy-to-lets (where they can be identified by a Mortgage) and transfers to non-private individuals.

P_C_T%>%
  datatable(extensions = 'Buttons',
            options = list(dom = 'Blfrtip',
                           buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
                           lengthMenu = list(c(10,25,50,-1),
                                             c(10,25,50,"All"))))

From the table above, it can be identified that there are more sales based on the full market value. It can be concluded that more people prefer to pay the standard price.

CONCLUSIONS

From the analysis it is concluded that;

  • Average price sale of properties in the UK is £364,640 this is higher when compared with the average prices of properties in USA which is $374,900 which is about £299,009.

  • The analysis showed that there is a correlation between the price of properties and the underlying external economic factors.

  • The analysis showed that Greater London is the most sort after County in the UK, this is evident in the fact that it is the economic capital of the country.

  • The analysis showed that Terraced houses are the most popular type of house to buy in the UK followed by Semi-detached houses.

  • The analysis showed that there are more established residential buildings i.e old buildings in the UK than newly built properties.

  • The analysis showed that more people prefer to buy freehold properties in the UK rather than taking lease agreements.

  • The analysis showed that more people prefer to pay the standard price.