Introduction

This project aims to analyse and understand how the House Price Index (HPI) for Greater London has evolved over time, and to forecast its future path.

Data Importing, Cleaning and Sorting

Understanding the dataset

# A tibble: 6 × 54
  Date   RegionName AreaCode AveragePrice Index IndexSA `1m%Change` `12m%Change`
  <chr>  <chr>      <chr>           <dbl> <dbl>   <dbl>       <dbl>        <dbl>
1 01/01… Aberdeens… S120000…        84638  41.1      NA        NA             NA
2 01/02… Aberdeens… S120000…        84623  41.1      NA         0             NA
3 01/03… Aberdeens… S120000…        86536  42.1      NA         2.3           NA
4 01/04… Aberdeens… S120000…        87373  42.5      NA         1             NA
5 01/05… Aberdeens… S120000…        89493  43.5      NA         2.4           NA
6 01/06… Aberdeens… S120000…        92485  44.9      NA         3.3           NA
# ℹ 46 more variables: AveragePriceSA <dbl>, SalesVolume <dbl>,
#   DetachedPrice <dbl>, DetachedIndex <dbl>, `Detached1m%Change` <dbl>,
#   `Detached12m%Change` <dbl>, SemiDetachedPrice <dbl>,
#   SemiDetachedIndex <dbl>, `SemiDetached1m%Change` <dbl>,
#   `SemiDetached12m%Change` <dbl>, TerracedPrice <dbl>, TerracedIndex <dbl>,
#   `Terraced1m%Change` <dbl>, `Terraced12m%Change` <dbl>, FlatPrice <dbl>,
#   FlatIndex <dbl>, `Flat1m%Change` <dbl>, `Flat12m%Change` <dbl>, …
spc_tbl_ [144,630 × 54] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Date                  : chr [1:144630] "01/01/2004" "01/02/2004" "01/03/2004" "01/04/2004" ...
 $ RegionName            : chr [1:144630] "Aberdeenshire" "Aberdeenshire" "Aberdeenshire" "Aberdeenshire" ...
 $ AreaCode              : chr [1:144630] "S12000034" "S12000034" "S12000034" "S12000034" ...
 $ AveragePrice          : num [1:144630] 84638 84623 86536 87373 89493 ...
 $ Index                 : num [1:144630] 41.1 41.1 42.1 42.5 43.5 44.9 46.8 49.2 49.7 49.9 ...
 $ IndexSA               : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ 1m%Change             : num [1:144630] NA 0 2.3 1 2.4 3.3 4.2 5.1 0.9 0.4 ...
 $ 12m%Change            : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ AveragePriceSA        : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ SalesVolume           : num [1:144630] 388 326 453 571 502 525 652 512 497 590 ...
 $ DetachedPrice         : num [1:144630] 130620 129330 131585 130454 132762 ...
 $ DetachedIndex         : num [1:144630] 43.2 42.7 43.5 43.1 43.9 45.1 47.4 50.1 50.7 50.9 ...
 $ Detached1m%Change     : num [1:144630] NA -1 1.7 -0.9 1.8 2.9 5.1 5.7 1.2 0.4 ...
 $ Detached12m%Change    : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ SemiDetachedPrice     : num [1:144630] 73972 74225 76201 78082 80340 ...
 $ SemiDetachedIndex     : num [1:144630] 40.9 41.1 42.2 43.2 44.5 46 47.7 49.9 50.3 50.4 ...
 $ SemiDetached1m%Change : num [1:144630] NA 0.3 2.7 2.5 2.9 3.5 3.5 4.7 0.7 0.2 ...
 $ SemiDetached12m%Change: num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ TerracedPrice         : num [1:144630] 58247 58669 60399 62326 64442 ...
 $ TerracedIndex         : num [1:144630] 38.8 39.1 40.2 41.5 42.9 44.6 46.1 48.2 48.5 48.6 ...
 $ Terraced1m%Change     : num [1:144630] NA 0.7 2.9 3.2 3.4 3.9 3.4 4.6 0.6 0.3 ...
 $ Terraced12m%Change    : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ FlatPrice             : num [1:144630] 49322 50364 51719 53143 54678 ...
 $ FlatIndex             : num [1:144630] 45.7 46.7 47.9 49.2 50.7 52.7 54.4 56.7 57 57.5 ...
 $ Flat1m%Change         : num [1:144630] NA 2.1 2.7 2.8 2.9 4.1 3.2 4.3 0.5 0.9 ...
 $ Flat12m%Change        : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ CashPrice             : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ CashIndex             : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ Cash1m%Change         : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ Cash12m%Change        : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ CashSalesVolume       : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ MortgagePrice         : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ MortgageIndex         : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ Mortgage1m%Change     : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ Mortgage12m%Change    : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ MortgageSalesVolume   : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ FTBPrice              : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ FTBIndex              : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ FTB1m%Change          : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ FTB12m%Change         : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ FOOPrice              : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ FOOIndex              : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ FOO1m%Change          : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ FOO12m%Change         : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ NewPrice              : num [1:144630] 112843 113061 115218 115247 117377 ...
 $ NewIndex              : num [1:144630] 40.7 40.8 41.6 41.6 42.4 43.7 45.4 47.5 47.8 48.3 ...
 $ New1m%Change          : num [1:144630] NA 0.2 1.9 0 1.8 3.2 3.9 4.6 0.6 1 ...
 $ New12m%Change         : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ NewSalesVolume        : num [1:144630] 103 107 140 180 167 164 163 130 142 164 ...
 $ OldPrice              : num [1:144630] 81273 81194 83137 84241 86466 ...
 $ OldIndex              : num [1:144630] 41 40.9 41.9 42.5 43.6 45.1 47 49.5 50 50.1 ...
 $ Old1m%Change          : num [1:144630] NA -0.1 2.4 1.3 2.6 3.4 4.4 5.3 1 0.2 ...
 $ Old12m%Change         : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
 $ OldSalesVolume        : num [1:144630] 285 219 313 391 335 361 489 382 355 426 ...
 - attr(*, "spec")=
  .. cols(
  ..   Date = col_character(),
  ..   RegionName = col_character(),
  ..   AreaCode = col_character(),
  ..   AveragePrice = col_double(),
  ..   Index = col_double(),
  ..   IndexSA = col_double(),
  ..   `1m%Change` = col_double(),
  ..   `12m%Change` = col_double(),
  ..   AveragePriceSA = col_double(),
  ..   SalesVolume = col_double(),
  ..   DetachedPrice = col_double(),
  ..   DetachedIndex = col_double(),
  ..   `Detached1m%Change` = col_double(),
  ..   `Detached12m%Change` = col_double(),
  ..   SemiDetachedPrice = col_double(),
  ..   SemiDetachedIndex = col_double(),
  ..   `SemiDetached1m%Change` = col_double(),
  ..   `SemiDetached12m%Change` = col_double(),
  ..   TerracedPrice = col_double(),
  ..   TerracedIndex = col_double(),
  ..   `Terraced1m%Change` = col_double(),
  ..   `Terraced12m%Change` = col_double(),
  ..   FlatPrice = col_double(),
  ..   FlatIndex = col_double(),
  ..   `Flat1m%Change` = col_double(),
  ..   `Flat12m%Change` = col_double(),
  ..   CashPrice = col_double(),
  ..   CashIndex = col_double(),
  ..   `Cash1m%Change` = col_double(),
  ..   `Cash12m%Change` = col_double(),
  ..   CashSalesVolume = col_double(),
  ..   MortgagePrice = col_double(),
  ..   MortgageIndex = col_double(),
  ..   `Mortgage1m%Change` = col_double(),
  ..   `Mortgage12m%Change` = col_double(),
  ..   MortgageSalesVolume = col_double(),
  ..   FTBPrice = col_double(),
  ..   FTBIndex = col_double(),
  ..   `FTB1m%Change` = col_double(),
  ..   `FTB12m%Change` = col_double(),
  ..   FOOPrice = col_double(),
  ..   FOOIndex = col_double(),
  ..   `FOO1m%Change` = col_double(),
  ..   `FOO12m%Change` = col_double(),
  ..   NewPrice = col_double(),
  ..   NewIndex = col_double(),
  ..   `New1m%Change` = col_double(),
  ..   `New12m%Change` = col_double(),
  ..   NewSalesVolume = col_double(),
  ..   OldPrice = col_double(),
  ..   OldIndex = col_double(),
  ..   `Old1m%Change` = col_double(),
  ..   `Old12m%Change` = col_double(),
  ..   OldSalesVolume = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
     Date            RegionName          AreaCode          AveragePrice    
 Length:144630      Length:144630      Length:144630      Min.   :   2553  
 Class :character   Class :character   Class :character   1st Qu.:  98624  
 Mode  :character   Mode  :character   Mode  :character   Median : 151494  
                                                          Mean   : 179345  
                                                          3rd Qu.: 225218  
                                                          Max.   :1656986  
                                                                           
     Index        IndexSA         1m%Change     12m%Change   AveragePriceSA  
 Min.   :  1   Min.   : 14      Min.   :-30   Min.   :-36    Min.   : 39008  
 1st Qu.: 36   1st Qu.: 45      1st Qu.:  0   1st Qu.:  1    1st Qu.:113751  
 Median : 60   Median : 62      Median :  0   Median :  5    Median :150794  
 Mean   : 59   Mean   : 61      Mean   :  1   Mean   :  6    Mean   :165313  
 3rd Qu.: 78   3rd Qu.: 79      3rd Qu.:  2   3rd Qu.: 10    3rd Qu.:203270  
 Max.   :153   Max.   :107      Max.   : 35   Max.   : 98    Max.   :574627  
               NA's   :139551   NA's   :453   NA's   :4860   NA's   :139551  
  SalesVolume     DetachedPrice     DetachedIndex  Detached1m%Change
 Min.   :     2   Min.   :  40117   Min.   :  9    Min.   :-20      
 1st Qu.:   141   1st Qu.: 175805   1st Qu.: 40    1st Qu.:  0      
 Median :   215   Median : 255125   Median : 59    Median :  0      
 Mean   :  1253   Mean   : 334792   Mean   : 59    Mean   :  0      
 3rd Qu.:   378   3rd Qu.: 386383   3rd Qu.: 78    3rd Qu.:  1      
 Max.   :183609   Max.   :5820174   Max.   :147    Max.   : 35      
 NA's   :4467     NA's   :6669      NA's   :6669   NA's   :7064     
 Detached12m%Change SemiDetachedPrice SemiDetachedIndex SemiDetached1m%Change
 Min.   :-30        Min.   :  24973   Min.   :  9       Min.   :-20          
 1st Qu.:  1        1st Qu.: 106979   1st Qu.: 38       1st Qu.:  0          
 Median :  5        Median : 157112   Median : 59       Median :  0          
 Mean   :  6        Mean   : 205559   Mean   : 58       Mean   :  0          
 3rd Qu.: 10        3rd Qu.: 237162   3rd Qu.: 77       3rd Qu.:  1          
 Max.   : 93        Max.   :3919683   Max.   :148       Max.   : 35          
 NA's   :11385      NA's   :6669      NA's   :6669      NA's   :7064         
 SemiDetached12m%Change TerracedPrice     TerracedIndex  Terraced1m%Change
 Min.   :-29            Min.   :  19407   Min.   :  9    Min.   :-20      
 1st Qu.:  1            1st Qu.:  83911   1st Qu.: 37    1st Qu.:  0      
 Median :  5            Median : 125775   Median : 60    Median :  0      
 Mean   :  6            Mean   : 165004   Mean   : 58    Mean   :  0      
 3rd Qu.: 10            3rd Qu.: 191917   3rd Qu.: 77    3rd Qu.:  2      
 Max.   :102            Max.   :3888915   Max.   :148    Max.   : 36      
 NA's   :11385          NA's   :6557      NA's   :6557   NA's   :6953     
 Terraced12m%Change   FlatPrice         FlatIndex    Flat1m%Change 
 Min.   :-29        Min.   :  15712   Min.   :  9    Min.   :-30   
 1st Qu.:  1        1st Qu.:  66734   1st Qu.: 46    1st Qu.:  0   
 Median :  5        Median :  98350   Median : 68    Median :  0   
 Mean   :  6        Mean   : 119450   Mean   : 65    Mean   :  0   
 3rd Qu.: 10        3rd Qu.: 143512   3rd Qu.: 84    3rd Qu.:  1   
 Max.   :107        Max.   :1330583   Max.   :160    Max.   : 36   
 NA's   :11285      NA's   :6308      NA's   :6308   NA's   :6704  
 Flat12m%Change    CashPrice         CashIndex     Cash1m%Change  
 Min.   :-29     Min.   :  58405   Min.   : 44     Min.   :-19    
 1st Qu.:  0     1st Qu.: 149055   1st Qu.: 69     1st Qu.:  0    
 Median :  4     Median : 210636   Median : 81     Median :  0    
 Mean   :  6     Mean   : 250534   Mean   : 80     Mean   :  0    
 3rd Qu.: 10     3rd Qu.: 307370   3rd Qu.: 93     3rd Qu.:  1    
 Max.   :103     Max.   :1621751   Max.   :154     Max.   : 21    
 NA's   :11036   NA's   :83086     NA's   :83086   NA's   :83478  
 Cash12m%Change  CashSalesVolume MortgagePrice     MortgageIndex  
 Min.   :-27     Min.   :    1   Min.   :  66241   Min.   : 43    
 1st Qu.:  1     1st Qu.:   38   1st Qu.: 152620   1st Qu.: 68    
 Median :  4     Median :   59   Median : 212456   Median : 80    
 Mean   :  4     Mean   :  324   Mean   : 251836   Mean   : 80    
 3rd Qu.:  7     3rd Qu.:  102   3rd Qu.: 306913   3rd Qu.: 93    
 Max.   : 56     Max.   :52162   Max.   :1686923   Max.   :153    
 NA's   :87790   NA's   :83871   NA's   :83086     NA's   :83086  
 Mortgage1m%Change Mortgage12m%Change MortgageSalesVolume    FTBPrice      
 Min.   :-19       Min.   :-28        Min.   :     1      Min.   :  57199  
 1st Qu.:  0       1st Qu.:  1        1st Qu.:    86      1st Qu.: 130184  
 Median :  0       Median :  4        Median :   134      Median : 177504  
 Mean   :  0       Mean   :  4        Mean   :   732      Mean   : 209559  
 3rd Qu.:  1       3rd Qu.:  7        3rd Qu.:   234      3rd Qu.: 251980  
 Max.   : 22       Max.   : 56        Max.   :128426      Max.   :1417178  
 NA's   :83478     NA's   :87790      NA's   :83872       NA's   :82690    
    FTBIndex      FTB1m%Change   FTB12m%Change      FOOPrice      
 Min.   : 42     Min.   :-19     Min.   :-28     Min.   :  71951  
 1st Qu.: 68     1st Qu.:  0     1st Qu.:  1     1st Qu.: 179186  
 Median : 80     Median :  0     Median :  4     Median : 250540  
 Mean   : 80     Mean   :  0     Mean   :  4     Mean   : 301419  
 3rd Qu.: 93     3rd Qu.:  1     3rd Qu.:  7     3rd Qu.: 369430  
 Max.   :154     Max.   : 24     Max.   : 56     Max.   :1956287  
 NA's   :82690   NA's   :83082   NA's   :87394   NA's   :83086    
    FOOIndex      FOO1m%Change   FOO12m%Change      NewPrice      
 Min.   : 43     Min.   :-19     Min.   :-27     Min.   :  22443  
 1st Qu.: 68     1st Qu.:  0     1st Qu.:  1     1st Qu.: 125043  
 Median : 80     Median :  0     Median :  4     Median : 194410  
 Mean   : 80     Mean   :  0     Mean   :  4     Mean   : 214956  
 3rd Qu.: 92     3rd Qu.:  1     3rd Qu.:  7     3rd Qu.: 271262  
 Max.   :151     Max.   : 21     Max.   : 56     Max.   :1414204  
 NA's   :83086   NA's   :83478   NA's   :87790   NA's   :7300     
    NewIndex     New1m%Change  New12m%Change   NewSalesVolume 
 Min.   :  8    Min.   :-30    Min.   :-29     Min.   :    1  
 1st Qu.: 41    1st Qu.:  0    1st Qu.:  2     1st Qu.:   10  
 Median : 61    Median :  0    Median :  6     Median :   21  
 Mean   : 60    Mean   :  1    Mean   :  6     Mean   :  137  
 3rd Qu.: 81    3rd Qu.:  2    3rd Qu.: 10     3rd Qu.:   47  
 Max.   :150    Max.   : 35    Max.   : 96     Max.   :21097  
 NA's   :7300   NA's   :7696   NA's   :12028   NA's   :10144  
    OldPrice          OldIndex     Old1m%Change  Old12m%Change  
 Min.   :  22716   Min.   :  9    Min.   :-31    Min.   :-30    
 1st Qu.: 100407   1st Qu.: 39    1st Qu.:  0    1st Qu.:  1    
 Median : 151856   Median : 60    Median :  0    Median :  5    
 Mean   : 181015   Mean   : 59    Mean   :  0    Mean   :  6    
 3rd Qu.: 225576   3rd Qu.: 78    3rd Qu.:  1    3rd Qu.: 10    
 Max.   :1665089   Max.   :153    Max.   : 36    Max.   : 99    
 NA's   :7096      NA's   :7096   NA's   :7492   NA's   :11824  
 OldSalesVolume  
 Min.   :     2  
 1st Qu.:   126  
 Median :   193  
 Mean   :  1137  
 3rd Qu.:   344  
 Max.   :166098  
 NA's   :7108    

The original data set is composed of 144,630 monthly observations across 54 variables capturing various dimensions of the UK House Price Index. It provides information regarding Average Prices, Sales Volumes, Percentage Changes M-o-M and Y-o-Y, and Raw and Seasonally Adjusted Price Indexes, across various geographic delimitations (from county level to London borough councils) and disaggregated by property type (Detached, Semi Detached, Terraced, Flat), payment method (Cash, Mortgage), buyer status (FTB, FOO), and building status (New, Old).

Notably, many NAs are present in the dataset, primarily because not all variables began to be recorded at the same time or for every region. Additionally, except for the columns “Date”, “RegionName”, and “AreaCode”, which are stored as characters, the remaining variables are stored as double-precision numbers. Detailed descriptions for each variable are displayed below.

Variable Descriptions for UK House Price Index Data
Column_Header               Explanation
Date                        The year and month to which the monthly statistics apply
RegionName                  Name of geography (Country, Regional, County/Unitary/District Authority and London Borough)
AreaCode                    Code of geography (Country, Regional, County/Unitary/District Authority and London Borough)
AveragePrice                Average house price for a geography in a particular period
Index                       House price index for a geography in a particular period (January 2015 = 100)
IndexSA                     Seasonally adjusted house price index for a geography in a particular period (January 2015 = 100)
1m%Change                   The percentage change in the AveragePrice compared to the previous month
12m%Change                  The percentage change in the AveragePrice compared to the same period twelve months earlier
AveragePriceSA              Seasonally adjusted AveragePrice for a geography in a particular period
SalesVolume                 Number of registered transactions for a geography in a particular period
[Property Type]Price        Average house price for a particular property type (e.g., detached houses) for a geography in a particular period
[Property Type]Index        House price index for a particular property type for a geography in a particular period (January 2015 = 100)
[Property Type]1m%Change    The percentage change in the [Property Type]Price compared to the previous month
[Property Type]12m%Change   The percentage change in the [Property Type]Price compared to the same period twelve months earlier
[Cash/Mortgage]Price        Average house price by funding status (e.g., cash) for a geography in a particular period
[Cash/Mortgage]Index        House price index by funding status for a geography in a particular period (January 2015 = 100)
[Cash/Mortgage]1m%Change    The percentage change in the [Cash/Mortgage]Price compared to the previous month
[Cash/Mortgage]12m%Change   The percentage change in the [Cash/Mortgage]Price compared to the same period twelve months earlier
[Cash/Mortgage]SalesVolume  Number of registered transactions [Cash/Mortgage] for a geography in a particular period
[FTB/FOO]Price              Average house price by buyer status (e.g., first time buyer/former owner occupier) for a geography in a particular period
[FTB/FOO]Index              House price index by buyer status for a geography in a particular period (January 2015 = 100)
[FTB/FOO]1m%Change          The percentage change in the [FTB/FOO]Price compared to the previous month
[FTB/FOO]12m%Change         The percentage change in the [FTB/FOO]Price compared to the same period twelve months earlier
[New/Old]Price              Average house price by property status (e.g., new or existing property) for a geography in a particular period
[New/Old]Index              House price index by property status for a geography in a particular period (January 2015 = 100)
[New/Old]1m%Change          The percentage change in the [New/Old]Price compared to the previous month
[New/Old]12m%Change         The percentage change in the [New/Old]Price compared to the same period twelve months earlier
[New/Old]SalesVolume        Number of registered transactions [New/Old] for a geography in a particular period

Source: About the UK House Price Index

Because the data is in wide format and many functions in R are optimized for the long (tidy) format, the data is first reshaped into long format. Then, since the analysis depends on how the variables evolve over time, the Date column is converted from “character” to a date type.

# A tibble: 6 × 6
  Dates     RegionName    AreaCode  Category Measurement   Value
  <yearmon> <chr>         <chr>     <chr>    <chr>         <dbl>
1 Jan 2004  Aberdeenshire S12000034 All      Price       84638  
2 Jan 2004  Aberdeenshire S12000034 All      Index          41.1
3 Jan 2004  Aberdeenshire S12000034 All      IndexSA        NA  
4 Jan 2004  Aberdeenshire S12000034 All      1m%Change      NA  
5 Jan 2004  Aberdeenshire S12000034 All      12m%Change     NA  
6 Jan 2004  Aberdeenshire S12000034 All      PriceSA        NA  
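
The reshaping and date conversion described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used for the report: the toy table holds only two of the 54 columns, the helper names (`id_cols`, `measure_cols`) are hypothetical, and base `Date` is used where the output above shows a `yearmon` column.

```r
# Toy wide table with two measurement columns (values copied from the preview above).
wide <- data.frame(
  Date         = c("01/01/2004", "01/02/2004"),
  RegionName   = "Aberdeenshire",
  AreaCode     = "S12000034",
  AveragePrice = c(84638, 84623),
  Index        = c(41.1, 41.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper names: identifier columns versus measurement columns.
id_cols      <- c("Date", "RegionName", "AreaCode")
measure_cols <- setdiff(names(wide), id_cols)

# Stack one block of rows per measurement column (wide -> long).
long <- do.call(rbind, lapply(measure_cols, function(m) {
  data.frame(
    Dates       = as.Date(wide$Date, format = "%d/%m/%Y"),  # day-first character -> Date
    RegionName  = wide$RegionName,
    AreaCode    = wide$AreaCode,
    Measurement = m,
    Value       = wide[[m]],
    stringsAsFactors = FALSE
  )
}))

print(long)
```

Each original row yields one long row per measurement, so two wide rows with two measurements give four long rows.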

Sorting London Information

As the object of study of this project is the London area, the data is filtered by its area code, “E12000007”.

[1] "London"
# A tibble: 6 × 6
  Dates     RegionName AreaCode  Category Measurement  Value
  <yearmon> <chr>      <chr>     <chr>    <chr>        <dbl>
1 Apr 1968  London     E12000007 All      Price       4730  
2 Apr 1968  London     E12000007 All      Index          0.8
3 Apr 1968  London     E12000007 All      IndexSA       NA  
4 Apr 1968  London     E12000007 All      1m%Change     NA  
5 Apr 1968  London     E12000007 All      12m%Change    NA  
6 Apr 1968  London     E12000007 All      PriceSA       NA  

Gap Check
Category      Measurement  ObsCount  FirstDate  LastDate  MissingDatesStr
All           Index        682       Apr 1968   Jan 2025  None
All           IndexSA      361       Jan 1995   Jan 2025  None
All           Price        682       Apr 1968   Jan 2025  None
All           PriceSA      361       Jan 1995   Jan 2025  None
All           SalesVolume  359       Jan 1995   Nov 2024  None
Detached      Index        361       Jan 1995   Jan 2025  None
Detached      Price        361       Jan 1995   Jan 2025  None
SemiDetached  Index        361       Jan 1995   Jan 2025  None
SemiDetached  Price        361       Jan 1995   Jan 2025  None
Terraced      Index        361       Jan 1995   Jan 2025  None
Terraced      Price        361       Jan 1995   Jan 2025  None
Flat          Index        361       Jan 1995   Jan 2025  None
Flat          Price        361       Jan 1995   Jan 2025  None
Cash          Index        157       Jan 2012   Jan 2025  None
Cash          Price        157       Jan 2012   Jan 2025  None
Cash          SalesVolume  155       Jan 2012   Nov 2024  None
Mortgage      Index        157       Jan 2012   Jan 2025  None
Mortgage      Price        157       Jan 2012   Jan 2025  None
Mortgage      SalesVolume  155       Jan 2012   Nov 2024  None
FTB           Index        157       Jan 2012   Jan 2025  None
FTB           Price        157       Jan 2012   Jan 2025  None
FOO           Index        157       Jan 2012   Jan 2025  None
FOO           Price        157       Jan 2012   Jan 2025  None
New           Index        359       Jan 1995   Nov 2024  None
New           Price        359       Jan 1995   Nov 2024  None
New           SalesVolume  359       Jan 1995   Nov 2024  None
Old           Index        359       Jan 1995   Nov 2024  None
Old           Price        359       Jan 1995   Nov 2024  None
Old           SalesVolume  359       Jan 1995   Nov 2024  None

For the region of London, the House Price Index and the Average Price have been recorded since April 1968, with the last observation in January 2025, totaling 682 monthly observations. From January 1995, other variables such as Sales Volume and the seasonally adjusted Index and Price started to be recorded as well, and the measurements (Index, Average Price, and in some cases Sales Volume) began to be segmented by property type (“Detached”, “Semi Detached”, “Terraced”, “Flat”) and building status (“New” and “Old”). Later, in 2012, two further categorizations were added, distinguishing between type of payment (“Cash” or “Mortgage”) and status of the buyer (first-time buyer or former owner occupier). Consequently, the number of observations for each measurement ranges from 155 to 682 depending on the categorization.
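
A gap check of the kind summarized in the table can be sketched by comparing the observed months with the full monthly sequence between the first and last date. The function name `check_gaps` and its return fields are hypothetical, and the dates below are generated rather than read from the dataset.

```r
# Compare observed monthly dates with the complete month-by-month sequence.
check_gaps <- function(dates) {
  dates    <- sort(unique(dates))
  expected <- seq(min(dates), max(dates), by = "month")
  missing  <- expected[!(expected %in% dates)]
  list(
    obs_count = length(dates),
    first     = min(dates),
    last      = max(dates),
    missing   = if (length(missing) == 0) "None" else format(missing, "%b %Y")
  )
}

# April 1968 to January 2025, the span reported for the overall Index and Price.
full_range <- seq(as.Date("1968-04-01"), as.Date("2025-01-01"), by = "month")

complete  <- check_gaps(full_range)        # no month missing
with_hole <- check_gaps(full_range[-10])   # one month removed on purpose

complete$obs_count
with_hole$missing
```

For the complete range the observation count matches the 682 months reported above, and removing a single month makes it show up in the `missing` field.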

Exploratory Data Analysis

An overview of London Housing Prices

Since 1968 the House Price Index (HPI) has followed a clear upward trend (it is non-stationary), with some short periods when prices remained flat and marked corrections around 1990-1994 and 2008-2009. However, from 2016 to 2025 the index appears to flatten, with smaller month-over-month growth rates producing a visibly smaller positive slope than in previous decades, which suggests a break in the time series. Comparing the Index with the IndexSA, it is noticeable that within the same year prices tend to decelerate from January to March and from November to December, and to warm up from May to September, with April and October marking transition months. This seasonality is also confirmed by the decomposition of the series into Trend, Seasonal, and Remainder components.

Decade Summary Statistics for London (M-o-M and CAGR, in %)
Decade  ObsCount  Avg1mChange  SD1mChange  Max1mChange  Min1mChange  AnnChange  CAGR
2020          61         0.17         1.2          3.3         -2.0        2.0   1.7
2010         120         0.48         1.2          3.9         -3.2        5.8   5.5
2000         120         0.64         1.3          4.8         -3.4        7.6   7.5
1990         120         0.28         2.9          7.9         -8.0        3.4   4.8
1980         120         3.21         3.8          9.6         -7.0       38.5  12.6
1970         120         4.16         4.6         13.5         -4.3       50.0  16.9
1960          18         0.80         2.6          3.4         -2.2        9.6   6.1

Since the beginning of the time series, the compound annual growth rate (CAGR) of house prices has been the lowest in the present decade, at 1.7% per year. However, so far the present decade has also shown the greatest stability in the Index, recording the lowest standard deviation (1.2, level with the 2010s), the highest minimum value, and the narrowest range of monthly percentage changes. Additionally, the beginning of this period was marked by the COVID-19 pandemic and low interest rates, leading to surges in the 12-month change in Sales Volume, which hit +485% in June 2021 and +208% in July 2022, sustaining the rise in house prices measured by the HPI. Afterwards, as interest rates rose, Sales Volume decelerated, dragging year-over-year prices into decline from May to November 2023, with prices showing signs of recovery in December 2023.
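
For reference, the two annualized measures in the table can be sketched as below. The CAGR formula is the standard one; reading AnnChange as twelve times the average monthly change is only an inference from the table values (for example 0.80 × 12 = 9.6), not something the source states. Function names and inputs are hypothetical.

```r
# Compound annual growth rate between two index levels that are n_months apart.
cagr <- function(start_value, end_value, n_months) {
  (end_value / start_value)^(12 / n_months) - 1
}

# Simple annualized change: average monthly percentage change times 12
# (assumed definition, inferred from the table above).
annualised_change <- function(monthly_pct_changes) {
  mean(monthly_pct_changes, na.rm = TRUE) * 12
}

cagr(100, 200, 120)                    # an index that doubles over ten years
annualised_change(c(0.5, -0.2, 0.3))   # three hypothetical monthly changes, in %
```

The CAGR compounds growth, whereas the simple annualized change does not, which is why the two columns diverge most in the decades with the largest monthly changes.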

Regarding previous decades, HPI changes were visibly sharper, with year-over-year changes reaching approximately +60% in the 1970s, +35% in the 1980s, -12% and +18% in the 1990s, and +28% and -17% in the 2000s (marked by the Global Financial Crisis). The 2010s were less volatile, with increases of up to +20% followed by a declining trend since Brexit that culminated in an HPI decline of -3%.

Changes Month over Month on Ln(HPI)

As observed in the previous section, the variance of the data has clearly changed over time, which suggests that the data is heteroscedastic or that there is a structural break. 1995 appears to be a point of inflection: the variability before 1995 is visibly higher than after it.
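
As a reminder of what the month-over-month change on Ln(HPI) measures, the short sketch below takes first differences of the logged index and converts them back to percentage changes. The index values are hypothetical.

```r
# Hypothetical index levels for four consecutive months.
hpi <- c(100, 102, 101, 105)

ln_hpi <- log(hpi)
d_ln   <- diff(ln_hpi)            # month-over-month change in Ln(HPI)
pct    <- (exp(d_ln) - 1) * 100   # equivalent simple percentage change

d_ln
pct
```

For small changes the log difference is close to the percentage change divided by 100, which is why the two are often read interchangeably.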

Relative HPI Evolution Across Property Type

Common Start Date: 1995 
Baseline Date (lagged one month): 1995-01-01

Over the 30 years of observations, Detached and Semi Detached properties had slower HPI growth than Flats and Terraced houses for most of the period. However, at the end of 2021 a flip occurred: Flats, one of the categories whose prices had risen fastest in past decades, were overtaken by Semi Detached houses, probably because years of working from home pushed people to search for larger spaces. At the end of the 30 years, Terraced houses had seen the greatest increase in prices, +690%, followed by Semi Detached (+614%), Flats (+569%), and Detached (+545%) properties.
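
The comparison across property types relies on rebasing each series to a common baseline. A minimal sketch, with hypothetical function names and made-up index levels chosen so that the total growth comes out near the +690% quoted for Terraced houses:

```r
# Rebase a series so that the baseline observation equals 100.
rebase <- function(x, base_pos = 1) {
  x / x[base_pos] * 100
}

# Total percentage growth from the first to the last observation.
total_growth_pct <- function(x) {
  (x[length(x)] / x[1] - 1) * 100
}

# Hypothetical index levels at the baseline, midway, and at the end.
index_levels <- c(20, 40, 158)

rebase(index_levels)             # relative path starting at 100
total_growth_pct(index_levels)   # roughly +690%
```

Because every series starts at 100, differences in the rebased paths reflect relative growth only, not differences in price levels.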

ARIMA Modeling London House Price Index

Testing for Structural Breaks


     Optimal (m+1)-segment partition: 

Call:
breakpoints.formula(formula = ln_ts ~ 1)

Breakpoints at observation number:
                           
m = 1       228            
m = 2   141     384        
m = 3   120 222 408        
m = 4   120 222 384     545
m = 5   120 222 362 464 566

Corresponding to breakdates:
                                                 
m = 1            1987(3)                         
m = 2   1979(12)         2000(3)                 
m = 3   1978(3)  1986(9) 2002(3)                 
m = 4   1978(3)  1986(9) 2000(3)          2013(8)
m = 5   1978(3)  1986(9) 1998(5) 2006(11) 2015(5)

Fit:
                                 
m   0    1    2    3    4    5   
RSS 1402  375  132   73   45   42
BIC 2440 1553  853  468  140  107

As suggested in a previous section, the Chow test rejects, at the 5% significance level, the hypothesis that the data presents no structural breaks, as all F statistics are above the critical value. Additionally, the Bai-Perron analysis suggests that the data has 5 breakpoints, in March 1978, September 1986, May 1998, November 2006, and May 2015, as the Bayesian Information Criterion (BIC) is lowest for 5 breakpoints (107, against 140 for 4). Therefore, for forecasting, the data will be modeled from May 2015 until January 2024, holding out the last 12 months to test the efficacy of the model.
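
To illustrate the idea behind the breakpoint search, the sketch below estimates a single break in the mean by least squares, that is, it picks the split that minimizes the total residual sum of squares of a constant-only model on each side. This is a simplified stand-in, not the multi-break Bai-Perron procedure whose output is shown above; the function name and the data are hypothetical.

```r
# Least-squares estimate of one break date in a mean-shift model (y ~ 1).
find_single_break <- function(y, min_seg = 5) {
  n <- length(y)
  candidates <- min_seg:(n - min_seg)
  rss <- sapply(candidates, function(k) {
    left  <- y[1:k]
    right <- y[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  candidates[which.min(rss)]   # last observation of the first segment
}

# Hypothetical series: the level shifts from about 0 to about 2 after observation 60.
y <- c(rep(c(-0.1, 0.1), 30), 2 + rep(c(-0.1, 0.1), 30))

find_single_break(y)
```

The multi-break version repeats this logic over all admissible partitions and then compares the fits with RSS and BIC, as in the Fit table above.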

Testing for stationarity and seasonality


    Augmented Dickey-Fuller Test

data:  train_ln_ts
Dickey-Fuller = -2, Lag order = 4, p-value = 0.5
alternative hypothesis: stationary

    Augmented Dickey-Fuller Test

data:  train_ln_ts_d1
Dickey-Fuller = -4, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary


    HEGY test for unit roots

data:  train_ln_ts_d1_sa

        statistic p-value   
t_1          -2.6    0.97   
t_2         -2.37    0.11   
F_3:4        5.58       0 **
F_5:6        7.59       0 **
F_7:8        3.84    0.01 * 
F_9:10        5.6       0 **
F_11:12      3.34    0.02 * 
F_2:12       9.19    0.26   
F_1:12       9.52    0.16   
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Deterministic terms: constant 
Lag selection criterion and order: AIC, 0
P-values: based on response surface regressions 

To test for non-seasonal stationarity, the Augmented Dickey-Fuller (ADF) test was applied to the logarithm of HPI. The null hypothesis of non-stationarity could not be rejected at the 5% significance level (p-value of 0.5), which is consistent with the clear trend present in the data. The first difference of the series was therefore taken, and for it the null hypothesis of non-stationarity was rejected at the 5% significance level, as the observed p-value was 0.01. Consequently, the original series is integrated of order 1, I(1).

To test for seasonal unit roots, the HEGY test was used to determine whether the series has stochastic seasonality. In simple terms, the test decomposes the time series into several seasonal cycles and tests each of them for a unit root. The null hypothesis is non-stationarity at that specific frequency, so failing to reject it means that seasonal differencing is needed. Here, the null is rejected for all the joint tests from F_3:4 to F_11:12, as they have p-values lower than 0.05. Therefore the seasonality will be modeled deterministically.
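
Modeling seasonality deterministically usually means regressing the series on monthly dummies and working with what is left. The sketch below shows that idea on a made-up monthly series; it is an assumption about the general technique, not a reproduction of the report's code, and all object names are hypothetical.

```r
# Hypothetical monthly series: a fixed seasonal pattern plus a small non-seasonal wiggle.
pattern <- c(-2, -1, 0, 1, 2, 3, 3, 2, 1, 0, -1, -2)
n_years <- 4
wiggle  <- rep(c(0.2, -0.1, 0, 0.1, -0.2), length.out = 12 * n_years)
y <- ts(rep(pattern, n_years) + wiggle, start = c(2015, 5), frequency = 12)

# Calendar position of each observation (1 to 12) as a factor.
month <- factor(cycle(y))

# Regression on monthly dummies: the fitted values are the deterministic seasonal means.
fit <- lm(as.numeric(y) ~ month)

# Deseasonalized series: what remains after removing the monthly means.
y_sa <- ts(residuals(fit), start = start(y), frequency = 12)

round(tapply(as.numeric(y_sa), month, mean), 10)
```

By construction the residuals average zero within each calendar month, so any remaining dynamics can be handed to the ARMA part of the model.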

AR and MA order

Series: train_ln_ts_d1_sa_xts 
ARIMA(1,0,2) with non-zero mean 

Coefficients:
       ar1    ma1   ma2   mean
      0.63  -0.99  0.54  0.002
s.e.  0.20   0.19  0.08  0.001

sigma^2 = 0.0000748:  log likelihood = 348
AIC=-686   AICc=-686   BIC=-673
Series: train_ln_ts_d1_sa_ts 
ARIMA(4,0,4) with zero mean 

Coefficients:
      ar1    ar2    ar3  ar4     ma1  ma2  ma3  ma4
        0  0.304  0.445    0  -0.346    0    0    0
s.e.    0  0.087  0.085    0   0.098    0    0    0

sigma^2 = 0.0000703:  log likelihood = 351
AIC=-693   AICc=-693   BIC=-683

Training set error measures:
                  ME   RMSE    MAE MPE MAPE MASE    ACF1
Training set 0.00054 0.0083 0.0066  91  157 0.71 0.00079

z test of coefficients:

    Estimate Std. Error z value   Pr(>|z|)    
ar2   0.3044     0.0869    3.50    0.00046 ***
ar3   0.4445     0.0847    5.25 0.00000016 ***
ma1  -0.3460     0.0985   -3.51    0.00044 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


    Ljung-Box test

data:  Residuals from ARIMA(4,0,4) with zero mean
Q* = 18, df = 13, p-value = 0.1

Model df: 8.   Total lags used: 21

    Box-Ljung test

data:  residuals(model)
X-squared = 25, df = 21, p-value = 0.3

From the PACF it is possible to see that there is some memory between lags 1 and 3, but the pattern is not clear. Therefore, an automatic model-selection function in R was used to find the model that best fits the data, and it returned an ARMA(1,2). However, as its residuals were not white noise, a different approach was followed: starting with a full model and removing the least relevant terms until all the remaining ones were significant, refining until an ARMA(3,2). In the output above, this restricted fit is reported as an ARIMA(4,0,4) in which the ar2, ar3 and ma1 terms are estimated and the remaining coefficients are held at zero. The final model is therefore an ARIMA(3,1,2) with deterministic seasonality, in which all the terms are significant at the 5% level and the residuals are white noise: per the Ljung-Box test, we fail to reject the null hypothesis that the residuals are white noise (p-value of 0.3).
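
One way to fit a restricted model of this kind in base R is the `fixed` argument of `stats::arima`, where `NA` marks a coefficient to estimate and `0` holds it at zero. The sketch below is illustrative only: the data are simulated rather than the HPI series, the zero pattern mirrors the coefficient table above, and the choice of `fitdf = 3` in the Ljung-Box test is an assumption (one degree of freedom per estimated term).

```r
set.seed(123)  # simulated data, used here only as a stand-in for the differenced series
x <- arima.sim(model = list(ar = c(0, 0.3, 0.44), ma = -0.35), n = 300)

# Coefficient order is ar1..ar4, ma1..ma4. NA = estimate, 0 = hold at zero.
fixed_pattern <- c(0, NA, NA, 0, NA, 0, 0, 0)

fit <- arima(
  x,
  order          = c(4, 0, 4),
  include.mean   = FALSE,
  fixed          = fixed_pattern,
  transform.pars = FALSE,   # needed when AR coefficients are fixed
  method         = "ML"
)

# Ljung-Box test for remaining autocorrelation in the residuals.
lb <- Box.test(residuals(fit), lag = 21, type = "Ljung-Box", fitdf = 3)

coef(fit)
lb$p.value
```

A large p-value from the Ljung-Box test means the hypothesis of white-noise residuals is not rejected, which is the criterion used above to accept the final specification.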

Model Evaluation and Forecasting

Testing ARIMA(3,1,2) with Deterministic Seasonal Component

Although the observed Ln(HPI) values remained inside the forecast confidence interval, the forecast was essentially flat in comparison with the observed values.

Testing with Prophet

Unlike the ARIMA model, the Prophet forecast followed a more pronounced downward trend.

Performance Comparison

           ME RMSE   MAE  MPE MAPE
ARIMA   0.014 0.02 0.015 0.31 0.33
Prophet 0.014 0.02 0.017 0.31 0.38

The performance metrics in the table above show that both models performed similarly. However, the ARIMA model performed better on the Mean Absolute Error (MAE) and on the Mean Absolute Percentage Error (MAPE), indicating that ARIMA has slightly better accuracy for the prediction of the London House Price Index.
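
For reference, the five metrics in the comparison table follow the usual definitions, with the error taken as actual minus predicted. A minimal base R sketch, with a hypothetical function name and toy numbers rather than the held-out HPI values:

```r
# Standard forecast accuracy measures; percentage measures are relative to the actual values.
forecast_accuracy <- function(actual, predicted) {
  err <- actual - predicted
  pct <- 100 * err / actual
  c(
    ME   = mean(err),
    RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    MPE  = mean(pct),
    MAPE = mean(abs(pct))
  )
}

# Toy example: one forecast too high by 10, one too low by 10.
acc <- forecast_accuracy(actual = c(100, 200), predicted = c(110, 190))
acc
```

In the toy example the errors cancel in ME but not in MAE or RMSE, which is why two models can share the same ME and still differ on the absolute measures, as in the table above.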

Forecast for Feb. 2025 to Jan. 2026

Therefore, using the ARIMA model, the logarithm of the House Price Index is expected to continue the flat tendency it has shown since 2022.