This project aims to analyse, understand, and forecast how the House Price Index (HPI) has evolved over time for the region of Greater London.
# A tibble: 6 × 54
Date RegionName AreaCode AveragePrice Index IndexSA `1m%Change` `12m%Change`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01/01… Aberdeens… S120000… 84638 41.1 NA NA NA
2 01/02… Aberdeens… S120000… 84623 41.1 NA 0 NA
3 01/03… Aberdeens… S120000… 86536 42.1 NA 2.3 NA
4 01/04… Aberdeens… S120000… 87373 42.5 NA 1 NA
5 01/05… Aberdeens… S120000… 89493 43.5 NA 2.4 NA
6 01/06… Aberdeens… S120000… 92485 44.9 NA 3.3 NA
# ℹ 46 more variables: AveragePriceSA <dbl>, SalesVolume <dbl>,
# DetachedPrice <dbl>, DetachedIndex <dbl>, `Detached1m%Change` <dbl>,
# `Detached12m%Change` <dbl>, SemiDetachedPrice <dbl>,
# SemiDetachedIndex <dbl>, `SemiDetached1m%Change` <dbl>,
# `SemiDetached12m%Change` <dbl>, TerracedPrice <dbl>, TerracedIndex <dbl>,
# `Terraced1m%Change` <dbl>, `Terraced12m%Change` <dbl>, FlatPrice <dbl>,
# FlatIndex <dbl>, `Flat1m%Change` <dbl>, `Flat12m%Change` <dbl>, …
spc_tbl_ [144,630 × 54] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Date : chr [1:144630] "01/01/2004" "01/02/2004" "01/03/2004" "01/04/2004" ...
$ RegionName : chr [1:144630] "Aberdeenshire" "Aberdeenshire" "Aberdeenshire" "Aberdeenshire" ...
$ AreaCode : chr [1:144630] "S12000034" "S12000034" "S12000034" "S12000034" ...
$ AveragePrice : num [1:144630] 84638 84623 86536 87373 89493 ...
$ Index : num [1:144630] 41.1 41.1 42.1 42.5 43.5 44.9 46.8 49.2 49.7 49.9 ...
$ IndexSA : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ 1m%Change : num [1:144630] NA 0 2.3 1 2.4 3.3 4.2 5.1 0.9 0.4 ...
$ 12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ AveragePriceSA : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ SalesVolume : num [1:144630] 388 326 453 571 502 525 652 512 497 590 ...
$ DetachedPrice : num [1:144630] 130620 129330 131585 130454 132762 ...
$ DetachedIndex : num [1:144630] 43.2 42.7 43.5 43.1 43.9 45.1 47.4 50.1 50.7 50.9 ...
$ Detached1m%Change : num [1:144630] NA -1 1.7 -0.9 1.8 2.9 5.1 5.7 1.2 0.4 ...
$ Detached12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ SemiDetachedPrice : num [1:144630] 73972 74225 76201 78082 80340 ...
$ SemiDetachedIndex : num [1:144630] 40.9 41.1 42.2 43.2 44.5 46 47.7 49.9 50.3 50.4 ...
$ SemiDetached1m%Change : num [1:144630] NA 0.3 2.7 2.5 2.9 3.5 3.5 4.7 0.7 0.2 ...
$ SemiDetached12m%Change: num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ TerracedPrice : num [1:144630] 58247 58669 60399 62326 64442 ...
$ TerracedIndex : num [1:144630] 38.8 39.1 40.2 41.5 42.9 44.6 46.1 48.2 48.5 48.6 ...
$ Terraced1m%Change : num [1:144630] NA 0.7 2.9 3.2 3.4 3.9 3.4 4.6 0.6 0.3 ...
$ Terraced12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ FlatPrice : num [1:144630] 49322 50364 51719 53143 54678 ...
$ FlatIndex : num [1:144630] 45.7 46.7 47.9 49.2 50.7 52.7 54.4 56.7 57 57.5 ...
$ Flat1m%Change : num [1:144630] NA 2.1 2.7 2.8 2.9 4.1 3.2 4.3 0.5 0.9 ...
$ Flat12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ CashPrice : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ CashIndex : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ Cash1m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ Cash12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ CashSalesVolume : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ MortgagePrice : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ MortgageIndex : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ Mortgage1m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ Mortgage12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ MortgageSalesVolume : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ FTBPrice : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ FTBIndex : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ FTB1m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ FTB12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ FOOPrice : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ FOOIndex : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ FOO1m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ FOO12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ NewPrice : num [1:144630] 112843 113061 115218 115247 117377 ...
$ NewIndex : num [1:144630] 40.7 40.8 41.6 41.6 42.4 43.7 45.4 47.5 47.8 48.3 ...
$ New1m%Change : num [1:144630] NA 0.2 1.9 0 1.8 3.2 3.9 4.6 0.6 1 ...
$ New12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ NewSalesVolume : num [1:144630] 103 107 140 180 167 164 163 130 142 164 ...
$ OldPrice : num [1:144630] 81273 81194 83137 84241 86466 ...
$ OldIndex : num [1:144630] 41 40.9 41.9 42.5 43.6 45.1 47 49.5 50 50.1 ...
$ Old1m%Change : num [1:144630] NA -0.1 2.4 1.3 2.6 3.4 4.4 5.3 1 0.2 ...
$ Old12m%Change : num [1:144630] NA NA NA NA NA NA NA NA NA NA ...
$ OldSalesVolume : num [1:144630] 285 219 313 391 335 361 489 382 355 426 ...
- attr(*, "spec")=
.. cols(
.. Date = col_character(),
.. RegionName = col_character(),
.. AreaCode = col_character(),
.. AveragePrice = col_double(),
.. Index = col_double(),
.. IndexSA = col_double(),
.. `1m%Change` = col_double(),
.. `12m%Change` = col_double(),
.. AveragePriceSA = col_double(),
.. SalesVolume = col_double(),
.. DetachedPrice = col_double(),
.. DetachedIndex = col_double(),
.. `Detached1m%Change` = col_double(),
.. `Detached12m%Change` = col_double(),
.. SemiDetachedPrice = col_double(),
.. SemiDetachedIndex = col_double(),
.. `SemiDetached1m%Change` = col_double(),
.. `SemiDetached12m%Change` = col_double(),
.. TerracedPrice = col_double(),
.. TerracedIndex = col_double(),
.. `Terraced1m%Change` = col_double(),
.. `Terraced12m%Change` = col_double(),
.. FlatPrice = col_double(),
.. FlatIndex = col_double(),
.. `Flat1m%Change` = col_double(),
.. `Flat12m%Change` = col_double(),
.. CashPrice = col_double(),
.. CashIndex = col_double(),
.. `Cash1m%Change` = col_double(),
.. `Cash12m%Change` = col_double(),
.. CashSalesVolume = col_double(),
.. MortgagePrice = col_double(),
.. MortgageIndex = col_double(),
.. `Mortgage1m%Change` = col_double(),
.. `Mortgage12m%Change` = col_double(),
.. MortgageSalesVolume = col_double(),
.. FTBPrice = col_double(),
.. FTBIndex = col_double(),
.. `FTB1m%Change` = col_double(),
.. `FTB12m%Change` = col_double(),
.. FOOPrice = col_double(),
.. FOOIndex = col_double(),
.. `FOO1m%Change` = col_double(),
.. `FOO12m%Change` = col_double(),
.. NewPrice = col_double(),
.. NewIndex = col_double(),
.. `New1m%Change` = col_double(),
.. `New12m%Change` = col_double(),
.. NewSalesVolume = col_double(),
.. OldPrice = col_double(),
.. OldIndex = col_double(),
.. `Old1m%Change` = col_double(),
.. `Old12m%Change` = col_double(),
.. OldSalesVolume = col_double()
.. )
- attr(*, "problems")=<externalptr>
Date RegionName AreaCode AveragePrice
Length:144630 Length:144630 Length:144630 Min. : 2553
Class :character Class :character Class :character 1st Qu.: 98624
Mode :character Mode :character Mode :character Median : 151494
Mean : 179345
3rd Qu.: 225218
Max. :1656986
Index IndexSA 1m%Change 12m%Change AveragePriceSA
Min. : 1 Min. : 14 Min. :-30 Min. :-36 Min. : 39008
1st Qu.: 36 1st Qu.: 45 1st Qu.: 0 1st Qu.: 1 1st Qu.:113751
Median : 60 Median : 62 Median : 0 Median : 5 Median :150794
Mean : 59 Mean : 61 Mean : 1 Mean : 6 Mean :165313
3rd Qu.: 78 3rd Qu.: 79 3rd Qu.: 2 3rd Qu.: 10 3rd Qu.:203270
Max. :153 Max. :107 Max. : 35 Max. : 98 Max. :574627
NA's :139551 NA's :453 NA's :4860 NA's :139551
SalesVolume DetachedPrice DetachedIndex Detached1m%Change
Min. : 2 Min. : 40117 Min. : 9 Min. :-20
1st Qu.: 141 1st Qu.: 175805 1st Qu.: 40 1st Qu.: 0
Median : 215 Median : 255125 Median : 59 Median : 0
Mean : 1253 Mean : 334792 Mean : 59 Mean : 0
3rd Qu.: 378 3rd Qu.: 386383 3rd Qu.: 78 3rd Qu.: 1
Max. :183609 Max. :5820174 Max. :147 Max. : 35
NA's :4467 NA's :6669 NA's :6669 NA's :7064
Detached12m%Change SemiDetachedPrice SemiDetachedIndex SemiDetached1m%Change
Min. :-30 Min. : 24973 Min. : 9 Min. :-20
1st Qu.: 1 1st Qu.: 106979 1st Qu.: 38 1st Qu.: 0
Median : 5 Median : 157112 Median : 59 Median : 0
Mean : 6 Mean : 205559 Mean : 58 Mean : 0
3rd Qu.: 10 3rd Qu.: 237162 3rd Qu.: 77 3rd Qu.: 1
Max. : 93 Max. :3919683 Max. :148 Max. : 35
NA's :11385 NA's :6669 NA's :6669 NA's :7064
SemiDetached12m%Change TerracedPrice TerracedIndex Terraced1m%Change
Min. :-29 Min. : 19407 Min. : 9 Min. :-20
1st Qu.: 1 1st Qu.: 83911 1st Qu.: 37 1st Qu.: 0
Median : 5 Median : 125775 Median : 60 Median : 0
Mean : 6 Mean : 165004 Mean : 58 Mean : 0
3rd Qu.: 10 3rd Qu.: 191917 3rd Qu.: 77 3rd Qu.: 2
Max. :102 Max. :3888915 Max. :148 Max. : 36
NA's :11385 NA's :6557 NA's :6557 NA's :6953
Terraced12m%Change FlatPrice FlatIndex Flat1m%Change
Min. :-29 Min. : 15712 Min. : 9 Min. :-30
1st Qu.: 1 1st Qu.: 66734 1st Qu.: 46 1st Qu.: 0
Median : 5 Median : 98350 Median : 68 Median : 0
Mean : 6 Mean : 119450 Mean : 65 Mean : 0
3rd Qu.: 10 3rd Qu.: 143512 3rd Qu.: 84 3rd Qu.: 1
Max. :107 Max. :1330583 Max. :160 Max. : 36
NA's :11285 NA's :6308 NA's :6308 NA's :6704
Flat12m%Change CashPrice CashIndex Cash1m%Change
Min. :-29 Min. : 58405 Min. : 44 Min. :-19
1st Qu.: 0 1st Qu.: 149055 1st Qu.: 69 1st Qu.: 0
Median : 4 Median : 210636 Median : 81 Median : 0
Mean : 6 Mean : 250534 Mean : 80 Mean : 0
3rd Qu.: 10 3rd Qu.: 307370 3rd Qu.: 93 3rd Qu.: 1
Max. :103 Max. :1621751 Max. :154 Max. : 21
NA's :11036 NA's :83086 NA's :83086 NA's :83478
Cash12m%Change CashSalesVolume MortgagePrice MortgageIndex
Min. :-27 Min. : 1 Min. : 66241 Min. : 43
1st Qu.: 1 1st Qu.: 38 1st Qu.: 152620 1st Qu.: 68
Median : 4 Median : 59 Median : 212456 Median : 80
Mean : 4 Mean : 324 Mean : 251836 Mean : 80
3rd Qu.: 7 3rd Qu.: 102 3rd Qu.: 306913 3rd Qu.: 93
Max. : 56 Max. :52162 Max. :1686923 Max. :153
NA's :87790 NA's :83871 NA's :83086 NA's :83086
Mortgage1m%Change Mortgage12m%Change MortgageSalesVolume FTBPrice
Min. :-19 Min. :-28 Min. : 1 Min. : 57199
1st Qu.: 0 1st Qu.: 1 1st Qu.: 86 1st Qu.: 130184
Median : 0 Median : 4 Median : 134 Median : 177504
Mean : 0 Mean : 4 Mean : 732 Mean : 209559
3rd Qu.: 1 3rd Qu.: 7 3rd Qu.: 234 3rd Qu.: 251980
Max. : 22 Max. : 56 Max. :128426 Max. :1417178
NA's :83478 NA's :87790 NA's :83872 NA's :82690
FTBIndex FTB1m%Change FTB12m%Change FOOPrice
Min. : 42 Min. :-19 Min. :-28 Min. : 71951
1st Qu.: 68 1st Qu.: 0 1st Qu.: 1 1st Qu.: 179186
Median : 80 Median : 0 Median : 4 Median : 250540
Mean : 80 Mean : 0 Mean : 4 Mean : 301419
3rd Qu.: 93 3rd Qu.: 1 3rd Qu.: 7 3rd Qu.: 369430
Max. :154 Max. : 24 Max. : 56 Max. :1956287
NA's :82690 NA's :83082 NA's :87394 NA's :83086
FOOIndex FOO1m%Change FOO12m%Change NewPrice
Min. : 43 Min. :-19 Min. :-27 Min. : 22443
1st Qu.: 68 1st Qu.: 0 1st Qu.: 1 1st Qu.: 125043
Median : 80 Median : 0 Median : 4 Median : 194410
Mean : 80 Mean : 0 Mean : 4 Mean : 214956
3rd Qu.: 92 3rd Qu.: 1 3rd Qu.: 7 3rd Qu.: 271262
Max. :151 Max. : 21 Max. : 56 Max. :1414204
NA's :83086 NA's :83478 NA's :87790 NA's :7300
NewIndex New1m%Change New12m%Change NewSalesVolume
Min. : 8 Min. :-30 Min. :-29 Min. : 1
1st Qu.: 41 1st Qu.: 0 1st Qu.: 2 1st Qu.: 10
Median : 61 Median : 0 Median : 6 Median : 21
Mean : 60 Mean : 1 Mean : 6 Mean : 137
3rd Qu.: 81 3rd Qu.: 2 3rd Qu.: 10 3rd Qu.: 47
Max. :150 Max. : 35 Max. : 96 Max. :21097
NA's :7300 NA's :7696 NA's :12028 NA's :10144
OldPrice OldIndex Old1m%Change Old12m%Change
Min. : 22716 Min. : 9 Min. :-31 Min. :-30
1st Qu.: 100407 1st Qu.: 39 1st Qu.: 0 1st Qu.: 1
Median : 151856 Median : 60 Median : 0 Median : 5
Mean : 181015 Mean : 59 Mean : 0 Mean : 6
3rd Qu.: 225576 3rd Qu.: 78 3rd Qu.: 1 3rd Qu.: 10
Max. :1665089 Max. :153 Max. : 36 Max. : 99
NA's :7096 NA's :7096 NA's :7492 NA's :11824
OldSalesVolume
Min. : 2
1st Qu.: 126
Median : 193
Mean : 1137
3rd Qu.: 344
Max. :166098
NA's :7108
The original data set is composed of 144,630 monthly observations across 54 variables capturing various dimensions of the UK House Price Index. It provides information regarding Average Prices, Sales Volumes, Percentage Changes M-o-M and Y-o-Y, and Raw and Seasonally Adjusted Price Indexes, across various geographic delimitations (from county level to London borough councils) and disaggregated by property type (Detached, Semi Detached, Terraced, Flat), payment method (Cash, Mortgage), buyer status (FTB, FOO), and building status (New, Old).
Notably, the dataset contains many NAs, primarily because not all variables began to be recorded at the same time or for every region. Additionally, except for the columns “Date”, “RegionName”, and “AreaCode”, which are stored as characters, all variables are stored as double-precision numbers. Detailed descriptions of each variable are shown below.
| Column_Header | Explanation |
|---|---|
| Date | The year and month to which the monthly statistics apply |
| RegionName | Name of geography (Country, Regional, County/Unitary/District Authority and London Borough) |
| AreaCode | Code of geography (Country, Regional, County/Unitary/District Authority and London Borough) |
| AveragePrice | Average house price for a geography in a particular period |
| Index | House price index for a geography in a particular period (January 2015 = 100) |
| IndexSA | Seasonally adjusted house price index for a geography in a particular period (January 2015 = 100) |
| 1m%Change | The percentage change in the Average Price compared to the previous month |
| 12m%Change | The percentage change in the Average Price compared to the same period twelve months earlier |
| AveragePriceSA | Seasonally adjusted Average Price for a geography in a particular period |
| SalesVolume | Number of registered transactions for a geography in a particular period |
| [Property Type]Price | Average house price for a particular property type (e.g., detached houses) for a geography in a particular period |
| [Property Type]Index | House price index for a particular property type for a geography in a particular period (January 2015 = 100) |
| [Property Type]1m%change | The percentage change in the [Property Type] Price compared to the previous month |
| [Property Type]12m%change | The percentage change in the [Property Type] Price compared to the same period twelve months earlier |
| [Cash/Mortgage]Price | Average house price by funding status (e.g., cash) for a geography in a particular period |
| [Cash/Mortgage]Index | House price index by funding status for a geography in a particular period (January 2015 = 100) |
| [Cash/Mortgage]1m%change | The percentage change in the [Cash/Mortgage]Price compared to the previous month |
| [Cash/Mortgage]12m%change | The percentage change in the [Cash/Mortgage]Price compared to the same period twelve months earlier |
| [Cash/Mortgage] Sales Volume | Number of registered transactions [Cash/Mortgage] for a geography in a particular period |
| [FTB/FOO]Price | Average house price by buyer status (e.g., first time buyer/former owner occupier) for a geography in a particular period |
| [FTB/FOO]Index | House price index by buyer status for a geography in a particular period (January 2015 = 100) |
| [FTB/FOO]1m%change | The percentage change in the [FTB/FOO]Price compared to the previous month |
| [FTB/FOO]12m%change | The percentage change in the [FTB/FOO]Price compared to the same period twelve months earlier |
| [New/Old]Price | Average house price by property status (e.g., new or existing property) for a geography in a particular period |
| [New/Old]Index | House price index by property status for a geography in a particular period (January 2015 = 100) |
| [New/Old]1m%change | The percentage change in the [New/Old]Price compared to the previous month |
| [New/Old]12m%change | The percentage change in the [New/Old]Price compared to the same period twelve months earlier |
| [New/Old] Sales Volume | Number of registered transactions [New/Old] for a geography in a particular period |
Source: About the UK House Price Index
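The two percentage-change definitions in the table can be illustrated with a short sketch. It is written in Python purely for illustration (the report's own processing was done in R), the helper name `pct_change` is hypothetical, and the input values are the first six AveragePrice observations for Aberdeenshire from the preview at the top of the report.

```python
def pct_change(values, lag):
    """Percentage change of each value versus the value `lag` periods earlier."""
    out = []
    for i, v in enumerate(values):
        if i < lag or v is None or values[i - lag] is None:
            out.append(None)  # no earlier observation to compare against
        else:
            out.append(round((v / values[i - lag] - 1) * 100, 1))
    return out

# First six AveragePrice values for Aberdeenshire, copied from the preview.
average_price = [84638, 84623, 86536, 87373, 89493, 92485]

one_month = pct_change(average_price, lag=1)      # analogue of 1m%Change
twelve_month = pct_change(average_price, lag=12)  # analogue of 12m%Change
print(one_month)
print(twelve_month)
```

With only six months of data, every 12-month change is undefined, which is consistent with the NAs shown for `12m%Change` in the first rows of the preview.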
However, the data is in wide format, and many functions in R are optimized for the long (tidy) format, so the data is first reshaped to long format. Then, because the analysis depends on how the variables evolve over time, the Date column is converted from “character” to a date (year-month) type.
# A tibble: 6 × 6
Dates RegionName AreaCode Category Measurement Value
<yearmon> <chr> <chr> <chr> <chr> <dbl>
1 Jan 2004 Aberdeenshire S12000034 All Price 84638
2 Jan 2004 Aberdeenshire S12000034 All Index 41.1
3 Jan 2004 Aberdeenshire S12000034 All IndexSA NA
4 Jan 2004 Aberdeenshire S12000034 All 1m%Change NA
5 Jan 2004 Aberdeenshire S12000034 All 12m%Change NA
6 Jan 2004 Aberdeenshire S12000034 All PriceSA NA
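The reshaping step can be sketched as follows. This is a minimal illustration in Python, not the R code used for the report: the prefix list and the helper names `split_column` and `to_long` are assumptions inferred from the column names and from the long-format preview above, and the dates are assumed to be in day/month/year order.

```python
from datetime import datetime

# Hypothetical prefix list, inferred from the column names in the preview.
CATEGORIES = ["SemiDetached", "Detached", "Terraced", "Flat", "Cash",
              "Mortgage", "FTB", "FOO", "New", "Old"]

def split_column(name):
    """Split a wide column name into (Category, Measurement)."""
    for cat in CATEGORIES:
        if name.startswith(cat):
            return cat, name[len(cat):]
    # Headline columns carry no prefix; "AveragePrice" becomes "Price".
    return "All", name.replace("Average", "", 1)

def to_long(row, id_cols=("Date", "RegionName", "AreaCode")):
    """Turn one wide record (a dict) into a list of long records."""
    # Assumes day/month/year strings such as "01/02/2004" (February 2004).
    month = datetime.strptime(row["Date"], "%d/%m/%Y").strftime("%b %Y")
    ids = {"Dates": month, "RegionName": row["RegionName"],
           "AreaCode": row["AreaCode"]}
    long_rows = []
    for col, value in row.items():
        if col in id_cols:
            continue
        category, measurement = split_column(col)
        long_rows.append({**ids, "Category": category,
                          "Measurement": measurement, "Value": value})
    return long_rows

# A cut-down wide record using values from the preview.
wide = {"Date": "01/01/2004", "RegionName": "Aberdeenshire",
        "AreaCode": "S12000034", "AveragePrice": 84638, "Index": 41.1,
        "IndexSA": None, "DetachedPrice": 130620}
long_rows = to_long(wide)
for record in long_rows:
    print(record)
```

Each wide row produces one long row per measurement column, which is why the long table has only six columns but many more rows.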
As this project studies the London area, the data is filtered to the corresponding area code, “E12000007”.
[1] "London"
# A tibble: 6 × 6
Dates RegionName AreaCode Category Measurement Value
<yearmon> <chr> <chr> <chr> <chr> <dbl>
1 Apr 1968 London E12000007 All Price 4730
2 Apr 1968 London E12000007 All Index 0.8
3 Apr 1968 London E12000007 All IndexSA NA
4 Apr 1968 London E12000007 All 1m%Change NA
5 Apr 1968 London E12000007 All 12m%Change NA
6 Apr 1968 London E12000007 All PriceSA NA
| Category | Measurement | ObsCount | FirstDate | LastDate | MissingDatesStr |
|---|---|---|---|---|---|
| All | Index | 682 | Apr 1968 | Jan 2025 | None |
| All | IndexSA | 361 | Jan 1995 | Jan 2025 | None |
| All | Price | 682 | Apr 1968 | Jan 2025 | None |
| All | PriceSA | 361 | Jan 1995 | Jan 2025 | None |
| All | SalesVolume | 359 | Jan 1995 | Nov 2024 | None |
| Detached | Index | 361 | Jan 1995 | Jan 2025 | None |
| Detached | Price | 361 | Jan 1995 | Jan 2025 | None |
| SemiDetached | Index | 361 | Jan 1995 | Jan 2025 | None |
| SemiDetached | Price | 361 | Jan 1995 | Jan 2025 | None |
| Terraced | Index | 361 | Jan 1995 | Jan 2025 | None |
| Terraced | Price | 361 | Jan 1995 | Jan 2025 | None |
| Flat | Index | 361 | Jan 1995 | Jan 2025 | None |
| Flat | Price | 361 | Jan 1995 | Jan 2025 | None |
| Cash | Index | 157 | Jan 2012 | Jan 2025 | None |
| Cash | Price | 157 | Jan 2012 | Jan 2025 | None |
| Cash | SalesVolume | 155 | Jan 2012 | Nov 2024 | None |
| Mortgage | Index | 157 | Jan 2012 | Jan 2025 | None |
| Mortgage | Price | 157 | Jan 2012 | Jan 2025 | None |
| Mortgage | SalesVolume | 155 | Jan 2012 | Nov 2024 | None |
| FTB | Index | 157 | Jan 2012 | Jan 2025 | None |
| FTB | Price | 157 | Jan 2012 | Jan 2025 | None |
| FOO | Index | 157 | Jan 2012 | Jan 2025 | None |
| FOO | Price | 157 | Jan 2012 | Jan 2025 | None |
| New | Index | 359 | Jan 1995 | Nov 2024 | None |
| New | Price | 359 | Jan 1995 | Nov 2024 | None |
| New | SalesVolume | 359 | Jan 1995 | Nov 2024 | None |
| Old | Index | 359 | Jan 1995 | Nov 2024 | None |
| Old | Price | 359 | Jan 1995 | Nov 2024 | None |
| Old | SalesVolume | 359 | Jan 1995 | Nov 2024 | None |
For the region of London, the House Price Index and the Average Price have been recorded since April 1968, with the last observation in January 2025, totaling 682 monthly observations. From January 1995, other variables such as Sales Volume and the seasonally adjusted Index and Price started to be recorded as well, and the measurements (Index, Average Price, and in some cases Sales Volume) began to be segmented by property type (“Detached”, “Semi Detached”, “Terraced”, “Flat”) and building status (“New” and “Old”). Later, in 2012, two further categorizations were added, distinguishing by payment type (“Cash” or “Mortgage”) and buyer status (first-time buyer or former owner occupier). Consequently, the number of observations for each measurement ranges from 155 to 682, depending on the categorization.
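The observation counts can be cross-checked against the first and last dates in the coverage table: a monthly series with no gaps must have exactly as many observations as there are months in its date range. The sketch below is in Python for illustration only, and the helper names are hypothetical.

```python
def month_index(year, month):
    """Number of months since year 0, so consecutive months differ by 1."""
    return year * 12 + (month - 1)

def expected_months(first, last):
    """Inclusive count of months between two (year, month) pairs."""
    return month_index(*last) - month_index(*first) + 1

def missing_months(observed, first, last):
    """List the (year, month) pairs in the range that are not observed."""
    have = {month_index(y, m) for y, m in observed}
    return [(i // 12, i % 12 + 1)
            for i in range(month_index(*first), month_index(*last) + 1)
            if i not in have]

print(expected_months((1968, 4), (2025, 1)))   # All / Index: 682
print(expected_months((1995, 1), (2025, 1)))   # All / IndexSA: 361
print(expected_months((2012, 1), (2024, 11)))  # Cash / SalesVolume: 155
```

The expected counts match the ObsCount column, which agrees with the "None" entries reported for missing dates.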
Since 1968, the House Price Index (HPI) has followed a clear upward trend (non-stationary), with some short periods when prices remained flat and marked corrections around 1990-1994 and 2008-2009. However, from 2016 to 2025 the index appears to be flattening: smaller month-over-month growth rates give a visibly smaller positive slope than in previous decades, suggesting a break in the time series. Comparing the Index with the IndexSA, prices within a given year tend to decelerate from January to March and from November to December and to pick up from May to September, with April and October marking transition months. This seasonality is also confirmed by the decomposition of the series into Trend, Seasonal, and Remainder components.
| Decade | ObsCount | Avg1mChange | SD1mChange | Max1mChange | Min1mChange | AnnChange | CAGR |
|---|---|---|---|---|---|---|---|
| 2020 | 61 | 0.17 | 1.2 | 3.3 | -2.0 | 2.0 | 1.7 |
| 2010 | 120 | 0.48 | 1.2 | 3.9 | -3.2 | 5.8 | 5.5 |
| 2000 | 120 | 0.64 | 1.3 | 4.8 | -3.4 | 7.6 | 7.5 |
| 1990 | 120 | 0.28 | 2.9 | 7.9 | -8.0 | 3.4 | 4.8 |
| 1980 | 120 | 3.21 | 3.8 | 9.6 | -7.0 | 38.5 | 12.6 |
| 1970 | 120 | 4.16 | 4.6 | 13.5 | -4.3 | 50.0 | 16.9 |
| 1960 | 18 | 0.80 | 2.6 | 3.4 | -2.2 | 9.6 | 6.1 |
Since the beginning of the time series, the compound annual growth rate (CAGR) of house prices has been lowest in the present decade, at 1.7% per year. However, the present decade has so far shown the greatest stability in the Index, recording the lowest standard deviation, the highest minimum value, and the smallest range of monthly percentage changes. Additionally, the beginning of this period was marked by the COVID-19 pandemic and low interest rates, leading to surges in the 12-month change in Sales Volume that reached +485% in June 2021 and +208% in July 2022, sustaining the rise in house prices measured by the HPI. Afterwards, as interest rates rose, Sales Volume decelerated, dragging year-over-year prices into decline from May to November 2023, with prices showing signs of recovery in December 2023.
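The difference between a compounded and a simple annual figure can be made concrete with a small sketch. It is in Python for illustration only, the function names are hypothetical, and the index values are invented rather than taken from the dataset. The AnnChange column of the decade table looks consistent with the average monthly change multiplied by 12 (for example, 0.17 × 12 ≈ 2.0), but that reading is an inference from the printed numbers, not something the report states.

```python
def cagr(start_value, end_value, n_months):
    """Compound annual growth rate, in percent, over n_months."""
    return ((end_value / start_value) ** (12 / n_months) - 1) * 100

def annualised_mean_change(monthly_pct_changes):
    """Simple annualisation: average monthly % change times 12."""
    return sum(monthly_pct_changes) / len(monthly_pct_changes) * 12

# Hypothetical index that rises from 100 to 121 over 24 months.
print(round(cagr(100.0, 121.0, 24), 1))  # 10.0

# Hypothetical decade with a steady 0.5% gain every month.
print(round(annualised_mean_change([0.5] * 120), 1))  # 6.0
```

The CAGR depends only on the first and last values of the period, so it can diverge noticeably from the annualised mean when monthly changes are volatile.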
Regarding previous decades, HPI changes were visibly sharper, with year-over-year changes reaching approximately +60% in the 1970s, +35% in the 1980s, -12% and +18% in the 1990s, and +28% and -17% in the 2000s (marked by the Global Financial Crisis). The 2010s were less volatile, with increases of up to +20% followed by a declining trend after Brexit that culminated in a year-over-year HPI change of -3%.
As observed in the previous section, the variance of the data has clearly changed over time, which suggests that the data is heteroscedastic or that there is a structural break. The year 1995 appears to be an inflection point: variability before 1995 is visibly higher than after it.
Common Start Date: 1995
Baseline Date (lagged one month): 9131 (a raw day count since 1970-01-01, which corresponds to 1995-01-01)
Over the 30 years of observations, Detached and Semi Detached properties had slower HPI growth than Flats and Terraced houses for most of the period. However, at the end of 2021 a flip occurred: Flats, one of the categories whose prices had risen fastest in past decades, were overtaken by Semi Detached houses, probably because years of working from home pushed people to look for larger spaces. At the end of the 30 years, Terraced houses had seen the greatest increase in prices, at +690%, followed by Semi Detached (+614%), Flats (+569%), and Detached (+545%) properties.
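Cumulative increases such as these are obtained by comparing every observation with a common baseline. A minimal Python sketch follows, for illustration only: the function name is hypothetical and the price path is invented, chosen so that the last value lands on a +690% increase.

```python
def cumulative_growth(series, baseline_position=0):
    """Percentage growth of each value relative to the baseline value."""
    base = series[baseline_position]
    return [(value / base - 1) * 100 for value in series]

# Hypothetical price path, not taken from the dataset.
prices = [50_000, 75_000, 200_000, 395_000]
growth = cumulative_growth(prices)
print([round(g) for g in growth])  # [0, 50, 300, 690]
```

Rebasing every property type to the same start date is what makes the categories comparable, since each then starts at 0% regardless of its price level.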
Optimal (m+1)-segment partition:
Call:
breakpoints.formula(formula = ln_ts ~ 1)
Breakpoints at observation number:
m = 1 228
m = 2 141 384
m = 3 120 222 408
m = 4 120 222 384 545
m = 5 120 222 362 464 566
Corresponding to breakdates:
m = 1 1987(3)
m = 2 1979(12) 2000(3)
m = 3 1978(3) 1986(9) 2002(3)
m = 4 1978(3) 1986(9) 2000(3) 2013(8)
m = 5 1978(3) 1986(9) 1998(5) 2006(11) 2015(5)
Fit:
m 0 1 2 3 4 5
RSS 1402 375 132 73 45 42
BIC 2440 1553 853 468 140 107
As suggested in a previous section, the Chow test rejects, at the 5%
significance level, the hypothesis that the data has no structural
breaks, as all F statistics are above the critical value. Additionally,
the Bai-Perron analysis suggests that the data has 5 breakpoints, in
March 1978, September 1986, May 1998, November 2006, and May 2015, as
the Bayesian Information Criterion (BIC) is lowest for 5 breakpoints.
Therefore, for the forecast, the data will be modeled from May 2015
until January 2024, holding out the last 12 months to test the accuracy
of the model.
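The selection rule and the mapping from observation numbers to break dates can be reproduced from the printed output. The sketch below is in Python for illustration only. It assumes the series starts in April 1968 (the first London observation in the coverage table), the helper name `obs_to_date` is hypothetical, and the BIC values and breakpoints are copied from the output above.

```python
def obs_to_date(obs_number, start=(1968, 4)):
    """Map a 1-based observation number to (year, month) in a monthly series."""
    idx = start[0] * 12 + (start[1] - 1) + (obs_number - 1)
    return idx // 12, idx % 12 + 1

# BIC per number of breaks and the m = 5 partition, copied from the output.
bic = {0: 2440, 1: 1553, 2: 853, 3: 468, 4: 140, 5: 107}
breakpoints = {5: [120, 222, 362, 464, 566]}

best_m = min(bic, key=bic.get)  # number of breaks with the lowest BIC
break_dates = [obs_to_date(n) for n in breakpoints[best_m]]
print(best_m)       # 5
print(break_dates)  # [(1978, 3), (1986, 9), (1998, 5), (2006, 11), (2015, 5)]
```

The last break date, May 2015, is the one used as the start of the modeling window.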
Augmented Dickey-Fuller Test
data: train_ln_ts
Dickey-Fuller = -2, Lag order = 4, p-value = 0.5
alternative hypothesis: stationary
Augmented Dickey-Fuller Test
data: train_ln_ts_d1
Dickey-Fuller = -4, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
HEGY test for unit roots
data: train_ln_ts_d1_sa
statistic p-value
t_1 -2.6 0.97
t_2 -2.37 0.11
F_3:4 5.58 0 **
F_5:6 7.59 0 **
F_7:8 3.84 0.01 *
F_9:10 5.6 0 **
F_11:12 3.34 0.02 *
F_2:12 9.19 0.26
F_1:12 9.52 0.16
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Deterministic terms: constant
Lag selection criterion and order: AIC, 0
P-values: based on response surface regressions
To test for non-seasonal stationarity, the Augmented Dickey-Fuller (ADF) test was applied to the logarithm of the HPI. The null hypothesis of non-stationarity could not be rejected at the 5% significance level (p-value of 0.5), which is consistent with the clear trend in the data. The first difference of the series was therefore taken, and for it the null hypothesis of non-stationarity was rejected at the 5% significance level, with an observed p-value of 0.01. Consequently, the log series is integrated of order 1, I(1).
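The transformation tested here, the first difference of the log series, can be sketched in a few lines. This is a Python illustration with a hypothetical function name and an invented index. It shows why differencing removes a trend: a series that grows at a constant rate has constant log differences.

```python
import math

def log_diff(series):
    """First difference of the natural log: approximate period-on-period growth."""
    logs = [math.log(value) for value in series]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

# Hypothetical index growing 1% per month: trending in levels, flat in log differences.
index = [100.0, 101.0, 102.01, 103.0301]
d1 = log_diff(index)
print([round(x, 5) for x in d1])
```

Each log difference is close to the monthly growth rate expressed as a fraction, so the differenced series can be read as approximate monthly returns.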
To test for seasonal unit roots, the HEGY test was used to determine whether the series has stochastic seasonality. In simple terms, the test decomposes the time series into several seasonal cycles and tests each of them for a unit root. The null hypothesis is non-stationarity at that specific frequency, so failing to reject it means that seasonal differencing is needed. In this case, the null is rejected for all the seasonal frequency pairs from F_3:4 to F_11:12, as their p-values are below 0.05. Therefore, the seasonality will be modeled deterministically.
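One simple way to treat seasonality as deterministic is to estimate a fixed effect for each calendar month and subtract it, in the spirit of a regression on month dummies. The Python sketch below is an illustration of that idea only: the report does not show how its own seasonal adjustment was done, the function name is hypothetical, and the data is invented.

```python
def remove_monthly_means(values, first_month):
    """Subtract each calendar month's average deviation from the overall mean."""
    months = [(first_month - 1 + i) % 12 + 1 for i in range(len(values))]
    overall = sum(values) / len(values)
    by_month = {}
    for month, value in zip(months, values):
        by_month.setdefault(month, []).append(value)
    # Seasonal effect of a month = that month's mean minus the overall mean.
    seasonal = {m: sum(vs) / len(vs) - overall for m, vs in by_month.items()}
    adjusted = [value - seasonal[month] for month, value in zip(months, values)]
    return adjusted, seasonal

# Hypothetical two years of data with an identical pattern every year.
values = list(range(12)) * 2
adjusted, seasonal = remove_monthly_means(values, first_month=1)
print(adjusted[:3])  # [5.5, 5.5, 5.5]
```

Because the invented pattern repeats exactly, removing the monthly effects leaves a constant series; with real data the remainder would carry the non-seasonal dynamics to be modeled.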
Series: train_ln_ts_d1_sa_xts
ARIMA(1,0,2) with non-zero mean
Coefficients:
ar1 ma1 ma2 mean
0.63 -0.99 0.54 0.002
s.e. 0.20 0.19 0.08 0.001
sigma^2 = 0.0000748: log likelihood = 348
AIC=-686 AICc=-686 BIC=-673
Series: train_ln_ts_d1_sa_ts
ARIMA(4,0,4) with zero mean
Coefficients:
ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
0 0.304 0.445 0 -0.346 0 0 0
s.e. 0 0.087 0.085 0 0.098 0 0 0
sigma^2 = 0.0000703: log likelihood = 351
AIC=-693 AICc=-693 BIC=-683
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.00054 0.0083 0.0066 91 157 0.71 0.00079
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
ar2 0.3044 0.0869 3.50 0.00046 ***
ar3 0.4445 0.0847 5.25 0.00000016 ***
ma1 -0.3460 0.0985 -3.51 0.00044 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Ljung-Box test
data: Residuals from ARIMA(4,0,4) with zero mean
Q* = 18, df = 13, p-value = 0.1
Model df: 8. Total lags used: 21
Box-Ljung test
data: residuals(model)
X-squared = 25, df = 21, p-value = 0.3
From the PACF it is possible to see some memory between lags 1 and 3,
although the pattern is not clear. A function in R was therefore used to
find the model that best fits the data, yielding an ARMA(1,2). However,
as its residuals were not white noise, the approach switched to starting
from a full model and removing the least significant terms until all the
remaining ones were significant. The refined model, reported above as
ARIMA(4,0,4) with the dropped coefficients fixed at zero, retains only
the ar2, ar3, and ma1 terms, that is, a subset ARMA(3,1). As a final
outcome, the log series follows a subset ARIMA(3,1,1) with deterministic
seasonality, in which all the terms are significant at the 5% level and
the residuals are white noise: per the Ljung-Box test, we fail to reject
the null hypothesis (p-value of 0.3) that the residuals are white noise.
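The significance check behind the term-by-term refinement can be reproduced from the printed estimates and standard errors. The sketch below is a Python illustration with a hypothetical function name. It uses a two-sided normal test, which is an assumption about how the z test above was computed, although it reproduces the printed z values and p-values closely.

```python
import math

def z_test(estimate, std_error):
    """Return the z statistic and the two-sided normal p-value."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Estimates and standard errors copied from the z test output above.
coefs = {"ar2": (0.3044, 0.0869), "ar3": (0.4445, 0.0847), "ma1": (-0.3460, 0.0985)}
results = {name: z_test(est, se) for name, (est, se) in coefs.items()}
for name, (z, p_value) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p_value:.2g}")
```

In a backward-elimination loop, the term with the largest p-value would be dropped and the model refitted until every remaining p-value is below 0.05.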
Although the observed Ln(HPI) fell inside the model's confidence
interval, the forecast was, in simple terms, flat compared with the
observed values.
Unlike the ARIMA model, the Prophet forecast apparently followed a more
pronounced downward trend.
ME RMSE MAE MPE MAPE
ARIMA 0.014 0.02 0.015 0.31 0.33
Prophet 0.014 0.02 0.017 0.31 0.38
According to the performance metrics in the table above, both models performed similarly. However, the ARIMA model performed better on the Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE), indicating that ARIMA is slightly more accurate for predicting the London House Price Index.
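For reference, the five metrics in the comparison table can be computed as in the sketch below. It is a Python illustration with a hypothetical function name and invented values. The error is taken as actual minus forecast, which is an assumption about the sign convention used in the report.

```python
import math

def accuracy(actual, forecast):
    """Forecast accuracy metrics, with error defined as actual minus forecast."""
    errors = [a - f for a, f in zip(actual, forecast)]
    pct_errors = [100 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,                              # mean error (bias)
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),  # penalises large misses
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct_errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
    }

# Hypothetical log-index values and a flat forecast, not taken from the report.
actual = [4.60, 4.62, 4.61]
flat_forecast = [4.60, 4.60, 4.60]
metrics = accuracy(actual, flat_forecast)
print({name: round(value, 4) for name, value in metrics.items()})
```

Under this sign convention, a positive ME means the forecasts sat below the observed values on average.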
Therefore, based on the ARIMA model, the logarithm of the House Price
Index is expected to continue the flat trend it has shown since 2022.