Property Prices in London: Exploratory Data Analysis
Introduction
London is the largest city in England and the UK, and property is an integral part of life for the people who live there. It is common knowledge that property prices in and around London are higher than in any other city in England. The price of a property is affected by several parameters, such as the number of rooms, the land and building area, the location, and the type of property.
Using the Housing Prices in London data set downloaded from Kaggle, we will perform an Exploratory Data Analysis to see what information we can get from the data set.
Attaching the Necessary Library
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
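The message above means that once dplyr is attached, a bare call to filter() or lag() reaches the dplyr version instead of the stats one. The stats functions are still available through the `stats::` prefix. Here is a small sketch that calls stats::filter() explicitly to compute a 3-point moving average. The vector `x` is purely illustrative.

```r
# stats::filter() is masked by dplyr::filter() after library(dplyr),
# but the :: prefix always reaches the stats version directly.
x <- c(1, 2, 3, 4, 5)

# 3-point centred moving average; the two end points have no full window,
# so they come back as NA
moving_avg <- stats::filter(x, rep(1 / 3, 3))
print(as.numeric(moving_avg))
```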
Data Inspection
Below is an overview of the first rows of our data frame:
## X Property.Name Price House.Type Area.in.sq.ft No..of.Bedrooms
## 1 0 Queens Road 1675000 House 2716 5
## 2 1 Seward Street 650000 Flat / Apartment 814 2
## 3 2 Hotham Road 735000 Flat / Apartment 761 2
## 4 3 Festing Road 1765000 House 1986 4
## 5 4 Spencer Walk 675000 Flat / Apartment 700 2
## 6 5 Craven Hill Gardens 420000 Flat / Apartment 403 1
## No..of.Bathrooms No..of.Receptions Location City.County Postal.Code
## 1 5 5 Wimbledon London SW19 8NY
## 2 2 2 Clerkenwell London EC1V 3PA
## 3 2 2 Putney London SW15 1QL
## 4 4 4 Putney London SW15 1LP
## 5 2 2 Putney London SW15 1PL
## 6 1 1 London W2 3EA
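The loading step itself is not shown. Here is a minimal sketch of it. It assumes the raw CSV has an unnamed first column and headers such as `No. of Bedrooms` and `City/County`; that header layout is an assumption, not taken from the source. With the real file, a path such as `"London.csv"` (hypothetical) would be passed to read.csv() instead of `text =`. read.csv() runs headers through make.names() by default, which would explain names like `X` and `No..of.Bedrooms`. The three data rows are copied from the output above.

```r
# Hypothetical header layout; data rows copied from the head() output above
csv_text <- paste(
  ",Property Name,Price,House Type,Area in sq ft,No. of Bedrooms,No. of Bathrooms,No. of Receptions,Location,City/County,Postal Code",
  "0,Queens Road,1675000,House,2716,5,5,5,Wimbledon,London,SW19 8NY",
  "1,Seward Street,650000,Flat / Apartment,814,2,2,2,Clerkenwell,London,EC1V 3PA",
  "5,Craven Hill Gardens,420000,Flat / Apartment,403,1,1,1,,London,W2 3EA",
  sep = "\n"
)

# With a real file this would be read.csv("London.csv") (hypothetical path).
# check.names = TRUE (the default) turns "" into X and spaces or "/" into dots.
prop_sample <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(names(prop_sample))
# A blank field in a character column is read as "" rather than NA
print(prop_sample$Location)
```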
Next, we check the dimensions of our data frame. The dim() function returns the number of rows and columns of the data frame.
## [1] 3480 11
The output above shows that our data frame has 3480 rows (observations) and 11 columns.
Each column contains different information. The names() function lists the column names, which gives us a big picture of what information the data frame contains.
## [1] "X" "Property.Name" "Price"
## [4] "House.Type" "Area.in.sq.ft" "No..of.Bedrooms"
## [7] "No..of.Bathrooms" "No..of.Receptions" "Location"
## [10] "City.County" "Postal.Code"
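dim() returns a length-2 integer vector (rows, then columns), and nrow() and ncol() return each part separately. Here is a minimal sketch on a three-row data frame built from the first rows shown earlier. The object name `toy` is illustrative only.

```r
# Small illustrative data frame built from the first three rows of the data
toy <- data.frame(
  Property.Name = c("Queens Road", "Seward Street", "Hotham Road"),
  Price = c(1675000L, 650000L, 735000L),
  House.Type = c("House", "Flat / Apartment", "Flat / Apartment"),
  stringsAsFactors = FALSE
)

print(dim(toy))    # rows, columns
print(nrow(toy))   # rows only
print(ncol(toy))   # columns only
print(names(toy))  # column names, same as colnames() for a data frame
```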
Based on the column names, the information in each column can be described as follows:
- X: row index
- Property.Name: name of the property in the listings
- Price: price of the property in Pounds (£)
- House.Type: type of property
- Area.in.sq.ft: area of the property in square feet
- No..of.Bedrooms: number of bedrooms
- No..of.Bathrooms: number of bathrooms
- No..of.Receptions: number of reception rooms
- Location: location (neighborhood) of the property
- City.County: city or county of the property
- Postal.Code: postal code
Source: https://www.kaggle.com/arnavkulkarni/housing-prices-in-london
Data Wrangling and Transformation
Another way to see the column names is the str() function. str() shows the structure of our data frame, including the class (data type) of each column.
## 'data.frame': 3480 obs. of 11 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Property.Name : chr "Queens Road" "Seward Street" "Hotham Road" "Festing Road" ...
## $ Price : int 1675000 650000 735000 1765000 675000 420000 1475000 650000 2500000 925000 ...
## $ House.Type : chr "House" "Flat / Apartment" "Flat / Apartment" "House" ...
## $ Area.in.sq.ft : int 2716 814 761 1986 700 403 1548 560 1308 646 ...
## $ No..of.Bedrooms : int 5 2 2 4 2 1 4 1 3 2 ...
## $ No..of.Bathrooms : int 5 2 2 4 2 1 4 1 3 2 ...
## $ No..of.Receptions: int 5 2 2 4 2 1 4 1 3 2 ...
## $ Location : chr "Wimbledon" "Clerkenwell" "Putney" "Putney" ...
## $ City.County : chr "London" "London" "London" "London" ...
## $ Postal.Code : chr "SW19 8NY" "EC1V 3PA" "SW15 1QL" "SW15 1LP" ...
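For a compact view of the classes alone, sapply() can apply class() to every column. Here is a sketch on an illustrative two-row data frame (`toy` is a made-up name). It also lists the character columns, which are the candidates for conversion to factors.

```r
toy <- data.frame(
  Price = c(1675000L, 650000L),
  House.Type = c("House", "Flat / Apartment"),
  Area.in.sq.ft = c(2716L, 814L),
  stringsAsFactors = FALSE
)

# One class per column, as a named character vector
col_classes <- sapply(toy, class)
print(col_classes)

# Names of the columns stored as character, i.e. candidates for factor()
chr_cols <- names(col_classes)[col_classes == "character"]
print(chr_cols)
```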
Based on the information above, some of our chr columns are not best stored as chr, because they contain a limited set of repeated values. We will check the distinct values (levels) of the chr columns that could be grouped into a categorical class.
## [1] "Bungalow" "Duplex" "Flat / Apartment" "House"
## [5] "Mews" "New development" "Penthouse" "Studio"
## [1] "110 Battersea Park Road" "27 Carlton Drive"
## [3] "311 Goldhawk Road" "4 Circus Road West"
## [5] "52 Holloway Road" "6 Deal Street"
## [7] "82-88 Fulham High Street" "Battersea"
## [9] "Blackheath" "Bushey"
## [11] "Chelsea" "Chessington"
## [13] "City Of London" "Clapton"
## [15] "Clerkenwell" "De Beauvoir"
## [17] "Deptford" "Downs Road"
## [19] "E5 8DE" "Ealing"
## [21] "Essex" "Fitzrovia"
## [23] "Fulham" "Fulham High Street"
## [25] "Greenford" "Hertfordshire"
## [27] "Holland Park" "Hornchurch"
## [29] "Kensington" "Kent"
## [31] "Lambourne End" "Lillie Square"
## [33] "Little Venice" "London"
## [35] "London1500" "Marylebone"
## [37] "Middlesex" "Middx"
## [39] "N1 6FU" "N7 6QX"
## [41] "Northwood" "Oxshott"
## [43] "Queens Park" "Richmond"
## [45] "Richmond Hill" "Romford"
## [47] "Spitalfields" "Surrey"
## [49] "Surrey Quays" "Thames Ditton"
## [51] "The Metal Works" "Thurleigh Road"
## [53] "Twickenham" "Wandsworth"
## [55] "Watford" "Wimbledon"
## [57] "Wornington Road"
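Listings like the two above can be produced with levels(as.factor(x)) or sort(unique(x)), which both return the sorted distinct values. The City.County list also contains entries that look like postcodes (E5 8DE, N1 6FU, N7 6QX) or street names, so it may need cleaning. Here is a sketch that counts distinct values and flags postcode-like entries. The regular expression is a simplified assumption, not the full UK postcode grammar, and the input vector is illustrative.

```r
city_county <- c("London", "Surrey", "E5 8DE", "N1 6FU", "London1500", "Middx", "London")

# Distinct values, sorted, and how many there are
distinct_vals <- sort(unique(city_county))
print(length(distinct_vals))

# Simplified postcode pattern: 1-2 letters, a digit, an optional letter/digit,
# an optional space, then a digit and two letters (e.g. "E5 8DE", "EC1V 3PA")
postcode_like <- grepl("^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$",
                       city_county, perl = TRUE)
print(city_county[postcode_like])
```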
From the result, we can conclude that House.Type and City.County contain repeated values and can be converted to the factor class. We also label the empty strings in Location as "Unidentified", and drop any column that gives us no information, such as X, which only contains the row index of our data.
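# Transform data type
# Character columns with repeated values become factors
prop$House.Type <- as.factor(prop$House.Type)
prop$City.County <- as.factor(prop$City.County)
# Label listings with an empty Location as "Unidentified"
prop$Location <- replace(prop$Location, prop$Location == "", "Unidentified")
# Drop column 1 (X, the row index), keeping columns 2 to 11
prop <- prop[c(2:11)]
# Inspect the new structure
str(prop)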
## 'data.frame': 3480 obs. of 10 variables:
## $ Property.Name : chr "Queens Road" "Seward Street" "Hotham Road" "Festing Road" ...
## $ Price : int 1675000 650000 735000 1765000 675000 420000 1475000 650000 2500000 925000 ...
## $ House.Type : Factor w/ 8 levels "Bungalow","Duplex",..: 4 3 3 4 3 3 4 6 4 3 ...
## $ Area.in.sq.ft : int 2716 814 761 1986 700 403 1548 560 1308 646 ...
## $ No..of.Bedrooms : int 5 2 2 4 2 1 4 1 3 2 ...
## $ No..of.Bathrooms : int 5 2 2 4 2 1 4 1 3 2 ...
## $ No..of.Receptions: int 5 2 2 4 2 1 4 1 3 2 ...
## $ Location : chr "Wimbledon" "Clerkenwell" "Putney" "Putney" ...
## $ City.County : Factor w/ 57 levels "110 Battersea Park Road",..: 34 34 34 34 34 34 34 34 34 34 ...
## $ Postal.Code : chr "SW19 8NY" "EC1V 3PA" "SW15 1QL" "SW15 1LP" ...
Based on the structure, each column of our data frame now has a more suitable data type. This will make it easier to aggregate the data and extract information from it.
Another important step is to check for missing values. We use anyNA() to check whether the data frame contains any missing values, and colSums(is.na()) to find which columns contain them.
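A factor stores each distinct string once as a level and keeps each row as an integer code. That is what the `4 3 3 4 3 ...` after House.Type in the str() output shows: House is the 4th of 8 levels and Flat / Apartment the 3rd. Here is a tiny illustrative sketch.

```r
house_type <- c("House", "Flat / Apartment", "Flat / Apartment", "House")
f <- factor(house_type)

print(levels(f))      # distinct values, sorted alphabetically
print(as.integer(f))  # each row stored as the position of its level
print(nlevels(f))     # number of levels
```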
## [1] FALSE
## Property.Name Price House.Type Area.in.sq.ft
## 0 0 0 0
## No..of.Bedrooms No..of.Bathrooms No..of.Receptions Location
## 0 0 0 0
## City.County Postal.Code
## 0 0
Based on this output, there are no missing values in our data set, which means we do not need to replace or omit any values.
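One caveat is that anyNA() and is.na() detect only NA, not empty strings. Before the transformation step, blank Location entries were stored as "" and would have passed this check, which is why they were relabeled explicitly. Here is a sketch that counts both kinds of gaps on a small illustrative data frame (`toy` is made up).

```r
toy <- data.frame(
  Price = c(1675000L, NA, 735000L),
  Location = c("Wimbledon", "", "Putney"),
  stringsAsFactors = FALSE
)

print(anyNA(toy))                # TRUE: Price has an NA
na_counts <- colSums(is.na(toy))
print(na_counts)                 # NA per column; the "" in Location is not counted

# Count empty strings separately, only in character columns
is_chr <- sapply(toy, is.character)
blank_counts <- colSums(toy[is_chr] == "", na.rm = TRUE)
print(blank_counts)
```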
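If the data did contain missing values, we would need to replace them with specific values (or omit them) before aggregating, because functions such as mean() return NA when their input contains NA unless na.rm = TRUE is set. To replace missing values, we could use the replace_na() function from the tidyr library.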
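Here is a base R equivalent, shown as an alternative in case tidyr is not installed. Filling Price with the median is one illustrative choice made for this sketch, not a step taken in the analysis. The data frame is made up.

```r
toy <- data.frame(
  Price = c(1675000, NA, 735000),
  Location = c("Wimbledon", NA, "Putney"),
  stringsAsFactors = FALSE
)

# Base R: assign into the positions where is.na() is TRUE.
# With tidyr, a similar call would be:
#   tidyr::replace_na(toy, list(Location = "Unidentified"))
toy$Location[is.na(toy$Location)] <- "Unidentified"

# For a numeric column, the median is one possible fill value;
# any fill changes later averages, so the choice should be deliberate.
toy$Price[is.na(toy$Price)] <- median(toy$Price, na.rm = TRUE)

print(toy)
```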
Data Explanation
With the summary() function, we can get a brief summary of every column in our data set.
## Property.Name Price House.Type Area.in.sq.ft
## Length:3480 Min. : 180000 Flat / Apartment:1565 Min. : 274
## Class :character 1st Qu.: 750000 House :1430 1st Qu.: 834
## Mode :character Median : 1220000 New development : 357 Median : 1310
## Mean : 1864173 Penthouse : 100 Mean : 1713
## 3rd Qu.: 2150000 Studio : 10 3rd Qu.: 2157
## Max. :39750000 Bungalow : 9 Max. :15405
## (Other) : 9
## No..of.Bedrooms No..of.Bathrooms No..of.Receptions Location
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Length:3480
## 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 2.000 Class :character
## Median : 3.000 Median : 3.000 Median : 3.000 Mode :character
## Mean : 3.104 Mean : 3.104 Mean : 3.104
## 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000
## Max. :10.000 Max. :10.000 Max. :10.000
##
## City.County Postal.Code
## London :2972 Length:3480
## Surrey : 262 Class :character
## Middlesex : 78 Mode :character
## Essex : 62
## Twickenham : 12
## Hertfordshire: 9
## (Other) : 85
Summary:
- London has the highest number of properties of any region.
- The priciest property in London and its neighboring regions costs 39,750,000 Pounds, while the cheapest costs 180,000 Pounds.
- There are 8 types of property in the region.
- The largest property has an area of 15,405 square feet, while the smallest has an area of 274 square feet.
- The maximum number of bedrooms, bathrooms, and receptions among all properties in the region is 10 for each room type.
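In the summary, No..of.Bedrooms, No..of.Bathrooms and No..of.Receptions have identical statistics, and their first ten values in the str() output match as well. That suggests the three columns may hold the same values, which matters for the room comparisons later. Here is a sketch of the check, run on the six rows shown by head(). On the full data it would be applied to `prop`.

```r
# Room counts for the six rows shown in the head() output
rooms <- data.frame(
  No..of.Bedrooms   = c(5L, 2L, 2L, 4L, 2L, 1L),
  No..of.Bathrooms  = c(5L, 2L, 2L, 4L, 2L, 1L),
  No..of.Receptions = c(5L, 2L, 2L, 4L, 2L, 1L)
)

# TRUE only if two columns hold exactly the same values in the same order
same_bed_bath <- identical(rooms$No..of.Bedrooms, rooms$No..of.Bathrooms)
same_bed_rec  <- identical(rooms$No..of.Bedrooms, rooms$No..of.Receptions)

# Number of rows where the three counts are not all equal
n_differ <- sum(rooms$No..of.Bedrooms != rooms$No..of.Bathrooms |
                rooms$No..of.Bedrooms != rooms$No..of.Receptions)

print(c(same_bed_bath = same_bed_bath, same_bed_rec = same_bed_rec))
print(n_differ)
```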
The summary() function makes it easy to get a brief overview of our data frame. Another way to get information is through data aggregation.
Data Aggregation
Data aggregation lets you condense your data into a data frame that contains only the information you need. Aggregating filters the data down to the information you want to present, and it is especially helpful when creating visualizations, because an aggregated data frame is easier to interpret.
Although we already have some information, we will look for other kinds of information through data aggregation with the dplyr library.
Top 5 Priciest Properties
## Property.Name Price City.County
## 1 No.1 Grosvenor Square 39750000 London
## 2 Cadogan Place 34000000 London
## 3 Hamilton Terrace 25000000 London
## 4 Park Place 25000000 London
## 5 Ikins House 23950000 London
The priciest property is No.1 Grosvenor Square in London, with a price of 39,750,000 Pounds.
Top 5 Cheapest Properties
## Property.Name Price City.County
## 1 Park View Court 180000 London
## 2 Kersfield House 210000 London
## 3 Pullman Court 249999 London
## 4 Tottenham Road 255000 London
## 5 Mare Street 260000 London
The cheapest property is Park View Court in London, with a price of 180,000 Pounds.
Where are the most affordable locations if you are planning to buy a property?
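The code behind the two top-5 tables is not shown. One base R way to get the highest or lowest n rows is to sort with order() and keep the first n. In the sketch below, the helper name `top_n_rows` is hypothetical, and the six rows come from the head() output. order() is stable, so tied prices (like Hamilton Terrace and Park Place at 25,000,000) keep their original row order.

```r
# Hypothetical helper: n rows with the highest (or lowest) value in `col`
top_n_rows <- function(df, col, n = 5, highest = TRUE) {
  # order() is stable, so tied values keep their original row order
  idx <- order(df[[col]], decreasing = highest)
  head(df[idx, , drop = FALSE], n)
}

# Illustrative data: the six rows from the head() output
toy <- data.frame(
  Property.Name = c("Queens Road", "Seward Street", "Hotham Road",
                    "Festing Road", "Spencer Walk", "Craven Hill Gardens"),
  Price = c(1675000, 650000, 735000, 1765000, 675000, 420000),
  City.County = "London",
  stringsAsFactors = FALSE
)

priciest <- top_n_rows(toy, "Price", n = 2)
cheapest <- top_n_rows(toy, "Price", n = 2, highest = FALSE)
print(priciest)
print(cheapest)
```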
prop %>%
group_by(City.County) %>%
summarise(Average.Price = mean(Price)) %>%
arrange(Average.Price) %>%
head(5)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 5 x 2
## City.County Average.Price
## <fct> <dbl>
## 1 Chessington 375000
## 2 Blackheath 399995
## 3 Deptford 400000
## 4 Surrey Quays 485000
## 5 Clapton 490000
If you are planning to buy a property on a tight budget, Chessington or Blackheath can be considered, as the average property price in those areas does not exceed 400,000 Pounds.
Top 5 Biggest Properties
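An average for a rarely listed area may rest on very few listings, so it can help to show the count next to the mean. With dplyr, adding `Listings = n()` inside summarise() would do this. Here is a base R sketch using tapply() on hypothetical numbers. The area names and prices are made up and are not values from the data set.

```r
# Hypothetical listings; areas and prices are made up for illustration
toy <- data.frame(
  City.County = c("Area A", "Area A", "Area B", "Area C", "Area C", "Area C"),
  Price = c(300000, 500000, 450000, 900000, 1100000, 1000000),
  stringsAsFactors = FALSE
)

avg_price <- tapply(toy$Price, toy$City.County, mean)
listings  <- tapply(toy$Price, toy$City.County, length)

affordable <- data.frame(
  City.County = names(avg_price),
  Average.Price = as.numeric(avg_price),
  Listings = as.integer(listings),
  stringsAsFactors = FALSE
)
# Cheapest average first, like arrange(Average.Price)
affordable <- affordable[order(affordable$Average.Price), ]
print(affordable)
```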
## Property.Name Area.in.sq.ft City.County
## 1 Ikins House 15405 London
## 2 Ingram Avenue 14358 London
## 3 Birdshill Road 12546 Surrey
## 4 Birds Hill Road 12526 Surrey
## 5 Hamilton Terrace 12435 London
The biggest property is Ikins House in London, with an area of 15,405 square feet.
Top 5 Smallest Properties
## Property.Name Area.in.sq.ft City.County
## 1 Cadogan Terrace 274 London
## 2 Cautley Avenue 277 London
## 3 Erskine Road 292 London
## 4 Eardley Crescent 297 London
## 5 Foskett Road 302 London
The smallest property is Cadogan Terrace in London, with an area of 274 square feet.
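When only the single biggest or smallest property is needed, which.max() and which.min() avoid sorting the whole data frame. Here is a sketch on the six areas from the head() output.

```r
toy <- data.frame(
  Property.Name = c("Queens Road", "Seward Street", "Hotham Road",
                    "Festing Road", "Spencer Walk", "Craven Hill Gardens"),
  Area.in.sq.ft = c(2716, 814, 761, 1986, 700, 403),
  stringsAsFactors = FALSE
)

# which.max()/which.min() return the index of the first maximum/minimum
biggest  <- toy$Property.Name[which.max(toy$Area.in.sq.ft)]
smallest <- toy$Property.Name[which.min(toy$Area.in.sq.ft)]
print(biggest)
print(smallest)
```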
Case:
Your office has just transferred you to work in London and its neighboring region as a branch manager. Instead of renting, you think it is better to buy a property, as it will serve not only as your home but also as a future investment.
As you have a certain budget and requirements to fulfill, you decide to gather some information about properties in London before calling a property agent.
What is the average property area you can get in London?
prop %>%
filter(City.County == "London") %>%
group_by(City.County) %>%
summarise(Average.Sq.Feet = round(mean(Area.in.sq.ft))) %>%
arrange(Average.Sq.Feet) %>%
head(5)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 1 x 2
## City.County Average.Sq.Feet
## <fct> <dbl>
## 1 London 1564
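If you are planning to get a property in London, you can expect a relatively big space, as the average area of a property in London is 1564 square feet. Keep in mind that the mean is pulled up by very large properties: across the whole data set, the median area (1,310 square feet) is well below the mean (1,713 square feet).
Property types in London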
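A quick way to see whether a few large properties inflate an average is to compare the mean with the median. With dplyr, `Median.Sq.Feet = median(Area.in.sq.ft)` could be added to the summarise() call above. Here is a base R sketch on the six areas from the head() output.

```r
# Areas of the six properties in the head() output
area <- c(2716, 814, 761, 1986, 700, 403)

avg_area <- mean(area)
mid_area <- median(area)
print(c(mean = avg_area, median = mid_area))
```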
prop %>%
filter(City.County == "London") %>%
group_by(House.Type) %>%
summarise(Average.Price = round(mean(Price)),
Average.Sq.Feet = round(mean(Area.in.sq.ft))) %>%
arrange(-Average.Price)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 3
## House.Type Average.Price Average.Sq.Feet
## <fct> <dbl> <dbl>
## 1 Penthouse 3152420 1814
## 2 House 2552856 2354
## 3 New development 2341259 1394
## 4 Mews 1400000 1280
## 5 Flat / Apartment 1277463 1013
## 6 Duplex 910833 1277
## 7 Studio 357500 399
Suppose you are planning to get a property but are not yet sure what kind of property you want. This data frame may give you some insight to help you decide.
There are 7 types of property in London. Studios are the cheapest and smallest: the average price of a studio in London is 357,500 Pounds, with an average area of about 399 square feet. This type of property may suit a single person or a budget-oriented buyer.
If you are married and looking for a place with more space, a House is a suitable type of property. It is more expensive than most other types (only Penthouses cost more on average), but it offers the most space for your family.
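The same per-type averages can be computed in base R with aggregate(). Putting cbind() on the left-hand side of the formula averages several columns per group at once. Here is a sketch on the six head() rows. On the full data, `data = prop` would be used after filtering to London.

```r
toy <- data.frame(
  House.Type = c("House", "Flat / Apartment", "Flat / Apartment",
                 "House", "Flat / Apartment", "Flat / Apartment"),
  Price = c(1675000, 650000, 735000, 1765000, 675000, 420000),
  Area.in.sq.ft = c(2716, 814, 761, 1986, 700, 403),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side averages several columns per group at once
by_type <- aggregate(cbind(Price, Area.in.sq.ft) ~ House.Type, data = toy, FUN = mean)
# Sort by descending average price, like arrange(-Average.Price)
by_type <- by_type[order(-by_type$Price), ]
print(by_type)
```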
Number of properties listed for sale in London
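listed_prop <- prop %>%
filter(City.County == "London") %>%
group_by(House.Type) %>%
summarise(Listed = n()) %>%
arrange(-Listed)
# Print the counts per property type
listed_prop
# Total number of London listings
sum(listed_prop$Listed)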
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 2
## House.Type Listed
## <fct> <int>
## 1 Flat / Apartment 1483
## 2 House 1083
## 3 New development 291
## 4 Penthouse 97
## 5 Studio 10
## 6 Duplex 6
## 7 Mews 2
## [1] 2972
In total, there are 2972 properties in London listed for sale. Currently, the most common type of property on the market is Flat / Apartment, while the rarest is Mews.
What will we get if we buy a property in London?
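In base R, table() gives the same kind of counts, and prop.table() turns them into shares. Here is a sketch on the House.Type values of the six head() rows.

```r
house_type <- c("House", "Flat / Apartment", "Flat / Apartment",
                "House", "Flat / Apartment", "Flat / Apartment")

listed <- sort(table(house_type), decreasing = TRUE)  # most common first
print(listed)
print(sum(listed))                          # total listings
print(round(prop.table(listed) * 100, 1))   # share of listings in percent
```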
prop %>%
filter(City.County == "London") %>%
group_by(House.Type) %>%
summarise(Bedrooms = round(mean(No..of.Bedrooms)),
Bathrooms = round(mean(No..of.Bathrooms)),
Receptions = round(mean(No..of.Receptions)),
Average.Price = round(mean(Price)),
Average.Sq.ft = round(mean(Area.in.sq.ft))) %>%
arrange(-Average.Price)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 6
## House.Type Bedrooms Bathrooms Receptions Average.Price Average.Sq.ft
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Penthouse 3 3 3 3152420 1814
## 2 House 4 4 4 2552856 2354
## 3 New development 2 2 2 2341259 1394
## 4 Mews 3 3 3 1400000 1280
## 5 Flat / Apartment 2 2 2 1277463 1013
## 6 Duplex 3 3 3 910833 1277
## 7 Studio 0 0 0 357500 399
Let's say you have already decided to settle for a property in London, but you have not yet figured out which type to buy. Aside from price and area, what are the major differences between the property types?
Based on the data above, one of the major differences between property types in London is the number of rooms each offers. Larger properties tend to have more rooms, although the relationship is not strictly proportional: a New development averages 1394 square feet with 2 of each room type, while a Mews averages only 1280 square feet with 3.
From the data frame, we can see that a House in London has on average 4 bedrooms, 4 bathrooms, and 4 receptions, which is plausible because houses in London have an average area of 2354 square feet.
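A correlation coefficient is one quick way to test how closely area and room count move together. On the full data this would be `cor(prop$Area.in.sq.ft, prop$No..of.Bedrooms)`. The sketch below uses only the six head() rows, so it illustrates the calculation and is not evidence about the whole market.

```r
# Areas and bedroom counts of the six properties in the head() output
area     <- c(2716, 814, 761, 1986, 700, 403)
bedrooms <- c(5, 2, 2, 4, 2, 1)

# Pearson: +1 would be a perfect increasing linear relationship
r_pearson <- cor(area, bedrooms)
# Spearman uses ranks, so it only asks whether more area goes with more rooms
r_spearman <- cor(area, bedrooms, method = "spearman")
print(c(pearson = r_pearson, spearman = r_spearman))
```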
Conclusion: We have a winner!
From the information you have gathered, several parameters affect the price of properties in London. Now you can decide which kind of property suits you.
As a single man, you do not need a big living space, but the area must be at least 1,000 square feet because you have some big furniture. Your maximum budget is 1,000,000 Pounds.
By filtering our data, we can list several candidate properties to compare. Since one bedroom, one bathroom, and one reception are enough for a single person, we filter on those as well.
prop %>%
filter(City.County == "London",
       No..of.Bedrooms == 1,
       No..of.Bathrooms == 1,
       No..of.Receptions == 1,
       Area.in.sq.ft >= 1000,
       Price <= 1000000)
## Property.Name Price House.Type Area.in.sq.ft No..of.Bedrooms
## 1 Melody Road 600000 Flat / Apartment 1024 1
## 2 Buckingham Road 699950 House 1255 1
## 3 Addison Gardens 775000 Flat / Apartment 1173 1
## 4 Church Street 765000 Flat / Apartment 1122 1
## No..of.Bathrooms No..of.Receptions Location City.County Postal.Code
## 1 1 1 Unidentified London SW18 2QF
## 2 1 1 Islington London N1 4JA
## 3 1 1 Unidentified London W14 0DS
## 4 1 1 St. John's Wood London NW8 8EP
After applying these conditions to our data set, we get four properties around London that match our criteria. All of them have 1 bedroom, 1 bathroom, and 1 reception, and each has an area above 1,000 square feet, which leaves enough room for the big furniture. Finally, the prices of all four properties are within our budget.
If you have big furniture that takes up space and still want to stay within budget, the property with the largest area at a fair price will suit you best.
Based on our data, the largest of these properties, at a fair price in the lower tier of the candidates, is a House named Buckingham Road in Islington, London, priced at 699,950 Pounds. It has only 3 rooms (1 bedroom, 1 bathroom, and 1 reception) but an area of about 1255 square feet.
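The final pick can also be made in code by ranking the candidates by area, largest first, and using the price to break any tie. With dplyr this would be `arrange(desc(Area.in.sq.ft), Price)`. Here is a base R sketch on the four candidates from the filtered output.

```r
# The four candidates from the filtered output
candidates <- data.frame(
  Property.Name = c("Melody Road", "Buckingham Road", "Addison Gardens", "Church Street"),
  Price = c(600000, 699950, 775000, 765000),
  Area.in.sq.ft = c(1024, 1255, 1173, 1122),
  stringsAsFactors = FALSE
)

# Largest area first; if two areas tie, the cheaper one comes first
ranked <- candidates[order(-candidates$Area.in.sq.ft, candidates$Price), ]
print(ranked)
```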