Introduction

As the largest city in England and the UK, London has a property market that matters to a great many people. It is common knowledge that property prices in and around London are higher than in any other city in England. The price of a property is affected by several parameters, such as the number of rooms, the land and building area, the location, and the type of property.

Using the Housing Prices in London data set downloaded from Kaggle, we will carry out an exploratory data analysis to see what information we can extract from it.

Attaching Necessary Library

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Data Input

prop <- read.csv("archive/London.csv")

Data Inspection

This is an overview of our data frame.

# Check our data frame with head

head(prop)
##   X       Property.Name   Price       House.Type Area.in.sq.ft No..of.Bedrooms
## 1 0         Queens Road 1675000            House          2716               5
## 2 1       Seward Street  650000 Flat / Apartment           814               2
## 3 2         Hotham Road  735000 Flat / Apartment           761               2
## 4 3        Festing Road 1765000            House          1986               4
## 5 4        Spencer Walk  675000 Flat / Apartment           700               2
## 6 5 Craven Hill Gardens  420000 Flat / Apartment           403               1
##   No..of.Bathrooms No..of.Receptions    Location City.County Postal.Code
## 1                5                 5   Wimbledon      London    SW19 8NY
## 2                2                 2 Clerkenwell      London    EC1V 3PA
## 3                2                 2      Putney      London    SW15 1QL
## 4                4                 4      Putney      London    SW15 1LP
## 5                2                 2      Putney      London    SW15 1PL
## 6                1                 1                  London      W2 3EA

Next, we will look at the dimensions of our data frame. The dim() function returns the number of rows and columns.

# Checking our data frame dimension

dim(prop)
## [1] 3480   11

The output above shows that our data frame has 3480 rows (observations) and 11 columns.

Each column contains different information. The names() function lists the column names, which gives us a big picture of what each column holds and lets us judge whether it already has the correct class or data type.

# Checking our column names

names(prop)
##  [1] "X"                 "Property.Name"     "Price"            
##  [4] "House.Type"        "Area.in.sq.ft"     "No..of.Bedrooms"  
##  [7] "No..of.Bathrooms"  "No..of.Receptions" "Location"         
## [10] "City.County"       "Postal.Code"

Based on the column names, the information contained in each column can be described as follows:

X : Row index
Property.Name : Name of the property in the listings
Price : Price of the property in Pounds (£)
House.Type : Type of property
Area.in.sq.ft : Area of the property in square feet
No..of.Bedrooms : Number of bedrooms
No..of.Bathrooms : Number of bathrooms
No..of.Receptions : Number of receptions
Location : Location
City.County : City / County of the property
Postal.Code : Postal code

source: https://www.kaggle.com/arnavkulkarni/housing-prices-in-london

Data Wrangling and Transformation

Another way to see the column names is the str() function, which shows the structure of our data frame, including the class or data type of each column.

# Check the structure of our data frame

str(prop)
## 'data.frame':    3480 obs. of  11 variables:
##  $ X                : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Property.Name    : chr  "Queens Road" "Seward Street" "Hotham Road" "Festing Road" ...
##  $ Price            : int  1675000 650000 735000 1765000 675000 420000 1475000 650000 2500000 925000 ...
##  $ House.Type       : chr  "House" "Flat / Apartment" "Flat / Apartment" "House" ...
##  $ Area.in.sq.ft    : int  2716 814 761 1986 700 403 1548 560 1308 646 ...
##  $ No..of.Bedrooms  : int  5 2 2 4 2 1 4 1 3 2 ...
##  $ No..of.Bathrooms : int  5 2 2 4 2 1 4 1 3 2 ...
##  $ No..of.Receptions: int  5 2 2 4 2 1 4 1 3 2 ...
##  $ Location         : chr  "Wimbledon" "Clerkenwell" "Putney" "Putney" ...
##  $ City.County      : chr  "London" "London" "London" "London" ...
##  $ Postal.Code      : chr  "SW19 8NY" "EC1V 3PA" "SW15 1QL" "SW15 1LP" ...

Based on the output above, some of the chr columns would be better represented as factors because their values repeat. We will check the levels of the chr columns that could be treated as categorical.

levels(as.factor(prop$House.Type))
## [1] "Bungalow"         "Duplex"           "Flat / Apartment" "House"           
## [5] "Mews"             "New development"  "Penthouse"        "Studio"
levels(as.factor(prop$City.County))
##  [1] "110 Battersea Park Road"  "27 Carlton Drive"        
##  [3] "311 Goldhawk Road"        "4 Circus Road West"      
##  [5] "52 Holloway Road"         "6 Deal Street"           
##  [7] "82-88 Fulham High Street" "Battersea"               
##  [9] "Blackheath"               "Bushey"                  
## [11] "Chelsea"                  "Chessington"             
## [13] "City Of London"           "Clapton"                 
## [15] "Clerkenwell"              "De Beauvoir"             
## [17] "Deptford"                 "Downs Road"              
## [19] "E5 8DE"                   "Ealing"                  
## [21] "Essex"                    "Fitzrovia"               
## [23] "Fulham"                   "Fulham High Street"      
## [25] "Greenford"                "Hertfordshire"           
## [27] "Holland Park"             "Hornchurch"              
## [29] "Kensington"               "Kent"                    
## [31] "Lambourne End"            "Lillie Square"           
## [33] "Little Venice"            "London"                  
## [35] "London1500"               "Marylebone"              
## [37] "Middlesex"                "Middx"                   
## [39] "N1 6FU"                   "N7 6QX"                  
## [41] "Northwood"                "Oxshott"                 
## [43] "Queens Park"              "Richmond"                
## [45] "Richmond Hill"            "Romford"                 
## [47] "Spitalfields"             "Surrey"                  
## [49] "Surrey Quays"             "Thames Ditton"           
## [51] "The Metal Works"          "Thurleigh Road"          
## [53] "Twickenham"               "Wandsworth"              
## [55] "Watford"                  "Wimbledon"               
## [57] "Wornington Road"

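From the result, we can conclude that House.Type and City.County contain repeated values and can be converted to the factor class. Note that City.County also contains entries that look like street addresses or postcodes (for example "110 Battersea Park Road" and "E5 8DE"), so its 57 levels are not all genuine cities or counties. We also need to drop any column that carries no information, such as X, which only contains the row index of our data.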

# Transform data type

prop$House.Type <- as.factor(prop$House.Type)

prop$City.County <- as.factor(prop$City.County)

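# Label blank locations explicitly instead of leaving empty strings
prop$Location <- replace(prop$Location, prop$Location == "", "Unidentified")

# Drop the first column (X, the row index) and keep columns 2 to 11
prop <- prop[c(2:11)]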
# Recheck our data frame structure

str(prop)
## 'data.frame':    3480 obs. of  10 variables:
##  $ Property.Name    : chr  "Queens Road" "Seward Street" "Hotham Road" "Festing Road" ...
##  $ Price            : int  1675000 650000 735000 1765000 675000 420000 1475000 650000 2500000 925000 ...
##  $ House.Type       : Factor w/ 8 levels "Bungalow","Duplex",..: 4 3 3 4 3 3 4 6 4 3 ...
##  $ Area.in.sq.ft    : int  2716 814 761 1986 700 403 1548 560 1308 646 ...
##  $ No..of.Bedrooms  : int  5 2 2 4 2 1 4 1 3 2 ...
##  $ No..of.Bathrooms : int  5 2 2 4 2 1 4 1 3 2 ...
##  $ No..of.Receptions: int  5 2 2 4 2 1 4 1 3 2 ...
##  $ Location         : chr  "Wimbledon" "Clerkenwell" "Putney" "Putney" ...
##  $ City.County      : Factor w/ 57 levels "110 Battersea Park Road",..: 34 34 34 34 34 34 34 34 34 34 ...
##  $ Postal.Code      : chr  "SW19 8NY" "EC1V 3PA" "SW15 1QL" "SW15 1LP" ...

Based on the structure, each column of our data frame now has a more suitable data type. This will make things easier when we aggregate the data to extract information.
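As a possible alternative to converting the columns one at a time, the same transformation can be sketched with mutate() and across() from dplyr (available from dplyr 1.0.0). The data frame `toy` below is made up for illustration and only mimics a few columns of prop; it is not the real data set.

```r
library(dplyr)

# Hypothetical toy data that mimics a few columns of `prop`
toy <- data.frame(
  X = 0:3,
  House.Type = c("House", "Flat / Apartment", "Flat / Apartment", "Studio"),
  City.County = c("London", "London", "Surrey", "London"),
  Price = c(1675000, 650000, 735000, 420000),
  stringsAsFactors = FALSE
)

toy_clean <- toy %>%
  select(-X) %>%                                          # drop the row index
  mutate(across(c(House.Type, City.County), as.factor))   # chr -> factor

str(toy_clean)
```

Dropping X by name rather than by position keeps the code correct even if the column order changes.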

Another important step is to check for missing values. We will use anyNA() to check whether any missing values exist, then colSums(is.na()) to find which columns contain them.

# Check missing value availability

anyNA(prop)
## [1] FALSE
# Check missing value location

colSums(is.na(prop))
##     Property.Name             Price        House.Type     Area.in.sq.ft 
##                 0                 0                 0                 0 
##   No..of.Bedrooms  No..of.Bathrooms No..of.Receptions          Location 
##                 0                 0                 0                 0 
##       City.County       Postal.Code 
##                 0                 0

Based on this output, there are no NA values in our data set, so we do not need to transform or omit anything. Keep in mind that anyNA() and is.na() only detect NA; the blank Location entries seen earlier are empty strings, which is why we replaced them with "Unidentified" by hand.

Had there been NA values, we would need to replace them with specific values before aggregating the data, for example with the replace_na() function from the tidyr library.
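To illustrate the difference between an empty string and NA, here is a minimal sketch on made-up data. The data frame `toy` is hypothetical; na_if() comes from dplyr and replace_na() from tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one listing has a blank Location
toy <- data.frame(
  Property.Name = c("Queens Road", "Craven Hill Gardens"),
  Location = c("Wimbledon", ""),
  stringsAsFactors = FALSE
)

# An empty string is not NA, so this check finds nothing
anyNA(toy)

toy_filled <- toy %>%
  mutate(Location = na_if(Location, "")) %>%       # "" -> NA
  replace_na(list(Location = "Unidentified"))      # NA -> explicit label

toy_filled
```

Converting blanks to NA first means that anyNA() and colSums(is.na()) would report them, which makes the missing-value check more trustworthy.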

Data Explanation

With the summary() function we can get a brief statistical summary of our data set.

# Check data set summary

summary(prop)
##  Property.Name          Price                     House.Type   Area.in.sq.ft  
##  Length:3480        Min.   :  180000   Flat / Apartment:1565   Min.   :  274  
##  Class :character   1st Qu.:  750000   House           :1430   1st Qu.:  834  
##  Mode  :character   Median : 1220000   New development : 357   Median : 1310  
##                     Mean   : 1864173   Penthouse       : 100   Mean   : 1713  
##                     3rd Qu.: 2150000   Studio          :  10   3rd Qu.: 2157  
##                     Max.   :39750000   Bungalow        :   9   Max.   :15405  
##                                        (Other)         :   9                  
##  No..of.Bedrooms  No..of.Bathrooms No..of.Receptions   Location        
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000    Length:3480       
##  1st Qu.: 2.000   1st Qu.: 2.000   1st Qu.: 2.000    Class :character  
##  Median : 3.000   Median : 3.000   Median : 3.000    Mode  :character  
##  Mean   : 3.104   Mean   : 3.104   Mean   : 3.104                      
##  3rd Qu.: 4.000   3rd Qu.: 4.000   3rd Qu.: 4.000                      
##  Max.   :10.000   Max.   :10.000   Max.   :10.000                      
##                                                                        
##         City.County   Postal.Code       
##  London       :2972   Length:3480       
##  Surrey       : 262   Class :character  
##  Middlesex    :  78   Mode  :character  
##  Essex        :  62                     
##  Twickenham   :  12                     
##  Hertfordshire:   9                     
##  (Other)      :  85

Summary:

  1. London has more listed properties than any other region.
  2. The highest price in London and the neighbouring regions is 39,750,000 Pounds, while the lowest is 180,000 Pounds.
  3. There are 8 types of property in the region.
  4. The largest property has an area of 15,405 square feet, while the smallest has an area of 274 square feet.
  5. The maximum number of bedrooms, bathrooms, and receptions across all properties is 10 for each room type.

The summary() function makes it easy to get a quick overview of our data frame. Another way to get information is through data aggregation.

Data Aggregation

Data aggregation lets you condense your data into a data frame that contains only the information you need, filtering it down to what you wish to present. Aggregated data is especially helpful when building visualizations, because a chart of a small, aggregated data frame is easier to interpret.

Although we already have some information, we will look for more using data aggregation with the dplyr library.

Top 5 Priciest Properties

prop %>%
  select(Property.Name, Price, City.County) %>%
  arrange(-Price) %>%
  head(5)
##           Property.Name    Price City.County
## 1 No.1 Grosvenor Square 39750000      London
## 2         Cadogan Place 34000000      London
## 3      Hamilton Terrace 25000000      London
## 4            Park Place 25000000      London
## 5           Ikins House 23950000      London

The priciest property is No.1 Grosvenor Square in London, priced at 39,750,000 Pounds.

Top 5 Cheapest Properties
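The arrange() plus head() pattern can also be written with slice_max() from dplyr (available from dplyr 1.0.0). The sketch below uses a small hypothetical data frame `toy`, with names and prices copied from the output above for illustration. One point to be aware of is that slice_max() keeps ties by default, so it may return more than n rows.

```r
library(dplyr)

# Hypothetical toy listings; values copied from the output above
toy <- data.frame(
  Property.Name = c("No.1 Grosvenor Square", "Cadogan Place",
                    "Hamilton Terrace", "Park Place", "Ikins House"),
  Price = c(39750000, 34000000, 25000000, 25000000, 23950000),
  stringsAsFactors = FALSE
)

# Default behaviour keeps ties: both 25,000,000 listings are returned
top3_ties <- toy %>% slice_max(Price, n = 3)

# with_ties = FALSE returns exactly n rows
top3 <- toy %>% slice_max(Price, n = 3, with_ties = FALSE)

nrow(top3_ties)
nrow(top3)
```

slice_min() works the same way for the cheapest or smallest properties.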

prop %>%
  select(Property.Name, Price, City.County) %>%
  arrange(Price) %>%
  head(5)
##     Property.Name  Price City.County
## 1 Park View Court 180000      London
## 2 Kersfield House 210000      London
## 3   Pullman Court 249999      London
## 4  Tottenham Road 255000      London
## 5     Mare Street 260000      London

The cheapest property is Park View Court in London, priced at 180,000 Pounds.

Where are the most affordable locations if you are planning to buy a property?

prop %>%
  group_by(City.County) %>%
  summarise(Average.Price = mean(Price)) %>%
  arrange(Average.Price) %>%
  head(5)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 5 x 2
##   City.County  Average.Price
##   <fct>                <dbl>
## 1 Chessington         375000
## 2 Blackheath          399995
## 3 Deptford            400000
## 4 Surrey Quays        485000
## 5 Clapton             490000

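If you are planning to buy a property on a tight budget, Chessington or Blackheath can be considered, as the average price of the properties listed in those areas does not exceed 400,000 Pounds. Bear in mind that the summary above shows very few listings outside London, Surrey, Middlesex, and Essex, so these averages rest on only a handful of properties.

Top 5 Biggest Properties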
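When ranking areas by average price, it can help to report how many listings each average is based on. The following is a minimal sketch on made-up data; the data frame `toy` and its listing counts are hypothetical and do not reflect the real data set.

```r
library(dplyr)

# Hypothetical toy data: three London listings and one Chessington listing
toy <- data.frame(
  City.County = c("London", "London", "London", "Chessington"),
  Price = c(650000, 735000, 420000, 375000),
  stringsAsFactors = FALSE
)

avg_price <- toy %>%
  group_by(City.County) %>%
  summarise(Average.Price = mean(Price),
            Listed = n(),            # number of listings behind each average
            .groups = "drop") %>%    # also silences the ungrouping message
  arrange(Average.Price)

avg_price
```

An area with a low average but only one or two listings is weaker evidence of affordability than an area with many listings.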

prop %>%
  select(Property.Name, Area.in.sq.ft, City.County) %>%
  arrange(-Area.in.sq.ft) %>%
  head(5)
##      Property.Name Area.in.sq.ft City.County
## 1      Ikins House         15405      London
## 2    Ingram Avenue         14358      London
## 3   Birdshill Road         12546      Surrey
## 4  Birds Hill Road         12526      Surrey
## 5 Hamilton Terrace         12435      London

The biggest property is Ikins House in London, with an area of 15,405 square feet.

Top 5 Smallest Properties

prop %>%
  select(Property.Name, Area.in.sq.ft, City.County) %>%
  arrange(Area.in.sq.ft) %>%
  head(5)
##      Property.Name Area.in.sq.ft City.County
## 1  Cadogan Terrace           274      London
## 2   Cautley Avenue           277      London
## 3     Erskine Road           292      London
## 4 Eardley Crescent           297      London
## 5     Foskett Road           302      London

The smallest property is Cadogan Terrace in London, with an area of 274 square feet.

Case:

You have just been transferred by your office to work in London and its neighbouring region as a branch manager. Instead of renting, you decide it is better to buy a property, as it serves not only as your home but also as a future investment.

As you have a certain budget and terms that you need to fulfill, you decide to obtain some information about the properties in London before calling the property agent.

What is the average property area you can get in London?

prop %>%
  filter(City.County == "London") %>%
  group_by(City.County) %>%
  # Only one group remains after filtering, so no sorting or head() is needed
  summarise(Average.Sq.Feet = round(mean(Area.in.sq.ft)))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 1 x 2
##   City.County Average.Sq.Feet
##   <fct>                 <dbl>
## 1 London                 1564

If you are planning to get a property in London, you can expect a relatively large space, as the average area of a property in London is 1,564 square feet.

Property types in London

prop %>%
  filter(City.County == "London") %>%
  group_by(House.Type) %>%
  summarise(Average.Price = round(mean(Price)),
            Average.Sq.Feet = round(mean(Area.in.sq.ft))) %>%
  arrange(-Average.Price)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 3
##   House.Type       Average.Price Average.Sq.Feet
##   <fct>                    <dbl>           <dbl>
## 1 Penthouse              3152420            1814
## 2 House                  2552856            2354
## 3 New development        2341259            1394
## 4 Mews                   1400000            1280
## 5 Flat / Apartment       1277463            1013
## 6 Duplex                  910833            1277
## 7 Studio                  357500             399

Suppose you are planning to get a property but are not yet sure what kind you want. This table may give you some insight to help you decide.

There are 7 types of property in London. Studio is the cheapest and smallest type, with an average price of 357,500 Pounds and an average area of about 399 square feet. This type may suit a single person or a budget-oriented buyer.

If you are married and looking for a place with more space, a House is a suitable choice. It is the second most expensive type on average, after Penthouse, but it offers the largest average area (2,354 square feet) for your family to live in.

Number of properties listed for sale in London

listed_prop <- prop %>%
  filter(City.County == "London") %>%
  group_by(House.Type) %>%
  summarise(Listed = n()) %>%
  arrange(-Listed)
## `summarise()` ungrouping output (override with `.groups` argument)
listed_prop
## # A tibble: 7 x 2
##   House.Type       Listed
##   <fct>             <int>
## 1 Flat / Apartment   1483
## 2 House              1083
## 3 New development     291
## 4 Penthouse            97
## 5 Studio               10
## 6 Duplex                6
## 7 Mews                  2
sum(listed_prop$Listed)
## [1] 2972

As shown, there are 2972 properties listed for sale in London in total. The most common type on the market is Flat / Apartment, while the rarest is Mews.

What will we get if we buy a property in London?
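The group_by(), summarise(n()), and arrange() steps used for this count can be condensed with count() from dplyr. This is a minimal sketch on a hypothetical data frame `toy`, not the real listings.

```r
library(dplyr)

# Hypothetical toy data with a few property types
toy <- data.frame(
  House.Type = c("Flat / Apartment", "House", "Flat / Apartment",
                 "Mews", "House", "Flat / Apartment"),
  stringsAsFactors = FALSE
)

# count() groups, tallies, and (with sort = TRUE) orders by frequency
listed_toy <- toy %>%
  count(House.Type, sort = TRUE, name = "Listed")

listed_toy
sum(listed_toy$Listed)  # equals the number of rows in `toy`
```

Because count() returns an ungrouped result, it also avoids the `summarise()` ungrouping message.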

prop %>%
  filter(City.County == "London") %>%
  group_by(House.Type) %>%
  summarise(Bedrooms = round(mean(No..of.Bedrooms)),
            Bathrooms = round(mean(No..of.Bathrooms)),
            Receptions = round(mean(No..of.Receptions)),
            Average.Price = round(mean(Price)),
            Average.Sq.ft = round(mean(Area.in.sq.ft))) %>%
  arrange(-Average.Price)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 6
##   House.Type       Bedrooms Bathrooms Receptions Average.Price Average.Sq.ft
##   <fct>               <dbl>     <dbl>      <dbl>         <dbl>         <dbl>
## 1 Penthouse               3         3          3       3152420          1814
## 2 House                   4         4          4       2552856          2354
## 3 New development         2         2          2       2341259          1394
## 4 Mews                    3         3          3       1400000          1280
## 5 Flat / Apartment        2         2          2       1277463          1013
## 6 Duplex                  3         3          3        910833          1277
## 7 Studio                  0         0          0        357500           399

Let us say that you have decided to settle on a property in London but have not yet figured out which type to get. Aside from the price and the area, what are the major differences between the property types?

Based on the data above, one of the major differences between property types in London is the number of rooms. The average area generally increases with the number of rooms, although the relationship is not exact: a Penthouse and a Duplex both average 3 bedrooms but differ in area.

From the table, a house in London has on average 4 bedrooms, 4 bathrooms, and 4 receptions, which is plausible given its average area of 2,354 square feet.
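One rough way to check the link between room count and area is a correlation on the per-type averages from the table above. This is only a sketch: it uses seven rounded group averages rather than the individual listings, so the result should be read as indicative at best. The object names are hypothetical.

```r
# Per-type averages copied from the London table above
type_avg <- data.frame(
  House.Type = c("Penthouse", "House", "New development", "Mews",
                 "Flat / Apartment", "Duplex", "Studio"),
  Bedrooms = c(3, 4, 2, 3, 2, 3, 0),
  Average.Sq.ft = c(1814, 2354, 1394, 1280, 1013, 1277, 399),
  stringsAsFactors = FALSE
)

# Pearson correlation between average bedrooms and average area
room_area_cor <- cor(type_avg$Bedrooms, type_avg$Average.Sq.ft)
room_area_cor
```

A value close to 1 would support the claim that larger property types tend to have more rooms. Running cor() on the full prop data frame would be a stronger check.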

Conclusion: We have a winner!

From the information you have gathered, several parameters can affect the price of properties in London. Now you can decide which kind of property suits you.

As a single man, you do not need a large home, but you do require an area of at least 1,000 square feet because you have some big furniture. Your maximum budget is 1,000,000 Pounds.

By filtering our data, we will list several candidate properties to compare.

prop %>%
  # Conditions separated by commas are combined with AND
  filter(City.County == "London",
         No..of.Bedrooms == 1,
         No..of.Bathrooms == 1,
         No..of.Receptions == 1,
         Area.in.sq.ft >= 1000,
         Price <= 1000000)
##     Property.Name  Price       House.Type Area.in.sq.ft No..of.Bedrooms
## 1     Melody Road 600000 Flat / Apartment          1024               1
## 2 Buckingham Road 699950            House          1255               1
## 3 Addison Gardens 775000 Flat / Apartment          1173               1
## 4   Church Street 765000 Flat / Apartment          1122               1
##   No..of.Bathrooms No..of.Receptions        Location City.County Postal.Code
## 1                1                 1    Unidentified      London    SW18 2QF
## 2                1                 1       Islington      London      N1 4JA
## 3                1                 1    Unidentified      London     W14 0DS
## 4                1                 1 St. John's Wood      London     NW8 8EP

After applying these conditions to our data set, we get four properties in London that match our criteria. All of them have 1 bedroom, 1 bathroom, and 1 reception. Each has an area of at least 1,000 square feet, which leaves room for furniture that takes up a lot of space. Finally, all of them are within our budget.

If you have large furniture that takes up space and you still want to stay within budget, the property with the largest area at a fair price will suit you best.

Based on our data, the largest property at a still reasonable price is the house on Buckingham Road in Islington, London. At 699,950 Pounds it is the second cheapest of the four candidates, and it has 3 rooms (1 bedroom, 1 bathroom, and 1 reception) with an area of 1,255 square feet.
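One way to make "fair price for the area" concrete is price per square foot. This criterion is not part of the analysis above; it is an extra, assumed measure sketched here on the four candidates, with values copied from the filtered output. The column name Price.per.sq.ft is hypothetical.

```r
library(dplyr)

# The four candidates from the filtered output above
candidates <- data.frame(
  Property.Name = c("Melody Road", "Buckingham Road",
                    "Addison Gardens", "Church Street"),
  Price = c(600000, 699950, 775000, 765000),
  Area.in.sq.ft = c(1024, 1255, 1173, 1122),
  stringsAsFactors = FALSE
)

ranked <- candidates %>%
  mutate(Price.per.sq.ft = round(Price / Area.in.sq.ft)) %>%  # Pounds per sq ft
  arrange(Price.per.sq.ft)                                    # cheapest per sq ft first

ranked
```

Under this measure Buckingham Road comes out first, which is consistent with choosing it as the largest property at a reasonable price.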