The data set that I am using for Assignment 2 details property sale price in England and Wales (2013 - 2023). The raw data can be found on Kaggle as ‘UK Property Price Data 1995-2023-04’; https://www.kaggle.com/datasets/.
Property Pricing Analysis - England & Wales (2013-2023): This visualization provides insight into the skewness of the average property price across England and Wales. It also identifies the most expensive counties in both countries. It is worth noting that all ten of the most expensive counties reside in England.
Average Home Sale Value by Year (2013-2023): This visualization shows the average property sale value from 2013 to 2023 for both England and Wales. English properties tend to be more expensive than their Welsh counterparts. Both countries have experienced significant increases in property sale price during the 10 year period.
Property Sales: Lease or Own (2013-2023): The next
visualization explores the ‘Duration’ of the property sale. In
otherwords, are the buyers of properties freeholding/owning or
leaseholding.
Freeholding: Buyers own both the property and the land
indefinitely.
Leaseholding: Buyers own the property for a fixed
period of time but do not own the land its built on. As a leaseholder,
you are effectively renting from the freeholder of the property.
Interestingly, England had a higher percentage of leaseholding
property sales over the last 10 years.
Average Property Value by Type and Year (2013-2023):
This visualization explores the average property sale value by property
type during the 2013 - 2023 time frame. You’ll see that the three
‘House’ categories experienced an increase in average property sale
value while ‘Flats’ seemed to remain relatively constant.
Total Sales by Property Type and Sale Type (2013-2023): To get a better idea of what property types are the most popular and how they are typically owned, I put together the following visualization. Most notably, almost all ‘Flat’ purchases in the last 10 years have been in a leasehold while almost all ‘House’ purchases have been a freehold arrangement.
Sale Price Heatmap - England (2013-2023): This visualization explores the average property sale price by year and month. Average English property sale prices peaked in June of 2021. There appears to be evidence that property sale prices are at there highest during the June - Sept timeframe.
Sale Price Heatmap - Wales (2013-2023): This visualization explores the average property sale price by year and month. Average Welsh property sale prices peaked in September of 2022. There appears to be evidence that property sale prices are at there highest during the July - Oct timeframe.
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/users/615486/Anaconda3/Library/plugins/platforms'
import wget
import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import warnings
import itertools
from matplotlib.ticker import FuncFormatter
URL = "https://www.kaggle.com/datasets/willianoliveiragibin/uk-property-price-data-1995-2023-04/download?datasetVersionNumber=1"
path = "C:/Users/615486/Documents/Personal/Loyola/Data Visualization for Decision Making/Python/"
wget.download(URL, path + 'UKHomePricing1995-2023.csv')
# Overarching data prep
filename = 'UKHomePricing1995-2023.csv'
df_UK = pd.read_csv(path + filename)
df_UK = df_UK
df_UK.columns = ['Transaction_Num', 'Price', 'Date of Transfer','Postcode', 'Property Type',
'Old/New', 'Duration', 'PAON', 'SAON', 'Street', 'Locality', 'Town/City',
'District', 'County', 'PPDCategory_Type', 'Record_Status']
df_UK['Date of Transfer'] = pd.to_datetime(df_UK['Date of Transfer'], format= '%Y-%m-%d %H:%M')
df_UK['Sale Year'] = df_UK['Date of Transfer'].dt.year
df_UK['Sale Month'] = df_UK['Date of Transfer'].dt.strftime('%b')
df_UK = df_UK[df_UK['Sale Year'] > 2012]
# bar graph data prep
df_bar = df_UK.copy()
df_bar = df_bar.groupby(['County']).agg({'County':['count'], 'Price':['sum', 'mean']}).reset_index()
df_bar.columns = ['County','Count of Sales', 'Sum of Sales', 'Avg Sales Price']
df_bar = df_bar.sort_values('Avg Sales Price', ascending=False)
df_bar.reset_index(inplace=True, drop=True)
#bar graph creation
def pick_colors_according_to_mean_price(this_data):
colors=[]
avg = df_UK['Price'].mean()
for each in this_data['Avg Sales Price']:
if each > avg*1.15:
colors.append('tomato') # if at least 15% greater than mean
elif each < avg*0.85:
colors.append('lightblue') # if at least 15% less than mean
else:
colors.append('gray') # if within 15% of mean
return colors
df_bar1 = df_bar[0:9].copy() # only top 10 (already ordered above)
df_bar2 = df_bar.copy() # all data
fig = plt.figure(figsize=(18, 16))
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)
plt.style.use('tableau-colorblind10')
above = mpatches.Patch(color='tomato', label = 'Above UK Average')
at = mpatches.Patch(color='gray', label = 'Within 15% of UK Average')
below = mpatches.Patch(color='lightblue', label = 'Below UK Average')
my_colors = pick_colors_according_to_mean_price(df_bar1)
my_colors2 = pick_colors_according_to_mean_price(df_bar2)
fig.suptitle('Home Pricing Analysis in England/Wales (2013-2023)', fontsize=18, fontweight='bold')
ax1.bar(df_bar1['County'], df_bar1['Avg Sales Price'], color=my_colors)
ax1.legend(handles=[above, at, below], fontsize=14)
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.set_title("Top 10 Counties", size=20)
ax2.bar(df_bar2['County'], df_bar2['Avg Sales Price'], color=my_colors2)
ax2.legend(handles=[above, at, below], fontsize=14)
ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax2.axes.xaxis.set_visible(False)
ax2.set_title("All Counties", size=20)
plt.show()
Wales = ['BLAENAU GWENT', 'BRIDGEND', 'CAERPHILLY', 'CARDIFF', 'CARMARTHENSHIRE',
'CEREDIGION', 'CONWY', 'DENBIGHSHIRE', 'FLINTSHIRE', 'GWYNEDD', 'ISLE OF ANGLESEY',
'MERTHYR TYDFIL', 'MONMOUTHSHIRE', 'NEATH PORT TALBOT', 'PEMBROKESHIRE',
'POWYS', 'RHONDDA CYNON TAFF', 'SWANSEA', 'THE VALE OF GLAMORGAN',
'TORFAEN', 'WREXHAM']
df_UK['Country'] = np.where(df_UK['County'].isin(Wales), 'Wales', 'England')
line_df = df_UK.copy()
line_df = line_df.groupby(['Sale Year', 'Country'])['Price'].mean().reset_index(name='Avg Home Sale Price')
fig = plt.figure(figsize = (18,10))
ax = fig.add_subplot(1, 1, 1)
my_color_line = {'England':'tomato', 'Wales':'black'}
plt.style.use('tableau-colorblind10')
for key, grp in line_df.groupby(['Country']):
grp.plot(ax=ax, kind='line', x='Sale Year', y='Avg Home Sale Price', color=my_color_line,
label = key, marker='8')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Average Home Sale Value by Year (2013-2023)", size=20, fontweight='bold')
ax.set_xlabel('Year', fontsize=16, labelpad = 10)
ax.set_ylabel('Sale Price ($)', fontsize=16)
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, p: format(int(x), ',')))
plt.show()
Duration_Dict = {'L':'Leasehold', 'F':'Freehold'}
df_UK['Duration'] = df_UK['Duration'].replace(Duration_Dict)
eng_pie_df = df_UK.copy()
eng_pie_df = eng_pie_df[eng_pie_df['Property Type'] != 'Other']
eng_pie_df = eng_pie_df[eng_pie_df['Country'] != 'Wales']
eng_pie_df = eng_pie_df.groupby(['Duration'])['Duration'].count().reset_index(name='Count')
wal_pie_df = df_UK.copy()
wal_pie_df = wal_pie_df[wal_pie_df['Property Type'] != 'Other']
wal_pie_df = wal_pie_df[wal_pie_df['Country'] != 'England']
wal_pie_df = wal_pie_df.groupby(['Duration'])['Duration'].count().reset_index(name='Count')
fig = plt.figure(figsize=(20, 20))
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)
total_count_eng = eng_pie_df['Count'].sum()
total_count_wal = wal_pie_df['Count'].sum()
def make_autopct(values):
def my_autopct(pct):
return '{p:.1f}%'.format(p=pct)
return my_autopct
eng_pie_df['Count'].plot(kind='pie', ax=ax1, subplots=True, autopct = make_autopct(eng_pie_df['Count']),
radius=1, labels = eng_pie_df['Duration'],
labeldistance=1.1, wedgeprops = dict(edgecolor='white'),
textprops = {'fontsize':16}, startangle = 90)
## array([<Axes: ylabel='Count'>], dtype=object)
wal_pie_df['Count'].plot(kind='pie', ax=ax2, subplots=True, autopct = make_autopct(wal_pie_df['Count']),
radius=1, labels = wal_pie_df['Duration'],
labeldistance=1.1, wedgeprops = dict(edgecolor='white'),
textprops = {'fontsize':16}, startangle = 90)
## array([<Axes: ylabel='Count'>], dtype=object)
ax1.yaxis.set_visible(False)
ax2.yaxis.set_visible(False)
fig.suptitle('Property Sales: Lease or Own (2013-2023)', fontsize=20, fontweight='bold')
ax1.set_title('Property Sales: England', size=18)
ax2.set_title('Property Sales: Wales', size=18)
plt.show()
def tableauColors(mydata):
colors = []
for each in mydata['Property Type']:
if each == 'Detached House':
colors.append('#006BA4')
elif each == 'Flat':
colors.append('#FF800E')
elif each == 'Semi-Detached House':
colors.append('#ABABAB')
elif each == 'Terraced House':
colors.append('#595959')
else:
colors.append('#A2C8EC')
return colors
Property_Type_Dict = {'T':'Terraced House', 'S':'Semi-Detached House', 'D':'Detached House', 'F':'Flat', 'O':'Other'}
Duration_Dict = {'L':'Leasehold', 'F':'Freehold'}
df_UK['Property Type'] = df_UK['Property Type'].replace(Property_Type_Dict)
df_UK['Duration'] = df_UK['Duration'].replace(Duration_Dict)
stacked_df = df_UK.copy()
stacked_df = stacked_df[stacked_df['Property Type'] != 'Other']
stacked_df = stacked_df.groupby(['Sale Year', 'Property Type'])['Price'].mean().reset_index(name='Average Sale Price')
stacked_df = stacked_df.pivot(index='Sale Year', columns='Property Type', values='Average Sale Price')
fig = plt.figure(figsize=(18, 10))
ax1 = fig.add_subplot(1, 1, 1)
plt.style.use('tableau-colorblind10')
stacked_df.plot(kind='bar', stacked=True, ax=ax1)
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.set_title("Average Home Sale Value by Year by Property Type (2013-2023)", size=20, fontweight='bold')
ax1.set_xlabel('Year', fontsize=16, labelpad = 10)
ax1.set_ylabel('Sale Price ($)', fontsize=16)
ax1.yaxis.set_major_formatter(FuncFormatter(lambda x, p: format(int(x), ',')))
handles, labels = ax1.get_legend_handles_labels()
ax1.legend(reversed(handles), reversed(labels), loc='best')
plt.show()
pie_df = df_UK.copy()
pie_df = pie_df[pie_df['Property Type'] != 'Other']
pie_df = pie_df.groupby(['Property Type','Duration'])['Price'].sum().reset_index(name='Total Sales')
pie_df_colors = pie_df.groupby(['Property Type'])['Total Sales'].sum().reset_index()
mycolors = tableauColors(pie_df_colors)
def lighten_color(color, amount=0.75):
import matplotlib.colors as mc
import colorsys
try:
c = mc.cnames[color]
except:
c = color
c = colorsys.rgb_to_hls(*mc.to_rgb(c))
return colorsys.hls_to_rgb(c[0], 1 - amount * (1 - c[1]), c[2])
def more_lighten_color(color, amount=0.45):
import matplotlib.colors as mc
import colorsys
try:
c = mc.cnames[color]
except:
c = color
c = colorsys.rgb_to_hls(*mc.to_rgb(c))
return colorsys.hls_to_rgb(c[0], 1 - amount * (1 - c[1]), c[2])
def color_manipulation(color):
light_color = list(lighten_color(color))
r, g, b = light_color
light_color_fmt = '#%02x%02x%02x' % (int(round(r*255, 0)), int(round(g*255, 0)), int(round(b*255, 0)))
lighter_color = list(more_lighten_color(color))
r, g, b = lighter_color
lighter_color_fmt = '#%02x%02x%02x' % (int(round(r*255, 0)), int(round(g*255, 0)), int(round(b*255, 0)))
return lighter_color_fmt, light_color_fmt
inner_colors = [color_manipulation(color) for color in mycolors]
inner_colors = [item for t in inner_colors for item in t]
fig = plt.figure(figsize=(12, 12))
ax1 = fig.add_subplot(1, 1, 1)
mycolors = tableauColors(pie_df_colors)
total_sales = pie_df['Total Sales'].sum()
pie_df.groupby(['Property Type'])['Total Sales'].sum().plot(
kind='pie', radius=1, colors=mycolors, pctdistance=0.85,
labeldistance=1.1, wedgeprops = dict(edgecolor='white'),
textprops = {'fontsize':14},
autopct = lambda p: '{:.1f}%\n(${:.1f}B)'.format(p,(p/100)*total_sales/1e+9),
startangle = 90)
pie_df.groupby(['Property Type', 'Duration'])['Total Sales'].sum().plot(
kind='pie', radius=0.7, colors=inner_colors, pctdistance=0.65,
labeldistance=0.7, wedgeprops = dict(edgecolor='white'),
textprops = {'fontsize':12}, labels = pie_df['Duration'],
startangle = 90)
hole = plt.Circle((0,0), 0.4, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)
ax1.yaxis.set_visible(False)
plt.title('Total Sales by Property Type and Sale Type (2013-2023)', fontsize=18, fontweight='bold')
ax1.axis('equal')
## (-1.0999999302273666, 1.0999988385311117, -1.0999967015718972, 1.099999842931995)
plt.tight_layout()
plt.show()
# Heatmap setup
import seaborn as sns
month_order_dict = {"Jan":1, "Feb":2, "Mar":3, "Apr":4, "May":5, "Jun":6, "Jul":7, "Aug":8, "Sep":9, "Oct":10, "Nov":11, "Dec":12}
df_heatmap_eng = df_UK.copy()
df_heatmap_eng = df_heatmap_eng[df_heatmap_eng['Property Type'] != 'Other']
df_heatmap_eng = df_heatmap_eng[df_heatmap_eng['Country'] != 'Wales']
df_heatmap_eng['Sale Month'] = df_heatmap_eng['Sale Month'].replace(month_order_dict)
df_heatmap_eng = df_heatmap_eng.groupby(['Sale Year', 'Sale Month'])['Price'].mean().reset_index(name='Average Sale Price')
df_heatmap_wal = df_UK.copy()
df_heatmap_wal = df_heatmap_wal[df_heatmap_wal['Property Type'] != 'Other']
df_heatmap_wal = df_heatmap_wal[df_heatmap_wal['Country'] != 'England']
df_heatmap_wal['Sale Month'] = df_heatmap_wal['Sale Month'].replace(month_order_dict)
df_heatmap_wal = df_heatmap_wal.groupby(['Sale Year', 'Sale Month'])['Price'].mean().reset_index(name='Average Sale Price')
hm_df_eng = pd.pivot_table(df_heatmap_eng, index='Sale Year', columns='Sale Month', values='Average Sale Price')
hm_df_wal = pd.pivot_table(df_heatmap_wal, index='Sale Year', columns='Sale Month', values='Average Sale Price')
# English Heatmap
fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))
ax = sns.heatmap(hm_df_eng, linewidth = 0.2, annot = True, cmap = 'coolwarm',
fmt = ',.0f', square = True, annot_kws={'size': 11},
cbar_kws = {'format': comma_fmt, 'orientation':'vertical'})
plt.title('Heatmap of Average Sale Price by Year and Month in England', fontsize=18, fontweight='bold')
plt.xlabel('Sale Month', fontsize=18, labelpad=10)
plt.ylabel('Sale Year', fontsize=18, labelpad=10)
plt.yticks(rotation=0, size=14)
## (array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]), [Text(0, 0.5, '2013'), Text(0, 1.5, '2014'), Text(0, 2.5, '2015'), Text(0, 3.5, '2016'), Text(0, 4.5, '2017'), Text(0, 5.5, '2018'), Text(0, 6.5, '2019'), Text(0, 7.5, '2020'), Text(0, 8.5, '2021'), Text(0, 9.5, '2022'), Text(0, 10.5, '2023')])
plt.xticks(size=14)
## (array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5,
## 11.5]), [Text(0.5, 0, '1'), Text(1.5, 0, '2'), Text(2.5, 0, '3'), Text(3.5, 0, '4'), Text(4.5, 0, '5'), Text(5.5, 0, '6'), Text(6.5, 0, '7'), Text(7.5, 0, '8'), Text(8.5, 0, '9'), Text(9.5, 0, '10'), Text(10.5, 0, '11'), Text(11.5, 0, '12')])
ax.invert_yaxis()
cbar = ax.collections[0].colorbar
cbar.set_label('Sale Price', rotation = 270, fontsize=14, color='black', labelpad=20)
plt.show()
# Wales Heatmap
fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))
ax = sns.heatmap(hm_df_wal, linewidth = 0.2, annot = True, cmap = 'coolwarm',
fmt = ',.0f', square = True, annot_kws={'size': 11},
cbar_kws = {'format': comma_fmt, 'orientation':'vertical'})
plt.title('Heatmap of Average Sale Price by Year and Month in Wales', fontsize=18, fontweight='bold')
plt.xlabel('Sale Month', fontsize=18, labelpad=10)
plt.ylabel('Sale Year', fontsize=18, labelpad=10)
plt.yticks(rotation=0, size=14)
## (array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]), [Text(0, 0.5, '2013'), Text(0, 1.5, '2014'), Text(0, 2.5, '2015'), Text(0, 3.5, '2016'), Text(0, 4.5, '2017'), Text(0, 5.5, '2018'), Text(0, 6.5, '2019'), Text(0, 7.5, '2020'), Text(0, 8.5, '2021'), Text(0, 9.5, '2022'), Text(0, 10.5, '2023')])
plt.xticks(size=14)
## (array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5,
## 11.5]), [Text(0.5, 0, '1'), Text(1.5, 0, '2'), Text(2.5, 0, '3'), Text(3.5, 0, '4'), Text(4.5, 0, '5'), Text(5.5, 0, '6'), Text(6.5, 0, '7'), Text(7.5, 0, '8'), Text(8.5, 0, '9'), Text(9.5, 0, '10'), Text(10.5, 0, '11'), Text(11.5, 0, '12')])
ax.invert_yaxis()
cbar = ax.collections[0].colorbar
cbar.set_label('Sale Price', rotation = 270, fontsize=14, color='black', labelpad=20)
plt.show()