Introduction

The housing market always seems to be in the news - whether homes are selling fast in a hot market, or inventory is low and interest rates continue to soar (all of those factors apply to today's market!). If you're in the market for a new home, it's important to understand both the broader economy and the housing market itself. That understanding helps you make informed decisions about where to buy, how much you can afford, what type of home fits that budget, and how those home types are typically valued.

Home value trends can be helpful too. Buying a home is a major financial investment in a large asset and should be treated as such! Value trends can suggest whether you're buying in an area with room for growth, potentially increasing the value of your home over time. If an area has been stagnant for 10+ years, it may be worth browsing homes in a nearby area that's just starting to see values jump, so you can get in 'low'.

Dataset

Using Zillow's database of typical Single Family Residence (SFR) values*, the following charts analyze market trends across the U.S. and over the decades. I'll also take a deeper dive into regions within Maryland and examine how they stand against the broader U.S. housing market.

*Measured in dollars as the typical value of all single-family homes in each region

View Zillow Housing Data Page

Findings

Home values are booming right now. The largest cities have seen so much growth that they almost look like outliers compared to the rest of the country (looking at you, San Francisco). Maryland's market is, on average, priced higher than the country's, but it's still quite affordable compared to the cost of SFRs in the largest cities.

As you dive into the charts I pulled together, you'll gain a better understanding of how the market has made an astonishing comeback since the 2007-2008 financial crisis - and where the biggest gains have landed across U.S. cities.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import folium
import json
import geojson
from matplotlib.colors import ListedColormap
import matplotlib.patches as mpatches
from matplotlib.ticker import FuncFormatter
import plotly.graph_objects as go

path = '/Users/hayleyfrazier/Desktop/LOYOLA/DataVis/Files/'
df = pd.read_csv(path + 'Zillow_Download_0314.csv', skiprows=0)
df.columns = df.columns[:5].tolist() + pd.to_datetime(df.columns[5:]).strftime('%b %Y').tolist() 
#formatting to month year
#Filtering to only January months each year, & keeping just RegionName for now
jan_cols = [col for col in df.columns if col.startswith('Jan')]
cols_to_select = ['RegionName'] + jan_cols
df_new = df.loc[0:20, cols_to_select]
df_new.rename(columns={'RegionName': 'Region'}, inplace=True)
# making sure this is formatted as a df
df_new=pd.DataFrame(df_new)

#Dropping row 0 (the United States row) so only the 20 cities remain
df_new.drop(0, inplace=True)
df_new = df_new.reset_index(drop=True)
#coercing values to integers
df_new.iloc[:,1:] = df_new.iloc[:,1:].astype('int')

#converting values to thousands of dollars (dropping the last 3 digits) for readability
df_new.iloc[:,1:] = df_new.iloc[:,1:] // 1000
#Second DataFrame: Maryland county-level SFR values
df_county = pd.read_csv(path + 'County_SFR_Values.csv')
df_county.columns = df_county.columns[:9].tolist() + pd.to_datetime(df_county.columns[9:]).strftime('%b %Y').tolist()

#this pattern is also reused later to pull the Maryland regions (plus the U.S. row) out of df
pattern = 'MD|United States'
mask = df_county['State'].str.contains(pattern)
df_MDcounty = df_county[mask]
df_MDcounty = pd.DataFrame(df_MDcounty)
df_MDcounty = df_MDcounty.reset_index(drop=True)
#matching county names to the spelling used in the map's GeoJSON boundaries
df_MDcounty['RegionName'] = df_MDcounty['RegionName'].str.replace(' County', '')
df_MDcounty['RegionName'] = df_MDcounty['RegionName'].replace({
    'Queen Annes': "Queen Anne's",
    'Prince Georges': "Prince George's",
    'Saint Marys': "St. Mary's"})

City Values Then & Now

The following two bar charts compare SFR values in January 2023 (top) and January 2015 (bottom) across 20 of the most populated cities in the U.S. The dashed horizontal line in each chart marks the mean of all 20 cities for that period, and each bar is color-coded by whether its value sits above the mean, below it, or within a 50% band around it (no more than 25% above or below).

Note: the 2023 y-axis is labeled in millions while the 2015 y-axis is in the hundreds of thousands. The one city pushing the 2023 axis into the millions is San Francisco at $1.165M.

Comparing these two charts side-by-side gives some insight into whether a city trended upward or held steady over the 8-year span, and the mean tells us whether that move is in proportion with the general rise in costs that occurs over time or whether something larger is happening in that area. For example, St. Louis saw an $89K increase over those 8 years, widening the gap between itself and the rising mean. Tampa, on the other hand, saw its typical home value increase by $215K, closing the gap between its cost and the 2023 mean.
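To make that comparison concrete, below is a minimal sketch (assuming the df_new table from the setup chunk above, with values already scaled to thousands) that computes each city's dollar change between January 2015 and January 2023 alongside the change in the 20-city mean:

#Sketch: 2015-to-2023 change per city (in $K) vs. the change in the 20-city mean
#Assumes df_new from the setup chunk above, with values already in thousands
change = df_new[['Region', 'Jan 2015', 'Jan 2023']].copy()
change['Change'] = change['Jan 2023'] - change['Jan 2015']
mean_change = df_new['Jan 2023'].mean() - df_new['Jan 2015'].mean()
print(change.sort_values('Change', ascending=False).to_string(index=False))
print('Change in the 20-city mean: $%.0fK' % mean_change)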

This chart also illustrates how drastic the distribution of wealth is in the U.S., but that's a chart for another report.


#Barchart Setup


df_sorted = df_new.sort_values('Jan 2023', ascending=False)
df_sorted2015=df_new.sort_values('Jan 2015', ascending=False)



#assign each bar a color based on how the city's value compares to the 20-city mean:
#more than 25% above the mean (lightcoral), more than 25% below (lightgreen),
#or within that 50% band (lightblue)
def pick_colors_according_to_mean_value(df_sorted, column):
    colors = []
    average = df_sorted[column].mean()
    for each in df_sorted[column]:
        if each > average * 1.25:
            colors.append('lightcoral')
        elif each < average * 0.75:
            colors.append('lightgreen')
        else:
            colors.append('lightblue')
    return colors

my_colors1 = pick_colors_according_to_mean_value(df_sorted, 'Jan 2023')
my_colors2 = pick_colors_according_to_mean_value(df_sorted2015, 'Jan 2015')

average_2023 = int(df_new['Jan 2023'].mean())
average_2015 = int(df_new['Jan 2015'].mean())

Above = mpatches.Patch(color='lightcoral', label='Above Average')
At = mpatches.Patch(color='lightblue', label='Within 50% of the Average')
Below = mpatches.Patch(color='lightgreen', label='Below Average')

f2 = plt.figure(2, figsize = (40,35))
f2.suptitle('Typical Single Family Residence Values Analysis', fontsize=9, fontweight='bold')
ax1=f2.add_subplot(2,1,1)
b_bars=ax1.bar(df_sorted['Region'],df_sorted['Jan 2023'], label='SFR Value', color=my_colors1)
ax1.legend(handles=[Above, At, Below],fontsize=9)
plt.axhline(average_2023,color='black',linestyle='dashed')
ax1.set_title('SFR Typical Values in Jan. 2023, Top 20 U.S. Cities by Population')
ax1.text(18.75,average_2023+5, 'Mean = $' + str(average_2023)+ 'k',rotation=0,fontsize=9)
ax1.set_ylabel("Typical Home Value (in millions)")
ax1.set_xlabel("Region", fontweight='bold')
plt.setp(ax1.get_xticklabels(), rotation=30, horizontalalignment='right')
ax1.bar_label(ax1.containers[-1], fmt='$%.0fK' , label_type='center', fontsize=9)

ax1.yaxis.set_major_formatter(FuncFormatter(lambda m, pos:('$%1.3fM')%(m*1e-3)))


ax2=f2.add_subplot(2,1,2)
b_bars=ax2.bar(df_sorted2015['Region'],df_sorted2015['Jan 2015'], label='SFR Value', color=my_colors2)
ax2.legend(handles=[Above, At, Below],fontsize=14)
plt.axhline(average_2015,color='black',linestyle='dashed')
ax2.set_title('SFR Typical Values in Jan. 2015, Top 20 U.S. Cities by Population')
ax2.text(18.75,average_2015+5, 'Mean = $' + str(average_2015)+ 'k',rotation=0,fontsize=9)
ax2.set_ylabel("Typical Home Value (in thousands)")
ax2.set_xlabel("Region", fontweight='bold')
plt.setp(ax2.get_xticklabels(), rotation=30, horizontalalignment='right')
ax2.bar_label(ax2.containers[-1], fmt='$%.0fK' , label_type='center', fontsize=9)

ax2.yaxis.set_major_formatter(FuncFormatter(lambda k, pos:('$%1.0fK')%(k)))
f2.subplots_adjust(hspace=1.55) 

plt.savefig(path + 'Barcharts.png')   
#The chart shown below was exported from the Jupyter notebook version of this code, which explains the different png filename
knitr::include_graphics("/Users/hayleyfrazier/Desktop/LOYOLA/DataVis/Files/Barcharts2.png")

City Value YoY

This line chart uses time-series data to illustrate typical SFR values across the 10 most populated U.S. cities. From this view we can interpret a few trends.

The first is the rise and fall of the housing market amid the 2007-2008 financial crisis, followed by the slow rebuild that begins around 2012. Some of these cities don't see home values return to their 2007 highs until 2019-2021 (Chicago being the latest). Dallas and Houston had the shortest 'recoveries', although they also appear to be the least impacted of the 10 cities.

Another remarkable trend: aside from that crisis-era dip, SFR values rise almost constantly across the board. Some cities, such as Los Angeles, New York, and Miami, have drastically surpassed their 2007 peak values.

Lastly, we can see which types of cities share similar home value trends. Cities on or near the coast in CA, NY, DC, and FL are valued the highest; all of them also have major job markets, which could play into their recovery speed and overall higher home values. Another example is how the two Texas cities almost mirror each other's values up until 2016, when Dallas homes begin increasing in value at a faster rate.
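That 'recovery' timing can be checked directly. Here's a minimal sketch (again assuming df_new from the setup chunk, values in thousands) that finds the first post-crisis January in which each of the 10 largest cities matched its January 2007 value:

#Sketch: first January after 2008 in which each city's typical SFR value returned to
#its Jan 2007 level. Assumes df_new from the setup chunk above (values in thousands).
post_crisis = [c for c in df_new.columns if c.startswith('Jan') and int(c.split()[1]) > 2008]
for _, row in df_new.head(10).iterrows():
    peak_2007 = row['Jan 2007']
    recovered = [c for c in post_crisis if row[c] >= peak_2007]
    year = recovered[0].split()[1] if recovered else 'not yet'
    print('%s: back to its Jan 2007 value ($%dK) by %s' % (row['Region'], peak_2007, year))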


#Line Chart Data Setup

#keeping only the 10 most populated cities and reshaping to long format for plotting
df_new10 = df_new.drop(range(10,20))
df_melt = df_new10.melt(id_vars=["Region"], var_name="Year", value_name="Value")
year_list = ['2000','2001','2002','2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017','2018','2019','2020','2021','2022','2023']

#Line plot

fig = plt.figure(figsize = (22, 15))
ax=fig.add_subplot(1,1,1)
#'key' is the Region name for each group and 'grp' is the subset of rows for that Region;
#the Region name is used to look up the line color in the my_colors_line dictionary below

my_colors_line = {'New York, NY':'darkgreen',
                  'Los Angeles, CA':'purple',
                  'Chicago, IL':'darkorange',
                  'Dallas, TX':'blue',
                  'Houston, TX':'deeppink',
                  'Washington, DC':'gold',
                  'Miami, FL':'turquoise',
                  'Philadelphia, PA':'teal',
                  'Atlanta, GA':'black',
                  'Phoenix, AZ':'firebrick'}
                  
                 
for key, grp in df_melt.groupby('Region'):
    grp.plot(ax=ax, kind='line', x='Year', y='Value', color=my_colors_line[key], label=key, marker='8', linewidth=3)

plt.title('Typical Cost of SFR Over Time, by 10 Major Cities', fontsize=30)
ax.set_xlabel('Year', fontsize=21)
ax.set_ylabel('Value (in thousands)', fontsize=21, labelpad=20)
ax.tick_params(axis='y',labelsize=20, rotation=0)
ax.tick_params(axis='x',labelsize=16, rotation=20)
ax.set_xticks(np.arange(len(year_list)))
ax.set_xticklabels(year_list)

handles,labels = ax.get_legend_handles_labels()
handles = [handles[6], handles[4], handles[1],handles[2],handles[3],handles[9],
          handles[5],handles[7],handles[0],handles[8]]
labels = [labels[6],labels[4],labels[1],labels[2],labels[3],labels[9],
         labels[5],labels[7],labels[0],labels[8]]
plt.legend(handles, labels, loc='best', fontsize=18, ncol=1)

ax.yaxis.set_major_formatter( FuncFormatter(lambda num, pos:('$%1.0fK')%(num)))
plt.grid(color='gainsboro')

plt.savefig(path + 'Linechart.png')   
knitr::include_graphics("/Users/hayleyfrazier/Desktop/LOYOLA/DataVis/Files/Linechart.png")

Maryland Counties

Moving to a more localized lens, this map of Maryland counties is shaded by the typical SFR value in January 2023. Maryland's major metro areas (as tracked in Zillow's housing data) are denoted by the yellow circles. Hover over each county to reveal its name, and over each circle to reveal the metro area and its typical SFR dollar value.

One observation to note: the deep purple counties are those that surround Washington, D.C. (with the easiest access to the major highways leading there). They are followed by Anne Arundel County, home to the state capital of Annapolis and located right on the bay - a generally high-income area.
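As a quick check on that observation, here's a small sketch (assuming the df_MDcounty table from the setup chunk above, with raw dollar values) that ranks the counties by their January 2023 value:

#Sketch: the five Maryland counties with the highest typical SFR values in Jan 2023
#Assumes df_MDcounty from the setup chunk above (values in dollars)
top_counties = df_MDcounty[['RegionName', 'Jan 2023']].sort_values('Jan 2023', ascending=False)
print(top_counties.head(5).to_string(index=False))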

Note: Baltimore City is blocked out while other cities are not. This is common practice for Maryland county maps to separate Baltimore City from Baltimore County.

#MAP SET UP

#Maryland metro areas from the city-level data (dropping the U.S. row),
#with hard-coded coordinates and county names for the map markers (same row order as df_MD)
mask = df['RegionName'].str.contains(pattern)
df_MD = df[mask]
df_MD = pd.DataFrame(df_MD)
df_MD.drop(0, inplace=True)
df_MD = df_MD.reset_index(drop=True)
df_MD['Lat'] = [39.2904, 38.3607, 39.6418, 38.3004, 39.6529, 38.7743, 38.5632]
df_MD['Long'] = [-76.6122, -75.5994, -77.7200, -76.5075, -78.7625, -76.0763, -76.0788]
df_MD['County'] = ['Baltimore City', 'Wicomico', 'Washington', 'St. Marys', 'Allegany', 'Talbot', 'Dorchester']

#re-loading and cleaning the county-level data for the choropleth (same steps as the setup chunk)
df_county = pd.read_csv(path + 'County_SFR_Values.csv')
df_county.columns = df_county.columns[:9].tolist() + pd.to_datetime(df_county.columns[9:]).strftime('%b %Y').tolist()
mask = df_county['State'].str.contains(pattern)
df_MDcounty = df_county[mask]
df_MDcounty = pd.DataFrame(df_MDcounty)
df_MDcounty = df_MDcounty.reset_index(drop=True)
#matching county names to the spelling used in the map's GeoJSON boundaries
df_MDcounty['RegionName'] = df_MDcounty['RegionName'].str.replace(' County', '')
df_MDcounty['RegionName'] = df_MDcounty['RegionName'].replace({
    'Queen Annes': "Queen Anne's",
    'Prince Georges': "Prince George's",
    'Saint Marys': "St. Mary's"})


#MAPS

#GeoJSON string with the Maryland county boundaries
Map_geo = path + 'MD_lines.txt'
with open(Map_geo, 'r') as file:
    data_str = file.read()

center_of_map=[38.7849, -76.8721]
md_map=folium.Map(location=center_of_map, 
                  zoom_start=7, tiles = 'cartodbpositron',
                  width = '90%', height = '90%',
                  left='5%', right='5%',top='5%', bottom='5%')



ch_map=folium.Choropleth(geo_data = data_str,
                           name = 'choropleth', 
                           data = df_MDcounty,
                           columns = ['RegionName','Jan 2023'],
                           key_on='properties.name',
                           fill_color = 'BuPu', fill_opacity= 1,
                           line_opacity = 1,
                           legend_name = 'Typical SFR Values, Jan 2023 ($)',
                           highlight=True).add_to(md_map)

folium.LayerControl().add_to(md_map)
for idx, row in df_MD.iterrows():
    folium.CircleMarker(location=[row['Lat'], row['Long']],
                        radius=5,
                        fill=True,
                        fill_color='yellow',
                        fill_opacity=1.0,
                        color='black',
                        weight=1,
                        tooltip=f"<b>City:</b> {row['RegionName'].split(',')[0]}<br> <b>Value:</b> ${row['Jan 2023']/1000:.0f}K"
                       ).add_to(md_map)
ch_map.geojson.add_child(folium.features.GeoJsonTooltip
                         (fields=['name'], aliases=['County:'],
                          labels=True, style=('background-color: white; color: black;')))
md_map.save(path + 'MD_County_SFR_Map.html')

MD vs. U.S.

Continuing our analysis of Maryland SFR home values, the following waterfall chart depicts how each Maryland metro region compares against the U.S. typical SFR value in January 2023 and by how much it differs. It caps off with Maryland's overall typical SFR value ($303k), computed here as the average across those metro regions.

While this type of chart is best suited to a running budget or financial analysis, it makes it easy to quickly assess how a state and its regions compare against a baseline number (here, the U.S. typical value). It's important to note that in a waterfall chart each bar starts where the previous one ends, so the deviations stack relative to one another rather than each being measured from the U.S. value (the horizontal baseline). The colors are the best indicator of how each region falls against the country's average.
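To illustrate that cumulative behavior, here's a toy sketch with made-up numbers (not taken from the Zillow data) showing how each bar in a waterfall starts where the previous one ends:

#Toy sketch of waterfall mechanics: the running level is the cumulative sum of the
#deviations, so a bar's starting height depends on the bars before it.
#These values are made up for illustration only.
deviations = {'Region A': 80, 'Region B': -120, 'Region C': 30}   #$K above/below the baseline
level = 0
for region, dev in deviations.items():
    level += dev
    print('%s: bar of %+dK, running level now %+dK vs. the baseline' % (region, dev, level))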

#Maryland metro regions plus the U.S. row, trimmed down to the Jan 2023 values
mask = df['RegionName'].str.contains(pattern)
df_MD_US = df[mask]
df_MD_US = pd.DataFrame(df_MD_US)
df_MD_US.drop(['RegionID', 'SizeRank', 'RegionType', 'StateName'], axis=1, inplace=True)
df_MD_WF = df_MD_US.get(['RegionName', 'Jan 2023'])
df_MD_WF = pd.DataFrame(df_MD_WF)
#U.S. typical SFR value for Jan 2023 (taken from the United States row of the Zillow data)
df_MD_WF['USAvg'] = [327494.641724 for i in range(8)]
df_MD_WF.drop(0, inplace=True)
df_MD_WF = df_MD_WF.reset_index(drop=True)
df_MD_WF.rename(columns={'Jan 2023': 'Jan2023'}, inplace=True)
#each region's deviation from the U.S. typical value
df_MD_WF['Deviation'] = df_MD_WF.Jan2023 - df_MD_WF.USAvg
#appending a final row for the average across the Maryland metro regions
df_MD_WF.loc[df_MD_WF.index.max()+1] = ['State Average',
                                        df_MD_WF.Jan2023.mean(),
                                        df_MD_WF.USAvg.mean(),
                                        df_MD_WF.Jan2023.mean()-df_MD_WF.USAvg.mean()]

#color for the final 'total' bar, based on whether the state average lands above or below the U.S. value
if df_MD_WF.loc[7, 'Deviation'] > 0:
    end_color = 'black'
elif df_MD_WF.loc[7, 'Deviation'] < 0:
    end_color = 'red'
else:
    end_color = 'blue'

fig = go.Figure(go.Waterfall(name='', orientation='v',
                             x=df_MD_WF['RegionName'],
                             textposition='outside',
                             measure=['relative']*7 + ['total'],
                             y=df_MD_WF['Deviation']/1e3,
                             text=['${:.0f}K'.format(each/1e3) for each in df_MD_WF['Jan2023']],
                             decreasing={'marker': {'color': 'red'}},
                             increasing={'marker': {'color': 'green'}},
                             totals={'marker': {'color': end_color}},
                             hovertemplate='Area Deviation: ' + '$%{y:,.2f}K' + '<br>' +
                                 'Value: %{text}'))
fig.layout = go.Layout(yaxis=dict(tickformat=''));
fig.update_xaxes(title_text='Maryland Metropolis Regions', title_font={'size':18});
fig.update_yaxes(title_text='Typical SFR Value Deviation($K)', title_font={'size':18}, 
                 tickprefix = '$', ticksuffix='K', zeroline=True);
fig.update_layout(title= dict( text='Deviation between U.S. Average Typical SFR Value and MD Metro Area Typical Values, Jan 2023<br>' +
                              'Above U.S. Average Appears in Green, Below U.S. Average Appears in Red',
                              font= dict(family='Arial', size=18, color='black')),
                  template='simple_white',
                  title_x=0.5,
                  autosize=True,
                  margin=dict(l=30, r=30, t=100, b=50));

#fig.show()
import plotly.io as pio
pio.write_html(fig, path+"plotly_result.html", auto_open=False)

MD Areas YoY

A scatter plot gives a better view of Maryland area home value trends over time. Rather than counties, we're focusing on the Maryland metro regions (the yellow dots shown on the map a few tabs back) to keep the chart as clean as possible.

At a quick glance, this chart lets us see trends over the years and compare them to the U.S. typical SFR value (located in the first column). Not all of the metro areas see a significant jump in values between 2020 and 2022, but the majority do - most notably Easton and Salisbury, both on the Eastern Shore. Perhaps this indicates a larger move toward beach properties during the pandemic.
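Here's a minimal sketch of that 2020-to-2022 jump (assuming the df_MD_US table built in the waterfall chunk above, with raw dollar values), computing the percent change for the U.S. and each Maryland metro area:

#Sketch: percent change in typical SFR value from Jan 2020 to Jan 2022
#Assumes df_MD_US from the waterfall chunk above (raw dollar values, U.S. row included)
jump = (df_MD_US['Jan 2022'] - df_MD_US['Jan 2020']) / df_MD_US['Jan 2020'] * 100
for region, pct in zip(df_MD_US['RegionName'], jump):
    print('%s: %+.1f%% from Jan 2020 to Jan 2022' % (region, pct))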

This scatter plot also reveals how a typical SFR in Maryland is valued compared to the rest of the country. More than half of the metro areas are valued higher than the typical U.S. home, which isn't surprising given a common criticism of living in Maryland: the cost of living here is high. Now we can put a number to at least one factor behind that statement.
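And a quick count behind that 'more than half' statement (again assuming df_MD_US from the waterfall chunk above):

#Sketch: how many Maryland metro areas exceed the U.S. typical SFR value in Jan 2023
is_us = df_MD_US['RegionName'].str.contains('United States')
us_value = df_MD_US.loc[is_us, 'Jan 2023'].iloc[0]
md_regions = df_MD_US[~is_us]
above = int((md_regions['Jan 2023'] > us_value).sum())
print('%d of %d Maryland metro areas sit above the U.S. typical value ($%.0f)' % (above, len(md_regions), us_value))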


#Scatter of January values for 2018-2023; values are divided by 1,000 so they match
#the "$...K" axis labels below
f5, ax = plt.subplots(figsize=(12,14))
for year in ['Jan 2023', 'Jan 2022', 'Jan 2021', 'Jan 2020', 'Jan 2019', 'Jan 2018']:
    ax.scatter(df_MD_US['RegionName'], df_MD_US[year]/1e3, label=year)
plt.gca().yaxis.set_major_formatter('${x:,.0f}K');
plt.title('Maryland Metropolis Areas Typical SFR Over Time');
plt.grid(True);
plt.ylabel('Typical SFR Values (in thousands)', fontsize=9, labelpad=15);
plt.xlabel('Region', fontsize=9);
plt.xticks(fontsize=6, rotation=25);
plt.yticks(fontsize=7);
plt.legend(loc='best', fontsize=7);
knitr::include_graphics("/Users/hayleyfrazier/Desktop/LOYOLA/DataVis/Files/MDmetroAreasScatter.png")

Conclusion

Using data visualization for home values can lead to insights on a country's economy, political and social climate, and even the character of an area, whether urban or rural. This sort of information can serve as a predictive measure for finding an up-and-coming city - or the next ghost town. Patterns can indicate whether people are moving away from states prone to disasters as climate change continues to create consequences, or whether Los Angeles is only going to continue to see growth (in population, wealth, or both).

Zillow provides numerous detailed reports that can help anyone in their search for a home. Whether you're looking for a new city or county within your price range, or want to make sure you're getting into the housing market at the right time and in the right place, having visuals such as these will make for a more educated purchase.