For my Python visualizations, I chose Airbnb data for Chicago. The dataset lets me explore how prices change over time, which types of listings people usually book, and which areas are more popular than others. I created five graphs to showcase my findings. Many of these findings can be used to understand what consumers want and to help Airbnb cater to those needs.
import pandas as pd
listings = pd.read_csv("/Users/rubynguyen/Downloads/Chicago/listings.csv")
reviews = pd.read_csv("/Users/rubynguyen/Downloads/Chicago/reviews.csv")
neighbourhoods = pd.read_csv("/Users/rubynguyen/Downloads/Chicago/neighbourhoods.csv")
df = listings.merge(reviews,left_on="id",right_on="listing_id",how="right")
df["date"] = pd.to_datetime(df["date"])
df["month"] = df["date"].dt.month_name()
df["day_name"] = df["date"].dt.day_name()
df["day"] = df["date"].dt.day
df["year"] = df["date"].dt.year
The average Airbnb listing price is about $179. The median price is $138; because the mean is well above the median, the price distribution is right-skewed. The most common room type is an entire home or apartment. The standard deviation is 151.25, meaning prices typically deviate from the mean by about $151, which indicates high variability in listing prices.
df['price'].mean()
## np.float64(178.8097012092105)
df['price'].median()
## 138.0
df['room_type'].mode()
## 0 Entire home/apt
## Name: room_type, dtype: object
df['price'].std()
## 151.251615686642
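Because `df` joins every listing to each of its reviews, a listing with many reviews contributes many rows, so the statistics above are weighted by review count. The following is a minimal sketch, on invented toy data that only borrows the `listing_id` and `price` column names, of how per-listing figures could be computed by dropping duplicate listings first.

```python
import pandas as pd

# Toy data (invented): listing 1 has three reviews, listings 2 and 3 have one each.
toy = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 3],
    "price": [100.0, 100.0, 100.0, 200.0, 600.0],
})

# Statistics over review rows: listing 1 is counted three times.
row_mean = toy["price"].mean()

# Statistics over unique listings: each listing is counted once.
per_listing = toy.drop_duplicates("listing_id")["price"]
listing_mean = per_listing.mean()
listing_median = per_listing.median()

# A mean above the median is the usual sign of a right-skewed distribution.
right_skewed = listing_mean > listing_median

print(row_mean, listing_mean, listing_median, right_skewed)
```

On the real data, the same idea would be `df.drop_duplicates("listing_id")["price"]`, assuming each listing carries a single price.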
For my first visualization, I used a bar chart to display the average price in the ten most expensive neighborhoods in Chicago. I also added a dashed line representing the mean of these ten averages to highlight which neighborhoods fall above or below it. This makes it easier for the audience to compare prices and identify areas with relatively higher or lower costs.
df_bar = df.groupby("neighbourhood")["price"].mean().sort_values(ascending=False).head(10)
df_bar = df_bar.reset_index()
mean_price = df_bar["price"].mean()
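# Color each bar by how its price compares with the mean of the ten bars.
# The "within 1%" check comes first; otherwise a bar slightly above the
# mean would be caught by the "above" branch and never be colored black.
colors = []
for val in df_bar["price"]:
    if abs(val - mean_price) / mean_price < 0.01:
        colors.append("black")
    elif val > mean_price:
        colors.append("lightcoral")
    else:
        colors.append("green")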
import matplotlib.patches as mpatches
Above = mpatches.Patch(color='lightcoral', label='Above Average')
At = mpatches.Patch(color='black', label='Within 1% of Average')
Below = mpatches.Patch(color='green', label='Below Average')
import matplotlib.pyplot as plt
plt.figure(figsize=(10,6))
plt.bar(df_bar["neighbourhood"], df_bar["price"], color=colors)
plt.axhline(mean_price, color='black', linestyle='dashed')
# legend
plt.legend(handles=[Above, At, Below])
plt.xticks(rotation=45)
# Bars, left to right: Loop, Oakland, Near North Side, Near South Side, Lincoln Park, Near West Side, Belmont Cragin, Lake View, North Park, West Town
plt.xlabel("Neighbourhood")
plt.ylabel("Average Price")
plt.title("Top 10 Neighbourhoods by Average Price")
plt.text(8, mean_price + 5, f"Mean = {mean_price:.2f}")
plt.tight_layout()
plt.show()
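The dashed reference line above is computed from `df_bar`, so it is the mean of the ten plotted averages rather than a citywide mean. The sketch below, on invented toy data with two "top" neighbourhoods instead of ten, shows how the two reference values can differ; which one to draw is a design choice, not something the data dictates.

```python
import pandas as pd

# Toy data (invented): three neighbourhoods with different price levels.
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "B", "C", "C", "C"],
    "price": [300.0, 500.0, 200.0, 90.0, 100.0, 110.0],
})

# Average price per neighbourhood, highest first; keep the top two.
top = (
    toy.groupby("neighbourhood")["price"]
    .mean()
    .sort_values(ascending=False)
    .head(2)
)

top_mean = top.mean()               # mean of the plotted bars only
overall_mean = toy["price"].mean()  # mean over every row in the data

print(top_mean, round(overall_mean, 2))
```

Drawing `overall_mean` instead would usually place the line lower, since the top neighbourhoods are by construction the expensive ones.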
For my second graph, I used a line chart to show how average prices changed from 2009 to 2024. Each line represents one of the five neighborhoods that appear most often in the data. The lines rise in later years, indicating that prices have generally increased. This trend may be influenced by the housing market, with Airbnb hosts raising prices to cover property costs. The Near North Side line also climbs rapidly, which may reflect high demand for accommodations in that area when people visit Chicago.
avg_price = (
    df.groupby(['year', 'neighbourhood'])['price']
    .mean()
    .round(0)
    .reset_index()
)
top_neigh = (
    df['neighbourhood']
    .value_counts()
    .head(5)
    .index
)
avg_price = avg_price[avg_price['neighbourhood'].isin(top_neigh)]
from matplotlib.ticker import FuncFormatter
fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)
my_colors = {
'Lake View': 'blue',
'Lincoln Park': 'red',
'Logan Square': 'green',
'Near North Side': 'purple',
'West Town': 'orange'}
for key, grp in avg_price.groupby('neighbourhood'):
    # Draw one line per neighbourhood, falling back to black for unmapped names.
    grp.plot(ax=ax, kind='line', x='year', y='price',
             color=my_colors.get(key, 'black'), label=key, marker='o')
plt.title('Average Price by Top 5 Neighborhood Over Years', fontsize=18)
ax.set_xlabel('Year', fontsize=18)
ax.set_ylabel('Average Price ($)', fontsize=18, labelpad=20)
ax.tick_params(axis='x', labelsize=14)
ax.tick_params(axis='y', labelsize=14)
ax.set_xticks(sorted(avg_price['year'].unique()))
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: f'${int(x):,}'))
handles, labels = ax.get_legend_handles_labels()
plt.legend(handles, labels, loc='best', fontsize=14)
plt.show()
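The loop above draws one line per neighbourhood from a long table. As an alternative sketch (not the approach used above), the same table can be reshaped so that each neighbourhood becomes a column, which also makes it easy to quantify the year-over-year change that the chart shows visually. The toy values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for avg_price (values invented for illustration).
toy_avg = pd.DataFrame({
    "year": [2022, 2022, 2023, 2023],
    "neighbourhood": ["Lake View", "West Town", "Lake View", "West Town"],
    "price": [150.0, 140.0, 170.0, 155.0],
})

# Reshape from long to wide: one row per year, one column per neighbourhood.
wide = toy_avg.pivot(index="year", columns="neighbourhood", values="price")

# Year-over-year change in average price for each neighbourhood.
change = wide.diff().dropna()

print(wide)
print(change)
```

With matplotlib installed, `wide.plot(marker="o")` should draw one line per column without an explicit loop.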
For my third graph, I used a heatmap, which uses color to reveal patterns in the data. It is useful for spotting high and low values, such as which neighborhoods have higher average prices in particular years. Many cells show “0” because no data were recorded for that neighborhood in that year, not because the price was zero. Prices begin to increase more noticeably between 2021 and 2024.
# The ten neighbourhoods with the highest average price.
top_neigh = df.groupby("neighbourhood")["price"] \
    .mean().sort_values(ascending=False).head(10).index
# Average price per neighbourhood (rows) and year (columns).
df_year = df.pivot_table(
    index='neighbourhood',
    columns='year',
    values='price',
    aggfunc='mean'
)
# Keep the top ten; combinations with no data become 0.
df_year = df_year.loc[top_neigh].fillna(0)
import seaborn as sns
from matplotlib.ticker import FuncFormatter
fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1, 1, 1)
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))
ax = sns.heatmap(df_year, linewidth = 0.2, annot = True, cmap = 'coolwarm', fmt=',.0f',
square = True, annot_kws={'size': 7},
cbar_kws = {'format': comma_fmt, 'orientation':'vertical'})
plt.title('Heatmap of Airbnb Prices in each Neighborhood by Year', fontsize=18, pad=15)
plt.xlabel('Airbnb Average Prices per Year', fontsize=18, labelpad=10)
plt.ylabel('Airbnb Top Ten Neighborhoods', fontsize=10, labelpad=10)
plt.xticks(rotation=45)
cbar = ax.collections[0].colorbar
cbar.set_label('Airbnb Average Prices', rotation = 270, fontsize=14, color='black', labelpad=20)
plt.show()
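Filling empty neighbourhood/year combinations with 0 places them at the bottom of the color scale, where they can be mistaken for very cheap years. A possible alternative, sketched here on invented toy data, is to leave them as missing values. The seaborn `heatmap` documentation states that cells with missing values are masked automatically, so passing the unfilled table should leave those cells blank.

```python
import pandas as pd

# Toy data (invented): Oakland has no rows for 2023.
toy = pd.DataFrame({
    "neighbourhood": ["Loop", "Loop", "Oakland"],
    "year": [2023, 2024, 2024],
    "price": [200.0, 240.0, 180.0],
})

# Same pivot as above, but without fillna(0): absent combinations stay NaN.
table = toy.pivot_table(index="neighbourhood", columns="year",
                        values="price", aggfunc="mean")

# Count the cells with no data; these would be drawn blank rather than as 0.
n_missing = int(table.isna().sum().sum())

# Row means skip NaN by default, so empty years do not drag the average down.
row_means = table.mean(axis=1)

print(table)
print(n_missing)
```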
In my fourth graph, I used a waterfall chart to show how Airbnb listings accumulate across neighborhoods. A waterfall chart typically illustrates how a value increases or decreases over time or across categories. Here every bar is a count, so each step is positive and the running total only rises; no decreases appear in the chart. Because there are so many neighborhoods, I focused on the top five and grouped the rest as “Other.” The total comes to 435,791; because `df` joins each listing to its reviews, this figure counts listing-review rows rather than unique listings. As more influencers promote travel and more people seek new experiences, the number of listings in the area may continue to grow.
My code:
import_cols = ['name', 'date', 'neighbourhood', 'latitude', 'longitude', 'room_type']
map_df = df[import_cols].copy()
map_df['latitude'] = pd.to_numeric(map_df['latitude'], errors='coerce')
map_df['longitude'] = pd.to_numeric(map_df['longitude'], errors='coerce')
map_df['date'] = pd.to_datetime(map_df['date'], errors='coerce')
map_df['month'] = map_df['date'].dt.month
map_df['year'] = map_df['date'].dt.year
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
top5_neigh_counts = map_df['neighbourhood'].value_counts().nlargest(5)
waterfall_values = list(top5_neigh_counts.values)
waterfall_labels = list(top5_neigh_counts.index)
other_count = map_df.shape[0] - sum(waterfall_values)
waterfall_values.append(other_count)
waterfall_labels.append('Other')
total_value = sum(waterfall_values)
waterfall_values.append(total_value)
waterfall_labels.append('Total')
measure = ["relative"] * (len(waterfall_values) - 1) + ["total"]
fig = go.Figure(go.Waterfall(
name="Listings",
orientation="v",
measure=measure,
x=waterfall_labels,
y=waterfall_values,
text=[f"+{v:,}" for v in waterfall_values[:-1]] + [f"{waterfall_values[-1]:,}"],
textposition="outside",
connector={"line": {"color": "rgb(63, 63, 63)"}}
))
fig.update_layout(
height=700,
title="Waterfall of Listings by Neighborhood",
xaxis_title="Neighborhoods",
yaxis_title="Total Listings"
)
knitr::include_graphics("waterfall.png")
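The steps that build the chart's inputs (top five counts, an “Other” bucket, then a total) can be sketched independently of plotly. The helper name `waterfall_inputs` and the sample neighbourhood list are hypothetical and used only for illustration.

```python
from collections import Counter

def waterfall_inputs(neighbourhoods, top_n=5):
    """Return (labels, values, measures) for a top-N + Other + Total waterfall."""
    counts = Counter(neighbourhoods)
    top = counts.most_common(top_n)
    labels = [name for name, _ in top]
    values = [count for _, count in top]
    # Everything outside the top N is lumped into a single "Other" step.
    labels.append("Other")
    values.append(len(neighbourhoods) - sum(values))
    # The final bar is the sum of all relative steps.
    labels.append("Total")
    values.append(sum(values))
    measures = ["relative"] * (len(values) - 1) + ["total"]
    return labels, values, measures

# Invented sample: six rows across three neighbourhoods, keeping the top two.
labels, values, measures = waterfall_inputs(
    ["Loop", "Loop", "Loop", "West Town", "West Town", "Uptown"], top_n=2
)
print(labels, values, measures)
```

Since every relative step is a non-negative count, the bars can only rise, which matches the shape described above.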
In my final graph, I created an interactive map showing the distribution of Airbnb listings throughout Chicago. The map is centered on Chicago (latitude 41.8781, longitude -87.6298). Because the dataset was large, I filtered the listings to include only those with prices under $500 and fewer than 300 reviews. This helps remove extremely expensive listings as well as those that are highly popular. Each circle marker represents one Airbnb listing. Blue indicates an entire home or apartment, green represents a private room, and red represents other types of accommodations, such as shared rooms. From the map, we can see that blue markers dominate, suggesting that entire homes or apartments are the most common type of listing. Many listings are concentrated near the city center, but they are also spread throughout Chicago.
knitr::include_graphics("map.png")
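The code that produced the map is not shown in this write-up, so the following is only a sketch of the filtering and color coding described for it, not the original implementation. The column name `number_of_reviews`, the exact room-type string `Private room`, the helper `marker_color`, and the toy rows are all assumptions made for illustration.

```python
import pandas as pd

# Colors as described in the text; any other room type falls back to red.
ROOM_COLORS = {"Entire home/apt": "blue", "Private room": "green"}

def marker_color(room_type):
    """Map a room type to the marker color described in the text."""
    return ROOM_COLORS.get(room_type, "red")

# Toy listings (invented); `number_of_reviews` is an assumed column name.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Shared room", "Entire home/apt"],
    "price": [150.0, 80.0, 40.0, 900.0],
    "number_of_reviews": [12, 350, 5, 20],
    "latitude": [41.88, 41.90, 41.85, 41.89],
    "longitude": [-87.63, -87.65, -87.62, -87.64],
})

# Keep listings under $500 with fewer than 300 reviews and valid coordinates.
subset = toy[(toy["price"] < 500) & (toy["number_of_reviews"] < 300)]
subset = subset.dropna(subset=["latitude", "longitude"]).copy()

# One color per remaining listing, ready to pass to a map marker.
subset["color"] = subset["room_type"].map(marker_color)

print(subset[["room_type", "price", "color"]])
```

Each remaining row could then be drawn as one circle marker at its latitude and longitude on a map centered on (41.8781, -87.6298).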