Chicago Airbnb Data

Analysis of Airbnb data based in Chicago

For my Python visualizations, I chose Airbnb data for Chicago. I use this dataset to explore how prices trend over time, which types of listings people usually choose when booking, and which areas are more popular than others. I created five graphs to showcase my findings. Many of these findings can be used to understand what consumers want and to help Airbnb cater to those needs.

import pandas as pd

listings = pd.read_csv("/Users/rubynguyen/Downloads/Chicago/listings.csv")
reviews = pd.read_csv("/Users/rubynguyen/Downloads/Chicago/reviews.csv")
neighbourhoods = pd.read_csv("/Users/rubynguyen/Downloads/Chicago/neighbourhoods.csv")

df = listings.merge(reviews,left_on="id",right_on="listing_id",how="right")

df["date"] = pd.to_datetime(df["date"])
df["month"] = df["date"].dt.month_name()
df["day_name"] = df["date"].dt.day_name()
df["day"] = df["date"].dt.day

df["year"] = df["date"].dt.year

Descriptive statistics:

The average Airbnb listing price is about $179. The median price is $138; because the mean is higher than the median, the price distribution is right-skewed. The most common room type is an entire home or apartment. The standard deviation is about $151.25, so a typical price differs from the mean by roughly $151. This shows high variability in listing prices.

df['price'].mean()

## np.float64(178.8097012092105)

df['price'].median()

## 138.0

df['room_type'].mode()

## 0    Entire home/apt
## Name: room_type, dtype: object

df['price'].std()

## 151.251615686642
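
One caveat about the statistics above: the merged table comes from a right merge onto the reviews file, so a listing with many reviews appears many times and pulls the averages toward heavily reviewed listings. Below is a minimal sketch, on made-up toy data, of how per-review and per-listing statistics can differ. The frame `toy` and its values are assumptions for illustration only; `listing_id` is the join key used in the merge.

```python
import pandas as pd

# Toy stand-in for the merged frame: one row per review (values are made up).
toy = pd.DataFrame({
    "listing_id": [1, 1, 1, 1, 2, 3],
    "price": [100.0, 100.0, 100.0, 100.0, 200.0, 600.0],
})

# Mean over review rows: listing 1 is counted four times.
per_review_mean = toy["price"].mean()

# Mean and median over unique listings: each listing is counted once.
per_listing = toy.drop_duplicates(subset="listing_id")
per_listing_mean = per_listing["price"].mean()
per_listing_median = per_listing["price"].median()

# A mean above the median is a quick sign of right skew.
right_skewed = per_listing_mean > per_listing_median

print(per_review_mean, per_listing_mean, per_listing_median, right_skewed)
```

Running the same comparison on the real data would show whether the $179 mean and $138 median change much once each listing is counted only once.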

Visualizations

Bar Chart (Prices by Neighborhood)

For my first visualization, I used a bar chart to display the average price in the ten Chicago neighborhoods with the highest average prices. I also added a dashed line representing the mean of these ten neighborhood averages, and colored the bars to highlight which neighborhoods fall above or below it. This makes it easier for the audience to compare prices and identify areas with relatively higher or lower costs.

df_bar = df.groupby("neighbourhood")["price"].mean().sort_values(ascending=False).head(10)

df_bar = df_bar.reset_index()
mean_price = df_bar["price"].mean()

colors = []

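for val in df_bar["price"]:
    # Check the "within 1%" band first; otherwise values just above the mean
    # are labeled "Above Average" and never reach this branch.
    if abs(val - mean_price) / mean_price < 0.01:
        colors.append("black")
    elif val > mean_price:
        colors.append("lightcoral")
    else:
        colors.append("green")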

import matplotlib.patches as mpatches

Above = mpatches.Patch(color='lightcoral', label='Above Average')
At = mpatches.Patch(color='black', label='Within 1% of Average')
Below = mpatches.Patch(color='green', label='Below Average')

import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))

plt.bar(df_bar["neighbourhood"], df_bar["price"], color=colors)

plt.axhline(mean_price, color='black', linestyle='dashed')

# legend
plt.legend(handles=[Above, At, Below])


plt.xticks(rotation=45)

## ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [Text(0, 0, 'Loop'), Text(1, 0, 'Oakland'), Text(2, 0, 'Near North Side'), Text(3, 0, 'Near South Side'), Text(4, 0, 'Lincoln Park'), Text(5, 0, 'Near West Side'), Text(6, 0, 'Belmont Cragin'), Text(7, 0, 'Lake View'), Text(8, 0, 'North Park'), Text(9, 0, 'West Town')])

plt.xlabel("Neighbourhood")
plt.ylabel("Average Price")
plt.title("Top 10 Neighbourhoods by Average Price")

plt.text(8, mean_price + 5, f"Mean = {mean_price:.2f}")

plt.tight_layout()
plt.show()
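
The coloring rule for the bars can also be written as a small helper so it can be checked on its own. This is a sketch, not part of the original analysis: the function name `bar_color` and the example prices are hypothetical. It tests the 1% band before the above/below comparison, so values just above the mean also land in the "Within 1% of Average" group.

```python
def bar_color(value, mean, tolerance=0.01):
    """Return the bar color for one value relative to the mean."""
    # Check the "close to the mean" band first so it is reachable
    # for values slightly above the mean as well as slightly below.
    if abs(value - mean) / mean < tolerance:
        return "black"
    if value > mean:
        return "lightcoral"
    return "green"


example_prices = [250.0, 200.5, 199.5, 150.0]  # made-up values
example_colors = [bar_color(p, mean=200.0) for p in example_prices]
print(example_colors)
```

With a helper like this, the color list for the chart could be built in one line from the price column.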

Line Chart

For my second graph, I used a line chart to illustrate time series trends from 2009 to 2024. I wanted to see whether average prices increased or decreased over time. Each line represents one of the five neighborhoods with the most records in the dataset. The lines begin to rise in later years, indicating that prices have generally increased. This trend may be influenced by the housing market, with Airbnb hosts raising prices to cover property costs. Prices in the Near North Side rise rapidly, which may suggest high demand for accommodations in that area when people visit Chicago.

avg_price = (df.groupby(['year', 'neighbourhood'])['price']
    .mean()
    .round(0) 
    .reset_index())
    
    
top_neigh = (
    df['neighbourhood']
    .value_counts()
    .head(5)
    .index
)

avg_price = avg_price[avg_price['neighbourhood'].isin(top_neigh)]

from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)

my_colors = {
    'Lake View': 'blue',
    'Lincoln Park': 'red',
    'Logan Square': 'green',
    'Near North Side': 'purple',
    'West Town': 'orange'}


for key, grp in avg_price.groupby('neighbourhood'):
    grp.plot(ax=ax,kind='line',x='year',y='price', color=my_colors.get(key, 'black'), label=key, marker='o')


plt.title('Average Price by Year for the Top 5 Neighborhoods', fontsize=18)
ax.set_xlabel('Year', fontsize=18)
ax.set_ylabel('Average Price ($)', fontsize=18, labelpad=20)

ax.tick_params(axis='x', labelsize=14)
ax.tick_params(axis='y', labelsize=14)


ax.set_xticks(sorted(avg_price['year'].unique()))


ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: f'${int(x):,}'))


handles, labels = ax.get_legend_handles_labels()
plt.legend(handles, labels, loc='best', fontsize=14)


plt.show()
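
To put a number on the trend in the line chart, the yearly averages can be reshaped so that each neighborhood is a column, and the first and last years compared. The sketch below uses made-up toy data; the neighborhood names "A" and "B" and all prices are assumptions, not values from the Chicago dataset.

```python
import pandas as pd

# Toy yearly averages (made-up values), same columns as avg_price.
toy = pd.DataFrame({
    "year": [2022, 2022, 2023, 2023, 2024, 2024],
    "neighbourhood": ["A", "B", "A", "B", "A", "B"],
    "price": [100.0, 200.0, 110.0, 200.0, 150.0, 180.0],
})

# One row per year, one column per neighborhood.
wide = toy.pivot_table(index="year", columns="neighbourhood",
                       values="price", aggfunc="mean")

# Percent change from the first year to the last year, per neighborhood.
pct_change = ((wide.iloc[-1] / wide.iloc[0] - 1) * 100).round(1)
print(pct_change)
```

A positive value means the average price rose over the period, and a negative value means it fell.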

Heatmap

For my third graph, I used a heatmap to illustrate patterns and trends in the dataset through color coding. A heatmap is useful for spotting high and low values, such as which neighborhoods have higher average prices in different years. Many cells show “0” because no records exist for that neighborhood in that year and the missing values were filled with 0; these cells do not represent a price of $0. As we can see, prices begin to increase more noticeably between 2021 and 2024.

# Ten neighborhoods with the highest average price
top_neigh = df.groupby("neighbourhood")["price"] \
              .mean().sort_values(ascending=False).head(10).index

# Average price per neighborhood (rows) and year (columns)
df_year = df.pivot_table(
    index='neighbourhood',
    columns='year',
    values='price',
    aggfunc='mean'
)

# Keep the top ten; years with no data become 0
df_year = df_year.loc[top_neigh].fillna(0)


import seaborn as sns
from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1, 1, 1)
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))
ax = sns.heatmap(df_year, linewidth = 0.2, annot = True, cmap = 'coolwarm', fmt=',.0f', 
                 square = True, annot_kws={'size': 7},
                 cbar_kws = {'format': comma_fmt, 'orientation':'vertical'})
plt.title('Heatmap of Airbnb Prices in each Neighborhood by Year', fontsize=18, pad=15)
plt.xlabel('Year', fontsize=18, labelpad=10)
plt.ylabel('Airbnb Top Ten Neighborhoods', fontsize=10, labelpad=10)
plt.xticks(rotation=45)

## (array([ 0.5,  1.5,  2.5,  3.5,  4.5,  5.5,  6.5,  7.5,  8.5,  9.5, 10.5,
##        11.5, 12.5, 13.5, 14.5, 15.5]), [Text(0.5, 0, '2009'), Text(1.5, 0, '2010'), Text(2.5, 0, '2011'), Text(3.5, 0, '2012'), Text(4.5, 0, '2013'), Text(5.5, 0, '2014'), Text(6.5, 0, '2015'), Text(7.5, 0, '2016'), Text(8.5, 0, '2017'), Text(9.5, 0, '2018'), Text(10.5, 0, '2019'), Text(11.5, 0, '2020'), Text(12.5, 0, '2021'), Text(13.5, 0, '2022'), Text(14.5, 0, '2023'), Text(15.5, 0, '2024')])

cbar = ax.collections[0].colorbar
cbar.set_label('Airbnb Average Prices', rotation = 270, fontsize=14, color='black', labelpad=20)

plt.show()
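
Filling missing years with 0 keeps the heatmap grid complete, but it also makes "no data" look like a very low price. The sketch below, on made-up toy data, shows how the missing cells can be tracked separately and how much the zeros can shift a row average. The names and values are assumptions; a boolean table like `missing` could, for example, be passed to the `mask` argument of `sns.heatmap` to leave those cells blank.

```python
import pandas as pd

# Toy data (made-up values): neighborhood "B" has no rows for 2023.
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "B"],
    "year": [2023, 2024, 2024],
    "price": [100.0, 120.0, 300.0],
})

table = toy.pivot_table(index="neighbourhood", columns="year",
                        values="price", aggfunc="mean")

# True where a neighborhood has no data for that year.
missing = table.isna()

# Row averages with zeros filled in versus ignoring the missing years.
mean_with_zeros = table.fillna(0).mean(axis=1)
mean_ignoring_missing = table.mean(axis=1)

print(missing)
print(mean_with_zeros["B"], mean_ignoring_missing["B"])
```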

Waterfall Graph

In my fourth graph, I used a waterfall chart to show how Airbnb listings accumulate across different neighborhoods. A waterfall chart is typically used to illustrate how a value increases or decreases over time or across categories. Here every bar is a count, so each neighborhood adds to the running total and no decreases appear. Because there are so many neighborhoods, I focused on the top five and grouped the rest as “Other.” The total comes to 435,791. Since the merged table is built with a right merge onto the reviews file, each row most likely corresponds to one review of a listing, so these counts reflect review records per neighborhood rather than unique listings. As more influencers promote travel and more people seek new experiences, the number of listings in the area may continue to grow.

My code:

import_cols = ['name', 'date', 'neighbourhood', 'latitude', 'longitude', 'room_type']
map_df = df[import_cols].copy()

map_df['latitude'] = pd.to_numeric(map_df['latitude'], errors='coerce')
map_df['longitude'] = pd.to_numeric(map_df['longitude'], errors='coerce')


map_df['date'] = pd.to_datetime(map_df['date'], errors='coerce')


map_df['month'] = map_df['date'].dt.month
map_df['year'] = map_df['date'].dt.year

import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio

top5_neigh_counts = map_df['neighbourhood'].value_counts().nlargest(5)

waterfall_values = list(top5_neigh_counts.values)
waterfall_labels = list(top5_neigh_counts.index)

other_count = map_df.shape[0] - sum(waterfall_values)
waterfall_values.append(other_count)
waterfall_labels.append('Other')

total_value = sum(waterfall_values)
waterfall_values.append(total_value)
waterfall_labels.append('Total')

measure = ["relative"] * (len(waterfall_values) - 1) + ["total"]

fig = go.Figure(go.Waterfall(
    name="Listings",
    orientation="v",
    measure=measure,
    x=waterfall_labels,
    y=waterfall_values,
    text=[f"+{v:,}" for v in waterfall_values[:-1]] + [f"{waterfall_values[-1]:,}"],
    textposition="outside",
    connector={"line": {"color": "rgb(63, 63, 63)"}}
))

fig.update_layout(
    height=700,
    title="Waterfall of Listings by Neighborhood",
    xaxis_title="Neighborhoods",
    yaxis_title="Total Listings"
)

knitr::include_graphics("waterfall.png")
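
The arithmetic behind the waterfall can be checked without any plotting library: each bar starts where the previous one ended, and the last running total equals the overall row count. The sketch below uses made-up counts; the category names and numbers are assumptions for illustration.

```python
from itertools import accumulate

# Made-up counts for the top categories and a made-up overall row count.
top_counts = {"A": 50, "B": 30, "C": 20}
n_rows = 130

# Everything not in the top categories is grouped as "Other".
other = n_rows - sum(top_counts.values())

labels = list(top_counts) + ["Other"]
steps = list(top_counts.values()) + [other]

# Running total after each bar, and the height where each bar starts.
running = list(accumulate(steps))
bases = [0] + running[:-1]

for label, base, top in zip(labels, bases, running):
    print(f"{label}: from {base} to {top}")
```

Because every step is a count, the running total can only go up, which is why the chart shows no decreases.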

Map of Chicago

In my final graph, I created an interactive map showing the distribution of Airbnb listings throughout Chicago. The map is centered on Chicago (latitude 41.8781, longitude -87.6298). Because the dataset was large, I filtered the listings to include only those with prices under $500 and fewer than 300 reviews. This helps remove extremely expensive listings as well as those that are highly popular. Each circle marker represents one Airbnb listing. Blue indicates an entire home or apartment, green represents a private room, and red represents other types of accommodations, such as shared rooms. From the map, we can see that blue markers dominate, suggesting that entire homes or apartments are the most common type of listing. Many listings are concentrated near the city center, but they are also spread throughout Chicago.
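
The filtering and coloring rules described above can be sketched in pandas before any markers are drawn. This is an illustration on made-up toy data, not the code used for the map: the column name `number_of_reviews` and the room type labels other than "Entire home/apt" are assumptions.

```python
import pandas as pd

# Toy listings (made-up values); number_of_reviews is an assumed column name.
toy = pd.DataFrame({
    "price": [120.0, 650.0, 90.0, 300.0],
    "number_of_reviews": [10, 5, 450, 80],
    "room_type": ["Entire home/apt", "Entire home/apt",
                  "Private room", "Shared room"],
})

# Keep listings under $500 with fewer than 300 reviews.
subset = toy[(toy["price"] < 500) & (toy["number_of_reviews"] < 300)].copy()

# Blue for entire homes, green for private rooms, red for everything else.
color_by_type = {"Entire home/apt": "blue", "Private room": "green"}
subset["color"] = subset["room_type"].map(color_by_type).fillna("red")

print(subset[["price", "room_type", "color"]])
```

Each remaining row then carries the color its circle marker would use on the map.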