Introduction

Planning a trip, especially an overseas one, can be stressful. One of the biggest challenges is finding accommodations that meet your needs: with so many factors to consider, including price, location, number of reviews, and ratings, the decision can quickly become overwhelming. This analysis explores the characteristics of Airbnb listings in Tokyo to give travelers a clearer picture of the options available. By breaking down key characteristics and identifying relationships between them, it shows how different accommodations compare, with the aim of making the selection process easier.

Dataset

This dataset was downloaded from Inside Airbnb, where it is listed under Tokyo, Japan in the site's list of cities. It contains more than 20,000 listings and includes important variables such as number of reviews, latitude and longitude, rating variables, neighborhood (or district), room type, property type, availability, and price. Prices are listed in yen by default, so a ‘priceusd’ variable was computed using a hard-coded exchange rate as of April 3rd, 2025. These are the main variables I used in this analysis. Several variables rate each listing, but the main one I used was the overall review score rating (review_scores_rating). The only alteration made to the data was grouping the property type variable, which has 63 distinct values, into 8 broader categories. The dataset also includes a GeoJSON file, which was helpful in mapping some of these listings.


import pandas as pd
import numpy as np
import wget
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.colors as mcolors
import folium
import matplotlib.patches as mpatches
from matplotlib.ticker import FuncFormatter
import json
import plotly.express as px
import geopandas as gpd
import plotly.graph_objects as go
import branca

path = "C:/Users/Robert/Desktop/Loyola/gb736/Module 2 Python/"

filename = "TokyoListings.csv"

df1 = pd.read_csv(path + filename, encoding='utf-8-sig')

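# Currency conversion: yen to US dollars

rate = 0.0068  # USD per yen, fixed at the April 3rd, 2025 exchange rate
# Strip thousands separators and the currency symbol before converting to numbers.
# regex=False makes pandas treat '$' as a literal character rather than a regex anchor.
df1['price'] = df1['price'].str.replace(',', '', regex=False)
df1['price'] = df1['price'].str.replace('$', '', regex=False)
df1['price'] = pd.to_numeric(df1['price'])
amount = df1['price']

df1['priceusd'] = rate * amount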
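
As a quick sanity check of the cleaning and conversion steps, the sketch below runs them on a few made-up price strings. The sample values and the names `raw`, `RATE`, and `price_usd` are illustrative assumptions, not rows from the dataset; the rate is the same 0.0068 used above.

```python
import pandas as pd

# Hypothetical raw price strings in the "$12,345.00" style (not real listings)
raw = pd.Series(["$12,345.00", "$8,000.00", "$150,000.00"])

# Same fixed rate as in the analysis (USD per yen)
RATE = 0.0068

# regex=False treats "," and "$" as literal characters
cleaned = (
    raw.str.replace(",", "", regex=False)
       .str.replace("$", "", regex=False)
)

price_yen = pd.to_numeric(cleaned)
price_usd = price_yen * RATE

print(price_usd.round(2).tolist())
```

If any string still contained a non-numeric character after cleaning, `pd.to_numeric` would raise an error, which makes it a useful early warning that the price format has changed.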

Findings


# Five neighborhoods with the most listings
top_5_nbhd = df1['neighbourhood_cleansed'].value_counts().head(5).index
nbhd_df = df1[df1['neighbourhood_cleansed'].isin(top_5_nbhd)]
# .copy() makes an independent frame so the column assignments below do not
# trigger pandas' SettingWithCopyWarning
nbhd_df = nbhd_df[['neighbourhood_cleansed', 'room_type', 'priceusd']].copy()

# Mean price for each neighborhood and room type
avg_price = nbhd_df.groupby(['neighbourhood_cleansed', 'room_type'])['priceusd'].mean().unstack()

# Ten neighborhoods with the most reviews in total
top_10_nbhd = df1.groupby("neighbourhood_cleansed")["number_of_reviews"].sum().nlargest(10).index

df_top10_nbhd = df1[df1["neighbourhood_cleansed"].isin(top_10_nbhd)].copy()

room_type_colors = ['blue', 'orange', 'purple', 'red']

nbhd_df['availability_365'] = pd.to_numeric(df1['availability_365'])
nbhd_df['neighbourhood_cleansed'] = nbhd_df['neighbourhood_cleansed'].astype(str)
nbhd_df['room_type'] = nbhd_df['room_type'].astype(str)

property_dict = {
'Entire rental unit' : 'Residential Rental',
'Private room in home': 'Residential Rental',
'Entire home': 'Residential Rental',
'Shared room in rental unit': 'Residential Rental',
'Private room in rental unit': 'Residential Rental',
'Private room in condo' : 'Residential Rental',
'Entire condo' : 'Residential Rental',
'Private room in serviced apartment': 'Residential Rental',
'Entire townhouse' : 'Residential Rental',
'Private room in townhouse' : 'Residential Rental',
'Private room' : 'Residential Rental',
'Shared room in home': 'Residential Rental',
'Room in serviced apartment':'Residential Rental',
'Entire serviced apartment' : 'Residential Rental',
'Room in rental unit': 'Residential Rental',
'Shared room in condo' : 'Residential Rental',
'Entire place': 'Residential Rental',

'Shared room in hotel':'Hotel',
'Room in hotel':'Hotel',
'Shared room in aparthotel' :'Hotel',
'Room in aparthotel' :'Hotel',
'Room in boutique hotel':'Hotel',

'Private room in hostel': 'Hostel',
'Shared room in hostel' : 'Hostel',
'Room in hostel':'Hostel',

'Private room in bed and breakfast':'Standard BnB',
'Shared room in bed and breakfast' :'Standard BnB',

'Entire guest suite':'Guesthouse',
'Entire guesthouse':'Guesthouse',
'Private room in guesthouse':'Guesthouse',
'Private room in guest suite':'Guesthouse',
    
'Private room in ryokan':'Japanese Accommodations',
'Room in ryokan':'Japanese Accommodations',
'Private room in minsu':'Japanese Accommodations',
'Private room in kezhan':'Japanese Accommodations',
'Shared room in kezhan':'Japanese Accommodations',
'Private room in bungalow' :'Japanese Accommodations',
'Shared room in ryokan':'Japanese Accommodations',

'Tent':'Unique Experiences',
'Shipping container':'Unique Experiences',
'Barn' :'Unique Experiences',
'Private room in tower':'Unique Experiences',
'Hut':'Unique Experiences',
'Private room in religious building':'Unique Experiences',
'Private room in hut':'Unique Experiences',
'Private room in camper/rv':'Unique Experiences',
'Entire loft':'Unique Experiences',
'Treehouse':'Unique Experiences',
'Private room in barn':'Unique Experiences',
'Earthen home' :'Unique Experiences',
'Shared room in hut':'Unique Experiences',
'Tiny home':'Unique Experiences',
'Private room in tiny home':'Unique Experiences',
    
'Entire vacation home':'Vacation Style',
'Private room in vacation home':'Vacation Style',
'Private room in resort' :'Vacation Style',
'Entire bungalow' :'Vacation Style',
'Private room in cabin' :'Vacation Style',
'Entire cottage':'Vacation Style',
'Entire chalet':'Vacation Style',
'Private room in villa':'Vacation Style',
'Entire villa':'Vacation Style',
'Entire cabin':'Vacation Style'
}

default_category = "Other"

df_top10_nbhd.loc[:, "property_category"] = df_top10_nbhd["property_type"].map(property_dict).fillna(default_category)
df1.loc[:, "property_category"] = df1["property_type"].map(property_dict).fillna(default_category)
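
The grouping above relies on a dictionary lookup with a fallback. The sketch below shows the same pattern on a tiny made-up frame and lists which property types ended up in the fallback category. The shortened lookup table, the "Houseboat" entry, and the names `category_lookup`, `listings`, and `unmapped` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical, shortened version of the lookup table
category_lookup = {
    "Entire rental unit": "Residential Rental",
    "Room in hotel": "Hotel",
    "Private room in ryokan": "Japanese Accommodations",
}

# Made-up listings; "Houseboat" is deliberately missing from the lookup
listings = pd.DataFrame({
    "property_type": [
        "Entire rental unit",
        "Room in hotel",
        "Private room in ryokan",
        "Houseboat",
    ]
})

# map() returns NaN for keys missing from the dictionary; fillna() supplies the fallback
listings["property_category"] = (
    listings["property_type"].map(category_lookup).fillna("Other")
)

# Property types that were not covered by the lookup table
unmapped = listings.loc[
    listings["property_category"] == "Other", "property_type"
].unique()

print(listings["property_category"].value_counts())
print(list(unmapped))
```

Checking the unmapped types this way is one possible safeguard against a property type being silently grouped under "Other" because of a typo in the dictionary.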

Price by Room Type and Neighborhood

fig, ax = plt.subplots(figsize=(12, 10))

# Order neighborhoods from lowest to highest median price
order = nbhd_df.groupby("neighbourhood_cleansed")["priceusd"].median().sort_values().index
sns.boxplot(data=nbhd_df, x="neighbourhood_cleansed", y="priceusd", hue="room_type", order=order, palette="deep", ax=ax)

# Cap the y-axis so the boxes stay readable; listings above $700 are cut off
ax.set_ylim(0, 700)
ax.set_xlabel('Neighborhood')
ax.set_ylabel('Price (USD)')
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:.0f}'.format(x)))
ax.set_title('Price by Room Type and Neighborhood')
ax.legend(loc='upper right')

plt.show()

This chart shows box plots of nightly prices for the five Tokyo neighborhoods with the most listings, split by room type. Each box spans the middle half of the prices: its bottom edge is the first quartile (25th percentile), its top edge is the third quartile (75th percentile), and the line inside it is the median, or the middle value in the range of prices. The whiskers extend to the lowest and highest prices that fall within 1.5 times the interquartile range of the box edges, and any points beyond the whiskers are drawn individually as outliers.
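
To make the box plot anatomy concrete, here is a minimal sketch that computes the same quantities by hand for a small set of made-up prices. The prices and variable names are assumptions for illustration; the 1.5 × IQR rule is the default used by matplotlib and seaborn.

```python
import numpy as np

# Hypothetical nightly prices (USD), including one very expensive listing
prices = np.array([40, 55, 60, 70, 80, 95, 110, 130, 150, 600])

# Box edges and the line inside the box
q1, median, q3 = np.percentile(prices, [25, 50, 75])
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme prices that are still inside the fences
inside = prices[(prices >= lower_fence) & (prices <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences is drawn as an individual outlier dot
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]

print(q1, median, q3)
print(whisker_low, whisker_high)
print(outliers)
```

In this toy example only the 600 dollar listing falls outside the fences, so it would appear as a single dot above the upper whisker.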

A few trends stand out in this chart. These popular tourist areas contain many listings priced far above the typical rate, a clear long-tail effect: each dot beyond the whiskers is a listing priced well above the median. The cheapest options appear to be shared rooms, although these are also the least common of all room types.

Surprisingly, hotel rooms seem to be a relatively affordable option, depending on how strict one’s budget is. It also makes sense that private rooms have a wider range of prices, given the variety of properties they are found in and the value of having a room to oneself. Although many residential rentals are priced well above the 75th percentile, this should not deter prospective travelers from planning a trip: once the number of outlier dots is weighed against the total number of listings, it is clear that affordable accommodations can still be found.

Median Price Heat Map

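def bin_review_scores(score):
    # Listings with no reviews have a missing rating; leave them unbinned
    # instead of letting them fall through to the top bin
    if pd.isna(score):
        return np.nan
    # Round to the nearest whole rating (X.5 and above rounds up)
    if score < 1.5:
        return 1
    elif score < 2.5:
        return 2
    elif score < 3.5:
        return 3
    elif score < 4.5:
        return 4
    else:
        return 5

df1['binned_scores'] = df1['review_scores_rating'].apply(bin_review_scores)

# Median nightly price for each property category at each rounded rating
df_heatmap = df1.pivot_table(
    values='priceusd',
    index='property_category',
    columns='binned_scores',
    aggfunc="median"
)

# Ensure a numeric dtype; combinations with no listings stay NaN
df_heatmap = df_heatmap.apply(pd.to_numeric, errors='coerce')
# Show the rating bins as whole numbers on the axis
df_heatmap.columns = df_heatmap.columns.astype(int)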

annot = df_heatmap.map(lambda x: f"${x:,.2f}" if pd.notna(x) else "N/A")

plt.figure(figsize=(12, 10))
sns.heatmap(df_heatmap, annot=annot, cmap='YlGnBu', fmt='', linewidths=0.5, linecolor = 'black')

plt.title('Median Price (USD) by Property Category and Review Scores Rating')
plt.xlabel('Review Scores Rating')
plt.ylabel('Property Category')

plt.show()

To create this visualization, some data manipulation was needed. Each review score rating was rounded to the nearest whole number (X.49 and below rounded down, X.5 and above rounded up). Because some property categories are much less common than others, not every category has listings at every rating; those cells are left white. For the rest of the visualizations in this analysis, the median is used instead of the mean to reduce the impact of outliers and give a clearer picture of typical prices.
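
The reason for preferring the median can be seen in a small sketch with made-up prices, where a single very expensive listing pulls the mean far above what most listings cost. The numbers and names here are hypothetical.

```python
from statistics import mean, median

# Hypothetical nightly prices (USD): six typical listings and one luxury outlier
prices = [45, 50, 55, 60, 65, 70, 900]

avg = mean(prices)    # pulled upward by the 900 dollar listing
mid = median(prices)  # the middle value, unaffected by how extreme the outlier is

print(f"mean: ${avg:.2f}, median: ${mid:.2f}")
```

Here the mean is almost three times the median, even though six of the seven listings cost 70 dollars or less.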

Building on the previous analysis, the goal of this visualization is to show the price ranges travelers should expect for each kind of accommodation. The heatmap relates each property category to its ratings: each cell shows the median price for that property category at that rating. Residential rentals, the most popular category, include a fair range of affordable options but get more expensive at higher ratings. For those looking for a lavish experience, vacation-style accommodations are quite expensive, although interestingly those rated 4 and 5 are cheaper than those rated 3. For budget-friendly options at high ratings, standard BnBs are your best bet, with prices of $27 to $43 per night, with hostels a close second at $70 to $80 per night.

This heatmap offers a clear picture of the diverse price points across different accommodation types and rating levels. Whether travelers are looking for a budget-friendly stay or a more upscale experience, understanding these pricing trends can help them make informed decisions that align with their preferences and budget.

Map of Median Price by Neighborhood

with open(r'C:\Users\Robert\Desktop\Loyola\gb736\Module 2 Python\tokyoneighborhoods1.geojson', 'r', encoding='utf-8') as f:
    tokyo_geo = json.load(f)

# Group by neighborhood and calculate median price
neigh_df = df1.groupby('neighbourhood_cleansed')['priceusd'].median().reset_index()

# Apply log transformation
neigh_df['log_priceusd'] = np.log1p(neigh_df['priceusd'])

# Create the Mapbox choropleth
fig = px.choropleth_mapbox(
    neigh_df,
    geojson=tokyo_geo,
    locations='neighbourhood_cleansed',  
    featureidkey='properties.neighbourhood',  
    color='log_priceusd',  
    color_continuous_scale=['green', 'yellow', 'red'],  
    hover_data={'neighbourhood_cleansed': True, 'priceusd': ':.2f'},  # Format price with 2 decimals
    title="Tokyo Airbnb Prices by Neighborhood (Log Scale)",
    labels={'log_priceusd': 'Log(Price + 1)'},
    mapbox_style="carto-positron",  # Change to Carto-Positron style
    center={"lat": 35.682855, "lon": 139.417413},  # Center on Tokyo
    zoom=9,  # Adjust zoom level for better visibility
    width = 1000,
    height = 800
)

# Define log-scale range
log_min, log_max = np.log1p(neigh_df['priceusd'].min()), np.log1p(neigh_df['priceusd'].max())

# Define tick positions (log values) and corresponding price labels with comma separators
tick_positions = np.linspace(log_min, log_max, num=5)
tick_labels = [f"${int(round(np.expm1(v))):,}" for v in tick_positions]  # Round (not truncate) and add commas

fig.update_layout(
    coloraxis_colorbar=dict(
        title="Price (USD)",  
        tickvals=tick_positions,  
        ticktext=tick_labels,  # Convert log values back to readable price labels with commas
    )
);

fig.show()

This map of Tokyo shows the median nightly price of Airbnb listings in each neighborhood. The color scale on the right reflects that median price: green represents lower-cost areas, while yellow and red indicate increasingly expensive neighborhoods.

The most expensive neighborhood by far is Hinohara Mura, with a median price of $957 per night, while some areas have median prices as low as $30 per night. The map also shows that the middle of the city is the cheapest area and that, apart from Hinohara Mura, the west side is relatively affordable. Farther east there is more yellow and some orange, indicating higher prices.
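
With medians ranging from about $30 to $957, a linear color scale would be dominated by the single most expensive neighborhood, which is why the map colors by log(price + 1). The sketch below rebuilds evenly spaced color bar ticks from just those two quoted extremes; the variable names are illustrative, and the real map uses the full set of neighborhood medians.

```python
import numpy as np

# Lowest and highest neighborhood medians quoted in the text (USD per night)
low, high = 30.0, 957.0

# Work in log(price + 1) space, as the map does
log_low, log_high = np.log1p(low), np.log1p(high)

# Five ticks evenly spaced on the log scale
tick_positions = np.linspace(log_low, log_high, num=5)

# Convert each tick back to dollars for a readable label
tick_prices = np.expm1(tick_positions)
tick_labels = [f"${int(round(float(p))):,}" for p in tick_prices]

print(tick_labels)
```

The ticks are evenly spaced in color but not in dollars: each step multiplies the price by roughly the same factor, so differences among the cheaper neighborhoods remain visible.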

These price variations highlight the diverse accommodation options available, giving prospective travelers the flexibility to choose stays that fit their budget and preferred itinerary. The map offers a useful comparison of the different parts of Tokyo, helping travelers understand which parts of the city best suit their preferences.

Availability Pareto Chart

def bin_avail(avail):
    if avail == 365:
        return 'Always Available'
    elif avail >= 240:
        return 'Mostly Available'
    elif avail >= 120:
        return 'Frequently Available'
    elif avail >= 60:
        return 'Occasionally Available'
    else:
        return 'Rarely Available'

df1['binned_avail'] = df1['availability_365'].apply(bin_avail)

availability_counts = df1.groupby('binned_avail').size().reset_index(name='Count').sort_values(by='Count', ascending=False)

availability_counts['Cumulative %'] = availability_counts['Count'].cumsum() / availability_counts['Count'].sum() * 100

fig, ax = plt.subplots(figsize=(12, 10))

ax.bar(availability_counts['binned_avail'], availability_counts['Count'], color='blue', label='Count')

ax2 = ax.twinx()
ax2.plot(availability_counts['binned_avail'], availability_counts['Cumulative %'], color='#ff7f0e', marker='o', linestyle='-', label='Cumulative %')

ax.set_xlabel('Availability Category')
ax.set_ylabel('Number of Listings', color='blue')
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '{:,.0f}'.format(x)))
ax2.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '{:,.0f}%'.format(x)))
ax2.set_ylabel('Cumulative %', color='#ff7f0e')
ax2.set_ylim(0, 115);

plt.title('Listings by Availability Category')

ax.legend(loc='upper right', handlelength=3, prop={'size': 10})
ax2.legend(loc='upper right', bbox_to_anchor=(1, 0.96), handlelength=3, prop={'size': 10})

plt.show()

This visualization uses the “availability_365” variable, which provides a value between 0-365 indicating how many days out of the year that an Airbnb is able to be booked. For the purposes of this chart, each was coded into one of five different categories. If an Airbnb was available all year, it is “Always Available”, if it is available at least 240 days it is “Mostly Available”, at least 120 days is “Frequently Available”, at least 60 is “Occasionally Available” and anything less than 60 days is “Rarely Available”.
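
The same five categories and the cumulative percentage line can be sketched with `pd.cut` on a handful of made-up availability values. This is an alternative formulation for illustration, not the code used for the chart; it assumes whole-number day counts between 0 and 365, and the sample values and names are hypothetical.

```python
import pandas as pd

# Hypothetical availability values (days bookable per year)
availability = pd.Series([365, 300, 250, 200, 150, 130, 121, 90, 30, 0])

labels = [
    "Rarely Available",        # 0-59 days
    "Occasionally Available",  # 60-119 days
    "Frequently Available",    # 120-239 days
    "Mostly Available",        # 240-364 days
    "Always Available",        # 365 days
]
# Right-closed bins: (-1, 59], (59, 119], (119, 239], (239, 364], (364, 365]
edges = [-1, 59, 119, 239, 364, 365]
binned = pd.cut(availability, bins=edges, labels=labels)

# Pareto ordering: largest category first, then a running share of the total
counts = binned.value_counts().sort_values(ascending=False)
cumulative_pct = counts.cumsum() / counts.sum() * 100

print(counts)
print(cumulative_pct.round(1))
```

Because the categories are sorted by count rather than by number of days, each point on the cumulative line is the share of listings in the bars up to and including that one, whatever their availability level.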

There are several insights to be gained from this chart. First, very few listings are available throughout the entire year, so prospective travelers who plan on staying for prolonged periods may find it more challenging to find suitable accommodations. At the other end, the majority of listings are available at least 120 days, with somewhat fewer available at least 240 days, the next threshold in the chart. The orange line shows that approximately 60% of listings are available for at least 120 days per year, indicating that many properties are open for a significant portion of the year. However, the relatively low number of ‘Always Available’ listings suggests that a large share of these are still part-time or seasonal rentals rather than full-time options.

Wrap up

This analysis of Tokyo Airbnb listings made use of many different variables, such as number of reviews, neighborhood, price, availability, property category, room type, and rating, to help prospective travelers get an idea of what their accommodation options look like. Understanding the price ranges of different room types in each neighborhood, the number of property types available, and the relationship between price, rating, and property type gives travelers a better picture of the options available when they start planning a trip to Tokyo.